Quiz 1 Flashcards

Question 1

Q

Structured vs. Unstructured data

Answer

A

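structured = organized in a predefined format (e.g., rows and columns of defined variables); typically quantitative, with properties that differ in amount

unstructured = no predefined format (e.g., free text, images); typically qualitative, with properties that differ in type and cannot be readily categorized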

Question 2

Q

CCHS

Answer

A

Canadian community health survey

cross-sectional survey for health surveillance, health care utilization and health determinants

goals: support health surveillance programs at all levels; provide a single data source for health researchers; release information in a timely, easily accessible way; offer a flexible survey instrument with a rapid response option

every 2 years; self-reported 2 x 24 hr food recalls; largest sample (n = 65,000)

Question 3

Q

CHMS

Answer

A

Canadian Health Measures Survey

questionnaire data (household interview) and physical measurements to establish baselines for NCDs (non-communicable diseases) and for exposure to infectious diseases and environmental contaminants; biobank (samples stored for future research) every 2 years

Exclusions include military personnel, kids <12, and people living on reserve or in institutions

Self-reported FFQ (food frequency questionnaire) and biomarkers

n = 5000, every 2 years

Question 4

Q

NHANES

Answer

A

National Health and Nutrition Examination Survey

Assess health and nutrition status of adults and children

questionnaire to determine prevalence of major disease and risk factors

physical measurements, 2 x 24 hr recall, and biomarkers; every year

Question 5

Q

Syntax

Answer

A

Syntax = coding language used to perform data analysis operations

Question 6

Q

Variable

Answer

A

Variable = factor or attribute which can be assigned 2 or more values

Question 7

Q

Discrete vs Continuous variables

Answer

A

Discrete variables = no intermediate values exist between adjacent scale values (e.g., counts)

Continuous variables = intermediate values between adjacent scale values can exist

Question 8

Q

Numeric vs string variables

Answer

A

Numeric = number based

String = character based (can include numbers)

Question 9

Q

Variable types:

Answer

A

Continuous, categorical, ordinal

Question 10

Q

Continuous variables

Answer

A

AKA scale

Cannot be string variables, must be numeric

Ratio (true zero, evenly spaced scale) or interval (evenly spaced scale, arbitrary zero)

Question 11

Q

Categorical

Answer

A

AKA nominal

Can be string or numeric

Question 12

Q

Ordinal

Answer

A

Categorical with implicit order

Can be string or numeric

Question 13

Q

Central limit theorem

Answer

A

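central limit theorem = the distribution of sample means approaches a normal distribution as the number of observations per sample becomes sufficiently large, even when the underlying data are not normally distributed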
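A minimal simulation sketch of the theorem, assuming a right-skewed exponential population with mean 1 (the population, the sample sizes, and the helper names `sample_means` and `mean_median_gap` are illustrative choices, not part of the course material). As the sample size grows, the sample means cluster around 1 and the gap between their mean and median shrinks, which is what a more symmetric, normal-looking distribution would show.

```python
import random
import statistics

def sample_means(sample_size, n_samples=1000, seed=0):
    """Means of n_samples random samples drawn from a right-skewed
    (exponential, mean 1) population."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]

def mean_median_gap(values):
    """Rough symmetry check: mean and median coincide for a symmetric distribution."""
    return abs(statistics.mean(values) - statistics.median(values))

for n in (2, 30, 200):
    means = sample_means(n)
    # columns: sample size, mean of the sample means, mean-median gap
    print(n, round(statistics.mean(means), 3), round(mean_median_gap(means), 3))
```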

Question 14

Q

Gaussian curve

Answer

A

Normal (bell-shaped) curve: mean = median = mode

Standard normal curve: mean = 0, std dev = 1

Question 15

Q

Parametric vs non parametric tests

Answer

A

Parametric tests - assume the data are normally distributed; easier to interpret

Nonparametric tests - no normal distribution assumption

Question 16

Q

Distribution definition:

Distribution types:

Answer

A

Definition: function showing all possible data values and how often they occur

Positively skewed - tail extends to the right (right skewed)

Negatively skewed - tail extends to the left (left skewed)

Bimodal - 2 peaks, can maybe be split into 2 sets or there is an underlying factor

Uniform - all values occur with the same frequency (flat, no peak)

Question 17

Q

Descriptive statistics

Answer

A

Descriptive statistics = a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution.

for categorical variables, only the mode (not the mean or median) and proportions can be calculated

Question 18

Q

Central tendencies

Answer

A

Mode = most frequent value (peak)

Mean = average of all data values
- easily skewed by outliers and asymmetrical data

Median = central datum of the ordered data
- best for asymmetrical data
- the central number, or the average of the 2 central numbers when n is even
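A short sketch of the three measures using Python's standard `statistics` module. The `incomes` values are made up for illustration; the single large value shows how an outlier pulls the mean while the mode and median stay put.

```python
import statistics

incomes = [30, 32, 35, 35, 38, 40, 400]  # hypothetical values; 400 is an outlier

mode_value = statistics.mode(incomes)      # most frequent value
median_value = statistics.median(incomes)  # central value of the sorted data
mean_value = statistics.mean(incomes)      # pulled upward by the outlier

print(mode_value, median_value, round(mean_value, 2))

# With an even number of values, the median is the average of the 2 central numbers.
even_median = statistics.median([1, 2, 3, 4])
print(even_median)
```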

Question 19

Q

Population vs sample mean

Answer

A

Population = µ = Σxi/N

Sample = x̄ = Σxi/n

Question 20

Q

Variability and variance

Answer

A

Variability = dispersion, spread or scatter

Variance = average of squared differences from the mean
Population: σ^2 = Σ(x - µ)^2 / N
Sample: s^2 = Σ(x - x̄)^2 / (n - 1)
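The two formulas can be sketched directly in Python. The function names and the `data` values are illustrative assumptions; the only difference between the two functions is the divisor (N versus n - 1).

```python
import statistics

data = [1, 2, 3, 4, 5]  # hypothetical values

def population_variance(values):
    """sigma^2: sum of squared differences from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2: same sum of squares, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

print(population_variance(data))  # 2.0
print(sample_variance(data))      # 2.5
```

The standard library functions `statistics.pvariance` and `statistics.variance` compute the same two quantities.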

Question 21

Q

Standard deviation

Answer

A

Standard deviation = square root of variance (degree to which individual values vary from the mean)
Population: σ = √(σ^2)
Sample: s = √(s^2)
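A small worked example, assuming a made-up data set whose mean is 5 and whose squared differences from the mean sum to 32. Taking the square root of each variance gives the matching standard deviation.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean = 5

sigma = math.sqrt(statistics.pvariance(data))  # population: sqrt(32 / 8)
s = math.sqrt(statistics.variance(data))       # sample: sqrt(32 / 7)

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
```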

Question 22

Q

Bar chart vs histogram

Answer

A

Bar chart: summary statistics for a continuous variable across categories
Can add clusters on x axis
Compares means/medians across groups

Histogram: distribution for continuous data
Frequency on y-axis
Good for evaluating the distribution shape of data
Can stack variables on x-axis
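A rough sketch of what a histogram computes: continuous values are grouped into equal-width bins and the frequency in each bin goes on the y-axis. The helper `histogram_counts`, the bin width, and the `heights_cm` values are assumptions made for illustration, and the text bars stand in for a real plot.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

heights_cm = [152, 158, 161, 163, 165, 166, 168, 171, 174, 183]  # hypothetical values

for start, count in histogram_counts(heights_cm, 10).items():
    print(f"{start}-{start + 10}: {'#' * count}")
```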

Question 23

Q

Boxplot vs scatterplot

Answer

A

Boxplot: min/max, quartiles and interquartile range (IQR)
Can also display outliers
Box is IQR, quartiles are outer edges of IQR, median is line through IQR, whiskers are min/max
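The boxplot components can be sketched with the standard `statistics` module. Two assumptions are not from the card: quartile conventions vary between software packages (the "inclusive" method is used here), and the 1.5 x IQR rule is one common convention for flagging outliers. The `data` values are made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical values; 30 is extreme

# Quartiles: the box runs from q1 to q3, with the median as the line inside it.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points more than 1.5 x IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers reach the min/max of the values that are not outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers = (min(inside), max(inside))

print(q1, median, q3, iqr, whiskers, outliers)
```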

Scatterplot: displays values for 2 variables (independent x-axis and dependent y-axis)
Helpful for correlations and linear relationships
Can have more than one independent continuous variable using colors
Line drawn through to show correlation or regression
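As a sketch of what the line through a scatterplot summarizes, the Pearson correlation coefficient and a least-squares line can be computed by hand. The function names and the `xs`/`ys` values are illustrative assumptions; the card does not specify which correlation or regression method is used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between paired x (independent) and y (dependent) values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing squared vertical distances."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1, 2, 3, 4, 5]   # hypothetical independent variable
ys = [2, 4, 6, 8, 10]  # hypothetical dependent variable (perfectly linear)

print(pearson_r(xs, ys))
print(least_squares_line(xs, ys))
```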

Question 24

Q

Standard error of the mean

Answer

A

Standard error of the mean (SEM) = estimate of how far the sample mean (x̄) is likely to be from the population mean (µ)
SEM = s/√n (sample standard deviation divided by root n)

Measures how accurately the sample reflects the population: the expected deviation of the sample mean from the population mean
Highly variable data require a large sample to be accurate → large SEM = inaccurate estimate
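A minimal sketch of the calculation, assuming made-up data and a hypothetical helper named `sem`. Repeating the same values four times keeps the spread roughly the same while increasing n, so the standard error drops.

```python
import math
import statistics

def sem(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

small = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
large = small * 4                 # same values repeated, n = 32

print(round(sem(small), 3))
print(round(sem(large), 3))  # smaller: a larger n gives a more precise estimate
```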
