Quiz 1 Flashcards

1
Q

Structured vs. Unstructured data

A

structured = quantitative; properties that differ in amount

unstructured = qualitative (cannot be easily categorized); properties that vary in type of attribute/variable

2
Q

CCHS

A

Canadian community health survey

cross-sectional survey for health surveillance, health care utilization and health determinants

goal: to provide single source data for health researchers

every 2 years, 2x24hr food recalls, largest

3
Q

CHMS

A

Canadian Health Measures Survey

questionnaire data (household interview) & physical measurements for baseline NCD and exposure to infectious diseases and environmental contaminants, and biobank every 2 years (for future research)

Exclusions: military personnel, children <12, and people living on reserve or in institutions

4
Q

NHANES

A

National Health and Nutrition Examination Survey

Assess health and nutrition status of adults and children

questionnaire to determine prevalence of major disease and risk factors

physical measurements 2x24hr recall + biomarkers, every year

5
Q

Syntax

A

Syntax = coding language used to perform data analysis operations

6
Q

Variable

A

Variable = factor or attribute which can be assigned 2 or more values

7
Q

Discrete vs Continuous variables

A

Discrete variables = have no intermediate value

Continuous variables = intermediate values between adjacent scale values can exist

8
Q

Numeric vs string variables

A

Numeric = number based

String = character based (can include numbers)

9
Q

Variable types:

A

Continuous, categorical, ordinal

10
Q

Continuous variables

A

AKA scale

Cannot be string variables, must be numeric

Ratio (true zero with even separation scale) or interval (arbitrary zero)

11
Q

Categorical

A

AKA nominal

Can be string or numeric

12
Q

Ordinal

A

Categorical with implicit order

Can be string or numeric

13
Q

Central limit theorem

A

central limit theorem = the distribution of sample means approaches a normal distribution when the number of observations per sample is sufficiently large, regardless of the shape of the underlying distribution
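
A rough simulation can make this card concrete. The sketch below is only an illustration: the uniform(0, 1) source distribution, the sample sizes, the number of trials and the helper name sample_means are all assumptions chosen for the example, not part of the original card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def sample_means(n, trials=2000):
    """Return `trials` sample means, each from n draws of a uniform(0, 1) variable."""
    return [statistics.fmean([random.random() for _ in range(n)])
            for _ in range(trials)]


means_small = sample_means(2)    # tiny samples: means are widely scattered
means_large = sample_means(50)   # larger samples: means cluster near 0.5

spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

# For a normal distribution about 68% of values lie within one SD of the centre.
centre = statistics.fmean(means_large)
within_one_sd = sum(abs(m - centre) <= spread_large
                    for m in means_large) / len(means_large)

print(round(centre, 2), round(spread_small, 3), round(spread_large, 3))
print(round(within_one_sd, 2))
```

The source data are flat (uniform), yet the means of the larger samples pile up around the population mean with roughly the proportions expected of a normal curve.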

14
Q

Gaussian curve

A

Standard normal curve
Mean = median = mode = 0

Std dev =1

15
Q

Parametric vs non parametric tests

A

Parametric tests - assume normal distribution of data and easier to interpret

Nonparametric tests - no normal distribution assumption

16
Q

Distribution definition:

Distribution types:

A

Definition: function showing all possible data values and how often they occur

Positively skewed - tail extends to the right (right skewed)

Negatively skewed - tail extends to the left (left skewed)

Bimodal - 2 peaks, can maybe be split into 2 sets or there is an underlying factor

Uniform - all values occur with the same frequency (flat, no peak)
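
One quick way to check skew direction is to compare the mean with the median, since the mean is pulled toward the tail. The sketch below is a rule of thumb only, not a formal skewness statistic; the function name skew_direction and the example data are made up for illustration.

```python
import statistics


def skew_direction(data):
    """Rule-of-thumb skew check: the mean is dragged toward the long tail."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positive (right) skew"
    if mean < median:
        return "negative (left) skew"
    return "roughly symmetric"


right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]        # one large value stretches the right tail
left_tailed = [1, 17, 18, 18, 19, 19, 19, 20]   # one small value stretches the left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right_tailed", right_tailed),
                   ("left_tailed", left_tailed),
                   ("symmetric", symmetric)]:
    print(name, "->", skew_direction(data))
```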

17
Q

Descriptive statistics

A

Descriptive statistics = a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution.

for categorical data: can be used to calculate the mode (not the mean or median) and to calculate proportions

18
Q

Central tendencies

A

Mode = most frequent value (peak)

Mean = average value of all data values
easily skewed by outliers and asymmetrical data

Median = literal measure of central tendency, central datum
best for asymmetrical data
- central number or 2 numbers averaged
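
The effect of an outlier on each measure can be seen with a small made-up dataset. This is a minimal sketch using Python's standard statistics module; the numbers are invented for illustration.

```python
import statistics

values = [2, 3, 3, 4, 5]
with_outlier = values + [40]   # same data plus one extreme value

# Mode: most frequent value. Mean: sum / count. Median: central datum.
print(statistics.mode(values), statistics.mean(values), statistics.median(values))

# With an even count, the median is the average of the 2 central numbers.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Adding the single value 40 moves the mean from 3.4 to 9.5, while the median only moves from 3 to 3.5, which is why the median is preferred for asymmetrical data.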

19
Q

Population vs sample mean

A

Population = µ = Σxi/N

Sample = x̄ = Σxi/n

20
Q

Variability and variance

A

Variability = dispersion, spread or scatter

Variance = average of squared differences from the mean
Population: σ^2 = Σ(x -µ)^2 /N
Sample: s^2 = Σ(x - x̄)^2 /(n-1)

21
Q

Standard deviation

A

Standard deviation = square root of variance (degree individual values vary from mean)
σ = √(σ^2) or s = √(s^2)
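
The variance and standard deviation formulas can be worked through by hand and compared with Python's statistics module. This is a sketch on invented data; the only point is the difference between dividing by N (population) and by n - 1 (sample).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n                                  # 40 / 8 = 5.0
squared_diffs = sum((x - mean) ** 2 for x in data)    # 9+1+1+1+0+0+4+16 = 32

pop_var = squared_diffs / n          # population variance: divide by N
samp_var = squared_diffs / (n - 1)   # sample variance: divide by n - 1

pop_sd = math.sqrt(pop_var)          # standard deviation = square root of variance
samp_sd = math.sqrt(samp_var)

print(pop_var, pop_sd)
print(round(samp_var, 3), round(samp_sd, 3))
```

The sample versions are slightly larger than the population versions because of the smaller denominator.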

22
Q

Bar chart vs histogram

A

Bar chart: summary statistics for a continuous variable, split by category
Can add clusters on x axis
Compares means/medians across groups

Histogram: distribution for continuous data
Frequency on y-axis
Good for evaluating the distribution shape of data
Can stack variables on x-axis

23
Q

Boxplot vs scatterplot

A

Boxplot: min/max, quartiles and interquartile range (IQR)
Can also display outliers
Box is IQR, quartiles are outer edges of IQR, median is line through IQR, whiskers are min/max

Scatterplot: displays values for 2 variables (independent x-axis and dependent y-axis)
Helpful for correlations and linear relationships
Can have more than one independent continuous variable using colors
Line drawn through to show correlation or regression
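
The numbers behind a boxplot can be computed directly. The sketch below uses invented data and two assumptions that are not stated in the card: the "inclusive" quartile method (software packages differ in how they interpolate quartiles) and the common 1.5 x IQR convention for flagging outliers.

```python
import statistics

data = [1, 3, 4, 6, 7, 8, 10, 12, 30]

# Quartiles: the box edges (Q1, Q3) and the line through the box (median).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1   # interquartile range = height of the box

# Assumed convention: points beyond 1.5 x IQR from the box are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(min(data), q1, median, q3, max(data))
print(iqr, outliers)
```

When outliers are displayed separately, many plotting tools end the whiskers at the most extreme non-outlier values instead of the true min/max.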

24
Q

Standard error of the mean

A

Standard error of the mean = estimate of how far sample mean (x̄) is from population (µ) mean
SEM = s/√n (standard deviation of the sample divided by the square root of n)

Measures accuracy of sample reflecting population - deviation of sample mean from population mean
Variable data requires large sample to be accurate → large SEM = inaccurate estimate
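
A short calculation shows how the SEM shrinks as the sample grows. This is a minimal sketch with invented data; the helper name sem is hypothetical, and repeating the same values four times is just a convenient way to get a larger sample with similar spread.

```python
import math
import statistics


def sem(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


small = [4, 6, 5, 7, 3]   # n = 5
large = small * 4         # n = 20, similar spread

print(round(sem(small), 3))
print(round(sem(large), 3))
```

With similar variability, the larger sample gives a smaller SEM, meaning its mean is a more precise estimate of the population mean.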