Basic Statistics Flashcards

1
Q

Mode (Polish: moda)

A

The value that occurs most frequently in a given data set.
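
A minimal sketch of finding the mode with Python's standard library. The data values are made up for illustration and are not part of the card.

```python
from statistics import mode, multimode

data = [2, 3, 3, 5, 7, 3, 5]  # illustrative values

# mode() returns the single most frequent value.
most_common = mode(data)

# multimode() returns every value tied for most frequent, in first-seen order.
tied_modes = multimode([1, 1, 2, 2, 3])
```

When several values tie for the highest count, `multimode` is the safer choice because it reports all of them.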

2
Q

Interquartile Range (IQR)

A

The range covered by the middle 50% of the data: IQR = Q3 - Q1 (Polish: rozstęp ćwiartkowy).
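
A short sketch of computing the IQR with the standard library. The data and the choice of the "inclusive" quartile method are assumptions; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13, 15]  # illustrative values

# With n=4, quantiles() returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
```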

3
Q

Standard Deviation (SD)

A

The square root of the variance: SD = sqrt(variance) (Polish: odchylenie standardowe).

4
Q

Variance

A

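Population: sum of (xi - mean)^2 over all elements, divided by N, where xi is each element of the set and N is the population size.

Sample: the same sum, divided by n - 1 instead of N, where n is the sample size.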
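
The two variance formulas, and the standard deviation as their square root, can be sketched as follows. The data values are illustrative; the standard-library calls at the end are only a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mean = sum(data) / len(data)
squared_devs = [(x - mean) ** 2 for x in data]

pop_var = sum(squared_devs) / len(data)         # population: divide by N
samp_var = sum(squared_devs) / (len(data) - 1)  # sample: divide by n - 1

pop_sd = math.sqrt(pop_var)    # SD is the square root of the variance
samp_sd = math.sqrt(samp_var)

# Cross-check against the standard library.
lib_pop_var = statistics.pvariance(data)
lib_samp_var = statistics.variance(data)
```

The sample variance is always a little larger than the population variance for the same data, because the divisor is smaller.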

5
Q

How to describe a histogram

A

4 Main Aspects:

  • Shape - Overall appearance of histogram. Can be symmetric, bell-shaped, left skewed, right skewed, etc…
  • Center - Mean or Median
  • Spread - How far our data spreads. Range, Interquartile Range (IQR), standard deviation, variance.
  • Outliers - Data points that fall far from the bulk of the data.
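
A rough sketch of putting numbers on center, spread, and outliers for a small data set. The data, the 1.5 x IQR outlier rule, and the mean-versus-median skew hint are assumptions added for illustration; the card itself does not prescribe them.

```python
from statistics import mean, median, stdev, quantiles

data = [4, 5, 5, 6, 6, 6, 7, 7, 8, 30]  # illustrative values, one extreme point

# Center
center = {"mean": mean(data), "median": median(data)}

# Spread
q1, _, q3 = quantiles(data, n=4, method="inclusive")
spread = {"range": max(data) - min(data), "iqr": q3 - q1, "sd": stdev(data)}

# Outliers (assumed convention: beyond 1.5 * IQR from the quartiles)
low = q1 - 1.5 * spread["iqr"]
high = q3 + 1.5 * spread["iqr"]
outliers = [x for x in data if x < low or x > high]

# Shape hint: a mean well above the median suggests right skew (heuristic only).
right_skew_hint = center["mean"] > center["median"]
```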
6
Q

Study design and types of study

A

Encompasses everything done in preparation for a data-driven research process.

Types:

  • Confirmatory: specify a falsifiable hypothesis, then test it.
  • Exploratory: collect and analyze data without first specifying a question.
  • Comparative: contrast one quantity with another.
7
Q

Dependent vs. Independent Data (and when dependence arises)

A
  • Dependent data: observations are correlated due to a feature of the study design (e.g., cluster sampling or longitudinal measurement).
  • Independent data: observations are completely independent of each other; they may or may not arise from a common distribution.
8
Q

i.i.d.

A

i = independent

id = identically distributed

9
Q

Simple Random Samples (SRS)

A

Each sampling unit of a population has an equal chance of being included in the sample, and every possible sample of the chosen size is equally likely.
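
A minimal sketch of drawing a simple random sample without replacement. The population of 100 unit IDs, the sample size, and the seed are hypothetical.

```python
import random

population = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Draw 10 distinct units; every subset of size 10 is equally likely.
srs = rng.sample(population, k=10)
```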

10
Q

Longitudinal Data

A

Repeated measures of same variable, collected from same unit over time → likely correlated.

11
Q

Repeated Measures Data: Wide and Long

A

Wide format: one row per subject, each measure in separate column.

Long format: one row per measurement.
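
The two layouts can be sketched with plain Python dictionaries. The subjects, column names, and values are made up for illustration.

```python
# Wide format: one row per subject, one column per measurement occasion.
wide_rows = [
    {"subject": "A", "week1": 5.1, "week2": 5.4},
    {"subject": "B", "week1": 4.8, "week2": 5.0},
]

# Long format: one row per (subject, occasion) measurement.
long_rows = [
    {"subject": row["subject"], "occasion": column, "value": value}
    for row in wide_rows
    for column, value in row.items()
    if column != "subject"
]
```

Two subjects with two measurements each give two wide rows but four long rows.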

12
Q

Quantitative Variables types

A
  • Continuous - can take on any value within an interval, so there are many possible values.
  • Discrete - countable values with gaps between them (e.g., counts); often a finite number of values.
13
Q

Categorical (or Qualitative) Variables

A
  • Ordinal - groups have an order or ranking.
  • Nominal - groups are merely names, no ranking.
14
Q

Conducting a Population Census

A

Gather data from the whole population.

15
Q

Probability Sampling

A

Selection of a sample from a population based on the principle of randomization, that is, random selection or chance.

Probability of selection for each unit is known.

Types: SRS, Complex (anything besides SRS: cluster, stratification, etc.)

16
Q

Stratification

A

The population is divided into different strata, and part of the sample is allocated to each stratum → ensures sample representation from each stratum and reduces the variance of survey estimates.
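
A minimal sketch of stratified sampling with proportional allocation. The urban/rural frame, the total sample size, and the helper name `stratified_sample` are hypothetical; proportional allocation is only one of several ways to split a sample across strata.

```python
import random

# Hypothetical frame: each unit is tagged with its stratum.
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]


def stratified_sample(frame, total_n, rng):
    """Draw an SRS inside each stratum, sized by the stratum's population share."""
    strata = {}
    for stratum, unit in frame:
        strata.setdefault(stratum, []).append(unit)

    sample = {}
    for stratum, units in strata.items():
        # Proportional share; rounding may not sum exactly to total_n in general.
        n_h = round(total_n * len(units) / len(frame))
        sample[stratum] = rng.sample(units, n_h)
    return sample


stratified = stratified_sample(frame, total_n=10, rng=random.Random(0))
```

Every stratum is guaranteed to appear in the sample, which an SRS of the same size would not promise.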

17
Q

Clustering

A

Clusters of population units (e.g., counties) are randomly sampled first, with known probability, within strata. This saves data collection costs because the sampled cases are geographically close to each other.
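
A rough two-stage sketch of cluster sampling: pick clusters at random, then pick units inside each chosen cluster. The counties, household IDs, and sample sizes are hypothetical, and the sketch leaves out the stratification step for brevity.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 8 counties (clusters) with 50 households each.
clusters = {
    f"county_{c}": [f"c{c}_hh{h}" for h in range(50)]
    for c in range(8)
}

# Stage 1: randomly select 3 of the 8 clusters.
chosen_clusters = rng.sample(sorted(clusters), k=3)

# Stage 2: randomly select 5 households inside each chosen cluster.
cluster_sample = {c: rng.sample(clusters[c], k=5) for c in chosen_clusters}
```

Fieldwork is then limited to three counties instead of being spread across all eight.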

18
Q

Non-Probability Sampling

A
  • Probabilities of selection can't be determined for sampled units
  • Often cheap
  • Examples: opt-in web surveys, volunteers
  • Strong risk of sampling bias
19
Q

Pseudo-Randomization

A

Combine a non-probability sample with a probability sample, then estimate the probability of being included in the non-probability sample as a function of auxiliary information available in both samples.

20
Q

Non-Probability Sampling Calibration

A

Compute weights for responding units in the non-probability sample that allow the weighted sample to mirror a known population.

Example: If we got more responses from females than males (but population is 50/50), then down-weight females and up-weight males.
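
The female/male example can be worked through as a small sketch. The respondent counts (70 and 30) are hypothetical; the 50/50 population split comes from the card.

```python
# Hypothetical respondent counts and the known population shares.
respondents = {"female": 70, "male": 30}
population_share = {"female": 0.5, "male": 0.5}

n = sum(respondents.values())

# Weight for each group = population share / sample share.
weights = {
    group: population_share[group] / (respondents[group] / n)
    for group in respondents
}

# After weighting, each group's share of the sample matches the population.
weighted_share = {
    group: respondents[group] * weights[group] / n
    for group in respondents
}
```

Females receive a weight below 1 (down-weighted) and males a weight above 1 (up-weighted), so the weighted sample is 50/50.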