Basic Statistics Flashcards
Mode
The value that occurs most frequently in a given data set.
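A minimal sketch of finding the mode in Python. The helper name `modes` and the sample values are made up for illustration; it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

data = [2, 3, 3, 5, 7, 7, 3]  # made-up example values
print(modes(data))  # [3]
```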
Interquartile Range (IQR)
The spread of the middle 50% of the data.
IQR = Q3 - Q1, where Q1 and Q3 are the first and third quartiles.
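A short sketch of Q3 - Q1 using Python's standard library. Quartile conventions differ between textbooks and tools, so the exact numbers depend on the method; this assumes the default ("exclusive") method of `statistics.quantiles`, and the data values are made up.

```python
import statistics

data = [7, 1, 5, 3, 2, 6, 4]  # made-up example values

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 2.0 6.0 4.0
```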
Standard Deviation (SD)
The square root of the variance: SD = sqrt(variance)
Variance
Population: sum((xi - mean)^2) / N, where xi is each element of the set and N is the number of elements
Sample: sum((xi - mean)^2) / (n - 1), that is, divide by n - 1 instead of N
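The two variance formulas and the standard deviation can be sketched as follows. The data values are made up, and the results are cross-checked against the `statistics` module rather than taken as authoritative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / len(data)    # divide by N
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1
population_sd = math.sqrt(population_variance)

# Cross-check the hand-written formulas against the standard library.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_sd, statistics.pstdev(data))

print(population_variance, population_sd)  # 4.0 2.0
```

The sample variance is always a little larger than the population variance for the same numbers, because the same sum is divided by a smaller denominator.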
How to describe a histogram
4 Main Aspects:
- Shape - Overall appearance of the histogram: symmetric, bell-shaped, left-skewed, right-skewed, etc…
- Center - Mean or median.
- Spread - How far the data spreads: range, interquartile range (IQR), standard deviation, variance.
- Outliers - Data points that fall far from the bulk of the data.
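A rough sketch of putting numbers on center, spread, and outliers. The data values are made up, and the 1.5 * IQR fence used to flag outliers is an assumed common convention, not something these flashcards define. Shape is best judged from the plot itself; comparing mean and median gives only a hint.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 30]  # made-up example values

center = {"mean": statistics.mean(data), "median": statistics.median(data)}

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
spread = {
    "range": max(data) - min(data),
    "iqr": iqr,
    "sd": statistics.stdev(data),  # sample standard deviation
}

# Assumed convention (not from the flashcards): 1.5 * IQR fences.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

# A mean well above the median hints at a right-skewed shape.
print(center, spread, outliers)
```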
Study design and types of study
Study design encompasses everything done in preparation for a data-driven research process.
Types:
- Confirmatory: Specify a falsifiable hypothesis, then test it.
- Exploratory: Collect and analyze data without pre-specifying a question.
- Comparative: Contrast one quantity with another.
Dependent vs. Independent Data
- Dependent data: observations are correlated because of a feature of the study design (e.g. cluster sampling or longitudinal measurement).
- Independent data: observations are completely independent of each other; they may or may not arise from a common distribution.
i.i.d.
i = independent: observations do not depend on each other
id = identically distributed: all observations come from the same distribution
Simple Random Samples (SRS)
Each sampling unit of a population has an equal chance of being included in the sample, and every possible sample of a given size is equally likely.
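A minimal sketch of drawing a simple random sample without replacement. The population of unit IDs, the sample size, and the seed are made up; `random.Random(0)` is used only so the draw can be repeated.

```python
import random

population = list(range(1, 101))  # made-up sampling frame: unit IDs 1..100
sample_size = 5

rng = random.Random(0)  # fixed seed, only for reproducibility
# sample() draws without replacement, so no unit appears twice.
srs = rng.sample(population, sample_size)

print(srs)
```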
Longitudinal Data
Repeated measures of the same variable, collected from the same unit over time → likely correlated.
Repeated Measures Data: Wide and Long
Wide format: one row per subject, each measurement in a separate column.
Long format: one row per measurement.
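The two layouts can be sketched with plain Python dictionaries. The subjects, column names (`t1`, `t2`), values, and the helper `wide_to_long` are all hypothetical, chosen only to show the reshaping.

```python
# Wide: one row per subject, one column per measurement occasion.
wide = [
    {"subject": "A", "t1": 5.1, "t2": 5.4},
    {"subject": "B", "t1": 4.8, "t2": 5.0},
]

def wide_to_long(rows, id_key, measure_keys):
    """Turn each measurement column into its own row."""
    long_rows = []
    for row in rows:
        for key in measure_keys:
            long_rows.append({id_key: row[id_key], "time": key, "value": row[key]})
    return long_rows

# Long: one row per measurement (2 subjects x 2 occasions = 4 rows).
long_rows = wide_to_long(wide, "subject", ["t1", "t2"])
for row in long_rows:
    print(row)
```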
Quantitative Variable Types
- Continuous - can take on any value within an interval, many possible values.
- Discrete - countable values (a finite or countably infinite set), such as counts.
Categorical (or Qualitative) Variables
- Ordinal - groups have an order or ranking.
- Nominal - groups are merely names, no ranking.
Conducting a Population Census
Gather data from the whole population.
Probability Sampling
Selection of a sample from a population based on the principle of randomization, that is, random selection or chance.
The probability of selection for each unit is known.
Types: SRS and complex designs (anything besides SRS: cluster sampling, stratification, etc…)