Unit 1: Exploring One-Variable Data Flashcards
Categorical Variable/Qualitative Data
A variable that takes on values that are category names or group labels. (Think: WORDS)
(EX: Dominant hand, name, college degree)
Quantitative Variable/Data
A variable that takes on numerical values for a measured or counted quantity. (Think: NUMBERS)
(EX: Age, height, count)
Can be discrete or continuous
Frequency table
gives the number of cases in each category
Relative frequency table
gives the proportion of cases in each category (often reported as a percentage)
[Note: percentage, relative frequency, and rates provide the same information as proportions]
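A minimal sketch of both tables, assuming a small made-up list of dominant-hand responses (the data and variable names are hypothetical, not from the original cards):

```python
from collections import Counter

# Hypothetical sample of a categorical variable (dominant hand).
hands = ["right", "left", "right", "right", "left", "right", "right", "right"]

# Frequency table: number of cases in each category.
frequency = Counter(hands)

# Relative frequency table: proportion of cases in each category.
n = len(hands)
relative_frequency = {category: count / n for category, count in frequency.items()}

for category, count in frequency.items():
    proportion = relative_frequency[category]
    print(f"{category:>5}  count={count}  proportion={proportion:.2f}")
```

The proportions always add up to 1, which is a quick check that the table was built correctly.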
Bar Chart/Graph
A graph to display counts or proportions for a categorical variable only
Pie Chart
A chart to display proportions of a whole for a categorical variable
Discrete Quantitative Variable
A variable that can take on a countable (finite or countably infinite) number of values.
Continuous Quantitative Variable
A variable that can take on any value in an interval, so its possible values are infinitely many and cannot be counted.
Dot Plots
Best for discrete variables
Stem and Leaf plots
Stem: the leading digit(s) of a number, written on the left of the plot (EX: stem of 34 is 3)
Leaf: the final digit of a number, written on the right of the plot (EX: leaf of 34 is 4)
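A rough sketch of how the stem/leaf split could be done in code, assuming two-digit whole numbers where the stem is the tens digit and the leaf is the ones digit (the function name and scores are made up for illustration). For 34 the split gives stem 3 and leaf 4.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (tens digit and up) with leaves (ones digit) in order."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 34 -> stem 3, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical test scores.
scores = [34, 41, 38, 47, 52, 34, 45]
plot = stem_and_leaf(scores)

for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```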
Histogram
[NOT a bar graph]
Shows counts (or proportions) on the Y-axis for intervals (bins) of a quantitative variable on a continuous X-axis, which is why the bars touch.
Population
The collection of all individuals or items under consideration in a statistical study
Sample
part of a population
Inferential statistics
Drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population
Skewed Right
More data on the left with a right tail
Skewed Left
More data on the right with a left tail
Symmetric Data
A distribution whose left and right sides are roughly mirror images (one peak in the middle (unimodal) or matching peaks on each side (bimodal))
Uniform data
All values occur with about the same frequency, so the graph looks flat (no clear peak)
Census
Information for the entire population of interest.
Sampling
How to obtain an appropriate subset of people/items from the population. There are 2 types.
[SRSWR] Simple random sampling with replacement
Where a member of the population can be selected more than once
[SRS] Simple random sampling without replacement
Where a member of the population can be selected at most once.
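A small sketch of the two types using Python's standard `random` module. The population of 20 ID numbers and the seed are assumptions made only for this example.

```python
import random

rng = random.Random(1)           # fixed seed so the sketch is repeatable
population = list(range(1, 21))  # hypothetical population of 20 ID numbers

# SRSWR: every draw comes from the full population, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# SRS: a member can be selected at most once, so all picks are distinct.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```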
Statistic vs Parameter
Statistic: value from sample
Parameter: value from population
Systematic Random Sampling
Elements from a larger population are selected at regular intervals after choosing a random starting point.
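One way to sketch this, assuming the population is stored in a list and the interval (called `step` here, a hypothetical name) is chosen in advance:

```python
import random

def systematic_sample(population, step, rng):
    """Choose a random start among the first `step` items, then take every `step`-th item."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(7)         # fixed seed so the sketch is repeatable
population = list(range(100))  # hypothetical ID numbers 0 to 99
sample = systematic_sample(population, step=10, rng=rng)
print(sample)
```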
Cluster Sampling
A population is divided into groups/clusters, and entire clusters are randomly selected for study. Often used when a population is too large or widely spread out for SRS.
Stratified Sampling
A population is divided into subgroups (strata) based on a shared characteristic, and then a random sample is taken from each stratum. Focus on the representation of groups.
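The difference between cluster and stratified sampling can be sketched side by side. This assumes a made-up population of 40 students spread evenly over 4 schools; the names and sizes are hypothetical.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: each student belongs to one of 4 schools.
population = {f"student{i}": f"school{i % 4}" for i in range(40)}

# Group the students by school.
groups = {}
for student, school in population.items():
    groups.setdefault(school, []).append(student)

# Cluster sampling: randomly pick whole schools and keep everyone in them.
chosen_schools = rng.sample(sorted(groups), k=2)
cluster_sample = [s for school in chosen_schools for s in groups[school]]

# Stratified sampling: take a random sample from every school.
stratified_sample = [
    s for school in sorted(groups) for s in rng.sample(groups[school], k=3)
]

print(len(cluster_sample), "students from", len(chosen_schools), "schools")
print(len(stratified_sample), "students from", len(groups), "schools")
```

Cluster sampling ends up with only some groups but everyone in them; stratified sampling ends up with every group but only some members of each.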
Mean x̄
Average of a dataset
Affected by outliers
Will move towards the tail in a skewed graph
Median
Middle value of an ordered dataset
AKA 2nd Quartile
Resistant to outliers (changes very little when one is added)
[Position in the ordered data: (n+1)/2]
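A quick sketch comparing how the mean and median react to an outlier, using Python's `statistics` module. The income values are invented for the example.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]  # hypothetical values, in thousands
with_outlier = incomes + [400]  # add one extreme value

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)

# With n = 6 values the median position is (n + 1) / 2 = 3.5,
# so the median is the average of the 3rd and 4th ordered values.
```

The mean is pulled far toward the outlier while the median barely moves.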
Percentile
Percent of data values less than or equal to a certain value
EX: A value at the p-th percentile has p% of the data values less than or equal to it.
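A minimal sketch of the "less than or equal to" definition. The helper name `percentile_rank` and the scores are assumptions for illustration; statistical software may use slightly different conventions.

```python
def percentile_rank(data, value):
    """Percent of data values less than or equal to `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(scores, 80)
print(f"A score of 80 is at the {rank:.0f}th percentile")
```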
Standardized Score / Z-Score
A measure of how many standard deviations a data point is from the mean. Can be computed for a dataset of any shape (not exclusive to the normal distribution)
Normal Distribution
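Bell-shaped curve
The center is the mean μ
The spread is measured by the standard deviation σ
Empirical rule (68%-95%-99.7% rule): about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean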
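The 68%-95%-99.7% figures can be checked with `statistics.NormalDist` from the standard library. This is only a sketch; the helper name `area_within` is made up here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```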
How to read a Z-score table
Rows (left column): ones and tenths place of z-score
Columns (top row): hundredths place of z-score
The entry where they meet is the area of the distribution, typically the proportion to the left of that z-score.
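A table lookup can be double-checked in code. This sketch assumes a table that reports the area to the left of z; the value z = 1.25 (row 1.2, column .05) is just an example.

```python
from statistics import NormalDist

z = 1.25  # row 1.2, column 0.05 in a z-table
area_left = NormalDist().cdf(z)  # area under the standard normal curve left of z

print(f"area to the left of z = {z}: {area_left:.4f}")
```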
Skewed Right/Positive Shape Properties
Right tail
Highest bar on the left
Mean > Median
Skewed Left/Negative Shape Properties
Left tail
Highest bar on the right
Mean < Median
Symmetrical Shape Properties
Highest bar in the middle
Mean ≈ Median ≈ Mode
Mean formula
x̄ = Σxᵢ / n
Median formula
Middle value of the ordered data (average the two middle values when n is even)
Range formula
Max - Min
IQR formula
IQR = Q₃ − Q₁
Outlier rule
Outlier IF:
x < Q₁ − 1.5×IQR
OR
x > Q₃ + 1.5×IQR
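A sketch of the 1.5×IQR rule on made-up data. It uses `statistics.quantiles` for the quartiles; calculators and textbooks may compute Q₁ and Q₃ slightly differently, so the fences can vary a little between methods.

```python
from statistics import quantiles

data = [2, 4, 5, 7, 8, 9, 11, 30]  # hypothetical data set

q1, q2, q3 = quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A value is flagged as an outlier if it falls outside either fence.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```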
Standard deviation (sample) formula
s: sample standard deviation
xᵢ: each individual data value (n: number of values in the sample)
x̄: Sample mean
s = √[ (Σ(xᵢ − x̄)²) / (n − 1)]
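The formula can be worked step by step and compared with `statistics.stdev`, which also divides by n − 1. The data values are made up for the example.

```python
from math import sqrt
from statistics import stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
n = len(data)

x_bar = sum(data) / n                                   # sample mean
squared_deviations = [(x - x_bar) ** 2 for x in data]   # (xi - x_bar)^2 for each value
s = sqrt(sum(squared_deviations) / (n - 1))             # divide by n - 1, then square root

print(f"by hand: {s:.3f}   statistics.stdev: {stdev(data):.3f}")
```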
Z-score formula
x: a data value (μ: mean, σ: standard deviation)
z = (x − μ) / σ
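A short sketch of the formula as a function. The test score, mean, and standard deviation are hypothetical numbers.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical example: a score of 85 on a test with mean 70 and standard deviation 10.
z = z_score(85, mu=70, sigma=10)
print(z)
```

A positive z-score means the value is above the mean, a negative one means it is below, and 0 means it equals the mean.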
Chebyshev’s Theorem
Empirical Rule
A rough estimate of the standard deviation
s ≈ range / 4
SOCS
Shape, Outliers, Center, Spread
Socs
Shape: Symmetric, Skewed, Uniform, and Bell-shaped
sOcs
Outliers: values outside of an overall pattern
soCs
Center: the median or mean of a distribution
socS
Spread/variability: how far the values extend from smallest to largest (EX: range, IQR, standard deviation)
Cluster
A subgroup that values fall into, based on a category (age range, school, sex, tax rate)