Class 6 Spring Flashcards
What are the two types of data that can be analyzed after sampling from a larger population?
Categorical and numerical data
What are some ways to visualize data trends and distribution shape?
Table or graph (e.g., frequency plot, bar chart, histogram, box-and-whisker, scatterplot)
What summary statistics are commonly calculated?
- Central tendency (mean, median, mode)
- Dispersion (range, IQR, standard deviation)
What is the relationship between the type of table or graph and the nature of the variables?
It depends primarily on whether the variables are numerical or categorical
What shape does a normal (Gaussian) distribution always have?
Bell-shaped and symmetrical around a central mean
What does the area under the normal distribution curve always equal?
1
What does the 68-95-99.7 Rule (Empirical Rule) describe?
- About 68% falls within 1 SD of the mean
- About 95% falls within 2 SD of the mean
- About 99.7% falls within 3 SD of the mean
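The three percentages can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in for R's `pnorm()`; the helper name `area_within` is an illustrative assumption, not something from the class materials.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The printed areas round to roughly 68%, 95%, and 99.7%, which is where the rule gets its name.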
What is the formula for calculating a z-score?
(observation - population mean) / standard deviation
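As a minimal sketch, the formula translates directly into a small Python function. The name `z_score` and the 76.6-inch observation are illustrative assumptions; the mean of 70.0 and standard deviation of 3.3 are the male height values given elsewhere in this deck.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

# Hypothetical observation: a man who is 76.6 inches tall.
z = z_score(76.6, mean=70.0, sd=3.3)
print(round(z, 2))  # about 2 SD above the mean
```

A positive result means the observation is above the mean, and a negative result means it is below.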
What is the standard normal distribution notation?
N(µ = __, σ = __)
How can z-scores be used in relation to different distributions?
To compare how unusual two measurements are, even when looking at different normal distributions
What does a z-score represent?
The number of standard deviations an observation is above or below the mean
True or False: Z-scores can be used for distributions of any shape.
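False for finding areas with pnorm(): a z-score can be computed for any distribution, but normal-curve areas apply only when the data are approximately normal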
How is the probability that a randomly sampled US adult is between 180 cm and 185 cm tall estimated?
The shaded area under the curve
What statistical function in R is used to calculate the lower tail area based on a z-score?
pnorm()
Fill in the blank: The area under the normal curve always adds up to _______.
1
How do real-world data distributions compare to the normal distribution?
They never follow a perfect normal curve; the normal distribution is only an approximation
What does a z-score of more than 3 signify?
Very unusual observations
What are the mean and standard deviation for SAT scores?
Mean = 1500, Standard Deviation = 300
What are the mean and standard deviation for ACT scores?
Mean = 21, Standard Deviation = 5
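With both sets of parameters in hand, z-scores can put an SAT score and an ACT score on the same scale. The sketch below is illustrative only: the two student scores (1800 and 24) are hypothetical values chosen for the example, and `z_score` is an assumed helper name.

```python
def z_score(observation, mean, sd):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

# Parameters from the SAT and ACT cards above.
SAT_MEAN, SAT_SD = 1500, 300
ACT_MEAN, ACT_SD = 21, 5

# Hypothetical scores, chosen for illustration only.
sat_z = z_score(1800, SAT_MEAN, SAT_SD)  # (1800 - 1500) / 300
act_z = z_score(24, ACT_MEAN, ACT_SD)    # (24 - 21) / 5

# The larger z-score is the stronger result relative to its own test.
better = "SAT" if sat_z > act_z else "ACT"
print(f"SAT z = {sat_z:.2f}, ACT z = {act_z:.2f}, stronger relative score: {better}")
```

Raw scores of 1800 and 24 cannot be compared directly, but their z-scores can, because both are measured in standard deviations from their own mean.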
What is the role of parameters in the normal distribution?
Parameters (mean µ and standard deviation σ) describe the normal distribution perfectly
What method is used to find the probability of a measurement falling above a particular cutoff value?
1 - pnorm(z), where z is the z-score of the cutoff value
How is the probability that a random student scores at least 1190 on her SATs calculated?
1 - pnorm(z-score)
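The calculation can be sketched step by step. Python's `statistics.NormalDist().cdf()` is used here as a stand-in for R's `pnorm()`; the SAT mean of 1500 and standard deviation of 300 come from the SAT card in this deck, and the variable names are illustrative.

```python
from statistics import NormalDist

SAT_MEAN, SAT_SD = 1500, 300  # from the SAT card in this deck
cutoff = 1190

# Step 1: standardize the cutoff.
z = (cutoff - SAT_MEAN) / SAT_SD

# Step 2: area to the left of z (the role pnorm(z) plays in R).
lower_tail = NormalDist().cdf(z)

# Step 3: area to the right is the complement, because the total area is 1.
upper_tail = 1 - lower_tail

print(f"z = {z:.2f}, P(score >= {cutoff}) is about {upper_tail:.3f}")
```

The cutoff sits about one standard deviation below the mean, so the upper-tail probability comes out at roughly 0.85.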
What is the mean height of male adults in the US?
70.0 inches
What is the standard deviation of heights for male adults in the US?
3.3 inches
What is the first use of z-scores?
To compare distributions by standardizing measurements/observations to z-scores
What does the second use of z-scores allow you to calculate?
The percentile of a cutoff value
What is the third use of z-scores?
To calculate the probability that a measurement will fall ABOVE a particular cutoff value
What can you calculate using z-scores related to two values?
The probability that a measurement will fall BETWEEN two values
What does the area below a cutoff value represent?
The area to the left of the cutoff, which is the cutoff's percentile
How do you calculate the area above a cutoff value?
1 - pnorm()
How is the area between two values calculated?
Area to left of higher value - area to left of lower value
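A minimal sketch of the subtraction, using the male height parameters from this deck (mean 70.0 inches, SD 3.3 inches). The cutoffs of 69 and 74 inches are hypothetical values chosen for illustration, and `NormalDist(mu, sigma).cdf(x)` stands in for R's `pnorm(x, mean, sd)`.

```python
from statistics import NormalDist

# Male adult heights in inches, using the parameters from this deck.
heights = NormalDist(mu=70.0, sigma=3.3)

# Hypothetical cutoffs, chosen for illustration only.
lower, higher = 69.0, 74.0

area_left_of_higher = heights.cdf(higher)
area_left_of_lower = heights.cdf(lower)

# Area between = area to left of higher value - area to left of lower value.
between = area_left_of_higher - area_left_of_lower
print(f"P({lower} < height < {higher}) is about {between:.3f}")
```

Subtracting in the other order gives a negative number, which is a quick sign that the two areas were swapped.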
What does the 68-95-99.7 Rule describe?
How data in a normal distribution are spread around the mean, in bands of 1, 2, and 3 standard deviations
What percentage of data falls within 1 standard deviation of the mean according to the 68-95-99.7 Rule?
68%
What percentage of data falls within 2 standard deviations of the mean according to the 68-95-99.7 Rule?
95%
What percentage of data falls within 3 standard deviations of the mean according to the 68-95-99.7 Rule?
99.7%
What is considered "unusual" in terms of standard deviations from the mean?
> 2 SD
What is considered "very unusual" in terms of standard deviations from the mean?
> 3 SD
What are the SAT score ranges for the 68-95-99.7 Rule?
- ~68%: between 1200 and 1800
- ~95%: between 900 and 2100
- ~99.7%: between 600 and 2400
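These ranges are just the mean plus or minus 1, 2, and 3 standard deviations. A short sketch of the arithmetic, assuming the SAT mean of 1500 and SD of 300 from this deck (the dictionary name `ranges` is illustrative):

```python
SAT_MEAN, SAT_SD = 1500, 300  # from the SAT card in this deck

# Map each number of SDs to its (low, high) range: mean -/+ k * SD.
ranges = {k: (SAT_MEAN - k * SAT_SD, SAT_MEAN + k * SAT_SD) for k in (1, 2, 3)}

for k, share in ((1, "~68%"), (2, "~95%"), (3, "~99.7%")):
    low, high = ranges[k]
    print(f"{share}: between {low} and {high}")
```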
What function is used in R to find a z-score from an area?
qnorm
Fill in the blank: To find the z-score for a data point that is bigger than 40% of other data points, you would use _______.
qnorm(0.4)
How can you manually input mean and standard deviation in R?
Using qnorm(area, mean=mean_value, sd=sd_value)
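For comparison, the same two lookups can be sketched in Python, where `statistics.NormalDist.inv_cdf()` plays the role of R's `qnorm()`. The 0.4 area comes from the fill-in-the-blank card; applying it to the SAT parameters (mean 1500, SD 300) is an illustrative choice, not an example from the class.

```python
from statistics import NormalDist

# Counterpart of qnorm(0.4): the z-score with 40% of the area to its left.
z_40 = NormalDist().inv_cdf(0.4)

# Counterpart of qnorm(0.4, mean=1500, sd=300): the cutoff on the SAT scale.
sat_40 = NormalDist(mu=1500, sigma=300).inv_cdf(0.4)

print(f"z-score: {z_40:.3f}")
print(f"SAT score at the 40th percentile: {sat_40:.0f}")
```

The z-score is negative because the 40th percentile lies below the mean, and the SAT cutoff equals mean + z × SD.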