Statistics Flashcards
What are the measures of central tendency?
Median, Mean, and Mode
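Median = the middle number when the numbers are in numerical order (with an even count, the average of the two middle numbers). It is more resistant to outliers than the mean
Mean = average of the numbers (sum all the numbers and divide by how many numbers are in the set). Of the three, the mean is the least resistant to outliers
Mode = the value that occurs most often in the set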
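quick check of the outlier point above, as a small sketch using Python's built-in statistics module. the numbers are made up for illustration: one extreme value drags the mean a long way but barely moves the median.

```python
import statistics

values = [40, 42, 45, 47, 50]        # hypothetical values, no outlier
with_outlier = values + [500]        # same values plus one extreme value

mean_before = statistics.mean(values)            # 44.8
median_before = statistics.median(values)        # 45
mean_after = statistics.mean(with_outlier)       # about 120.67
median_after = statistics.median(with_outlier)   # 46 (average of 45 and 47)

mode_example = statistics.mode([1, 2, 2, 3])     # 2, the most frequent value

print(mean_before, median_before)
print(mean_after, median_after)
```

the mean jumps by about 76 while the median moves by 1, which is what "resistant to outliers" means in practice.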
What is variance? What is it used for?
Variance tells us how much the values in a dataset differ from the mean value.
A large variance indicates large variability in the data
A small variance indicates small variability in the data
–
Population Variance = [the sum of (x - mean)^2] / n (aka the average of the squared deviations)
Sample Variance = [the sum of (x - mean)^2] / (n - 1); dividing by n - 1 instead of n corrects for the fact that deviations measured from the sample mean tend to underestimate the variability in the population
If the same constant is added to every value, the mean shifts by that constant but the variance doesn’t change
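a small sketch of the two variance formulas and the "add a constant" rule, using the statistics module. the data values are made up; pvariance divides by n and variance divides by n - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical values, mean = 5

pop_var = statistics.pvariance(data)        # sum of squared deviations / n       = 32 / 8
samp_var = statistics.variance(data)        # sum of squared deviations / (n - 1) = 32 / 7

# Adding the same constant to every value shifts the mean, not the spread.
shifted = [x + 10 for x in data]
shifted_mean = statistics.mean(shifted)
shifted_pop_var = statistics.pvariance(shifted)

print(pop_var, samp_var)
print(shifted_mean, shifted_pop_var)
```

the sample variance is always a bit larger than the population formula applied to the same numbers, because it divides by the smaller n - 1.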
What is the standard deviation and why do we need it?
Variance is built from squared deviations, so it comes out in squared units (e.g. cm^2), which are hard to interpret next to the original data. So we take the square root of the variance to get the standard deviation, which is back in the original units
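a sketch of the units point, assuming some made-up heights in cm: the variance comes out in cm^2, and its square root (the standard deviation) is back in cm.

```python
import math
import statistics

heights_cm = [150, 160, 170, 180, 190]              # hypothetical heights in cm

variance_cm2 = statistics.pvariance(heights_cm)     # units: cm^2
std_dev_cm = statistics.pstdev(heights_cm)          # units: cm

# The standard deviation is just the square root of the variance.
same_thing = math.isclose(std_dev_cm, math.sqrt(variance_cm2))

print(variance_cm2, std_dev_cm, same_thing)
```

a spread of about 14 cm is easy to picture next to heights of 150 to 190 cm; a variance of 200 cm^2 is not.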
things to review:
t-test
z-test
ANOVA
confidence intervals (inferential)
regression analysis (inferential)
correlation
r squared and RMSD
https://www.linkedin.com/learning/excel-statistics-essential-training-1/the-z-test-for-independent-samples?resume=false&u=36492188
https://www.khanacademy.org/math/ap-statistics/density-curves-normal-distribution-ap
https://www.datacamp.com/blog/statistics-interview-questions - make this into flashcards
what does right skewed mean?
the data has a positive skew, i.e. a longer tail on the right. the tail pulls the mean upward, so the mean is typically greater than the median
what does left skewed mean?
the data has a negative skew, i.e. a longer tail on the left. the tail pulls the mean downward, so the median is typically greater than the mean
what does skewness measure?
asymmetry of a dataset around its mean
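a sketch for checking skew direction numerically. the skewness helper below is my own illustration (the moment-based coefficient: third central moment divided by the second central moment to the power 1.5), and the datasets are made up.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 4, 15]          # long tail to the right
left_skewed = [-x for x in right_skewed]       # mirror image, tail to the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, round(skewness(data), 3), statistics.mean(data), statistics.median(data))
```

positive output means right skewed (mean above median here), negative means left skewed (median above mean here), and the symmetric set comes out at 0.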
what is a histogram? what does it show?
a graphical representation of the distribution of a dataset. it divides the range of the data into bins (intervals) and shows the frequency or count of datapoints within each bin. histograms are used for continuous data and help to identify patterns including skewness, the mode(s), and outliers
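a bare-bones sketch of what a histogram does under the hood: assign each value to a bin and count. bin_counts is a hypothetical helper and the measurements are made up; a plotting library would normally do this for you.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count values per bin, where bin i covers [start + i*width, start + (i+1)*width)."""
    return Counter(int((v - start) // width) for v in values)

values = [1.2, 1.9, 2.5, 3.1, 3.3, 3.8, 7.4]    # hypothetical measurements
counts = bin_counts(values, start=1, width=1)

# Text version of the histogram: one row per bin, one '#' per datapoint.
for i in range(max(counts) + 1):
    low = 1 + i
    print(f"[{low}, {low + 1}): {'#' * counts[i]}")
```

the tallest row (bin 3 to 4) is the mode of the binned data, and the lone value out at 7.4 shows up as an isolated bar, which is how outliers look on a histogram.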
what is inferential statistics?
it is the use of statistics to draw conclusions or make predictions about a population based on a random sample from that population. it lets us generalize from sample data to populations that are too large to measure in full
what is descriptive statistics?
it is the use of statistics to summarize and describe the features of a dataset, including measures of central tendency and variability.
what are the 4 main sampling methods?
- random sampling = every member has an equal chance of being selected
- systematic sampling = selecting every k-th member of the population, starting at a randomly determined point
- stratified sampling = divides the population into subgroups (strata), with a random sample taken from each
- cluster sampling = divides the population into clusters, randomly selects some clusters, and samples all members in the selected clusters
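the four methods above can be sketched with Python's random module. everything here is an assumption for illustration: a population of 100 ID numbers, two strata split at 50, and clusters of 10 consecutive IDs.

```python
import random

rng = random.Random(0)                  # fixed seed so the run is repeatable
population = list(range(1, 101))        # hypothetical population of 100 member IDs

# Random sampling: every member has the same chance of being selected.
simple = rng.sample(population, 10)

# Systematic sampling: random starting point, then every k-th member.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: a random sample from each subgroup (stratum).
strata = {"ids_1_to_50": population[:50], "ids_51_to_100": population[50:]}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

# Cluster sampling: randomly pick whole clusters, keep every member in them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

the key contrast: stratified takes a few members from every subgroup, while cluster takes every member from a few subgroups.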
what is the central limit theorem?
it states that the sampling distribution of the sample mean will approach a normal distribution as the sample size increases, regardless of the shape of the population’s distribution, provided that the samples are independent and identically distributed (and the population has a finite variance)
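a simulation sketch of the theorem, under my own assumptions: the population is exponential with mean 1 and standard deviation 1 (strongly right skewed), each sample has n = 50 draws, and we repeat 2000 times. if the theorem holds, the sample means should centre near 1 with a spread near 1 / sqrt(50), and roughly 68% of them should sit within one standard deviation of their centre.

```python
import random
import statistics

rng = random.Random(42)                 # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from an exponential distribution (mean 1, sd 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(means)        # expected to be near the population mean, 1
spread = statistics.stdev(means)        # expected to be near 1 / sqrt(n), about 0.141

# Share of sample means within one standard deviation of the centre.
within_one = sum(abs(m - center) <= spread for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(within_one, 3))
```

the individual draws are nowhere near bell shaped, but the means of the samples are, which is why the theorem matters for inference.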
What are the differences between joint, marginal, and conditional probabilities?
marginal = the prob of a single event occurring regardless of other events, P(A)
joint = the prob of 2 events both occurring, P(A and B)
conditional = the prob of an event occurring given that another event has occurred, P(A | B) = P(A and B) / P(B)
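a worked sketch of the three probabilities from one table of counts. the counts (100 days, split by weather and arrival) are invented for illustration; Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical counts for 100 days: (weather, arrival) -> number of days
counts = {
    ("rain", "late"): 12,
    ("rain", "on_time"): 18,
    ("dry", "late"): 7,
    ("dry", "on_time"): 63,
}
total = sum(counts.values())

# Marginal: P(rain), regardless of arrival.
p_rain = Fraction(sum(c for (w, _), c in counts.items() if w == "rain"), total)

# Joint: P(rain and late), both events together.
p_rain_and_late = Fraction(counts[("rain", "late")], total)

# Conditional: P(late | rain) = P(rain and late) / P(rain).
p_late_given_rain = p_rain_and_late / p_rain

print(p_rain, p_rain_and_late, p_late_given_rain)   # 3/10 3/25 2/5
```

the conditional probability only looks at the 30 rainy days, so it is 12 out of 30, not 12 out of 100.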
what is a probability distribution, and what are the two main types?
it describes how likely each possible value (or range of values) of a random variable is.
The two main types are:
1. discrete = ex: binomial distribution
2. continuous = ex: normal distribution
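a sketch of the practical difference between the two types, with parameters picked for illustration (10 coin flips with p = 0.5, and a standard normal). a discrete distribution gives a probability to each exact value; a continuous one gives probabilities to ranges, via the cumulative distribution function.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Discrete: each exact outcome has its own probability, and they sum to 1.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
p_exactly_5 = pmf[5]                    # 252 / 1024
total_probability = sum(pmf)

# Continuous: probability is assigned to intervals, here 0 <= Z <= 1.
z = NormalDist(mu=0, sigma=1)
p_between_0_and_1 = z.cdf(1) - z.cdf(0)

print(p_exactly_5, total_probability, round(p_between_0_and_1, 4))
```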
what is a normal distribution?
a bell-shaped curve that is symmetric around the mean, so the mean, median, and mode are all equal
approx 68% of the data is within 1 standard deviation of the mean
approx 95% is within 2 standard deviations
approx 99.7% is within 3 standard deviations
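the 68 / 95 / 99.7 figures can be checked with statistics.NormalDist. the mean of 100 and standard deviation of 15 are arbitrary choices; the shares come out the same for any normal distribution.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)     # hypothetical mean and standard deviation

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))   # about 0.6827, 0.9545, 0.9973
```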