Statistics Flashcards
A/B testing
A/B testing is a way to compare two versions of something to find out which version performs better.
Why do companies use A/B testing?
- optimize product performance
- improve customer experience
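A minimal sketch of the comparison, assuming made-up visitor and conversion counts for two versions of a page. The `conversion_rate` helper and all numbers are hypothetical. It only compares the observed rates; a real A/B test would also check whether the difference is statistically significant.

```python
def conversion_rate(conversions, visitors):
    """Share of visitors who converted (hypothetical helper)."""
    return conversions / visitors

# Made-up counts, for illustration only.
rate_a = conversion_rate(conversions=120, visitors=2400)  # version A
rate_b = conversion_rate(conversions=150, visitors=2400)  # version B

# Pick the version with the higher observed rate (A wins a tie).
better = "B" if rate_b > rate_a else "A"
print(f"A: {rate_a:.2%}, B: {rate_b:.2%}, higher observed rate: {better}")
```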
Descriptive Stats
- describe or summarize the main features of a dataset.
- Descriptive stats are very useful because they let you quickly understand a large amount of data.
- examples: mean, median, standard deviation
Summary Stats
summarize your data using a single number
2 main types of summary stats
- measures of central tendency
- measures of dispersion
Measures of central tendency
Measures of central tendency, like the mean, let you describe the center of your dataset.
Measures of dispersion
- Measures of dispersion, like the standard deviation, let you describe the spread of your dataset, or the amount of variation in your data points.
- standard deviation
Inferential Stats
- allow data professionals to make inferences about a dataset based on a sample of the data.
- use samples to make inferences about populations.
2 Statistical Methods
- Descriptive
- Inferential
Population
- Population includes every possible element that you are interested in measuring.
- A parameter is a characteristic of a population.
- ex. The average height of the entire population of giraffes is a parameter.
Sample
- A sample is a subset of a population.
- A statistic is a characteristic of a sample.
- ex. The average height of a random sample of 100 giraffes is a statistic.
Parameter vs Statistic
- A parameter is a characteristic of a population (ex. the average height of all giraffes).
- A statistic is a characteristic of a sample (ex. the average height of a random sample of 100 giraffes).
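A small sketch of the distinction, assuming a simulated population of giraffe heights. The population size, the average of 5.0 m, and the spread of 0.4 m are made-up values for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in meters of 10,000 giraffes.
population = [random.gauss(5.0, 0.4) for _ in range(10_000)]

# Parameter: computed from every element of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of 100 giraffes.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.3f}")
print(f"statistic (sample mean):     {sample_mean:.3f}")
```

The sample mean is usually close to the population mean but not identical to it, which is why a statistic is treated as an estimate of the parameter.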
Name 3 measures of central tendency
Mean, Median, Mode
Median
- The median is the middle value in a sorted dataset.
- This means half the values in the dataset are larger than the median and half are smaller.
Mode
- most frequently occurring value in the dataset.
- A dataset can have no mode, one mode, or more than one mode.
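The three cases can be sketched with a small helper. `modes` is a hypothetical function, and it assumes the common convention that a dataset has no mode when every value occurs exactly once.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode (assumed convention)
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 3]))        # no mode
print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # more than one mode
```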
When to use the mean, the median, and the mode?
Mean: numeric data with no outliers
Median: numeric data with outliers
Mode: categorical data
1 main disadvantage of the mean
It is sensitive to outliers: a single extreme value can pull the mean far away from the typical values.
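A quick illustration with made-up salary figures (in thousands), showing how one extreme value moves the mean much more than the median:

```python
import statistics

salaries = [40, 45, 50, 55, 60]   # hypothetical values, in thousands
with_outlier = salaries + [1000]  # add one extreme value

mean_before = statistics.mean(salaries)          # 50
median_before = statistics.median(salaries)      # 50
mean_after = statistics.mean(with_outlier)       # about 208.3
median_after = statistics.median(with_outlier)   # 52.5

print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```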
Why use mode for categorical data?
Because the mode clearly shows which category occurs most frequently, and categories have no meaningful mean or median.
2 Measures of dispersion
- Range
- standard deviation
Range
- The range is the difference between the largest and smallest values in a dataset.
- It gives a quick understanding of the overall spread of your dataset.
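As a one-line sketch, using an arbitrary example dataset:

```python
data = [3, 5, 8, 10]  # arbitrary example values

# Range: largest value minus smallest value.
data_range = max(data) - min(data)  # 10 - 3 = 7
print(data_range)
```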
What does Variance measure?
- A measure of spread
- Variance is a way to measure how spread out a set of numbers is. It tells you how much the numbers “vary” from the average (or mean).
- It is the average of the squared differences between each data point and the mean.
- It equals the standard deviation squared.
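The definition can be checked by hand on an arbitrary example dataset. This sketch uses the population form, which divides by the number of values; the sample variance (`statistics.variance`) divides by one less than that and gives a slightly larger result.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values

mean = sum(data) / len(data)  # 5.0

# Average of the squared differences from the mean.
manual_variance = sum((x - mean) ** 2 for x in data) / len(data)  # 32 / 8 = 4.0

# The same quantity from the standard library.
library_variance = statistics.pvariance(data)

# Variance is the standard deviation squared.
std_dev = statistics.pstdev(data)  # 2.0

print(manual_variance, library_variance, std_dev ** 2)
```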
What does Standard Deviation measure and what does a larger value indicate?
- Standard deviation measures how spread out your values are from the mean of your dataset.
- The larger the standard deviation, the more spread out your values are from the mean.
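A short comparison, assuming two made-up datasets that share the same mean but differ in spread:

```python
import statistics

tight = [9, 10, 10, 11]  # values close to the mean
wide = [0, 5, 15, 20]    # values far from the mean

# Both datasets have a mean of 10.
mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

# Population standard deviation of each dataset.
std_tight = statistics.pstdev(tight)  # about 0.71
std_wide = statistics.pstdev(wide)    # about 7.91

print(mean_tight, round(std_tight, 2))
print(mean_wide, round(std_wide, 2))
```

The means match, so only the standard deviation reveals that the second dataset is far more spread out.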
How are Measures of Position helpful?
They help you determine the position of a value in relation to other values in a dataset.
3 Measures of Position
- percentiles
- quartiles
- interquartile range
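A sketch of all three measures on an arbitrary example dataset, using `statistics.quantiles` with its default settings. Other tools may use a different interpolation method and return slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # arbitrary example values

# Quartiles: three cut points that split the data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)  # 3.0, 6.0, 9.0

# Interquartile range: spread of the middle half of the data.
iqr = q3 - q1  # 6.0

# Percentiles: 99 cut points; index 89 holds the 90th percentile.
p90 = statistics.quantiles(data, n=100)[89]

print(q1, q2, q3, iqr, p90)
```

Here `q2` is the median, and the 90th percentile is the value below which roughly 90% of the data falls.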