PO218 Statistics Midterm Flashcards
The 6 descriptive measures
- Frequency
- Intervals
- Proportion
- Percentage
- Ratio
- Visualization
Proportion (p)
the fraction of all cases that fall into a given category (category frequency ÷ total number of cases)
- always between 0 and 1
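A minimal Python sketch of proportion, percentage, and ratio computed from category frequencies. The category names and counts are hypothetical example data.

```python
# Hypothetical frequencies for a nominal variable (illustrative data only)
frequencies = {"Yes": 40, "No": 35, "Undecided": 25}

def proportion(freqs, category):
    """Proportion p = category frequency / total number of cases."""
    return freqs[category] / sum(freqs.values())

def percentage(freqs, category):
    """Percentage = proportion x 100."""
    return proportion(freqs, category) * 100

def ratio(freqs, a, b):
    """Ratio = frequency of category a / frequency of category b."""
    return freqs[a] / freqs[b]

print(proportion(frequencies, "Yes"))        # 0.4
print(percentage(frequencies, "Undecided"))  # 25.0
print(ratio(frequencies, "Yes", "Undecided"))  # 1.6
```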
Data visualization
graphics that display information in a meaningful way
The 2 main categories of descriptive measures
Central tendency and Variation / Dispersion
Central tendency
most typical or representative value of a data set
- 3 measures: mean, median, and mode
Mean (average)
For continuous variables
- sum ÷ number of cases
- population mean: represented by µ
- sample mean: represented by x̄ ("x-bar", an x with a line on top)
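A quick Python check of the mean formula (sum ÷ number of cases). The scores are made-up example data. The same arithmetic gives µ for a whole population or x̄ for a sample. Only the interpretation changes.

```python
def mean(values):
    """Mean = sum of all values / number of cases."""
    return sum(values) / len(values)

scores = [2, 4, 4, 5, 10]  # hypothetical scores
print(mean(scores))  # 5.0
```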
Median (midpoint)
For continuous or ordinal variables
- sort the data set and select the middle value (if there is an even number of cases, average the two middle values)
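The sort-then-pick-the-middle rule can be sketched in Python. The example lists are hypothetical. They show both the odd case and the even case, where the two middle values are averaged.

```python
def median(values):
    """Sort the data; take the middle value, or average the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```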
Mode (most frequently occurring)
For nominal or ordinal variables
- the value with the greatest number of cases
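A small Python sketch of finding the mode by counting cases per value. It returns every tied value, because a distribution can have more than one mode. The category labels are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency (a distribution can be multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

party_id = ["A", "B", "A", "C", "A", "B"]  # hypothetical nominal data
print(modes(party_id))         # ['A']
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```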
Variation / Dispersion
how concentrated or spread out the values of a variable are
The 5 measures of variation
- Standard deviation
- Range
- Index of qualitative variation
- Coefficient of variation
- Variation ratio
Standard deviation
for continuous variables
- how far scores typically fall from the mean (the square root of the variance)
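A sketch of the standard deviation in Python, assuming the usual convention: divide by n for a population (σ) and by n - 1 for a sample (s). Check which divisor the course uses. The data are a made-up example.

```python
import math

def standard_deviation(values, sample=True):
    """Square root of the average squared deviation from the mean.

    sample=True divides by n - 1 (sample s); sample=False divides by n (population sigma).
    """
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(standard_deviation(data, sample=False))  # 2.0
```

Each single score's distance from the mean (x - mean) is its deviation. The standard deviation summarizes all of those deviations at once.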
Range
for continuous and ordinal variables
- difference between the highest and lowest value
- range = highest value - lowest value
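The range formula in Python, using hypothetical values. The function is named value_range so it does not shadow Python's built-in range.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

print(value_range([12, 3, 25, 8]))  # 22
```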
Index of qualitative variation
for nominal and ordinal variables
- 0 to 1
- 0 = no variation
- 1 = variable is split evenly among categories
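A sketch of the IQV in Python. It assumes the formula commonly taught in intro statistics, IQV = K(N² - Σf²) / (N²(K - 1)), where K is the number of categories (including empty ones), N is the total number of cases, and f is each category's frequency. Compare it against the course notes. The frequency lists are hypothetical.

```python
def iqv(frequencies):
    """Index of qualitative variation: K(N^2 - sum f^2) / (N^2 (K - 1))."""
    k = len(frequencies)               # number of categories, including empty ones
    n = sum(frequencies)               # total number of cases
    sum_sq = sum(f ** 2 for f in frequencies)
    return (k * (n ** 2 - sum_sq)) / (n ** 2 * (k - 1))

print(iqv([25, 25, 25, 25]))  # 1.0 (split evenly)
print(iqv([100, 0, 0, 0]))    # 0.0 (no variation)
```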
Coefficient of variation
for continuous variables
- standard deviation ÷ mean (spread relative to the size of the mean)
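A Python sketch of the coefficient of variation, computed as standard deviation ÷ mean. It assumes the sample standard deviation and positive-valued data. Both example lists are hypothetical. They have the same standard deviation but different relative spread.

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean (meaningful only when the mean is positive)."""
    return statistics.stdev(values) / statistics.mean(values)

print(coefficient_of_variation([10, 20, 30]))               # 0.5
print(round(coefficient_of_variation([100, 110, 120]), 4))  # 0.0909
```

Both lists have a standard deviation of 10. The CV shows that 10 is large relative to a mean of 20 and small relative to a mean of 110.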
Variation ratio
for ordinal and nominal variables
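- based on the mode: VR = 1 - (frequency of the modal category ÷ total number of cases)
- 0 to 1
- 0 = no variation
- 1 = highest variation
- if VR > 0.5, the mode is questionable (fewer than half of the cases fall in the modal category)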
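A Python sketch of the variation ratio, VR = 1 - (modal frequency ÷ total cases). It includes the "VR > 0.5" warning. The survey responses are hypothetical.

```python
from collections import Counter

def variation_ratio(values):
    """VR = 1 - (frequency of the modal category / total number of cases)."""
    counts = Counter(values)
    return 1 - max(counts.values()) / len(values)

responses = ["agree"] * 4 + ["neutral"] * 3 + ["disagree"] * 3  # hypothetical data
vr = variation_ratio(responses)
print(round(vr, 2))  # 0.6
if vr > 0.5:
    print("Mode covers fewer than half of the cases; treat it with caution")
```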
Probability
the chance that an event will occur
- 0 = no chance
- 1 = certain to occur
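Central limit theorem
as the sample size grows, the sampling distribution of the sample mean approaches a normal distribution centered on the population mean, so larger samples reflect the population more accurately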
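A small Python simulation illustrating the central limit theorem. Many samples are drawn from a deliberately skewed population, and the spread of their means is measured. The exponential population, the seed, the sample sizes, and the number of draws are all arbitrary choices for illustration. The sketch shows the shrinking spread (roughly σ/√n) and the centering on the population mean. It does not plot the bell shape.

```python
import random
import statistics

random.seed(218)  # fixed seed so the run is repeatable

# A deliberately non-normal (right-skewed) population: exponential with mean 10
population = [random.expovariate(1 / 10) for _ in range(100_000)]

def sample_means(pop, n, draws=2_000):
    """Means of `draws` random samples of size n taken from pop."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(draws)]

spread = {}
for n in (5, 30, 200):
    means = sample_means(population, n)
    spread[n] = statistics.stdev(means)  # standard deviation of the sample means
    print(n, round(statistics.fmean(means), 2), round(spread[n], 2))
```

As n increases, the sample means cluster more tightly around the population mean (about 10 here), even though the population itself is skewed.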
z scores
the number of standard deviations a score lies from the mean, measured on the standard normal distribution
- when to use: if the population variance is known, or the sample size is bigger than 30
- positive z score = above the mean
- negative z score = below the mean
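The z score formula, z = (score - mean) ÷ standard deviation, in Python. The scores, mean, and standard deviation are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(85, 70, 10))  # 1.5 (above the mean)
print(z_score(60, 70, 10))  # -1.0 (below the mean)
```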
Reliability factors for z scores
- ±1.645 encloses the middle 90% of the area
- ±1.96 encloses the middle 95% of the area
- ±2.575 encloses the middle 99% of the area
- ±3.3 encloses the middle 99.9% of the area
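These reliability factors can be recovered from the standard normal's inverse CDF with Python's built-in statistics.NormalDist. The sketch assumes two-tailed intervals, with half of the leftover area in each tail.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Two-tailed z value that encloses the middle `confidence` share of the standard normal."""
    tail = (1 - confidence) / 2          # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99, 0.999):
    print(level, round(reliability_factor(level), 3))
```

Computed this way, the values round to 1.645, 1.96, 2.576, and 3.291. The 2.575 and 3.3 on the card are rounded forms of the same factors.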
t scores
Standard deviations based on the ‘t’ distribution
- when to use: when the population standard deviation is unknown, or the sample size is below 30
Margin of error
the maximum expected difference between the sample statistic and the population parameter at a given confidence level (reliability factor × standard error)
Confidence interval limits (standard normal)
- Upper limit: sample statistic + (reliability factor)(standard error)
- Lower limit: sample statistic - (reliability factor)(standard error)
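A Python sketch of the upper and lower limits for a mean. It assumes the standard error of a mean is s ÷ √n and uses the 95% reliability factor of 1.96 (the large-sample z approach). The sample mean, standard deviation, and n are hypothetical.

```python
import math

def confidence_interval(sample_mean, sd, n, reliability=1.96):
    """Return (lower, upper) = sample mean -/+ reliability factor * standard error."""
    standard_error = sd / math.sqrt(n)
    margin = reliability * standard_error   # margin of error
    return sample_mean - margin, sample_mean + margin

lower, upper = confidence_interval(sample_mean=50, sd=10, n=100)
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```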
Deviant/outlier
a case or score that lies far from the rest of the distribution and does not follow the pattern of the other cases
Type 1 error
rejecting a true null hypothesis
Type 2 error
failing to reject (accepting) a false null hypothesis
Alpha level
probability of committing a Type 1 error (rejecting a true null hypothesis)
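A small Python simulation of what alpha means. It draws many samples from a population where the null hypothesis is true and counts how often a two-tailed z test wrongly rejects it. The normal population, known σ, sample size, number of trials, and seed are all assumptions chosen for illustration. The rejection rate should land close to alpha = 0.05.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-tailed test

def type_one_error_rate(trials=10_000, n=50, mu=0.0, sigma=1.0):
    """Share of tests that reject H0 (mean = mu) when H0 is actually true."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / math.sqrt(n))
        if abs(z) > critical:
            rejections += 1  # rejecting a true null = Type 1 error
    return rejections / trials

rate = type_one_error_rate()
print(round(rate, 3))
```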
Null hypothesis
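the hypothesis of no difference or no effect, e.g. that the population mean equals a stated (expected) value
Alternative hypothesis
the hypothesis that the population mean differs from the value stated in the null hypothesis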