Test #1 Flashcards
Variable
Any characteristic of a case
Cases
The subjects of a data set (objects or people)
Histogram (connected)
Shows the distribution of one quantitative variable by grouping its values into bins; the bars touch
Bar chart (not connected)
Compares the values of different items, uses categorical variables
Ways to describe a bar chart/histogram
- Shape
- Center
- Spread
Outlier
Any value that falls outside the overall pattern, can affect mean and standard deviation
Mode
The most common value in a data set, the major peaks of a bar chart/histogram
Symmetric
Distribution creates a mirror image
Skewed
Distribution has a longer tail on one side: right-skewed if the tail extends to the right, left-skewed if it extends to the left
Shape
Symmetric vs. skewed
Center
Mean vs. median
Spread
Standard deviation vs. IQR
Categorical variables
Data is words, places cases into categories
Quantitative variables
Data is numbers, measures the values of each case
Median
The middle value or midpoint of a distribution
Mean
The average value of a distribution
Best ways to describe a distribution
Measure of center and measure of spread
Q1
The median of the data that fall to the left of the overall median
Q3
The median of the data that fall to the right of the overall median
Five-number summary
Min Q1 Median Q3 Max
Boxplot
A graph of the five-number summary
IQR
Q3 - Q1 (the distance between the quartiles)
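The quartile cards above can be sketched in Python. This is a minimal illustration, assuming the median-of-halves rule described on the Q1 and Q3 cards (software packages may use slightly different quartile rules); the function name `five_number_summary` and the data are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values to the left of the overall median
    upper = data[half + n % 2:]    # values to the right of the overall median
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]   # hypothetical data set
low, q1, med, q3, high = five_number_summary(scores)
iqr = q3 - q1                             # distance between the quartiles
print(low, q1, med, q3, high, iqr)
```

For this data set Q1 is 4, the median is 7, Q3 is 9.5, and the IQR is 5.5.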
Standard deviation formula
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$, where $n$ is the number of cases and $\bar{x}$ is the mean
Standard deviation
Measures how far the observations typically fall from the mean. Never negative; equals 0 only when every value is the same.
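The standard deviation formula can be checked by hand with a short Python sketch. The helper name `sample_sd` and the data are assumptions for illustration; the result is compared against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared
    deviations from the mean, divided by (number of cases - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data, mean = 5
print(sample_sd(data))                    # same value as stdev(data)
print(sample_sd([3, 3, 3]))               # no spread at all gives 0.0
```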
Normal distribution
Bell curve, symmetric, unimodal
N(mean, standard deviation)
Unimodal
A distribution that contains one single peak
68-95-99.7 rule
68% of observations fall within 1 standard deviation of the mean
95% fall within 2 SD of the mean
99.7% fall within 3 SD of the mean
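The three percentages in the rule are rounded areas under the normal curve. A small sketch using Python's standard `statistics.NormalDist` shows where they come from; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # N(0, 1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```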
Z-score
Standardized value of x
Z-score formula
$z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ is the standard deviation (subtract the mean first, then divide)
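A z-score is quick to compute in code. The sketch below uses a hypothetical test with mean 70 and standard deviation 10; the function name `z_score` and the numbers are assumptions for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(85, 70, 10))   # 1.5  (1.5 SDs above the mean)
print(z_score(60, 70, 10))   # -1.0 (1 SD below the mean)
```

Note the parentheses around `x - mean`: without them the division would be done before the subtraction.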
Proportion
A fraction of the whole, written as a decimal between 0 and 1
Response variable (y-axis)
Dependent variable, measures outcome
Explanatory variable (x-axis)
Independent variable, explains or causes the change in the response variable
Scatterplot
Shows the relationship between 2 quantitative variables measured on the same individuals
Ways to describe a scatterplot
- Form
- Direction
- Strength
Form
Linear vs. nonlinear (curved)
Direction
Positive vs. negative vs. none
Strength
Strong vs. weak
Correlation r formula
$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$
Correlation r
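Measures the direction and strength of a linear relationship. Always between -1 and 1. Positive for a positive association, negative for a negative association; values near -1 or 1 mean a strong relationship, values near 0 a weak one.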
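The correlation formula multiplies the z-scores of x and y for each case, adds the products, and divides by n - 1. A minimal Python sketch, with a made-up function name `correlation` and made-up data:

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

hours = [1, 2, 3, 4, 5]            # hypothetical explanatory variable
scores = [52, 60, 57, 70, 75]      # hypothetical response variable
print(correlation(hours, scores))  # positive: scores tend to rise with hours
```

Points that fall exactly on a rising line give r = 1, and points exactly on a falling line give r = -1 (up to floating-point rounding).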
Regression line
A straight line that shows how the response variable changes as the explanatory variable changes. Used to predict the value of y for a given value of x.
Formula for predicting y (regression line)
y = slope (x) + intercept
Slope formula
Slope = r (standard deviation of y / standard deviation of x)
Slope
A change of one standard deviation in x corresponds to a change of r standard deviations in y
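The slope and prediction cards can be combined into one sketch. The cards do not give a formula for the intercept; the code assumes the standard least-squares result, intercept = mean of y - slope × mean of x. The function names and data are made up for illustration.

```python
from statistics import mean, stdev

def regression_line(xs, ys):
    """Return (slope, intercept) with slope = r * (SD of y / SD of x)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    slope = r * s_y / s_x
    # Assumption (not on the cards): the line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted y for a given x: y = slope * x + intercept."""
    return slope * x + intercept

xs = [1, 2, 3, 4, 5]       # hypothetical data lying exactly on y = 2x + 1
ys = [3, 5, 7, 9, 11]
slope, intercept = regression_line(xs, ys)
print(slope, intercept, predict(6, slope, intercept))
```

Because this data lies exactly on a line, the slope is about 2, the intercept about 1, and the prediction at x = 6 is about 13, up to floating-point rounding.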
Measure of center and spread for symmetric data
Mean and standard deviation
Measure of center and spread for skewed data
Median and IQR
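A quick sketch of why the median is preferred for skewed data or data with outliers: one extreme value pulls the mean far more than the median. The incomes below are made up for illustration.

```python
from statistics import mean, median

incomes = [30, 32, 35, 36, 38, 40]        # hypothetical incomes, in thousands
with_outlier = incomes + [250]            # add one extreme value

print(mean(incomes), median(incomes))             # about 35.2 and 35.5
print(mean(with_outlier), median(with_outlier))   # about 65.9 and 36
```

The single outlier nearly doubles the mean, while the median barely moves.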