Chapter 2 Flashcards
Visualizing data to make sense of it
Descriptive statistics
Summarizing key aspects of data using numerical quantities
Summary statistics
The fraction of cases in a category (count divided by total); allows comparisons between samples of different sizes
Proportion (relative frequency)
Visual in which the size of each slice corresponds to the proportion of cases in that category
Pie Chart
Visual where the height of each bar corresponds to the number of cases in each category; used for categorical data, and the bars do not touch
Bar graph/chart
Table of counts used to examine the relationship between two categorical variables
Two-way table
Visual where the cases are represented by dots stacked on each other
Dot plot
Visual where the height of each bar corresponds to the number of cases within a range of values; used for quantitative data, and the bars touch
Histogram
The average of the data values (the sum of the values divided by the number of values)
Mean
The middle value of the data when arranged from smallest to largest (the average of the two middle values when the number of values is even)
Median
An observed value that is noticeably different from the other values in the dataset
Outlier
What a statistic is called if it is relatively unaffected by extreme values (the median is; the mean is not)
Resistant
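
A minimal sketch of why the median is called resistant while the mean is not. The data values are made up for illustration; only the standard-library `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical data: the second list replaces the largest value with an outlier.
values = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

# Without the outlier the mean and the median are both 4.
print(mean(values), median(values))

# With the outlier the mean is pulled up to 14.8, but the median stays at 4.
print(mean(with_outlier), median(with_outlier))
```
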
Measures spread: roughly the typical distance of the data values from the mean; a larger value means more variability
Standard deviation
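
A rough sketch of how a standard deviation can be computed by hand from the deviations. It assumes the sample version of the formula (dividing by n - 1), which is a common convention in introductory courses but is not stated on the card. The data values are hypothetical.

```python
import math
from statistics import stdev

# Hypothetical data set with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean_value = sum(values) / n

# Deviation: how far each value is from the mean (value minus mean).
deviations = [v - mean_value for v in values]

# Sample variance: sum of squared deviations divided by n - 1 (assumed convention).
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# The hand computation should agree with the library's sample standard deviation.
print(sample_sd, stdev(values))
```

The deviations themselves always sum to zero, which is why they are squared before being averaged.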
How far a value is from the mean (the value minus the mean)
Deviation
The rule states that about 95% of the data fall within 2 standard deviations of the mean if the distribution is approximately bell-shaped
95% rule
How many standard deviations a value is from the mean
Z-score
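
A small sketch of the z-score calculation, z = (value - mean) / standard deviation. The function name `z_score` and the numbers (mean 70, standard deviation 10) are hypothetical, chosen only for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# A value of 85 is 1.5 standard deviations above a mean of 70.
print(z_score(85, 70, 10))

# A value of 50 is 2 standard deviations below the mean (negative z-score).
print(z_score(50, 70, 10))
```

For bell-shaped data, a z-score beyond -2 or +2 marks a value outside the range that holds about 95% of the data.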
Five values that show the center, spread, and direction of skew of the data; useful for skewed data because it is resistant to outliers
Five Number Summary
Graphical display of the Five Number Summary (NOT how it is written)
Boxplot
The rule states that, for bell-shaped distributions, about 68% of the data fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3
Empirical Rule
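
One way to check the rule is with simulated bell-shaped data. This is only a sketch: the data are randomly generated (mean 100, standard deviation 15, both arbitrary choices), so the shares come out close to 68%, 95%, and 99.7% rather than exactly equal to them.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(100, 15) for _ in range(100_000)]  # simulated bell-shaped data

center = mean(data)
spread = stdev(data)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    inside = [v for v in data if abs(v - center) <= k * spread]
    return len(inside) / len(data)

# Shares for 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```
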
Graph used to examine the relationship between two quantitative variables
Scatterplot
When the x value in a scatterplot increases, the y value tends to increase (r > 0)
Positive association
When the x value in a scatterplot increases, the y value tends to decrease (r < 0)
Negative association
The x and y values of a scatterplot show no clear pattern, so knowing x tells little about y (r near 0)
No association
The measure of the strength and direction of linear association, on a scale of -1 to 1; the closer to 0, the weaker the association
Correlation
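
A minimal sketch of computing r directly from its definition, using made-up paired data. The helper name `correlation` is hypothetical; the formula divides the sum of the products of the deviations by the square root of the product of the sums of squared deviations.

```python
import math

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data with a positive association.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 3))  # positive, about 0.775

# Flipping the sign of y reverses the direction but not the strength.
print(round(correlation(x, [-v for v in y]), 3))
```
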
A straight line that best fits the data on a scatterplot
Regression line
The response value observed for a particular data point (y)
Observed response value
The response value that would be predicted for a given x value based on a model (yhat)
Predicted response value
The difference between the observed and predicted response values (observed minus predicted); the vertical distance from a data point to the regression line
Residual
The best-fitting regression line: the one that makes the sum of the squared residuals as small as possible
Least Squares line
The change in the predicted y value for every one-unit increase in the x value
Slope
The predicted y value when x equals 0
Intercept
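
A short sketch tying together the slope, intercept, predicted values, and residuals of a least squares line. The data are hypothetical, and the slope and intercept formulas used here are the standard least squares ones, which the cards themselves do not spell out.

```python
# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares slope: sum of products of deviations over sum of squared x deviations.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))

# Intercept: the predicted y when x equals 0; the line passes through (mean_x, mean_y).
intercept = mean_y - slope * mean_x

# Predicted response (yhat) for each x, and residual = observed - predicted.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(slope, intercept)   # about 0.6 and 2.2
print(residuals)          # these sum to (approximately) zero
```

Here each one-unit increase in x raises the predicted y by about 0.6.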
How a 5 number summary is laid out
[Min, Q1, m, Q3, Max]
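
A minimal sketch of computing the five values. Quartile conventions differ between textbooks and software; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. The function name `five_number_summary` and the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (Min, Q1, m, Q3, Max); needs at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Skip the middle value when the count is odd (assumed convention).
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Sorted, the data are 1, 3, 4, 6, 8, 9, 12, so the summary is (1, 3, 6, 9, 12).
print(five_number_summary([12, 1, 9, 3, 6, 4, 8]))
```
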