Test November 9th Flashcards
CSOCS (acronym)
Context (What value is being measured?)
Shape (Right/left skew, symmetrical, modes)
Outlier (Unusual points)
Center (Mean, median, general center)
Spread (Range, IQR, Standard Deviation)
Standard Deviation
(definition and symbol)
the typical (roughly average) distance that data points are from the mean (σ)
Mean
(definition and symbol)
Average (μ)
Median
Middle value
Range
Maximum minus the minimum
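A quick sketch of the center and spread measures above, using Python's built-in statistics module. The scores list is made up for illustration, and pstdev is used on the assumption that the data is the whole population (σ).

```python
import statistics

# Hypothetical data set, invented for this example
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)           # sum of the terms / number of terms
median = statistics.median(scores)       # middle value of the sorted data
data_range = max(scores) - min(scores)   # maximum minus the minimum
sigma = statistics.pstdev(scores)        # population standard deviation (σ)

print(mean, median, data_range, sigma)
```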
Frequency Distributions
a table or graph that shows how often each value or category occurs
Variable
a characteristic that can take different values from one individual to the next; it can be categorical (groups) or quantitative (numbers)
Frequency
how often something happens
Relative Frequency
the proportion or percent of all observations that fall in a category (frequency / total)
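Frequency and relative frequency can be tallied in a few lines. This is only a sketch, and the colors list is a hypothetical set of survey answers.

```python
from collections import Counter

# Hypothetical survey responses, invented for this example
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

frequency = Counter(colors)   # how often each category happens
total = len(colors)

# Relative frequency = frequency / total number of observations
relative_frequency = {color: count / total for color, count in frequency.items()}

for color, count in frequency.items():
    print(color, count, relative_frequency[color])
```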
Normal Curves
symmetrical and bell shaped with the mean and median both located exactly in the center
Empirical Rule
rule that states that in a normal distribution about 68% of the data fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3 (68–95–99.7)
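The 68–95–99.7 percentages can be checked against a standard normal curve. A small sketch using statistics.NormalDist (Python 3.8+); share_within is a helper name made up here.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```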
Bivariate Data
data with two variables
Explanatory/Independent Variable
variable that predicts, explains or influences a trend in the response variable
Response/Dependent Variable
the measured outcome
Positive Correlation
as the x values increase, the y values also tend to increase
Correlation Coefficient/R value
a number from -1 to 1 that tells you the strength and direction of a linear correlation (-1 ≤ r ≤ 1)
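A rough sketch of how r can be computed by hand. The function name correlation and the hours/grades numbers are assumptions made up for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired lists of x and y values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. test grade
hours = [1, 2, 3, 4, 5]
grades = [52, 60, 61, 70, 77]

r = correlation(hours, grades)
print(r)  # positive and close to 1, so a strong positive correlation
```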
z score formula
z = (x-μ)/σ, or z score = (data point - mean) / standard deviation
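The z score formula as a tiny function. The numbers (a score of 85 with mean 70 and standard deviation 10) are invented for the example.

```python
def z_score(x, mean, std_dev):
    """z = (data point - mean) / standard deviation"""
    return (x - mean) / std_dev

# Hypothetical test score: 85, with mean 70 and standard deviation 10
z = z_score(85, 70, 10)
print(z)  # 1.5, meaning 1.5 standard deviations above the mean
```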
Mean Formula
sum of the terms / number of the terms
Standardization
converting a value to a z score, so that a point’s location in the distribution reflects both its distance from the center and the distribution’s spread or variation
Median Formula
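no single formula: sort the data, then take the middle value (odd number of terms) or the average of the two middle values (even number of terms)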
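There is no one-line formula for the median, but the procedure is short: sort, then pick the middle. A minimal sketch (the function name median is just for this example):

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the middle pair

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```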
Residual
the vertical distance between a given data point and the line (actual y minus predicted y), the prediction error
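A small sketch of one residual, assuming a made-up line ŷ = 2 + 3x and a made-up data point (4, 15).

```python
def residual(x, y, slope, intercept):
    """Actual y minus the y predicted by the line."""
    predicted = intercept + slope * x
    return y - predicted

# Hypothetical line y-hat = 2 + 3x and hypothetical point (4, 15)
error = residual(4, 15, slope=3, intercept=2)
print(error)  # 1, so the point sits 1 unit above the line
```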
Low leverage points
points with x values close to the mean of x, so they don’t pull the line much
High leverage points
points with x values very far from the mean of x, so they can pull the line a lot
Influential points
points that, if removed, would greatly change the slope and y-intercept of the line
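One way to see an influential point is to fit the line with and without it. This is only a sketch: fit_line is a helper written for this example, and the points are invented, with (20, 5) as the unusual one.

```python
def fit_line(points):
    """Least squares slope and y-intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: four points on y = 2x plus one far-off point
points = [(1, 2), (2, 4), (3, 6), (4, 8), (20, 5)]

slope_all, intercept_all = fit_line(points)
slope_trimmed, intercept_trimmed = fit_line(points[:-1])  # drop (20, 5)

print(slope_all, intercept_all)          # line with the unusual point
print(slope_trimmed, intercept_trimmed)  # line without it
```

The slope changes a lot when the last point is dropped, which is what makes it influential.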
Constant Coefficient (Predictor Coef SE Coef T P Table)
y intercept
Income Coefficient (Predictor Coef SE Coef T P Table)
slope
S (Predictor Coef SE Coef T P Table)
standard deviation of the residuals (s)
R-Sq (Predictor Coef SE Coef T P Table)
r²
Standard deviation of the residuals (s)
typical error between data points and the LSRL (typical residual length)
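A sketch of where s comes from, assuming the usual formula s = sqrt(sum of squared residuals / (n - 2)). The data is made up.

```python
import math

# Hypothetical data, invented for this example
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Fit the least squares line first
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = actual y minus predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals (divide by n - 2, not n)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(s)
```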
Population
the entire group of individuals you want information about (every “member”)
Sample
a subset of the population that data is actually collected from
Census
when you collect data on every individual in the population
Bias
a study flaw that leads to unrepresentative and/or inaccurate estimates
Undercoverage
when part of the population has a reduced chance of being included in a sample
Simple Random Sampling
a sampling method where every possible group of n individuals has an equal chance of being selected
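A simple random sample can be drawn with random.sample, which picks without replacement. The population of 30 student names and the seed are assumptions made up for the example.

```python
import random

# Hypothetical population of 30 students
population = [f"student_{i}" for i in range(1, 31)]

rng = random.Random(9)                 # fixed seed only so the example repeats
sample = rng.sample(population, k=5)   # every group of 5 is equally likely

print(sample)
```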
Weak Correlation
r close to 0 (so r² is close to 0), data is far from the LSRL
Strong Correlation
r close to 1 or -1 (so r² is close to 1), data is close to the LSRL
Correlation
measures the strength and direction of the linear relationship between two quantitative variables
Negative Correlation
as the x values increase, the y values tend to decrease
Least Squares Regression Line (LSRL)
the straight line that makes the sum of the squared residuals as small as possible
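A sketch showing the "least squares" part: the fitted line has a smaller sum of squared residuals than a nearby line. The data and the helper name sum_squared_residuals are made up for illustration.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Add up (actual y - predicted y) squared over all points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for this example
xs = [0, 1, 2, 3]
ys = [1, 3, 4, 8]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# LSRL slope and y-intercept
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

best = sum_squared_residuals(xs, ys, slope, intercept)
nudged = sum_squared_residuals(xs, ys, slope + 0.1, intercept)  # slightly different line

print(slope, intercept)
print(best, nudged)  # the LSRL total is the smaller one
```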