Benchmark info Flashcards
describe statistics
the science of developing methods for collecting, analyzing, interpreting, and presenting data
what is the main goal of statistics?
to use information from a sample to make inferences about the population
what is a variable?
a measurable attribute that can vary across entities; in a dataset it is represented by a column
what is an observation?
the complete set of recorded values for a single entity, typically stored in a single row of a dataset
what is correlation?
it measures the association between two variables
when should you use mean and how is it affected by outliers?
- use with symmetric distributions without extreme values
- mean is highly sensitive to outliers
when should you use median and how is it affected by outliers?
- use for skewed distributions or data with outliers
- median is less affected by outliers
when should you use mode and how is it affected by outliers?
- use when identifying the most common category or score
- mode is not influenced by outliers
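One way to see the outlier sensitivity described in these three cards is to compare the measures on a small dataset with and without one extreme value. This is a minimal sketch using Python's standard `statistics` module; the scores are hypothetical.

```python
import statistics

# Hypothetical scores; the second list adds one extreme value (an outlier).
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [100]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)      # pulled strongly toward the outlier
    median = statistics.median(data)  # shifts only slightly
    mode = statistics.mode(data)      # unchanged: 5 is still the most common score
    print(f"{label}: mean={mean:.2f}, median={median}, mode={mode}")
```

Adding the single value 100 moves the mean from 5.40 to about 21.17, while the median only moves from 5 to 5.5 and the mode stays at 5.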
what is variance in technical terms?
- the average of the squared differences from the mean
- the square of the standard deviation
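Both definitions of variance can be checked numerically. The sketch below computes the population variance directly as the average squared deviation from the mean, then confirms it equals the standard deviation squared; the function name and data are illustrative.

```python
import math

def population_variance(data):
    """Average of the squared differences from the mean (divides by n)."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values with mean 5
var = population_variance(data)
sd = math.sqrt(var)
print(var, sd, sd ** 2)  # the variance equals the standard deviation squared
```

For this data the variance is 4.0 and the standard deviation is 2.0. The sample variance instead divides by n − 1 (as `statistics.variance` does), while `statistics.pvariance` divides by n like the function above.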
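what statistical methods can be used to find outliers?
z-scores (e.g., flagging values with |z| > 3) and the IQR rule (flagging values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR)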
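Both outlier rules can be sketched with the standard `statistics` module. The cutoffs (|z| > 3 and 1.5 × IQR) are common conventions rather than fixed rules, and the function names and data below are hypothetical.

```python
import statistics

def z_score_outliers(data, threshold=3.0):
    """Flag values whose |z| exceeds the threshold (z uses the population SD)."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]

def iqr_outliers(data, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical data with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12, 13, 11, 10, 12, 11, 95]
print(z_score_outliers(data))
print(iqr_outliers(data))
```

Both rules flag 95 here. Note that the outlier itself inflates the mean and SD used for z-scores, so in very small samples the z-score rule can miss it; quartiles are also computed slightly differently across software (`statistics.quantiles` defaults to its "exclusive" method).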
describe histograms
- displays continuous data
- has touching bars representing frequency across intervals
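The "frequency across intervals" idea behind a histogram comes down to counting values in adjacent bins. This is a minimal sketch with a hypothetical `histogram_counts` helper and made-up height data; as in `numpy.histogram`, every bin is half-open except the last, which also includes its right edge.

```python
def histogram_counts(data, bin_edges):
    """Count how many values fall in each interval [edge_i, edge_(i+1)).

    Adjacent bins share an edge, which is why histogram bars touch.
    The last bin also includes its right edge so the maximum is counted.
    Values outside the overall range are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            left, right = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if left <= x < right or (is_last and x == right):
                counts[i] += 1
                break  # each value belongs to exactly one bin
    return counts

heights_cm = [151.2, 158.0, 160.5, 162.3, 165.0, 166.8, 170.1, 171.4, 175.9, 180.0]  # hypothetical
edges = [150, 160, 170, 180]
print(histogram_counts(heights_cm, edges))
```

Each count becomes the height of one bar, and the bars sit side by side because the bins share their edges.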
what does skewness measure?
measures the asymmetry of a distribution
what does kurtosis measure?
the “tailedness” of a distribution, i.e., how heavy its tails are (often described as the sharpness of the peak, although it mainly reflects the tails)
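Skewness and kurtosis are both built from central moments. The sketch below uses the simple (biased) moment estimators, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 − 3; many packages apply small-sample corrections, so their values can differ slightly. Function names and data are illustrative.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)^k."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** k for x in data) / len(data)

def skewness(data):
    """Moment coefficient of skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0; heavier tails give larger values."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail
print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

The evenly spaced data has zero skewness and negative excess kurtosis (light, flat tails), while the data with one large value is strongly right-skewed.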
describe a scatterplot
displays the relationship between two numerical variables
what are “measures”?
- a quantitative way of representing or summarizing aspects of data
- tools used to describe, analyze, and make sense of data
what is internal consistency?
the extent to which different items intended to measure the same construct are related and produce similar results
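Internal consistency is commonly quantified with Cronbach's alpha, a widely used index that the card itself does not name: alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores). This is a minimal sketch with a hypothetical 3-item scale; values closer to 1 indicate that the items move together.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(r) for r in responses])  # variance of each respondent's total
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical 3-item scale answered by 5 respondents (rows).
responses = [
    [3, 4, 3],
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 3))
```

This data gives an alpha of roughly 0.96. Using population variances for every term gives the same result as using sample variances, because the n versus n − 1 factor cancels in the ratio.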
what is construct validity?
whether the test accurately measures the underlying theoretical concept it is trying to measure
what is content validity?
whether the test covers all the important aspects of the construct it is meant to measure
what is criterion-related validity?
how well one measure (the predictor) correlates with a related outcome measure (the criterion) that serves as an established standard or relevant benchmark
what is a null hypothesis?
a statement that claims there is NO difference, NO effect, or NO relationship between variables
what is a type I error?
rejecting the null hypothesis when it is true
what is a type II error?
failing to reject the null hypothesis when it is false
describe symbol, meaning, and interpretation of the significance level
- α
- probability of making a type I error
- risk of a false positive
describe symbol, meaning, and interpretation of the confidence level
- 1 − α
- probability of not making a type I error
- confidence in correctly failing to reject a true null hypothesis
describe symbol, meaning, and interpretation of the type II error rate
- β
- probability of making a type II error
- risk of a false negative
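α and β can be made concrete with a small Monte Carlo simulation. This sketch assumes a two-tailed one-sample z-test of H0: μ = 0 with known σ = 1 and n = 25; the helper `rejection_rate`, the true means, and the trial count are illustrative choices. When H0 is true, the rejection rate estimates α; when H0 is false, the non-rejection rate estimates β.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated samples in which a two-tailed z-test rejects H0: mu = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)     # test statistic under H0: mu = 0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_mean=0.0)       # H0 true: estimates alpha
type_ii_rate = 1 - rejection_rate(true_mean=0.5)  # H0 false: estimates beta
print(f"estimated alpha = {type_i_rate:.3f}, estimated beta = {type_ii_rate:.3f}")
```

The estimated α should land near 0.05. A true mean of 0.5 sits 2.5 standard errors from 0, which gives power of roughly 0.70, so the estimated β should be near 0.30.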
when do you reject the null hypothesis?
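when the test statistic falls in the rejection region, i.e., when it is more extreme than the critical value (for a two-tailed test, when its absolute value exceeds the critical value); equivalently, when p ≤ α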
what is a critical value?
a threshold that defines the boundary of a statistical test’s rejection region
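For a z-test, the critical value and the decision rule can be sketched with `statistics.NormalDist`. The helper names are hypothetical, and the one-tailed branch assumes an upper-tail test. The p-value function shows that comparing |z| with the critical value and comparing p with α lead to the same decision.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_critical_value(alpha=0.05, tails=2):
    """Boundary of the rejection region; alpha is split across both tails when tails=2."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / tails)

def reject_null(z_stat, alpha=0.05, tails=2):
    """Two-tailed: reject if |z| > critical value. One-tailed (upper tail assumed): reject if z > critical value."""
    crit = z_critical_value(alpha, tails)
    return abs(z_stat) > crit if tails == 2 else z_stat > crit

def two_tailed_p_value(z_stat):
    """Probability of a z at least this extreme in either direction if H0 is true."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z_stat)))

for z in (1.5, -2.3):
    print(z, reject_null(z), round(two_tailed_p_value(z), 4))
```

At α = 0.05 the two-tailed critical value is about 1.96, so z = 1.5 is not rejected (p ≈ 0.13) while z = −2.3 is (p ≈ 0.02).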
what is another name for the normal distribution?
Gaussian distribution
does the normal distribution curve ever touch the horizontal axis?
no; its tails approach the horizontal axis asymptotically but never touch it
what is the purpose of a t-test?
to test whether there is a significant difference between two means (e.g., between two groups, or between a sample mean and a known population value)
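For two independent groups, the t statistic is the difference in means divided by its standard error. The sketch below uses Welch's version, which does not assume equal variances; `welch_t` and the data are illustrative, and the p-value step is left out because the standard library has no t distribution (`scipy.stats.ttest_ind(a, b, equal_var=False)` reports both).

```python
import math
import statistics

def welch_t(a, b):
    """Welch's independent-samples t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t([1, 2, 3], [4, 5, 6])  # hypothetical groups
print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting t is then compared with the critical value from a t distribution with `df` degrees of freedom.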
what is the purpose of an ANOVA?
to test for significant differences among the means of three or more groups
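A one-way ANOVA compares variation between group means with variation within groups: F = MS_between / MS_within, with k − 1 and N − k degrees of freedom. This is a minimal sketch with a hypothetical `one_way_anova_f` helper; `scipy.stats.f_oneway` computes the same F along with a p-value. Factorial designs need a more general model than this.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    k, n_total = len(groups), len(all_values)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)

# Hypothetical scores for three groups.
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Here the group means (2, 5, 8) differ far more than the scores within each group do, giving F = 27 on (2, 6) degrees of freedom.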
what is a factorial ANOVA used for?
to test for the effects of two or more independent variables (factors) on a dependent variable
what does the number of factors in an ANOVA indicate?
the number of factors indicates the number of main effects
what is the purpose of correlation?
it measures the direction and strength of a linear relationship between two variables
what does the p-value of a correlation tell you?
the probability of observing a correlation at least as extreme as the calculated coefficient, assuming there is no actual correlation in the population (the null hypothesis)
what are the limitations of correlation?
- sensitive to outliers
- misses curvilinear relationships (Pearson's correlation only measures linear relationships)
- restriction of range
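Pearson's r and two of its limitations can be demonstrated in a few lines. This sketch computes r directly from its definition (Python 3.10+ also provides `statistics.correlation`); all data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations (range -1 to 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A fairly strong positive linear relationship.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(pearson_r(x, y), 3))  # 0.8

# Limitation: a single outlier can flip the sign.
print(round(pearson_r(x + [6], y + [-20]), 3))

# Limitation: a perfect curvilinear (U-shaped) relationship gives r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(xs, [v ** 2 for v in xs]))
```

Adding the single point (6, −20) turns r negative, and y = x² is perfectly predictable from x yet has no linear correlation.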
what are the two uses of a chi-squared test?
1) test for independence
2) test for goodness of fit
what kind of data is typically used with a chi-squared test?
categorical data (counts or frequencies)
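Both uses of the chi-squared test rely on the same statistic, the sum of (O − E)² / E over all cells; they differ in where the expected counts come from. This sketch uses hypothetical counts and a hypothetical helper `chi_squared_statistic`. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic would differ from the uncorrected one here.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 1) Goodness of fit: hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
gof_stat = chi_squared_statistic(observed, expected)  # df = categories - 1 = 5

# 2) Independence: hypothetical 2x2 contingency table (rows = group, columns = response).
table = [[20, 30],
         [30, 20]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
# Expected count under independence: row total * column total / grand total.
expected_table = [[r * c / n for c in col_totals] for r in row_totals]
indep_stat = chi_squared_statistic(
    [o for row in table for o in row],
    [e for row in expected_table for e in row],
)  # df = (rows - 1) * (cols - 1) = 1
print(gof_stat, indep_stat)
```

At α = 0.05 the tabled critical values are about 11.07 for df = 5 and 3.84 for df = 1, so the die counts (χ² = 4.2) are consistent with a fair die, while the 2×2 table (χ² = 4.0) suggests the two variables are not independent.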
what is Y-hat in linear regressions?
the predicted value of Y
what are residuals in linear regressions?
observed value - predicted value
what does R-squared represent in linear regression?
the proportion of the variance in the dependent variable that is explained by the independent variable(s)
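The last three cards fit together in one small example: fit a line, compute Y-hat, take residuals as observed minus predicted, and get R² = 1 − SS_res / SS_tot. This is a minimal least-squares sketch for a single predictor; `fit_simple_ols` and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * a for a in x]                          # predicted values (Y-hat)
residuals = [obs - pred for obs, pred in zip(y, y_hat)]   # observed - predicted
y_mean = sum(y) / len(y)
ss_res = sum(e ** 2 for e in residuals)                   # variation left unexplained
ss_tot = sum((obs - y_mean) ** 2 for obs in y)            # total variation in Y
r_squared = 1 - ss_res / ss_tot
print(f"Y-hat = {b0:.2f} + {b1:.2f}X, R^2 = {r_squared:.3f}")
```

The fitted line is about Y-hat = 0.05 + 1.99X with R² ≈ 0.997. With an intercept in the model the residuals sum to zero, and in simple linear regression R² also equals the squared Pearson correlation between X and Y.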