Descriptive Statistics Flashcards
Define central tendency
The tendency for the values of a random variable to cluster around its mean, median, or mode.
Define mean, median and mode
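mean - the arithmetic average (sum of the values divided by the number of values)
median - the middle value of the ordered data set
mode - the most frequently occurring value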
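As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The `scores` list is made-up example data, not taken from the flashcards.

```python
import statistics

# Made-up example data (an assumption for illustration only)
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of values divided by their count
median = statistics.median(scores)  # middle of the sorted values (average of the two middle values when the count is even)
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

With an even number of values, the median is the average of the two middle values rather than a value from the data set.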
What are the 4 measures of variability
- Standard deviation
- Interquartile range
- Confidence intervals
- Z-scores
Define standard deviation
A measure of the dispersion of values around the mean (the square root of the variance)
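A minimal sketch of the calculation, using made-up data. Python's `statistics` module offers both the population form (divide by N) and the sample form (divide by N - 1); which one applies depends on whether the data are a whole population or a sample from one.

```python
import statistics

# Made-up example data (an assumption for illustration only)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(scores)  # divides the squared deviations by N
sample_sd = statistics.stdev(scores)       # divides the squared deviations by N - 1

print(population_sd, sample_sd)
```

The sample form is always slightly larger, because dividing by N - 1 compensates for estimating the mean from the same data.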
Define interquartile range
The difference between the third and first quartiles (Q3 - Q1), i.e. the spread of the middle 50% of the data.
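The interquartile range can be sketched with `statistics.quantiles`. The data are made up, and quartile conventions differ between textbooks and software, so other tools may give slightly different cut points for the same data.

```python
import statistics

# Made-up example data (an assumption for illustration only)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 returns the three quartile cut points (Q1, Q2, Q3);
# the default method is "exclusive"
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(q1, q3, iqr)
```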
Define confidence intervals
A range of values, calculated from sample data, that is expected to contain the true value of a parameter with a specified level of confidence (e.g. 95%).
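One common way to build such an interval is sketched below for the mean of a sample. It assumes a normal approximation (mean ± z × standard error); for small samples a t-distribution is usually preferred. The sample values are made up.

```python
import math
import statistics
from statistics import NormalDist

# Made-up sample (an assumption for illustration only)
sample = [48, 52, 55, 47, 50, 53, 49, 51, 54, 46]

n = len(sample)
mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z value that leaves 2.5% in each tail, giving a 95% interval
z = NormalDist().inv_cdf(0.975)

lower = mean - z * standard_error
upper = mean + z * standard_error

print(lower, upper)
```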
Define Z-scores
A z-score describes the position of a raw score in terms of its distance from the mean, measured in standard deviation units.
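As a small worked example, the helper below (a hypothetical function name) converts a raw score to a z-score. The mean of 100 and standard deviation of 15 are illustrative values only.

```python
def z_score(raw_score, mean, standard_deviation):
    """Distance of raw_score from the mean, in standard deviation units."""
    return (raw_score - mean) / standard_deviation

# Illustrative values: a score of 130 on a scale with mean 100 and SD 15
z = z_score(130, 100, 15)

print(z)
```

A positive z-score lies above the mean and a negative one below it.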
What do a high and a low standard deviation indicate
A low standard deviation means the data are clustered around the mean; a high standard deviation means the data are more spread out.
Define correlation
Correlation is a statistical measure that expresses the extent to which two variables are linearly related
Define regression
a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables
Define multiple regression
explains the relationship between multiple independent or predictor variables and one dependent or criterion variable
Define p value
- The p value is the probability of obtaining results at least as extreme as those observed if only random chance were at work (i.e. if the null hypothesis were true)
- P value is a number between 0 and 1
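To make the idea concrete, here is a minimal sketch using a coin-flip scenario that is not from the flashcards: if a coin is fair, how likely is it to see 9 or more heads in 10 flips? The function name is hypothetical, and the p value computed is one-sided.

```python
from math import comb

def binomial_p_value(heads, flips):
    """One-sided p value: chance of at least `heads` heads in `flips` fair coin flips."""
    # Count the outcomes that are as extreme as, or more extreme than, the observed one
    as_or_more_extreme = sum(comb(flips, k) for k in range(heads, flips + 1))
    return as_or_more_extreme / 2 ** flips

p = binomial_p_value(9, 10)

print(p)
```

Here 11 of the 1024 equally likely outcomes are at least as extreme as the observed result, so the p value is 11/1024.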
What is the P value threshold for statistical significance
- Threshold for statistical significance is most commonly <0.05
What does a lower P value indicate
Stronger evidence against the null hypothesis, i.e. greater statistical significance
What does a P value of 0.05 denote
A 5% probability of obtaining results at least this extreme by chance alone (i.e. if the null hypothesis were true)
Define linear regression
Linear regression expresses the relationship of two variables by fitting a linear equation to observed data
Explain what a linear regression graph will look like for each value
r = 0, r = -1.0, r = +1.0, r = +0.6
r = 0 will result in a roughly circular cloud of dots with no linear trend
r = -1.0 will result in a diagonal line from top left to bottom right with all dots on the line
r = +1.0 will result in a diagonal line from bottom left to top right with all dots on the line
r = +0.6 will result in a diagonal line from bottom left to top right, with dots scattered a little away from the line (the same for -0.6, but sloping from top left to bottom right)
Define Pearson's correlation
- Pearson's correlation coefficient (r) is the test statistic that measures the strength and direction of the linear relationship, or association, between two continuous variables.
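The coefficient can be sketched from its textbook formula: the sum of the products of the deviations, divided by the square root of the product of the two sums of squares. The function name and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

# Made-up data: y rises (or falls) in exact step with x
hours = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(hours, [2, 4, 6, 8, 10])
perfect_negative = pearson_r(hours, [10, 8, 6, 4, 2])

print(perfect_positive, perfect_negative)
```

Real data rarely fall exactly on a line, so values strictly between -1 and +1 are the norm.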
Define high, medium and low degree of correlation for r
- High degree: If the coefficient value lies between ± 0.50 and ± 1, then it is said to be a strong correlation.
- Moderate degree: If the value lies between ± 0.30 and ± 0.49, then it is said to be a medium correlation.
- Low degree: If the value lies below ± 0.29, then it is said to be a small correlation.
What does correlation not tell you
It does not tell you whether one variable causes the other
Define and explain regression equation
- Y = bX + c
- Y is the dependent variable
- X is the independent variable
- b is the slope, or regression coefficient
- c is the intercept on the Y axis
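A minimal sketch of how the slope and intercept might be estimated from data, assuming ordinary least squares (the flashcards do not name a fitting method). The function name and the data points are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope (b) and intercept (c) for Y = bX + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation in X
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    # Intercept: forces the line through the point (mean_x, mean_y)
    c = mean_y - b * mean_x
    return b, c

# Made-up data that lie exactly on Y = 2X + 1
b, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Use the fitted equation to predict Y for a new X value
predicted = b * 5 + c

print(b, c, predicted)
```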
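Define forced entry regression
All predictors are entered into the model at the same time, producing one R value
Define stepwise regression
Predictors are added or removed according to statistical criteria, producing one or more R values for the variables that explain variance
Define hierarchical regression
Predictors are entered in steps (blocks) in an order chosen by the researcher, producing an R value at each step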
Define bivariate regression
analysing two variables to establish the strength of the relationship between them.
Define degrees of freedom
- The number of individual scores that are free to vary without changing the mean
Define homogeneity of variance
- The spread of scores around each mean is approximately equal
What is used to determine where the differences are
Post hoc tests
A significant what will tell you there is a difference between groups
F-ratio
What is the Bonferroni correction and when to use it
The Bonferroni correction adjusts for the increased risk of a type I error when making multiple statistical tests
- The significance threshold (e.g. 0.05) is divided by the number of tests; equivalently, each p value is multiplied by the number of tests
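A minimal sketch of the correction, assuming the common form in which the significance threshold (alpha) is divided by the number of tests and each p value is compared against the result. The function name and the p values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold and which tests remain significant."""
    number_of_tests = len(p_values)
    adjusted_alpha = alpha / number_of_tests
    # A test counts as significant only if its p value beats the stricter threshold
    significant = [p < adjusted_alpha for p in p_values]
    return adjusted_alpha, significant

# Made-up p values from three comparisons
adjusted_alpha, significant = bonferroni([0.01, 0.02, 0.04])

print(adjusted_alpha, significant)
```

All three p values are below 0.05 on their own, but only the first stays below the stricter threshold of about 0.0167.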
Define outliers
Observations that are distant or distinct from the other observations in a dataset
Define skewness
Measure of the asymmetry of a distribution
Describe positive and negative skew
- Positive skew: most of the data sit on the left side, with a long tail to the right
- Negative skew: most of the data sit on the right side, with a long tail to the left
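The sign of the skew can be checked numerically. The sketch below assumes the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second); statistical packages often apply a small-sample adjustment, so their values may differ slightly. The data are made up.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up data: bulk on the left with a long right tail, and the mirror-image case
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [1, 8, 9, 9, 10, 10]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed), skewness(left_tailed), skewness(symmetric))
```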
Define leptokurtic
When scores are clustered very closely around the centre
- Resulting in a sharp, thin bell curve
Define platykurtic
- Flat wide bell curve
- Scores are spread far out on either side of the centre
Within what range should skewness and kurtosis values fall
-1.0 to +1.0
Define Mesokurtic
Distributions that are moderate in breadth and curves with a medium peaked height.
What are the three types of kurtosis
leptokurtic, mesokurtic, platykurtic
Define kurtosis
the sharpness of the peak of a frequency-distribution curve.
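A rough numerical sketch, assuming the moment-based "excess kurtosis" (fourth central moment divided by the squared second central moment, minus 3), so that a normal distribution scores about 0. Positive values suggest a leptokurtic shape and negative values a platykurtic one. Packages often apply small-sample adjustments, and the data are made up.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

# Made-up data: evenly spread scores versus scores piled on the centre with two extremes
flat = [1, 2, 3, 4, 5]
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]

print(excess_kurtosis(flat), excess_kurtosis(peaked))
```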
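What is done to a variable during bivariate regression
Squared (the correlation coefficient r is squared to give the proportion of variance the two variables share)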
Bivariate regression can be used for what two things
- Assess the shared variance between two variables
- Predict a value on one variable, using the value of another variable
Multiple regression equation
- Y = b1X1 + b2X2 + b3X3 + … + c (each predictor X has its own regression coefficient b; c is the intercept)
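To show how the equation is used once the coefficients are known, here is a small sketch. The function name, coefficients, predictor values and intercept are all hypothetical numbers chosen for illustration; estimating the coefficients from data is a separate step not shown here.

```python
def predict(coefficients, predictor_values, intercept):
    """Evaluate the multiple regression equation for one case."""
    # Multiply each predictor by its own coefficient, sum the products, then add the intercept
    return sum(b * x for b, x in zip(coefficients, predictor_values)) + intercept

# Hypothetical model with three predictors: Y = 2.0*X1 + 0.5*X2 - 1.0*X3 + 10.0
y = predict([2.0, 0.5, -1.0], [3, 4, 1], 10.0)

print(y)
```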