Stats Courses 510, 620, 621 Flashcards
What is the purpose of descriptive stats?
To describe, show, or summarize data in a meaningful way. Descriptive statistics do not, however, allow us to draw conclusions beyond the data we have analyzed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.
What are 2 types of descriptive statistics that are used to describe data?
- Measures of Central Tendency
a. Mean
b. Median
c. Mode
- Measures of Variance
a. Range- the highest number minus the lowest number
b. Variance- the average squared deviation of data values from the mean, expressed in squared units
c. Standard Deviation- the square root of variance; used as an approximate indicator of the average distance that your data values are from the mean.
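As a minimal sketch of the measures listed above, the following uses Python's standard `statistics` module on a made-up list of scores (the data are hypothetical). It assumes the population formulas (`pvariance`, `pstdev`), which divide by n; the sample versions (`variance`, `stdev`) divide by n - 1 instead.

```python
import statistics

# Hypothetical exam scores, made up for illustration.
scores = [70, 80, 80, 90, 100]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Measures of variance
data_range = max(scores) - min(scores)   # highest minus lowest
variance = statistics.pvariance(scores)  # average squared deviation from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance

print(mean, median, mode, data_range, variance, std_dev)
```

Note that the variance is in squared score units, while the standard deviation is back in the original units, which is why the standard deviation is usually the one reported.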
What are ways to display data?
● Bar Graphs ● Histograms ● Line Graphs ● Scatter Plots ● Box & Whisker Plot ● Stem & Leaf Plot
What are inferential statistics?
Inferential statistics use a sample of data taken from a population to describe and make inferences about that population. They are a set of methods used to make a generalization, estimate, prediction, or decision.
What are examples of inferential statistics?
● Simple Correlations & Regressions ● Multiple Correlations & Regressions ● t-tests (paired and independent) ● ANOVA (one- through three-way) ● ANCOVA
Explain Normality.
the underlying random variable of interest is distributed normally, or approximately so. Normal distributions are symmetrical with a single central peak at the mean (average) of the data. The mean, mode, and median are the same in a normal distribution.
○ The shape of the curve is described as bell-shaped with the graph falling off evenly on either side of the mean.
○ Fifty percent of the distribution lies to the left of the mean and fifty percent lies to the right of the mean.
○ The spread of a normal distribution is controlled by the standard deviation.
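The properties above can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the means and standard deviations are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Two hypothetical normal distributions: same mean, different spread.
narrow = NormalDist(mu=100, sigma=10)
wide = NormalDist(mu=100, sigma=20)

# Half of each distribution lies below the mean, whatever the spread.
below_mean_narrow = narrow.cdf(100)
below_mean_wide = wide.cdf(100)

# Share of values within 10 points of the mean.
# A smaller standard deviation concentrates more of the curve near the mean.
within_10_narrow = narrow.cdf(110) - narrow.cdf(90)
within_10_wide = wide.cdf(110) - wide.cdf(90)

print(below_mean_narrow, below_mean_wide)
print(round(within_10_narrow, 3), round(within_10_wide, 3))
```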
What is skewness?
a measure of the asymmetry of a distribution, that is, how far it departs from symmetry.
○ A symmetrical dataset will have a skewness equal to 0.
○ So, a normal distribution will have a skewness of 0.
○ Skewness essentially measures the relative size of the two tails. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
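One common way to compute skewness is the moment-based formula: the third central moment divided by the standard deviation cubed. The sketch below assumes that population formula (statistical packages often apply a small-sample correction instead), and the data sets are made up. A longer right tail gives a positive value and a longer left tail gives a negative value.

```python
def skewness(values):
    """Moment-based (population) skewness: third central moment / sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, made up for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 3, 4, 20]     # one large value stretches the right tail
left_tailed = [-20, -4, -3, -2, -1]  # mirror image: long left tail

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
```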
Explain semi partial correlation.
In a partial correlation, we find the correlation between X and Y holding Z constant for both X and Y.
○ Sometimes, we want to hold Z constant for just X or just Y. Because Z is held constant for only one of the two variables rather than both, this is a semipartial (or part) correlation instead of a partial correlation.
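The difference between the two can be seen in the standard formulas that work from the three pairwise correlations. This is a sketch with hypothetical correlation values; the function names are illustrative, and the semipartial version here assumes Z is removed from X only.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z held constant for both X and Y."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_corr(r_xy, r_xz, r_yz):
    """Correlation of Y with X after Z is removed from X only."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_xz ** 2)

# Hypothetical pairwise correlations, made up for illustration.
r_xy, r_xz, r_yz = 0.5, 0.3, 0.4

partial = partial_corr(r_xy, r_xz, r_yz)
semipartial = semipartial_corr(r_xy, r_xz, r_yz)
print(round(partial, 4), round(semipartial, 4))
```

The two formulas share a numerator; the partial correlation divides by one extra term that is at most 1, so its absolute value is never smaller than that of the semipartial correlation.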
What is multiple correlation?
(R) is a measure of the strength of the association between the independent variables and one dependent variable.
- R can be any value from 0 to +1.
- The closer R is to one, the stronger the linear association is.
- If R equals zero, then there is no linear association between the dependent variable and the independent variables.
What is multiple coefficient of determination (R-squared)?
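the square of the multiple correlation coefficient; it gives the proportion of variance in the dependent variable that is accounted for by the independent variables together.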
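For the special case of two independent variables, R-squared can be computed directly from the three pairwise correlations. The sketch below assumes that two-predictor formula and uses hypothetical correlation values; with more predictors a full regression fit would be needed.

```python
from math import sqrt

def multiple_r_squared(r_y1, r_y2, r_12):
    """R-squared for one dependent variable and two independent variables.

    r_y1, r_y2: correlations of each predictor with the dependent variable
    r_12: correlation between the two predictors
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical correlations, made up for illustration.
r_squared = multiple_r_squared(r_y1=0.5, r_y2=0.4, r_12=0.3)
multiple_r = sqrt(r_squared)  # R is the non-negative square root of R-squared

print(round(r_squared, 4), round(multiple_r, 4))
```

Because the two predictors overlap (r_12 is not zero), R-squared here is smaller than the simple sum of the two squared correlations.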
What is regression analysis?
a statistical process for estimating the relationships among variables. When one independent variable is used in a regression, it is called a simple regression; when two or more independent variables are used, it is called a multiple regression.
Regression
Techniques for Comparing Variables for Relative Importance
● B (or b) generally refers to the unstandardised coefficient. This means that the regression coefficient is in the original measurement units. Used for variables that already have a meaningful unit of measurement (income, GPA).
● The β (beta) refers to the number of standard deviation changes we would expect in the outcome variable for a 1 standard deviation change in the predictor variable. Used for items that do not have a meaningful unit of measurement (levels of happiness, scores on a depression scale).
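The relationship between the two coefficients can be sketched for a simple regression: β is b rescaled by the ratio of the predictor's standard deviation to the outcome's standard deviation. The variable names and data below are hypothetical.

```python
import statistics

# Hypothetical data: hours studied (predictor) and exam score (outcome).
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]

mean_x = statistics.mean(hours)
mean_y = statistics.mean(score)

# Unstandardised slope b: change in score (points) per one extra hour.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))
sxx = sum((x - mean_x) ** 2 for x in hours)
b = sxy / sxx

# Standardised beta: change in score (in SDs) per one SD change in hours.
beta = b * statistics.stdev(hours) / statistics.stdev(score)

print(round(b, 3), round(beta, 3))
```

With a single predictor, β equals the Pearson correlation between predictor and outcome, so it always falls between -1 and +1, whereas b depends entirely on the units used.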
What is forward selection?
Forward selection is when a researcher adds variables to the model one at a time.
● At each step, each variable that is not already in the model is tested for inclusion in the model.
● The most significant of these variables is added to the model, so long as its p-value is below some pre-set level.
● We begin with a model including the variable that is most significant in the initial analysis, and continue adding variables until none of the remaining variables are “significant” when added to the model.
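The steps above can be sketched in plain Python. Two assumptions are worth flagging: the entry test here uses a partial F statistic compared against a hypothetical cutoff (`f_to_enter=4.0`) as a stand-in for a p-value threshold, and the helper names, predictor names, and data are all made up for illustration. In practice a statistics package would supply the p-values.

```python
def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * target for row, target in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    coefs = [aug[i][k] / aug[i][i] for i in range(k)]
    ss_resid = sum(
        (target - sum(b * v for b, v in zip(coefs, row))) ** 2
        for row, target in zip(rows, y)
    )
    return 1 - ss_resid / ss_total

def forward_select(predictors, y, f_to_enter=4.0):
    """Add one variable at a time until no candidate passes the entry test."""
    selected, current_r2 = [], 0.0
    remaining = list(predictors)
    while remaining:
        # Test every variable that is not already in the model.
        trial = {
            name: r_squared([predictors[s] for s in selected + [name]], y)
            for name in remaining
        }
        best = max(trial, key=trial.get)  # strongest candidate at this step
        df_resid = len(y) - (len(selected) + 1) - 1
        if df_resid <= 0 or trial[best] >= 1.0:
            break  # guard: no residual df left, or a perfect fit (division by zero)
        f_stat = (trial[best] - current_r2) / ((1 - trial[best]) / df_resid)
        if f_stat < f_to_enter:
            break  # the best remaining variable is not "significant"
        selected.append(best)
        remaining.remove(best)
        current_r2 = trial[best]
    return selected

# Hypothetical data: y depends on x1 and x2; x3 is unrelated to y.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, -1, 1, -1, 1],
}
y = [6, 7, 6, 11, 18, 19, 18, 23]

print(forward_select(predictors, y))
```

Backward selection would run the same loop in reverse: start with every variable in the model and repeatedly drop the weakest one while it fails the test.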
What is backward selection?
One starts by fitting a model with all the variables of interest (following the initial screen). Then the least significant variable is dropped, so long as it is not significant at our chosen critical level. We continue by successively re-fitting reduced models and applying the same rule until all remaining variables are statistically significant.
What is an independent samples t test?
Used when data are collected on subjects who are divided into two groups. This is called an independent or parallel study. That is, the subjects in one group (treatment, etc.) are different from the subjects in the other group. These data may be analyzed using an independent group t-test (sometimes called an independent samples t-test or parallel test). This version of the t-test tests the two-sided null hypothesis that the two group means are equal.
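As a rough sketch, the test statistic for two independent groups can be computed by hand. This assumes the classic pooled-variance (equal variances) form of the t-test; the group names and scores are hypothetical.

```python
import statistics
from math import sqrt

# Hypothetical scores for two separate groups of subjects.
treatment = [12, 14, 15, 17, 17]
control = [10, 11, 13, 13, 13]

n1, n2 = len(treatment), len(control)
mean_diff = statistics.mean(treatment) - statistics.mean(control)

# Pooled variance: weighted average of the two sample variances.
pooled_var = (
    (n1 - 1) * statistics.variance(treatment)
    + (n2 - 1) * statistics.variance(control)
) / (n1 + n2 - 2)

std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = mean_diff / std_error
df = n1 + n2 - 2  # degrees of freedom

print(round(t_stat, 3), df)
```

The statistic is then compared with a critical value from a t table for the given degrees of freedom; for df = 8 and a two-sided test at the .05 level that value is about 2.306.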
What is a dependent samples t test or paired samples t test?
When data are collected twice on the same subjects (or matched subjects), the proper analysis is a paired t-test (also called a dependent samples t-test). In this case, subjects may be measured in a before-and-after fashion, or in a design where a treatment is administered for a time, there is a washout period, and another treatment is administered (in random order for each subject).
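A paired t-test works on the difference scores for each subject rather than on the two sets of scores separately. The sketch below computes the statistic by hand on hypothetical before and after measurements.

```python
import statistics
from math import sqrt

# Hypothetical before/after measurements on the same five subjects.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 17, 12]

# One difference score per subject.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / sqrt(n)  # standard error of the mean difference

t_stat = mean_diff / std_error
df = n - 1  # degrees of freedom

print(round(t_stat, 3), df)
```

Because each subject serves as their own control, between-subject variability drops out of the standard error, which is why the paired design is used when the same (or matched) subjects are measured twice.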