Stats Courses 510, 620, 621 Flashcards

1
Q

What is the purpose of descriptive stats?

A

To describe, show, or summarize data in a meaningful way. Descriptive statistics do not, however, allow us to draw conclusions beyond the data we have analyzed or to reach conclusions about any hypotheses we might have made. They are simply a way to describe our data.

2
Q

What are 2 types of descriptive statistics that are used to describe data?

A
  1. Measures of Central Tendency
    a. Mean
    b. Median
    c. Mode
  2. Measures of Variability
    a. Range- the highest number minus the lowest number
    b. Variance- the average squared deviation of data values from the mean (expressed in squared units)
    c. Standard Deviation- the square root of the variance; used as an approximate indicator of the average distance of your data values from the mean.
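
The measures above can be sketched in a few lines of Python. This is only an illustration using the standard-library `statistics` module; the `scores` list is made-up data, and the population formulas (`pvariance`, `pstdev`) are an assumption about which version of variance is wanted.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
scores = [2, 4, 4, 5, 7, 8]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Measures of variability
data_range = max(scores) - min(scores)   # highest minus lowest
variance = statistics.pvariance(scores)  # average squared deviation from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance

print(mean, median, mode, data_range, variance, std_dev)
```

For sample data, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.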
3
Q

What are ways to display data?

A
● Bar Graphs
● Histograms
● Line Graphs
● Scatter Plots
● Box & Whisker Plot
● Stem & Leaf Plot
4
Q

What are inferential statistics?

A

Inferential statistics use a sample of data taken from a population to describe and make inferences about that population. They are a set of methods used to make a generalization, estimate, prediction, or decision.

5
Q

What are examples of inferential statistics?

A
● Simple Correlations & Regressions
● Multiple Correlations & Regressions
● t-tests (paired and independent)
● ANOVA (one- through three-way)
● ANCOVA
6
Q

Explain Normality.

A

Normality means that the underlying random variable of interest is distributed normally, or approximately so. Normal distributions are symmetrical with a single central peak at the mean (average) of the data. The mean, mode, and median are the same in a normal distribution.
○ The shape of the curve is described as bell-shaped with the graph falling off evenly on either side of the mean.
○ Fifty percent of the distribution lies to the left of the mean and fifty percent lies to the right of the mean.
○ The spread of a normal distribution is controlled by the standard deviation.

7
Q

What is skewness?

A

A measure of the asymmetry of a distribution (how far it departs from symmetry).
○ A symmetrical dataset will have a skewness equal to 0.
○ So, a normal distribution will have a skewness of 0.
○ Skewness essentially measures the relative size of the two tails. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
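
As a rough sketch of how skewness compares the two tails, the snippet below uses the moment-based formula (third central moment divided by the 1.5 power of the second). The function name and both data sets are made up for illustration; statistical packages often apply a small-sample correction that this sketch omits.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]      # looks the same left and right of the center
right_tailed = [1, 1, 2, 2, 10]  # one long tail on the right

print(skewness(symmetric))     # 0 for symmetric data
print(skewness(right_tailed))  # positive: the right tail is longer
```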

8
Q

Explain semi partial correlation.

A

In a partial correlation, we find the correlation between X and Y holding Z constant for both X and Y.
○ Sometimes, we want to hold Z constant for just X or just Y. When Z is held constant for only one of the two variables, the result is a semipartial correlation (or part correlation) instead of a partial correlation.
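
The difference between the two can be seen in the standard formulas based on the three pairwise correlations. The sketch below is illustrative only: the function names and the correlation values are hypothetical, and the semipartial version removes Z from X only.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z held constant for BOTH X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of Y with X after Z is removed from X ONLY."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

# Hypothetical pairwise correlations, for illustration only.
r_xy, r_xz, r_yz = 0.5, 0.3, 0.4

print(partial_r(r_xy, r_xz, r_yz))
print(semipartial_r(r_xy, r_xz, r_yz))
```

Both share the same numerator; the semipartial has the larger denominator, so its absolute value is never greater than that of the partial.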

9
Q

What is multiple correlation?

A

The multiple correlation (R) is a measure of the strength of the association between the independent variables and one dependent variable.

  • R can be any value from 0 to +1.
  • The closer R is to one, the stronger the linear association is.
  • If R equals zero, then there is no linear association between the dependent variable and the independent variables.
10
Q

What is multiple coefficient of determination (R-squared)?

A

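The square of the multiple correlation coefficient. It gives the proportion of variance in the dependent variable that is accounted for by the independent variables.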
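
For the special case of two predictors, R-squared can be computed from the three pairwise correlations. The sketch below assumes exactly two independent variables; the function name and the correlation values are hypothetical.

```python
import math

def multiple_r_squared(r_y1, r_y2, r_12):
    """R-squared for one DV and two IVs.

    r_y1, r_y2: correlation of the DV with each IV
    r_12: correlation between the two IVs
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical correlations, for illustration only.
r_squared = multiple_r_squared(0.5, 0.4, 0.3)
multiple_r = math.sqrt(r_squared)  # R is the positive square root, between 0 and 1

print(r_squared, multiple_r)
```

When the two predictors are uncorrelated (`r_12 = 0`), R-squared reduces to the sum of the two squared correlations.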

11
Q

What is regression analysis?

A

A statistical process for estimating the relationships among variables. When one independent variable is used in a regression, it is called a simple regression; when two or more independent variables are used, it is called a multiple regression.

12
Q

Regression

Techniques for Comparing Variables for Relative Importance

A

● B (or b) generally refers to the unstandardized coefficient. This means that the regression coefficient is in the original measurement units. Used for variables that already have a meaningful unit of measurement (income, GPA).
● β (beta) refers to the number of standard deviations of change we would expect in the outcome variable for a 1 standard deviation change in the predictor variable. Used for variables that do not have a meaningful unit of measurement (levels of happiness, scores on a depression scale).
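
The link between b and β can be sketched for a simple regression, where β equals b multiplied by the ratio of the predictor's standard deviation to the outcome's standard deviation. The data and variable names below are made up for illustration.

```python
import statistics

# Hypothetical predictor (x) and outcome (y), for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)

# Unstandardized slope b: change in y (original units) per 1-unit change in x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b = s_xy / s_xx

# Standardized slope beta: change in y (in SDs) per 1-SD change in x.
beta = b * statistics.stdev(x) / statistics.stdev(y)

print(b, beta)
```

With a single predictor, β is the same value as the Pearson correlation between x and y.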

13
Q

What is forward selection?

A

Forward selection is when a researcher adds variables to the model one at a time.
● At each step, each variable that is not already in the model is tested for inclusion in the model.
● The most significant of these variables is added to the model, so long as its p-value is below some pre-set level.
● We begin with a model including the variable that is most significant in the initial analysis, and continue adding variables until none of the remaining variables are "significant" when added to the model.

14
Q

What is backward selection?

A

Backward selection starts by fitting a model with all the variables of interest (following the initial screen). Then the least significant variable is dropped, so long as it is not significant at our chosen critical level. We continue by successively re-fitting reduced models and applying the same rule until all remaining variables are statistically significant.

15
Q

What is an independent samples t test?

A

Used when data are collected on subjects who are divided into two groups. This is called an independent or parallel study. That is, the subjects in one group (treatment, etc.) are different from the subjects in the other group. These data may be analyzed using an independent group t-test (sometimes called an independent samples t-test or parallel test). This version of the t-test tests the two-sided null hypothesis that the two group means are equal.
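
A minimal sketch of the test statistic, assuming the classic pooled-variance (equal variances) version of the test. The function name and the two groups are hypothetical; the p-value would then be looked up from a t distribution with the returned degrees of freedom.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two separate groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variances (n - 1 denominator)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_error
    return t, df

# Hypothetical scores for two different sets of subjects.
treatment = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]

t_stat, df = independent_t(treatment, control)
print(t_stat, df)
```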

16
Q

What is a dependent samples t test or paired samples t test?

A

When data are collected twice on the same subjects (or on matched subjects), the proper analysis is a paired t-test (also called a dependent samples t-test). In this case, subjects may be measured in a before-and-after fashion, or in a design where a treatment is administered for a time, there is a washout period, and another treatment is administered (in random order for each subject).
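
Because each subject contributes two scores, the paired test works on the difference scores. The sketch below is illustrative only; the function name and the before/after data are made up.

```python
import math
import statistics

def paired_t(first, second):
    """t statistic and degrees of freedom computed from within-subject differences."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical before/after scores for the same five subjects.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 15]

t_stat, df = paired_t(before, after)
print(t_stat, df)
```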

17
Q

What is kurtosis?

A

Indicates how the peak and tails of a distribution differ from the normal distribution (peakedness of a distribution).

  • Data sets with high kurtosis tend to have heavy tails, or outliers.
  • Data sets with low kurtosis tend to have light tails or lack of outliers.
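
A rough sketch of the comparison with the normal distribution: moment-based kurtosis is 3 for a normal distribution, so "excess" kurtosis subtracts 3. The function name and both data sets are made up, and no small-sample correction is applied.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 would match a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

light_tails = [1, 2, 3, 4, 5]                    # evenly spread, no outliers
heavy_tails = [3, 3, 3, 3, 3, 3, 3, 3, -7, 13]   # two extreme values

print(excess_kurtosis(light_tails))  # negative: lighter tails than normal
print(excess_kurtosis(heavy_tails))  # positive: heavier tails than normal
```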
18
Q

What is an outlier?

A

an observation that lies an abnormal distance from other values in a random sample from a population

19
Q

What is homoscedasticity?

A

An assumption central to linear regression models. It describes a situation in which the variance of the error term is the same across all values of the independent variables.

20
Q

What is heteroscedasticity?

A

The violation of homoscedasticity. It is present when the variance of the error term differs across values of an independent variable.

21
Q
What is the best measure of central tendency for each of the following:
Nominal
Ordinal
Interval/ratio (not skewed)
Interval/ratio(skewed)
A

Nominal: Mode
Ordinal: Median
Interval/ratio (not skewed): Mean
Interval/ratio (skewed): Median

22
Q

What are measures of variability?

Define them.

A

Ways of summarizing a group of data by describing how spread out the scores are (range, variance, standard deviation).
Range- the difference between the largest and smallest value in a set of values
Variance- the average squared deviation from the mean
SD- the square root of the variance; roughly the average distance of the values in a data set from the mean

23
Q

Distinguish the difference between within subjects IV and between-subjects IV

A

WS IV- an IV for which all participants receive all levels.

BS IV- an IV for which each participant receives only one level.

24
Q

What is ANOVA?

A

A test of the statistical significance of the differences among the mean scores of 2 or more groups on 1 or more variables or factors
* an extension of the t test, which can only handle 2 groups at a time
* used for assessing the statistical significance of the relationship between a categorical IV and a continuous DV
* equivalent to multiple regression (MR) with dummy-coded IVs and a continuous DV
* its limitation is that it only indicates whether a statistically significant difference exists among the group means; it does not specify which differences are statistically significant
Researchers suggest also reporting an effect size indicator.
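
For a one-way design, the test statistic is the F ratio: variability between group means divided by variability within groups. The sketch below is illustrative only; the function name and the three groups are made up.

```python
def one_way_anova_f(groups):
    """F ratio = mean square between groups / mean square within groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three groups of one categorical IV.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_anova_f(groups)
print(f_ratio)
```

A large F only says that at least one group mean differs; it does not say which ones.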

25
Q

Explain GLM, 3 forms, analysis for each form and equations.

A

HANDOUT

26
Q

What are the Assumptions of GLM?

A

Handout

27
Q

What are the 3 types of hypothesis testing?

A

Handout

28
Q

Explain the GLM: (Handout)

A

Bivariate- 1 DV and 1 IV

  • Yhat = B0 + B1X1
  • ANOVA
  • Simple Regression

Simple Multivariate- 1 DV and more than 1 IV

  • Yhat = B0 + B1X1 + B2X2
  • 2 Way ANOVA, 3 Way ANOVA
  • 1 Way, 2 Way, 3 Way RM ANOVA
  • Two Factor Mixed ANOVA
  • Multiple Regression
  • Analysis of Covariance

Full Multivariate- More than 1 DV and more than 1 IV

  • MANOVA
  • 2 Way MANOVA
  • Multivariate Analysis of Covariance (MANCOVA)
29
Q

Define Null Hypothesis

A

typically the hypothesis that states that there is no difference between means, or no relationship between variables within a population

30
Q

Type I error

A

The error of rejecting the null hypothesis when it is true.

α (alpha)- the probability of a Type I error

31
Q

Type II error

A

The error of not rejecting the null hypothesis when it is false.
β (beta)- the probability of a Type II error

33
Q

Define alternative hypothesis.

A

The opposite of the null; it states that there is a relationship (or a difference between means) in the population.

34
Q

Critical Value

A

the value demarcating the critical region of a null hypothesis probability distribution

35
Q

Effect size

A

the magnitude of a relationship or difference between means

36
Q

Define practical significance and statistical significance.

A

Stat- the conclusion that a particular finding would be very unlikely if the null hypothesis were true
Prac- a subjective but thoughtful decision by the researcher about how important the findings would be in real-world applications.

37
Q

Probability distribution

A

The probability distribution of a sample statistic, as used in hypothesis-testing approaches.

38
Q

Design a Study.

A

Study Design- (Type of study, methodology, data collection)
Threats to Validity
Statistical Analysis
Internal & External validity issues

40
Q

What are nonparametric tests?

A

Nonparametric tests are sometimes called distribution-free statistics because they do not require that the data fit a normal distribution.
● More generally, nonparametric tests require less restrictive assumptions about the data.
● Another important reason for using these tests is that they allow for the analysis of categorical as well as rank data.
● Parametric and nonparametric are two broad classifications of statistical procedures.
● Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken.
● Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.

41
Q

Chi Square Goodness of Fit Test

A

A test applied when you have one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution.
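
A minimal sketch of the goodness-of-fit statistic, which sums (observed - expected)² / expected over the categories. The function name, the counts, and the hypothesized equal split are all made up for illustration.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic and degrees of freedom for one categorical variable."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical counts in four categories from a sample of 100.
observed = [18, 22, 20, 40]
# Hypothesized distribution: all four categories equally likely.
expected = [25, 25, 25, 25]

chi2, df = chi_square_gof(observed, expected)
print(chi2, df)
```

The statistic is then compared against a chi-square distribution with the returned degrees of freedom.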

42
Q

Chi Square Test for independence

A

A test applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
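
A minimal sketch of the test for independence on a contingency table. Each expected count is (row total × column total) / grand total; the function name and the 2 × 2 table are made up for illustration.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are levels of one variable, columns of the other.
table = [[10, 20],
         [30, 40]]

chi2, df = chi_square_independence(table)
print(chi2, df)
```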