Sadie's Lecture 5 Flashcards

1
Q

The major criteria for selecting a statistical test are: (4)

A
  1. The level of measurement of the variables
  2. The number of variables and the number of categories (or attributes) for the nominal variables
  3. The type of sampling methods used in data collection
  4. The way the variables are distributed in the population
2
Q

What commonly used statistical test has dependent and independent variables that are both nominal or categorical (non-parametric test)?

A

Chi-square
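
As a rough illustration of how the chi-square statistic is computed for two nominal variables, here is a minimal Python sketch. The function name `chi_square_statistic` and the counts in `observed` are hypothetical, made up for this example. The sketch applies no continuity correction and returns only the statistic and the degrees of freedom, which you would then compare against a chi-square table.

```python
def chi_square_statistic(table):
    """Return (chi2, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = sex, columns = answer to a yes/no question.
observed = [[30, 20],
            [20, 30]]
chi2, dof = chi_square_statistic(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}")  # chi2 = 4.00, dof = 1
```

With 1 degree of freedom the 0.05 critical value is about 3.84, so a statistic of 4.00 would lead you to reject the hypothesis that the two variables are independent.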

3
Q

What commonly used statistical test has dependent variables that must be interval or ratio (continuous) with an independent variable that is nominal?

A

T-Test and ANOVA (analysis of variance)

4
Q

What commonly used statistical test has both dependent and independent variables that are interval or ratio (continuous)?

A

Correlation analysis

5
Q

What is a One-Sample T-Test?

A

A one-sample test of means compares the mean of a sample to a pre-specified value and tests for a deviation from that value.

For example: you want to know whether a group of employees is more educated than the average American. The average years of schooling in the US is 12.7 years.
You would compare the group's mean years of schooling with this figure using a One-Sample T-Test, with 12.7 years as the “test value.”
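
To make the schooling example concrete, here is a minimal Python sketch of the t statistic behind a one-sample t-test. The `years` data and the function name `one_sample_t` are hypothetical, invented for illustration; only the 12.7 test value comes from the card. The sketch stops at the statistic and degrees of freedom and does not compute a p-value.

```python
import math
import statistics


def one_sample_t(sample, test_value):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)      # sample SD (divides by n - 1)
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - test_value) / standard_error
    return t, n - 1


# Hypothetical years of schooling for eight employees.
years = [12, 14, 16, 13, 15, 16, 14, 12]
t, dof = one_sample_t(years, 12.7)
print(f"t = {t:.2f} with {dof} degrees of freedom")
```

A t value that is large relative to the critical value for n − 1 degrees of freedom would suggest that the group's mean schooling differs from 12.7 years.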

6
Q

H0 is a _____ hypothesis while H1 is an ______ hypothesis.

A

null; alternative

7
Q

_______ tests the null hypothesis that there is no difference between the means of two or more groups.

A

One-way ANOVA (analysis of variance)

8
Q

When using ANOVA, the dependent variable must be ______ (interval or ratio) and the independent variable must be ______.

A

When using ANOVA, the dependent variable must be continuous (interval or ratio) and the independent variable must be nominal. Example: salary (continuous) and sex (coded 1 or 2).
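
The salary-by-sex example can be sketched as a one-way ANOVA F statistic in plain Python. The salary figures and the function name `one_way_anova_f` are hypothetical and chosen only to keep the arithmetic easy to follow; a real analysis would also look up the p-value for the F statistic.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical salaries (in $1000s) for the two categories of the nominal variable.
salary_group_1 = [40, 42, 44]
salary_group_2 = [50, 52, 54]
f, df_b, df_w = one_way_anova_f([salary_group_1, salary_group_2])
print(f"F = {f:.1f}, df = ({df_b}, {df_w})")  # F = 37.5, df = (1, 4)
```

A large F means the group means differ by more than the spread inside each group would explain.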

9
Q

In correlation analysis, correlation is a measure of the ______ of the relationship between two (continuous) variables

A

strength

10
Q

In correlation analysis, correlation is a measure of the ______ of the relationship between two (continuous) variables

A

strength

11
Q

The most frequently used measurement of correlation is called:

A

Pearson Correlation

12
Q

In calculating Pearson correlation, the variables should be ______.

A

continuous

13
Q

Correlation Coefficient is expressed on a scale from -1.0 to +1.0. What does 1 represent?

A

Strongest POSITIVE Correlation.

For example: the amount of food Cosmo eats is positively correlated with the amount of poop Kate scoops.

14
Q

Correlation Coefficient is expressed on a scale from -1.0 to +1.0. What does -1 represent?

A

Strongest INVERSE Correlation

E.g., the number of dogs in the house has an INVERSE correlation with the amount of time Cosmo spends downstairs.

15
Q

Correlation Coefficient is expressed on a scale from -1.0 to +1.0. What does 0 represent?

A

No correlation at all (no linear relationship between the two variables).
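
The three anchor points of the scale (+1, -1 and 0) can be checked with a small Python sketch of the Pearson correlation coefficient. The function name `pearson_r` and the toy data are assumptions made for this example. Note that the last data set has a clear curved pattern yet still gives r = 0, because Pearson correlation only measures linear association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(variation_x * variation_y)


x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # y rises in step with x
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in step with x
r_none = pearson_r(x, [4, 1, 0, 1, 4])       # U-shaped, no linear trend
print(r_positive, r_inverse, r_none)
```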

16
Q

True or false: A correlation is not necessarily a causal relation.

A

True

17
Q

Three conditions must be met in order for a correlation to also be a causal relation:

A
  1. If X changes, Y changes (correlation)
  2. Y changes after X (time sequence)
  3. If X is removed, Y disappears (existence)
18
Q

What is “regression analysis”?

A

A broad class of widely used statistical methods for summarizing trends in data.

19
Q

What is the purpose of regression analysis?

A

To estimate and describe data and variables

To test hypotheses and find “laws”

20
Q

What is linear regression?

A

Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, y can be calculated from a linear combination of the input variables (x).

21
Q

What is the regression equation?

A

Y = a + bX

22
Q

What does each element of the regression equation Y = a + bX mean?

A

Y = dependent variable (plotted on the Y axis)
X = independent variable (plotted on the X axis)
b = coefficient of X, the slope of the line
a = intercept on the Y axis (the point where the line crosses the Y axis, at X = 0)
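
To see where a and b come from, here is a minimal Python sketch that fits Y = a + bX by ordinary least squares. The function name `fit_line` and the five data points are hypothetical, chosen only so the numbers come out cleanly.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    # Intercept: the fitted line always passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(f"Y = {a:.1f} + {b:.1f}X")  # Y = 2.2 + 0.6X
```

Here b = 0.6 says that Y rises by about 0.6 for each one-unit increase in X, and a = 2.2 is the predicted Y when X = 0.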

23
Q

What does r represent?

A

The correlation coefficient.

For example… r=0.919

24
Q

What is R²?

A

R² is a measure of the goodness of fit of the regression.

25
Q

True or false: R² is sometimes interpreted as the percentage of the variation in Y that is explained by the regression.

A

True

26
Q

True or false: R² represents the proportion of variation that is explained by the entire set of independent variables.

A

True

27
Q

What is R² = r² used for?

A

For simple regression (one independent variable)
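
The identity can be checked numerically. The sketch below, with hypothetical helper names and made-up data, computes the squared Pearson correlation and the regression's goodness of fit separately and shows that they agree when there is a single independent variable.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    a, b = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    top = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return top / math.sqrt(sum((xi - mean_x) ** 2 for xi in x)
                           * sum((yi - mean_y) ** 2 for yi in y))


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
big_r2 = r_squared(x, y)
print(f"r^2 = {r ** 2:.3f}, R^2 = {big_r2:.3f}")  # r^2 = 0.600, R^2 = 0.600
```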

28
Q

What does multiple linear regression deal with?

A

Multiple linear regression deals with the case where two or more independent variables contribute to the change in the dependent variable at the same time.

29
Q

What is this an example of? Y = a + b₁x₁ + b₂x₂ + b₃x₃ + … + b₈x₈

A

multiple linear regression
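
For a feel of how the coefficients in such an equation are estimated, here is a small pure-Python sketch with two independent variables. It builds the least-squares normal equations and solves them by Gauss-Jordan elimination. The helper names `solve` and `fit_multiple` and the data are hypothetical; in practice a statistics package would do this step for you.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(rows, y):
    """Return [a, b1, b2, ...] minimising the squared prediction error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*x1 + 3*x2 with no noise.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
a, b1, b2 = fit_multiple(rows, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the data were generated without noise, the fit should recover an intercept close to 1 and slopes close to 2 and 3.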

30
Q

What method are simple and multiple linear regression based on?

A

OLS (Ordinary Least Squares)

31
Q

What problems do you face when doing multiple linear regression?

A

Problems similar to the ones you face when hiring people:

  1. Whom should you hire (which variables to include)
  2. How many people should be hired (how many variables to use)
  3. You want to be sure that you hire the right person (validity of each variable)
  4. You need to be aware of the relations among people (multicollinearity: several independent variables in the model are correlated with each other)
32
Q

What is multicollinearity?

A

A statistical concept where several independent variables in a model are correlated

33
Q

What is the problem with multicollinearity?

A

The independent variables are highly correlated with each other, which produces imprecise and confusing coefficient estimates (i.e., you don't know who is actually doing the work).
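
One simple way to spot the problem is to check how strongly the independent variables are correlated with each other. The sketch below does this for two hypothetical predictors and also computes the variance inflation factor (VIF), a common diagnostic that is not part of these cards and is brought in here as an assumption. With exactly two predictors, VIF = 1 / (1 − r²), where r is the correlation between them. The names and data are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    top = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return top / math.sqrt(sum((a - mean_x) ** 2 for a in x)
                           * sum((b - mean_y) ** 2 for b in y))


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two independent variables.

    Raises ZeroDivisionError if the predictors are perfectly correlated.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors that carry almost the same information.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0]
r12 = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(f"r = {r12:.3f}, VIF = {vif:.1f}")
```

A VIF far above 1 signals that the two coefficients cannot be estimated separately with much precision; cut-offs such as 5 or 10 are rules of thumb, not hard limits.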

34
Q

What three things should be in the report of a regression analysis?

A
  1. Discussion of the validity of each variable: its relevance to the dependent variable and current knowledge about it
  2. Goodness of fit (R²)
  3. Coefficients and their statistical significance
35
Q

What are the limitations of the regression method?

A
  1. Validity issues
  2. Nonlinear relation between the dependent and independent variables
  3. Multicollinearity (high correlation between the independent variables)
36
Q

When doing linear regression: what should you do if the relation between the dependent and independent variables is not linear?

A

Transform the variables (for example, by taking logarithms) so that the relation becomes linear.
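
As one possible illustration of such a transformation, the sketch below takes data that grow exponentially, applies a log transform to the dependent variable, and then fits an ordinary straight line. The exponential form, the function name `fit_line` and the data are assumptions chosen for the example; other relations call for other transformations.

```python
import math


def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b


# Hypothetical nonlinear data: y = 2 * exp(0.5 * x).
x = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * xi) for xi in x]

# Taking logs gives log(y) = log(2) + 0.5 * x, which is linear in x.
log_y = [math.log(yi) for yi in y]
a, b = fit_line(x, log_y)
print(f"slope = {b:.3f}, exp(intercept) = {math.exp(a):.3f}")
```

The fitted slope recovers the growth rate and the exponentiated intercept recovers the starting value, which a straight line fitted to the untransformed data could not do.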