Lecture 5-8 Flashcards

1
Q

What is the central limit theorem (CLT)?

A

As the sample size grows, the sampling distribution of the mean (the histogram of the means from many repeated samples) approaches a normal distribution. This holds even if the variable itself is skewed or not normally distributed, such as income.
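A minimal simulation sketch of the CLT, assuming a right-skewed, income-like variable modeled as an exponential distribution with mean 1. The distribution, sample sizes and function names are illustrative choices, not the lecturer's simulation. The code builds an empirical sampling distribution by drawing many repeated samples and recording each sample mean. It then compares the skewness of those means for a small and a large n. Skewness near 0 indicates a symmetric, normal-looking shape.

```python
import random
import statistics


def sample_means(n, repetitions, rng):
    """Draw `repetitions` samples of size n from a right-skewed exponential population
    (mean 1) and return each sample's mean: an empirical sampling distribution."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]


def skewness(values):
    """Population-style skewness: about 0 if symmetric, positive if right-skewed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(42)  # fixed seed so runs are reproducible
small = sample_means(n=2, repetitions=5000, rng=rng)
large = sample_means(n=100, repetitions=5000, rng=rng)

print("mean of sample means (n=100):", round(statistics.fmean(large), 3))
print("skewness of sample means, n=2:  ", round(skewness(small), 2))
print("skewness of sample means, n=100:", round(skewness(large), 2))
```

With n = 100, the skewness of the sample means should come out much closer to 0 than with n = 2. The mean of the sample means should also sit close to the population mean of 1.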

2
Q

What is a sampling distribution?

A

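Similar to a frequency distribution.
Instead of raw values, the sampling distribution charts or graphs the probability of getting each value of a sample statistic, such as the mean.
It is built from repeated samples, and larger sample sizes make it cluster more tightly around the population value.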

3
Q

What is the population parameter?

A

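Population-level statistics: values such as the mean or standard deviation that are calculated for the whole population rather than for a sample.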

4
Q

What does the law of probability say? What are the two types of probability? Examples?

A

Theoretical probability: how likely an event is to happen, in theory. E.g. flipping a coin gives a 1/2 or 50% chance of heads (or tails).
Probabilities are expressed from 0 to 1, or as a %.
Empirical probability: what actually happens in reality. E.g. if you get 6 heads in 6 coin flips, your empirical probability of heads is 100%.

5
Q

What is the Law of Large Numbers in Probability?

A

As the number of trials increases, the empirical probability converges to the theoretical probability.
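A small coin-flip sketch of the Law of Large Numbers, assuming a fair coin simulated with Python's `random` module. The function name `empirical_heads_rate` is hypothetical. As the number of flips grows, the gap between the empirical and theoretical probability typically shrinks. With only a few flips the gap can be large, as in the 6-heads-in-6-flips example.

```python
import random


def empirical_heads_rate(flips, rng):
    """Flip a fair coin `flips` times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


rng = random.Random(7)  # fixed seed so runs are reproducible
theoretical = 0.5
for flips in (6, 100, 10_000, 1_000_000):
    rate = empirical_heads_rate(flips, rng)
    print(f"{flips:>9} flips: empirical = {rate:.4f}, "
          f"gap from theoretical = {abs(rate - theoretical):.4f}")
```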

6
Q

How does the Law of Large Numbers apply in Sampling?

A

The larger the sample size (n), the more likely the sample statistic (e.g. the sample mean) is to be close to the actual population value.

7
Q

What is a sampling distribution?

A

Similar to a frequency distribution

8
Q

Review- what is inferential statistics?

A

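Using data from a sample to draw conclusions (inferences) about the population the sample came from, e.g. with significance tests and confidence intervals.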

9
Q

What is the standard error of the Mean? (shorthand Std. Error in SPSS)
What are the two factors that determine the Std. Error of the Mean?

A

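The Std. Error is a measurement of how far the sample mean is likely to be from the true population mean. It is the standard deviation of the sampling distribution of the mean.

1) n, the sample size
2) the variation (standard deviation) of the variable in the sample (e.g. the incomes reported by a sample)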
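A minimal sketch of the standard error of the mean, using the usual formula SE = s / √n. Here s is the sample standard deviation and n is the sample size. The function names and income figures below are hypothetical. The sketch shows both factors from the answer above: more variation raises the SE, and a larger n lowers it.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def standard_error_from_sd(sd, n):
    """Same formula when the standard deviation and n are already known."""
    return sd / math.sqrt(n)


# Hypothetical reported incomes (in thousands), for illustration only
incomes = [32, 45, 28, 60, 51, 39, 75, 42]
print("SE of mean income:", round(standard_error(incomes), 2))

# Same spread (sd = 20), four times the sample size: the SE is halved
print(standard_error_from_sd(20, 25))   # 4.0
print(standard_error_from_sd(20, 100))  # 2.0
```

Quadrupling n from 25 to 100 halves the standard error. This is the inverse relationship between n and the Std. Error.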

10
Q

What does the confidence interval mean?

A

Based on the sample mean and its standard error, a range of values that is likely to contain the true mean of the population, at a stated level of confidence (e.g. 95%).

11
Q

How to calculate the 95% confidence interval of a Mean?

A

By adding and subtracting 1.96 standard errors (not standard deviations) to and from the sample mean

95% CI = Sample Mean +/- 1.96 × (Std. Error)
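The formula above as a short Python sketch; the function names and numbers are hypothetical. The value 1.96 is the normal-approximation multiplier. For small samples, statistics software such as SPSS typically uses a slightly larger value taken from the t-distribution instead.

```python
import math
import statistics


def ci_95(mean, std_error, z=1.96):
    """95% confidence interval: mean +/- 1.96 * standard error (normal approximation)."""
    margin = z * std_error
    return mean - margin, mean + margin


def ci_95_from_sample(sample):
    """Compute the sample mean and its standard error, then the 95% CI."""
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return ci_95(mean, std_error)


# Worked example with hypothetical numbers: mean = 50, Std. Error = 2
low, high = ci_95(50, 2)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

For a sample mean of 50 with a standard error of 2, the interval is 50 +/- 3.92, i.e. 46.08 to 53.92.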

12
Q

In his Central Limit Theorem simulation, what did the lecturer do?

A

13
Q

What kind of relationship does the Sample size n have with the Std. Error of the Mean?

A

Inverse relationship: the bigger the n, the smaller the Std. Error of the mean.

14
Q

Why does the Std. Error of the Mean Matter?

A

The Std. Error of the mean lets you estimate a range that likely contains the true mean of the population (the confidence interval). The true mean itself cannot be calculated exactly from a sample.

15
Q

What does statistical significance actually mean? How is it determined?

A

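Whether a relationship found in the sample can be generalized to the population, rather than being a chance result of sampling.

It is determined by running significance tests.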

16
Q

Refresher: What is a research hypothesis? What is the difference between a research and a null hypothesis?

A

The research hypothesis is a theory or prediction that researchers come up with based on prior knowledge and evidence (e.g. that two variables are related).
The null hypothesis states the opposite: that there is no relationship. The two statements contradict one another, but both serve the same purpose, which is to test the research theory.

17
Q

Why do we want to disprove the null hypothesis? And how?

A

Based on Karl Popper’s philosophy.
We want to disprove or discredit a null hypothesis because DISproving there is No relationship between two variables is actually One way to establish there IS a relationship.

18
Q

What does falsification mean and why is it important?

A

Falsification means setting out to disprove a hypothesis: the process of repeated attempts to “disprove” or “discredit” it. It is the only way to “prove” or verify a hypothesis. A hypothesis that survives repeated attempts at falsification is supported.

19
Q

What does falsification protect us against?

A

Confirmation bias and one-sided evidence that only “supports” a hypothesis

20
Q

What are the two types of research hypothesis? What do they affect?

A

Non-directional: theorizes that there is a relationship, but not its direction.

Directional: theorizes both the relationship and the direction of the relationship. The type of

21
Q

What is a significance level and when is it used?

A

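A significance level is the cut-off that a p value must fall below for a result to count as significant. It is set before running a significance test. By convention in the social sciences it is .05, which corresponds to 95% confidence.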

22
Q

What are the two things we are most concerned about in testing a bivariate relationship?

A

Magnitude: the strength of the relationship
Reliability: the generalizability of the relationship (statistical significance)

23
Q

How is statistical significance related to the null hypothesis?

A

p < .05: reject the null hypothesis. There is a statistically significant relationship.
p >= .05: fail to reject the null hypothesis.

24
Q

What do Type i & ii errors mean?

A

Type i: concluding that a relationship between two variables is generalizable to the population when it isn't. Also called a false positive. At the conventional .05 level, the Type I error rate is 5%.
Type ii: concluding that there is no relationship when there is one. Also called a false negative.
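A simulation sketch of the 5% Type I error rate. It assumes a simple one-sample z-test with a known standard deviation, which is simpler than the tests used in class. That test was chosen only so the example needs nothing beyond Python's standard library; all names are hypothetical. Every simulated sample is drawn from a population where the null hypothesis is true, so every rejection is a false positive.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sided_p(sample, sigma=1.0):
    """Two-sided z-test p-value for H0: population mean = 0, assuming a known sigma."""
    z = fmean(sample) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(trials, n, alpha, rng):
    """Share of trials that reject H0 even though H0 is true (false positives)."""
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean really is 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            rejections += 1
    return rejections / trials


rng = random.Random(123)  # fixed seed so runs are reproducible
print("false positive rate:", type_i_error_rate(trials=5000, n=30, alpha=0.05, rng=rng))
```

The printed rejection rate should land close to 0.05, matching the significance level.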

25
What do Type i & ii errors mean?
Type i: We are saying there is a relationship in the population when there isn't. False positive. 5% error rate in social sciences. Type ii: We are saying there is no relationship when there is. False negative.
26
What is p value?
Short for probability value: the measure of statistical significance. A threshold value (the significance level) determines whether a relationship between two variables is generalizable to the population, or just a random occurrence within the sample.
27
What is the p value cut-off in the social sciences for rejecting the null hypothesis? In other disciplines?
In the social sciences, a relationship or a difference is significant at 95 percent confidence. This is written in REVERSE as a p value, since p represents the probability of a Type I error occurring: p < .05, i.e. p smaller than 5%. Biomedical disciplines (e.g. chemotherapy treatment) use a stricter cut-off, e.g. 99.5% confidence, i.e. p < .005.
28
What are the main controversies and problems of statistical significance tests?
Statistical significance tests: 1. do not measure the strength of a relationship; 2. have a built-in Type I error rate (5% at p < .05); 3. depend on sample size.
29
What is a chi-square test and when is it used? What is an example?
A significance test for categorical variables. It compares the observed frequencies in a crosstab with the frequencies expected under a "no relationship" scenario. This determines whether there IS a relationship between the categorical variables, i.e. it distinguishes between "no relationship" and "yes relationship". E.g. the lecturer used a very simplified example: whether there is a relationship between sex and employment rate. If p is less than .05, we reject the null hypothesis and conclude there is a relationship.
30
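What is the degree of freedom?
Used in the chi-square test: how many "unconstrained" or "free" cells there are in a crosstab once the row and column totals are fixed. df = (number of rows - 1) × (number of columns - 1).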
31
What is the relationship between degrees of freedom and chi-square?
The shape of the chi-square distribution depends on the degrees of freedom.
32
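When can chi-square NOT be applied?
When the data are too sparse. As a rule of thumb, chi-square should not be used when expected cell frequencies in the crosstab are less than 5.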
33
What is a chi-square distribution?
....
34
What is the chi-square formula?
χ² = Σ (Fo - Fe)² / Fe, summed over every cell, where Fo = observed frequency and Fe = expected frequency
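A minimal Python sketch of the Pearson chi-square formula above, assuming a crosstab given as a list of rows of observed counts. The function names and counts are hypothetical. Expected frequencies are computed as row total × column total / grand total, and df = (rows - 1) × (columns - 1). The p-value helper only covers df = 1 (a 2 × 2 table), where `math.erfc` gives the exact p-value. Other df values need a chi-square table or statistics software.

```python
import math


def chi_square(observed):
    """Pearson chi-square and degrees of freedom for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic; valid ONLY for df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical crosstab: rows = sex, columns = employed / not employed
table = [[10, 20],
         [30, 40]]
chi2, df = chi_square(table)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value_df1(chi2):.3f}")
```

For this hypothetical table, chi-square is about 0.79 with df = 1. The p value is well above .05, so we would fail to reject the null hypothesis.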
35
What are two ways to state a null or research hypothesis?
In words, or as a math formula, e.g. the null hypothesis Fo = Fe (observed frequencies equal expected frequencies).
36
What do we focus on in this class in terms of chi-square?
Pearson chi-square
37
What is the default significance cut-off level? What do we compare to?
.05. We compare the p value to it.
38
How do you interpret the chi square tests?
State the p value and compare it to the significance cut-off level. If it is less than .05, there is a statistically significant relationship between the two variables, and we REJECT the null hypothesis.
39
Can we interpret the directionality and strength of a relationship based on chi-square results?
No.
40
What logic is lambda based on?
Error reduction (proportional reduction in error). Lambda calculates whether knowing the category of one variable (the independent variable) helps reduce the errors made in guessing the other (the dependent variable). If it does help, there is a relationship.
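A minimal sketch of lambda's error-reduction logic. It assumes the dependent variable is in the rows and the independent variable is in the columns; this orientation, the function name and the counts are hypothetical. E1 counts the errors made by always guessing the dependent variable's most common category. E2 counts the errors made when guessing the most common category separately within each category of the independent variable. Lambda = (E1 - E2) / E1.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the dependent variable (rows) from the independent
    variable (columns); table[r][c] holds the observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    # E1: errors from always guessing the DV's most common category, ignoring the IV
    e1 = n - max(row_totals)
    # E2: errors from guessing the most common DV category within each IV category
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    return (e1 - e2) / e1  # proportional reduction in error


# Hypothetical counts: rows = DV categories, columns = IV categories
table = [[30, 10],
         [10, 50]]
print(goodman_kruskal_lambda(table))  # 0.5
```

In this example table, knowing the independent variable cuts the guessing errors from 40 to 20. That gives lambda = 0.5, a 50% reduction in error.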
41
What MoA (measures of association) are used for nominal-level variables? What is the limitation of these measures?
Phi, Cramer's V and Lambda are measures of association for nominal variables. Nominal categories have arbitrary values with no order, so these MoA can't show the direction of the relationship between the variables.
42
What MoA are used for ordinal-level variables?
Gamma, Tau-b, Tau-c, Somers' d, Spearman's rho
43
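Which MoA are based on pairs? What does it mean to be "tied on pairs"?
Gamma, Tau-b, Tau-c and Somers' d compare pairs of cases on the two variables. A pair is "tied" when both cases have the same value on a variable, so the pair cannot be ranked on it.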
44
What is a similar pair?
A pair of cases ranked in the same order on both variables: the case that is higher on one variable is also higher on the other. Rule of thumb: the two cells cannot be in the same column or row.
45
What does it mean for two values to be "tied on the same independent/dependent variable"?
Both values have the SAME attribute on that variable, e.g. the same income or age. In a crosstab they are found in the same row or column.
46
How many cells can a cell form similar pairs with?
There is no fixed limit. Again, a similar pair means that both cells share the same direction of relationship with the variables. They don't have to share the same "strength of relationship".
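A sketch of how similar pairs are counted and then used in gamma, one of the pair-based ordinal measures. It assumes a crosstab whose rows and columns are both ordered from low to high; the function names and counts are hypothetical. Each cell pairs with every cell below and to its right (similar pairs). It also pairs with every cell below and to its left (dissimilar pairs). Cells in the same row or column are tied and are left out. Gamma = (similar - dissimilar) / (similar + dissimilar).

```python
def similar_and_dissimilar_pairs(table):
    """Count similar and dissimilar pairs in a crosstab whose rows and columns are
    both ordered from low to high. Tied pairs (same row or column) are not counted."""
    rows, cols = len(table), len(table[0])
    similar = dissimilar = 0
    for i in range(rows):
        for j in range(cols):
            # cells below and to the right are higher on BOTH variables
            similar += table[i][j] * sum(
                table[k][m] for k in range(i + 1, rows) for m in range(j + 1, cols))
            # cells below and to the left are higher on one variable, lower on the other
            dissimilar += table[i][j] * sum(
                table[k][m] for k in range(i + 1, rows) for m in range(j))
    return similar, dissimilar


def gamma(table):
    """Goodman and Kruskal's gamma from the pair counts."""
    ns, nd = similar_and_dissimilar_pairs(table)
    return (ns - nd) / (ns + nd)


# Hypothetical ordered crosstab: rows and columns both run low -> high
table = [[30, 10],
         [10, 50]]
print(similar_and_dissimilar_pairs(table))  # (1500, 100)
print(gamma(table))  # 0.875
```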
47
When is the MoA Spearman's rho used?
When continuous ordinal variables are involved, i.e. ordinal variables with many ranked values, such as a scale of 1-10.
48
When are correlation, scatterplots and regression used?
Only with two ratio-level variables.
49
What is a null hypothesis again?
The underlying assumption stating that there is NO association whatsoever between two variables. A study result that fails to reject the null hypothesis should still be treated as EQUALLY important.
50
What are the tests and concepts of inferential statistics?
Tests run: correlations, scatterplots, regression testing | Concepts: p value, MoA, Type I & II errors
51
What is the significance cut-off for Pearson's correlation values?
p < .05. P values are never negative; the +/- sign applies to r itself, which ranges from -1 to +1.
52
What are the most important take-aways from this course?
- Overarching concepts
- Differences between descriptive and inferential statistics
- Types of descriptive statistics
- Types of research questions and the methodologies suited to each type of analysis
- Logic used in the analyses and tests
53
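What is the line of best fit and when is it NOT appropriate?
The straight line that best summarizes the relationship between two variables in a scatterplot. It is appropriate only when the relationship is linear. When the relationship is "curvilinear" (the most common type of relationship, though), the line of best fit is NOT appropriate.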
54
Examples of curvilinear relationships?
Income and happiness: happiness rises with income, but there is a "slow down curve" after income reaches 80,000.
55
What is a Type I error in a correlation? Can it be eliminated?
A false positive: concluding that a correlation exists in the population when it doesn't. No, it cannot really be eliminated.
56
What is a spurious relationship in stats?
Two variables that seem causally or strongly related, but really are not. This often happens because both are driven by a third variable.
57
When is the line of best fit required to be plotted?
For linear relationships, in Pearson's r correlations and scatterplots.
58
Example of two variables that can be plotted on a scatterplot?
Ratings of personality and of physical appearance
59
Can MoA and measures of significance determine causality?
No. MoA describe a relationship, and significance tests only tell us whether it generalizes to the population; neither establishes cause. The nature of ordinal and nominal variables offers too little information for causal claims. Even when an MoA determines the strength of a relationship, it cannot determine causality.
60
Criteria for causal claims? Which one is the hardest to satisfy?
1. The cause comes BEFORE the effect (not hard)
2. There is a factual or empirical relationship (not hard: significance tests and MoA can determine this)
3. The relationship can NOT be explained by other variables: often IMPOSSIBLE to satisfy, so this is the hardest
61
What are examples of descriptive stat tests?
Measures of association, central tendencies, standard deviations
62
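What does a regression calculation create? What does OLS stand for? What does it do?
OLS stands for Ordinary Least Squares. OLS regression creates a LINEAR EQUATION for the set of data, which then gives the line of best fit. This is the line that minimizes the sum of squared differences between the observed and predicted values of the dependent variable.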
63
Is it statistically significant when p is greater than .05, or .995?
No. p must be smaller than .05, i.e. our confidence that the relationship is not due to chance must be greater than 95%.
64
What is the connection between a p value and a Type I error/false positive?
The p value tells you the chance of making a Type I error (a false positive) if you reject the null hypothesis.
65
A false positive of what?
Of concluding that there is an association between two variables in the population when there isn't one.
66
How do we run a significance test for Pearson's r / correlations?
Determine the p value of Pearson's r using the t-distribution: t = r × √(n - 2) / √(1 - r²), with n - 2 degrees of freedom.
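A standard-library sketch of Pearson's r and the t statistic used to test it. The function names and the five data points are hypothetical. Turning t into an exact p value requires the t-distribution, which Python's standard library does not provide, so the sketch stops at t.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: no correlation in the population (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical data: two ratio-level variables measured on five cases
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_for_r(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(x) - 2}")
```

Here r is about 0.77 and t is about 2.12, with df = 3. The two-tailed .05 critical value of t for df = 3 is about 3.18, so p > .05. The result is a fairly strong correlation that is not statistically significant with so few cases.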
67
How do we interpret p<0.05?
There is a stat. sigf. relationship
68
Does it automatically mean there is a "strong relationship" between two variables?
No.
69
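What is the OLS linear equation? What does each symbol stand for?
```
y = a + bx
x = independent variable (IV)
y = dependent variable (DV)
a = baseline or constant (the value of y when x = 0)
b = slope / regression coefficient
e.g. y = 0.5x (here a = 0 and b = 0.5)
```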
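A minimal sketch of how OLS finds a and b for one independent variable. It uses the standard least-squares formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The function name and the education/income data are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares with one independent variable: returns (a, b) for y = a + bx."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    # b: slope / regression coefficient
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    # a: constant, chosen so the line passes through (mean_x, mean_y)
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: years of education (x, IV) and income in thousands (y, DV)
education = [10, 12, 12, 14, 16, 18]
income = [28, 33, 31, 40, 45, 52]
a, b = ols_fit(education, income)
print(f"y = {a:.2f} + {b:.2f}x")
```

Here b is the unstandardized coefficient: the predicted change in y for each one-unit increase in x.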
70
Can OLS regression be plotted as Pearson correlation as well?
..
71
Do you also run a sigf. test with regression tests? Why?
Yes, always. That's the first step in establishing that a relationship even exists, i.e. rejecting the null hypothesis.
72
In OLS, how do you interpret the unstandardized B coefficient in relation to the independent variable?
In units of 1: for every one-unit increase in the independent variable, the dependent variable changes by B units.
73
Can you do inferential stats without a ratio-level variable?
Yes
74
Can you include a binary variable (male vs. female) in OLS?
Yes, if it is coded as a dummy variable whose value is either 0 or 1.
75
What significance tests are appropriate for each level of variable?
Nominal and ordinal variables: chi-square
One ratio and one nominal/ordinal: ANOVA
Both ratio: correlation
76
What are important limitations of significance tests pointed out by critics?
- Five percent of the time, they produce a Type I error (false positive)
- They do not tell how strong the relationship is
- Testing larger sample sizes is more likely to produce significance