Statistics Flashcards

1
Q

Can you ever be sure about disproving a hypothesis?

A

No, you can never be completely sure. However, you can become arbitrarily confident if the results are statistically significant at a strict enough threshold.

2
Q

What does it mean for a result to be statistically significant?

A

A result is called statistically significant if it would be unlikely to occur by chance alone when the null hypothesis is true. Conventionally this means that the p-value is less than 0.05 (5%), although a different threshold can be chosen.

3
Q

What is the p-value?

A

The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
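
As a minimal sketch of what a p-value looks like in practice, the example below assumes a hypothetical coin-flip experiment (20 flips, 15 heads) with the null hypothesis that the coin is fair. The function name `binomial_p_value` and the data are illustrative assumptions, not part of the original card.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical experiment: 15 heads out of 20 flips, null hypothesis = fair coin.
p = binomial_p_value(15, 20)
print(f"p-value = {p:.4f}")
print("reject the null at 0.05" if p < 0.05 else "fail to reject the null at 0.05")
```

The sum runs over every outcome at least as extreme as the one observed, which is what separates a p-value from the probability of the single observed result.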

4
Q

What is a null hypothesis?

A

A null hypothesis is the default assumption that is taken to be true and that the test tries to reject. In practice it usually states that both data sets come from the same mechanism, whereas the alternative hypothesis, the one we are looking for evidence of, states that they are different.

5
Q

How to prove/disprove something with stats

A
  • It is not possible to prove/disprove something with stats
  • You can only reject the null hypothesis when the data show a statistically significant effect
  • Otherwise the test “didn’t find a statistically significant difference” and “fails to reject the null hypothesis”
6
Q

What is a research question?

A

A statement that identifies a phenomenon to be studied.
Ex: I believe that rewards improve memorization skills

7
Q

What is a hypothesis?

A
  • A statement of the predicted relationship between at least two experimental variables.
  • A provisional answer to a research question
  • Ex: the group rewarded with chocolate will have a higher memorisation score than the group with no reward
8
Q

Independent vs dependent variable

A

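The independent variable is the one the experimenter deliberately changes. The dependent variable is the outcome being studied, which is measured and expected to change whenever the independent variable is altered.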

9
Q

What is a controlled variable?

A

The variables that are **kept constant** so that they cannot influence the effect of the independent variable on the dependent variable. Ideally everything besides the dependent and independent variables is controlled.

10
Q

What is a confounding variable?

A

An extraneous variable that correlates with both the dependent variable and the independent variable.
Example: Weather temperature correlates with both ice-cream sales and murders.

11
Q

The goal of experimental design

A

Experimental design aims to maximize your chances of finding the signal rather than the noise (noise being randomness, confounding variables, etc., which may show correlation but not causation).

12
Q

Within vs. between subjects

A

Within = Every participant does every condition (everyone does A and B)
Between = Each participant does only one condition (some people do A, other people do B)

13
Q

Comparison of within vs. between experiments

A

Within pros:
+ Less variation from individual differences (the same people do every condition)
+ Statistical power with fewer participants

Between pros:
+ No biases from other conditions (e.g. transfer of learning from doing A before B)

14
Q

What is counterbalancing?

A

A method of avoiding confounding among variables:
presenting the conditions in a different order to different participants.

15
Q

How is a Latin square used for counterbalancing?

A

A Latin square is an n × n array filled with n different Latin letters, each occurring exactly once in each row and exactly once in each column, where each letter corresponds to a treatment/condition. Each row gives the order of conditions for one participant (or group). Varying the order in this way reduces confounding from order effects and transfer of learning.
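
A minimal sketch of building such a square by cyclic shifting is shown below. The function name `latin_square` and the condition labels are illustrative assumptions. Each row is read as the order of conditions for one participant.

```python
def latin_square(conditions):
    """Build an n x n Latin square by cyclically shifting the condition list."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical conditions A, B, C; row i is the order used by participant i.
orders = latin_square(["A", "B", "C"])
for participant, order in enumerate(orders, start=1):
    print(f"participant {participant}: {' -> '.join(order)}")
```

This cyclic construction guarantees that every condition appears once in every position, but it does not balance which condition immediately precedes which. A balanced Latin square would be needed for that.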

16
Q

Sweet spot of number of trials.

A

Ideally as many trials as possible but 30 - 40 is the sweet spot.

17
Q

What does a t-test measure?

A

The t-statistic tells us how many standard errors our observed difference is away from the value expected under the null hypothesis (usually zero).
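
As a rough illustration, the sketch below computes a t-statistic for two independent groups using the unequal-variance (Welch) form, which is one common variant and an assumption here. The function name `welch_t` and both samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t = (difference of means) / (standard error of that difference)."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical memorisation scores for a reward group and a no-reward group.
reward = [12, 14, 15, 13, 16]
no_reward = [10, 11, 12, 9, 13]
print(f"t = {welch_t(reward, no_reward):.2f}")
```

The p-value is then read from the t-distribution for the appropriate degrees of freedom, a step this sketch leaves out.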

18
Q

What is the Bonferroni correction?

A

To reduce type I error when testing n hypotheses, test each one against 0.05/n. This is because when conducting n tests, the chance of at least one false positive grows to roughly n times the per-test level (at most 0.05 * n).
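
The effect of the correction can be sketched numerically. The example assumes the n tests are independent, which is a simplifying assumption, and the function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(per_test_alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests


n = 10  # hypothetical number of hypotheses tested
corrected = bonferroni_threshold(0.05, n)
print(f"uncorrected family-wise error: {familywise_error_rate(0.05, n):.3f}")
print(f"corrected threshold per test:  {corrected:.4f}")
print(f"corrected family-wise error:   {familywise_error_rate(corrected, n):.3f}")
```

With the corrected threshold, the family-wise error rate stays just under the original 0.05 level.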

19
Q

Use of a t-test

A

Comparing the means of two groups

20
Q

What is an ANOVA test used for?

A

ANOVA stands for analysis of variance and is used to compare the means of more than two groups. Often an ANOVA shows that there is a significant difference somewhere, and follow-up t-tests show where the difference is.
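
A minimal sketch of the one-way ANOVA F-statistic follows, assuming three hypothetical groups. The function name `one_way_anova_f` and the data are illustrative only. F compares the variance between the group means with the variance within the groups.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores under three conditions.
f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F = {f_stat:.2f}")
```

A large F only says that at least one group mean differs, which is why follow-up pairwise tests are needed to locate the difference.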

21
Q

What is regression?

A

A statistical technique (also used in machine learning) for determining the relationship between two or more variables, where a change in a dependent variable is associated with a change in one or more independent variables. In the simplest case it amounts to drawing the line that minimises the overall (usually squared) distance to the data points.

22
Q

Ways to determine the goodness of fit for a regression

A
  1. Standard error
  2. R squared
23
Q

How to calculate standard error

A

It reflects the typical distance between the observed values and the regression line. Lower is better.

24
Q

How to calculate r squared

A

Reported as a percentage: the share of the variance in the dependent variable that the model explains. Higher is better.
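
Both goodness-of-fit measures can be sketched for a simple one-predictor least-squares line. This assumes "standard error" means the standard error of the regression (residual standard error, with n − 2 degrees of freedom). The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def goodness_of_fit(xs, ys):
    """Return (standard error of the regression, R squared)."""
    slope, intercept = fit_line(xs, ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    y_bar = mean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    std_error = sqrt(sse / (len(xs) - 2))  # assumption: n - 2 degrees of freedom
    r_squared = 1 - sse / sst
    return std_error, r_squared


# Hypothetical data points.
xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
std_error, r_squared = goodness_of_fit(xs, ys)
print(f"standard error = {std_error:.3f}, R squared = {r_squared:.0%}")
```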

25
Q

What is a Chi-square test used for?

A

Testing for an association when both the independent variable and the dependent variable are categorical. It can handle multiple variables if needed.

26
Q

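Chi-square formula

A

χ² = Σ (O − E)² / E, summed over every cell, where O is the observed count and E is the expected count. Then use a table to find p from χ² and the degrees of freedom.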

27
Q

Chi-square degrees of freedom (contingency table)

A

(number of rows − 1) ∗ (number of columns − 1)

28
Q

How to work through a Chi-square contingency table

A
  1. Compute the row totals, the column totals, and the grand total
  2. For each cell, compute the expected count by multiplying that cell's row and column totals and dividing by the total sample size
    e.g. cell (sport, male) = (29 * 50) / 75, cell (family, male) = (46 * 50) / 75 …
  3. Use the Chi-square formula
  4. Calculate the degrees of freedom as DF = (number of rows − 1) ∗ (number of columns − 1) (here = 1)
  5. Use the Chi-square table to conclude!
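
The steps above can be sketched as follows. The row totals (29 and 46), the male column total (50), and the sample size (75) come from the card. The individual observed counts are hypothetical values chosen only to match those totals, and the function names are illustrative.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Rows: sport, family. Columns: male, female. Cell counts are hypothetical.
observed = [[24, 5], [26, 20]]
statistic, dof = chi_square(observed)
print(f"chi-square = {statistic:.3f}, DF = {dof}")
# For DF = 1 the tabulated critical value at the 0.05 level is 3.841.
print("significant at 0.05" if statistic > 3.841 else "not significant at 0.05")
```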