Statistics I Flashcards

1
Q

Difference between an observational study and a survey

A

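A survey collects data by requesting information from the subjects (for example, by asking them questions); an observational study collects data by watching and recording the subjects, without asking them or intervening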

2
Q

Difference between binomial and normal distribution

A

Binomial: the variable counts the number of successes in a fixed number of trials

Normal: the variable takes on values that follow the “bell-shaped curve”

3
Q

What is the t-distribution

A

Used when the variable is based on sample averages and you have limited data (a small sample, with the population standard deviation unknown)

4
Q

What is correlation

A

The strength and direction of the linear relationship between x and y

5
Q

Census vs. sample

A

A census collects data from the entire population; a sample collects data from only part of it

6
Q

Mean, median, mode

A

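Mean: the average (sum of the values divided by the number of values)

Median: the middle value, with an equal number of data points above and below it (for an even number of points, the average of the two middle values)

Mode: the value that occurs most often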
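To check these definitions on a small example, here is a sketch using Python's standard `statistics` module; the data values are made up for illustration.

```python
import statistics

# Hypothetical data set
data = [2, 4, 4, 5, 7, 9]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # even count, so average of the two middle values (4 and 5)
mode = statistics.mode(data)      # the value that occurs most often

print(mean, median, mode)
```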

7
Q

Standard deviation equation

A

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

where $\bar{x}$ = sample mean, n = sample size
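A short Python sketch of the calculation, assuming the card means the sample standard deviation (squared deviations from the mean, divided by n − 1). The data set is hypothetical; `statistics.stdev` uses the same n − 1 formula, so it serves as a cross-check.

```python
import math
import statistics

# Hypothetical sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
x_bar = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
# Standard deviation is the square root of the variance
s = math.sqrt(variance)

# The statistics module uses the same n - 1 formula
assert math.isclose(s, statistics.stdev(sample))
print(s)
```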

8
Q

What is the empirical rule

A

68 / 95 / 99.7 rule, for a normal (bell-shaped) distribution:

About 68% of the data lies within 1 standard deviation of the mean

About 95% of the data lies within 2 standard deviations of the mean

About 99.7% of the data lies within 3 standard deviations of the mean
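The 68 / 95 / 99.7 figures can be recovered from the standard normal curve. This sketch uses `statistics.NormalDist` (Python 3.8+) and computes the area between −k and +k standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
```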

9
Q

Distribution of z score

A
10
Q

Central Limit Theorem

A

Gives you the ability to measure how much your sample mean will vary, without having to take other samples to compare it with

Gives you the ability to use confidence intervals and hypothesis tests

Basically, if you keep taking samples of a fixed size, the distribution of the sample means will be approximately normal, even if the population is not. The larger the sample size, the closer to normal the distribution is.
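A small simulation makes the idea concrete. This sketch assumes an exponential population (mean 1, standard deviation 1), which is clearly not normal, draws 5,000 samples of size 30, and checks that the sample means center on 1 with a spread close to σ/√n and roughly 68% of them within one standard error. The population, sample size, and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def draw_sample(size):
    """One sample from a skewed (exponential) population with mean 1 and SD 1."""
    return [random.expovariate(1.0) for _ in range(size)]

sample_size = 30
sample_means = []
for _ in range(5000):
    sample = draw_sample(sample_size)
    sample_means.append(sum(sample) / len(sample))

# CLT prediction for the spread of the sample means: sigma / sqrt(n)
predicted_se = 1.0 / math.sqrt(sample_size)

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
# For a normal shape, about 68% of the means should fall within 1 SE of the population mean
share_within_1se = sum(abs(m - 1.0) < predicted_se for m in sample_means) / len(sample_means)

print(round(center, 3), round(spread, 3), round(predicted_se, 3), round(share_within_1se, 3))
```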

11
Q

What is the basic definition of a z distribution

A

The standard normal distribution: mean = 0, std dev = 1

12
Q

Blind vs. double-blind

A

Blind: the participants don’t know which treatment they are getting

Double-blind: neither the participants nor the administrators (researchers) know

13
Q

Margin of error

A

Measures how much the sample results are expected to differ from the true population value, at a stated confidence level (for example, 95%).

It is usually reported with a confidence interval: the interval is the sample result plus or minus the margin of error

14
Q

Confidence Interval

A

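A range of values (the sample statistic plus or minus a margin of error) that is likely to contain the population parameter, such as the mean. The confidence level (e.g., 95%) is the percentage of such intervals, over many repeated samples, that would capture the true parameter.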

15
Q

Hypothesis test

A

A procedure that uses data collected from a sample to evaluate a claim about a population parameter

16
Q

p-value

A

The probability, assuming the null hypothesis is true, of getting results at least as extreme as those observed. It measures the strength of the evidence against the null hypothesis.

The null hypothesis is the claim that’s on trial.

The alternative hypothesis is the one you would believe if the null hypothesis were false.

Using a 0.05 significance level:

p-value < 0.05 indicates strong evidence against the null hypothesis, so reject it

p-value > 0.05 indicates weak evidence against the null hypothesis, so you fail to reject it

p-value = 0.05 is borderline and could go either way
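A sketch of turning a test statistic into a p-value and a decision, assuming a z test statistic, a two-sided alternative, and a 0.05 significance level; the z values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z_stat):
    """P(|Z| >= |z_stat|) for a standard normal Z, i.e. assuming H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

alpha = 0.05  # significance level
for z_stat in (1.0, 2.5):  # hypothetical test statistics
    p = two_sided_p_value(z_stat)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z_stat}: p = {p:.4f}, {decision}")
```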

17
Q

What is the relationship between mean, median, and skew

A

If the mean is larger than the median, the data are usually skewed right

If the mean is smaller than the median, the data are usually skewed left

Skewed right has a tail off to the right

Skewed left has a tail off to the left
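A tiny hypothetical example of the rule: one large value drags the mean above the median, and mirroring the data flips it. The rule holds for typical skewed data like this, but it is a rule of thumb rather than a guarantee for every data set.

```python
import statistics

# Hypothetical right-skewed data: one large value stretches the right tail
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data: a long left tail
left_skewed = [-x for x in right_skewed]

print(statistics.mean(right_skewed), statistics.median(right_skewed))  # mean > median
print(statistics.mean(left_skewed), statistics.median(left_skewed))    # mean < median
```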

18
Q

What is the definition of a percentile

A

The kth percentile is the value with k percent of the data below it (and the rest above it). Percentiles don’t require a continuous distribution; they work for discrete counting data too

19
Q

What is the “five number summary” of a dataset

A

[minimum, 25th percentile (first quartile, Q1), median (50th percentile), 75th percentile (third quartile, Q3), maximum]

Interquartile range (IQR) = Q3 − Q1
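The five-number summary and IQR can be computed with `statistics.quantiles` (Python 3.8+). The data are hypothetical. Calculators and textbooks use slightly different quartile conventions, so Q1 and Q3 can differ a little from a hand calculation; the default `method='exclusive'` is used here.

```python
import statistics

# Hypothetical data set (order doesn't matter; quantiles sorts internally)
data = [7, 1, 3, 9, 5, 2, 8, 4, 6]

q1, median, q3 = statistics.quantiles(data, n=4)  # the three cut points: Q1, median, Q3
summary = [min(data), q1, median, q3, max(data)]
iqr = q3 - q1

print(summary, iqr)
```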

20
Q

Box plot

A

A graph of the five-number summary: a box from Q1 to Q3 with a line at the median, and whiskers extending toward the minimum and maximum

21
Q

What are the characteristics of a binomial distribution

A
  1. fixed number of trials
  2. each trial is either a success or failure
  3. there is a probability of success that is constant for each trial
  4. trials are independent (the outcome of one doesn’t influence others)
22
Q

Equation for determining the probability of a certain number of desirable outcomes in a binomial distribution

A

$b(x; n, P) = \binom{n}{x} P^{x} (1 - P)^{n - x}$

Where:
b = binomial probability
x = total number of “successes” (pass or fail, heads or tails etc.)
P = probability of a success on an individual trial
n = number of trials

“A coin is tossed 10 times. What is the probability of getting exactly 6 heads?”
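The coin example worked in Python, using the binomial formula b(x; n, P) = nCx · P^x · (1 − P)^(n − x) with `math.comb` (Python 3.8+) for nCx. The helper name `binomial_probability` is just for illustration.

```python
import math

def binomial_probability(x, n, p):
    """b(x; n, P): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# A coin is tossed 10 times. What is the probability of getting exactly 6 heads?
prob = binomial_probability(6, 10, 0.5)
print(prob)  # equals 210 / 1024
```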

23
Q

Combination and choose notation

A

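“n choose r”: the number of ways to choose r items from n, when order doesn’t matter

$\binom{n}{r} = {}_{n}C_{r} = \dfrac{n!}{r!\,(n - r)!}$

A coin is tossed 10 times, what is the probability of getting exactly 6 heads? The number of ways to place the 6 heads among the 10 tosses is 10 choose 6.

10C6 is the notation for the formula

It is also written as 10 over 6 in parentheses, $\binom{10}{6}$, which is another form of the notation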
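The factorial formula for n choose r, n! / (r! (n − r)!), as a small Python sketch; `choose` is an illustrative helper, and `math.comb` (Python 3.8+) is the built-in equivalent used as a cross-check.

```python
import math

def choose(n, r):
    """n choose r = n! / (r! (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(choose(10, 6))  # ways to place 6 heads among 10 tosses
# The standard library has the same function built in
assert choose(10, 6) == math.comb(10, 6)
```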

24
Q

Relationship between variance and standard deviation

A

Standard deviation is the square root of variance

25
Q

Equations for mean and variance of binomial distributions

A

The mean of X (number of favorable occurrences) is

$\mu = np$

The variance of X is

$\sigma^2 = np(1 - p)$
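A sketch that checks μ = np and σ² = np(1 − p) for the 10-toss coin example by computing the mean and variance directly from all eleven binomial probabilities.

```python
import math

n, p = 10, 0.5  # the 10-coin-toss example

mu = n * p               # mean number of successes
var = n * p * (1 - p)    # variance
sd = math.sqrt(var)      # standard deviation

# Cross-check against the definitions, using every possible outcome x = 0..n
pmf = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
mean_check = sum(x * px for x, px in enumerate(pmf))
var_check = sum((x - mean_check) ** 2 * px for x, px in enumerate(pmf))

print(mu, var, sd, mean_check, var_check)
```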

26
Q

For a normal distribution, what is significant about the two inflection points of the curve

A

The two inflection points are at μ − σ and μ + σ, exactly 1 standard deviation below and above the mean

27
Q

Equation for z score

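A

$z = \dfrac{x - \mu}{\sigma}$

where x is the data value, μ is the mean, and σ is the standard deviation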
28
Q

How to find the corresponding x value when given a percentile

A

Find the z value that corresponds to the percentile (z table), then solve the z score equation for x: $x = \mu + z\sigma$

(TI-84: DISTR > invNorm())
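The same steps in Python, assuming a hypothetical normal distribution with mean 70 and standard deviation 10: `NormalDist().inv_cdf` plays the role of the z table, and `NormalDist(mu, sigma).inv_cdf` does it in one step, much like `invNorm(area, μ, σ)` on the TI-84.

```python
from statistics import NormalDist

# Hypothetical example: scores with mean 70 and standard deviation 10
mu, sigma = 70, 10
percentile = 0.90  # 90th percentile

z = NormalDist().inv_cdf(percentile)  # z value for the percentile (about 1.28)
x = mu + z * sigma                    # solve z = (x - mu) / sigma for x

# Same answer in one step
assert abs(x - NormalDist(mu, sigma).inv_cdf(percentile)) < 1e-9
print(round(x, 2))
```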

29
Q

What to do if the standard binomial equations fail you (numbers too high for the factorials)

A

Approximate it with a normal distribution

The following conditions must be true:

n * p >= 10

n * (1-p) >= 10

You will need to calculate the mean, $\mu = np$, and the standard deviation, $\sigma = \sqrt{np(1 - p)}$, to get the z score, then find the percentile

(TI-84: DISTR > normalcdf())
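A sketch comparing the exact binomial answer with the normal approximation for a hypothetical case (100 fair coin tosses, at most 55 heads), where the exact sum is still easy for a computer. It also shows the continuity correction (using k + 0.5), a common refinement that is not on the card.

```python
import math
from statistics import NormalDist

# Hypothetical example: 100 coin tosses, probability of at most 55 heads
n, p, k = 100, 0.5, 55
assert n * p >= 10 and n * (1 - p) >= 10  # conditions from the card

# Exact binomial answer: add up P(X = x) for x = 0..k
exact = sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Normal approximation: mean np, standard deviation sqrt(np(1 - p))
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
z = (k - mu) / sigma
approx = NormalDist().cdf(z)  # percentile for that z score

# Common refinement (not on the card): continuity correction, use k + 0.5
approx_cc = NormalDist().cdf((k + 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4), round(approx_cc, 4))
```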

30
Q

CDF vs. PDF

A

CDF: cumulative distribution function (eventually rises up to 1)

PDF: probability density function (gives the probability at a single value; it doesn’t rise up to 1, like the normal curve)

Basically, the CDF is good for a range of occurrences. Instead of a specific number of successes (e.g., exactly 3), it gives you the probability of 0 to x successes in n trials. In other words, if you put x = 3, it gives you the probability of 0, 1, 2, or 3 successes (all together).
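A sketch of the difference for a binomial with n = 10 and p = 0.5. For a discrete distribution like the binomial, the "PDF" is strictly a probability mass function, though the TI-84 calls it `binompdf`; the helper names below are illustrative.

```python
import math

def binom_pdf(x, n, p):
    """P(X = x): exactly x successes."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): 0, 1, ..., x successes added together."""
    return sum(binom_pdf(k, n, p) for k in range(x + 1))

n, p = 10, 0.5
print(binom_pdf(3, n, p))  # exactly 3 heads: 120/1024
print(binom_cdf(3, n, p))  # 0 to 3 heads: 176/1024
print(binom_cdf(n, n, p))  # the CDF reaches 1 at the largest possible value
```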

31
Q

Basics of the t distribution

A

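Lower peak and fatter tails than the z distribution; it gets taller and skinnier (closer to the z distribution) as the sample size grows

Used when you only have a sample (and don’t know the population standard deviation) and are trying to draw conclusions about the population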

32
Q

What are degrees of freedom

A

Used to describe t distributions

Equal to sample size − 1 (n − 1) for a one-sample t

Notated as t9 (9 degrees of freedom)

t30 or more is desired (very close to normal)

33
Q

Something to keep in mind about probability distributions

A

Tail probabilities and tests can be one-sided (one tail) or two-sided (both tails). Be careful which one the problem calls for.

34
Q

Formula for standard error of the mean

A

This is the standard deviation of the sampling distribution of the sample mean…

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

35
Q

Relationship between confidence interval, margin of error, and critical value

A

Margin of error = Critical value x Standard deviation of the statistic

or

Margin of error = Critical value x Standard error of the statistic

Standard error is a function of sample size and standard deviation. It is basically the standard deviation of the statistic, except that it is estimated from the sample, because you usually don’t know the population parameters.

If the confidence level is 95%, then alpha = 0.05. The critical probability is 1 − (alpha / 2) = 0.975. The critical value is the z or t score associated with that probability; plug it back into the margin of error equation.
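The 95% recipe in this card as a Python sketch, assuming a z critical value and a known population standard deviation; the sample numbers are hypothetical. With a small sample and unknown σ you would use a t critical value instead.

```python
import math
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence                         # 0.05
critical_prob = 1 - alpha / 2                  # 0.975
z_star = NormalDist().inv_cdf(critical_prob)   # critical value, about 1.96

# Hypothetical sample: mean 50, known population SD 8, n = 64
x_bar, sigma, n = 50, 8, 64
standard_error = sigma / math.sqrt(n)          # 8 / 8 = 1
margin_of_error = z_star * standard_error
interval = (x_bar - margin_of_error, x_bar + margin_of_error)

print(round(z_star, 3), interval)
```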

36
Q

Central limit theorem basically says

A

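The sampling distribution of the sample mean is approximately normal for large samples, no matter what shape the population has; n = 30 is the usual rule-of-thumb transition point for sample size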

37
Q

When calculating the z score when you need to use standard error:

(CLT)

A

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
38
Q

What is p hat (p̂)?

A

p hat is the proportion of individuals in the sample who have a particular characteristic

39
Q

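What is the standard error of p̂ (σp̂)

A

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$

where p is the population proportion (use p̂ when p is unknown) and n is the sample size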

40
Q

For the CLT to apply to a sample proportion, the sample size must be large enough for

A

np and n(1 − p) to both be greater than or equal to 10

41
Q

μ vs. x̄

A

population mean vs. sample mean

(you can also have $\mu_{\bar{x}}$, the mean of the sampling distribution of x̄)

42
Q

How do you get the percentage given a z score on the TI-84

A

Use normalcdf()

The bounds are that z score and an extreme z score (like -999 for the area to the left, or 999 for the area to the right)
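The same areas in Python: `NormalDist().cdf(z)` gives the area to the left of z, matching `normalcdf(-999, z)`, and one minus that gives the area to the right. The z value is an arbitrary example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

z = 1.5
left = std_normal.cdf(z)        # area to the left, like normalcdf(-999, 1.5)
right = 1 - std_normal.cdf(z)   # area to the right, like normalcdf(1.5, 999)
print(round(left, 4), round(right, 4))
```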

43
Q

Basically, margin of error can be calculated for two things

A

Calculate the margin of error for a sample proportion (yes/no data, sort of like binomial: approve or disapprove of a politician)

or

Calculate the margin of error for a sample mean

Margin of error = Critical value x Standard deviation of the statistic

or

Margin of error = Critical value x Standard error of the statistic
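A sketch of the sample-proportion case, assuming a hypothetical poll of 1,000 people with 520 approving and a 95% confidence level; the standard error of a sample proportion is √(p̂(1 − p̂)/n).

```python
import math
from statistics import NormalDist

# Hypothetical poll: 520 of 1,000 people approve of a politician
n = 1000
p_hat = 520 / n
z_star = NormalDist().inv_cdf(0.975)                  # critical value for 95% confidence
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)   # SE of a sample proportion
margin_of_error = z_star * standard_error

print(round(margin_of_error, 3))  # about 0.031, i.e. plus or minus 3.1 percentage points
```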

44
Q

If the sample size is too small to use the CLT, what do you do

A

If you can assume the data came from a normal distribution, use t-values

Trick: if the population standard deviation, σ, is not given, use the sample standard deviation, s, with t-values

45
Q

What are standard errors

A

The building blocks of confidence intervals. A confidence interval is a statistic plus or minus a margin of error, and the margin of error is a certain number of standard errors.

The number of standard errors required is called the critical value, z* (the “z-star” value)

46
Q

During hypothesis testing, what are you testing the p value against

A

The significance level, or alpha level (typically 0.05)

47
Q

What is a Type I error?

A

Rejecting the null hypothesis when it is actually true

48
Q

What is a Type II error?

A

Failing to reject the null hypothesis when it is actually false

49
Q

Equation for a t-test

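A

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, with n − 1 degrees of freedom

where μ0 is the mean claimed by the null hypothesis and s is the sample standard deviation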
50
Q

Equation for test statistic for a single proportion

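A

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$

where p0 is the proportion claimed by the null hypothesis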
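A sketch of the single-proportion test statistic z = (p̂ − p0) / √(p0(1 − p0)/n) with a two-sided p-value; the coin data (60 heads in 100 tosses, H0: p0 = 0.5) are hypothetical and `one_proportion_z` is an illustrative name.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 (1 - p0) / n)"""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Hypothetical: 60 heads in 100 tosses; H0: the coin is fair (p0 = 0.5)
z = one_proportion_z(60, 100, 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative
print(round(z, 3), round(p_value, 4))
```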
51
Q

Equation for comparing two independent population averages

A
52
Q

Test for an average difference (the paired t-test)

A

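$t = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}}$

d is for differences: $\bar{d}$ is the mean of the paired differences, $s_d$ is their standard deviation, and n is the number of pairs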
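A sketch of the paired t statistic t = (d̄ − 0) / (s_d / √n) on hypothetical before/after data for five subjects. Python's standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom (n − 1) for use with a t table.

```python
import math
import statistics

# Hypothetical before/after measurements on the same 5 subjects
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 11, 15]

d = [a - b for a, b in zip(after, before)]  # paired differences
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                   # sample SD of the differences
n = len(d)

t_stat = (d_bar - 0) / (s_d / math.sqrt(n))  # H0: mean difference is 0
df = n - 1                                   # look up the t table at n - 1 degrees of freedom
print(round(t_stat, 3), df)
```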

53
Q

Equation for comparing two population proportions

A

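$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

where p̂ is the overall (pooled) sample proportion. The 0 is there because the theoretical difference between the proportions is zero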

54
Q

Correlation equation

A

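$r = \dfrac{1}{n - 1} \sum \dfrac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}$

where $s_x$ and $s_y$ are the sample standard deviations and the bars ($\bar{x}$, $\bar{y}$) are the sample means

How to calculate:

For each (x, y) pair, multiply the two differences from the means, then add up all of those products

Then divide by $(n - 1) s_x s_y$

  • −1: perfect negative linear relationship
  • 0: no linear relationship
  • 1: perfect positive linear relationship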
55
Q

What is the best fitting line (regression line)

A

The line that minimizes the sum of squares for error (SSE)

Slope uses the standard deviations and r (the correlation): $b = r \dfrac{s_y}{s_x}$. The y-intercept is calculated using the two means: $a = \bar{y} - b\bar{x}$

56
Q

What is a confounding variable

A

A variable Z that is related to both X and Y. In a simple confounding case, once you account for Z there is no association between X and Y; but if Z is not observed, it creates a false association between X and Y. In that case, Z is called a confounding factor.

57
Q

Marginal distributions in two-way tables

A

Pick the row or column variable, and divide each of its subtotals by the grand total

58
Q

How to work with joint distribution in two way tables

A

Divide each cell by the grand total. Sum of all should be 1.
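A sketch of the joint and marginal calculations on a hypothetical two-way table of counts (rows are gender, columns are three countries).

```python
# Hypothetical two-way table of counts: rows = gender, columns = country
table = {
    "Male":   {"A": 30, "B": 20, "C": 50},
    "Female": {"A": 20, "B": 30, "C": 50},
}
countries = ["A", "B", "C"]
grand_total = sum(sum(row.values()) for row in table.values())

# Joint distribution: every cell divided by the grand total (all cells sum to 1)
joint = {g: {c: table[g][c] / grand_total for c in countries} for g in table}

# Marginal distributions: row or column subtotals divided by the grand total
gender_marginal = {g: sum(table[g].values()) / grand_total for g in table}
country_marginal = {c: sum(table[g][c] for g in table) / grand_total for c in countries}

print(joint)
print(gender_marginal, country_marginal)
```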

59
Q

Conditional distribution in a two way table

A

“Find the conditional distribution of gender by country”

Say there’s 3 countries…

The result will be 3 distributions (one for each country), each adding up to 1 and each giving the percentage of each gender

If it’s find x by y, each group of y adds up to 1: if y is a row, each row adds up to 1; if y is a column, each column adds up to 1

60
Q

Ways you can determine independence in a two way table

A

Compare the results of two conditional distributions (check whether they match)

Compare the marginal and conditional distributions (they match when the variables are independent)

^ For tables larger than 2 × 2, go to the chi-square test
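A sketch of the conditional-distribution check on a hypothetical gender-by-country table: each country's column is divided by its total, and the result is compared with the marginal distribution of gender. With real sample data the match is rarely exact, which is one reason to fall back on a formal test such as chi-square.

```python
# Hypothetical two-way table of counts: rows = gender, columns = country
table = {
    "Male":   {"A": 30, "B": 20, "C": 50},
    "Female": {"A": 20, "B": 30, "C": 50},
}
countries = ["A", "B", "C"]
grand_total = sum(sum(row.values()) for row in table.values())

# Conditional distribution of gender by country: divide by each column total,
# so each country's shares add up to 1
conditional = {}
for c in countries:
    col_total = sum(table[g][c] for g in table)
    conditional[c] = {g: table[g][c] / col_total for g in table}

# Marginal distribution of gender, for comparison
gender_marginal = {g: sum(table[g].values()) / grand_total for g in table}

# Independence check: every conditional distribution should match the marginal one
independent = all(
    abs(conditional[c][g] - gender_marginal[g]) < 1e-9
    for c in countries
    for g in table
)
print(conditional)
print("independent" if independent else "not independent")
```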