Statistics Flashcards

1
Q

Central limit theorem

A

When you repeatedly draw samples from an underlying population with unknown characteristics, the distribution of the sample means approximates a normal distribution, provided the sample size is sufficiently large (a common rule of thumb is n > 30) and the population variance is finite.

The distribution of sample means is approximately normal for sufficiently large samples.
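The idea can be checked with a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, the 2000 repetitions and the seed are all assumptions chosen for the example, not part of the card.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Draw n values from a skewed (exponential) population with mean 1 and SD 1.
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
sample_means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(sample_means)  # should be close to the population mean, 1
spread = statistics.stdev(sample_means)  # should be close to 1 / sqrt(50), about 0.14
print(round(center, 2), round(spread, 2))
```

Even though the population is strongly skewed, a histogram of `sample_means` would look roughly bell-shaped and centred on the population mean.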

2
Q

The law of large numbers

A

For any random process, the difference between the sample mean and the underlying population mean tends to shrink as the number of observations increases (the observed probability approaches the theoretical one).

The larger your sample, the closer your sample mean tends to be to the population mean.
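A minimal sketch of this, assuming a fair six-sided die (theoretical mean 3.5) as the random process. The sample sizes and seed are arbitrary choices for illustration.

```python
import random

def mean_of_rolls(n, rng):
    # Average of n rolls of a fair six-sided die.
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable

# Gap between the observed mean and the theoretical mean of 3.5.
gaps = {n: abs(mean_of_rolls(n, rng) - 3.5) for n in (10, 1_000, 100_000)}
for n, gap in gaps.items():
    print(n, round(gap, 4))
```

The gap is not guaranteed to shrink at every step, but for very large samples it becomes small with high probability.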

3
Q

Inferential statistics

A

Inferring the characteristics of a population given a particular sample (for any sufficiently large sample we can estimate the mean of the underlying population from the sample mean)

4
Q

Standard deviation

A

It is a statistic that tells us how much individual values in a data set differ from the mean of that set. It measures the spread, or variability, of a set of numbers.
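As a small worked example, Python's `statistics` module offers both the population form (`pstdev`, dividing by n) and the sample form (`stdev`, dividing by n - 1). The data set below is made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1, so slightly larger

print(population_sd, round(sample_sd, 3))
```

Which form applies depends on whether the data are the whole population or a sample drawn from it.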

5
Q

Standard error of the mean

A

It measures the precision of the sample mean as an estimate of the population mean. It tells us how much the sample mean (the average of our sample data) is likely to differ from the true population mean (the average of all possible data points if we could measure them all). It is the standard deviation divided by the square root of the sample size.
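The formula in the last sentence can be sketched directly. The helper name `standard_error` and the data are hypothetical, and the sample standard deviation is assumed as the estimate of spread.

```python
import math
import statistics

def standard_error(data):
    # Sample standard deviation divided by the square root of the sample size.
    return statistics.stdev(data) / math.sqrt(len(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
sem = standard_error(data)
print(round(sem, 3))
```

Because the sample size sits under a square root, quadrupling the sample size roughly halves the standard error.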

6
Q

Z-score

A

A z-score, also known as a standard score, tells us how far a particular data point is from the mean in terms of standard deviations. It helps us understand how unusual or typical a particular value is within a data set. It is calculated by taking a specific data point, subtracting the sample mean, and then dividing the result by the standard deviation. A z-score around 0 suggests a fairly typical data point; the further it is from 0, the more unusual the value.
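The calculation can be written as a one-line function. The name `z_score` and the numbers (a mean of 5 and a standard deviation of 2) are made up for illustration.

```python
def z_score(value, mean, sd):
    # Distance of value from the mean, measured in standard deviations.
    return (value - mean) / sd

z_far = z_score(9, mean=5, sd=2)      # two standard deviations above the mean
z_typical = z_score(5, mean=5, sd=2)  # exactly at the mean
print(z_far, z_typical)
```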

7
Q

Confidence intervals

A

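For a given sample, a range of values around the sample mean that is likely to contain the population mean. At the 95% level, about 95% of intervals constructed this way from repeated samples would contain the population mean.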
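A minimal sketch of a 95% interval for a mean, assuming the normal approximation (sample mean plus or minus about 1.96 standard errors). For small samples a t-based critical value would give a somewhat wider interval. The data are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = statistics.fmean(data)
sem = statistics.stdev(data) / math.sqrt(len(data))

# Critical value for a two-sided 95% interval under the normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

lower, upper = mean - z * sem, mean + z * sem
print(round(lower, 2), round(upper, 2))
```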

8
Q

QQ-plot

A

A QQ-plot, or quantile-quantile plot, is a type of plot used to compare the distribution of a data set to a theoretical distribution, most commonly a normal (bell curve) distribution. It’s a helpful visual tool for checking whether your data is normally distributed, which is often an important assumption in statistics.
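The points of a normal QQ-plot can be computed without any plotting library. In this sketch the helper `qq_points` is hypothetical, the data are made up, and the plotting positions (i + 0.5) / n are one common convention among several.

```python
import statistics

def qq_points(data):
    # Pair each sorted observation with the standard-normal quantile
    # at the matching plotting position (i + 0.5) / n.
    n = len(data)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(data)))

points = qq_points([5.1, 4.8, 5.0, 5.3, 4.9, 5.2, 4.7])  # made-up data
for theo, obs in points:
    print(round(theo, 2), obs)
```

If the data are roughly normal, these points fall close to a straight line when plotted. Systematic curvature suggests skew or heavy tails.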

9
Q

Parametric tests

A

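Tests that assume the data follow a particular distribution, typically the normal distribution. They can be used when the data are (approximately) normally distributed.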

10
Q

Non-parametric tests

A

Tests that do not assume a particular distribution. They can be used on data that are not normally distributed.

11
Q

Quasi-experiment

A

Collection of data on two or more naturally occurring variables in the world (e.g. shoe size and breath-hold time), with no random assignment of subjects.

12
Q

A full experiment

A

Systematic manipulation of variables (Independent variables) to observe how they influence an outcome measure (Dependent variable)

13
Q

T-test

A

Used when we want to test whether two means differ.
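As an illustration, the test statistic for two independent groups can be computed by hand. This sketch assumes Welch's form of the t statistic (unequal variances allowed) and stops short of the p-value, which needs the t distribution and is not in the standard library. The helper name `welch_t` and the data are made up.

```python
import math
import statistics

def welch_t(a, b):
    # Difference in means divided by the standard error of that difference.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

t = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(round(t, 3))
```

The larger the absolute value of t, the stronger the evidence that the two means differ.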

14
Q

Regression

A

When we want to predict a continuous dependent variable from one or more continuous OR categorical independent variables
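A minimal sketch of the simplest case: ordinary least squares with a single continuous predictor. The helper `fit_line` and the data are made up. Multiple or categorical predictors would need a fuller implementation or a statistics library.

```python
import statistics

def fit_line(x, y):
    # Least-squares slope and intercept for y = intercept + slope * x.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
predicted = intercept + slope * 6  # prediction for a new x value of 6
print(slope, intercept, predicted)
```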

15
Q

Correlation test

A

When we want to test the relation between two continuous variables
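The strength of a linear relation is commonly summarised by Pearson's correlation coefficient r. The sketch below computes only the coefficient, not the significance test built on it. The helper `pearson_r` and the data are made up.

```python
import math
import statistics

def pearson_r(x, y):
    # Covariation of x and y, scaled by the variation in each variable.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])    # perfect positive relation
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # perfect negative relation
print(round(r_up, 3), round(r_down, 3))
```

Values of r range from -1 to 1, with values near 0 indicating little or no linear relation.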

16
Q

Degrees of freedom

A

It essentially tells us the number of values in a calculation that are free to vary

In multiple regression, degrees of freedom are the number of data points minus the number of parameters (coefficients) estimated.
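The regression case reduces to simple arithmetic. In this sketch the helper `residual_df` is hypothetical, and the intercept is assumed to count as one of the estimated parameters.

```python
def residual_df(n_points, n_predictors, intercept=True):
    # Data points minus estimated parameters (slopes plus optional intercept).
    n_params = n_predictors + (1 if intercept else 0)
    return n_points - n_params

df = residual_df(n_points=100, n_predictors=3)  # 100 - (3 slopes + 1 intercept)
print(df)
```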