Statistics Recap Flashcards

1
Q

what is the difference between a population and a sample in statistics

A

The population is the full set of units or data of interest, while a sample is a subset of the population, used as a cost-feasible way to represent the population as a whole

2
Q

what is cross section data

A

Observations of multiple units (for example individuals, firms or countries) at a single point in time

3
Q

what is time series data

A

Observations of a single unit whose value varies across time

4
Q

What is qualitative data

A

Data describing categories or attributes that cannot be meaningfully represented as numerical quantities

5
Q

What is discrete data

A

Quantitative data that can take only a finite number of values within any bounded interval. For example, integers are discrete even though there are infinitely many of them

6
Q

What is continuous data

A

Data that can take infinitely many possible values within an interval, for example decimal (real) numbers

7
Q

What is nominal data

A

Qualitative data which does not imply quantity or order between the alternatives

8
Q

How does ordinal data differ from interval data

A

Ordinal data has a meaningful order or ranking, but unlike interval data the differences between its values are not meaningful, comparable quantities. An example of ordinal data is stars in hotel reviews, while an example of interval data is temperature measured in degrees Celsius.

9
Q

What is ratio data

A

Data where each value represents a specific quantity on an ordered scale that has a true zero. Because of the true zero, multiplication and division (ratios between values) are meaningful.

10
Q

Can you use addition on interval data

A

Yes. Addition is meaningful for interval and ratio data, but not for ordinal or nominal data

11
Q

what is relative frequency distribution

A

A table showing how large a share (proportion) of the sample falls within each class or value on a scale

12
Q

what is cumulative frequency distribution

A

A table showing how many observations fall at or below each value on a scale.
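
Worked example: a minimal sketch of a relative and a cumulative frequency table. The list of hotel ratings is made up for illustration and is not from the course.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: hotel ratings from 1 to 5 stars
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

counts = Counter(ratings)
values = sorted(counts)
n = len(ratings)

# Relative frequency: share of the sample at each value
relative = {v: counts[v] / n for v in values}

# Cumulative frequency: number of observations at or below each value
cumulative = dict(zip(values, accumulate(counts[v] for v in values)))

for v in values:
    print(v, counts[v], relative[v], cumulative[v])
```

The last cumulative frequency always equals the sample size, and the relative frequencies sum to 1.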

13
Q

What does it mean that the data is symmetric (not skewed)

A

That the values are centered symmetrically around the mean

14
Q

what does it mean that the data is positively skewed

A

That the data is more extreme when it is higher than the mean. In a histogram the tail of the distribution is longer in the positive (right) direction. If the tail is longer in the negative (left) direction, the data is negatively skewed

15
Q

What is a polygon

A

A histogram, but instead of pillars there are dots at the midpoint of where each pillar would be (at the height of its frequency), with lines drawn between the dots

16
Q

What is an ogive

A

A polygon of cumulative frequencies, where the dots are placed at the right edge (the upper boundary) of each pillar and not in the middle

17
Q

Does economic theory often suggest quantitative magnitudes of causal effects

A

No

18
Q

What is RV an acronym for in this course

A

Random variable

19
Q

What is a probability distribution

A

A table of possible values of a random variable and their likelihood of occurring

20
Q

What is a Bernoulli distribution

A

The simplest probability distribution, where the outcome is binary: either the event happens, with probability p, or it does not happen, which has probability 1-p

21
Q

What is the probability density function

A

A function whose integral over an interval (the area under the curve) gives the probability that a continuous random variable falls in that interval. An example of a probability density function is the density of the standard normal distribution Z

22
Q

What is the expected value of a random variable E(Y)

A

The mean of the population, that is, the probability-weighted average of all possible values of Y
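
Worked example: a minimal sketch of the expected value as a probability-weighted average. The distribution below is hypothetical and chosen only for illustration.

```python
# Hypothetical discrete distribution: value of Y -> probability
distribution = {0: 0.2, 1: 0.5, 2: 0.3}

# The probabilities of all possible values must sum to 1
total_probability = sum(distribution.values())

# E(Y) = sum of each value times its probability
expected_value = sum(y * p for y, p in distribution.items())

print(total_probability, expected_value)
```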

23
Q

What is variance

A

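A measure of spread around the mean. It is calculated by subtracting the mean from each value of the random variable, squaring the difference, and averaging the squared differences: Var(X) = E[(X - E(X))^2], which for N equally likely values is (1/N) * ∑(x_i - E(X))^2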

24
Q

what is standard deviation

A

The square root of the variance
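
Worked example: a minimal sketch of the variance and standard deviation, assuming a small hypothetical population in which every value is equally likely.

```python
import math

# Hypothetical population of five equally likely values
values = [2, 4, 4, 6, 9]

mean = sum(values) / len(values)  # E(X)

# Variance: average squared deviation from the mean
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```

The standard deviation is in the same unit as the data, while the variance is in squared units.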

25
Q

What does skewness tell us

A

It can give us an idea of how the distribution looks: whether it is asymmetrical, and in which direction

26
Q

What is kurtosis

A

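It is a measure of how much probability mass lies in the tails of a distribution. A normal distribution has a kurtosis of 3; a kurtosis above 3 means heavier tails than the normal distribution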
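
Worked example: a minimal sketch that computes skewness and kurtosis as the third and fourth standardized moments. It assumes population-style moments (dividing by n); the helper name `standardized_moment` and both data sets are hypothetical.

```python
def standardized_moment(values, k):
    """k-th central moment divided by the standard deviation to the power k."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    central = sum((x - mean) ** k for x in values) / n
    return central / variance ** (k / 2)

symmetric = [1, 2, 3, 4, 5]      # mirror image around its mean
right_tailed = [1, 1, 1, 2, 10]  # one large value stretches the right tail

skew_symmetric = standardized_moment(symmetric, 3)
skew_right = standardized_moment(right_tailed, 3)
kurtosis_symmetric = standardized_moment(symmetric, 4)

print(skew_symmetric, skew_right, kurtosis_symmetric)
```

The symmetric data has zero skewness, while the data with a long right tail has positive skewness.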

27
Q

What is joint probability distribution

A

The probabilities of every possible combination of values of two random variables, that is, Pr(X=x, Y=y) listed for each pair (x, y)

28
Q

What does it mean that two random variables are independent

A

That the outcome of one does not affect the other, which makes the probability of a specific combination equal to the product of the individual probabilities: Pr(X=x,Y=y) = Pr(X=x)*Pr(Y=y)
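
Worked example: a minimal sketch that checks the product rule on a joint probability table. The probabilities and the helper name `is_independent` are hypothetical.

```python
from itertools import product

# Hypothetical marginal distributions of X and Y
pr_x = {0: 0.3, 1: 0.7}
pr_y = {0: 0.4, 1: 0.6}

# A joint table built so that X and Y are independent
joint = {(x, y): pr_x[x] * pr_y[y] for x, y in product(pr_x, pr_y)}

# A joint table where Y always equals X (clearly dependent)
dependent = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

def is_independent(table, tol=1e-12):
    """True if Pr(X=x, Y=y) = Pr(X=x) * Pr(Y=y) for every cell."""
    xs = {x for x, _ in table}
    ys = {y for _, y in table}
    px = {x: sum(table[(x, y)] for y in ys) for x in xs}  # marginal of X
    py = {y: sum(table[(x, y)] for x in xs) for y in ys}  # marginal of Y
    return all(abs(table[(x, y)] - px[x] * py[y]) < tol for x in xs for y in ys)

print(is_independent(joint), is_independent(dependent))
```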

29
Q

What is covariance

A

A measure of whether two variables tend to move together or against each other. It is calculated similarly to variance, but instead of multiplying the difference from the mean by itself you multiply the two variables' differences from their means with each other: Cov(X,Y) = E[(X - E(X))(Y - E(Y))]

30
Q

is covariance unit free

A

No, its unit is the product of the units of its two components

31
Q

What is the relation of two variables if their covariance is greater than 0

A

Positive, they tend to move together

32
Q

what is correlation

A

A measure of the strength and direction of the linear relationship between two random variables. It is calculated by dividing their covariance by the product of their standard deviations.
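
Worked example: a minimal sketch of covariance and correlation, using population-style formulas (dividing by n). The data and function names are hypothetical. It also illustrates that rescaling a variable changes the covariance but not the correlation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Average product of the two variables' deviations from their means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)

def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]            # y = 2x, a perfect positive linear relationship
x_scaled = [100 * v for v in x]  # the same data in a different unit

print(covariance(x, y), correlation(x, y))
print(covariance(x_scaled, y), correlation(x_scaled, y))
```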

33
Q

What does a correlation of -1 mean

A

A perfect negative linear relationship. A correlation of 1 is a perfect positive linear relationship and 0 means no linear relationship

34
Q

What is the correlation if the random variable X always is 3 when Y is 14

A

0

35
Q

If the correlation of two random variables is 0 does it mean that their relationship is constant

A

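No. The implication only goes one way: if the relationship is constant, the correlation will be 0, but a correlation of 0 does not mean the relationship is constant. In arrows it would be written const => 0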

36
Q

What z value leaves 2.5% in each tail (5% in total) of a standard normal distribution

A

1.96

37
Q

What is the expected z value in the normal Z distribution

A

0

38
Q

what is the variance in the normal Z distribution

A

1
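
Worked example: a minimal sketch that looks up standard normal values with `statistics.NormalDist` from the Python standard library instead of a printed z-table. The value 1.96 is the 97.5th percentile of Z, which leaves 2.5% in the upper tail.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal by default: mean 0, standard deviation 1

z_mean = z.mean
z_variance = z.variance

# z value with 97.5% of the probability below it (2.5% in the upper tail)
upper_critical = z.inv_cdf(0.975)

# Share of the probability between -1.96 and 1.96
central_share = z.cdf(1.96) - z.cdf(-1.96)

print(z_mean, z_variance, round(upper_critical, 2), round(central_share, 2))
```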

39
Q

Why does a sample approximately represent a population in a simple random sample

A

Because each observation is drawn independently from the same probability distribution (they are independent and identically distributed), so the sample will tend to resemble the population as a whole

40
Q

What is the criterion for being an unbiased estimator

A

That the expected value of the estimator is the same as the true value of the population parameter it estimates

41
Q

Does a larger sample make the variance smaller

A

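Yes for the variance of the sample mean, which shrinks as the sample grows. The spread of the observations themselves does not shrink systematically; a small sample can by chance be tightly clustered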

42
Q

What is the central limit theorem

A

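That as the sample size grows, the distribution of the sample mean becomes more and more normally distributed, regardless of how the individual observations are distributed (given a finite variance)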

43
Q

What is the variance of the sample mean (a random variable computed from a sample)

A

The population variance divided by the sample size

44
Q

Is the variance of the sample mean inversely proportional to the sample size

A

Yes
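
Worked example: a minimal simulation sketch, assuming the random variable in question is the sample mean. It draws many samples of rolls of a fair die (a hypothetical population) and compares the variance of the sample means with the population variance divided by the sample size.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

population_variance = 35 / 12  # variance of a fair six-sided die
n = 25                         # size of each sample
replications = 5000            # number of samples drawn

# Mean of each simulated sample of n die rolls
sample_means = [
    sum(rng.randint(1, 6) for _ in range(n)) / n
    for _ in range(replications)
]

simulated = statistics.pvariance(sample_means)
theoretical = population_variance / n

print(simulated, theoretical)
```

The two numbers should be close but not identical, since the simulated value is itself an estimate.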

45
Q

What is the difference between an estimate and an estimator

A

An estimator is a function of a sample to be drawn from a population, while an estimate is the value of the estimator when computed from a specific sample. An estimator is a random variable, while an estimate is not, as it is a fixed number.

46
Q

When is an estimator consistent

A

When it converges (in probability) to the true value of the population parameter as the sample size grows

47
Q

When is an estimator efficient

A

When its variance is smaller than that of other (unbiased) estimators of the same parameter

48
Q

What makes a good estimator to use in statistical inquiry

A

It is unbiased, efficient and consistent

49
Q

What is a hypothesis test

A

A yes-or-no question where you state a claim (a null hypothesis) and check whether the data give enough evidence to reject it

50
Q

What is a one sided hypothesis test

A

A test where the alternative hypothesis is that the expected value is larger than a given value, or that it is smaller than a given value. In a two-sided hypothesis test the alternative is that the expected value differs from the given value in either direction.

51
Q

What is a p-value in hypothesis testing

A

The probability of obtaining a result at least as extreme as the one you got, assuming that the null hypothesis is true

52
Q

Should you reject the null hypothesis if the p value is high

A

No, because it suggests that the value you got is likely to occur if the null hypothesis is true

53
Q

How do you calculate the p-value when the population variance is known

A

Take the absolute value of the sample mean minus the mean under the null hypothesis, and divide it by the standard deviation of the sample mean (the population standard deviation divided by the square root of the sample size). The resulting z value marks where the tails of the Z distribution begin; the area (integral) of the tails is the p-value, which you look up in a table. (better explained visually)
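
Worked example: a minimal sketch of the calculation, assuming a two-sided test and a test statistic scaled by the standard deviation of the sample mean, sigma / sqrt(n). The function name and the numbers are hypothetical, and `statistics.NormalDist` stands in for the z-table.

```python
import math
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, population_sd, n):
    """p-value for H0: mean = null_mean when the population sd is known."""
    # Distance from the null value, measured in standard deviations of the sample mean
    z = abs(sample_mean - null_mean) / (population_sd / math.sqrt(n))
    # Area in both tails of the standard normal beyond +/- z
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical numbers: z = |52 - 50| / (10 / sqrt(100)) = 2
p_value = two_sided_p_value(sample_mean=52.0, null_mean=50.0, population_sd=10.0, n=100)

print(round(p_value, 4))
```

For a one-sided test only one tail would be counted, so the factor 2 would be dropped.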

54
Q

How do you estimate population variance when you only know the sample variance

A

You divide the sum of squared deviations from the sample mean by the sample size minus one (n - 1) instead of by the sample size, as dividing by the sample size causes a bias.
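
Worked example: a minimal sketch comparing division by n with division by n - 1 on a hypothetical sample. `statistics.variance` from the Python standard library uses the n - 1 version.

```python
import statistics

# Hypothetical sample drawn from a larger population
sample = [2, 4, 4, 6, 9]
n = len(sample)

sample_mean = sum(sample) / n
sum_of_squares = sum((x - sample_mean) ** 2 for x in sample)

biased = sum_of_squares / n          # tends to underestimate the population variance
unbiased = sum_of_squares / (n - 1)  # corrected estimate

print(biased, unbiased, statistics.variance(sample))
```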

55
Q

How do you calculate the standard error of a sample

A

You divide the standard deviation (of the population, or its sample estimate when the population value is unknown) by the square root of the sample size. Logically it is also the square root of the variance of the sample mean.
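
Worked example: a minimal sketch of the standard error, assuming the population standard deviation is unknown and is replaced by the sample standard deviation. The sample values are hypothetical.

```python
import math
import statistics

# Hypothetical sample
sample = [2, 4, 4, 6, 9]
n = len(sample)

# Sample standard deviation (uses n - 1 in the denominator)
sample_sd = statistics.stdev(sample)

# Standard error of the sample mean
standard_error = sample_sd / math.sqrt(n)

print(sample_sd, standard_error)
```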

56
Q

Do we use the z table when the population variance is unknown but estimated

A

No, we use the t-table. The t-distribution has a limited number of degrees of freedom (n - 1), which gives it thicker tails (higher kurtosis) than the Z distribution.

57
Q

Should you use the t distribution when the sample size is large

A

It is not needed, as large samples give many degrees of freedom, which makes the t-distribution approximate the Z distribution

58
Q

If the confidence level is 95% when should you reject the null hypothesis

A

If the absolute value of the t statistic is greater than 1.96 (two-sided test, large sample), the null hypothesis can be rejected at the 5% significance level
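
Worked example: a minimal sketch of the decision rule, assuming a two-sided test and a sample large enough for the critical value 1.96 to apply. The function name `reject_null` and the data are hypothetical.

```python
import math
import statistics

def reject_null(sample, null_mean, critical_value=1.96):
    """Return (decision, t) for a two-sided test of H0: mean = null_mean."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - null_mean) / standard_error
    return abs(t) > critical_value, t

# Hypothetical sample of 36 observations with mean 50
sample = [48, 52] * 18

rejected_49, t_49 = reject_null(sample, null_mean=49)
rejected_50, t_50 = reject_null(sample, null_mean=50)

print(rejected_49, round(t_49, 2))
print(rejected_50, round(t_50, 2))
```

With a small sample, the critical value would instead come from the t-table with n - 1 degrees of freedom.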