Descriptive stats Flashcards

1
Q

What’s the difference between descriptive and inferential statistics?

A

Descriptive –> describe sample data based on sample statistics
Inferential –> use sample statistics to learn about population parameters

2
Q

What is micro data?

A

Data collected on individuals

3
Q

What is macro data?

A

Data referring to a group of units as a whole (aggregate data)

4
Q

What is a population?

A

The set of all statistical units that are the object of interest

5
Q

What is the sample?

A

A subset drawn from the population

6
Q

What are non-probability and probability sampling?

A

Non-probability –> units are drawn from the population according to the researcher’s judgement
Probability –> units are drawn at random from the population, and every unit has a known, non-zero probability of being drawn (in simple random sampling, every unit has the same probability)

7
Q

What is the inferential process?

A

It consists of drawing conclusions about the entire population from the information provided by a sample

8
Q

What are the two broad variable categories?

A

Numerical and categorical

9
Q

What are the subsets of numerical variables?

A

Discrete and continuous

10
Q

What are the subsets of categorical variables?

A

Ordinal and nominal

11
Q

What are the columns of a frequency distribution table?

A

Classes/groups; absolute frequencies and relative frequencies
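A minimal Python sketch of how the three columns could be computed for a categorical variable; the function name `frequency_table` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (class, absolute frequency, relative frequency), sorted by class."""
    counts = Counter(values)
    n = len(values)
    return [(level, counts[level], counts[level] / n) for level in sorted(counts)]

# Hypothetical data: a categorical variable observed on 6 units
data = ["A", "B", "A", "C", "A", "B"]
for level, abs_freq, rel_freq in frequency_table(data):
    print(level, abs_freq, round(rel_freq, 3))
# A 3 0.5
# B 2 0.333
# C 1 0.167
```

The relative frequencies always sum to 1, since each absolute frequency is divided by the same total n.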

12
Q

How is a histogram composed?

A

Horizontal axis –> intervals
Bars –> each bar has an area equal to the relative frequency of its interval
Vertical axis –> interval density = relative frequency/interval width

13
Q

How can we calculate an interval density?

A

Relative frequency/interval width
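A small Python sketch of the interval density computation, assuming left-closed intervals [a, b) with the last interval also closed on the right (a common but not universal convention). The function `interval_densities` and the data are hypothetical.

```python
def interval_densities(values, edges):
    """Density of each interval = relative frequency / interval width.

    Intervals are [edges[k], edges[k+1]); the last one also includes its
    right endpoint so that the maximum value is counted.
    """
    n = len(values)
    densities = []
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        is_last = k == len(edges) - 2
        count = sum(1 for v in values if lo <= v < hi or (is_last and v == hi))
        densities.append((count / n) / (hi - lo))
    return densities

# Hypothetical data and interval edges (widths 4, 2 and 6)
values = [1, 2, 2, 3, 5, 7, 8, 12]
edges = [0, 4, 6, 12]
dens = interval_densities(values, edges)
print(dens)  # [0.125, 0.0625, 0.0625]
```

The last interval holds three times as many observations as the middle one but has the same density, because it is three times as wide. Multiplying each density by its width gives back the relative frequency, so the bar areas sum to 1.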

14
Q

How does the number of intervals relate to the accuracy?

A

The higher the number of intervals, the more detailed the description.

15
Q

What are the three measures of central tendency?

A

Mode, median and mean

16
Q

What is the mode?

A

The level/value of a variable that is observed with the highest frequency
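A sketch using Python’s standard `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. Since it only counts occurrences, it also works on nominal data such as labels. The data are hypothetical.

```python
from statistics import multimode

# Hypothetical nominal variable: favourite colour of 6 respondents
colours = ["red", "blue", "red", "green", "blue", "red"]
print(multimode(colours))  # ['red']

# With ties, every most frequent level is returned (here the variable is bimodal)
print(multimode(["a", "a", "b", "b", "c"]))  # ['a', 'b']
```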

17
Q

What is the only measure of central tendency for nominal variables?

A

Mode

18
Q

What is the median?

A

It is the central value of the ordered observations. If n is odd –> the observation in position (n+1)/2; if n is even –> the mean of the two central values (positions n/2 and n/2 + 1)
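A minimal Python sketch of the positional rule, assuming numeric data; the function name `median` is illustrative (Python’s `statistics.median` gives the same result). For even n the two central values are averaged.

```python
def median(values):
    """Median following the positional rule for odd and even n."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        # odd n: the observation in position (n+1)/2 (1-based)
        return x[(n + 1) // 2 - 1]
    # even n: mean of the observations in positions n/2 and n/2 + 1 (1-based)
    return (x[n // 2 - 1] + x[n // 2]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```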

19
Q

What is the mean?

A

The arithmetic average of the values: (x1 + x2 + … + xn)/n

20
Q

What is the deviation?

A

It is the difference between an observed value and the mean: xi - mean

21
Q

What are the properties of deviations?

A
  • A deviation is positive when the value is higher than the mean and negative when it is lower
  • The sum of all deviations is equal to 0
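A short Python check of both properties on hypothetical data; the helper name `deviations` is illustrative only.

```python
def deviations(values):
    """Deviation of each observation from the arithmetic mean: x_i - mean."""
    m = sum(values) / len(values)
    return [x - m for x in values]

d = deviations([2, 4, 9])   # mean = 5
print(d)                    # [-3.0, -1.0, 4.0]
print(sum(d))               # 0.0
```

With non-integer data, the floating-point sum may differ from 0 by a tiny rounding error.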
22
Q

Do extreme values (outliers) have an impact on the median?

A

No, because it depends only on the position of the values in the ordered data, not on their magnitude

23
Q

Do extreme values (outliers) have an impact on the mean?

A

Yes, because it’s computed using all values
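A quick Python comparison using the standard `statistics` module on hypothetical data: replacing one value with an extreme one moves the mean a lot but leaves the median unchanged.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); the second list replaces 30 with 300
incomes = [20, 22, 25, 27, 30]
with_outlier = [20, 22, 25, 27, 300]

print(mean(incomes), median(incomes))            # 24.8 25
print(mean(with_outlier), median(with_outlier))  # 78.8 25
```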

24
Q

What are the measures of location? And for which types of data can they be computed?

A
  • Quartiles and percentiles
  • Ordinal categorical and numerical variables

25
Q

What are quartiles?

A

They divide the ordered observations into four parts, each containing 25% of them
Q1 –> 25% of the observations are smaller than it
Q2 –> it is the median
Q3 –> 75% of the observations are smaller than it

26
Q

What is a percentile?

A

The p-th percentile is the value below which p% of the observations fall
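Textbooks and software disagree on the exact percentile rule. The Python sketch below uses one simple convention, the nearest-rank method, and is not necessarily the one used in this course; `percentile_nearest_rank` and the data are hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) with the nearest-rank convention:
    the smallest observation with at least p% of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    x = sorted(values)
    rank = math.ceil(p * len(x) / 100)  # 1-based position in the ordered data
    return x[rank - 1]

data = [15, 20, 35, 40, 50]
print([percentile_nearest_rank(data, p) for p in (5, 30, 40, 50, 100)])
# [15, 20, 20, 35, 50]
```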

27
Q

What is the five-number summary? And how can it be represented?

A

Minimum, Q1, Q2 (median), Q3 and maximum

By means of a boxplot
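A Python sketch of the five-number summary. It assumes one particular quartile convention (Q1 and Q3 as medians of the lower and upper halves of the ordered data); other tools may return slightly different Q1 and Q3. Function names are hypothetical.

```python
def _median_sorted(x):
    """Median of an already sorted, non-empty list."""
    n = len(x)
    mid = n // 2
    return x[mid] if n % 2 == 1 else (x[mid - 1] + x[mid]) / 2

def five_number_summary(values):
    """(minimum, Q1, Q2, Q3, maximum); needs at least 2 observations.

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data, leaving out the overall median when n is odd.
    """
    x = sorted(values)
    n = len(x)
    lower = x[: n // 2]
    upper = x[(n + 1) // 2 :]
    return x[0], _median_sorted(lower), _median_sorted(x), _median_sorted(upper), x[-1]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5, 7.5, 9)
```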

28
Q

How is the boxplot composed?

A

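Box height –> the IQR (Q3 - Q1)
Upper edge of the box –> Q3
Lower edge of the box –> Q1
Line inside the box –> median
Whiskers –> extend from the box to the most extreme observations lying within 1.5xIQR of the box edges; observations beyond them are outliers and are plotted individually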
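A Python sketch of the 1.5 x IQR rule. Q1 and Q3 are passed in so that any quartile convention can be used; `boxplot_parts` and the data are hypothetical.

```python
def boxplot_parts(values, q1, q3):
    """Whisker ends and outliers for a boxplot with the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    # whiskers reach the most extreme observations that are not outliers
    return min(inside), max(inside), outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
# Q1 = 2.5 and Q3 = 7.5 (medians of the lower and upper halves), so IQR = 5
print(boxplot_parts(data, 2.5, 7.5))  # (1, 8, [30])
```

Here the fences are -5 and 15, so the upper whisker stops at 8 and 30 is drawn as a separate point.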

29
Q

What are the properties of a symmetrical, bell-shaped distribution?

A

Q1 - min = max - Q3; median - Q1 = Q3 - median;

median - Q1 < Q1 - min; Q3 - median < max - Q3

30
Q

What are the properties of a symmetrical but not bell-shaped distribution?

A

Q1 - min = max - Q3; median - Q1 = Q3 - median;

median - Q1 > Q1 - min; Q3 - median > max - Q3

31
Q

What are the properties of a right-skewed distribution?

A

Frequencies are high for low values and low for high values (long tail on the right)
Q3 - median > median - Q1
Mean > median

32
Q

What are the properties of a left-skewed distribution?

A

Frequencies are high for high values and low for low values (long tail on the left)
Mean < median, because the mean is pulled toward the few low values
Median - Q1 > Q3 - median

33
Q

What are the 4 measures of variability?

A

Range, IQR, variance and standard deviation

34
Q

What does the IQR measure?

A

The spread of the central 50% of the observations

35
Q

What is the variance?

A

It is the average of the squared deviations from the mean and measures the dispersion of a variable around its mean. It is never negative, and it equals 0 only when all values are equal
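A Python sketch of the variance as the average of the squared deviations (divisor n), plus the standard deviation as its square root. Many courses and tools also use a sample variance with divisor n - 1; which one applies here is an assumption. The data are hypothetical.

```python
import math

def variance(values):
    """Average of the squared deviations from the mean (divisor n)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n

def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5
print(variance(data), std_dev(data))  # 4.0 2.0
```

The divisor-n version matches Python’s `statistics.pvariance`; `statistics.variance` uses n - 1.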

36
Q

What is the coefficient of variation?

A

CV = s/mean. It expresses the standard deviation as a percentage of the mean and allows comparing the variability of two variables that have different means
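A Python sketch of the coefficient of variation, assuming s is the standard deviation with divisor n (`statistics.pstdev`) and a positive mean. The two hypothetical datasets differ only by a factor of 10, so their standard deviations differ but their CVs coincide.

```python
import statistics

def coefficient_of_variation(values):
    """CV = standard deviation / mean (mean assumed positive)."""
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical variable measured on two different scales
a = [2, 4, 4, 4, 5, 5, 7, 9]    # mean 5, standard deviation 2
b = [10 * v for v in a]         # mean 50, standard deviation 20

print(coefficient_of_variation(a))  # 0.4 (i.e. 40%)
print(coefficient_of_variation(b))  # 0.4: same relative variability
```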

37
Q

What does it mean to analyze the concentration of a variable?

A

It means assessing how far the actual distribution is from the two extreme cases: minimum concentration (equal distribution) and maximum concentration

38
Q

What does it mean that a variable is very concentrated?

A

It is very far from the equal distribution (minimum concentration), that is, close to maximum concentration

39
Q

What does it mean that a variable has a low concentration?

A

It is very close to the equal distribution (minimum concentration)

40
Q

Can a concentration analysis be carried out for variables with negative values?

A

No

41
Q

What property must a variable have for a concentration analysis to be carried out?

A

It needs to be transferable

42
Q

How is qi distributed in a case of maximum concentration?

A

q0 = q1 = … = q(n-1) = 0 and qn = 1, that is, a single unit holds the whole total

43
Q

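What are the coordinates of the corner point of the maximum concentration curve?

A

((n-1)/n, 0): the curve runs along the horizontal axis up to this point and then rises to (1, 1)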

44
Q

How is qi distributed in a case of minimum concentration?

A

qi = fi for every i, that is, the total is shared equally among all units

45
Q

What are the properties of a concentration curve?

A
  • Continuous
  • Convex
  • Passes through (0,0) and (1,1)
46
Q

What is the Gini index?

A

R = concentration area / maximum possible concentration area, where the maximum area is (n-1)/(2n)
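A Python sketch of R as a ratio of areas. It assumes the concentration curve is drawn through the points (i/n, qi), with units ordered from smallest to largest and qi the cumulative share of the total; the area under the curve is computed with trapezoids. `gini_index` and the data are hypothetical.

```python
def gini_index(values):
    """Gini concentration ratio R = concentration area / maximum area (n-1)/(2n).

    Assumes non-negative values with a positive total and at least 2 units.
    """
    x = sorted(values)              # units ordered from smallest to largest
    n = len(x)
    total = sum(x)
    q = [0.0]                       # cumulative shares q_0 = 0, ..., q_n = 1
    running = 0
    for v in x:
        running += v
        q.append(running / total)
    # area under the concentration curve: trapezoids of width 1/n
    area_under = sum((q[i - 1] + q[i]) / 2 for i in range(1, n + 1)) / n
    concentration_area = 0.5 - area_under   # area between equality line and curve
    max_area = (n - 1) / (2 * n)
    return concentration_area / max_area

print(gini_index([5, 5, 5, 5]))             # 0.0 (minimum concentration)
print(gini_index([0, 0, 0, 10]))            # 1.0 (maximum concentration)
print(round(gini_index([1, 2, 3, 4]), 3))   # 0.333
```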

47
Q

When are the Pietra and Gini indices equal to zero?

A

When fi = qi for all i, that is, in the case of minimum concentration

48
Q

If high values of one variable tend to occur with high values of the other, what kind of linear association is there?

A

Positive

49
Q

If high values of one variable tend to occur with low values of the other, what kind of linear association is there?

A

Negative

50
Q

What is the formula for Pearson’s correlation index?

A

r = cov(X,Y) / (sx sy)
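A Python sketch of the formula, using divisor n for the covariance and both standard deviations (the divisor cancels, so n - 1 would give the same r). `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(x, y):
    """r = cov(X, Y) / (s_x * s_y), with divisor n throughout."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]           # hypothetical study hours
score = [52, 55, 61, 64, 70]      # hypothetical exam scores
print(round(pearson_r(hours, score), 3))  # close to +1: strong positive association
```

A negative r would indicate the opposite pattern, where high values of one variable go with low values of the other.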