Describing data Flashcards

1
Q

Micro data

A

Data collected on individual units

2
Q

Macro data

A

Collected on groups of units

3
Q

Population

A

The set of all statistical units of interest. Its size is denoted by N

4
Q

Sample

A

A subset of units drawn from the population. Its size is denoted by n

5
Q

Non-probability sampling

A

Units are drawn from the population according to the judgement of the researcher

6
Q

Probability sampling

A

Units are drawn from the population at random. This tends to make the sample representative of the population, because no part of the population is systematically favored

7
Q

Inferential process

A

Drawing conclusions about the entire population from the information contained in the sample.
A collection of techniques that use sample statistics to learn about population parameters

8
Q

Parameter

A

Numerical summary of a characteristic at the population level (all N units)

9
Q

Statistic

A

Numerical summary of a characteristic at the sample level (the n sampled units)

10
Q

Categorical values

A

Non-numerical values; they can be either nominal or ordinal

11
Q

Nominal categorical values

A

Non-numerical values that cannot be ranked

12
Q

Ordinal categorical values

A

Non-numerical values that can be ranked

13
Q

Numerical values

A

Numeric values; they can be either discrete or continuous

14
Q

Discrete Numerical value

A

Takes on a finite number of values, or an infinite but COUNTABLE number of values

15
Q

Numerical continuous values

A

Can take any value between two numbers (ex.: height and weight)

16
Q

How is a frequency distribution table composed (its columns)

A

LEFT: observed distinct values (classes/groups)
MIDDLE: absolute frequency (the number of observations with that value)
RIGHT: relative frequency (absolute frequency / n)
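
As a rough illustration, the three columns (distinct value, absolute frequency, relative frequency) can be built as follows. This is a minimal sketch; the function name `frequency_table` and the sample data are made up for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (distinct value, absolute frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    # One row per distinct value, sorted by value
    return [(value, count, count / n) for value, count in sorted(counts.items())]

# Hypothetical sample: number of siblings reported by 8 students
rows = frequency_table([0, 1, 1, 2, 1, 0, 3, 1])
for value, absolute, relative in rows:
    print(value, absolute, relative)
```

The relative frequencies always sum to 1, which is a quick check that the table was built correctly.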

17
Q

How can you represent a freq. table (not with intervals)?

A

Pie or bar chart

18
Q

How can you represent a freq. table (with intervals)?

A

Histogram

19
Q

How can you read a histogram?

A

HORIZONTAL AXIS: intervals. Over each interval there is a bar whose area equals the interval's relative frequency
VERTICAL AXIS: interval density

20
Q

How can you calculate an interval density?

A

Interval density = relative frequency / interval length
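
A small sketch of this calculation, assuming the intervals are given as (lower, upper) pairs. The function name and the numbers are hypothetical.

```python
def interval_densities(intervals, rel_freqs):
    """Density of each interval = relative frequency / interval length."""
    return [f / (upper - lower) for (lower, upper), f in zip(intervals, rel_freqs)]

# Made-up intervals of unequal length with their relative frequencies
intervals = [(0, 10), (10, 20), (20, 40)]
rel_freqs = [0.2, 0.5, 0.3]
densities = interval_densities(intervals, rel_freqs)

# Bar area = density * interval length, which recovers the relative frequency
areas = [d * (upper - lower) for d, (lower, upper) in zip(densities, intervals)]
```

Because each bar's area equals its relative frequency, the areas of all bars add up to 1, even when the intervals have different lengths.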

21
Q

The higher the number of intervals the ……….. is the degree of detail of the description

A

higher

22
Q

Mode

A

The level or value of a variable that is observed with the highest frequency = the most observed value

23
Q

What is the only measure of central tendency that can be used for nominal variables?

A

Mode

24
Q

Median

A

The central value of the distribution. It divides the sample in half

25
Q

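How do you find the median when the number of observations n is odd or even?

A

ODD: the observation in position (n+1)/2 of the ordered data
EVEN: either of the two middle observations (positions n/2 and n/2 + 1), or their arithmetic average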
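
The rule above can be sketched as follows. For the even case this sketch takes the average of the two middle observations, which is only one of the options the card allows.

```python
def median(values):
    """Median of a list of numbers, using the odd/even position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, converted to a 0-based index
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the observations in positions n/2 and n/2 + 1
    lower, upper = ordered[n // 2 - 1], ordered[n // 2]
    return (lower + upper) / 2

odd_example = median([7, 1, 5])       # ordered data: 1, 5, 7
even_example = median([4, 1, 3, 2])   # ordered data: 1, 2, 3, 4
```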

26
Q

How can we calculate the median from a frequency table?

A

It is the value at which the cumulative relative frequency reaches exactly 50% or, if no value does, the first value at which it exceeds 50%
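
A minimal sketch of this lookup, assuming the table is given as (value, absolute frequency) rows sorted by value. Counts are used instead of percentages to avoid rounding issues; the function name and data are made up.

```python
def median_from_table(rows):
    """rows: (value, absolute frequency) pairs, sorted by value."""
    n = sum(count for _, count in rows)
    cumulative = 0
    for value, count in rows:
        cumulative += count
        # First value whose cumulative frequency reaches or exceeds 50% of n
        if 2 * cumulative >= n:
            return value
    raise ValueError("empty frequency table")

table = [(0, 2), (1, 4), (2, 1), (3, 1)]   # hypothetical table, n = 8
table_median = median_from_table(table)
```

Here the cumulative percentages are 25%, 75%, 87.5% and 100%, so the median is the value 1, the first one to pass 50%.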

27
Q

Mean

A

Arithmetic average of all variable values. ONLY for numerical values

28
Q

Deviation

A

The difference between each observed value and the mean. Positive if higher than the mean and negative if lower

29
Q

How to calculate the mean from freq. distribution tables?

A

(value1 × frequency1 + value2 × frequency2 + …) / n
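
As a quick sketch of the weighted sum, assuming the table is a list of (value, absolute frequency) rows (the names and numbers are illustrative only):

```python
def mean_from_table(rows):
    """rows: (value, absolute frequency) pairs."""
    n = sum(count for _, count in rows)
    # Each value is weighted by how many times it was observed
    return sum(value * count for value, count in rows) / n

table = [(0, 2), (1, 4), (2, 1), (3, 1)]   # hypothetical table, n = 8
table_mean = mean_from_table(table)        # (0*2 + 1*4 + 2*1 + 3*1) / 8
```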

30
Q

Do outliers affect the mean? Why?

A

Yes, because it is computed using ALL the observed values

31
Q

Do outliers affect the median? Why?

A

No, because it depends only on the position of the observations in the ordered data (their cumulative frequencies), not on how extreme the values are

32
Q

What are the two measures of location and for what types of variables can they be used?

A

Quartiles and percentiles.

For ordinal categorical and numerical values

33
Q

What are the quartiles and what does each represent?

A

Q1 = the value below which approximately a quarter of the observations fall (25th percentile)
Q2 = the median (50th percentile)
Q3 = the value below which approximately three quarters of the observations fall (75th percentile)

34
Q

What does the pth percentile represent?

A

It is the value below which approximately p% of the cases fall

35
Q

What are the characteristics of a boxplot?

A

Lower edge = Q1
Upper edge = Q3
line = median
Whiskers = extend to the most extreme observations that lie within 1.5 × IQ range below Q1 and above Q3. Any value beyond the whiskers is an outlier

36
Q

How can we compute the interquartile (IQ) range?

A

IQ range = Q3 - Q1

37
Q

What is the 5-number summary?

A

The most extreme values in the data set (the minimum and maximum), the lower and upper quartiles, and the median
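
Cards 35 to 37 can be illustrated together. The sketch below assumes one particular quartile convention, Python's `statistics.quantiles` with `method="inclusive"`; other textbooks and software use slightly different rules and may give slightly different quartiles. The function names and data are made up.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    ordered = sorted(values)
    # Assumption: the "inclusive" quartile convention of the statistics module
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def boxplot_outliers(values):
    """Values lying more than 1.5 x IQ range below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical data with one extreme value
summary = five_number_summary(data)
outliers = boxplot_outliers(data)
```

With these data Q1 = 3 and Q3 = 7, so the IQ range is 4 and the upper fence is 7 + 1.5 × 4 = 13; the value 30 lies beyond it and is flagged as an outlier.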

38
Q

Bell-shaped distribution characteristics

A

Median - Q1 = Q3 - median
Q1 - min = max - Q3
Median - Q1 < Q1 - min
Mean = median

39
Q

Symmetric but NOT bell-shaped distribution characteristics

A

Median - Q1 = Q3 - median
Q1 - min = max - Q3
Median - Q1 > Q1 - min
Mean = median

40
Q

Skewed-right distribution characteristics

A

Median - Q1 < Q3 - median
Q1 - min < max - Q3
Frequencies are high on low values and low on high values (long right tail)

41
Q

Skewed-left distribution characteristics

A

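Median - Q1 > Q3 - median
Q1 - min > max - Q3
Frequencies are low on low values and high on high values (long left tail)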

42
Q

What are the four measures of variability?

A

Range, IQ range, variance and standard deviation

43
Q

What do the measures of variability show us?

A

How the frequencies are distributed across the values, and whether the units are spread uniformly across the values of the variable

44
Q

Range

A

Max - Min value

45
Q

Why is the range affected by outliers?

A

Because it uses only the two extreme values (the minimum and the maximum)

46
Q

What does the IQ range measure?

A

The spread of the central 50% of the distribution

47
Q

Variance

A

The average of the squared deviations from the mean

48
Q

What does the variance measure?

A

The dispersion or variability of a variable, as the spread around its mean.
It is NEVER negative
It equals 0 only if all values are equal

49
Q

Standard deviation

A

The square root of the variance

50
Q

Coefficient of variation

A

CV = sd/mean

NEEDED when comparing the variability of two variables that do not have the same mean
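
Cards 47 to 50 can be sketched in a few lines. This follows the cards' definition of variance as the plain average of squared deviations (dividing by n, not n - 1); the data are made up.

```python
from math import sqrt

def variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))

def coefficient_of_variation(values):
    """CV = sd / mean; only meaningful when the mean is not zero."""
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
var = variance(data)
sd = standard_deviation(data)
cv = coefficient_of_variation(data)
```

For these data the squared deviations sum to 32, so the variance is 32 / 8 = 4, the standard deviation is 2 and the CV is 2 / 5 = 0.4.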

51
Q

What does the concentration analysis of a variable assess?

A

It assesses how far the actual distribution is from the two extreme cases (perfectly equal distribution and maximum concentration)

52
Q

What does it mean for a variable to be highly concentrated?

A

That the distribution is very different from the perfectly equal distribution

53
Q

What does it mean for a variable to have low concentration?

A

That the distribution is very close to the perfectly equal distribution

54
Q

For what type of variable is a concentration analysis carried out?

A

Only for POSITIVE numerical values that have the property of TRANSFERABILITY

55
Q

Are biometric values, such as weight, transferable values?

A

No

56
Q

Are financial values transferable values?

A

Yes

57
Q

What is the name of the concentration curve?

A

The Lorenz curve

58
Q

What are the two variables needed for concentration and how to compute them?

A

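Fi and Qi, computed on the values sorted in ascending order
Fi = i/n (cumulative share of units)
Qi = (sum of the i smallest values) / (sum of all values) (cumulative share of the total)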
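
A minimal sketch of the two series, taking Qi as the cumulative share of the total held by the i smallest units, which is the reading consistent with cards 59 and 60. The function name and the incomes are hypothetical.

```python
from itertools import accumulate

def lorenz_points(values):
    """Return (F, Q): cumulative shares of units and of the total amount."""
    ordered = sorted(values)          # ascending order is required
    n = len(ordered)
    total = sum(ordered)
    F = [i / n for i in range(1, n + 1)]
    Q = [partial / total for partial in accumulate(ordered)]
    return F, Q

incomes = [40, 10, 30, 20]            # made-up incomes of 4 units
F, Q = lorenz_points(incomes)
```

Here F is 0.25, 0.5, 0.75, 1 and Q is 0.1, 0.3, 0.6, 1: the poorest half of the units holds 30% of the total. Plotting Q against F, starting from (0,0), gives the Lorenz curve.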

59
Q

If Fi and Qi, for each i, are the same, what type of concentration do we have?

A

Minimum concentration

60
Q

If Qi=0 for each i except Qn, which is 1, what type of concentration do we have?

A

Maximum concentration

61
Q

In the Lorenz curve, the closer the curve is to the horizontal axis the ……….. (greater/smaller) the concentration is

A

Greater

62
Q

In the Lorenz curve, the closer the curve is to the diagonal (the line of perfectly equal distribution) the ……….. (greater/smaller) the concentration is

A

Smaller

63
Q

What are the characteristics of the concentration/Lorenz curve?

A

Continuous, convex, and it passes through the points with coordinates (0,0) and (1,1)

64
Q

Gini index

A

Denoted by R = concentration area / maximum possible area

65
Q

What is the maximum area formula?

A

(n - 1) / (2n)
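
Putting cards 64 and 65 together, a rough sketch of R as the ratio of the two areas. It assumes the Lorenz curve is drawn as straight segments between the points (Fi, Qi), so the area under it is a sum of trapezoids; the function name is made up.

```python
def gini_index(values):
    """R = concentration area / maximum possible area, with max area = (n - 1) / (2n)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n < 2 or total <= 0:
        raise ValueError("need at least two values with a positive total")
    area_under_curve = 0.0
    previous_q = 0.0
    running = 0
    for x in ordered:
        running += x
        q = running / total
        # Trapezoid of width 1/n between two consecutive Lorenz points
        area_under_curve += (previous_q + q) / 2 * (1 / n)
        previous_q = q
    # Area between the diagonal (area 1/2 under it) and the Lorenz curve
    concentration_area = 0.5 - area_under_curve
    max_area = (n - 1) / (2 * n)
    return concentration_area / max_area

r_equal = gini_index([5, 5, 5, 5])        # perfectly equal distribution
r_max = gini_index([0, 0, 0, 100])        # one unit holds everything
r_mixed = gini_index([10, 20, 30, 40])
```

The equal case gives R = 0 and the one-unit-holds-everything case gives R = 1, matching the minimum and maximum concentration cards; the mixed example gives 1/3.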

66
Q

When is the Gini coefficient 1?

A

Max concentration

67
Q

When is the Gini coefficient 0?

A

Min concentration

68
Q

What is a bivariate association analysis? And how can it be done for categorical (or grouped numerical) values?

A

The study of the association between two variables. It can be done with a cross tab / contingency table

69
Q

What type of plot can be used to show conditional frequencies?

A

Stacked bar plot or side-by-side bar plot

70
Q

When do we have a positive linear association between numerical values?

A

When high/low values of one variable tend to occur with high/low values of the other variable

71
Q

When do we have a negative linear association between numerical values?

A

When high/low values of one variable tend to occur with low/high values of the other variable

72
Q

With what measures can we assess linear association?

A

Covariance and Pearson index

73
Q

How is the Pearson index calculated?

A

r = Cov(X, Y) / (Sx × Sy), where Sx and Sy are the standard deviations of X and Y
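
A short sketch of the calculation, dividing by n in both the covariance and the standard deviations (an assumption; dividing by n - 1 throughout gives the same r). The function names and data are illustrative only.

```python
from math import sqrt

def covariance(x, y):
    """Average product of the deviations of x and y from their means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)

def pearson(x, y):
    """r = Cov(X, Y) / (Sx * Sy); undefined if either variable is constant."""
    s_x = sqrt(covariance(x, x))   # Cov(X, X) is the variance of X
    s_y = sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4, 5]
r_up = pearson(x, [2, 4, 6, 8, 10])     # y rises exactly in line with x
r_down = pearson(x, [10, 8, 6, 4, 2])   # y falls exactly in line with x
```

The first pair lies on an increasing straight line and the second on a decreasing one, so r is +1 and -1 respectively (up to floating-point rounding).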

74
Q

What does the Pearson index measure?

A

The direction and strength of the linear association between two numerical variables. It takes on values between -1 and 1

75
Q

What do the following Pearson correlations indicate?

+1, -1, 0

A

+1 - perfect POSITIVE linear correlation
-1 - perfect NEGATIVE linear correlation
0 - no linear association

76
Q

If the median is closer to Q3 what is the distribution shape?

A

Left-skewed

77
Q

If the median is closer to Q1 what is the distribution shape?

A

Right skewed

78
Q

If the median is in the center (equidistant from Q1 and Q3) what is the distribution shape?

A

Symmetric (for example, bell-shaped)
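
Cards 76 to 78 amount to comparing the two halves of the box. A minimal sketch with a hypothetical function name; it only separates symmetric from skewed and does not check whether a symmetric distribution is actually bell-shaped.

```python
def shape_from_quartiles(q1, median, q3):
    """Classify the shape from the position of the median between Q1 and Q3."""
    lower_half = median - q1
    upper_half = q3 - median
    if lower_half > upper_half:
        return "left-skewed"    # median closer to Q3
    if lower_half < upper_half:
        return "right-skewed"   # median closer to Q1
    return "symmetric"          # median in the center

shape_a = shape_from_quartiles(10, 18, 20)
shape_b = shape_from_quartiles(10, 12, 20)
shape_c = shape_from_quartiles(10, 15, 20)
```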