Describing data Flashcards

1
Q

Micro data

A

Data collected on individual units

2
Q

Macro data

A

Data collected on groups of units (aggregates)

3
Q

Population

A

The set of all statistical units of interest. Its size is denoted by N

4
Q

Sample

A

A subset of units drawn from the population. Its size is denoted by n

5
Q

Non-probability sampling

A

Units are drawn from the population according to the judgement of the researcher

6
Q

Probability sampling

A

Units are drawn from the population at random. This tends to make the sample representative of the population, because no part of the population is systematically favored
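
As a minimal sketch of probability sampling, the snippet below draws a simple random sample without replacement using Python's standard library. The population of 100 unit ids, the sample size of 10, and the seed are all illustrative assumptions, not values from the cards.

```python
import random

# Hypothetical population: N = 100 unit ids (illustrative only).
population = list(range(1, 101))

# A fixed seed is used only so the sketch is reproducible.
rng = random.Random(42)

# Simple random sample of n = 10 units, drawn without replacement,
# so every unit has the same chance of being selected.
sample = rng.sample(population, k=10)
print(sample)
```

Simple random sampling is only one kind of probability sampling; it is used here because it is the easiest to show.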

7
Q

Inferential process

A

Drawing conclusions about the entire population from the information contained in the sample.
A collection of techniques that use sample statistics to learn about population parameters

8
Q

Parameter

A

Numerical summary of a characteristic at the population (N) level

9
Q

Statistic

A

Numerical summary of a characteristic at the sample (n) level

10
Q

Categorical values

A

Non-numerical values; they can be either nominal or ordinal

11
Q

Nominal categorical values

A

Non-numerical values that cannot be ranked

12
Q

Ordinal categorical values

A

Non-numerical values that can be ranked

13
Q

Numerical values

A

Numeric values; they can be either discrete or continuous

14
Q

Discrete Numerical value

A

Takes on a finite number of values, or an infinite but COUNTABLE number of values

15
Q

Numerical continuous values

A

Can take any value between two numbers (e.g. height and weight)

16
Q

How is a frequency distribution table composed (what are its columns)?

A

LEFT: observed distinct values (classes/groups)
MIDDLE: absolute frequency (the number of observations for each value)
RIGHT: relative frequency (absolute frequency / n)
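
A frequency table of this kind can be built in a few lines. The sketch below is illustrative only: the sample data and the helper name `frequency_table` are assumptions, not part of the cards.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (illustrative only).
observations = ["red", "blue", "red", "green", "red", "blue"]

def frequency_table(values):
    """Return rows of (distinct value, absolute frequency, relative frequency)."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, count / n) for value, count in sorted(counts.items())]

table = frequency_table(observations)
for value, absolute, relative in table:
    print(f"{value:<6} {absolute:>3} {relative:.3f}")
```

The absolute frequencies add up to n and the relative frequencies add up to 1, which is a quick sanity check on any table.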

17
Q

How can you represent a freq. table (not with intervals)?

A

Pie or bar chart

18
Q

How can you represent a freq. table (with intervals)?

A

Histogram

19
Q

How can you read a histogram?

A

HORIZONTAL AXIS: intervals; on each interval there is a bar whose area equals the interval's relative frequency
VERTICAL AXIS: interval density

20
Q

How can you calculate an interval density?

A

Relative frequency / interval length
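
The density formula can be checked with a short sketch. The intervals and counts below are made-up values, and `interval_densities` is a hypothetical helper name.

```python
def interval_densities(intervals, counts):
    """intervals: list of (lower, upper); counts: absolute frequency per interval."""
    n = sum(counts)
    densities = []
    for (lower, upper), count in zip(intervals, counts):
        relative_frequency = count / n
        # Density = relative frequency / interval length.
        densities.append(relative_frequency / (upper - lower))
    return densities

# Illustrative intervals of unequal length and their counts.
intervals = [(0, 10), (10, 20), (20, 50)]
counts = [2, 5, 3]
densities = interval_densities(intervals, counts)

# Bar area = density * length, which gives back the relative frequency.
areas = [d * (hi - lo) for d, (lo, hi) in zip(densities, intervals)]
print(densities, sum(areas))
```

Because each bar's area is its relative frequency, the areas sum to 1 even when the intervals have different lengths.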

21
Q

The higher the number of intervals the ……….. is the degree of detail of the description

A

higher

22
Q

Mode

A

The level or value of a variable that is observed with the highest frequency = the most observed value

23
Q

What is the only measure of central tendency that can be used for nominal variables?

A

Mode

24
Q

Median

A

The central value of the distribution. It divides the sample in half

25
How do you calculate the median when n is odd and when n is even?
ODD n: the observation in position (n+1)/2 of the sorted data
EVEN n: either of the two middle observations (positions n/2 and n/2 + 1), or their arithmetic average
26
How can we calculate the median from a frequency table?
It is the value at which the cumulative percentage reaches exactly 50% or, if no value hits 50% exactly, the first value whose cumulative percentage exceeds 50%
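
A minimal sketch of this rule, assuming the table is given as (value, absolute frequency) pairs sorted by value. The helper name and the data are illustrative.

```python
def median_from_frequency_table(table):
    """table: list of (value, absolute frequency), sorted by value."""
    n = sum(freq for _, freq in table)
    cumulative = 0
    for value, freq in table:
        cumulative += freq
        # First value whose cumulative relative frequency reaches 50%.
        if cumulative / n >= 0.5:
            return value
    raise ValueError("empty table")

# Illustrative table: value 1 appears twice, 2 three times, 3 five times.
even_case = median_from_frequency_table([(1, 2), (2, 3), (3, 5)])
odd_case = median_from_frequency_table([(1, 1), (2, 1), (3, 3)])
print(even_case, odd_case)
```

When n is even and the cumulative share hits 50% exactly, this sketch returns the lower of the two middle observations, which is one of the accepted choices; averaging the two middle values is the other.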
27
Mean
Arithmetic average of all variable values. ONLY for numerical values
28
Deviation
The difference between each observed value and the mean. Positive if higher than the mean and negative if lower
29
How do you calculate the mean from a frequency distribution table?
(value1*frequency1 + value2*frequency2 + ...) / n, where n is the total number of observations
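
The weighted sum can be written as a short function. This is a sketch with made-up data; `mean_from_frequency_table` is a hypothetical name.

```python
def mean_from_frequency_table(table):
    """table: list of (value, absolute frequency)."""
    n = sum(freq for _, freq in table)
    # Each value is weighted by how many times it was observed.
    return sum(value * freq for value, freq in table) / n

# Illustrative table: (1*2 + 2*3 + 3*5) / 10
table_mean = mean_from_frequency_table([(1, 2), (2, 3), (3, 5)])
print(table_mean)
```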
30
Do outliers affect the mean? Why?
Yes, because it is computed using ALL the observed values
31
Do outliers affect the median? Why?
No, because it depends only on the ordering of the observations (the cumulative frequencies), not on how extreme the largest or smallest values are
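
A small numerical check of the two cards above, using Python's `statistics` module on made-up data in which one value is replaced by an outlier:

```python
import statistics

# Illustrative data: the same sample with and without an extreme value.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

# The mean uses every value, so the outlier drags it upward.
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)

# The median only looks at the middle position, so it does not move.
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print(mean_shift, median_shift)
```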
32
What are the two measures of location and for what types of variables can they be used?
Quartiles and percentiles. | For ordinal categorical and numerical values
33
What are the quartiles and what does each represent?
Q1 = approximately a quarter of the observations are smaller (25th percentile)
Q2 = median (50th percentile)
Q3 = approximately three quarters of the observations are smaller (75th percentile)
34
What does the pth percentile represent?
It is the value such that approximately p% of the cases fall below it
35
What are the characteristics of a boxplot?
Lower edge of the box = Q1
Upper edge of the box = Q3
Line inside the box = median
Whiskers = extend to the most extreme observations within 1.5 x IQ range below Q1 and above Q3
Any value beyond the whiskers is an outlier
36
How can we compute the IQ range?
IQ range = Q3-Q1
37
What is the 5-number summary?
The most extreme values in the data set (the minimum and maximum), the lower and upper quartiles, and the median
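
The boxplot ingredients from the last few cards can be computed together. The sketch below uses `statistics.quantiles` with `method="inclusive"`; quartile conventions differ between textbooks and software, so that choice is an assumption, as are the data and the helper name `boxplot_summary`.

```python
import statistics

def boxplot_summary(values):
    """Five-number summary, IQ range, whisker ends and outliers (1.5 x IQR rule)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "five_number": (data[0], q1, median, q3, data[-1]),
        "iqr": iqr,
        # Whiskers end at the most extreme observations inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Illustrative data with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Here the value 30 lies above Q3 + 1.5 x IQ range, so it is reported as an outlier and the upper whisker stops at 8.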
38
Bell-shaped distribution characteristics
Median - Q1 = Q3 - Median
Q1 - Min = Max - Q3
Median - Q1 < Q1 - Min
Mean = Median
39
Symmetric but NOT bell-shaped distribution characteristics
Median - Q1 = Q3 - Median
Q1 - Min = Max - Q3
Median - Q1 > Q1 - Min
Mean = Median
40
Skewed-right distribution characteristics
Median - Q1 < Q3 - Median
Q1 - Min < Max - Q3
High frequencies on low values and low frequencies on high values
41
Skewed-left distribution characteristics
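Median - Q1 > Q3 - Median
Q1 - Min > Max - Q3
Low frequencies on low values and high frequencies on high values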
42
What are the four measures of variability?
Range, IQ range, variance and standard deviation
43
What do the measures of variability show us?
How the frequencies are distributed across the values, i.e. how spread out the units are over the range of the variable
44
Range
Max - Min value
45
Why is the range affected by outliers?
Because it only uses extreme values
46
What does the IQ range measure?
The spread of the central 50% of the distribution
47
Variance
The average of the squared deviations
48
What does the variance measure?
The dispersion or variability of a variable, as the spread around its mean. It is NEVER negative. If all values are equal, it is 0
49
Standard deviation
The square root of the variance
50
Coefficient of variation
CV = sd / mean | NEEDED when comparing the variability of two variables that do not have the same mean
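
The three measures above can be chained in a few lines. This sketch follows the cards' definition of variance as the average of the squared deviations (dividing by n, not n - 1); the data are made up.

```python
import math

def variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def standard_deviation(values):
    return math.sqrt(variance(values))

def coefficient_of_variation(values):
    mean = sum(values) / len(values)
    return standard_deviation(values) / mean

# Illustrative data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), standard_deviation(data), coefficient_of_variation(data))
```

For these data the squared deviations add up to 32, so the variance is 32 / 8 = 4, the standard deviation is 2 and the CV is 2 / 5 = 0.4.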
51
What does a concentration analysis of a variable assess?
How far the actual distribution is from the two extreme cases: the perfectly equal distribution and maximum concentration
52
What does it mean for a variable to be highly concentrated?
That the distribution is very different from the perfectly equal distribution
53
What does it mean for a variable to have low concentration?
That the distribution is very close to the perfectly equal distribution
54
For what type of variable is a concentration analysis carried out?
Only for POSITIVE numerical values that have the property of TRANSFERABILITY
55
Are biometric values, such as weight, transferable values?
No
56
Are financial values transferable values?
Yes
57
What is the name of the concentration curve?
The Lorenz curve
58
What are the two variables needed for concentration and how to compute them?
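Fi and Qi, computed on the units sorted in increasing order of their value
Fi = i/n (cumulative share of units)
Qi = (x1 + ... + xi) / (x1 + ... + xn) (cumulative share of the total amount held by the first i units)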
59
If Fi and Qi, for each i, are the same, what type of concentration do we have?
Minimum concentration
60
If Qi=0 for each i except Qn, which is 1, what type of concentration do we have?
Maximum concentration
61
In the Lorenz curve, the closer the curve is to the horizontal axis the ........... (greater/smaller) the concentration is
Greater
62
In the Lorenz curve, the closer the curve is to the vertical axis the ........... (greater/smaller) the concentration is
Smaller
63
What are the characteristics of the concentration/Lorenz curve?
Continuous, convex, and it passes through the points (0,0) and (1,1)
64
Gini index
Denoted by R = concentration area / maximum possible area
65
What is the formula for the maximum area?
(n-1)/(2n)
66
When is the Gini coefficient 1?
Max concentration
67
When is the Gini coefficient 0?
Min concentration
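
The Lorenz curve and the Gini index from the cards above can be sketched as follows. This assumes the usual reading in which Fi is the cumulative share of units and Qi is the cumulative share of the total held by the first i units, after sorting in increasing order. The helper names and data are illustrative, and the code assumes n >= 2 and a positive total.

```python
def lorenz_points(values):
    """Return the Lorenz curve points (Fi, Qi), starting from (0, 0)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(ordered, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini_index(values):
    """R = concentration area / maximum possible area, with max area (n-1)/(2n)."""
    points = lorenz_points(values)
    n = len(values)
    # Area under the Lorenz curve, using the trapezoid rule on each segment.
    area_under_curve = sum(
        (f1 - f0) * (q0 + q1) / 2
        for (f0, q0), (f1, q1) in zip(points, points[1:])
    )
    # Area between the diagonal (equal distribution) and the Lorenz curve.
    concentration_area = 0.5 - area_under_curve
    max_area = (n - 1) / (2 * n)
    return concentration_area / max_area

equal = gini_index([10, 10, 10, 10])       # everyone holds the same amount
concentrated = gini_index([0, 0, 0, 100])  # one unit holds everything
print(equal, concentrated)
```

With equal amounts the curve lies on the diagonal and R is 0; when one unit holds the whole total, the concentration area equals (n-1)/(2n) and R is 1.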
68
What is a bivariate association? And how can it be studied for numerical values?
The study of the association between two variables. It can be studied with a cross tab / contingency table
69
What type of plot can be used to show conditional frequencies?
Stacked bar plot or side-by-side bar plot
70
When do we have a positive linear association between numerical values?
When high/low values of one variable tend to occur with high/low values of the other variable
71
When do we have a negative linear association between numerical values?
When high/low values of one variable tend to occur with low/high values of the other variable
72
With what measures can we assess linear association?
Covariance and Pearson index
73
How is the Pearson index calculated?
r = Cov(X,Y) / (Sx * Sy), where Sx and Sy are the standard deviations of X and Y
74
What does the Pearson index measure?
The direction and strength of the linear correlation between two numerical variables. It takes on values between -1 and 1
75
What do the following Pearson correlations indicate? | +1, -1, 0
+1: perfect POSITIVE linear correlation
-1: perfect NEGATIVE linear correlation
0: no linear association
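
A minimal sketch of the covariance and the Pearson index, dividing by n to match the variance definition used in these cards. The function names and data are illustrative.

```python
import math

def covariance(xs, ys):
    """Average product of the deviations of X and Y from their means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def pearson(xs, ys):
    """r = Cov(X, Y) / (Sx * Sy)."""
    sx = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

xs = [1, 2, 3, 4, 5]
r_positive = pearson(xs, [2, 4, 6, 8, 10])   # Y increases linearly with X
r_negative = pearson(xs, [10, 8, 6, 4, 2])   # Y decreases linearly with X
r_none = pearson(xs, [1, 2, 3, 2, 1])        # symmetric up-and-down pattern
print(r_positive, r_negative, r_none)
```

The third series rises and falls symmetrically, so its covariance with X is 0 even though the two variables are clearly related, which is why r = 0 is read as "no linear association" rather than "no association".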
76
If the median is closer to Q3 what is the distribution shape?
Left-skewed
77
If the median is closer to Q1 what is the distribution shape?
Right-skewed
78
If the median is in the center (equally distant from Q1 and Q3), what is the distribution shape?
Symmetric (for example, bell-shaped)
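
The rule of thumb in the last three cards can be written as a small helper that compares the distance of the median from each quartile. The function name and the example quartiles are assumptions made for illustration.

```python
def shape_from_quartiles(q1, median, q3):
    """Classify the shape by where the median sits between Q1 and Q3."""
    lower_gap = median - q1
    upper_gap = q3 - median
    if lower_gap < upper_gap:
        return "right-skewed"   # median closer to Q1
    if lower_gap > upper_gap:
        return "left-skewed"    # median closer to Q3
    return "symmetric"          # median in the center

print(shape_from_quartiles(2, 3, 8))     # median closer to Q1
print(shape_from_quartiles(14, 19, 20))  # median closer to Q3
print(shape_from_quartiles(3, 5, 7))     # median in the center
```

With real data the two gaps are rarely exactly equal, so in practice "in the center" means approximately equal rather than identical.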