Week 2 Flashcards

1
Q

Define Descriptive / Summary Statistics (5)

A
  • A quantitative description of the main features of data
  • A useful summary
  • Before actual analysis
    • What are the players of the game?
    • What are the types of variables? (discrete, continuous, categorical, dummy)
  • Get a feel for your data (are there any problems?)
2
Q

What should you pay attention to in summary statistics? (4)

A
  • Min value
  • Max value
  • Negative values
  • Range
3
Q

What are the measures of central tendency? (3)

A

Mean, median, mode

4
Q

How do extreme values affect the mean, median, and mode?

A
  • Mean: Influenced by extreme values
  • Median: Relatively unaffected by extreme values
  • Mode: Not often affected by extreme values (unless there are identical outliers)
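
Worked example (a minimal sketch with made-up numbers, not part of the original card): replacing one value with an extreme value pulls the mean far away, while the median and mode stay where they were.

```python
from statistics import mean, median, mode

# Hypothetical data: five values, then the same list with the largest
# value replaced by an extreme value.
values = [20, 22, 22, 24, 26]
values_with_outlier = [20, 22, 22, 24, 500]

for label, data in [("original", values), ("with outlier", values_with_outlier)]:
    # Print all three measures of central tendency side by side.
    print(f"{label}: mean={mean(data)}, median={median(data)}, mode={mode(data)}")
```

Here the mean moves from 22.8 to 117.6, but the median and the mode are 22 in both lists.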
5
Q

Which measure of central tendency do you use for the Nominal Variable?

A

Mode:
- The numbers in nominal variables only refer to the category
- Calculating the mean would be pointless

6
Q

Which measure of central tendency do you use for the Ordinal Variable?

A

Median:
- A median split divides the data into two groups (a dichotomy); adding further cut points creates more categories

7
Q

Dichotomy

A

A division into 2 things that are represented as different or opposed. A median split of an ordinal variable creates a dichotomy (below vs. above the median).
Using the quartiles (Q1, median, Q3) of an ordinal variable would instead split the data into 4 categories.
Example:
x < Q1 = Small, Q1 < x < Median (Q2) = Small-Medium, Median (Q2) < x < Q3 = Middle-Large, x > Q3 = Large
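
Worked example (a sketch under assumptions): the scores below are invented, and values that fall exactly on a cut point are placed in the upper category, which the card does not specify. `quartile_category` is a hypothetical helper name.

```python
from statistics import quantiles

def quartile_category(x, q1, q2, q3):
    """Label x using the cut points from the card's example."""
    if x < q1:
        return "Small"
    if x < q2:
        return "Small-Medium"
    if x < q3:
        return "Middle-Large"
    return "Large"

scores = [1, 2, 3, 4, 5, 6, 7, 8]        # hypothetical ordinal scores
q1, q2, q3 = quantiles(scores, n=4)      # three cut points: Q1, median, Q3

# Four categories from the three quartile cut points.
labels = [quartile_category(x, q1, q2, q3) for x in scores]

# A median split alone gives a dichotomy (two groups).
median_split = ["Low" if x < q2 else "High" for x in scores]

print(labels)
print(median_split)
```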

8
Q

Which measure of central tendency do you use for the Interval (scale) or Ratio Variables?

A

Mean or Median:
Depending on the skewness, this would indicate which central tendency to go for.
Not skewed –> Mean
Skewed –> Median

9
Q

Define Skewness

A
  • Describes the shape of the distribution –> Symmetry
  • Deviation from the normal bell-shape –> (a)symmetry of a distribution
  • Skewness = 0 –> Symmetric, Skewness ≠ 0 –> Asymmetric (skewed)
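
Worked example (a sketch, not part of the original card): the card gives no formula, so this assumes the common moment-based definition, skewness = m3 / m2^1.5, where m2 and m3 are the second and third central moments. Statistical packages may apply a small-sample correction and report slightly different values.

```python
def skewness(data):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mu) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]          # hypothetical data, no skew
right_tail = [1, 2, 3, 4, 20]        # one large value stretches the right tail
left_tail = [-20, -4, -3, -2, -1]    # mirror image, long left tail

for name, data in [("symmetric", symmetric), ("right tail", right_tail), ("left tail", left_tail)]:
    print(name, round(skewness(data), 3))
```

The symmetric list gives 0, the list with a long right tail gives a positive value, and its mirror image gives a negative value.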
10
Q

What is the name for when the skewness values go outside the -1 to +1 range?

A

Substantially skewed

11
Q

What kind of skew is a distribution with a longer right tail?

A

Positively skewed

12
Q

What is a negatively skewed distribution?

A

A distribution which has a longer tail to the left

13
Q

Kurtosis

A
  • Kurtosis describes the degree to which values are found at the tails of the distribution (compared to a normal distribution)
  • Can also be seen as how pointy a distribution is (peakedness or flatness)
  • It is important to mention whether a distribution has heavy (leptokurtic) or light (platykurtic) tails.
14
Q

Leptokurtic (kurtosis)

A
  • This is where more values are found in the tails than in a normal distribution (heavy-tailed), and the distribution is typically more pointy
  • Kurtosis > 3
  • Think of “lep”tokurtic as “leap” –> Tends to be more pointy (leaping upwards)
15
Q

Platykurtic (kurtosis)

A
  • This is where fewer values are found in the tails than in a normal distribution (light-tailed), so the distribution is more rounded and flatter.
  • Kurtosis < 3
  • Think of “plat”-ykurtic as in “platform”- this is where the distribution is more rounded and flatter.
16
Q

Mesokurtic (kurtosis)

A

This is a normal distribution.
–> Kurtosis = 3
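
Worked example (a sketch, not part of the original cards): this assumes the moment-based definition kurtosis = m4 / m2^2, which is the version where a normal distribution scores 3. Some software reports "excess kurtosis" (this value minus 3) instead. The data and the helper name `classify_kurtosis` are made up for illustration.

```python
def kurtosis(data):
    """Moment-based (population) kurtosis: m4 / m2**2 (normal = 3)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mu) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

def classify_kurtosis(k):
    """Apply the thresholds from the cards: > 3, < 3, = 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                             # evenly spread values
heavy_tails = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # mostly central, two far-out values

for name, data in [("flat", flat), ("heavy tails", heavy_tails)]:
    k = kurtosis(data)
    print(name, round(k, 2), classify_kurtosis(k))
```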

17
Q

What is the interquartile range?

A

This is a measure of variability: IQR = Q3 − Q1, the spread of the middle 50% of the data.
Q1: Lower Quartile (P25, 25%)
Q2: Median (P50, 50%)
Q3: Upper Quartile (P75, 75%)
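
Worked example (a sketch with invented data): Python's `statistics.quantiles` returns the three quartile cut points, and the interquartile range is the distance from Q1 to Q3. Quartile conventions differ between packages; this uses the function's default "exclusive" method, so other software may give slightly different quartiles for the same data.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 7, 9, 10, 12]     # hypothetical observations

# n=4 asks for quartiles: the result is [Q1, Q2 (median), Q3].
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                         # spread of the middle 50% of the data

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```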

18
Q

What are the measures of variability?

A

Variance: The average of the squared differences between each data point and the mean.
Standard Deviation: The square root of variance
High standard deviation –> High dispersion of data points
Low standard deviation –> Low dispersion of data points
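
Worked example (a sketch with invented data): the definition above divides by the number of data points, which matches `pvariance` and `pstdev` (the population versions) in Python's `statistics` module. The sample versions, `variance` and `stdev`, divide by n − 1 instead and give slightly larger values.

```python
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical observations

mu = mean(data)
# Variance by hand: the average of the squared differences from the mean.
by_hand = sum((x - mu) ** 2 for x in data) / len(data)

print("mean:", mu)
print("variance (by hand):", by_hand)
print("variance (pvariance):", pvariance(data))
print("standard deviation (pstdev):", pstdev(data))
```

For this list the mean is 5, the variance is 4, and the standard deviation is 2.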

19
Q

Correlation significance

A

Whether the correlation in the population is significantly different from zero.

20
Q

Directionality problem

A

A and B can be correlated because A causes B or because B causes A; the correlation alone does not show which direction applies.

21
Q

Third variable problem

A

A and B can be correlated not because A causes B or B causes A, but because some unmeasured third variable, C, causes both A and B.

22
Q

What is the correlation coefficient?

A

This is a value between -1 and +1 that measures the association between two variables, but NOT causation.
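
Worked example (a sketch, not part of the original card): this assumes the coefficient meant here is Pearson's r, computed by hand from deviations around the means. The data are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]               # hypothetical variable A
scores = [2, 4, 5, 4, 5]              # hypothetical variable B

print(round(pearson_r(hours, scores), 3))                 # positive association
print(round(pearson_r(hours, [10, 8, 6, 4, 2]), 3))       # perfectly negative
```

The result always lies between -1 and +1; the sign gives the direction of the association and says nothing about which variable, if either, causes the other.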

23
Q

y(i) = β(0) + β(1)x(i) + ε(i)
Define these values

A

y = Dependent variable
x = Independent variable
β(0) = Intercept / Constant
β(1) = Slope or regression coefficient for the variable ‘x’.
ε = Error term –> Everything that the model does not take into account

24
Q

What does it mean to measure the coefficient of a specific variable in a multiple regression equation? (3)

A

The coefficient of each independent variable:
- Indicates change in dependent variable
- When the given independent variable changes
- But keeping all other independent variables constant (important assumption)

25
Q

What is OLS?

A

Ordinary Least Squares (OLS):
- This is a method for estimating the LINEAR relationship between x and y.
- It finds the line of best fit by minimizing the sum of squared differences between the observed and predicted values of y.
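
Worked example (a sketch for the one-regressor case only, with invented data): with a single x, the least-squares slope and intercept have a closed form, slope = Sxy / Sxx and intercept = mean(y) − slope × mean(x). The name `ols_fit` is hypothetical; real analyses would normally use a statistics package.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line for one regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

xs = [1, 2, 3, 4, 5]                  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]                  # hypothetical dependent variable

intercept, slope = ols_fit(xs, ys)
# Residuals are the estimated error terms: observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("residuals:", [round(r, 3) for r in residuals])
```

For these numbers the fitted line is roughly y = 2.2 + 0.6x, and the residuals sum to zero, as they always do when the model includes an intercept.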

26
Q

What do you look for in regression coefficients? (3)

A
  • Sign: The nature of the effect (positive/negative)
  • Size: of the effect (magnitude)
  • Significance: of the effect (is the coefficient statistically different from zero?)
27
Q

Give the common p-value thresholds and state when a variable is considered insignificant.

A

○ p value < 0.01 → significance at 1% level
○ p value < 0.05 → significance at 5% level
○ p value < 0.1 → significance at 10% level
○ p value > 0.1 → generally considered insignificant
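
Worked example (a sketch, not part of the original card): the thresholds above written as a lookup. `significance_level` is a hypothetical helper, and treating a p-value of exactly 0.1 as insignificant is an assumption, since the card does not cover that boundary.

```python
def significance_level(p_value):
    """Map a p-value to the strictest conventional level it passes."""
    if p_value < 0.01:
        return "significant at the 1% level"
    if p_value < 0.05:
        return "significant at the 5% level"
    if p_value < 0.1:
        return "significant at the 10% level"
    return "generally considered insignificant"

for p in [0.003, 0.03, 0.08, 0.4]:    # hypothetical p-values
    print(p, "->", significance_level(p))
```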

28
Q

What is a regression coefficient?

A

This is an estimate of the amount of change in y (DV) due to a one-unit change in x(i), while the other regressors (IVs) are held constant.
i –> In this case refers to the independent variable we are studying.

29
Q

What is a dummy variable?

A

Dummy takes the value 1 (true) or 0 (not true) –> Indicates whether a condition is satisfied.
Generally for ‘m’ categories, you should include ‘m-1’ dummies.
The omitted one is the base/reference category.
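
Worked example (a sketch with an invented variable): a categorical variable with m = 3 categories becomes m − 1 = 2 dummies, and the omitted category is the reference. `make_dummies` and the `is_` column names are hypothetical.

```python
def make_dummies(values, reference):
    """Encode a categorical variable as m-1 dummies, omitting `reference`."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != reference]
    # One dict per observation: 1 if the condition holds, otherwise 0.
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

regions = ["north", "south", "west", "south"]    # hypothetical data, m = 3
dummies = make_dummies(regions, reference="north")

for region, row in zip(regions, dummies):
    print(region, row)
```

An observation in the reference category ("north" here) has 0 on every dummy.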

30
Q

How do you interpret regression results for a dummy variable?

A

The same as any other variable:
- Sign: positive/negative.
- Size: the magnitude.
- Significance: depending on the p-value.
It is important to keep in mind the unit of measurement and how this may impact the dependent variable.

31
Q

How can you predict the maximum number of observations in an analysis?

A

The maximum number of observations in an analysis is always limited by the variable with the lowest number of observations.
–> When the researcher presents a regression table (model 1, 2, 3, 4, etc.), the number of observations at the bottom should not exceed that of the variable with the lowest number of observations in the descriptive statistics table.
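
Worked example (a sketch under assumptions): this assumes the regression drops every row with a missing value in any of its variables (listwise deletion). The rows and variable names are invented.

```python
# Hypothetical dataset; None marks a missing value.
rows = [
    {"y": 3.1, "x1": 1.0, "x2": 0.5},
    {"y": 2.7, "x1": None, "x2": 0.9},
    {"y": 4.0, "x1": 2.0, "x2": None},
    {"y": 3.6, "x1": 1.5, "x2": 0.7},
    {"y": None, "x1": 1.2, "x2": 0.4},
]
variables = ["y", "x1", "x2"]

# Observations per variable, as a descriptive statistics table would report.
non_missing = {v: sum(row[v] is not None for row in rows) for v in variables}

# Rows usable in a regression on all three variables.
complete_cases = [row for row in rows if all(row[v] is not None for v in variables)]
n_regression = len(complete_cases)

print(non_missing)
print("observations in regression:", n_regression)
```

The lowest per-variable count is an upper bound: here each variable has 4 observations, but only 2 rows are complete, so the regression can end up with fewer.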

32
Q

Why do regression tables have different Models?

A

This allows the researcher to separate the effect of the control variables from the main effect(s).

33
Q

How do you interpret a dummy variable in a control group on a regression table?

A
  • A set of dummy variables used as controls always leaves out 1 category as the reference, to avoid the dummy variable trap.
  • When interpreting a dummy variable from that set, the coefficient represents a comparison between that specific category and the reference category.
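
Worked example (a sketch for the simplest case, with invented data): in a regression of y on a single dummy and nothing else, the intercept equals the mean of the reference group and the dummy's coefficient equals the difference between the two group means. With additional regressors the coefficient is still a comparison with the reference category, but holding the other variables constant.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line for one regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

dummy = [0, 0, 0, 1, 1, 1]                # 0 = reference category, 1 = other category
outcome = [10, 12, 14, 15, 17, 19]        # hypothetical dependent variable

intercept, coefficient = ols_fit(dummy, outcome)

# Group means computed directly, for comparison with the regression output.
mean_reference = sum(y for d, y in zip(dummy, outcome) if d == 0) / dummy.count(0)
mean_other = sum(y for d, y in zip(dummy, outcome) if d == 1) / dummy.count(1)

print("intercept:", intercept, "reference mean:", mean_reference)
print("coefficient:", coefficient, "difference in means:", mean_other - mean_reference)
```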