Week 1 Flashcards
Dependent variable
What is being affected (changes as a result of the independent variable)
Independent variable
The variables which affect/predict the dependent variable. (Often what is changed)
Validity
The extent to which a measure correctly represents the concept of study.
Accuracy
How close the measurement is to the actual value (e.g. 1 meter compared to 1.4573 meters).
Reliability
The extent to which a measure gives consistent results for what it is intended to measure (replicability).
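A minimal sketch of the difference between accuracy and reliability, assuming two hypothetical instruments that measure an object whose true length is 1 meter. All readings and the names `instrument_a`, `instrument_b`, `accuracy_error` and `spread` are made up for illustration.

```python
from statistics import mean, stdev

TRUE_LENGTH_M = 1.0  # assumed true value of the object being measured

# Hypothetical repeated readings from two instruments (made-up numbers).
instrument_a = [1.41, 1.40, 1.42, 1.41, 1.40]  # consistent, but far from 1.0
instrument_b = [0.90, 1.12, 0.97, 1.08, 0.93]  # scattered, but centred near 1.0


def accuracy_error(readings, true_value):
    """Absolute gap between the average reading and the true value."""
    return abs(mean(readings) - true_value)


def spread(readings):
    """Sample standard deviation: smaller means more consistent readings."""
    return stdev(readings)


for name, readings in [("A", instrument_a), ("B", instrument_b)]:
    print(name, round(accuracy_error(readings, TRUE_LENGTH_M), 3), round(spread(readings), 3))
```

In this sketch instrument A is reliable but not accurate, while instrument B is accurate on average but less reliable.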
Internal validity
How well the (specific, individual) study was carried out, i.e. whether its design and analysis support the conclusions drawn within that study.
External validity
Generalizability of results.
Cross-sectional data
- Many subjects at a given point in time (people, households, countries)
- E.g. –> Profits across firms in China in 2020.
Time series data
- Same single subject over a given period of time
- E.g. –> Profits of firm A between 2000 and 2003.
Panel (longitudinal) data
- Multiple subjects, different observations for these subjects over a period of time.
–> Think of it as a mix of cross-sectional + time series data.
–> E.g. profits across Chinese firms over the period 2000-2003.
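The three data types can be sketched as different slices of one table. This is only an illustration: the firms, years and profit figures below are made up, and the names `records`, `cross_section`, `time_series` and `panel` are hypothetical.

```python
# Hypothetical profit records: (firm, year, profit). All values are made up.
records = [
    ("A", 2000, 10), ("A", 2001, 12), ("A", 2002, 11), ("A", 2003, 15),
    ("B", 2000, 7), ("B", 2001, 9), ("B", 2002, 8), ("B", 2003, 10),
    ("C", 2000, 20), ("C", 2001, 18), ("C", 2002, 21), ("C", 2003, 25),
]

# Cross-sectional: many firms, one point in time.
cross_section = [(firm, profit) for firm, year, profit in records if year == 2003]

# Time series: one firm, many points in time.
time_series = [(year, profit) for firm, year, profit in records if firm == "A"]

# Panel: many firms, each observed at many points in time (the full table).
panel = {}
for firm, year, profit in records:
    panel.setdefault(firm, {})[year] = profit

print(cross_section)
print(time_series)
print(panel["B"][2001])
```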
Primary data
Data collected by the researcher
Secondary data
Data collected by other agencies –> financial statement data, previous surveys, etc…
Selection bias
The sample is not random and may not represent the population being studied.
This affects how you should interpret a paper or its data.
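A small simulation can show why a non-random sample misrepresents the population. This is a sketch under stated assumptions: the population is simulated (10,000 profits from a normal distribution), and the "biased" sample assumes that only the most profitable firms end up in the data.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 firm profits drawn from a normal distribution.
population = [random.gauss(100, 20) for _ in range(10_000)]

# Random sample: every firm has the same chance of being selected.
random_sample = random.sample(population, 500)

# Non-random sample: only the 500 most profitable firms are selected,
# for example because only successful firms answered the survey.
biased_sample = sorted(population)[-500:]

print(round(mean(population), 1))
print(round(mean(random_sample), 1))
print(round(mean(biased_sample), 1))
```

The random sample's mean stays close to the population mean, while the non-random sample's mean sits far above it.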
What are the 4 levels of measurement?
Nominal, Ordinal, Interval (scale), Ratio
Name the types of categorical variables (2)
- Nominal Variables
- Ordinal Variables
Nominal Variables (3)
- These are data measurements where the values represent a category.
- No ranking or order
- No equal or defined distance between values:
-> The distance between 1 and 2 is not comparable to the distance between 2 and 3, because the numbers are only labels.
Examples: gender, hair color, student nationality, binary variables (yes = 1, no = 0)
Dummy Variable Trap (2)
- If a categorical variable can take on ‘k’ different values, then you should only create ‘k-1’ dummy variables to use in the regression model.
- The dummy variable trap occurs when the researcher includes all ‘k’ dummies alongside an intercept: the dummies are then perfectly collinear with the intercept, so the coefficients cannot be estimated separately.
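A minimal sketch of ‘k-1’ dummy coding, assuming a hair color variable with k = 3 categories. The helper `make_dummies` and its `drop_first` parameter are hypothetical names written for this example, not a library function.

```python
def make_dummies(values, drop_first=True):
    """Turn a categorical column into 0/1 dummy columns.

    With drop_first=True the first category (alphabetically) becomes the
    reference category and gets no column, leaving k-1 dummies.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return {cat: [1 if v == cat else 0 for v in values] for cat in kept}


# Hypothetical data: hair color with k = 3 categories.
hair = ["black", "blond", "brown", "blond", "black"]

k_minus_1 = make_dummies(hair)                # 2 columns: blond, brown
all_k = make_dummies(hair, drop_first=False)  # 3 columns: the trap

# With all k dummies, the columns add up to 1 in every row, which is
# exactly the intercept column, so the regressors are perfectly collinear.
row_sums = [sum(col[i] for col in all_k.values()) for i in range(len(hair))]
print(k_minus_1)
print(row_sums)
```

Here "black" is the reference category: a row with 0 in both remaining dummies is a black-haired observation.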
Ordinal Variables (2)
- These are categories with a logical order.
- There is still no ‘equal’ distance between categories.
Examples: Product quality rating (1 = poor, 2 = average, 3 = good)
Name the types of quantitative variables (2)
- Interval (scale) variables
- Ratio variables
Interval (scale) variables (3)
- There is information about differences between points on a scale
(the values or numbers for each data point have a numerical meaning)
- Equal intervals represent equal distances (scaled)
- No absolute 0 –> Zero does not mean ‘none’ of the quantity, so the scale can take negative values
Example: Temperature in Celsius (you can find negative temperatures)
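A short worked example of why the missing absolute zero matters: differences in Celsius are meaningful, but ratios are not. The Kelvin conversion is brought in here only for comparison (it is not part of the flashcards), and the temperatures are made up.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has an absolute zero, so it behaves as a ratio scale."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0  # made-up temperatures in Celsius

# Differences are meaningful on an interval scale: 10 degrees warmer.
difference = warm_c - cold_c

# Ratios are not: 20 / 10 = 2 in Celsius, but this does not mean "twice as hot".
naive_ratio = warm_c / cold_c

# On the Kelvin scale, which has a true zero, the ratio is far smaller.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```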
Ratio Variables (5)
- Equal intervals in data represent equal differences.
- There is an absolute zero–> No negative values.
- Ratio variables are either:
- ‘continuous’ (measured, infinite, with decimals) or
- ‘discrete’ (counted, integers).
Example: Weight, height, number of people, money earned
–> In these examples, you do not get negative values
Describe the research process: Testing Hypothesis (4)
- Identify and define variables
  - Dependent Variable
  - Independent Variable(s)
- Collect Data
  - Measurement
- Analyze data
  - Graphically & Descriptively
  - Fit a model –> Regression
- Conclude, discuss
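The four steps above can be sketched end to end on a toy dataset. This is an illustration only: the variables (advertising spend and sales), the numbers, and the helper `fit_line` are assumptions, and the model is a simple one-variable least-squares regression.

```python
from statistics import mean, stdev

# 1. Identify and define variables (hypothetical example, made-up data).
#    Independent variable: advertising spend; dependent variable: sales.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

# 2. Collect data: the lists above stand in for the measured values.

# 3a. Analyze descriptively.
print("mean sales:", round(mean(sales), 2), "sd:", round(stdev(sales), 2))


# 3b. Fit a model: simple linear regression by ordinary least squares.
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(advertising, sales)

# 4. Conclude: a positive slope suggests sales rise with advertising here.
print("intercept:", round(intercept, 2), "slope:", round(slope, 2))
```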
When measuring a dependent variable, there are often different ways of measuring that variable (such as performance: financial, operational, etc…).
How can you determine what type(s) of the DV you should include? (3)
- Type of data source (primary vs secondary)
- Type of measure (relevant to the study?)
- Level of analysis (continent? country? city? company?)
External validity may also play a role–> Generalizability may be attractive to some researchers.
How do you determine the type of data source?
- What do you want to measure?
- What kind of data to use:
  - Primary
  - Secondary
How do you identify the type of measure for a study? (i.e. performance)
Is the researcher measuring a variable that is relevant to the research question of the study?
Define Endogeneity and explain its 3 causes:
Endogeneity is present in a regression if an independent variable is correlated with the error term.
This can be due to:
1. Measurement error (in x): there is a difference between the actual value and the study's measure.
2. Omitted variable: a key independent variable is not included in the regression, so its influence ends up in the error term.
3. Reverse causality: the dependent variable may also influence the independent variable.
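The omitted variable case can be illustrated with a small simulation. This is a sketch under stated assumptions: the data-generating process (true effect of x on y equal to 2.0, an omitted variable z with effect 3.0) and the helper `ols_slope` are made up for this example.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical data-generating process (all names and numbers are assumptions):
# z is the omitted variable, x depends on z, and y depends on both.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [2.0 * xi + 3.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]


def ols_slope(x_vals, y_vals):
    """Slope from regressing y on x alone (simple least squares)."""
    x_bar, y_bar = mean(x_vals), mean(y_vals)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x_vals, y_vals))
    sxx = sum((a - x_bar) ** 2 for a in x_vals)
    return sxy / sxx


# z is left out, so its effect sits in the error term, which is now
# correlated with x. The estimated slope drifts away from the true 2.0.
biased_slope = ols_slope(x, y)
print(round(biased_slope, 2))
```

Under these assumptions the estimate lands near 3.5 rather than 2.0, because x picks up part of the effect of the omitted z.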
How do you determine the level of analysis for a study?
- This also depends on the research question / focus of the study.
- Researchers can use different levels of analysis to have a more robust analysis.
- It is important to be clear about the level of analysis you used in the interpretations and conclusion.
Define selection bias
- This is a bias which occurs in the process of selecting samples or data which have not been adequately randomized.
- If the sample/data that is selected is not appropriately randomized, the sample/data may not be an accurate representation of the population.
- This inaccurate representation of the population may lead to invalid results and conclusions.
How can you (statistically) check if your sample accurately represents the population for quantitative data (1) and categorical data (1)?
- Quantitative Data: t-test
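- Using the t-test formula: t = (Sample mean - Population mean) / (s / root(n))
- (s / root(n)) = Standard Error.
- p-value: The probability of seeing a difference at least this large between the sample mean and the population mean if the sample really does come from that population.
–> Therefore: If the p-value is high (e.g. above 0.05), there is no significant difference and the sample is consistent with being representative; a low p-value signals that the sample differs from the population.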
- Categorical Data: Chi-Squared test
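A minimal sketch of both checks using only the standard library. The sample values, the population mean and the population shares are made up, and instead of computing a p-value the sketch compares each statistic with a 5% critical value taken from standard tables. The null hypothesis in both tests is that the sample matches the population, so a statistic below the critical value means no significant difference was found.

```python
from math import sqrt
from statistics import mean, stdev

# --- Quantitative data: one-sample t statistic ---
# Hypothetical sample of 10 ages and an assumed known population mean.
sample_ages = [34, 29, 41, 38, 30, 36, 33, 40, 35, 34]
population_mean = 36.0

n = len(sample_ages)
standard_error = stdev(sample_ages) / sqrt(n)  # s / root(n)
t_stat = (mean(sample_ages) - population_mean) / standard_error

# Two-sided 5% critical value for n - 1 = 9 degrees of freedom (t table).
T_CRITICAL = 2.262
t_differs = abs(t_stat) > T_CRITICAL

# --- Categorical data: chi-squared goodness-of-fit statistic ---
# Hypothetical sample counts per region and assumed population shares.
observed = {"north": 28, "south": 42, "west": 30}
population_share = {"north": 0.30, "south": 0.40, "west": 0.30}

total = sum(observed.values())
chi2 = sum(
    (observed[g] - total * population_share[g]) ** 2 / (total * population_share[g])
    for g in observed
)

# 5% critical value for (3 categories - 1) = 2 degrees of freedom.
CHI2_CRITICAL = 5.991
chi2_differs = chi2 > CHI2_CRITICAL

print(round(t_stat, 3), t_differs)
print(round(chi2, 3), chi2_differs)
```

In this made-up example neither statistic exceeds its critical value, so neither test finds a significant difference between the sample and the population.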