Lecture 5 Flashcards by Hendrik Diepers

In data sets, rows capture

Observations (e.g. on consumers or firms)

Columns display

Variables. A variable can take on different values for different subjects

Dummy variables

Variables that only take on the values 0 or 1
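
As a minimal sketch, a yes/no answer can be recoded into a dummy variable as shown here. The variable name `owns_dog` and the example answers are hypothetical and only serve as illustration.

```python
# Hypothetical raw answers to a yes/no survey question.
owns_dog = ["yes", "no", "no", "yes", "yes"]

# Recode into a dummy variable: 1 if the answer is "yes", 0 otherwise.
owns_dog_dummy = [1 if answer == "yes" else 0 for answer in owns_dog]

print(owns_dog_dummy)  # [1, 0, 0, 1, 1]
```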

Codebook

A list of all the codes used in a dataset

The variables in your data set need to match the unit of analysis in a study. Specifically:

The dependent variable is measured at the level of the unit of analysis. So are mediator variables

Independent and moderator variables are measured at the level of the unit of analysis or at a more aggregate level

Population

Entire group of people, firms, events, or things of interest for which you would like to make inferences

Sample

A subset of the population of interest

Why use samples in the first place?

It is usually impossible or impractical to study the entire population

The sampling process consists of the following steps

1) Define the population you are interested in
2) Determine the sampling frame. The sampling frame is the physical representation of the population through which one can reach that population
3) Decide on the sampling design

How to define the target population and choose the sampling frame

1) Define the target population (e.g. students at TiSEM, employees at Philips)

2) Determine the sampling frame

- Physical representation of the target population (e.g. students at TiSEM -> TiSEM student database)

3) Determine the sampling design

Coverage error

The sampling frame does not match the population (sampling frame ≠ population)

Undercoverage

True population members are excluded

Miscoverage

Non-population members are included
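
Both kinds of coverage error can be sketched as set differences between the population and the sampling frame. The names in the two sets are hypothetical. In practice the population follows from the research question and the frame is the list actually used to reach people.

```python
# Hypothetical identifiers, for illustration only.
population = {"anna", "ben", "carla", "dev", "eva"}
sampling_frame = {"anna", "ben", "carla", "frank"}

# Undercoverage: true population members that are missing from the frame.
undercovered = population - sampling_frame

# Miscoverage: frame entries that are not members of the population.
miscovered = sampling_frame - population

print(sorted(undercovered))  # ['dev', 'eva']
print(sorted(miscovered))    # ['frank']
```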

Solution to coverage error

If small, recognize but ignore
If large, redefine the population in terms of the sampling frame

Probability sampling

Each element of the population has a known chance of being selected as a subject

Results generalizable to population

More time and resource intensive

Nonprobability sampling

The elements of the population do not have a known chance of being selected as a subject

Less time and resource intensive

Results not generalizable to population

Probability sampling: simple random sampling

Each population element has an equal chance of being chosen

Highest generalizability, but costly
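
A minimal sketch of simple random sampling, assuming the sampling frame is available as a list. The frame of 100 numbered elements, the sample size of 10, and the fixed seed are hypothetical choices made only for illustration.

```python
import random

# Hypothetical sampling frame of 100 numbered population elements.
frame = list(range(1, 101))

# Fixed seed only so that the sketch is reproducible.
rng = random.Random(42)

# Draw 10 elements without replacement; every element is equally likely.
sample = rng.sample(frame, k=10)

print(sample)
```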

Systematic sampling

Select a random starting point, then pick every nth element (e.g. every third, starting from person 5)

Simplicity (adds a degree of system or process)

Low generalizability if there happens to be a systematic difference between every nth observation
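
Systematic sampling can be sketched roughly as follows. The helper `systematic_sample` is hypothetical, and restricting the random start to the first n elements is an assumption (a common variant), not something the card specifies.

```python
import random

def systematic_sample(frame, step, rng):
    """Pick a random start among the first `step` elements, then every `step`-th element."""
    start = rng.randrange(step)
    return frame[start::step]

frame = list(range(1, 31))   # hypothetical frame of 30 numbered elements
rng = random.Random(0)       # fixed seed only for reproducibility
sample = systematic_sample(frame, step=3, rng=rng)

print(sample)
```

If the frame happens to be ordered in a cycle of the same length as the step, every selected element shares the same position in that cycle, which is the generalizability risk named on the card.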

Stratified sampling (probability sampling)

Divide the population into meaningful (homogeneous) groups, then apply SRS (simple random sampling) within each group

All groups are adequately sampled, allowing for group comparisons

More time-consuming and requires homogeneous subgroups
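
A rough sketch of stratified sampling, assuming each unit carries a known group label. The helper `stratified_sample`, the student data, and the equal sample size per stratum are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw a simple random sample within each group."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population of (student_id, programme) pairs.
students = [(i, "bachelor" if i % 3 else "master") for i in range(1, 61)]

rng = random.Random(1)  # fixed seed only for reproducibility
sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           n_per_stratum=5, rng=rng)

for programme, members in sample.items():
    print(programme, members)
```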

Cluster sampling

Divide the population into heterogeneous groups, randomly select a number of groups, and select each member within these groups

Cluster population -> sample (clusters)

E.g. geographic clusters

Subsets of naturally occurring clusters are typically more homogeneous than heterogeneous
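
Cluster sampling can be sketched as drawing whole groups at random and then keeping every member of the drawn groups. The neighbourhood names and household identifiers are hypothetical.

```python
import random

# Hypothetical clusters: neighbourhoods mapped to the households in them.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}

rng = random.Random(7)  # fixed seed only for reproducibility

# Step 1: randomly select a number of clusters (here 2 of 4).
chosen = rng.sample(sorted(clusters), k=2)

# Step 2: include every member of each selected cluster.
sample = [member for name in chosen for member in clusters[name]]

print(chosen, sample)
```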

Classification of sampling designs

Sampling designs fall into two classes:

1) Probability sampling:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling

2) Nonprobability sampling:
Convenience sampling
Quota sampling
Judgement sampling
Snowball sampling

Convenience sampling (nonprobability sampling)

Select subjects who are conveniently available

Convenient (inexpensive and fast)

Lower generalizability

Nonprobability sampling (quota sampling)

Fix a quota for each subgroup

E.g. "Do you think dog owners should pay taxes for their pet?"

Households with a dog (mainly no)
Households without a dog (mainly yes)

Good when minority participation is critical
Lower generalizability
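
A minimal sketch of quota sampling, assuming subjects are approached in arrival order and accepted until their subgroup's quota is full. The helper `quota_sample`, the passers-by, and the quotas of 2 are hypothetical.

```python
def quota_sample(stream, group_of, quotas):
    """Accept subjects in arrival order until each subgroup's quota is filled."""
    accepted = {group: [] for group in quotas}
    for subject in stream:
        group = group_of(subject)
        if group in accepted and len(accepted[group]) < quotas[group]:
            accepted[group].append(subject)
    return accepted

# Hypothetical passers-by as (id, household type), in the order they are met.
passers_by = [(1, "dog"), (2, "no dog"), (3, "no dog"), (4, "no dog"),
              (5, "dog"), (6, "no dog"), (7, "dog")]

sample = quota_sample(passers_by, group_of=lambda p: p[1],
                      quotas={"dog": 2, "no dog": 2})

print(sample)
# {'dog': [(1, 'dog'), (5, 'dog')], 'no dog': [(2, 'no dog'), (3, 'no dog')]}
```

Selection within each subgroup is not random here (first come, first served), which is why the design counts as nonprobability sampling.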

Nonprobability sampling: judgement sampling

Select subjects based on their knowledge/professional judgement

Convenient (inexpensive and fast) when only a limited number of people have the information you need

Lower generalizability

Nonprobability sampling (snowball sampling)

Ask participants: "Do you know people who..."
Good for rare characteristics (experts)
First participants strongly influence the sample

Measurement or operationalization means

Turning abstract conceptual variables into measurable observations

Nominal scales

A scale that allows you to classify your data into categories
E.g. states in the United States that are either Democrat or Republican
You assign 1 to Democrat and 2 to Republican (the numbers are only labels)

Ordinal scale

Ranked or ordered
Rank-orders the categories in a meaningful way
More information than a nominal scale; here 3 is more than 2
E.g. best to worst, first to last, etc.

Interval scale

Allows you to compare differences between values
Meaningful differences between values, but no natural zero point
E.g. IQ
Compared with a rank order: when ranking chili peppers, the step from 1 to 2 is not necessarily the same as the step from 2 to 3. IQ is standardized, so differences are comparable

Ratio scales

Meaningful differences and ratios between values, due to a natural zero point
Ratios are meaningful for this scale
E.g. distance (a value of zero is possible)

Measures of central tendency

Mean (average), median (middle value in an ordered set of values), or mode (most common value)
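
The three measures can be computed with Python's standard `statistics` module. This is a small sketch and the example values are hypothetical.

```python
import statistics

# Hypothetical observations of one variable.
values = [2, 3, 3, 5, 7, 10]

print(statistics.mean(values))    # 5   (sum 30 divided by 6 values)
print(statistics.median(values))  # 4.0 (midpoint of the two middle values, 3 and 5)
print(statistics.mode(values))    # 3   (occurs twice)
```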

Measures of dispersion

Range, standard deviation, variance or interquartile range
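
A small sketch of these dispersion measures using the standard `statistics` module. The example values are hypothetical. Using the population formulas (`pvariance`, `pstdev`) rather than the sample formulas is an assumption, and quartile conventions differ between tools, so the interquartile range may not match other software exactly.

```python
import statistics

# Hypothetical observations of one variable.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)    # 9 - 2 = 7
variance = statistics.pvariance(values)    # population variance: 32 / 8 = 4
std_dev = statistics.pstdev(values)        # square root of the variance: 2.0

# Quartile cut points (default method of statistics.quantiles).
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                              # interquartile range

print(value_range, variance, std_dev, iqr)
```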

Inferential statistics

Methods to draw conclusions (make inferences) about a population from sample data
E.g. mean difference tests

Choosing between descriptive statistics

Nominal scale. Measure of central tendency: mode. Measure of dispersion: ---
Ordinal scale. Measure of central tendency: median. Measure of dispersion: interquartile range
Interval scale. Measure of central tendency: mean. Measure of dispersion: standard deviation, variance
Ratio scale. Measure of central tendency: mean. Measure of dispersion: standard deviation, variance
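
The mapping on this card can be restated as a simple lookup. The table name `DESCRIPTIVES` and the helper `recommend` are hypothetical, and "none" stands for the scale where the card lists no dispersion measure.

```python
# Lookup table restating the card: scale -> (central tendency, dispersion).
DESCRIPTIVES = {
    "nominal":  ("mode",   "none"),
    "ordinal":  ("median", "interquartile range"),
    "interval": ("mean",   "standard deviation, variance"),
    "ratio":    ("mean",   "standard deviation, variance"),
}

def recommend(scale):
    """Return (central tendency, dispersion) for a measurement scale."""
    return DESCRIPTIVES[scale.lower()]

print(recommend("Ordinal"))  # ('median', 'interquartile range')
```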

Choosing between inferential statistics:

Check slides

When there are multiple IVs in a study, with different measurement scales:

The highest scale determines the statistical technique

Choosing inferential statistics: T-test or ANOVA

T-test: compares two means (two levels of an IV)
ANOVA: can compare more than two levels
The choice therefore depends on:
- the number of IVs
- the number of levels (conditions or groups) of the IV
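
To make the comparison of two means concrete, here is a sketch of a t statistic for two independent groups. The card does not say which variant of the t-test is meant, so using Welch's form is an assumption. The helper `welch_t` and the data are hypothetical, and no p-value is computed.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t statistic for the difference between two independent group means (Welch's form)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical scores for the two levels of one IV.
control = [4, 5, 6, 5, 5]
treatment = [7, 8, 6, 9, 8]

t = welch_t(treatment, control)
print(round(t, 3))  # 4.333 (mean difference 2.6 divided by standard error 0.6)
```

With three or more groups, repeating this pairwise comparison is not appropriate, which is where ANOVA comes in.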

Choosing inferential statistics: rating scales (Likert scale)

Strongly disagree, disagree, undecided, agree, and strongly agree

Choosing inferential statistics: rating scales (semantic differential)

Organized _ _ _ _ _ _ _ Unorganized
Cold _ _ _ _ _ _ _ Warm
Modern _ _ _ _ _ _ _ Old-fashioned
Treated as interval scales

From a statistical point of view, a moderator is

Also considered an IV.

To test the moderating effect of M on the relationship between X and Y you have to include three IVs in your regression model:

1) The main effect of X
2) The interaction effect between X and M (= X*M), which captures the moderating effect of M on the relationship between X and Y
3) The main effect of M (to statistically control for the impact of M on Y; if M were not included, the effect of X*M would not be correctly estimated)
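
A sketch of such a regression with the three IVs X, M, and X*M, written in plain Python so it runs without extra libraries. The helpers `solve` and `ols`, the noise-free data, and the coefficient values 1, 2, 3, and 0.5 are all hypothetical. A real analysis would use a statistics package and report standard errors as well.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical noise-free data generated from Y = 1 + 2*X + 3*M + 0.5*X*M.
data = [(x, m) for x in range(5) for m in range(4)]
y = [1 + 2 * x + 3 * m + 0.5 * x * m for x, m in data]

# Design matrix: intercept, main effect of X, main effect of M, interaction X*M.
design = [[1.0, x, m, x * m] for x, m in data]
b0, b_x, b_m, b_xm = ols(design, y)

# The coefficient on X*M (b_xm) is the estimate of the moderating effect.
print(round(b0, 6), round(b_x, 6), round(b_m, 6), round(b_xm, 6))
```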

(41 cards)