Lecture 5 Flashcards

1
Q

In data sets, rows capture

A

Observations (e.g. on consumers or firms)

2
Q

Columns display

A

Variables. A variable can take on different values for different subjects

3
Q

Dummy variables

A

Variables that only take on the values 0 or 1
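
A minimal sketch of recoding a yes/no answer into a dummy variable. The variable name owns_dog and the answers are hypothetical, chosen only for illustration.

```python
# Hypothetical survey answers; "owns_dog" is an illustrative variable name.
answers = ["yes", "no", "no", "yes", "yes"]

# A dummy variable takes only the values 0 or 1: here 1 = owns a dog, 0 = does not.
owns_dog = [1 if a == "yes" else 0 for a in answers]
print(owns_dog)  # [1, 0, 0, 1, 1]

# Because the values are 0/1, the mean of a dummy equals the share of 1s.
share = sum(owns_dog) / len(owns_dog)
print(share)  # 0.6
```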

4
Q

Codebook

A

A list of all the codes used in a dataset

5
Q

The variables in your data set need to match the unit of analysis in a study. Specifically:

A

The dependent variable is measured at the level of the unit of analysis. So are mediator variables.

Independent and moderator variables are measured at the level of the unit of analysis or at a more aggregate level.
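
A sketch with invented data, assuming a study whose unit of analysis is the employee: the DV sits at the employee level, while a firm-level IV (firm size, a hypothetical variable) is copied onto each employee row, so every row is still one employee.

```python
# Hypothetical data: the unit of analysis is the employee.
firms = {"A": {"firm_size": 120}, "B": {"firm_size": 45}}  # firm level (more aggregate)
employees = [
    {"emp_id": 1, "firm": "A", "job_satisfaction": 4},  # DV at the unit-of-analysis level
    {"emp_id": 2, "firm": "A", "job_satisfaction": 3},
    {"emp_id": 3, "firm": "B", "job_satisfaction": 5},
]

# Attach the firm-level IV to every employee row; the number of rows does not change.
rows = [{**e, "firm_size": firms[e["firm"]]["firm_size"]} for e in employees]
for r in rows:
    print(r)
```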

6
Q

Population

A

Entire group of people, firms, events, or things of interest about which you would like to make inferences

7
Q

Sample

A

A subset of the population of interest

8
Q

Why use samples in the first place?

A

Impossible to study the entire population

9
Q

The sampling process consists of the following steps

A

1) Define the population you are interested in
2) Determine the sampling frame. The sampling frame is the physical representation of the population through which one can reach that population
3) Decide on the sampling design

10
Q

How to define the target population and choose the sampling frame

A

1) Define the target population (e.g. students at TiSEM, employees at Philips)

2) Determine the sampling frame

- Physical representation of the target population (e.g. students at TiSEM –> TiSEM student database)

3) Determine the sampling design

11
Q

Coverage error

A

Sampling frame ≠ population

12
Q

Undercoverage

A

True population members are excluded

13
Q

Mis-coverage

A

Non-population members are included
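
The two kinds of coverage error can be sketched as set differences between the population and the sampling frame. The IDs below are made up for illustration.

```python
# Hypothetical IDs: the target population vs. the list we can actually sample from.
population = {"s1", "s2", "s3", "s4", "s5"}   # e.g. all current students
sampling_frame = {"s1", "s2", "s3", "s6"}      # e.g. an outdated student database

undercoverage = population - sampling_frame   # true members missing from the frame
miscoverage = sampling_frame - population     # non-members included in the frame

print(sorted(undercoverage))         # ['s4', 's5']
print(sorted(miscoverage))           # ['s6']
print(sampling_frame == population)  # False, so there is coverage error
```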

14
Q

Solution to coverage error

A

If small, recognize but ignore
If large, redefine the population in terms of the sampling frame

15
Q

Probability sampling

A

Each element of the population has a known chance of being selected as a subject

Results generalizable to population

More time and resource intensive

16
Q

Nonprobability sampling

A

The elements of the population do not have a known chance of being selected as a subject

Less time and resource intensive

Results not generalizable to population

17
Q

Probability sampling: simple random sampling

A

Each population element has an equal chance of being chosen

Highest generalizability, but costly
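
A minimal sketch of simple random sampling from a hypothetical frame of 100 IDs, using Python's standard random module. Drawing without replacement gives every element the same chance n/N of being selected.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element has the same chance n/N."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

frame = list(range(1, 101))  # hypothetical sampling frame of 100 IDs
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```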

18
Q

Systematic sampling

A

Select a random starting point, then pick every ith element (e.g. every third person, starting from person 5)

Simplicity (adds a degree of system or process)

Low generalizability if there happens to be a systematic pattern that repeats every ith element in the list
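
A sketch of systematic sampling on a hypothetical list of 20 people. The first call reproduces the card's example with a fixed start; the second shows how a list with a repeating pattern (hypothetical roles) can make every pick the same type.

```python
import random

def systematic_sample(frame, step, start=None, seed=None):
    """Pick every `step`-th element, beginning at a (random) starting position."""
    if start is None:
        start = random.Random(seed).randrange(step)  # random start within the first interval
    return frame[start::step]

people = [f"person {i}" for i in range(1, 21)]  # hypothetical frame of 20 people
# Card example: every third person, starting from person 5 (list index 4).
print(systematic_sample(people, step=3, start=4))
# ['person 5', 'person 8', 'person 11', 'person 14', 'person 17', 'person 20']

# Periodic list: every 2nd entry is a manager, so a step of 2 picks only one type.
roles = ["manager", "employee"] * 5
print(set(systematic_sample(roles, step=2, start=0)))  # {'manager'}
```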

19
Q

Stratified sampling (probability sampling)

A

Divide the population into meaningful (homogeneous) groups, then apply simple random sampling (SRS) within each group

All groups are adequately sampled, allowing for group comparisons

More time-consuming and requires homogeneous subgroups
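
A sketch of stratified sampling, assuming a hypothetical frame of 40 students split by programme. It draws an equal number per stratum, which supports group comparisons; proportional allocation would be another option and is not shown.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=None):
    """Split the frame into strata, then draw a simple random sample within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical frame: (student id, programme); 30 bachelor and 10 master students
students = [(i, "bachelor" if i % 4 else "master") for i in range(1, 41)]
sample = stratified_sample(students, stratum_of=lambda s: s[1], n_per_stratum=5, seed=1)
for programme, drawn in sample.items():
    print(programme, [sid for sid, _ in drawn])
```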

20
Q

Cluster sampling

A

Divide the population into heterogeneous groups (clusters), randomly select a number of clusters and select every member within these clusters

Cluster population –> sample (clusters)

E.g. geographic clusters

Subsets of naturally occurring clusters are typically more homogeneous than heterogeneous
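
A sketch of cluster sampling with hypothetical geographic clusters: whole clusters are drawn at random, and every member of a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical geographic clusters (e.g. neighbourhoods) with their residents
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
sample = cluster_sample(clusters, n_clusters=2, seed=7)
print(sample)
```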

21
Q

Classification of sampling designs

A

Overview of sampling designs:

1) Probability:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling

2) Nonprobability:
Convenience sampling
Quota sampling
Judgement sampling
Snowball sampling

22
Q

Convenience sampling (nonprobability sampling)

A

Select subjects who are conveniently available

Convenient (inexpensive and fast)

Lower generalizability

23
Q

Nonprobability sampling (quota sampling)

A

Fix a quota for each subgroup

E.g. "Do you think dog owners should pay taxes for their pet?"

Households with a dog (mainly no)
Households without a dog (mainly yes)

Good when the participation of a minority group is critical
Lower generalizability
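
A sketch of quota sampling using the dog-owner example, with invented respondents. People are accepted in the order they happen to be available until each subgroup's quota is filled; that order is not random, which is why this is a nonprobability design.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept respondents in order of arrival until each subgroup's quota is filled."""
    counts = {g: 0 for g in quotas}
    accepted = []
    for person in arrivals:
        g = group_of(person)
        if g in counts and counts[g] < quotas[g]:
            accepted.append(person)
            counts[g] += 1
        if counts == quotas:  # every quota is full
            break
    return accepted

# Hypothetical respondents in the order they happen to be available: (name, has_dog)
arrivals = [("Ann", True), ("Bob", True), ("Cas", True), ("Dee", False), ("Eva", True), ("Fay", False)]
quotas = {"dog": 2, "no dog": 2}
chosen = quota_sample(arrivals, lambda p: "dog" if p[1] else "no dog", quotas)
print([name for name, _ in chosen])  # ['Ann', 'Bob', 'Dee', 'Fay']
```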

24
Q

Nonprobability sampling: judgement sampling

A

Select subjects based on their knowledge/professional judgement

Convenient (inexpensive and fast) when only a limited number of people have the information you need

Lower generalizability

25
Q

Nonprobability sampling (snowball sampling)

A

Do you know people who…

Good for rare characteristics (experts)

First participants strongly influence the sample

26
Q

Measurement or operationalization means

A

Turning abstract conceptual variables into measurable observations

27
Q

Nominal scales

A

A scale that allows you to classify your data into categories

E.g. states in the United States that are either Democrat or Republican
You assign 1 to Democrat and 2 to Republican (the numbers are only labels)

28
Q

Ordinal scale

A

Ranked or ordered

Rank-orders the categories in a meaningful way
More information than a nominal scale; here 3 is more than 2

E.g. Best to worst, first to last etc.

29
Q

Interval scale

A

Allows you to compare differences between values

Meaningful differences between values, but no natural zero point

E.g. IQ

Compared with a ranked order: when ranking chili peppers, the step 1 –> 2 is not necessarily the same as the step 2 –> 3. IQ is standardized, so differences between values are comparable

30
Q

Ratio scales

A

Meaningful differences and ratios between values due to a natural zero point

Ratios are meaningful for this scale

E.g. Distance

A true zero point is possible (e.g. a distance of 0)

31
Q

Measures of central tendency

A

Mean (average), median (middle value of the ordered data) or mode (most frequent value)
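
The three measures computed with Python's standard statistics module on a small hypothetical data set. Note how the two large values pull the mean above the median.

```python
import statistics

# Hypothetical scores on a ratio-scaled variable
scores = [1, 2, 2, 4, 9, 12]

print(statistics.mean(scores))    # 5   (sum 30 / 6 observations)
print(statistics.median(scores))  # 3.0 (middle of the ordered values: (2 + 4) / 2)
print(statistics.mode(scores))    # 2   (most frequent value)
```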

32
Q

Measures of dispersion

A

Range, standard deviation, variance or interquartile range
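
The four dispersion measures on the same kind of hypothetical data, using the statistics module. variance and stdev are the sample versions (n - 1 in the denominator); quartiles can be computed in several ways, and this sketch uses the module's default method, so other software may report a slightly different interquartile range.

```python
import statistics

data = [1, 2, 2, 4, 9, 12]  # hypothetical values

value_range = max(data) - min(data)           # 11
variance = statistics.variance(data)          # 20 (sample variance)
std_dev = statistics.stdev(data)              # square root of the sample variance
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                                 # 8.0

print(value_range, variance, round(std_dev, 3), iqr)
```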

33
Q

Inferential statistics

A

Methods to draw conclusions (or to make inferences)

E.g. Mean difference tests

34
Q

Choosing between descriptive statistics

A

Nominal scale
Measure of central tendency: mode
Measure of dispersion: none

Ordinal scale
Measure of central tendency: median
Measure of dispersion: interquartile range

Interval scale
Measure of central tendency: mean
Measure of dispersion: standard deviation, variance

Ratio scale
Measure of central tendency: mean
Measure of dispersion: standard deviation, variance
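
A sketch that turns this card into a lookup: given a list of values and its measurement scale, it returns the measures paired with that scale. The function name describe is hypothetical, and the quartiles use the statistics module's default method.

```python
import statistics

def describe(values, scale):
    """Return the central tendency and dispersion measures this card pairs with each scale."""
    scale = scale.lower()
    if scale == "nominal":
        return {"mode": statistics.mode(values)}  # no dispersion measure
    if scale == "ordinal":
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {"median": statistics.median(values), "iqr": q3 - q1}
    if scale in ("interval", "ratio"):
        return {"mean": statistics.mean(values),
                "sd": statistics.stdev(values),
                "variance": statistics.variance(values)}
    raise ValueError(f"unknown scale: {scale}")

print(describe(["dem", "rep", "dem"], "nominal"))    # {'mode': 'dem'}
print(describe([1, 2, 2, 3, 4, 5, 5], "ordinal"))    # {'median': 3, 'iqr': 3.0}
print(describe([1, 2, 2, 4, 9, 12], "ratio"))
```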

35
Q

Choosing between inferential statistics:

A

Check slides

36
Q

When there are multiple IVs in a study, with different measurement scales:

A

The highest scale determines the statistical technique

37
Q

Choosing inferential statistics: T-test or ANOVA

A

T-test: compares two means (two levels of an IV)

ANOVA: can compare more than two levels

The choice therefore:

Depends on the number of IVs
Depends on the number of levels (conditions or groups) of the IV
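
A pure-Python sketch of the two test statistics for one IV, using invented scores. It computes only the statistics, not p-values (those would need the t and F distributions, e.g. from a statistics package). With exactly two groups, the one-way ANOVA F equals the squared pooled-variance t, which is why the t-test is the two-level special case.

```python
import math

def pooled_t(a, b):
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp2 = (ssa + ssb) / (na + nb - 2)  # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group vs. within-group mean squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

control = [4, 5, 6, 5]    # hypothetical scores, level 1 of the IV
treatment = [6, 7, 8, 7]  # level 2
third = [5, 6, 5, 6]      # level 3

t = pooled_t(control, treatment)
print(round(t ** 2, 3), round(one_way_anova_f(control, treatment), 3))  # two levels: F equals t squared
print(round(one_way_anova_f(control, treatment, third), 3))             # ANOVA also handles three levels
```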

38
Q

Choosing inferential statistics: rating scales (Likert scale)

A

Strongly disagree, disagree, undecided, agree and strongly agree

39
Q

Choosing inferential statistics: rating scales (semantic differential)

A

Organized _ _ _ _ _ _ _ Unorganized
Cold _ _ _ _ _ _ _ Warm
Modern _ _ _ _ _ _ _ Old-fashioned

Treated as interval scales

40
Q

From a statistical point of view, a moderator is

A

Also considered an IV.

41
Q

To test the moderating effect of M on the relationship between X and Y you have to include three IVs in your regression model:

A

The main effect of X

The interaction effect between X and M (=X*M) to capture the moderating effect of M on the relationship between X and Y

The main effect of M (to statistically control for the impact of M on Y; if you did not include M, the effect of X*M would not be estimated correctly)
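
A dependency-free sketch of the regression on this card: an intercept plus the three IVs X, M and X*M, estimated by ordinary least squares via the normal equations. The data are hypothetical and generated without noise from y = 1 + 2x + 0.5m + 1.5xm, so the estimates recover those coefficients exactly; in practice you would use a statistics package.

```python
def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Hypothetical, noise-free data generated from y = 1 + 2*x + 0.5*m + 1.5*x*m
data = [(x, m) for x in [0, 1, 2, 3] for m in [0, 1, 2]]
y = [1 + 2 * x + 0.5 * m + 1.5 * x * m for x, m in data]

# Intercept plus the three IVs: main effect of X, main effect of M, interaction X*M
design = [[1.0, x, m, x * m] for x, m in data]
b0, b_x, b_m, b_xm = ols(design, y)
print([round(v, 6) for v in (b0, b_x, b_m, b_xm)])  # [1.0, 2.0, 0.5, 1.5]
```

The interaction coefficient is the moderating effect: the slope of X on Y is b_x + b_xm * M, so here it is 2 when M = 0 and 5 when M = 2.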