Midterm Flashcards

Question

What are double-blind experiments?

Answer 1

An experiment in which neither the scientists nor the study participants know who is receiving the treatment and who is in the control group. Often used to prevent bias in the experiment.

Answer 2

When a "fake" treatment produces a result that cannot be attributed to the placebo itself and is therefore caused by the patient's belief in the "treatment". People think they are receiving treatment, and that belief affects the result (ex. the subject says the pills cured their illness even though they received only a sugar pill that did nothing).

Answer 3

the phenomenon where study subjects behave differently because they know they are being observed by researchers.

Answer 4

Studies where the treatment is naturally assigned. Scientists don't DO anything, they just observe what is happening in nature.

Answer 5

Ethical and logistical reasons. Ex. Ethical: smoking and lung cancer; it would be unethical to force a group of humans to smoke just to observe whether they get lung cancer. Ex. Logistical: wars occur naturally; you cannot feasibly make countries go to war just to see what happens in the UN assembly. (This is an ethical problem too.)

Answer 6

has better external validity for generalization beyond the experiment than RCT experiments. This is because the events occur naturally and are not confined to the extreme specifics of lab work. Strong external validity is good because it means the findings can be applied very broadly.

Answer 7

They have weaker internal validity, because:
* pre-treatment variables may differ between treatment and control groups
* confounding bias may exist due to these differences
* selection bias from self-selection into treatment may occur
* statistical control is needed (subclassification, variables)
* unobserved confounding poses a threat

Answer 8

The extent to which the conclusions of a study can be generalized beyond the particular study

Answer 9

The extent to which causal assumptions are satisfied in the study, i.e. the extent to which the effect of the treatment in a study can be attributed solely to the treatment itself and not to confounders. This is the main advantage of randomized controlled trials.

Answer 10

1.) Cross-section comparison 2.) Within-unit effects (AKA before-and-after comparison) 3.) Differences-in-differences. No strategy is best!

Answer 11

When you compare treated units with control units after the treatment. * Assumption: the treated and control units are comparable * Unit-specific confounders may exist, so you may need statistical controls * Selection bias may also be present. Ex.) Observe New Jersey and Pennsylvania unemployment after a minimum wage increase in New Jersey. Changes in New Jersey unemployment can be attributed to the minimum wage increase if there are no changes in Pennsylvania unemployment.

Answer 12

When you compare only one unit before and after treatment. The advantage is that differences between states do not introduce unit-specific confounders. The problem is that this introduces time-varying confounders: other changes over time, aside from the identified treatment, may affect the results. Ex. Compare just New Jersey unemployment before a minimum wage increase with the unemployment after the increase.

Answer 13

Using what happened in the scenario without the treatment (the control group) to predict what would have happened to the treated group had the treatment not been implemented. Uses the parallel trends assumption: the assumption that, in the absence of treatment, the difference between the 'treatment' and 'control' groups would be constant over time. Under that assumption it addresses both unit-specific confounders and time-varying confounders. Ex. Using what happened to PA unemployment to determine what would have happened in NJ had they not introduced a higher minimum wage.
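The three comparison strategies can be put side by side in a few lines of arithmetic. This is only a sketch: the unemployment rates below are made-up numbers chosen for easy arithmetic, not real New Jersey or Pennsylvania data, and the function name `diff_in_diff` is an illustrative choice.

```python
# Hypothetical unemployment rates in percent (illustrative only, not real data).
nj_before, nj_after = 5.0, 5.5    # treated unit: minimum wage increased
pa_before, pa_after = 6.0, 6.25   # control unit: no increase


def diff_in_diff(treat_before, treat_after, ctrl_before, ctrl_after):
    """Change in the treated unit minus change in the control unit."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)


# Cross-section comparison: treated vs. control, after treatment only.
cross_section = nj_after - pa_after

# Within-unit (before-and-after) comparison: treated unit only.
before_after = nj_after - nj_before

# Differences-in-differences: the control unit's change stands in for what
# would have happened to the treated unit without treatment (parallel trends).
did = diff_in_diff(nj_before, nj_after, pa_before, pa_after)

print(cross_section, before_after, did)
```

The three estimates differ because each strategy relies on a different assumption about what the treated unit would have looked like without treatment.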

Answer 14

Used to ensure representativeness. Every unit in the population has a known, non-zero probability of being selected to participate in the study.

Answer 15

Is used to properly randomize the sample. The bigger the sample, the more accurate the results. In simple random sampling, every unit has an equal selection probability.

Answer 16

Bias is a systematic fault in the sampling system. If the error is not systematic, it is just white noise and not bias. Types: 1.) frame bias 2.) selection bias 3.) unit non-response bias 4.) item non-response bias 5.) response bias

Answer 17

When the general population frame is non-representative

Answer 18

when the sample population is systematically not randomized

Answer 19

When people in the sample or frame population systematically do not respond/participate in the survey

Answer 20

When participants in the survey systematically do not respond to a specific item on the survey

Answer 21

When respondents lie on the survey or do not give their real response. Ex.) social desirability bias: people give the answer they think is the most socially correct, not their real answer.

Answer 22

In a list experiment, the control group of respondents is given a list of 3 items and asked how many of the 3 they support (or another indicator), and the treatment group is given the same list with an extra 4th item. If the average number of "supported" items is higher in the treatment group than in the control group, this indicates "support" for the 4th item; the difference in averages estimates the share of respondents who support it. Useful when the questions are sensitive or there is social pressure. Ex.) To determine whether Afghans supported the Taliban, a control group was given a list of 3 organizations and the average number supported was calculated. A treatment group was given the same question and list with the addition of the Taliban. The average number of supported groups was 2 in the control group and 3 in the treatment group. This increase indicates support for the Taliban.
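The estimate in a list experiment is just a difference in means. A minimal sketch, assuming made-up individual responses (the counts below are hypothetical and are not the Afghanistan survey data):

```python
from statistics import mean

# Hypothetical responses: how many items on the list each respondent supports.
control_counts = [1, 2, 2, 3, 2, 2]     # list of 3 items
treatment_counts = [2, 3, 2, 3, 3, 2]   # same 3 items plus the sensitive 4th item

control_mean = mean(control_counts)
treatment_mean = mean(treatment_counts)

# The difference in means estimates the share of respondents
# who support the sensitive item.
estimated_support = treatment_mean - control_mean

print(control_mean, treatment_mean, estimated_support)
```

No individual ever reveals which items they support, only how many, which is why the design helps with sensitive questions.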

Answer 23

mean and median

Answer 24

Range, quartiles, interquartile range

Answer 25

measures on average how far away the data points are from their mean. It is purely descriptive.

Answer 26

You square to eliminate the impact of being on the opposite side of the mean: squaring removes the negative signs and gives you a non-negative distance from the mean. Squaring prevents the positive and negative deviations from canceling each other out, so you don't end up with 0. You need to take the square root because once the numbers are squared they are no longer in the units of the original data; the square root brings them back to the same units.
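The squaring and square-root steps can be traced by hand. A small sketch with made-up data; it divides by n (the population formula) to keep the arithmetic simple, which is an assumption of this example rather than something the card specifies:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values

mean = sum(data) / len(data)

# Raw deviations cancel out: their sum is zero.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring makes every deviation non-negative, so nothing cancels.
squared_deviations = [d ** 2 for d in deviations]

# Average squared deviation (in squared units of the data).
variance = sum(squared_deviations) / len(data)

# The square root brings the result back to the original units.
sd = math.sqrt(variance)

print(sum_of_deviations, variance, sd)
```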

Answer 27

In a study, observed values will fall closer to the sample mean than to the true population mean, so deviations from the sample mean underestimate the true population spread. Dividing by n-1 instead of n makes the result a better estimate of the population standard deviation: n-1 accounts for the difference between the sample mean and the population mean when estimating the SD of the true population.

Answer 28

In a study, observed values will fall closer to the sample mean than to the true population mean, so deviations from the sample mean underestimate the true population spread. Dividing by n-1 instead of n makes the result a better estimate of the population standard deviation: n-1 accounts for the difference between the sample mean and the population mean when estimating the SD of the true population.
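The n-1 correction can be checked exactly on a tiny example. The sketch below assumes a made-up population of three values and enumerates every possible sample of size 2 (drawn with replacement), then compares the average of the divide-by-n and divide-by-(n-1) variances with the true population variance. The helper name `variance` and the population values are illustrative choices.

```python
from fractions import Fraction
from itertools import product

population = [0, 2, 4]   # hypothetical tiny population


def variance(values, divisor):
    """Sum of squared deviations from the mean of `values`, over `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / divisor


# True population variance (divide by the population size).
pop_var = variance(population, len(population))

# Every ordered sample of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average sample variance when dividing by n versus by n - 1.
avg_div_n = sum(variance(s, 2) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(pop_var, avg_div_n, avg_div_n_minus_1)
```

Dividing by n comes out too small on average, while dividing by n-1 matches the population variance in this setup.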

Answer 29

Shows how much the data points vary overall. It is the average squared deviation from the mean (the standard deviation squared), so it is not in the original units of the data. It is helpful for comparing samples.

Answer 30

probably not, high variance means there is more uncertainty around the mean.

Answer 31

the unit that represents the entity you are studying ex. country, individual, household, congressional district, state

Answer 32

What uniquely identifies the observation being studied; it is a characteristic of the unit of analysis. Ex. country-year, state-month, individual-wave

Answer 33

an empirical measure of a concept/characteristic. key rule: variables must vary across observations

Answer 34

1: Quantitative/Interval/Continuous- observations can take on an infinite number of numerical values between any two values (decimals). 2: Categorical — observations belong to one of a discrete set of categories & we assign a number to each category

Answer 35

1.) Nominal — categories are named (independent). 2.) Ordinal — categories are ranked 3.) Dichotomous variables — two values (e.g., yes/no)

Answer 36

Age is used as continuous, but it is written to look ordinal and is often observed as non-continuous

Answer 37

You can collapse continuous variables into ordinal (or nominal) variables; this does not work in reverse. Ex. you can turn incomes into categories of incomes. Another option for continuous variables: log transformation.

Answer 38

what values a variable takes and how often it takes on these values

Answer 39

Categorical: frequency tables, barplots. Continuous: mean/median, SD, histogram, density plots, boxplots.

Answer 40

symmetric: looks the same on both sides (e.g. a normal bell curve distribution). skewed: the data bunches on one side of the curve and creates a tail on the other.

Answer 41

right skew: the tail is on the right. left skew: the tail is on the left.

Answer 42

unimodal: one mode/one hump in a distribution bimodal: two modes/two humps in a distribution

Answer 43

symmetry, skewness, number of modes, outliers and deviations from shape.

Answer 44

a plot with dots that shows a direct graphical comparison of two variables

Answer 45

When x is larger than its mean, y is likely to be larger than its mean

Answer 46

When x is larger than its mean, y is unlikely to be larger than its mean

Answer 47

When data cluster tightly around a line, the two variables have a strong (linear) relationship.

Answer 48

1.) Correlation is between −1 and 1 2.) Order does not matter: cor(x, y) = cor(y, x) 3.) Not affected by changes of scale 4.) Correlation measures linear association
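Each of these properties can be checked on a toy example. A minimal sketch with a hand-written correlation function (the name `cor` mirrors the notation above; the data are made up):

```python
import math


def cor(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = cor(x, y)                      # 1.) always between -1 and 1
r_swapped = cor(y, x)              # 2.) order does not matter

x_rescaled = [10 * a + 3 for a in x]
r_rescaled = cor(x_rescaled, y)    # 3.) unchanged by a change of scale

# 4.) Only linear association is measured: y = x^2 is a perfect
# relationship, yet the correlation on a symmetric x is zero.
x_sym = [-2, -1, 0, 1, 2]
y_curve = [a ** 2 for a in x_sym]
r_curve = cor(x_sym, y_curve)

print(r, r_swapped, r_rescaled, r_curve)
```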

Answer 49

The score given to each observation of a variable, which measures the number of standard deviations the observation is above or below the mean. It is a measure of deviation from the mean. It is not sensitive to how the variable is scaled and/or shifted.

Answer 50

For each observation, subtract the mean of the variable from the observation's value and then divide the result by the standard deviation of the variable. z score of Xi = (Xi - x̄) / SD of X
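The formula translates directly into code. A short sketch with made-up scores; it uses the sample standard deviation (n-1) from Python's `statistics` module, which is an assumption since the card does not say which SD to use:

```python
import statistics


def z_scores(values):
    """(value - mean) / SD for every observation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1); an assumption
    return [(x - mean) / sd for x in values]


scores = [60, 70, 80, 90, 100]      # hypothetical data
z = z_scores(scores)

# Rescaling and shifting the variable leaves the z-scores unchanged.
rescaled = [3 * x + 10 for x in scores]
z_rescaled = z_scores(rescaled)

print(z)
print(z_rescaled)
```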

Answer 51

A scatterplot plots the relationship between two variables. A QQ plot compares the quantiles of two distributions; use it to understand whether the distributions are similar.

Answer 52

A scatterplot plots the relationship between two variables A QQ plot compares two distributions

Answer 53

making meaningful groups in the data

Answer 54

Goal: split the data into similar groups, where each group is associated with its centroid, which is equal to the within-group mean.
Steps:
1.) choose the initial centers of the K clusters
2.) assign each observation to the centroid that is closest to it
3.) recompute each centroid as the average of the points in its cluster
4.) reassign the observations to the clusters whose centroids are closest to them
5.) repeat steps 3 and 4 until the observations can no longer be reassigned.
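The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the points and starting centroids are made up, squared Euclidean distance is assumed as the closeness measure, and an empty cluster simply keeps its old centroid.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(point, centroids):
    """Index of the centroid nearest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)          # step 1: starting centers
    labels = None
    for _ in range(max_iter):
        new_labels = [closest(p, centroids) for p in points]   # steps 2 and 4
        if new_labels == labels:                 # step 5: nothing moved, stop
            break
        labels = new_labels
        for j in range(len(centroids)):          # step 3: recompute centroids
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                          # empty cluster keeps its centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical 2-D data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, initial_centroids=[(1, 1), (1, 2)])

print(labels)
print(centroids)
```

Different starting centroids can lead to different final clusters, so in practice the procedure is often run from several starting points.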

Answer 55

When you find the average within a defined window of time. As time moves forward, the average changes gradually because one unit of time is replaced at a time. Ex. if you want the average over a 7-day period, on the 8th day you drop day one: you keep 6 of the 7 days and add one new day's data. Smoothness is determined by the window size. A smaller window creates less smooth lines (each "day" has a bigger impact in a small window).
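A moving average is a one-line computation. A minimal sketch with made-up daily counts (including one spike on day 8) that compares a 7-day window with a 3-day window; the function name `moving_average` is an illustrative choice:

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily counts; day 8 is an unusual spike.
daily = [10, 12, 9, 14, 11, 13, 15, 30, 12, 10]

week = moving_average(daily, 7)    # first value covers days 1-7, next drops day 1
short = moving_average(daily, 3)   # smaller window: each day weighs more

# The spread of each smoothed series shows how bumpy it is.
week_spread = max(week) - min(week)
short_spread = max(short) - min(short)

print(week)
print(short)
print(week_spread, short_spread)
```

The 3-day series reacts much more strongly to the spike than the 7-day series does.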

Answer 56

You square to eliminate the impact of being on the opposite side of the mean: squaring removes the negative signs and gives you a non-negative distance from the mean. Squaring prevents the positive and negative deviations from canceling each other out, so you don't end up with 0. You need to take the square root because once the numbers are squared they are no longer in the units of the original data; the square root brings them back to the same units.

Answer 57

Yes, because it uses a best-fit line to minimize the (squared) distance from all points to the line, and if one or more points are far out of the pattern, the slope of the line can change considerably. Ex. Palm Beach vote share
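The pull of a single outlier on the slope is easy to see with a few points. A sketch with made-up data (not the Palm Beach vote data), using the standard one-predictor least-squares slope formula; the function name `ols_slope` is an illustrative choice:

```python
def ols_slope(x, y):
    """Least-squares slope for one predictor: Sxy / Sxx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical points lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
slope_clean = ols_slope(x, y)

# Add one point far out of the pattern.
x_out = x + [6]
y_out = y + [40]
slope_with_outlier = ols_slope(x_out, y_out)

print(slope_clean, slope_with_outlier)
```

One extra observation out of six is enough to change the fitted slope substantially here.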

Answer 58

1.) OLS regressions are linear
- uses a line of best fit, but it may not be appropriate
- not resistant to the influence of outliers
- slope is constant
- true relationship may not be linear
2.) OLS allows for unreasonable predictions
- only want to generate reasonable predictions
- evaluating predictions is key to assessing the relationship between variables & the strength of the model
3.) OLS correlations do not necessarily indicate causation
- correlation can be driven by unobserved variables
4.) OLS regressions are versatile and robust
- models the relationship between IV and DV and allows for making predictions
- allows for including additional variables in the model
- continuous DVs & continuous and/or dichotomous IVs

Answer 59

Curvilinear (i.e., quadratic) — a sign/slope shift (or reversal) — i.e., the effect of X on Y changes direction at different levels of X Diminishing returns — slope stays in same direction, but the effect of a one-unit change in X decreases (or increases) as values of X increase

Answer 60

Worried about omitted variable bias: some underlying (unobserved) factor (X2) is driving the relationship between X1 and Y. It is important to ‘control’ for other variables that we think lie in the causal path. When we control, we can determine how much effect each X has on Y. Venn diagram: find the net effect of each X by removing the overlapping areas.

Answer 61

measure the amount of variance in the error term

Answer 62

Measures the total variation of y based on the squared distance from the mean, i.e. the deviation of the data points away from the mean value.

(86 cards)