Final exam Flashcards
What is monitoring?
Monitoring is a continuous process that tracks what is happening within a program and uses the data collected to inform program implementation and day-to-day management and decisions.
What 3 types of questions can an evaluation answer?
- Descriptive questions. The evaluation seeks to determine what is taking place and describes processes, conditions, organizational relationships, and stakeholder views.
- Normative questions. The evaluation compares what is taking place to what should be taking place; it assesses activities and whether or not targets are accomplished. Normative questions can apply to inputs, activities, and outputs.
- Cause-and-effect questions. The evaluation examines outcomes and tries to assess what difference the intervention makes in outcomes. (Impact evaluation)
What is an evaluation?
Evaluations are periodic, objective assessments of a planned, ongoing, or completed project, program, or policy.
What are the 5 criteria for deciding whether a program is worth evaluating?
- Innovative. It is testing a new, promising approach.
- Replicable. The program can be scaled up or can be applied in a different setting.
- Strategically relevant. The program is a flagship initiative; requires substantial resources; covers, or could be expanded to cover, a large number of people; or could generate substantial savings.
- Untested. Little is known about the effectiveness of the program, globally or in a particular context.
- Influential. The results will be used to inform key policy decisions.
What is a theory of change?
A theory of change is a description of how an intervention is supposed to deliver the desired results. It describes the causal logic of how and why a particular project, program, or policy will reach its intended outcomes.
What is a causal effect?
A causal effect is a change in some feature of the world that would result from a change to some other feature of the world.
Formally, it is the difference in the potential outcomes for some unit under two different treatment statuses.
What is a counterfactual comparison?
It’s a comparison in which at least one of the worlds we are comparing isn’t the real, factual world; it exists only in our imagination.
What is the treatment variable?
It’s X, the variable used to describe any intervention in the world.
What is the outcome (or dependent) variable?
It’s Y, the variable that the treatment may affect.
What are the fundamental problems of causal inference?
- Individual causal effects can never be directly observed.
- At any given time, we observe any given unit in only one state of affairs (at a given time, a child either participates in the deworming program or does not).
- We can’t observe the difference: Yi(1) − Yi(0)
How do we make progress on answering causal questions if effects are fundamentally unobservable?
- Conduct a randomized trial: assign some people to treatment and others not, then compare the average outcome of the treated group with the average outcome of the untreated group.
What is a potential outcome?
The potential outcome for some unit under some treatment status is the outcome that unit would experience under that treatment status.
What is a causal relationship?
It refers to the cause-and-effect connection between the treatment variable (X) and the outcome variable (Y).
What are the 2 different conditions based on whether the individual receives the treatment?
- Treatment is the condition with the treatment (Xi=1)
- Control is the condition without the treatment (Xi=0)
What is the causal effect of X on Y?
It’s the change in the outcome variable Y caused by a change in the treatment variable X
When interpreting the sign of causal effects, we should interpret:
- a positive effect as the treatment causing an increase in the outcome variable
- a negative effect as the treatment causing a decrease in the outcome variable
- an effect of zero as the treatment causing no change in the outcome variable
What is the average treatment effect (average causal effect)?
It’s the average of the individual causal effects of X on Y across a group of individuals, that is, the average change in Y caused by a change in X for that group.
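Example (a minimal sketch with made-up numbers): the potential outcomes below are hypothetical, since in real data we never observe both Yi(1) and Yi(0) for the same unit. The names y1, y0, individual_effects, and ate are illustrative only.

```r
# Hypothetical potential outcomes for five units (never jointly observable in practice)
y1 <- c(7, 5, 9, 6, 8)  # outcome each unit would have if treated, Yi(1)
y0 <- c(5, 5, 6, 7, 6)  # outcome each unit would have if untreated, Yi(0)

individual_effects <- y1 - y0    # Yi(1) - Yi(0) for each unit
ate <- mean(individual_effects)  # average treatment effect

individual_effects  # positive, zero, and negative effects all appear
ate
```

Note that the average effect can be positive even though one unit has a negative individual effect.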
What is a randomized experiment?
It’s a type of study design in which treatment assignment is randomized.
What is the purpose of a randomized experiment?
By randomly assigning treatment, we ensure that treatment and control groups are, on average, identical to each other in all observed and unobserved pre-treatment characteristics.
What is the difference-in-means estimator?
It is the average outcome of the treatment group minus the average outcome of the control group. It produces a valid estimate of the average treatment effect when the treatment and control groups are comparable with respect to all the variables that might affect the outcome other than the treatment variable itself.
average_effect = average_treatment - average_control
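Example (a minimal sketch): the formula above computed in R on a toy data frame. The column names treated and outcome and all the values are assumptions made up for illustration.

```r
# Toy data: 1 = treatment group, 0 = control group (illustrative values)
data <- data.frame(
  treated = c(1, 1, 1, 0, 0, 0),
  outcome = c(10, 12, 14, 8, 9, 10)
)

# Average outcome within each group
average_treatment <- mean(data$outcome[data$treated == 1])
average_control <- mean(data$outcome[data$treated == 0])

# Difference-in-means estimate of the average treatment effect
average_effect <- average_treatment - average_control
average_effect
```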
What is experimental data?
data collected from a randomized experiment
What is observational data?
Data collected about naturally occurring events. Studies that use observational data are called observational studies.
How to set the working directory to the folder containing the dataset in R?
setwd("way_to_the_folder")
How to read a dataset in R?
read.csv("name_of_dataset")
What is head()?
It shows the first 6 observations by default; head(data, n = 5) shows the first 5.
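Example (a minimal sketch): reading a CSV file and inspecting it. To keep the example self-contained, it first writes a small made-up dataset to a temporary file; with a real dataset you would only need the read.csv() line. The names example, path, and data are illustrative.

```r
# Build a small illustrative dataset and save it to a temporary CSV file
example <- data.frame(id = 1:10, score = seq(50, 95, by = 5))
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)

# Read it back, as we would with a real dataset
data <- read.csv(path)

head(data)         # first 6 observations (the default)
head(data, n = 5)  # first 5 observations
dim(data)          # number of rows, then number of columns
```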
What is == in R?
The relational operator that evaluates whether two values are equal to each other. The output is a logical value: TRUE or FALSE.
What does the function ifelse() do in R?
It creates the contents of a new variable based on the values of an existing one: “if the logical test is true, return this, else return that”.
In theory, this lets us replace one value with another. For example, if we have a text variable, we can recode it as a numeric one.
ifelse(data$var == "yes", 1, 0)
How to store values as a new variable?
data$var <- ifelse(data$var == "yes", 1, 0)
ifelse() is just an example
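Example (a minimal sketch): recoding a text variable as a 0/1 variable and storing it. The data frame and the new column name var_numeric are made up for illustration; storing the result in a new column keeps the original text values intact.

```r
# Toy data with a text variable (illustrative values)
data <- data.frame(var = c("yes", "no", "yes"))

# Test each value with ==, then return 1 where TRUE and 0 where FALSE
data$var == "yes"
data$var_numeric <- ifelse(data$var == "yes", 1, 0)

data
```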
What can we do by using square brackets [] in R?
[] is the operator used to extract a selection of observations from a variable. To its left, we specify the variable we want to subset. Inside the [], we specify the criterion of selection.
Example: data$var1[data$variable2==1]
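A runnable version of this example (a minimal sketch with made-up values; the name selected is illustrative):

```r
# Toy data (illustrative values)
data <- data.frame(
  var1 = c(10, 20, 30, 40),
  variable2 = c(1, 0, 1, 0)
)

# Keep only the values of var1 for observations where variable2 equals 1
selected <- data$var1[data$variable2 == 1]
selected

# The selection can be passed straight to another function
mean(selected)
```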
What is the unit of measurement of the difference-in-means estimator if the outcome variable is non-binary?
The same unit of measurement as the outcome variable.
What is the unit of measurement of the difference-in-means estimator if the outcome variable is binary?
Percentage points (after multiplying the result by 100).
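Example (a minimal sketch): with a binary outcome, each group mean is a proportion, so the difference in means times 100 is a difference in percentage points. The variable names treated and voted and the values are assumptions made up for illustration.

```r
# Toy data with a binary outcome: 1 = voted, 0 = did not vote (illustrative)
data <- data.frame(
  treated = c(1, 1, 1, 1, 0, 0, 0, 0),
  voted   = c(1, 1, 1, 0, 1, 0, 0, 0)
)

# Each group mean is the proportion of 1s in that group
dim_estimate <- mean(data$voted[data$treated == 1]) -
  mean(data$voted[data$treated == 0])

dim_estimate * 100  # effect expressed in percentage points
```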
When making predictions, we distinguish between 2 types of variables:
- the predictor (X): variable that we use as the basis for our predictions
- the outcome variable (Y): variable that we are trying to predict based on the values of the predictor
What are predicted outcomes (Ŷ)?
the values of Y we predict based on (i) the fitted model that summarizes the relationship between X and Y in a dataset where we observe both X and Y for each observation and (ii) the observed values of X
How can we make a prediction?
- fit a model: (i) we observe both X and Y (ii) we summarize the relationship between the average Y and X with a model
- make predictions: (i) we observe X but not Y (ii) we compute Ŷ by plugging the observed values of X into the fitted model
What is the prediction error (residual)?
it measures how far our prediction is from the observed value; it’s the difference between the observed outcome and predicted outcome
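Example (a minimal sketch): making one prediction from a fitted line and computing its prediction error. The coefficient values and the observed outcome are assumed numbers, not estimates from real data.

```r
# Hypothetical fitted coefficients (assumed values, not estimated from real data)
alpha_hat <- 2
beta_hat <- 0.5

x_new <- 10                            # observed value of the predictor
y_hat <- alpha_hat + beta_hat * x_new  # predicted outcome

y_observed <- 8                        # outcome we later observe
residual <- y_observed - y_hat         # prediction error: observed minus predicted

y_hat
residual
```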
What is the formula for the linear regression model?
Yi = α + βXi + εi
Yi - is the outcome for observation i
α (the Greek letter alpha) - is the intercept coefficient
β (the Greek letter beta) - is the slope coefficient
Xi - is the value of the predictor (or independent variable) for observation i
εi (pronounced epsilon sub i) - is the error for observation i
What 2 coefficients, which define any line, need to be estimated to fit a line?
The intercept and the slope.
What is the intercept?
It is Ŷ when X=0. Increasing or decreasing the intercept moves the line up or down.
What is the slope?
It specifies the angle or steepness of the line. It is the change in Ŷ associated with a one-unit increase in X.
What is the least squares method (OLS)?
It identifies the line that minimizes the “sum of the squared residuals”.
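Example (a minimal sketch): for a single predictor, the least squares slope and intercept have a standard closed form (slope = cov(X, Y) / var(X)), which is added here for illustration and does not come from these notes. The data and the helper function ssr() are made up.

```r
# Toy data (illustrative values)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least squares estimates for a single predictor
beta_hat <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Sum of squared residuals for any candidate line with intercept a and slope b
ssr <- function(a, b) sum((y - (a + b * x))^2)

ssr(alpha_hat, beta_hat)        # the least squares line
ssr(alpha_hat + 0.5, beta_hat)  # shifting the line up gives a larger sum
ssr(alpha_hat, beta_hat + 0.2)  # so does changing the slope

coef(lm(y ~ x))                 # lm() returns the same two coefficients
```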
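How to find the total number of observations in the dataset? (R function)
dim(dataset_name) returns the number of rows (observations) followed by the number of columns (variables); nrow(dataset_name) returns only the number of observations.
How to create a scatter plot of two variables?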
plot(data$x_var, data$y_var)
What is the correlation coefficient?
it ranges from -1 to 1 and summarizes the direction and strength of the linear association between two variables
What is the R function for the correlation coefficient?
cor(data$var_1, data$var_2)
What does the range from -1 to 1 of the correlation coefficient mean?
- If the correlation coefficient is positive (closer to 1), it means the two variables tend to increase or decrease together. When one goes up, the other tends to go up, and when one goes down, the other tends to go down.
- If the correlation coefficient is negative (closer to -1), it means there’s an inverse relationship. When one variable goes up, the other tends to go down, and vice versa.
- A correlation coefficient close to 0 suggests a weak or no linear relationship.
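Example (a minimal sketch): the three cases above, using made-up vectors chosen so that the correlations are exactly 1, exactly -1, and zero (up to rounding).

```r
x <- c(1, 2, 3, 4, 5)

r_positive <- cor(x, 2 * x + 1)         # y rises in a straight line with x
r_negative <- cor(x, -3 * x)            # y falls in a straight line as x rises
r_zero <- cor(x, c(2, 1, 3, 1, 2))      # no linear association in these values

r_positive
r_negative
r_zero
```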
How to estimate the coefficients of the linear model using the least squares method in R?
lm(data$y_var ~ data$x_var)
How to add the fitted line to the scatter plot by using an R function?
abline(fit), where fit is the object returned by lm() (the name fit is only an example)
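Example (a minimal sketch): fitting the line, drawing the scatter plot, and adding the fitted line to it. The data values and the object name fit are assumptions made up for illustration. The formula-plus-data form of lm() used here gives the same estimates as lm(data$y_var ~ data$x_var).

```r
# Toy data (illustrative values)
data <- data.frame(
  x_var = c(1, 2, 3, 4, 5, 6),
  y_var = c(1.2, 1.9, 3.2, 3.8, 5.1, 5.8)
)

# Estimate the intercept and slope by least squares
fit <- lm(y_var ~ x_var, data = data)
coef(fit)

# Scatter plot first, then add the fitted line to the existing plot
plot(data$x_var, data$y_var)
abline(fit)
```

abline() draws on the most recent plot, so it has to be called after plot().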
What is the log() function in R?
It computes the natural logarithm of the argument specified inside the parentheses.
This is a kind of transformation that can make a skewed variable of interest more normally distributed and, in turn, improve the fit of the line to the data.
How to create histogram in R?
hist(data$variable)
What is a fitted log-log linear model?
It’s a fitted linear model in which both Y and X have been log-transformed. In this model, we interpret β as the predicted percentage change in the outcome associated with an increase in the predictor of 1 percent.
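Example (a minimal sketch): the data below are constructed, as an assumption for illustration, so that Y = 3 * X^0.5 exactly. Taking logs of both sides gives log(Y) = log(3) + 0.5 * log(X), so the fitted slope should recover 0.5.

```r
# Made-up data following an exact power relationship
x <- c(1, 2, 4, 8, 16)
y <- 3 * x^0.5

# Log-transform both variables inside the model formula
fit <- lm(log(y) ~ log(x))
coef(fit)

# The slope reads as: a 1 percent increase in x is associated with
# a predicted increase in y of about 0.5 percent
slope <- unname(coef(fit)[2])
slope
```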
What is the coefficient of determination (R^2)?
It ranges from 0 to 1 and measures the proportion of the variation of the outcome variable explained by the model. The higher the R^2, the better the model fits the data.
In terms of the simple linear model, how can we define R^2?
It is equivalent to the squared correlation between X and Y:
R^2 = cor(X,Y)^2
Based on this, it becomes clear that the higher the correlation between X and Y (in absolute terms), the better the model fits the data.
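Example (a minimal sketch): checking this equality on made-up data. The value reported by summary() for a simple linear model should match the squared correlation.

```r
# Toy data (illustrative values)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1, 5.8)

fit <- lm(y ~ x)

r_squared <- summary(fit)$r.squared  # coefficient of determination from the model
cor_squared <- cor(x, y)^2           # squared correlation between x and y

r_squared
cor_squared
```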
What is the estimate decomposition?
Estimate = Estimand + Bias + Noise
What is the estimate?
The number we get as a result of our analysis.
What is the estimand?
The true quantity of interest in the population that we are trying to learn about. Our hope is that our estimate closely approximates our estimand.
What is bias?
Errors that occur for systematic reasons.
What is noise?
Idiosyncratic errors (random mistakes) that occur because of chance.
What is sampling variation?
It refers to the natural differences or fluctuations that can occur when you take different samples from the same population.
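Example (a minimal simulation sketch): drawing many random samples from one simulated population and computing the sample mean each time. The population, sample size, and number of repetitions are all assumptions chosen for illustration. Because random sampling introduces no systematic error here, the estimates scatter around the estimand, and that scatter is noise.

```r
set.seed(123)  # make the random draws reproducible

# Simulated population; its mean is the estimand
population <- rnorm(100000, mean = 50, sd = 10)
estimand <- mean(population)

# 1000 different samples of 100 units each, one estimate per sample
estimates <- replicate(1000, mean(sample(population, size = 100)))

# Error of each estimate relative to the estimand
errors <- estimates - estimand

range(estimates)  # estimates differ from sample to sample
mean(errors)      # close to zero: no systematic error (bias)
sd(estimates)     # how spread out the estimates are
```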
Randomized control trials reduce bias through
- Randomization of treatment status
- Careful selection of units and measurements
What is the standard error?
It tells you how much you can expect an estimate (such as a sample mean) to vary from the true population value (such as the population mean) across repeated samples.
- If the standard error is large, then the estimates would be very spread out and the estimator is relatively imprecise.
- If the standard error is small, then the estimates would be very close together and the estimator is relatively precise.
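Example (a minimal sketch): for a sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. This formula is a standard result added here for illustration, not something stated in these notes, and the data are made up.

```r
# Toy sample (illustrative values)
x <- c(4, 8, 6, 5, 7)

# Estimated standard error of the sample mean
standard_error <- sd(x) / sqrt(length(x))

mean(x)
standard_error
```

With the same spread in the data, a larger sample gives a smaller standard error and therefore a more precise estimator.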
What is hypothesis testing?
A strategy for assessing the probability of getting a result at least as extreme as yours under the assumption that the null hypothesis is true.
What is null hypothesis?
This is a statement of no effect or no difference. It represents the status quo or the idea that there is no change or no relationship in the population.
What is the p-value?
The probability that we observe a value of the test statistic at least as extreme as the one we actually observed if the (sharp) null hypothesis is true.
A smaller p-value provides stronger evidence against the null hypothesis.
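Example (a minimal sketch of one way to compute such a p-value): under the sharp null of no effect for any unit, the outcomes would be the same whatever the treatment assignment, so we can recompute the difference in means for every possible assignment and see how often it is at least as extreme as the observed one. This exact permutation approach and the helper dim_estimate() are assumptions chosen for illustration; the data are made up.

```r
# Toy experiment: 5 treated and 5 control units (illustrative values)
outcome <- c(12, 14, 15, 13, 16, 8, 9, 10, 7, 11)
treated <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)

# Difference-in-means estimator (hypothetical helper)
dim_estimate <- function(y, x) mean(y[x == 1]) - mean(y[x == 0])
observed <- dim_estimate(outcome, treated)

# Every way of choosing which 5 of the 10 units are treated (one per column)
assignments <- combn(length(outcome), sum(treated))

# Difference in means under each possible assignment, outcomes held fixed
null_estimates <- apply(assignments, 2, function(idx) {
  x <- rep(0, length(outcome))
  x[idx] <- 1
  dim_estimate(outcome, x)
})

# Two-sided p-value: share of assignments at least as extreme as observed
# (the small tolerance guards against floating-point rounding)
p_value <- mean(abs(null_estimates) >= abs(observed) - 1e-9)

observed
p_value
```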
What is the confidence interval?
The range of values that is likely to include the true value of the parameter (estimand).
What is a significance level?
It determines the rejection threshold of the test and characterizes the probability of false rejection of the null hypothesis.
What is summary()$coef for?
It provides a table with the following statistics related to a fitted linear model: estimated regression coefficients, standard errors, test statistics, and two-sided p-values.
What is the formula for the 95% confidence interval?
EST ± 1.96 × SE (the estimate plus or minus 1.96 standard errors)
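Example (a minimal sketch): pulling the estimate and standard error of the slope out of the summary()$coef table and applying the formula above. The data and the object names fit, coefs, est, se, and ci are made up for illustration.

```r
# Toy data (illustrative values)
data <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2.3, 2.9, 4.1, 4.8, 6.2, 6.9, 8.1, 8.8)
)

fit <- lm(y ~ x, data = data)
coefs <- summary(fit)$coef
coefs  # estimates, standard errors, test statistics, two-sided p-values

# Estimate and standard error for the slope on x
est <- coefs["x", "Estimate"]
se <- coefs["x", "Std. Error"]

# 95% confidence interval: EST +/- 1.96 * SE
ci <- c(est - 1.96 * se, est + 1.96 * se)
ci
```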
What is a confounding variable or confounder?
A variable that affects both the likelihood of receiving the treatment and the outcome.
In the presence of confounding variables:
Treatment and control groups are not comparable, correlation doesn’t necessarily imply causation, and the difference-in-means (DiM) estimator doesn’t provide a valid estimate of the average treatment effect.
Why are there no confounding variables in randomized experiments?
By randomly assigning treatment, we break the link between any potential confounders and the treatment variable, thereby eliminating all potential confounding variables.
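Example (a minimal simulation sketch): a confounder z raises both the chance of treatment and the outcome. All numbers are assumptions chosen for illustration, including a true treatment effect of 2. The difference in means is far from 2 when treatment depends on z, and close to 2 when treatment is randomly assigned.

```r
set.seed(42)  # make the random draws reproducible
n <- 10000

# Hypothetical helper: difference-in-means estimator
dim_estimate <- function(y, x) mean(y[x == 1]) - mean(y[x == 0])

# Confounder: affects both treatment uptake and the outcome
z <- rbinom(n, 1, 0.5)

# Observational setting: units with z = 1 are much more likely to be treated
x_obs <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))
y_obs <- 2 * x_obs + 5 * z + rnorm(n)
dim_observational <- dim_estimate(y_obs, x_obs)

# Randomized setting: treatment is assigned independently of z
x_rand <- rbinom(n, 1, 0.5)
y_rand <- 2 * x_rand + 5 * z + rnorm(n)
dim_randomized <- dim_estimate(y_rand, x_rand)

dim_observational  # well above the true effect of 2
dim_randomized     # close to the true effect of 2
```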