Stata Commands Flashcards
what command lets you observe the data
browse
which command shows what variables we have
desc
what command counts the number of observations satisfying a condition, lets say for observations that are not foreign and also have miles per gallon less than 25
count if foreign != 1 & mpg < 25
what command do you use to summarize the data
sum
how to tabulate a variable only for observations where foreign is true (foreign is a binary variable equal to 1 when true)
tabulate variablename if foreign == 1
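A short sketch of the exploration cards above, assuming Stata's built-in auto dataset (loaded with sysuse auto), which happens to contain foreign and mpg. rep78 is just an example variable from that dataset, not from the course data:

```stata
* load the example dataset that ships with Stata
sysuse auto, clear
* list the variables and their types
describe
* count domestic cars with mpg below 25
count if foreign != 1 & mpg < 25
* summary statistics for every variable
summarize
* tabulate the repair record for foreign cars only
tabulate rep78 if foreign == 1
```

browse is left out here because it opens the data viewer window rather than printing anything.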
how to generate a new variable
gen newvarname = whateveryouneed
what is the command for observing correlations
corr var1 var2 var3… (as many var as u want)
what is the command for observing covariances
corr var1 var2, cov
what is the command to do a t test on a variable’s mean being 0 as the null hypothesis
ttest varname == 0
the RHS will be whatever your null hypothesis is
how to do a difference in means t test for h0 being 0
use willingness to spend as the variable and employed or unemployed as the binary variable
basically do a t test to test if the difference in willingness to spend is 0 across people who are employed or unemployed
ttest wts, by(employed)
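A small sketch of both t tests, assuming the built-in auto dataset instead of the willingness-to-spend data; mpg stands in for wts and foreign for employed:

```stata
sysuse auto, clear
* one-sample t test: H0 is that mean mpg equals 20
ttest mpg == 20
* two-sample t test: H0 is that mean mpg is the same for domestic and foreign cars
ttest mpg, by(foreign)
* the two-sided p-value of the last test is stored in r(p)
display r(p)
```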
how to run an OLS regression with only 1 regressor
eg how does willingness to spend change with income
reg wts income
i.e. reg outcome independentvariable
how to create predicted values and then see them
use the regression of willingness to spend on income as example
(directly after a reg command)
predict nameyourvariable
br wts nameyourvariable income
i.e.
predict nameyourvariable
br outcome nameyourvariable independentvar
how to create predicted residuals
(directly after reg command)
predict residual, resid
n.b. the word residual is just what u name it not the command
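A minimal sketch of predicted values and residuals, assuming the built-in auto dataset; price_hat and price_resid are names made up for this example:

```stata
sysuse auto, clear
regress price mpg
* fitted values (the default for predict after regress)
predict price_hat
* residuals = actual minus fitted
predict price_resid, resid
* list prints to the Results window; browse would open the data viewer instead
list price price_hat price_resid mpg in 1/5
```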
how to create a scatter graph
twoway scatter outcome independentvar
how to plot an OLS regression line
twoway lfit outcome independentvar
how to make a scatter graph with a regression line through it
twoway (lfit outcome independentvar) (scatter outcome independentvar)
How to name your graph
add title("name") as an option at the end of the graphing command, after a comma
xtitle("name") and ytitle("name") name the x and y axes in the same way
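A sketch of a scatter plot with a fitted line and titles, assuming the built-in auto dataset; the title text is made up. Note that the quotes must be plain straight quotes:

```stata
sysuse auto, clear
* all options go after a single comma at the end
twoway (scatter price mpg) (lfit price mpg), title("Price and fuel economy") xtitle("Miles per gallon") ytitle("Price")
```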
how to generate a regression table with standard errors to 4dp and beta to 4dp
first run your regression (reg wts income)
then type:
outreg2 using reg_output.doc, sdec(4) bdec(4)
you can call the .doc file whatever u want
n.b. outreg2 is a user-written command, so install it once with ssc install outreg2
how to do a hypothesis test that a linear combination of variables has null hypothesis = 0
for example, 2*income - female + age = 0
which means 2 x coef of income - coef of female + coef of age
lincom 2*income - female + age
stata by default assumes the null hypothesis is that the combination equals 0, so don't specify it
must be written after the reg command
how to perform multiple hypothesis test (multiple hypotheses)
as example test income coef = 0.005, female coef = -2, age = 0
first do the reg command (reg wts income female age)
then do:
test (income = 0.005) (female = -2) (age = 0)
this is a joint test: the null is that all three restrictions hold at once, so it is rejected if the data are inconsistent with any one of them
how to perform multiple hypothesis test that the coefficient for income and female are both 0
reg command first then:
test income female
don't need to write = 0 since stata assumes this automatically
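A sketch of the three post-regression tests above, assuming the built-in auto dataset; the regressors and the hypothesised values are arbitrary and only there to show the syntax:

```stata
sysuse auto, clear
regress price mpg weight foreign
* H0: 2*(coef on mpg) - (coef on foreign) + (coef on weight) = 0
lincom 2*mpg - foreign + weight
* joint H0: coef on mpg = 0 and coef on weight = 2
test (mpg = 0) (weight = 2)
* joint H0: coefs on mpg and foreign are both 0
test mpg foreign
```

After test, the p-value is in r(p) and the number of restrictions in r(df).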
how to standardise a variable (divide it by its sd)
sum variablename (so you can see the sd, which stata stores in r(sd))
gen variablename_standard = variablename/r(sd)
how to fully standardise a variable (subtract the mean, then divide by the sd)
egen varname_full_standard = std(varname)
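A quick sketch comparing the two kinds of standardising, assuming the built-in auto dataset and using mpg as the example variable:

```stata
sysuse auto, clear
summarize mpg
* divide by the sd only: the new variable has sd 1 but keeps a nonzero mean
gen mpg_standard = mpg / r(sd)
* subtract the mean and divide by the sd: mean 0 and sd 1
egen mpg_full_standard = std(mpg)
summarize mpg_standard mpg_full_standard
```

The gen line has to come straight after sum, because the next command that stores results will overwrite r(sd).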
how to split up a variable that has parts to it e.g. a date like november 13 2022
split varname
how to rename the split up variable eg the dates, lets assume split into 3 parts
rename date_sold1 month
rename date_sold2 day
rename date_sold3 year
how to remove a comma from a split up variable where there is a comma e.g. the 13, in a date
replace day = regexr(day, ",", "")
general form:
replace varname = regexr(varname, "texttofind", "replacement")
how to convert something stata thinks is a word into a number
destring varname, replace
how to convert months into numbers that stata recognises
replace month = "1" if month == "January"
general form:
replace nameofvar = "number associated" if nameofvar == "Relevantmonth"
then:
destring month, replace
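The date cards above can be strung together as follows. This is only a sketch on two made-up dates typed in with input, and it assumes the string variable is called date_sold:

```stata
clear
* two made-up dates, purely for illustration
input str20 date_sold
"November 13, 2022"
"January 5, 2021"
end
* split on spaces into date_sold1, date_sold2, date_sold3
split date_sold
rename date_sold1 month
rename date_sold2 day
rename date_sold3 year
* strip the comma left on the day, e.g. "13," becomes "13"
replace day = regexr(day, ",", "")
* month names become number strings (one line per month in real data)
replace month = "1" if month == "January"
replace month = "11" if month == "November"
* convert all three from strings to numbers
destring month day year, replace
list
```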
how to run a regression over only a range of values of a variable
as example use year of house sold between 2016 and 2020
reg price bedrooms bathrooms if inrange(year, 2016, 2020), robust
general form:
reg outcome iv iv if inrange(varname, lowerbound, upperbound), robust
how to use the inrange function more generally
any command… then add if inrange(variable,lowerbound,upperbound) before the comma that starts the options
how to restrict a command to only be applied to data in specific areas e.g. specific suburbs
add if inlist(varname, "suburb1", "suburb2") at the end of the command, before the comma if there are options
this will only do the command for data in those two suburbs
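A sketch of where the if condition sits, assuming the built-in auto dataset; rep78 and make are just convenient variables from that dataset:

```stata
sysuse auto, clear
* if goes after the variable list and before the comma that starts the options
regress price mpg weight if inrange(rep78, 3, 5), robust
* inlist works the same way; string matches are exact and case-sensitive
summarize price if inlist(make, "AMC Concord", "Buick Regal")
```

Putting if after the options (for example reg y x, robust, if ...) gives a syntax error.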
how to run a regression where one regressor is treated as a categorical (binary/dummy) variable
reg outcome iv1 i.iv2, robust
how to run a regression where the outcome is in log
gen ln_outcome = ln(outcome)
reg ln_outcome iv1 iv2, robust
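A sketch combining the last two cards, assuming the built-in auto dataset; ln_price is a name made up for this example:

```stata
sysuse auto, clear
gen ln_price = ln(price)
* i.rep78 adds one dummy per repair category, leaving one out as the base
regress ln_price mpg i.rep78, robust
```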
how to make a table which only shows observations at the 25th percentile of a variable
use example of price and address i.e. show all addresses whose price equals the 25th percentile of price
general form:
sum varofpercentile, detail
tab varofinterest if varofpercentile == r(p25)
e.g.
sum price, detail
tab address if price == r(p25)
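A sketch of the percentile card, assuming the built-in auto dataset with make standing in for address. Saving r(p25) in a local macro is my own addition, not part of the card; it just keeps the value safe if another command runs in between:

```stata
sysuse auto, clear
summarize price, detail
* save the 25th percentile straight away, since later commands can overwrite r()
local p25 = r(p25)
* cars priced exactly at the 25th percentile
tabulate make if price == `p25'
* variant: cars priced at or below the 25th percentile
count if price <= `p25'
```

Run these lines together (for example from a do-file), since a local macro disappears once the do-file finishes.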
How to tell stata you are using panel data
egen panel_id = group(varname varname)
xtset panel_id timeidvariable
how to create a binary variable for being treated (DiD regressions)
gen treated = inlist(varname, "name1", "name2")
e.g. gen treated = inlist(mktnam, "Lviv", "Rvine")
(Ukrainian cities)
how to create a binary variable for being after treatment (DiD)
gen after = timeidvariable >=timeperiod
e.g.
gen after = days_since_2014 >= 730
how to create an interaction term for DiD
gen interaction = treated*after
assuming u called your two binary variables treated and after
how to run a DiD regression for beetroots in ukraine
reg price treated after interaction if varname == "beetroots", robust
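The DiD steps above can be sketched end to end on a tiny made-up dataset. Everything here is an assumption for illustration: the prices are invented, Kyiv is used as a stand-in control market, and the product filter is dropped because the toy data has only one product:

```stata
clear
* made-up prices, purely for illustration
input str10 mktnam days_since_2014 price
"Lviv" 700 10
"Lviv" 710 11
"Lviv" 760 14
"Lviv" 770 15
"Kyiv" 700 9
"Kyiv" 710 10
"Kyiv" 760 10
"Kyiv" 770 11
end
* 1 for treated markets, 0 otherwise
gen treated = inlist(mktnam, "Lviv", "Rvine")
* 1 from day 730 onwards, 0 before
gen after = days_since_2014 >= 730
gen interaction = treated*after
* the coefficient on interaction is the DiD estimate
regress price treated after interaction, robust
display _b[interaction]
```

The interaction coefficient equals the change in the treated group's mean price minus the change in the control group's mean price.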
How to create a graph for trends prior to treatment in DiD of control and treated
twoway (lfit outcome timeidvariable if treated == 1 & after == 0 & varname == "beetroots") (lfit outcome timeidvariable if treated == 0 & after == 0 & varname == "beetroots")
swap "beetroots" for whatever product you need; the spelling and capitals must match the data exactly
how do you run a regression to check for parallel trends pre treatment in DiD
gen interact_timeidvar = treated*timeidvar
reg outcome timeidvar treated interact_timeidvar if after == 0 & varname == "beetroots", robust
how to run a fixed effects regression
xtreg outcome interactionvarname if varname == "beetroots", fe robust
how to run a fixed effects regression also controlling for time
xtreg outcome interactionvarname i.timeidvariable if varname == "beetroots", fe robust
how to run a first difference regression
reg diff_outcome diff_fake_regressor if varname == "beetroots", robust
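A sketch of the panel commands above on a tiny made-up panel. The numbers, id and t are all invented for illustration, and creating the diff_ variables with the D. operator is my assumption about how they were made:

```stata
clear
* made-up panel: 4 units observed for 3 periods
input id t outcome fake_regressor
1 1 5 1
1 2 7 2
1 3 8 4
2 1 3 2
2 2 4 3
2 3 8 5
3 1 6 1
3 2 6 3
3 3 9 4
4 1 2 2
4 2 5 2
4 3 6 3
end
* declare the panel structure first; xtreg and D. both rely on it
xtset id t
* fixed effects with unit effects only, then with period dummies added
xtreg outcome fake_regressor, fe robust
xtreg outcome fake_regressor i.t, fe robust
* D. takes the change from the previous period within each unit
gen diff_outcome = D.outcome
gen diff_fake_regressor = D.fake_regressor
regress diff_outcome diff_fake_regressor, robust
```

The first period of each unit has no previous value, so its differences are missing and drop out of the regression.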
how to run a regression discontinuity: what are the three variables to generate (not the reg command)? probably not assessed, he will tell us the fake ones
gen fake_over = fake_regressor - 1
gen over_one = fake_regressor > 1
gen interact_RD = over_one*fake_over
what is the code for running the regression discontinuity
reg outcome fake_over over_one interact_RD if varname == "beetroots", robust
how to run a 2sls regression assuming we already have the instruments z1 and z2 and the variable control
ivregress 2sls price control (fake_regressor = z1 z2) if varname == "beetroots", robust
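A syntax-only sketch of 2SLS, assuming the built-in auto dataset. weight and length are used as instruments for mpg purely to show where each piece goes; nothing here claims they are valid instruments:

```stata
sysuse auto, clear
* depvar, exogenous controls, then (endogenous = instruments), then if, then options
ivregress 2sls price foreign (mpg = weight length) if inrange(rep78, 3, 5), robust
* first-stage diagnostics for the instruments
estat firststage
```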