Stata Commands Flashcards

1
Q

what command lets you observe the data

A

browse

2
Q

which command shows what variables we have

A

desc

3
Q

what command counts the number of observations satisfying a condition, e.g. observations that are not foreign and have miles per gallon below 25

A

count if foreign != 1 & mpg < 25

4
Q

what command do you use to summarize the data

A

sum

5
Q

how to tabulate a variable only for observations where foreign is true (foreign is a binary variable, equal to 1 when true)

A

tabulate variablename if foreign == 1
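
The first five cards can be tried together on Stata's built-in auto dataset. This is a minimal sketch that assumes the foreign and mpg variables in the cards refer to that dataset; rep78 is just one convenient variable to tabulate.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* browse opens the Data Browser window (interactive sessions only)
* browse

* List the variables, their types and labels
describe

* Mean, sd, min and max for every variable
summarize

* Number of domestic cars with mpg below 25
count if foreign != 1 & mpg < 25

* Repair record, foreign cars only
tabulate rep78 if foreign == 1
```

desc and sum in the cards are abbreviations of describe and summarize.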

6
Q

how to generate a new variable

A

gen newvarname = whateveryouneed

7
Q

what is the command for observing correlations

A

corr var1 var2 var3… (list as many variables as you want)

8
Q

what is the command for observing covariances

A

corr var1 var2, cov
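
A short sketch of cards 6 to 8 on the built-in auto dataset. The variable gp100m (gallons per 100 miles) is a made-up example of a generated variable, not something from the cards.

```stata
sysuse auto, clear

* New variable: gallons used per 100 miles (illustrative name)
gen gp100m = 100 / mpg

* Correlation matrix for three variables
corr price mpg weight

* Same command with the cov option gives variances and covariances
corr price mpg, cov
```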

9
Q

what is the command to do a t test on a variable’s mean being 0 as the null hypothesis

A

ttest varname == 0

the right-hand side is the value of the mean under your null hypothesis

10
Q

how to do a difference in means t test for h0 being 0
use willingness to spend as the variable and employed or unemployed as the binary variable
basically do a t test to test if the difference in willingness to spend is 0 across people who are employed or unemployed

A

ttest wts, by(employed)
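
Both t tests can be tried on the built-in auto dataset. This sketch swaps in mpg for wts and foreign for employed, since the survey variables from the cards are not available here; the null value of 20 is arbitrary.

```stata
sysuse auto, clear

* One-sample test: H0 is that mean mpg equals 20
ttest mpg == 20

* Two-sample test: H0 is that mean mpg is the same for
* domestic and foreign cars (difference in means = 0)
ttest mpg, by(foreign)
```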

11
Q

how to run an OLS regression with only 1 regressor
eg how does willingness to spend change with income

A

reg wts income

i.e. reg outcome independentvariable

12
Q

how to create predicted values and then see them

use the regression of willingness to spend on income as example

A

(directly after a reg command)
predict nameyourvariable
br wts nameyourvariable income

i.e.
predict newvarname
br outcome newvarname independentvar

13
Q

how to create predicted residuals

A

(directly after reg command)
predict residual, resid

n.b. residual is just the name you give the new variable; it is the resid option that asks for residuals
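
Cards 11 to 13 in one run, sketched on the built-in auto dataset with mpg as the outcome and weight as the regressor (stand-ins for wts and income). The names mpg_hat and mpg_resid are arbitrary.

```stata
sysuse auto, clear

* OLS with one regressor
reg mpg weight

* Fitted values (the default) and residuals from the last regression
predict double mpg_hat
predict double mpg_resid, resid

* Outcome = fitted value + residual; list is used instead of br
* so this also runs outside an interactive session
list mpg mpg_hat mpg_resid weight in 1/5
```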

14
Q

how to create a scatter graph

A

twoway scatter outcome independentvar

15
Q

how to plot an OLS regression line

A

twoway lfit outcome independentvar

16
Q

how to make a scatter graph with a regression line through it

A

twoway (lfit outcome independentvar) (scatter outcome independentvar)

17
Q

How to name your graph

A

title("name") add this as an option at the end of the graphing command, after the comma; xtitle("name") and ytitle("name") label the x and y axes instead
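
A sketch of cards 14 to 17 combined, using mpg and weight from the built-in auto dataset as stand-ins for outcome and independentvar. The titles are placeholders. The /// line continuation only works in a do-file; in the Command window, type it on one line.

```stata
sysuse auto, clear

* Scatter plot with the OLS line drawn on top, plus titles
twoway (scatter mpg weight) (lfit mpg weight), ///
    title("Mileage against weight")            ///
    xtitle("Weight (lbs.)")                    ///
    ytitle("Miles per gallon")
```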

18
Q

how to generate a regression table with standard errors to 4dp and beta to 4dp

A

(reg wts income) this is your regression, run just before the command
then type:
outreg2 using reg_output.doc, sdec(4) bdec(4)

you can call the .doc file whatever you want
outreg2 is a user-written command; if it is not installed, run ssc install outreg2 first

19
Q

how to do a hypothesis test that a linear combination of variables has null hypothesis = 0

for example, 2*income - female + age = 0
which means 2 x coefficient on income - coefficient on female + coefficient on age

A

lincom 2*income -female +age

Stata assumes by default that the null hypothesis is = 0, so don't specify it

must be written after the reg command

20
Q

how to perform multiple hypothesis test (multiple hypotheses)

as example test income coef = 0.005, female coef = -2, age coef = 0

A

first do the reg command (reg wts income female age)
then do:
test (income = 0.005) (female = -2) (age = 0)

this is a joint test: the null is that all three restrictions hold at the same time, so it is rejected when the data are inconsistent with them taken together

21
Q

how to perform multiple hypothesis test that the coefficient for income and female are both 0

A

reg command first then:
test income female

no need to write = 0, since Stata assumes this automatically
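
Cards 19 to 21 sketched on the built-in auto dataset, with price, mpg, weight and foreign standing in for wts, income, age and female. The restriction values are arbitrary and only show the syntax.

```stata
sysuse auto, clear

reg price mpg weight foreign

* H0: 2*b[mpg] - b[foreign] + b[weight] = 0
lincom 2*mpg - foreign + weight

* Joint test of two restrictions with nonzero values
test (mpg = -50) (weight = 3)

* Joint test that both coefficients are zero
test mpg weight
```

lincom reports the estimate of the combination with a t test, while test reports a single F statistic for all the listed restrictions.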

22
Q

how to standardise a variable

A

sum variablename (shows the sd and stores it in r(sd))
gen variablename_standard = variablename/r(sd)

23
Q

how to fully standardise a variable (subtract the mean, then divide by the sd)

A

egen varname_full_standard = std(varname)
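
The difference between cards 22 and 23 is easiest to see side by side. A minimal sketch on the built-in auto dataset; the new variable names are arbitrary.

```stata
sysuse auto, clear

* summarize leaves the mean and sd behind in r(mean) and r(sd)
summarize mpg

* Card 22: divide by the sd only (sd becomes 1, mean is not 0)
gen mpg_scaled = mpg / r(sd)

* Card 23 by hand: subtract the mean, then divide by the sd
gen mpg_manual = (mpg - r(mean)) / r(sd)

* Card 23 with egen: should match mpg_manual
egen mpg_std = std(mpg)

summarize mpg_scaled mpg_manual mpg_std
```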

24
Q

how to split up a variable that has parts to it e.g. a date like november 13 2022

A

split varname (splits on spaces by default; the new variables are called varname1, varname2, ...)

25
Q

how to rename the split-up variables, e.g. the dates; let's assume the split gave 3 parts

A

rename date_sold1 month
rename date_sold2 day
rename date_sold3 year

26
Q

how to get rid of a comma in a split-up variable, e.g. the "13," in a date

A

replace day = regexr(day, ",", "")

general form:
replace varname = regexr(varname, "texttofind", "replacement")

27
Q

how to convert something Stata thinks is a word (a string) into a number

A

destring varname, replace

28
Q

how to convert months into numbers that Stata recognises

A

replace month = "1" if month == "January"

general form:
replace nameofvar = "number associated" if nameofvar == "Relevantmonth"

then:
destring month, replace

29
Q

how to run a regression over only a range of values of a variable; as example use houses sold between 2016 and 2020

A

reg price bedrooms bathrooms if inrange(year, 2016, 2020), robust

general form:
reg outcome iv iv if inrange(varname, lowerbound, upperbound), robust

the if condition goes before the comma; options such as robust go after it

30
Q

how to use the inrange function more generally

A

any command, then add if inrange(variable, lowerbound, upperbound) before the comma that starts the options

31
Q

how to restrict a command so it only applies to data in specific areas, e.g. specific suburbs

A

add if inlist(varname, "suburb1", "suburb2") after the variable list and before any options
the command then only uses data from those two suburbs

32
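Q

how to run a regression where one regressor is treated as a binary (or categorical) variable

A

reg outcome iv1 i.iv2, robust

the i. prefix tells Stata to treat iv2 as categorical and include an indicator for each level except the base level
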
33
Q

how to run a regression where the outcome is in logs

A

gen ln_outcome = ln(outcome)
reg ln_outcome iv1 iv2, robust

34
Q

how to make a table which only shows observations at the 25th percentile of a variable; as example use price and address, i.e. show all addresses whose price is at the 25th percentile

A

sum varofpercentile, detail
tab varofinterest if varofpercentile == r(p25)

e.g.
sum price, detail
tab address if price == r(p25)

(use <= r(p25) instead of == to show everything at or below the 25th percentile)

35
Q

how to tell Stata you are using panel data

A

egen panel_id = group(varname varname)
xtset panel_id timeidvariable

36
Q

how to create a binary variable for being treated (DiD regressions)

A

gen treated = inlist(varname, "name1", "name2")

e.g. (Ukrainian cities)
gen treated = inlist(mktnam, "Lviv", "Rvine")

37
Q

how to create a binary variable for being after treatment (DiD)

A

gen after = timeidvariable >= timeperiod

e.g.
gen after = days_since_2014 >= 730

38
Q

how to create an interaction term for DiD

A

gen interaction = treated*after

assuming you called your two binary variables treated and after

39
Q

how to run a DiD regression for beetroots in Ukraine

A

reg price treated after interaction if varname == "beetroots", robust

40
Q

how to create a graph of the trends prior to treatment in DiD, for control and treated

A

twoway (lfit outcome timeidvariable if treated == 1 & after == 0 & varname == "beetroots") (lfit outcome timeidvariable if treated == 0 & after == 0 & varname == "beetroots")

("beetroots" can be whatever product you need)

41
Q

how do you run a regression to check for parallel trends pre-treatment in DiD

A

gen interact_timeidvar = treated*timeidvar
reg outcome timeidvar treated interact_timeidvar if after == 0 & varname == "beetroots", robust

42
Q

how to run a fixed effects regression

A

xtreg outcome interactionvarname if varname == "beetroots", fe robust

43
Q

how to run a fixed effects regression also controlling for time

A

xtreg outcome interactionvarname i.timeidvariable if varname == "beetroots", fe robust

44
Q

how to run a first difference regression

A

reg diff_outcome diff_fake_regressor if varname == "beetroots", robust

45
Q

how to run a regression discontinuity (only the three variables to generate, not the reg command)
probably not assessed; he will tell us the fake ones

A

gen fake_over = fake_regressor - 1
gen over_one = fake_regressor > 1
gen interact_RD = over_one*fake_over

46
Q

what is the code for running the regression discontinuity

A

reg outcome fake_over over_one interact_RD if varname == "beetroots", robust

47
Q

how to run a 2SLS regression, assuming we already have the variables z1, z2 and control

A

ivregress 2sls price (fake_regressor = z1 z2) control if varname == "beetroots", robust