Important codes Flashcards
Do-file
This is where we type all of our code
Allows you to save the code so it can be rerun later
How to set up a do-file
- clear all (removes all data, variables and results left over from anything we ran before)
- Right-click the file, click Properties and copy the location (this tells Stata where we are saving the file)
- cd "(location)"
- use auto.dta, clear (opens the dataset; clear means that if a dataset is already open in Stata, it is cleared and replaced by the one we have just opened)
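A minimal sketch of a do-file header that follows these steps. It assumes the copy of auto.dta that ships with Stata, loaded with sysuse, so it runs without a cd. The cd line is left as a comment with a hypothetical path.

```stata
* Start from a clean slate: drop data, variables and stored results
clear all

* Hypothetical folder path: paste the location copied from Properties here
* cd "C:/Users/yourname/Documents/stata"

* sysuse loads the auto dataset that ships with Stata;
* clear drops any dataset already in memory first
sysuse auto, clear
```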
Run all the code with the Execute (do) button in the Do-file Editor
To run only specific lines, highlight them and press the same button
If no red error message appears after running the code, it ran without errors
Browse
Opens a spreadsheet view so we can see all of our data
browse (variables) shows only specific variables
br is the short form of browse, e.g. br (variables) to view several variables at once
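Code: describe
Shows an overview of our dataset (number of observations and variables)
Helps us understand the variables: their names, storage types and labels
Storage types, e.g. int = integer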
Code: list
Shows all of our observations, one row at a time
Downside: gives us too much information for large datasets
Code: list (VARIABLES)
Only shows observations for the specific variables
Code: list (VARIABLES) in 1/(NUMBER OF OBSERVATIONS YOU WANT TO BE INCLUDED)
Restricts to certain observations
EXAMPLES
- 1/l: Lists every observation
- 70/l: List starts from the 70th observation to the last
(Lowercase L denotes the final observation)
Code: list (variables) if (outcome of interest)==1
if qualifier: only includes observations that satisfy the condition
Needs 2 equal signs!!
List code with multiple conditions
!= (value): Not equal to
&: means and
|: means or
==0: Equal to 0 (e.g. foreign==0 selects the domestic cars)
list (letters)*: the * wildcard lists every variable whose name starts with those letters (e.g. list m* lists make and mpg)
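A short sketch of these list options on the auto data, assuming it is loaded with sysuse. The price and mpg cut-offs are arbitrary illustrations.

```stata
sysuse auto, clear

* Only make and price, first 5 observations
list make price in 1/5

* From the 70th observation to the last (lowercase l)
list make price in 70/l

* if qualifier: foreign cars only (note the double ==)
list make price if foreign == 1

* Both conditions: domestic cars costing more than 10,000
list make price mpg if foreign == 0 & price > 10000

* Either condition: very cheap or very fuel efficient
list make price mpg if price < 4000 | mpg > 30

* Not equal to missing, within the first 10 observations
list make rep78 if rep78 != . in 1/10

* Wildcard: every variable whose name starts with m (make, mpg)
list m* in 1/3
```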
Code: count
Counts the number of observations; add if to count only those satisfying a condition
Code: summarize or sum
Descriptive statistics of variables (number of observations, mean, standard deviation, min and max)
Can be combined with if
, detail: More detailed breakdown (e.g. median, skewness)
(In the example, rep78 has fewer observations than the other variables because some observations have a missing (blank) rep78 value)
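A sketch combining count and summarize on the auto data, assuming it is loaded with sysuse.

```stata
sysuse auto, clear

* How many cars have a repair record, and how many are missing one
count if rep78 != .
count if rep78 == .

* Mean, sd, min and max; the Obs column is lower for rep78 because of missing values
summarize price mpg rep78

* Foreign cars only, with percentiles (including the median) and skewness
summarize price if foreign == 1, detail
```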
Code: tabulate/tab (variable of interest)
- Breaks a discrete variable down into the values it takes, with the frequency, percentage and cumulative percentage of each
- Can be combined with if
- With 2 variables it shows a cross-tab (the first variable is on the rows and the second is on the columns)
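A sketch of one-way and two-way tables on the auto data, assuming it is loaded with sysuse.

```stata
sysuse auto, clear

* One-way table: frequency, percent and cumulative percent of each value
tabulate rep78

* Same table restricted with if (domestic cars only)
tabulate rep78 if foreign == 0

* Cross-tab: rep78 on the rows, foreign on the columns
tabulate rep78 foreign
```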
Code: gen (name of the new variable) = function
- Makes a new variable
- 1 equal sign: assigns the new variable to be equal to a certain function
- 2 equal signs: checks whether an equality already holds
- Variable names cannot contain spaces, so use _ instead
- Can multiply variables by putting * between them (called an interaction)
- Rerunning the whole do-file doesn't give a "variable already defined" error, because clear all at the top removes the old variable first
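A sketch of gen on the auto data, assuming it is loaded with sysuse. The new variable names are hypothetical.

```stata
sysuse auto, clear

* One = assigns: price measured in thousands
gen price_thousands = price/1000

* Underscore instead of a space; * multiplies two variables (an interaction)
gen mpg_weight = mpg*weight

* A logical expression gives 1 if true and 0 if false
gen expensive = (price > 10000)
```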
Code: replace (variable name) = (new function for the variable)
- Changes the values of a variable that already exists
- If you make a mistake, rerun the do-file from the top (clear all) and create the variable again
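How to stop blank (missing) values being coded when generating a new variable?
(Example: 1 if rep78 is less than or equal to 2, 0 if rep78 is greater than 3)
- Stata treats a missing value (.) as larger than any number, so without a fix, observations with a blank rep78 still get a 0 or 1
- Use replace on the newly generated variable, setting it to . if the original variable is missing (using 2 equal signs in the if condition)
- Or add if (original variable) != . to the end of the gen command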
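A sketch of both fixes on the auto data, assuming it is loaded with sysuse. The dummy high_rep (1 if rep78 > 3) is a hypothetical illustration. Without a fix, it would give a 1 to every car with a blank rep78.

```stata
sysuse auto, clear

* Missing (.) counts as larger than any number, so rep78 > 3 is true for blanks
gen high_rep = (rep78 > 3)

* Fix 1: set the new variable back to missing where rep78 is missing
replace high_rep = . if rep78 == .

* Fix 2: only generate values where rep78 is not missing
gen high_rep2 = (rep78 > 3) if rep78 != .
```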
Code: corr (variables)
- Tells us the correlation
- Adding ", cov" at the end tells us the covariance
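A sketch on the auto data, assuming it is loaded with sysuse. correlate is the full name of corr.

```stata
sysuse auto, clear

* Correlation matrix of three variables
correlate price mpg weight

* Covariance instead of correlation
correlate price mpg, covariance
```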
How to save the new version of data
- save (name of the file), replace
- replace means that we overwrite any previous file with the same name
- Never save under the name auto, as we do not want to overwrite the original data, so give it a new name
- Under cd write: capture log close
- If a log file is currently open, close it; if there is no log file open, ignore the error
- log using (log name).log, replace text
- Creates a log file in plain-text form that we can read; if a file with the same name already exists, replace overwrites it
- log close (at the bottom)
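A sketch of a do-file wrapped in a log, assuming the auto data from sysuse. The file names my_analysis.log and auto_modified.dta are hypothetical. Both files are written to the current working directory.

```stata
* Close a log left open by an earlier run; capture ignores the error if none is open
capture log close

* Plain-text log; replace overwrites an older log with the same name
log using my_analysis.log, replace text

sysuse auto, clear
gen price_thousands = price/1000

* Save under a new name so the original auto data is never overwritten
save auto_modified, replace

log close
```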
Running a t-test in which we test if a variable is equal to a certain value
ttest (variable we are testing) == 0 (or any other hypothesised value)
- Provides the standard deviation (measure of the spread of all values)
- Provides the standard error (measure of how precisely the sample average estimates the mean, i.e. its spread across samples)
- Provides the t value (observed mean minus hypothesised mean, divided by the standard error)
- Provides a confidence interval
- Ha: mean != 0: two-sided alternative; the p-value is the probability of getting the test statistic or one more extreme if the null hypothesis is true. If it's less than 0.05, then we reject the null hypothesis at the 5% level.
- Ha: mean < 0: one-sided test where the alternative hypothesis is that the mean is less than 0
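A sketch on the auto data, assuming it is loaded with sysuse. It tests whether mean mpg equals 20 and then rebuilds the t value by hand. The value 20 is an arbitrary illustration.

```stata
sysuse auto, clear

* H0: mean mpg = 20; output shows Ha: mean < 20, != 20 and > 20
ttest mpg == 20

* t by hand: (sample mean - hypothesised mean) / standard error
quietly summarize mpg
scalar se_mpg = r(sd)/sqrt(r(N))
scalar t_manual = (r(mean) - 20)/se_mpg
display t_manual
```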
T-test for a difference in means
ttest (variable we are testing), by(variable that separates the groups we are comparing, e.g. a binary variable)
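A sketch on the auto data, assuming it is loaded with sysuse, using the binary variable foreign to split the sample.

```stata
sysuse auto, clear

* H0: mean price is the same for domestic (foreign == 0) and foreign cars
ttest price, by(foreign)
```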
Code: reg (outcome variable) (independent variables)
The constant is worked out as: average value of the outcome minus (coefficient estimate for the independent variable * average value of the independent variable); with several regressors, subtract this term for each one
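A sketch that checks the constant formula on the auto data, assuming it is loaded with sysuse. It uses a single regressor (mpg) as an illustration.

```stata
sysuse auto, clear
regress price mpg

* Constant = mean(price) - b_mpg * mean(mpg)
quietly summarize price
scalar ybar = r(mean)
quietly summarize mpg
scalar xbar = r(mean)

display "constant from reg:     " _b[_cons]
display "constant from formula: " ybar - _b[mpg]*xbar
```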
Code: predict (name of the new variable, e.g. y_hat)
- Creates the fitted (predicted) values; needs to be run after a reg command
- predict residual, resid: creates the residuals; also needs to be underneath a reg command
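A sketch on the auto data, assuming it is loaded with sysuse. The names y_hat and resid_hat are hypothetical.

```stata
sysuse auto, clear
regress price mpg weight

* Fitted values from the most recent regression
predict y_hat

* Residuals = actual outcome minus fitted value
predict resid_hat, residuals
```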
Show properties of regressors
Code: twoway
Creates a graph
- scatter (outcome variable) (regressor): scatter graph
- lfit (outcome variable) (regressor): fitted regression line
- graphregion(color(white)): changes the background colour to white (needs to be after a comma)
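A sketch that overlays the scatter and the fitted line in one graph, assuming the auto data from sysuse.

```stata
sysuse auto, clear

* Scatter of price against mpg, fitted regression line on top, white background
twoway (scatter price mpg) (lfit price mpg), graphregion(color(white))
```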
Difference between 1 equal sign and 2 equal signs?
=: Assigns a value (used in gen and replace)
==: Checks whether an equality holds (used in if conditions)
Omitted variable bias
Auxiliary regression: reg (omitted variable) (regressor of interest)
If you then type local coef = _b[variable], it saves the value of the coefficient on that variable in a local; locals only exist while the code is running, so you have to run your code all together
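A sketch of the omitted variable bias decomposition on the auto data, assuming it is loaded with sysuse. Weight plays the role of the omitted variable as an illustration. In OLS, the short-regression coefficient equals the long-regression coefficient plus (coefficient on the omitted variable) × (auxiliary coefficient). The locals only survive if all lines are run together.

```stata
sysuse auto, clear

* Long regression: includes weight
regress price mpg weight
local b_long = _b[mpg]
local b_weight = _b[weight]

* Short regression: weight omitted
regress price mpg
local b_short = _b[mpg]

* Auxiliary regression: omitted variable on the regressor of interest
regress weight mpg
local b_aux = _b[mpg]

* Short coefficient = long coefficient + bias
display "short:       " `b_short'
display "long + bias: " `b_long' + `b_weight'*`b_aux'
```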
Saving in a document file
Code: outreg
Run reg first, then outreg; typing replace creates a new document (not typing replace adds a new column to the existing table, typically done when adding a new control)
The document shows whether a coefficient is significant through asterisks (*)
sdec(4): standard errors to 4 decimal places
bdec(4): betas (coefficients) to 4 decimal places
Hypothesis test for a coefficient
Code: lincom
Type the variable whose coefficient we are testing (or a combination of coefficients, e.g. lincom var1 + var2)
Code: test
Used when the null hypothesis requires multiple equal signs, i.e. several restrictions joined by "and"
First need to run the regression to show Stata which regression we are talking about
Tests multiple hypotheses at the same time (F-test)
Can use the F-test for a single hypothesis as well: just a single bracket with the variable, the equal sign and the hypothesised value
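A sketch on the auto data, assuming it is loaded with sysuse. The hypothesised value 2 for weight is an arbitrary illustration.

```stata
sysuse auto, clear
regress price mpg weight

* H0: coefficient on mpg = 0 (t-test with a confidence interval)
lincom mpg

* H0: b_mpg + b_weight = 0 (a linear combination of coefficients)
lincom mpg + weight

* F-test of a single hypothesis, H0: b_weight = 2
test (weight = 2)

* Joint F-test, H0: b_mpg = 0 and b_weight = 0
test (mpg = 0) (weight = 0)
```

For a single restriction, the F statistic from test equals the squared t statistic from the regression.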
Standardising
egen (new variable) = std(variable) standardises a variable (mean 0, standard deviation 1)
r(sd) refers to the standard deviation stored by the previous summarize command
When running a reg and interpreting the coefficient, say that a 1 standard deviation increase in the variable is associated with the outcome variable changing by the coefficient, holding fixed the other variables
We can also standardise the outcome
Makes it easier to understand
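A sketch of both ways to standardise mpg on the auto data, assuming it is loaded with sysuse. The names mpg_std and mpg_std2 are hypothetical.

```stata
sysuse auto, clear

* By hand, using the mean and sd stored by summarize
quietly summarize mpg
gen mpg_std = (mpg - r(mean))/r(sd)

* Same thing with egen's std() function
egen mpg_std2 = std(mpg)

* Coefficient = change in price for a 1 sd increase in mpg, holding weight fixed
regress price mpg_std weight
```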
What is normalising?
Transforming the variable to make it easier to read
Code: egen
e.g. egen (new variable) = mean(variable) creates a variable equal to the average value
If sd() is used instead of mean(), it is the standard deviation instead
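A sketch on the auto data, assuming it is loaded with sysuse. The new variable names are hypothetical. The bysort line is an extra illustration of group means.

```stata
sysuse auto, clear

* Every observation gets the overall mean price
egen mean_price = mean(price)

* Every observation gets the standard deviation of price
egen sd_price = sd(price)

* Group version: average price within domestic and within foreign cars
bysort foreign: egen mean_price_grp = mean(price)
```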
Code: insheet
insheet using (name of data).csv, clear
- Opens a new dataset
- The data is a CSV file (comma-separated values, which opens in Excel), so the file name needs to end in .csv
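A self-contained sketch. It first writes part of the auto data to a hypothetical file, cars.csv, with export delimited so there is a CSV to open, and then reads it back with insheet.

```stata
sysuse auto, clear

* Create a CSV file to practise on (hypothetical file name)
export delimited make price mpg using cars.csv, replace

* Open the CSV; clear drops the dataset currently in memory
insheet using cars.csv, clear
```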
Code: split VARIABLE
Creates separate variables, e.g. for the day, month and year of a date
rename: changes the name of a variable: rename (original variable name) (new variable name)
- replace VARIABLE = regexr(VARIABLE, "what we want to take out", "what we want to put in") (replaces the first match)
Code: destring VARIABLE, replace
Tells Stata that a variable stored as text is actually a number, and converts it
Used with the split command, since split creates text (string) variables
When we browse, why are certain variables in red?
Stata thinks that the variable is just words (a string) and not numbers (e.g. months)
Changing months into numerical values
Add the destring code at the end
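A sketch of split, rename, regexr and destring on a tiny hypothetical dataset typed in with input. The dates, prices and variable names are made up for illustration.

```stata
clear
* Hypothetical data: dates and prices stored as text (strings)
input str10 date str8 price_text
"01/03/2020" "1,250"
"15/11/2021" "980"
end

* split on "/" creates date1, date2, date3 (still strings, shown in red)
split date, parse("/")
rename date1 day
rename date2 month
rename date3 year

* regexr replaces the first "," with nothing
replace price_text = regexr(price_text, ",", "")

* destring turns the text into numbers
destring day month year price_text, replace
```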
Code: inrange
..., if inrange(variable, starting number, ending number)
Includes the first and last number
Code: inlist
Checks whether a variable equals any of the values listed; useful for words (strings) but also works for numbers
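A sketch on the auto data, assuming it is loaded with sysuse. The price band and the two car models are arbitrary illustrations.

```stata
sysuse auto, clear

* 1 if price is between 5,000 and 10,000 (both end points included)
gen mid_price = inrange(price, 5000, 10000)

* inlist with words: 1 if make is one of the listed models
gen chosen = inlist(make, "AMC Concord", "Buick Century")

* inlist with numbers also works
list make rep78 if inlist(rep78, 1, 2)

* inrange inside an if qualifier
summarize mpg if inrange(price, 5000, 10000)
```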
What should you add at the end of any regression code?
, robust: uses robust (heteroskedasticity-robust) standard errors
Binary variables: i.
Putting i. before a variable creates a dummy variable for each possible outcome (one category is left out as the base)
Code: xi
Stata also creates the dummy variables in the dataset, which you can browse
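A sketch on the auto data, assuming it is loaded with sysuse, with rep78 as the categorical variable.

```stata
sysuse auto, clear

* Robust standard errors
regress price mpg weight, robust

* i.rep78: one dummy per value of rep78, lowest value left out as the base
regress price mpg i.rep78, robust

* xi does the same but also creates the dummies (_Irep78_*) in the dataset
xi: regress price mpg i.rep78, robust
```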
Functional form
Finding observation of percentiles
scalar (name of constant) = _b[coefficient name]
- The scalar line needs to be after the reg command (it saves the coefficient from that regression); display (name of constant) then shows it
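A sketch on the auto data, assuming it is loaded with sysuse. The scalar name b_mpg is hypothetical.

```stata
sysuse auto, clear
regress price mpg weight

* Save the coefficient while this regression is the most recent one
scalar b_mpg = _b[mpg]

* A later regression replaces _b, but the scalar keeps its value
regress price mpg
display b_mpg
```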
egen panel_id = group(variable variable)
Gives each possible combination of the variables a specific ID code
xtset panelidvariable timeidvariable
Informs Stata that we have panel data
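A sketch on a small hypothetical panel: two countries × two products observed over three years. All names and values are made up for illustration.

```stata
clear
input str2 country str1 product year price
"UK" "A" 2019 10
"UK" "A" 2020 11
"UK" "A" 2021 12
"UK" "B" 2019 20
"UK" "B" 2020 21
"UK" "B" 2021 23
"FR" "A" 2019 9
"FR" "A" 2020 10
"FR" "A" 2021 10
"FR" "B" 2019 19
"FR" "B" 2020 22
"FR" "B" 2021 22
end

* One ID per country-product combination (4 IDs here)
egen panel_id = group(country product)

* Tell Stata the data are a panel: unit = panel_id, time = year
xtset panel_id year
```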
Diff-in-diff
Binary variable for treated
gen variablename = inlist(variable we are focusing on, values for which we want our binary variable to equal 1)
Diff-in-diff
Binary variable for being after treatment
Binary variable that is 1 after treatment and 0 before
Diff-in-diff regression:
reg outcomevariable treated after interaction if variable == "specific value/word that is the treated group", robust
Diff-in-diff
Generating the interaction term
Multiplying treated and after
gen interaction = treated*after
Diff-in-diff
Graph for the trends before treatment
twoway (lfit outcome timevariable if treated == 1 & after == 0 & cname == "Treated group") (lfit outcome timevariable if treated == 0 & after == 0 & cname == "Treated group")
Trend in the outcome for the treated group prior to treatment, as well as the trend in the outcome for the control group
Checking for parallel trends
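A sketch of the diff-in-diff regression on simulated data. The variable names id and outcome, the sample size and the true treatment effect of 2 are all assumptions of the simulation, so the interaction coefficient should come out close to 2.

```stata
clear
set seed 12345

* Hypothetical data: 2,000 units, each seen before and after; first 1,000 treated
set obs 4000
gen id = ceil(_n/2)
gen after = mod(_n, 2) == 0
gen treated = id <= 1000

* Interaction: 1 only for treated units after treatment
gen interaction = treated*after

* Simulated outcome with a true treatment effect of 2
gen outcome = 1 + 0.5*treated + 1*after + 2*interaction + rnormal()

regress outcome treated after interaction, robust
```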
Fixed effect estimators
Time fixed effect
Creating fake data
gen fake_regressor = rnormal()
Normally distributed with mean 0 and standard deviation 1
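A sketch that generates 1,000 fake draws. The seed makes the random numbers the same on every run. The seed value and sample size are arbitrary.

```stata
clear
set seed 2024
set obs 1000

* Draws from a normal distribution with mean 0 and sd 1
gen fake_regressor = rnormal()
summarize fake_regressor
```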
Time fixed effects
sort panel_id timevariable
First difference regression
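A sketch on a simulated panel, which is an assumption: 200 units over 5 years, with a unit effect that is correlated with x. It shows a fixed effects regression with time fixed effects (i.year) and a first difference regression, where D. takes this year minus last year within each unit. Both should estimate the true slope of 3 set in the simulation.

```stata
clear
set seed 99
set obs 1000

* Hypothetical panel: 200 units (panel_id) observed in 2016-2020
gen panel_id = ceil(_n/5)
gen year = 2016 + mod(_n - 1, 5)

* Unit effect drawn once per unit, then carried down to its other years
gen unit_effect = rnormal() if year == 2016
replace unit_effect = unit_effect[_n-1] if unit_effect == .

* x is correlated with the unit effect, so pooled OLS would be biased
gen x = unit_effect + rnormal()
gen y = 3*x + 2*unit_effect + 0.1*year + rnormal()

xtset panel_id year

* Unit fixed effects plus time fixed effects
xtreg y x i.year, fe

* First differences remove the unit effect
sort panel_id year
regress D.y D.x
```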
Regression discontinuity
2SLS
ivregress 2sls price (treatment that might be biased = instruments) control variables if product == "treated group", robust
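A syntax-only sketch on the auto data, assuming it is loaded with sysuse. Treating mpg as the possibly biased regressor, weight and length as instruments, and foreign as a control is purely an illustration of where each part goes. It is not a credible instrument choice.

```stata
sysuse auto, clear

* (possibly biased regressor = instruments), then exogenous controls
ivregress 2sls price (mpg = weight length) foreign, vce(robust)
```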
Panel Data Regression