Important codes Flashcards
Do file
This is where we type all of our code
Lets us save the code so we can run it again later
How to set up a do file
- clear all (removes any data and results already held in Stata's memory, so we start from scratch)
- Right-click the file, click Properties and copy its location (this tells Stata which folder we are saving to)
- cd "(location)"
- use auto, clear (opens the auto data set, auto.dta; clear means that if data is already open in Stata, it is cleared and replaced with the data we have just opened)
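A minimal sketch of the top of a do file. Two assumptions: the folder path is a hypothetical placeholder (paste your own copied location), and `sysuse auto, clear`, which loads the copy of auto.dta that ships with Stata, stands in for `use auto, clear` so the sketch runs before the file is saved in your own folder.

```stata
* Top of the do file
clear all                      // start from an empty memory
* cd "C:\my_folder"            // hypothetical path: replace with your own copied location
sysuse auto, clear             // Stata's built-in auto data; use "use auto, clear" once auto.dta is in your folder
```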
Run all the code in the do-file editor by pressing the Do (Execute) button
To run only part of the code, highlight those lines and press the same button
If no red error message appears in the Results window after running the code, it ran without errors
Browse
Opens the Data Browser, a spreadsheet view of all of our data
browse (variables) shows only the variables we name
br is the abbreviation of browse and accepts several variables at once, e.g. br make price mpg
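Code: describe
Describes the data set: the number of observations and variables, and each variable's storage type and label
Helps us understand the variables
int: integer storage type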
Code: list
Prints the values of every variable for every observation
Downside: gives far too much output for a large data set
Code: list (VARIABLES)
Only shows the values of the named variables
Code: list (VARIABLES) in 1/(NUMBER OF OBSERVATIONS YOU WANT TO INCLUDE)
Restricts the output to a range of observations
EXAMPLES
- 1/l: lists every observation, from the first to the last
- 70/l: lists from the 70th observation to the last
(Lowercase l denotes the last observation)
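A short sketch of the in ranges above, assuming Stata's built-in auto data loaded with `sysuse auto, clear` (74 observations) and two of its variables, make and price.

```stata
sysuse auto, clear
list make price in 1/5       // first five observations only
list make price in 70/l      // from observation 70 to the last (74 in this data)
```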
Code: list (variables) if (outcome of interest)==1
The if qualifier only includes observations that satisfy the condition
Testing for equality needs 2 equal signs!!
List code with multiple conditions
!= (value): Not equal to
&: means and
|: means or
== (value): equal to (e.g. ==0 picks out observations where a dummy variable is 0)
list (letter)*: the * wildcard lists every variable whose name starts with that letter (e.g. list m*)
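A sketch combining the operators above, assuming Stata's built-in auto data (`sysuse auto, clear`), where foreign is a 0/1 dummy; the thresholds are chosen only for illustration.

```stata
sysuse auto, clear
list make price if foreign == 1                    // == tests equality: foreign cars only
list make price if foreign == 0 & price > 10000    // & means and: domestic AND price above 10,000
list make mpg if mpg < 15 | mpg > 35               // | means or
list make rep78 if rep78 != .                      // != means not equal to: rep78 not missing
list m* in 1/3                                     // wildcard: every variable whose name starts with m
```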
Code: count
Counts the number of observations (add if to count only those satisfying a condition)
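A minimal sketch of count with and without a condition, assuming Stata's built-in auto data (`sysuse auto, clear`). count also stores its answer in r(N), which display can print.

```stata
sysuse auto, clear
count                        // all observations
count if foreign == 1        // only observations satisfying the condition
display r(N)                 // count stores its result in r(N)
count if rep78 == .          // observations where rep78 is missing
```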
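Code: summarize or sum
Descriptive statistics of variables (number of observations, mean, standard deviation, minimum and maximum)
Can be combined with if
, detail: more detailed breakdown (percentiles including the median, variance, skewness, kurtosis)
(In the example, rep78 shows fewer observations than the other variables because summarize skips missing values)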
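A sketch of summarize, assuming Stata's built-in auto data (`sysuse auto, clear`). In that data rep78 has 5 missing values, so summarize reports 69 observations for it rather than 74.

```stata
sysuse auto, clear
summarize price mpg rep78          // rep78: 69 obs, because its missing values are skipped
summarize price if foreign == 1    // if works with summarize too
summarize price, detail            // adds percentiles, variance, skewness, kurtosis
display r(p50)                     // the median, stored after , detail
```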
Code: tabulate/tab (variable of interest)
- Breaks a discrete variable down into the values it takes, with the frequency, percentage and cumulative percentage of each
- Can be combined with if
- With 2 variables it gives a cross-tab (the first variable is on the rows and the second is on the columns)
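A sketch of one-way and two-way tables, assuming Stata's built-in auto data (`sysuse auto, clear`), where rep78 takes the values 1 to 5.

```stata
sysuse auto, clear
tabulate rep78                    // frequency, percent and cumulative percent of each value
tabulate rep78 if foreign == 0    // same table, domestic cars only
tabulate rep78 foreign            // cross-tab: rep78 on the rows, foreign on the columns
```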
Code: gen (name of the new variable) = expression
- Makes a new variable
- 1 equal sign: assigns the new variable the value of the expression
- 2 equal signs: checks whether an equality holds (gives 1 if true, 0 if false)
- Variable names cannot contain spaces, so use _ instead
- Multiply variables by putting * between them (this creates an interaction)
- gen cannot overwrite a variable that already exists, but rerunning the whole do file does not create a duplicate because clear all and use at the top start again from the original data
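A sketch of the gen rules above, assuming Stata's built-in auto data (`sysuse auto, clear`); the new variable names price_thousands, domestic and mpg_foreign are hypothetical.

```stata
sysuse auto, clear
gen price_thousands = price / 1000    // 1 equal sign assigns; _ instead of a space
gen domestic = (foreign == 0)         // 2 equal signs check equality: 1 if true, 0 if false
gen mpg_foreign = mpg * foreign       // interaction of mpg and foreign
```

Running the gen lines a second time without reloading the data stops with a "variable already defined" error.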
Code: replace (variable name) = (expression giving the new values)
- Changes the values of an existing variable
- If you make a mistake, rerun the do file from the top (clear all, reload the data) and create the variable again
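How to keep missing (blank) values missing when generating a new variable?
(Example: a variable equal to 1 if rep78 is less than or equal to 2, or 0 if rep78 is greater than 3. Stata treats a missing value (.) as larger than any number, so observations with rep78 missing would otherwise be given a value instead of staying missing)
- Use replace on the newly generated variable with a condition on the original variable: replace (new variable) = . if (original variable) == . (2 equal signs for this one)
- Or add if (variable we want to code) != . to the gen command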
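A sketch of both fixes, assuming Stata's built-in auto data (`sysuse auto, clear`, where rep78 has 5 missing values) and a simpler hypothetical dummy, low_rep, equal to 1 if rep78 is at most 2 and 0 otherwise.

```stata
sysuse auto, clear
* Without a fix, missing rep78 counts as "larger than 2", so it is coded 0 instead of .
gen low_rep = (rep78 <= 2)
replace low_rep = . if rep78 == .               // fix 1: set it back to missing

gen low_rep2 = (rep78 <= 2) if rep78 != .       // fix 2: only generate for non-missing rep78
```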
Code: corr (variables)
- Gives the correlation between the variables
- Adding ", cov" at the end gives the covariance instead
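A sketch of corr, assuming Stata's built-in auto data (`sysuse auto, clear`). With exactly two variables the correlation is also stored in r(rho).

```stata
sysuse auto, clear
corr price mpg weight          // correlation matrix
corr price mpg weight, cov     // covariance matrix instead
corr price mpg
display r(rho)                 // correlation between price and mpg
```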
How to save the new version of the data
- save (name of the file), replace
- replace means that we overwrite any existing file with the same name
- Never save under the name auto: we do not want to overwrite the original data, so give the file a new name
How to keep a log file
- Under cd write: capture log close
- If a log file is currently open this closes it; if no log file is open, capture makes Stata ignore the error
- log using (log name).log, replace text
- Creates a log file in plain-text form that we can read; if a file with the same name already exists, replace overwrites it
- log close (at the bottom of the do file)
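A minimal sketch of a do file that logs its output and saves an edited copy of the data. Assumptions: Stata's built-in auto data (`sysuse auto, clear`), and hypothetical file names auto_notes.log and auto_edited.dta written to the current folder.

```stata
clear all
capture log close                          // close any open log; capture hides the error if none is open
log using auto_notes.log, replace text     // hypothetical log name; plain-text log
sysuse auto, clear
gen price_thousands = price / 1000
save auto_edited, replace                  // new name, so the original auto.dta is untouched
log close                                  // at the bottom of the do file
```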
Running a t-test of whether the mean of a variable equals a certain value
ttest (variable we are testing) == 0
- Provides the standard deviation (measure of the spread of all values)
- Provides the standard error (how precisely the sample mean estimates the population mean: standard deviation divided by the square root of the number of observations)
- Provides the t value (observed mean minus hypothesised mean, divided by the standard error)
- Provides a 95% confidence interval for the mean
- Ha: mean != 0: two-sided alternative; the p-value is the probability of getting a test statistic at least as extreme as ours (in either direction) if the null is true. If it's less than 0.05, then we reject the null hypothesis at the 5% level.
- Ha: mean < 0: one-sided alternative that the true mean is less than 0 (the p-value only counts more extreme values in the negative direction)
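A sketch that runs a one-sample t-test and then rebuilds the t value by hand, assuming Stata's built-in auto data (`sysuse auto, clear`); the hypothesised mean of 20 for mpg is chosen only for illustration.

```stata
sysuse auto, clear
ttest mpg == 20                                   // H0: mean of mpg equals 20
* t = (observed mean - hypothesised mean) / standard error, where SE = sd / sqrt(N)
quietly summarize mpg
display (r(mean) - 20) / (r(sd) / sqrt(r(N)))     // should match the t shown by ttest
```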
T-test for a difference in means
ttest (variable we are testing), by(variable defining the 2 groups, e.g. a binary variable; it must take exactly 2 values)
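A sketch of the two-group version, assuming Stata's built-in auto data (`sysuse auto, clear`), where the binary variable foreign splits the cars into 52 domestic (0) and 22 foreign (1).

```stata
sysuse auto, clear
ttest mpg, by(foreign)    // compares mean mpg of domestic (group 0) and foreign (group 1) cars
```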
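Code: reg (outcome variable) (independent variables)
- List the outcome first, then the regressors, separated by spaces (no comma: anything after a comma is read as an option)
With one regressor, the constant is worked out as: average value of the outcome minus (coefficient estimate on the independent variable * average value of the independent variable)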
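A sketch that checks the constant formula for a regression with one regressor, assuming Stata's built-in auto data (`sysuse auto, clear`); the scalar names ybar and xbar are hypothetical.

```stata
sysuse auto, clear
regress price mpg                    // outcome first, then the regressor
* constant = mean(price) - (coefficient on mpg * mean(mpg))
quietly summarize price
scalar ybar = r(mean)
quietly summarize mpg
scalar xbar = r(mean)
display ybar - _b[mpg] * xbar        // computed by hand
display _b[_cons]                    // reported by regress
```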
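Code: predict (name of the new variable, e.g. y_hat)
- Saves the fitted (predicted) values of the outcome; needs to be run after a reg command
- predict residual, resid: saves the residuals (actual minus fitted values) in a new variable called residual; also needs to come after a reg command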
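A sketch of predict after a simple regression, assuming Stata's built-in auto data (`sysuse auto, clear`). Each residual should equal the actual price minus the fitted value.

```stata
sysuse auto, clear
regress price mpg
predict y_hat                 // fitted values (the default after regress)
predict residual, resid       // residuals: price - y_hat
```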
Show properties of regressors
Code: twoway
Creates a graph
- scatter (outcome variable) (regressor): scatter graph
- lfit (outcome variable) (regressor): fitted regression line
- graphregion(color(white)): changes the background colour to white (needs to go after a comma)
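A sketch that overlays the scatter and the fitted line in one graph, assuming Stata's built-in auto data (`sysuse auto, clear`). The final graph save line and its file name scatter_lfit are hypothetical additions, only there so the graph is kept on disk.

```stata
sysuse auto, clear
* Each plot goes in its own brackets; options for the whole graph go after the comma
twoway (scatter price mpg) (lfit price mpg), graphregion(color(white))
graph save scatter_lfit, replace    // hypothetical file name; saves scatter_lfit.gph
```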