Advanced Topic: PROC SQL Flashcards
What can PROC SQL be used for?
- Retrive data from and manipulate SAS tables
- Add/modify data values in table, drop columns
- Create tables
- Join tables together
- Generate reports
What is the general syntax for PROC SQL?
proc sql;
select columns
from table
(where expression)
(group by columns)
(having)
(order by columns);
quit;
Note: this order cannot be changed in SQL.
In SQL, what does the commands select, from, where, group by, having, and order by do?
Select: specifies the columns to be selected
From: specifies tables to be queried
Where: subsets the data based on conditions
Group by: classifies data into groups by specified columns
Having: works with group by to restrict groups displayed
Order by: sorts the rows by the values.
T/F: You need the run statement in PROC SQL.
False: run has no effect. PROC SQL statements executed immediately. To end, must use quit statement.
Given the following code, explain what each line is doing.
1) proc sql;
2) select empid, jobcode, salary, salary*0.06 as bonus
3) from payrollmaster
4) where salary lt 32000
5) order by 2, empid, bonus
6) quit;
1) calls PROC SQL
2) selects columns, separated by commas, and creates new column of bonus
3) gets data from the dataset called payrollmaster
4) selects only rows/obs with salary < 32,000
5) Order first by 2nd column(jobcode), then empid, and finally bonus
6) quits PROC SQL
Where do you put semicolons in PROC SQL?
Only after the last clause in the statement.
T/F: columns created in PROC SQL are automatically permanent.
False: New columns exist only for the duration of the query, unless a table/view is created.
T/F: The ORDER BY clause sorts rows in ascending order.
True: if you want descending, need to specify with DESC
What do you use if you want all columns to be referenced?
*
List some functions you can use in PROC SQL.
min, max, sum, avg, nmiss, n, mmiss, std, var, freq
Write the general code to create a new table in proc SQL.
proc sql;
create table tablename as
select columns
from dataset
group by columns;
quit;
T/F: When the create table statement is used, the query creates a report in the results window.
False: It does not, instead the new table/dataset is created and results are not printed automatically.
Describe the 4 types of joins.
1) Outer: keeps all observations from both datasets
2) Inner: keeps only the observations that match
3) Right: keeps all observations from the right side (second defined table) whether they match or do not match left side.
4) Left: keeps all observations from the left side (first defined table) whether they match or do not match right side.
Note: think of a venn diagram. outer = all, inner = overlaping part, right = all of right circle, left = all of left circle
Describe what is happening in each line of the following code.
proc sql;
1) create table payrollmaster as
2) select payrollmaster.empid, lastname, jobcode, salary, newsals.newsalary
3) from payrollmaster, newsals
4) where payrollmaster.empid = newsals.id
5) order by empid;
quit;
1) Creating a new table/dataset called payrollmaster
2) Selecting columns from both tables defined in from statement
3) Selecting payrollmaster and newsals dataset
4) Joining data on common key of id
5) Ordering data by empid
T/F: You need to sort the data before joining in PROC SQL.
False: does not need to. SQL can combine without sorting