SAS SQL queries Flashcards
PROC SQL syntax
PROC SQL;
SELECT column1, column2
FROM table1, table2
WHERE expression
GROUP BY column1, column2
ORDER BY column1, column2 DESC;
QUIT;
How do you create a new column in a SELECT clause?
This will create a new column named ‘newcolumn’ that calculates column2*0.5.
proc sql;
select column1, column2, column2*0.5 as newcolumn
from table1
quit;
Do the columns specified in a WHERE clause have to be specified in the SELECT clause?
no
Two ways to write an ORDER BY clause
1 is the column’s position in the SELECT clause
ORDER BY column1
or
ORDER BY 1
How do you join tables?
How do you select a column that appears on both tables?
Remember to set the matching columns in the WHERE clause
proc sql;
select table1.column1, column2, column3
from sasuser.table1, sasuser.table2
where table1.column1 = table2.column1
order by column3;
quit;
When do you use a GROUP BY clause?
If you use a GROUP BY clause in a PROC SQL step without a summary function, the GROUP BY clause is changed to an ORDER BY clause.
When your query contains one or more summary functions and you want to group the results
e.g.
proc sql;
select membertype, sum(milestraveled) as TotalMiles
from sasuser.frequentflyers
group by membertype;
quit;
How do you create a new table from the results of a query?
PROC SQL;
CREATE TABLE table-name AS
SELECT column1, column2
FROM table1, table2
WHERE expression
GROUP BY column1, column2
ORDER BY column1, column2 DESC;
QUIT;
How do you select all columns of a table?
use *
proc sql;
select *
from table1;
quit;
Which option lets you view the list of columns from a SELECT * statement in the SAS log?
This will list out all the columns to the SAS. Also resolves macro variables and places parentheses around expressions to show their order of evaluation.
FEEDBACK option
proc sql feedback;
select *
from table1;
quit;
How do you limit the number of rows displayed in the output?
proc sql outobs= n;
How do you eliminate duplicate rows from the output?
use DISTINCT keyword following SELECT
proc sql;
select distinct column1, column2
from table1;
What happens if you use a column you created in a SELECT clause in a WHERE clause?
How do you fix this?
Note can also rewrite the calculation in the where clause but this is inefficient
ERROR - the WHERE clause is processed before the SELECT clause, so the column isn’t found in the original table
To fix it, use the CALCULATED keyword
proc sql;
select flightnumber, boarded + transferred + nonrevenue as Total
from sasusesr.marchflights
where calculated total < 100;
quit;
How do you make a calculation based on a column you created in the SELECT clause?
Use CALCULATED keywork
proc sql;
select flightnumber, boarded + transferred + nonrevenue as Total, calculated total/2 as Half
from sasusesr.marchflights;
quit;
What 3 things end a PROC SQL step?
QUIT;
Submitting another PROC step or a DATA step
How do you add a label to a column in a PROC SQL step?
proc sql;
select column1 label =‘labelname’, colum2
from table1;
How do you add a format to a column in a PROC SQL step?
proc sql;
select column1, column2 format = format
from table1;
How do you get PROC SQL to ignore permantent labels in a table?
use the NOLABEL system option
Where do you put titles and footnotes relative to a PROC SQL step?
Either before the PROC SQL statement or between the PROC SQL statement and the SELECT statement
How do you add a column that contains a character or numeric constant?
Include the text string or number in the SELECT clause
proc sql;
select column1, column2, ‘myconstantstring’, column3
from table1;
The ANSI-standard sumary functions, such as AVG and COUNT, can be used with ____ argument(s). The SAS summary functions, such as MEAN and N, can be used with ____ argument(s).
ANSI - a single argument
SAS - either single or multiple arguments
If the summary function specifies one column as argument, then the calculation is….
If the summary function specifies multiple columns as arguments, then the calculation is….
If the summary function specifies one column as argument, then the calculation is performed down the column.
If the summary function specifies multiple columns as arguments, then the calculation is performed across columns for each row.
What do the following clauses do?
select count(*) as Count
select count(jobcode) as Count
select count(distinct jobcode) as Count
select count(*) as Count
Returns the total number of rows in a group or in a table
select count(jobcode) as Count
Returns the total number of rows in a group or in a table for which there is a nonmissing value in the selected column
select count(distinct jobcode) as Count
Returns the total number of unique values in a column
Use COUNT(*) to count rows within groups of data
This will give you the counts for each category of column1
proc sql;
select column1, count(*) as Count
from table1
group by 1;
How do you use a HAVING clause?
What are two differences between a HAVING clause and a WHERE clause?
You can have summary functions in a HAVING clause but not in a WHERE clause
Use it after a GROUP BY clause to select groups based on a condition
proc sql;
select column1, avg(column2) as AvgCol
from table1
group by column1
having condition;
What happens if you have a HAVING clause with no GROUP BY clause?
The HAVING clause treats the whole table as one group
When does data remerging occur? (3)
- The values returned by a summary function are used in a calculation
- the SELECT clause specifies a column that contains a summary function and other columns(s) that are not listed in a GROUP BY clause
- The HAVING clause specifies one or more columns or column expressions that are not included in a subquery or a GROUP BY clause
What happens during data remerging? (2 things)
PROC SQL makes two passes through the table:
- PROC SQL calculates and returns the value of summary functions, and groups data according to the GROUP BY clause
- PROC SQL retrieves any additional columns and rows that it needs to display ini the output, and uses the result from the summary function to calculate any arithmetic expressions in which the summary function participates
What is a subquery?
This will return the groups for which AvgSalary is greater than the average salary across the entire payroll
A query that is nested in, and is a part of, another query.
It can return single or multiple values to an outer query.
It can return values for multiple rows but only for a single column.
e.g.
proc sql;
select jobcode, avg(Salary) as AvgSalary
from sasuser.payrollmaster
group by jobcode
having avg(salary) >
(select avg(salary) from sauser.payrollmaster);
What are the types of subqueries?
- Noncorrelated: a self-contained subquery that executes independently of the outer query
- Correlated: a dependent subquery that requires one or more values to be passed to it by the outer query before the subquery can return a value to the outer query
When a noncorrelated subquery returns multiple values, use the following operators in a WHERE or HAVING clause:
IN
ANY or ALL
EXISTS
Explain the effect of the ANY operator:
where dateofbirth < ANY
(subquery)
This returns the values that are less than any value returned by the subquery. eg. if the subquery returns 20, 30, 40, then the outer query will select all values < 40.
eg.
proc sql;
select empid, jobcode, dateofbirth
from sasuser.payrollmaster
where jobcode in (‘FA1’, ‘FA2’)
and dateofbirth < any
(select dateofbirth
from sasuser.payrollmaster
where jobcode=’FA3’)
In this code, we’re trying to identify any flight attendants at level 1 or 2, who are older than any of the flight attendants at level 3.
Explain the effect of the ALL operator:
where dateofbirth < ALL
(subquery)
Returns values that are less than all the values returned by the subquery. e.g. if subquery returns 20, 30, 40, then the outer query will select all values < 20.
EXISTS is true if…
NOT EXISTS is true if…
EXISTS is true if the subquery returns at least one row
NOT EXISTS is true if the subquery returns no data
What type of subquery is this?
proq sql;
select lastname, firstname
from sasuser.flightattendants
where not exists
(select *
from sasuser.flightschedule
where flightattendants.empid = flightschedule.empid)
Correlated subquery
What does the NOEXEC option do?
SAS checks the syntax of all queries in that PROC SQL step for accuracy but does not execute them.
If query is valid and all referenced columns and tables exists, the SAS log displays:
Statement not executed due to NOEXEC option
proc sql noexec;
select…
What does the VALIDATE keyword do? Where do you put it?
Checks for errors in the query. If not errors, SAS log displays:
PROC SQL statement has valid syntax
Place before SELECT statement
proc sql;
validate
select…..;