Combining, Creating and Managing Tables with PROC SQL Flashcards
In all types of joins, what does PROC SQL generate first?
a Cartesian product (i.e. all possible combinations of values between the two tables)
What happens when you specify multiple tables in the FROM clause but don’t include a WHERE statement?
PROC SQL returns the Cartesian product of the tables (i.e. all possible combinations of the values in the two tables)
How to do an inner join?
proc sql;
select *
from table 1, table 2
where table1.column = table2.column;
quit;
What happens if you do an inner join with tables in which multiple rows have duplicate values of the column on which the tables are being joined?
e.g.
X A X B
2 b1 2 x1
2 b2 2 x2
All possible combinations of the duplicate rows are displayed:
X A X B
2 b1 2 x1
2 b1 2 x2
2 b2 2 x1
2 b2 2x2
How do you assign a table alias?
What are two situations that require a table alias?
proc sql;
select t1.column1 column2
from table1 as t1, table2 as t2
where t1.column1 = t2.column1;
Table aliases are required when
- a table is joined to itself: e.g. from airline.staffmaster as s1, airline.staffmaster as s2
- you need to reference columns from same-named tables in different libraries: e.g. from airline.flightdelays as af, work.flightdelays as wf
What are the types of outer joins?
- Left: all matching rows plus nonmatching rows from the first table specified in the FROM clause (the left table)
- Right: all matching rows plus nonmatching rows from the second table specified in the FROM clause (the right table)
- Full: All matching rows plus nonmatching rows in both tables
Outer join syntax
Note you can do an outer join on only two tables or views at a time.
proc sql;
select column1 colum2..
from table1
left join/right join/full join
table2
on join-conditions;
e.g.
proc sql;
select *
from one
left join
two
on one.x=two.x;
Syntax for creating an inner join with outer join syntax
proc sql;
select column1 column2..
from table1
inner join
table2
on join-conditions;
What are 3 advantages of PROC SQL joins over DATA step match-merges?
- PROC SQL joins don’t require sorted or indexed tables
- PROC SQL joins do not require that the columns in join expressions have the same name
- PROC SQL joins can use comparison operators other than the equal sign, e.g.
proc sql;
selec a.itemnumber, cost, price
from table 1 as a, table 2 as bb
where a.itemnumber = b.itemnumber and a.cost>b.price;
What is the difference between DATA step match-merge and PROC SQL joins when all the values of the merging variable match?
when only some of the values match?
When only some of the values match, how can you make a PROC SQL full outer join similar to a DATA step match-merge?
- No difference
- The DATA step overlays the two common columns by default, PROC SQL does not (so there will be some missing values in the column) see pg. 103
- Use COALESCE function, eg.
proc sql;
select coalesce(three.x, four.x) as X, a , b
from three
full join
four
on three.x = four.x;
What is an in-line view?
An in-line view exists only during query execution
An in-line view is a nested query that is specified in the outer query’s FROM clause. It selects data from one or more tables in order to produce a temporary table that the outer query then uses to select data for output. e.g.
from (select flighnumber, date
from sasuser.marchflights;)
Two potential advantages of using an in-line view?
- The complexity of the code is usually reduced, so that the code is easier to write and understand
- In some cases, PROC SQL might be able to process the code more efficiently
Create a FROM clause with a table and an in-line view and assign as aliases to both of them. Don’t forget the where statement.
from table1 as one,
(select column1, column2,
from table2) as two
where one.column1 = two.column2
Syntax to combine tables vertically
proc sql;
select *
from table1
except/intersect/union/outer union
select *
from table2;
Which set operator is evaluated first by default?
intersect
Syntax to use ALL and CORR
proc sql;
select *
from table1
set-operator all/corr
select *
from table2;
When do you use the ALL keyword?
cannot be used with OUTER UNION
When you want it to make only one pass through the data and not remove duplicate rows
When do you use the CORR keyword?
When you want it to compare and overlay columns by name instead of position
When used with EXCEPT, INTERSECT, and UNION, removes any columns that do not have the same name in both tables
When used with OUTER UNION, overlays same-named columns and displays columns that have nonmatching names without overlaying.
What set-operator is represented by the picture?
What two things does it do?
EXCEPT
Selects unique rows from the first table that are not found in the second table
overlays columns
What type of join is represented by the picture?
inner join
What type of join is represented by the picture?
full outer join
What type of join is represented by the picture?
left outer join
What type of join is represented by the picture?
right outer join
Table one
X: 1 1 1 2 3 4 6
A: a a b c v e g
Table two
X: 1 2 3 3 5
B: x y z v w
What is the output of
proc sql;
select *
from one
except all corr
select *
from two;
X: 1 1 4 6
Because of CORR, PROC SQL only looks at the column that has the same name on both tables. Because of ALL, duplicates are not eliminated. Then the numbers in table one that have a match in table two are eliminated. Note that this is a one-to-one match. Only one of the 1s in table one is eliminated.
What set operator is represented by the image?
What two things does it do?
INTERSECT
Selects unique rows that are common to both tables
Overlays columns
Adding what keyword to a PROC SQL query prevents PROC SQL from making an extra pass through the data?
ALL
Table One
X: 1 1 1 2 3 4 6
A: a a b c v e g
Table Two
X: 1 2 3 3 5
B: x y z v w
What is the output of:
proc sql;
select *
from one
intersect all corr
select *
from two;
X: 1 2 3
Because of CORR, PROC SQL only looks at the columns with the same name. Because of ALL, it does not eliminate duplicates. Then it matches up the numbers in X. Note that this is a one-to-one match, so only one of the 1s from table one is displayed in the output.
What set operator is represented in the image?
What two things does it do?
UNION
It selects unique rows from both tables
it overlays columns
Which set operator is represented in the image?
How does it work?
OUTER UNION
It concatenates the results of the queries by selecting all rows (both unique and nonunique) from both tables
It does not overlay columns
What happens if you use the CORR keyword with OUTER UNION?
It overlays columns that have the same name. Normally, OUTER UNION does not overlay columns
What is the DATA step equivalent of this PROC SQL query?
proc sql;
create table three as
select * from one
outer union corr
select * from two;
quit;
data three;
set one two;
run;
proc print data=three noobs;
run;
See pg. 156
Data step does something similar to OUTER UNION but it also automatically overlays same-named columns. You have to add corr to OUTER UNION for it to do that.
For the set operators EXCEPT, INTERSECT, and UNION, columns are overlaid based on ____________ in the SELECT clause rather than by column name. In order to be overlaid, columns in the same relative position in the two SELECT clauses must have ________________.
For the set operators EXCEPT, INTERSECT, and UNION, columns are overlaid based on the relative position of the columns in the SELECT clause rather than by column name. In order to be overlaid, columns in the same relative position in the two SELECT clauses must have the same data type.
For which set operators does PROC SQL makes two passes through the data?
For set operators that display only unique rows (EXCEPT, INTERSECT, and UNION)
- PROC SQL eliminates duplicate rows in the tables
- PROC SQL selects the rows that meet the criteria and overlays columns
Create an empty table called “discounts” by defining new columns Destination (character, width 3), BeginDate (num, date9), EndDate (num, date9), and Discount (num)
Note can specify width for character but not for numeric columns
proc sql;
create table work.discounts
(Destination char(3) label=’City’,
BeginDate num Format=date9.,
EndDate num format=date9.,
Discount num);
What are the two data types used by SAS tables?
What happens if you specify another data type?
numeric and character
PROC SQL converts the supported data tytpes that are not SAS data types to either numeric or character format
What is the PROC SQL equivalent of PROC CONTENTS?
Note that DESCRIBE TABLE writes to the SAS log, while PROC CONTENTS generates a report
DESCRIBE TABLE
It list columns and column attributes. Also gives information about indexes defined on a table.
proc sql;
describe table tablename;
Create an empty table that has the same columns and attributes as an existing table but has no rows (it is empty)
Where would you add the drop or keep options if you only want a few columns of the existing table?
table1 is the table you want to emulate
proc sql;
create table tablename
like table1;
You can add drop or keep between the name of the table being created and the like clause or after the name of table1.
eg.
proc sql;
create table tablename
(drop = column1 column2)
like table1;
Insert rows by using the SET clause - syntax
Use one set clause for each row
proc sql;
insert into tablename
set column1 = value-1, column2 = value-2
set column1 = value-3, column2 = value-4;
Insert rows by using the VALUES clause - syntax
Use one value clause for each row.
proc sql;
insert into tablename (column1, column2, column3)
values (value-1, value-2, value-3)
values (value-4, value-5, value-6);
Inserting rows from a query result - syntax
proc sql;
insert into tablename
select column1, column2, column3
from othertable;
What are integrity constraints?
Rules that you can make to restrict the data values that can be stored for a column in a table
What are general integrity constraints?
List the 4 general integrity constraints
PRIMARY KEY is a general integrity constraint if it does not have any FOREIGN KEY constraints referencing it.
General integrity constraints enable you to restrict the values of columns within a single table.
CHECK, NOT NULL, UNIQUE, PRIMARY KEY
When is a referential integrity constraint created?
A referential integrity constraint is created when a PRIMARY KEY in one table is referenced by a FOREIGN KEY in another table.
Syntax for creating a constraint in a column specification
proc sql;
create table tablename
(column1 datatype constraint,
column2 datatype constraint);
e.g.
proc sql;
create table work.employees
(ID char(5) primary key,
Name char(10),
Gender char(1) not null check(gender in (‘M’, ‘F’));
Syntax for creating a constraint by using a constraint specification
What are two advantages compared to creating a constraint in a column specification?
Insert the actual constraint type in constraint
proc sql;
create table tablename
(column1 datatype,
column2 datatype,
constraint constraint-name constraint(column1),
constraint constrain-name constraint(column2);
- can specify a name for the constraint
- can define a constraint for multiple columns
When you’re inserting rows to a table that has constraints and one of the rows does not meet the constraints, does PROC SQL
a) insert only the rows that meet the contraints
b) not insert any of the rows
B. PROC SQL does not insert any of the rows
What is the UNDO_POLICY= option?
Where do you put it in code?
What are the 3 settings for UNDO_POLICY?
It controls how PROC SQL handles updated data if any errors occur during the insertion or update process
proc sql undo_policy=option;
Settings:
REQUIRED: Default. Doesn’t add any new rows if any errors are met.
NONE: PROC SQL skips records that cannot be inserted or updated. Any data that meets the integrity constraints is added or updated
OPTIONAL: If UNDO can be done reliably, then it proceeds as if UNDO_POLICY = REQUIRED. Otherwise, it proceeds with UNDO_POLICY=NONE
How do you display a table’s integrity constraints?
proc sql;
describe table constraints tablename;
How do you update values in existing table rows (update all or a subset of rows in a column)?
Use the where statement when you want to update only a subset of the rows in the column.
proc sql;
update tablename
set column1 = updateexpression
where expression;
eg.
proc sql;
update payrollmaster
set salary = salary*1.05
where jobcode like ‘__1’;
Update values in existing table rows (case method) syntax
Here the change we’re making is multiplying column1 by a certain number depending on conditions
proc sql;
update tablename
set column1 = column1*
case
when condition = 1 then answer1
when condition = 2 then answer2
else answer3
end;
case condition
when 1 then answer1
when 2 then answer2
else answer 3
end;
(See pg. 206)
In which 3 PROC SQL statements can you use a CASE expression?
UPDATE, INSERT, and SELECT
How do you create a new column with the case clause?
proc sql;
select column1, column2,
case condition
when 1 then answer1
when 2 then answer2
else answer3
end as newcolumn
from table1, table2
where table1.column1 = table2.column1;
How do you delete rows from a table?
proc sql;
delete from tablename
where condition;
Syntax for altering columns in a table
proc sql;
alter table tablename
add newcolumn datatype, newcolumn2 datatype
drop column1
modify column2 datatype(width) format = format label=’label’;
What things can you change with a MODIFY clause?
What can you NOT do with a MODIFY clause?
You can change a column’s length (width) for character variables, informat, format, and label
You cannot change the data type of a column, and you cannot change the column’s name
How do you drop a table?
proc sql;
drop table tablename;