Indexes and Views in PROC SQL Flashcards
What is an index?
An index allows you to access observations directly. The index stores values in ascending value order for a specific variable or variables and includes information as to the location of those values within observations in the data file.
For example, suppose you want the observation with SSN (Social Security number) equal to 123-45-6789:
Without an index, SAS accesses observations sequentially in the order in which they are stored in the data file. SAS reads each observation, looking for SSN=123-45-6789 until all observations are read.
With an index on variable SSN, SAS accesses the observation directly. SAS satisfies the condition using the index and goes straight to the observation that contains the value without having to read each observation.
What are the two types of indexes?
- Simple: based on only one column
- Composite: based on two or more columns. In the index, the values of the key columns are concatenated to form a single value. Needs a new name that is not the name of any existing columns.
What are 4 cases where creating an index would improve efficiency?
- a WHERE clause that contains a comparison operator, TRIM or SUBSTR, CONTAINS, or LIKE
- a subquery returning values to the IN operator
- a correlated subquery in which the column being compared with the correlated reference is indexed
- a join in which the join expression contains the = operator, or a join in which all the columns in the join expression are indexed in one of the tables being joined.
pg. 229-230
3 benefits of using an index
- A small subset of data (<15%) can be accessed more quickly (The bigger the subset, the less advantageous it is to use an index)
- equijoins can be performed without internal sorts
- uniqueness can be enforced by creating a unique index
4 costs of using an index
- Needs more CPU time
- Needs additional input/output requests when compared to reading the table sequentially
- Uses additional memory for buffers into which the index pages and code are loaded for processing
- Requires additional disk space
Create index syntax
Unique specifies that all the values in the column are unique
proc sql;
create (unique) index index-name
on table1 (columns);
How do you find out whether a table has any indexes?
Use DESCRIBE TABLE
How do you find out whether SAS is actually using the index you created?
Note the default option = N
Use OPTIONS MSGLEVEL = I before PROC SQL
It will tell you about index usage
What are the two options to control index usage and what do they do?
IDXWHERE= and IDXNAME = in parenthesis in the from clause after the table name
IDXWHERE = YES : tells SAS to choose the best index to optimize a where expression, and to disregard the possibility that a sequential search of the table might be more resource-efficient
IDXWHERE = NO : tells SAS to ignore all indexes and satisfy the conditions of a where expression with sequential search of the table
IDXNAME = index name: tells SAS to use an index that you specify
How do you delete an index?
DROP INDEX statement
proc sql;
drop index index-name
from tablename;
Can an index be created on a view?
NO
What is a VIEW?
A PROC SQL view is a stored query expression that reads data values from its underlying files. The view contains only the logic for accessing the data, not the data itself.
Create a view syntax
The rest is normal proc sql query
proc sql;
create view viewname as
select ….
;
Get information about a view
proc sql;
describe view viewname;
How do you delete a view?
proc sql;
drop view view-name;