Advanced SAS Programming Techniques Flashcards
What is an index?
A file that stores values in ascending order for a variable or variables and includes information about the location of those values in the data file. It helps you access values directly rather than having SAS go down the rows until it finds a value.
Syntax to create an INDEX in a DATA statement (for a simple and composite index)
This creates two simple indexes
For a simple index:
data datasetname (index=(column1 column2/unique));
set datasetname;
run;
For a composite index:
data datasetname (index=(indexname=(column1 column2)));
set datasetname
run;
What option can you use to see information in the SAS log about index creation or index usage?
Default is n
options msglevel = i;
When is an index not used? (3)
With a subsetting IF statement in a DATA step
With particular WHERE expressions
If SAS thinks it is more efficient to read the data sequentially
Syntax to create or delete indexes on existing datasets with PROC DATASETS
NOLIST is optional. It suppresses the printing of the directory of SAS files in the SAS log and as ODS output.
proc datasets library = libref NOLIST;
modify datasetname;
index delete index-name;
index create index-specification;
quit;
Syntax to create or delete indexes on an existing dataset with PROC SQL
proc sql;
create unique index index-name
on table-name(column1, column2);
drop index index-name from table-name;
quit;
What information is printed out by the CONTENTS procedure or the CONTENTS statement in PROC DATASETS? (5)
- general and summary info
- engine/host dependent info
- alphabetic list of variables and attributes
- alphabetic list of integrity constraints
- alphabetic list of indexes and attributes
Syntax for PROC CONTENTS
Syntax for PROC DATASETS with CONTENTS statement (2)
nolist is optional and suppresses the printing of the directory of SAS files in the SAS log and as ODS output.
proc contents data=libref.dataset;
run;
proc datasets library=libref nolist;
contents data=dataset;
quit;
proc datasets nolist;
contents data=libref.dataset;
quit;
How do you list the contents of all files in a SAS library with PROC CONTENTS and with PROC DATASETS?
proc contents data=libref._all_;
run;
proc datasets library=libref nolist;
contents data=_all_;
quit;
What happens to an index if you sort the data in place with the FORCE option in PROC SORT?
The index file is deleted
How do you copy a data set to a new location using PROC DATASETS?
Using PROC COPY?
What happens if the data set contains an index?
A new index file is automatically created for the new data file
proc datasets library = old-libref nolist;
copy out = new-libref;
select datasetname;
quit;
proc copy out = new-libref in = old-libref
[move];
select datasetname;
run;
quit;
How do you rename an indexed dataset so as to preserve the index?
Note the index is automatically renamed as well
proc datasets library=libref nolist;
change old-datasetname = new-datasetname;
quit;
How do you rename variables within an indexed data set so as to preserve the index?
If you rename a variable for which there is a simple index, the index is also renamed. If the variable is used in a composite index, the composite index automatically references the new variable name.
proc datasets library=libref nolist;
modify datasetname;
rename oldvarname1 = newvarname1
oldvarname2 = newvarname2;
quit;
How to combine raw data files vertically using FILENAME?
filename fileref (‘file-path1’ ‘file-path2’);
COMPRESS function
default characters-to-remove is blank spaces
Allows you to remove characters from a string
compress(source-string, characters-to-remove);
today function
Returns the current date from the system clock as a SAS date value
INTCK function
INTCK(interval, start-date, end-date)
Returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values.
e.g.
monthnum = intck(‘month’, ‘25aug2000’d, ’05sep2000’d)
monthnum = 1
INTNX function
INTNX(interval, start-from, increment)
Gives the date that corresponds to the beginning of the next interval
e.g.
next = intnx(‘month’, ‘25aug2000’d, 1)
next = 01sep2000
Syntax for combining SAS data sets vertically using PROC APPEND
base names the data set to which you want to add obs
proc append base=sas-data-set data= sas-data-set;
run;
What is more efficient to combine two SAS data sets vertically: PROC APPEND or a DATA step? Why?
PROC APPEND
Because it only reads the data in the DATA= data set, not the BASE= data set.
The DATA step SET statement would read both data sets
What option can you use when you’re combining SAS data sets vertically using PROC APPEND and the DATA= dataset has more variables than the BASE= dataset?
Note that this will drop the extra variable
FORCE
proc append base=SAS-dataset data=SAS-dataset force;
run;
Can you append (combine vertically) datasets when a common variable has different lengths in each dataset?
What option can you use to do it?
Note that truncating may occur
No, you will get an error
Use the FORCE option
What happens when you’re appending datasets where a common variable has different data types?
How can you append the datasets?
What happens when you do this?
You’ll get an error.
Use the FORCE option
For the variable with the different data types, you’ll get missing values in the DATA= dataset.
What happens when you’re appending datasets with PROC APPEND where the BASE= dataset has more variables than the DATA= dataset?
You get missing values for the extra variable
ARRAY statement
ARRAY arrayname(#ofitems) [$] array-variables (actual-values)
e.g. ARRAY TIME(6) TIME1-TIME6 (36, 40, 10, 39, 97, 25)
Note: can use * to let SAS figure out the number of items
How do you convert from character to numeric? From numeric to character?
Character to numeric:
INPUT(“text”, informat)
Numeric to character:
PUT(number, format)
What is the max number of tables that can be joined at a time with PROC SQL?
256
bufno= option
A larger number of buffers can speed up execution time by limiting the number of input and output (I/O) operations that are required for a particular SAS data set. However, the improvement in execution time comes at the expense of increased memory consumption.
reuse = option
Tracks and recycles free space
By default, new observations are appended to existing compressed data sets. To track and reuse available space by deleting or updating other observations, use the REUSE= data set option when you create a compressed SAS data set.
Characteristics of hash object key component (4)
Characteristics of hash object data component (2)
Key component:
- might consist of numeric and character values
- maps key values to data rows
- must be unique
- can be a composite
Data component:
- can contain multiple data values per key value
- can consist of numeric and character values
FULLSTIMER Option
Specifies that SAS write to the SAS log a complete list of computer resources that were used for each step and for the entire SAS session.
Can aid in benchmarking
SASFILE statement
Opens a SAS data set and allocates enough buffers to hold the entire file in memory.
FILE statement
Specifies the current output file for PUT statements.