SAS P1 L5 Flashcards
What does formatting do?
Formats control the way data is displayed.
FORMAT statement added to?
syntax?
what does it end with?
SAS has?
FORMAT statement is added to PROC PRINT
Syntax:
FORMAT variable_name format_name.;
note: the format name ends in a period (before the semicolon)
SAS has lots of defined formats
How many variables per format statement?
Can specify multiple variables with same format statement, or can use separate format statement for each
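A minimal sketch of both forms. The data set work.pay_demo and its variables are made up for illustration and are not part of the course data:

```sas
/* Hypothetical demo data; names and values are made up for illustration */
data work.pay_demo;
  input name $ salary bonus;
  datalines;
Ann 5950.25 1200
Bob 48000 350.5
;
run;

/* One FORMAT statement, two variables sharing one format */
proc print data=work.pay_demo;
  format salary bonus dollar12.2;
run;

/* Equivalent: a separate FORMAT statement for each variable */
proc print data=work.pay_demo;
  format salary dollar12.2;
  format bonus dollar12.2;
run;
```

Both PROC PRINT steps should produce the same listing; which form to use is a matter of readability.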
What are formats, and what do they do?
Formats = instructions that tell SAS how to display data values
ex: using commas, dollar sign to display
$5,950.25 vs 5950.25
What is the form of existing formats?
how indicate character format?
field width?
decimal places?
form of existing formats:
{$}format{w}.{d}
where {$} indicates character format (omit if num)
and {w} indicates field width
and {d} indicates decimal places (if any)
(they use < and > instead of curly brackets)
examples of existing formats that:
- write standard char data, width w
- write standard num data, width w
- write numeric values, width w with comma sep every three digits and period before decimal fraction d
- like above but has $ before
- like above but non-US, has period every 3 digits, comma before decimal
- like above non-US, but has euro sign before
$w. writes standard char data, width w
w.d writes standard num data, width w
COMMAw.d writes num val with comma sep every 3 dig, period before decimal fraction
DOLLARw.d like COMMA but has $ before
COMMAXw.d - non-US - period every 3 digits, comma before decimal fraction
EUROXw.d like COMMAX but with euro sign before
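The formats listed above can be tried out in a DATA _NULL_ step that writes to the log. This is only a sketch; the variable names and the value 5950.25 are chosen for illustration:

```sas
data _null_;
  amount = 5950.25;
  name = 'Australia';
  put name $9.;             /* Australia */
  put amount 7.2;           /* 5950.25 */
  put amount comma10.2;     /* 5,950.25 (right-aligned in 10 columns) */
  put amount dollar10.2;    /* $5,950.25 (right-aligned in 10 columns) */
  put amount commax10.2;    /* 5.950,25 (right-aligned in 10 columns) */
  put amount eurox10.2;     /* like COMMAX, with a euro sign in front */
run;
```

How the euro sign appears can depend on the session encoding, so no exact output is shown for EUROX.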
Formats: what happens to character values if don’t fit in specified width?
character values TRUNCATED if don’t fit in specified width
Formats: what happens to numeric values if don’t fit in specified width?
numeric values ROUNDED to fit;
commas, $ dropped if need be
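A short sketch of truncation and rounding, using made-up values. The last two PUT statements are left without expected output on purpose; run them and compare the log to see what SAS drops as the width shrinks:

```sas
data _null_;
  country = 'Australia';
  amount = 5950.256;
  put country $3.;        /* Aus: character value truncated to 3 characters */
  put amount 8.2;         /* 5950.26: rounded to 2 decimal places */
  put amount 4.;          /* 5950: rounded to a whole number */
  put amount dollar10.2;  /* wide enough for the $ and the comma */
  put amount dollar6.;    /* narrower width */
  put amount dollar5.;    /* narrower still */
run;
```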
How does SAS store dates?
SAS stores dates as # of days from Jan 1 1960 (=0)
Dates earlier than 1960 have negative value
How would these dates be displayed?
MMDDYY6. 0 =?
MMDDYY8. 0 =?
MMDDYY10. 0 =?
DDMMYY10. 365 =?
MMDDYY6. 0 = 010160
MMDDYY8. 0 = 01/01/60
MMDDYY10. 0 = 01/01/1960
DDMMYY10. 365 = 31/12/1960
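The answers above can be checked in the log with a small DATA _NULL_ step. The variable names are made up; the date values come from the card, plus -1 to show a date before 1960:

```sas
data _null_;
  d0 = 0;
  d365 = 365;
  dneg = -1;
  put d0 mmddyy6.;      /* 010160 */
  put d0 mmddyy8.;      /* 01/01/60 */
  put d0 mmddyy10.;     /* 01/01/1960 */
  put d365 ddmmyy10.;   /* 31/12/1960 (1960 is a leap year, so day 365 is Dec 31) */
  put dneg mmddyy10.;   /* 12/31/1959: dates before 1960 are negative */
  first_day = '01JAN1960'd;   /* date constant */
  put first_day=;       /* first_day=0 */
run;
```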
How do you make a user-defined format? (steps)
What is format associated with?
2 steps to make user-defined format:
1) PROC FORMAT to create format
2) FORMAT variable(s) format in PROC PRINT to apply format to variable
Format created isn’t associated with particular variable, rather with values to display differently
PROC FORMAT syntax?
What keywords are used in step?
Proc Format syntax:
PROC FORMAT;
VALUE format-name value-or-range1 = 'formatted-value1'
value-or-range2 = 'formatted-value2' ... ;
RUN;
Keywords: PROC, FORMAT, VALUE, RUN
PROC FORMAT format name rules:
character format name:
must start with?
numeric format name: must start with?
ending?
can only use?
can’t use?
what about period?
character format name:
must start with $, followed by a letter or underscore
numeric format name:
must start with letter or underscore
ending: cannot end in number
can only use: letters, numbers, underscores
can’t use: SAS format names
what about period? doesn’t end with period in value statement (will specify period when refer to format name in format statement)
what can value-or-range be? (3)
value-or-range can be:
individual ('AU'), (1)
range ('B'-'D'), (0-5000)
list ('U','V'), (1,2,3)
value-or-range:
syntax requirements for character values? case?
numeric values?
range?
list?
value-or-range:
syntax requirements for character values:
must be in quotes,
must match case
numeric values:
do NOT use quotes
range:
hyphen separates values that define end points
list: commas separate values
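The three forms of value-or-range can be sketched in one PROC FORMAT step. The format names $grpfmt and numfmt, the labels, and the test values are hypothetical:

```sas
proc format;
  /* character format: name starts with $, values quoted and case-sensitive */
  value $grpfmt 'AU'      = 'individual value'
                'B'-'D'   = 'range'
                'U','V'   = 'list';
  /* numeric format: no quotes around the values */
  value numfmt  1         = 'individual value'
                2-5000    = 'range'
                6000,7000 = 'list';
run;

data _null_;
  length code $2;
  /* 'au' is lowercase, so it does not match 'AU' */
  do code = 'AU', 'C', 'V', 'au';
    label = put(code, $grpfmt.);
    put code= label=;
  end;
run;
```

The numeric ranges here are chosen so that they do not overlap; overlapping ranges in one VALUE statement would be an error.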
How reference missing value in value-or-range set?
Example?
Use . (period) to reference missing value in value-or-range set.
proc format;
value
testvar1 0-<10='less than 10'
10-20='between 10 and 20'
.='has missing value'
other='not defined';
run;
how write formatted value?
quotes required?
length?
usually use quotes for formatted value, but not required
Length: formatted values can be up to 32,767 characters long
what is keyword OTHER for?
Keyword OTHER = 'label of your choice'
ex: OTHER = 'not specified'
- displays this text for any value not matched by another value-or-range
- without OTHER, unmatched values display as stored in the data set
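A small sketch comparing a format with OTHER to one without it. The format names $withoth and $nooth and the code 'ZZ' are made up for illustration:

```sas
proc format;
  value $withoth 'AU'  = 'Australia'
                 other = 'not specified';
  value $nooth   'AU'  = 'Australia';
run;

data _null_;
  length country $2;
  do country = 'AU', 'ZZ';
    with_other    = put(country, $withoth.);
    without_other = put(country, $nooth.);
    put country= with_other= without_other=;
  end;
run;
```

For 'ZZ', with_other should show the OTHER label while without_other shows the stored value ZZ.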
PROC FORMAT, PROC PRINT using FORMAT
PROC FORMAT example (country, sport)
How many formats per value statement?
How many value statements per PROC FORMAT?
PROC FORMAT;
value $ctryfmt 'AU'='Australia'
'US'='United States'
OTHER='miscoded';
value $sports 'FB'='Football'
'BK'='Basketball'
'BS'='Baseball';
RUN;
- only one format per value statement
- multiple value statements per PROC FORMAT
PROC FORMAT, PROC PRINT using FORMAT
using FORMAT in PROC PRINT example
define format for salary, Birth_Date, Country
(use SAS formats, others previously defined)
PROC PRINT data=orion.sales label;
FORMAT salary dollar10.
Birth_Date Hire_Date MONYY7.
Country $ctryfmt.; /* note the period after $ctryfmt */
RUN;
PROC FORMAT, PROC PRINT
using FORMAT & using range of values in PROC FORMAT
=> tiers example
PROC FORMAT;
value tiers 20000-49999 = 'Tier1'
50000-99999 = 'Tier2'
100000-250000 = 'Tier3';
RUN;
Notes about range of values:
are first, last values included?
how exclude from range?
first value?
last value?
both?
keywords for lowest, highest values?
character values - lowest possible value?
same for numeric?
values are inclusive
- includes first and last value
- use "<" to EXCLUDE a value from a range
"<-" excludes first value
"-<" excludes last value
"<-<" excludes first and last value
use keywords LOW and HIGH to specify lowest, highest value
for character formats, LOW includes missing values (missing = lowest possible value)
NOT for numeric: LOW does not include missing values
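A sketch of a gap-free numeric format built with "-<", LOW, and HIGH. The format name tierfmt, the cut-offs, and the data set work.salary_demo are assumptions for illustration:

```sas
proc format;
  value tierfmt low    -< 50000  = 'Tier1'    /* lowest value up to, not including, 50000 */
                50000  -< 100000 = 'Tier2'    /* includes 50000, excludes 100000 */
                100000 -  high   = 'Tier3'    /* includes 100000 and everything above */
                .                = 'missing'; /* numeric LOW does not cover missing */
run;

/* Hypothetical demo data */
data work.salary_demo;
  input salary;
  datalines;
20000
49999.5
50000
100000
.
;
run;

proc print data=work.salary_demo;
  format salary tierfmt.;
run;
```

Because each range excludes its upper end point and the next range starts there, a value such as 49999.5 still falls in a tier instead of slipping between two ranges.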
When using format in statement, what do you need to be careful to include?
Don’t forget the period at the end of the format name!!!
No period => SAS thinks it is another variable name, not a format.
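A minimal illustration of where the period goes, using a made-up one-row data set (work.period_demo is hypothetical). The version without the period is left commented out:

```sas
data work.period_demo;
  salary = 5950.25;
run;

proc print data=work.period_demo;
  format salary dollar10.;      /* period present: DOLLAR format, width 10 */
  /* format salary dollar10;       no period: dollar10 is read as a variable name */
run;
```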