Module 8 Producing Descriptive Statistics Flashcards
What is proc means?
Computes basic descriptive statistics for numeric variables
Proc means options
MAXDEC=, MAX, MIN, MEAN, MODE, N, NMISS, RANGE, STD (or STDDEV), SUM, etc.
optional VAR statement
specifies which numeric variables to use in the analysis
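Example: statistic keywords and MAXDEC= with a VAR statement
A minimal sketch, assuming the SASHELP.CLASS sample data set that ships with SAS; the data set and variable choice are illustrative and not part of the original cards.

```sas
/* Assumption: SASHELP.CLASS (built-in sample data) stands in for your data. */
proc means data=sashelp.class n nmiss mean std min max maxdec=2;
    var height weight;   /* limit the analysis to these numeric variables */
run;
```

Without the VAR statement, PROC MEANS analyzes every numeric variable in the data set.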
optional BY statement
performs a separate analysis for each level of variables in the list
What must we do before using BY?
Sort the data by the same BY variables (PROC SORT)!
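Example: sorting before a BY statement
A short sketch of the sort-then-BY pattern, again assuming the built-in SASHELP.CLASS data; the output data set name class_sorted is hypothetical.

```sas
/* Sort a copy of the data by the BY variable first. */
proc sort data=sashelp.class out=class_sorted;   /* class_sorted: hypothetical name */
    by sex;
run;

/* One separate analysis per level of sex. */
proc means data=class_sorted mean maxdec=1;
    by sex;
    var height;
run;
```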
optional CLASS statement
separate analysis for each level but no sorting needed
optional TYPES statement
specifies which combinations of CLASS variables to analyze
TABLES statement (PROC FREQ)
calculates frequencies; crosstabulation with *. PROC FREQ has no CLASS statement; the rule that every variable must be listed in the CLASS statement applies to the TYPES statement in PROC MEANS
What does () do in the TYPES statement?
requests the statistics across all observations in the data set (the overall summary)
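Example: CLASS with a TYPES statement
A minimal sketch, assuming the built-in SASHELP.CARS sample data; the choice of variables is illustrative only.

```sas
/* Assumption: SASHELP.CARS (built-in sample data) with origin, type, msrp. */
proc means data=sashelp.cars mean maxdec=0;
    class origin type;            /* no sorting needed with CLASS */
    types () origin origin*type;  /* overall, by origin, and by origin x type */
    var msrp;
run;
```

Every variable named in TYPES here also appears in CLASS; the () entry adds the overall summary.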
OUTPUT OUT syntax
OUTPUT OUT = data-set statistic(variable-list) = name-list;
value of _TYPE_
depends on the level of interaction.
The observation where _TYPE_ has a value of zero is the grand total.
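Example: OUTPUT OUT= and the _TYPE_ variable
A sketch of writing statistics to a data set, assuming SASHELP.CLASS; the names class_stats, mean_height, and mean_weight are hypothetical.

```sas
/* NOPRINT suppresses the printed report; results go to class_stats. */
proc means data=sashelp.class noprint;
    class sex;
    var height weight;
    output out=class_stats mean(height weight)=mean_height mean_weight;
run;

/* Inspect _TYPE_: 0 is the overall row, 1 marks the rows for each level of sex. */
proc print data=class_stats;
run;
```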
Practice: Frequencies for the variables rank, grade, race, and gender
proc freq data = one;
tables rank grade race gender/ list missing;
run;
Practice: Frequencies for all character variables in the data set
proc freq data=one;
tables _character_ / list missing;
run;
one way PROC FREQ options
Options go after a slash (/) in the TABLES statement
LIST: display counts in list format;
MISSING: includes missing values in frequencies and percentages;
NOCUM: suppresses cumulative frequencies;
NOPERCENT: suppresses printing of percentages;
OUT = dataset: writes out a data set containing frequencies
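Example: one-way options after the slash
A minimal sketch, assuming SASHELP.CLASS; the output data set name sex_counts is hypothetical.

```sas
proc freq data=sashelp.class;
    /* MISSING counts missing values as a level; NOCUM and NOPERCENT trim the table. */
    tables sex / missing nocum nopercent out=sex_counts;
run;
```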
two-way PROC FREQ options
CROSSLIST: displays crosstabulations in list format with totals
NOCOL: suppresses column percentages;
NOROW: suppresses row percentages
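Example: two-way options
A short sketch, assuming the built-in SASHELP.CARS data; the variable pair is illustrative.

```sas
proc freq data=sashelp.cars;
    /* Standard crosstab without row or column percentages. */
    tables origin*drivetrain / norow nocol;
    /* Same table displayed in list format with totals. */
    tables origin*drivetrain / crosslist;
run;
```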
Cross tabulation
use an asterisk between variables;
proc freq data = one;
tables gender*race*anothervar; /* can be more than two vars */
run;
PROC FORMAT syntax
proc format;
value format-name range1 = 'label1'
range2 = 'label2';
run;
Rules for format names
** must begin with a dollar sign ($) if the format applies to character data;
** must be a valid SAS name (up to 32 characters, including $ sign if needed);
** cannot be the name of an existing SAS format;
** cannot start or end in a number;
** may contain only letters, digits, and underscores (_); the underscore is the only special character allowed;
When are periods used with PROC FORMAT names?
The format name does not end in a period (.) when specified in a VALUE statement.
However, it must be followed by a period (.) when referenced in a FORMAT statement.
proc format example
proc format;
* format for a numeric variable;
value survresponse 1='Yes'
2='No'
3='Did Not Answer';
* format for a character variable;
value $racecode 'W'='White'
'B'='Black'
'H'='Hispanic';
* format that uses value ranges of a numeric variable;
value agegroup 13 -< 20 ='Teen'
20 -< 65 ='Adult'
65 - HIGH='Senior';
* Note: the keywords LOW and HIGH refer to the extreme values of a range;
run;
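Example: applying a user-defined format
A minimal sketch of defining a format and then referencing it with a trailing period in a FORMAT statement. It assumes SASHELP.CLASS; the format name agegrp and its labels are hypothetical.

```sas
proc format;
    /* No period after the name in the VALUE statement. */
    value agegrp low -< 13 = 'Under 13'
                 13 - high = '13 and over';
run;

proc freq data=sashelp.class;
    tables age;
    format age agegrp.;   /* period required when the format is referenced */
run;
```

PROC FREQ groups observations by the formatted value, so the table shows the two labels instead of the individual ages.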