Module 8: Producing Descriptive Statistics Flashcards
What is PROC MEANS?
A procedure that computes descriptive statistics for the numeric variables in a data set. By default it reports N, mean, standard deviation, minimum, and maximum.
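A minimal sketch, assuming the SASHELP.CLASS sample data set that ships with SAS (numeric variables Age, Height, and Weight). With no statistic keywords and no VAR statement, PROC MEANS prints its default statistics for every numeric variable.

```sas
/* Default statistics (N, Mean, Std Dev, Minimum, Maximum) */
/* for every numeric variable in SASHELP.CLASS              */
proc means data=sashelp.class;
run;
```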
List some options for PROC MEANS.
maxdec=n, missing, max, min, median, mean, mode, n, nmiss, range, std, sum
What do the options Maxdec = n, Missing, N, and Nmiss do?
Maxdec=n: displays the statistics rounded to n decimal places (0 to 8)
Missing: treats missing values of the CLASS variables as a separate, valid group
N: number of non-missing values
Nmiss: number of missing values
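The options above can be combined on the PROC MEANS statement. This sketch assumes the SASHELP.CARS sample data set; if a variable happens to have no missing values, NMISS simply reports 0 and the MISSING option adds no extra group.

```sas
/* Statistic keywords and MAXDEC= both go on the PROC MEANS statement */
proc means data=sashelp.cars maxdec=2 n nmiss mean std min max;
   var Cylinders Horsepower;
run;

/* MISSING: observations with a missing CLASS value form their own row */
proc means data=sashelp.cars missing n mean maxdec=1;
   class Cylinders;
   var MPG_City;
run;
```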
What are optional statements that you can use in proc means?
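var: specifies the analysis variables
by: performs a separate analysis for each BY group*
class: performs a separate analysis for each level (or combination of levels) of the CLASS variables**
types: specifies which combinations of CLASS variables to produce statistics for
output out=: writes the statistics to a data set (output out=SAS-data-set output-statistic-list)
*The data must first be sorted by the BY variables.
**Every variable named in TYPES must also appear in CLASS. Unlike BY, CLASS does not require sorted data.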
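A sketch contrasting BY and CLASS, assuming SASHELP.CLASS; the output data set name class_sorted is hypothetical.

```sas
/* BY requires the data to be sorted by the BY variable first */
proc sort data=sashelp.class out=class_sorted;
   by Sex;
run;

proc means data=class_sorted mean maxdec=1;
   by Sex;              /* one separate table per value of Sex */
   var Height Weight;
run;

/* CLASS needs no sorting; TYPES chooses which combinations to report */
proc means data=sashelp.class mean maxdec=1;
   class Sex Age;
   types Sex Sex*Age;   /* every TYPES variable also appears in CLASS */
   var Height;
run;
```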
To save the output to a data set:
1) (Optional) Add the NOPRINT option to the PROC MEANS statement if you do not also want the printed report
2) Use an OUTPUT statement with OUT= and list the descriptive statistics to save
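A minimal sketch of saving statistics, assuming SASHELP.CLASS; the data set name class_stats and the variable names avg_height, avg_weight, max_height, and max_weight are hypothetical. Each statistic keyword names new variables in the same order as the VAR list.

```sas
/* NOPRINT suppresses the report; OUTPUT OUT= writes the statistics */
proc means data=sashelp.class noprint;
   class Sex;
   var Height Weight;
   output out=class_stats
          mean=avg_height avg_weight
          max=max_height max_weight;
run;

/* The saved data set also contains _TYPE_ and _FREQ_ */
proc print data=class_stats;
run;
```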
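What is the _TYPE_ automatic variable?
_TYPE_ is added to the PROC MEANS output data set and identifies which combination of CLASS variables each observation summarizes. A value of 0 is the grand total across all observations.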
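A sketch showing _TYPE_, assuming SASHELP.CLASS; height_by_type and avg_height are hypothetical names. With two CLASS variables, _TYPE_ takes the values 0 through 3.

```sas
/* With CLASS Sex Age:                                     */
/* 0 = grand total, 1 = Age alone, 2 = Sex alone, 3 = Sex*Age */
proc means data=sashelp.class noprint;
   class Sex Age;
   var Height;
   output out=height_by_type mean=avg_height;
run;

/* Keep only the grand-total row */
proc print data=height_by_type;
   where _TYPE_ = 0;
run;
```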
What is the difference between one-way and two-way frequencies?
One-way frequency tables count the values of 1 variable
Two-way frequency tables (crosstabulations) count the combinations of values of 2 variables
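A minimal sketch, assuming SASHELP.CLASS. One PROC FREQ step can hold several TABLES statements.

```sas
proc freq data=sashelp.class;
   tables Sex;          /* one-way: count of each value of Sex */
   tables Sex*Age;      /* two-way: count of each Sex/Age combination */
run;
```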
Why do we use PROC FREQ?
To create tables showing the distribution of categorical variables in a data set.
How do you add options to PROC FREQ? Also list those options.
Options go after a slash (/) at the end of the TABLES statement
Options: list, missing, nocum, nopercent, and out=
What do these options do: list, missing, nocum, and nopercent?
List: displays two-way and multi-way tables in list form instead of as crosstabulations
Missing: includes missing values in the frequencies and percentages
Nocum: suppresses cumulative frequencies and cumulative percentages
Nopercent: suppresses printing of percentages
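A sketch of the TABLES options, assuming the SASHELP.CARS sample data set; origin_counts is a hypothetical data set name.

```sas
/* One-way table without cumulative columns or percentages */
proc freq data=sashelp.cars;
   tables Origin / nocum nopercent;
run;

/* Two-way table in list form; missing Cylinders counted as a level */
proc freq data=sashelp.cars;
   tables Origin*Cylinders / list missing;
run;

/* OUT= saves the counts (COUNT and PERCENT variables) to a data set */
proc freq data=sashelp.cars noprint;
   tables Origin / out=origin_counts;
run;
```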
Write the general syntax for multi-way freq/cross tabulations with PROC FREQ.
proc freq data=dataset;
tables var1*var2;
run;
What are the specific options for multi-way freqs?
Crosslist: displays crosstabulations in list format, with totals
Nocol: suppresses column percentages
Norow: suppresses row percentages
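A sketch of the multi-way options, assuming SASHELP.CLASS.

```sas
proc freq data=sashelp.class;
   tables Sex*Age / crosslist;      /* crosstab shown in list form with totals */
   tables Sex*Age / nocol norow;    /* cell counts and overall percentages only */
run;
```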
What is the general syntax for user-defined formats?
proc format;
   value format-name range = 'label' ...;
run;
Note: when one VALUE statement formats multiple ranges, the semicolon goes only at the end of the LAST range = 'label' pair.
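A sketch of one VALUE statement with several ranges, assuming SASHELP.CLASS (Height is in inches); the format name htgroup and its labels are hypothetical. Applying the format in PROC FREQ groups the counts by formatted value.

```sas
proc format;
   value htgroup
      low -< 60  = 'Under 60 in'
      60  -< 65  = '60 to under 65 in'
      65  - high = '65 in or more';   /* single semicolon after the last pair */
run;

proc freq data=sashelp.class;
   tables Height;
   format Height htgroup.;            /* the period marks a format reference */
run;
```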
What are the rules for format names?
1) Must begin with a $ if the format applies to character data
2) Must be a valid SAS name (up to 32 characters)
3) Cannot be the name of an existing SAS format
4) Cannot start or end with a number
5) Other than the leading $, the only special character allowed is the underscore (_)
Write a proc format for numeric variables, character variables, and one with value ranges of a numeric variable.
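proc format;
   value survresponse 1 = 'Yes';       /* numeric variable */
   value $racecode 'W' = 'White';      /* character variable */
   value agegroup 13 -< 20 = 'Teen';   /* range of a numeric variable */
run;
Note: the keywords LOW and HIGH create open-ended ranges, e.g. 65 - HIGH covers all values of 65 and above (LOW works the same way at the bottom of the range).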
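A sketch of defining and applying these formats, assuming a small hypothetical data set named survey; the labels 'Child', 'Adult', and 'Senior' are illustrative additions. Values with no matching range (such as race 'B') print unformatted.

```sas
proc format;
   value agegroup
      low -< 13  = 'Child'
      13  -< 20  = 'Teen'
      20  -< 65  = 'Adult'
      65  - high = 'Senior';
   value $racecode 'W' = 'White';
run;

/* Hypothetical survey data for illustration only */
data survey;
   input id age race $;
   datalines;
1 9 W
2 15 W
3 42 B
4 70 W
;
run;

proc print data=survey;
   format age agegroup. race $racecode.;
run;
```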