Exam 1 Flashcards

1
Q

What does $ in an input statement mean

A

marks the variable before it as a character variable (this is how categorical values stored as text are read)

2
Q

what does “cards” mean

A

cards is an older alias for datalines: one of the two is required when reading internal (in-stream) raw data, and it tells sas that the data values start on the next line. to apply infile options to in-stream data, use infile datalines
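
a minimal sketch of list input from in-stream data. the dataset name pets and its variables are made up for illustration; cards could be written in place of datalines.

```sas
/* hypothetical dataset and variables, for illustration only */
data pets;
   input name $ species $ age;   /* $ marks name and species as character */
   datalines;
Rex dog 4
Tom cat 7
;
run;
```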

3
Q

what does dlm in the infile statement do

A

tells sas the delimiter for the data

4
Q

what does NOOBS in proc print do

A

suppresses the Obs (observation number) column in the output

5
Q

infile statement

A

tells sas where it will be reading your data from

6
Q

what does obs do in proc print ex- (obs=20)

A

tells sas to print the first 20 observations

7
Q

proc print data=x (firstobs=11 obs=20)

A

prints the 11th through 20th observations

8
Q

what does firstobs do in the infile statement

A

firstobs=n starts reading at line n of your datafile; firstobs=2 starts from the second row (use if the first row of the file is just variable names)

9
Q

what does using var statement in proc print do

A

will only print your selected variables

10
Q

what does varnum in proc contents do

A

puts the variables in creation order, rather than in the default alphabetical order; can make finding the variable easier

11
Q

proc sgplot data=;
histogram salary /showbins binwidth=5000;
run;

A

creates a histogram with salary on the x-axis; showbins puts the tick marks at the midpoint of each bin
binwidth specifies the width of each bin; sas will determine the number of bins unless you use the nbins option
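
a runnable version of the same idea, as a sketch only: it swaps in the built-in sashelp.cars dataset and its msrp variable because the card leaves the dataset name blank.

```sas
/* sashelp.cars and msrp stand in for the card's unnamed dataset and salary */
proc sgplot data=sashelp.cars;
   histogram msrp / showbins binwidth=5000;   /* ticks at bin midpoints, bins 5000 wide */
run;
```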

12
Q

proc sgplot data=;
scatter x= y= / group=gender;
run;

A

creates a scatter plot with the x and y variables on their respective axes; group= gives each value of the gender variable its own marker color/symbol
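
a small sketch of a grouped scatter plot. it assumes the built-in sashelp.class dataset, where the grouping variable is called sex rather than gender.

```sas
/* sashelp.class is a stand-in; sex plays the role of gender */
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
run;
```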

13
Q

vbar

A

creates a vertical bar chart for a categorical variable (looks similar to a histogram, which is for continuous variables). options can be explored in 8.2

14
Q

dsd

A
  1. ignores delimiters enclosed in quote marks
  2. treats 2 delimiters in a row as a missing value
  3. does not read quote marks as part of the data
    - changes the default delimiter to a comma (override with dlm=)
    - prudent to use missover in case there is missing data at the end of the dataline
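
a sketch of the three dsd behaviors plus missover, using made-up in-stream data (the dataset contacts and its variables are hypothetical).

```sas
/* hypothetical data: a quoted comma, two commas in a row, and a short last line */
data contacts;
   infile datalines dsd missover;
   input name :$20. city :$20. age;
   datalines;
"Smith, Ann",Boston,34
Lee,,41
Kim,Austin
;
run;
```

here the first name keeps its comma and loses its quotes, Lee gets a missing city, and Kim gets a missing age instead of sas jumping to the next line.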
15
Q

missover

A

tells sas that if it runs out of data, don’t go to the next line, assign missing values to any remaining variables in that dataline

16
Q

truncover

A

need this when reading in data in column or formatted input and some datalines are shorter than others
-tells sas to read data for the variable until the end of the dataline or the last column specified in the format or column range (whichever comes first)

17
Q

differences between missover and truncover

A

both will assign missing values if the dataline ends before the variable’s field starts
-but when the dataline ends in the middle of a variable field, truncover will take as much as there is, whereas missover will assign a missing value to the variable
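
a sketch of the difference, assuming a hypothetical external file ids.txt whose second line is shorter than the column range. an external file is used because in-stream datalines are padded with blanks, which hides the effect.

```sas
/* hypothetical file ids.txt with two lines of different lengths:
   12345
   123
*/
data with_missover;
   infile 'ids.txt' missover;     /* hypothetical path */
   input id $ 1-5;                /* line 2 ends mid-field: id is set to missing */
run;

data with_truncover;
   infile 'ids.txt' truncover;    /* same hypothetical path */
   input id $ 1-5;                /* line 2 ends mid-field: id takes what is there, 123 */
run;
```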

18
Q

dlm='09'x

A

specifies that it is a tab delimited file
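
a sketch of reading a tab-delimited file with a header row. the file name scores.txt and the variable names are assumptions, not from the course material.

```sas
/* hypothetical tab-delimited file whose first row holds the variable names */
data scores;
   infile 'scores.txt' dlm='09'x dsd firstobs=2 missover;
   input student :$20. exam1 exam2;
run;
```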

19
Q

what does sum in proc print do

A

will print the total of the specified variable(s) at the bottom of the listing
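
the proc print pieces from these cards (firstobs=/obs=, noobs, var, sum) can be combined. a sketch using the built-in sashelp.class dataset as a stand-in:

```sas
/* sashelp.class ships with sas and has 19 observations */
proc print data=sashelp.class (firstobs=11 obs=15) noobs;
   var name age height weight;   /* only these columns are printed */
   sum weight;                   /* total of weight under the listing */
run;
```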

20
Q

what does short in proc contents do

A

will only output the variable names
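
a sketch comparing the two proc contents options, again with sashelp.class standing in for your own dataset:

```sas
/* variables listed in creation order instead of alphabetically */
proc contents data=sashelp.class varnum;
run;

/* compact output: just the variable names */
proc contents data=sashelp.class short;
run;
```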

21
Q

what does @ do in the input statement

A
  • column pointer: @n moves the input pointer to column n
  • tells sas the beginning column of the variable
  • all values of the variable must be aligned at that column in every dataline (and, with no informat, end at a blank) to use the pointer
  • the default length of the variable is still 8
22
Q

column range input method

A
  • tells sas the start and end columns of the variable, which also sets the length of a character variable
  • good to use if the variable is longer than 8 characters or contains embedded blanks
  • can make the data take up less space for shorter variables
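
a sketch mixing column range input with an @ pointer. the dataset staff and its layout are invented for illustration; the data lines must start in column 1 for the columns to line up.

```sas
/* hypothetical fixed-column layout: name in 1-12, dept in 13-15, salary from column 17 */
data staff;
   input name $ 1-12 dept $ 13-15 @17 salary 6.;
   datalines;
Maria Lopez HR  52000
Sam Oduya   IT  61500
;
run;
```

column input lets name keep its embedded blank, which plain list input would split into two values.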
23
Q

what kind of files can infile read

A

flat files (.txt, .csv, .dat)

24
Q

how sas stores date values

A

stores dates as the number of days since Jan. 1, 1960

25
Q

how sas stores time values

A

the number of seconds since midnight

26
Q

how sas stores datetime values

A

the number of seconds between midnight on Jan 1, 1960 and the given date and time
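
a quick way to see the stored numbers for dates, times, and datetimes is to write date, time, and datetime constants to the log with no format. a sketch:

```sas
data _null_;
   d  = '02JAN1960'd;            /* one day after Jan 1, 1960 */
   t  = '00:01:00't;             /* one minute after midnight */
   dt = '01JAN1960:00:01:00'dt;  /* one minute after midnight on Jan 1, 1960 */
   put d= t= dt=;                /* log shows d=1 t=60 dt=60 */
run;
```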

27
Q

$w. informat

A

reads in character data
w specifies the width
.d would specify decimal places (just use “.” with no d because it is a character variable)

28
Q

w.d informat

A

reads in standard numeric data
w is the total width of the field, counting every digit, the decimal point, and any sign
d is the number of decimal places to assume when the value has no decimal point in the data (optional)

29
Q

COMMAw.d informat

A

reads in numeric values and removes embedded commas, blanks, dollar signs, percent signs, dashes, and right parentheses from the input data

  • converts a left parenthesis at the start of the value to a minus sign (ex (500) read with input varname comma5. becomes -500)
  • the matching commaw.d format writes the number with a comma separating every 3 digits
30
Q

DOLLARw.d informat

A
  • the informat reads values the same way as commaw.d; the dollarw.d format writes numbers with a leading $
31
Q

DATEw. informat

A

reads in date values in the form ddmmmyy or ddmmmyyyy
ex 16mar99 use date7.
16mar1999 use date9.

32
Q

DDMMYYw. informat

A

reads in date values in the form ddmmyy or ddmmyyyy
ex 160399 use DDMMYY6.
ex 16/03/99 use DDMMYY8.
ex 16031999 use ddmmyy8.
ex 16/03/1999 use ddmmyy10.
* if it were 03/30/1999 you would use mmddyy10.
* the informat name never has four y’s; keep two (ddmmyy) and change only the width
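
a sketch that reads one value each with the comma, date, and ddmmyy informats. the dataset sales and its values are made up; the colon modifier lets each informat be used with list input.

```sas
/* hypothetical data, for illustration only */
data sales;
   input rep $ amount :comma10. sold :date9. paid :ddmmyy10.;
   format amount comma10. sold paid date9.;   /* formats only change how values are displayed */
   datalines;
Ann $1,250 16mar1999 30/03/1999
Bob (500) 02jan1960 05/01/1960
;
run;

proc print data=sales noobs;
run;
```

amount is stored as 1250 for Ann and -500 for Bob; the dates are stored as day counts and only look like dates because of the format statement.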

33
Q

TIMEw. informat

A
reads hours, minutes, and seconds in the form hh:mm:ss.ss
ex 10:13 PM use TIME8. 
ex 11:23:07.40 use TIME11.2
ex 11:23:09.40 PM use TIME14.2 
*count the spaces
34
Q

DATETIMEw. informat

A

reads in datetime values as ddmmmyy hh:mm:ss.ss

ex 16mar1997/11:23:07.40 use datetime21.2

35
Q

where to use informat in the data step

A
  • can use it before the input statement for reading internal raw data
  • can use in the input statement
  • can use before the infile statement when reading in external raw data
36
Q

where to use format statements

A

* can specify in proc print, but this only affects that output; it won’t change the format stored with the data set (what you see in the view table)
* can use in the data step (for example after informat and before input) and the format is stored with the data set

37
Q

proc format

A
  • this is where you can create user defined formats

* create and store the format in proc format, use in proc print with a format statement
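
a sketch of a user defined format. the format name $sexfmt is invented for the example, and sashelp.class stands in for your own dataset.

```sas
/* $sexfmt is a hypothetical format name */
proc format;
   value $sexfmt 'F' = 'Female'
                 'M' = 'Male';
run;

proc print data=sashelp.class noobs;
   format sex $sexfmt.;   /* applied for this output only */
run;
```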

38
Q

by statement and proc sort

A
  • for proc sort, use out= to save the sorted data as a new data set; without out=, the sorted data replaces the original data set
  • sort by the variable first in proc sort, then you can use the same variable in a by statement in proc print
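
a sketch of sort-then-by, assuming sashelp.class as the input; class_sorted is a made-up output name.

```sas
proc sort data=sashelp.class out=class_sorted;   /* out= leaves the original untouched */
   by sex;
run;

proc print data=class_sorted noobs;
   by sex;   /* one listing per value of sex; needs the data sorted by sex */
run;
```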
39
Q

using the where statement

A

use where to subset the data in proc print

can also be used in the data step

40
Q

to modify a sas dataset

A

have to use the set statement

41
Q

subsetting your data

A
  • where chooses the observations you want from an existing dataset
  • output the observations that you want to a new dataset
  • delete the observations you don’t want so they don’t appear in a new dataset
  • keep variables you want
  • drop variables you don’t want
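
a sketch that uses several of these subsetting tools in one data step. sashelp.class is a stand-in input and teens is a hypothetical output name.

```sas
data teens (keep=name sex age);   /* keep only these variables in the new dataset */
   set sashelp.class;             /* read the existing dataset */
   where age >= 13;               /* choose observations as they are read */
   if sex = 'M' then delete;      /* drop observations you do not want */
run;
```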
42
Q

if then

A

can use if then statements with output/delete to subset the data

43
Q

if then else

A

most efficient way to write a series of conditions that are mutually exclusive: once one condition is true, sas skips the remaining else branches
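
a sketch of an if/else if chain. the cutoffs and the agegroup variable are invented for illustration.

```sas
data class_groups;
   set sashelp.class;
   length agegroup $ 8;                       /* set the length before assigning values */
   if age <= 12 then agegroup = 'child';
   else if age <= 15 then agegroup = 'teen';  /* only checked when the first test is false */
   else agegroup = 'older';
run;
```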

44
Q

renaming vs labeling

A
  • renaming changes the actual name of the variable; the new name is what you use whenever you reference the variable in code
  • labels are descriptive text shown in place of the name in the data table; the easiest practice is to just remove them (label varname=' ';)
45
Q

proc means

A
  • can use options to get certain statistics (n nmiss std mean q1 q3)
  • can specify the variables you want statistics run for (var statement)
  • can sort and then use the sorted variable in a by statement in proc means to get grouped analysis
  • if you don’t want to sort, you can use the class statement to do a grouped analysis
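
a sketch of a grouped proc means with the class statement, so no sort is needed. sashelp.class is a stand-in dataset.

```sas
proc means data=sashelp.class n nmiss mean std q1 q3;
   class sex;           /* grouped analysis without sorting first */
   var height weight;   /* only these variables are summarized */
run;
```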
46
Q

Combining datasets by stacking

A
  • just use set statement and it will stick the datasets on top of each other in the order you specified
  • increases the number of observations by combining vertically
  • good for when datasets are structured the same but have different observations
  • problems with stacking: a variable must have the same type (numeric vs character) in every dataset or the step fails; the length of a character variable should also match, and if it doesn’t, set the dataset with the longer variable first or values will be truncated (can also use a length statement before the set statement to define the length)
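
a sketch of stacking two datasets whose character variable has different lengths. the datasets east and west and their values are hypothetical.

```sas
data east;
   input name $ sales;          /* name gets the default length of 8 */
   datalines;
Ann 100
;
run;

data west;
   input name :$12. sales;      /* name has length 12 here */
   datalines;
Bartholomew 250
;
run;

data both;
   length name $ 12;            /* declared before set so the longer value is not truncated */
   set east west;               /* east's observations first, then west's */
run;
```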
47
Q

Combining datasets by merging

A
  • merges horizontally
  • good for when you need to combine datasets that have different variables
  • have to sort both datasets by the unique identifying variable, make sure none of the other variable names in the two datasets are the same (values from the later dataset overwrite the earlier one), then merge with a by statement on the sorted variable
  • use the in= dataset option to see which observations came from which dataset
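
a sketch of a match-merge. it assumes two hypothetical datasets, demo and visits, that already exist and share an id variable.

```sas
/* demo, visits, and id are hypothetical names */
proc sort data=demo;   by id; run;
proc sort data=visits; by id; run;

data combined;
   merge demo (in=in_demo) visits (in=in_visits);
   by id;
   if in_demo and in_visits;   /* keep only ids found in both datasets */
run;
```

the in= variables are temporary flags (1 or 0) and are not written to the output dataset.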
48
Q

proc freq

A
  • creates frequency tables, including contingency tables (one-way or two-way)

* review what each of the four values in a two-way table cell means (frequency, percent, row percent, column percent)
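
a sketch of one-way and two-way tables, using sashelp.class as a stand-in dataset:

```sas
proc freq data=sashelp.class;
   tables sex;       /* one-way frequency table */
   tables sex*age;   /* two-way table: sex in rows, age in columns */
run;
```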