skills Flashcards
01 rate versus percent
A rate is per capita (per one person), while a percent is per one hundred people. Rates are used to compare groups of different sizes. To find a rate, divide the number of observations by the total number in the population; multiply the rate by 100 to get a percent.
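A small sketch of the arithmetic on this card, in Python. The function name and the two group counts are made up for illustration.

```python
def rate_and_percent(count, population):
    """Return (rate per person, percent) for a count within a population."""
    if population <= 0:
        raise ValueError("population must be positive")
    rate = count / population   # per capita: per one person
    percent = rate * 100        # per one hundred people
    return rate, percent

# Hypothetical counts for two groups of different size
small_group = rate_and_percent(30, 200)
large_group = rate_and_percent(90, 1200)
print(small_group)
print(large_group)
```

The larger group has more observations (90 versus 30) but the lower rate, which is why rates rather than raw counts are used to compare groups of different size.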
1 how to identify the individuals and variables in a dataset
- Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.
- A variable is any characteristic of an individual. A variable can take different values for different individuals.
1 how to identify categorical VS quantitative variables (and the units of measurement for each quant. var.)
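Quantitative variables take numerical values that can be measured (in some unit, e.g. cm or kg); categorical variables place an individual into one of several groups, so their values can be words or choices.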
1 • SPSS • how to make a pie chart
SPSS
- Graphs → Chart Builder
- Gallery → Pie/Polar; Pie Chart → Chart Preview
- drag to determine “slice by” and “angle variable” in chart preview
- In Element Properties, under Statistics, “Statistic:” select “value”
- (Apply if necessary)
- Done
- Double click to edit in Chart Editor
1 how to recognize when a pie chart can and cannot be used
A pie chart must include all the categories that make up a whole. Use a pie chart only when you want to emphasize each category’s relation to the whole.
1 • SPSS • how to make a bar graph of a cat. var. dist.
SPSS
- Graphs → Chart Builder
- Gallery → Bar; Bar Chart → Chart Preview
- Drag to axes
- Element Properties → Statistics → Statistic → Value
- double click to open chart editor
- Properties → Categories
1 • SPSS • how to make a histogram of a quant. var. dist.
- Variable view
- Analyze –> Descriptive Statistics –> Frequencies
- Move desired variable to “Variables” –> click CHARTS
- Under chart type, select histogram
- optional: select show normal curve
- Continue –> OK
1 How to describe skewness
- A distribution is skewed to the right if the right side of the histogram (containing the half of the observations with larger values) extends much farther out than the left side.
- It is skewed to the left if the left side of the histogram extends much farther out than the right side.
- The mean moves toward the skew (the long tail). A mean larger than the median suggests a right (positive) skew, and a mean smaller than the median suggests a left (negative) skew.
1 how to describe a histogram
Describe its shape, center, and variability (spread). Say whether the shape is roughly symmetric, distinctly skewed, or neither, and note whether there are any outliers.
1 • SPSS • how to make and assess a stem plot, incl. rounded leaves and split stems
MAKE
- Analyze –> descriptive statistics –> explore
- move variable to dependent list –> click plots
- Select “Stem and leaf” under descriptive –> continue –> OK
ASSESS
- how many peaks does it have?
- if a long tail goes (down) toward the larger numbers it is “right-skewed”
1 • SPSS • how to make a time plot, and recognize trends and cycles
SPSS
- Graphs –> Chart Builder
- Gallery: select scatter/dot
- Then drag simple scatter to chart preview section
- define axes by dragging (time unit goes to X)
- THEN ADD INTERPOLATION LINE
- click button in attachment in chart editor
- select straight from line type
2 • SPSS • how to find the mean x̄ and standard deviation s for a set of observations
MANUAL
the mean x̄ is basically the average: add up all the observations and divide by how many there are (n)
SPSS
- Analyze –> descriptive statistics –> frequencies
- move desired variable to “variables,” –> click “statistics”
- Under Percentile Values, check Quartiles, plus all other checkboxes needed (e.g. Mean, Std. deviation)
2 how to find the Median M and quartiles Q1 and Q3 for a set of observations
To find the median
- arrange all observations in order of size from smallest to largest
- if the number of obs. is ODD, the M is the center obs.
- if the n is EVEN, the M is midway btw the two center obs
- or count (n+1)/2 observations up from the bottom of the ordered list <– note: this just gives the location of the median, not its value
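The steps above can be sketched in Python. This is only an illustration of the odd/even rule; the function name is made up for this card.

```python
def median_by_location(observations):
    """Median M found with the (n + 1) / 2 location rule."""
    ordered = sorted(observations)   # arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # odd n: location (n + 1) / 2 is a whole number, the center observation
        return ordered[mid]
    # even n: location (n + 1) / 2 ends in .5, so M is midway
    # between the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, with five observations the location is (5+1)/2 = 3, so M is the third value in the ordered list; with six observations the location is 3.5, so M is halfway between the third and fourth values.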
2 know whether the mean or the median are resistant, and know what skewness does
The median is more resistant than the mean, and skewness moves the mean away from the median toward the long tail.
2 how to assess a box plot
- look at center, spread, symmetry and skewness
2 • SPSS • how to find the five number summary and draw a box plot
to find the five number summary
- list, in order: the lowest observation (minimum), the median of the lower half (Q1), the median M, the median of the upper half (Q3), and the highest observation (maximum)
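A hedged Python sketch of the by-hand method. It assumes the common textbook convention that, when n is odd, the median itself is left out of both halves; software such as SPSS may use a slightly different quartile rule, so small differences are possible. The function name is made up.

```python
from statistics import median

def five_number_summary(observations):
    """Return (minimum, Q1, M, Q3, maximum) using medians of each half."""
    ordered = sorted(observations)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]                                   # left of the median's location
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # right of the median's location
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])
```

The five numbers returned are the ones a box plot draws: the box spans Q1 to Q3 with a line at M.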
To make a box plot in SPSS
- Variable view
- Analyze –> Descriptive Statistics –> Explore
- Move variable to dependent list, click plots
- Under Boxplots –> Factor levels together
2 know how to find the variance and standard deviation
- the standard deviation and variance measure the variability by looking at how far the observations are from their mean
- find the variance by squaring the difference between each observation and the mean, adding up those squares, and then dividing by the number of observations minus one
- the standard deviation is the square root of the variance
- you divide by n-1 rather than n because the deviations always sum to 0, so once n-1 of them are known the last one is determined; only n-1 of the deviations can vary freely (n-1 is the degrees of freedom)
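The calculation above, sketched in Python. The data values are made up, and the function names are only for illustration.

```python
import math

def sample_variance(observations):
    """Variance s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(observations) / n
    squared_deviations = [(x - mean) ** 2 for x in observations]
    return sum(squared_deviations) / (n - 1)

def sample_sd(observations):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(sample_variance(observations))

data = [2, 4, 4, 6]   # hypothetical observations, mean = 4
deviations = [x - sum(data) / len(data) for x in data]
print(sum(deviations))        # the deviations sum to 0
print(sample_variance(data))  # squared deviations 4 + 0 + 0 + 4, divided by 3
print(sample_sd(data))
```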
2 know the basic properties of standard deviation
- s ≥ 0 always
- s = 0 only when all observations are identical; otherwise s > 0, and it increases as the spread increases
- s has the same units as the original measurements
- s is not resistant: it is pulled strongly by outliers and skewness
3 how to approximately locate the mean and median on a density curve
- the median is the equal areas point
- the mean is the balance point
3 how to use the 68-95-99.7 rule and symmetry to state what percent of observations of a normal distribution fall between two points.
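- The mean and standard deviation will be listed as N(μ, σ)
- 68% will fall within one standard deviation σ of the mean μ
- 95% will fall within 2σ of μ
- 99.7% will fall within 3σ of μ
- by symmetry, whatever is left over is split equally between the two tails (e.g. about 2.5% lies above μ + 2σ)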
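As a check on the rule, here is a minimal Python sketch using the standard library's `NormalDist`. The helper name `share_within` is made up; the rule's 68, 95 and 99.7 are rounded versions of what it returns.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma) distribution within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

def share_above(k, mu=0.0, sigma=1.0):
    """By symmetry, half of what falls outside mu ± k*sigma lies in the upper tail."""
    return (1 - share_within(k, mu, sigma)) / 2

for k in (1, 2, 3):
    print(k, share_within(k))
```

The proportions do not depend on which μ and σ are used, only on how many standard deviations out you go.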
3 how to find the z-score or “standardized value,” and what it means for a normal distribution
The z-score, or standardized value, tells us how many standard deviations the original observation falls away from the mean, and in which direction.
To find it, subtract the mean of the distribution from x and then divide by the standard deviation: z = (x − μ)/σ, as attached.
3 how to calculate the proportion of values above, below or between certain numbers when given a stated mean μ and standard deviation σ
- state the problem and draw a picture
- use Table A backward. Find the given proportion in the body of the table and then read the corresponding z from the left column and top row.
- unstandardize z back to x: x lies z standard deviations away from the mean, so x = μ + zσ, i.e. (the mean) plus (the standard deviation) times (the z-score that we found)
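The backward lookup can be sketched in Python, with `NormalDist().inv_cdf` standing in for reading Table A backward. The function names and the example numbers (mean 100, standard deviation 15) are made up.

```python
from statistics import NormalDist

def value_with_area_to_left(area, mu, sigma):
    """Find x such that the stated proportion of values lies below x."""
    z = NormalDist().inv_cdf(area)   # the z with this area to its left
    return mu + sigma * z            # unstandardize: x = mu + z * sigma

def value_with_area_to_right(area, mu, sigma):
    """Find x such that the stated proportion of values lies above x."""
    return value_with_area_to_left(1 - area, mu, sigma)

# Hypothetical example: the point with 97.5% of values below it in N(100, 15)
print(value_with_area_to_left(0.975, 100, 15))
```

The area must be strictly between 0 and 1; `inv_cdf` raises an error otherwise.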
3 • SPSS • how to determine area to the left in a normal distribution (i.e. probability a score will fall within a certain range)
SPSS
- new data set, variable view
- create variable “[variable]” for the value you want to see what is to the left of
- data view, enter that value (the x itself, not an area)
- Transform –> compute variable
- Target variable: Prob
- Function Group: CDF & Noncentral CDF
- Functions and Special variables: Cdf.Normal
- CDF.NORMAL(?,?,?) <– first ? = “[variable]”, second ? = Mean, third ? = SD
- Click OK
- You get a two decimal amount, click to see more decimals
3 • SPSS • how to determine percentiles for normal distributions
SPSS
- new data set, variable view
- create variable “area”
- data view, enter desired area to the left (e.g. 0.95 for 95th%)
- Transform –> compute variable
- Target variable: “percentile”
- Put cursor in Numeric Expression then Function Group menu select “Inverse DF”
- From Functions and special variables menu, double-click “Idf.Normal”
- “IDF.NORMAL(?,?,?)” will appear in Numeric Expression box
- select first question mark and place the variable “area” there
- Replace second question mark with Mean
- Replace third question mark with standard deviation
- Click OK
- Data editor will now display percentile
3 how to use the standard normal table
- draw a picture of the distribution and shade the area you want (any area to the right ( x ≥ ? ) is 1 minus the area to the left)
- Standardize. The value x minus the mean mu, divided by the standard deviation sigma = z
- find z on the table to get the area to the left, BUT if it’s an x ≥ problem, then subtract that area from 1.
- to find the area between two values, standardize both, look up both areas and subtract the smaller from the larger, or see 3.7
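A hedged Python sketch of the same steps, with `NormalDist().cdf` standing in for the table lookup. The function names and the N(100, 15) numbers in the usage note are made up.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1, like Table A

def area_left(x, mu, sigma):
    """Proportion of values below x: standardize, then look up z."""
    z = (x - mu) / sigma
    return STANDARD_NORMAL.cdf(z)

def area_right(x, mu, sigma):
    """Proportion of values above x: 1 minus the area to the left."""
    return 1 - area_left(x, mu, sigma)

def area_between(low, high, mu, sigma):
    """Proportion between two values: larger left-area minus smaller left-area."""
    return area_left(high, mu, sigma) - area_left(low, mu, sigma)
```

For instance, `area_between(85, 115, 100, 15)` is the proportion within one standard deviation of the mean, so it should come out close to the 68% of the 68-95-99.7 rule.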
3 how to calculate the point having a stated proportion of all values above it or below it, when given a stated mean µ and standard deviation σ
4 how to identify explanatory VS response variables
- A response variable measures an outcome of a study. An explanatory variable may explain or influence changes in a response variable.
4 • SPSS • how to make a scatterplot to display the relationship between two quant. variables and know which scale to put the explanatory variable on.
SPSS
- Analyze → Correlate → Bivariate
- move variables into window
- Graphs → Legacy Dialogs → Scatter/Dot
- Simple Scatter → Define
- put explanatory on X, response on Y, OK
- Elements → Fit Line at Total
- Analyze → Regression → Linear
- put explanatory in independent, response in dependent, OK
4 how to describe a scatterplot
Describe its direction (positive or negative association), form, and strength, and note any outliers.
4 how to add a categorical variable to a scatterplot by using a different plotting symbol or color
When making the scatterplot, after you determine the axes, move a categorical variable into the “set markers by” box. Then go into the chart editor and select the little legend symbols to edit shape and color.
4 • SPSS • how to find the correlation r
correlation measures the strength and direction of the linear relationship
SPSS: look at regression cards
- The values for each individual are x1 and y1, x2 and y2, etc.
- the means are x̄ and ȳ (y-bar)
- the standard deviations are sx and sy
- so for each individual, standardize x ([the individual's x value] minus [x’s mean], divided by [x’s standard deviation]) and standardize y the same way, multiply the two standardized values together, add up these products over all individuals, and divide the total by n-1
- r = [1/(n-1)] × Σ [(xi − x̄)/sx] × [(yi − ȳ)/sy]
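A minimal Python sketch of the by-hand formula for r (sum of products of standardized values, divided by n-1). The function name is made up, and it assumes neither variable is constant (otherwise a standard deviation is 0 and the division fails).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r from standardized x and y values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 individuals")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations (n - 1 divisor)
    products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)   # standardized x times standardized y
        for x, y in zip(xs, ys)
    ]
    return sum(products) / (n - 1)
```

Points lying exactly on an upward-sloping line give r = 1, and points exactly on a downward-sloping line give r = -1 (up to floating-point rounding).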
4 how to judge whether it’s appropriate to use correlation to describe the relationship between two variables
5 how to draw a graph of a regression line when you are given its equation
5 • SPSS • how to use the regression line to predict y for a given x, and also recognize extrapolation
- Analyze –> regression –> Linear
- Response variable –> dependent, explanatory –> independent
- save… Predicted values “Unstandardized”
- Click OK for basic linear regression output
- “Model Summary” (2nd table): R = absolute value of small r… R Square = the square thereof; the table also gives the standard error
- Coefficients (4th (bottom) table) “1 (Constant)” at “B” is the Y intercept
- “[Explanatory variable]” at “B” is the slope
- back to dataset see a new column of predicted values based on slope
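Once SPSS reports the intercept and slope, prediction is just plugging x into the line. A small Python sketch follows; the function names and the numbers in the usage note are made up. It treats any x outside the range of the observed x values as extrapolation.

```python
def predict(x, intercept, slope):
    """Predicted y from the regression line: y-hat = a + b * x."""
    return intercept + slope * x

def is_extrapolation(x, observed_xs):
    """True if x lies outside the range of x values used to fit the line."""
    return x < min(observed_xs) or x > max(observed_xs)
```

For example, if the line was fitted on x values from 1 to 3, then `is_extrapolation(10, [1, 2, 3])` is True, and a prediction at x = 10 should not be trusted even though `predict` will still return a number.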
5 • SPSS • how to explain what the slope b and the intercept a mean in the equation ŷ = a + bx
Use SPSS to calculate (the regression cards have it)
- b is the slope, i.e. the amount by which Y changes when X increases by 1
- a is the y-intercept, the value of Y when X = 0
- find the slope and intercept attached
5 how to use a calculator to find the least squares regression line of a response variable y on an explanatory variable x
5 • SPSS • how to find the slope and the intercept of the least squares regression line from the means and standard deviations of x and y and their correlation
- Analyze –> regression –> Linear
- Response variable –> dependent, explanatory –> independent
- Click OK for basic linear regression output
- “Model Summary” (2nd table): R = absolute value of small r… R Square = the square thereof; the table also gives the standard error
- Coefficients (4th (bottom) table) “1 (Constant)” at “B” is the Y intercept
- “[Explanatory variable]” at “B” is the slope
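Without SPSS, the least squares slope and intercept can also be found from the five summary numbers named on the front of this card, using b = r × sy / sx and a = ȳ − b × x̄. A minimal Python sketch, with a made-up function name:

```python
def least_squares_line(x_bar, y_bar, s_x, s_y, r):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b * x."""
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    slope = r * s_y / s_x              # b = r * sy / sx
    intercept = y_bar - slope * x_bar  # a = y-bar - b * x-bar
    return intercept, slope
```

The intercept formula means the line always passes through the point (x̄, ȳ).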
5 • SPSS • how to calculate the residuals and plot them against the explanatory variable x.
- Analyze –> regression –> Linear
- Response variable –> dependent, explanatory –> independent
- save… –> under Residuals, check “Unstandardized”
- Click OK for basic linear regression output
- “Model Summary” (2nd table): R = absolute value of small r… R Square = the square thereof; the table also gives the standard error
- Coefficients (4th (bottom) table) “1 (Constant)” at “B” is the Y intercept
- “[Explanatory variable]” at “B” is the slope
- back to dataset a new variable has been added
- Graphs –> Legacy Dialogs –> Scatter/Dot
- Simple –> explanatory on the X, new residuals on the Y
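What the saved residual column contains can be sketched in Python: each residual is the observed y minus the predicted y. The function name and the tiny data set are made up; the intercept -1 and slope 2.5 are the least squares values for that made-up data.

```python
def residuals(xs, ys, intercept, slope):
    """Residual for each individual: observed y minus predicted y-hat."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data with its least squares line y-hat = -1 + 2.5x
xs = [1, 2, 3]
ys = [2, 3, 7]
res = residuals(xs, ys, -1.0, 2.5)
print(list(zip(xs, res)))   # the pairs to plot: x on the X axis, residual on the Y axis
print(sum(res))             # least squares residuals sum to 0
```

In a residual plot, a curved pattern or a fan shape around the horizontal line at 0 suggests a straight line is not a good description of the relationship.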
5 how to use r², the square of the correlation, to describe how much of the variation in one variable can be accounted for by a straight line relationship with another variable
5 recognize lurking variables
A lurking variable is a variable that is not included as an explanatory or response variable in the analysis but can affect the interpretation of relationships between variables.
8 how to identify the population in a sampling situation
8 how to recognize bias
Bias means the sampling method systematically favors certain outcomes; it is often due to voluntary response and other inferior sampling methods.
8 recognize the presence of undercoverage and non-response, as well as poor wording, in a sample survey
8 • SPSS • how to use software or table B of random digits to select an SRS from a population
SPSS
- Data –> Select Cases
- select “Random Sample of cases”, click Sample
- decide whether you want a percentage or a fixed number of cases, continue
- Decide on “output”
- makes a filter variable column with 0’s for filtered-out cases and 1’s for selected cases
- go to variable view and rename filter to variable 1 or whatever
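The same idea can be sketched in Python with the standard library's `random` module in place of SPSS or Table B. The function names are made up; the `seed` argument is only there so a draw can be repeated, and the labels are assumed to be unique.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

def filter_column(population, sample):
    """Mimic the SPSS filter variable: 1 for selected cases, 0 for filtered-out cases."""
    chosen = set(sample)
    return [1 if unit in chosen else 0 for unit in population]

# Hypothetical population of 30 labeled individuals
labels = list(range(1, 31))
srs = simple_random_sample(labels, 5, seed=1)
print(srs)
print(filter_column(labels, srs))
```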
8 • SPSS • how to use software or table B of random digits to select a stratified random sample from the population when the strata are identified
SPSS
- assign a random number to each subject
- Transform –> compute variable
- Function Group “Random Numbers”
- Functions box: Rv.Uniform (creates RV.UNIFORM(?,?), which generates random numbers that fall between the two numbers you enter, e.g. (0,1))
- Target variable: “random”
- Dataset now has new variable, now sort
- Data –> sort cases –> move random to sort by
- make new variable “treatment group”; put a 1, 2 or 3 next to equal numbers of subjects (i.e. first ten, second ten, third ten)
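The steps on this card (random number per subject, sort, then label blocks of equal size) can be sketched in Python. The function name and subject labels are made up. For a stratified random sample, the same random-number-and-sort step would be repeated separately inside each stratum.

```python
import random

def assign_groups(subjects, n_groups, seed=None):
    """Give each subject a random number, sort by it, then label equal-sized blocks."""
    rng = random.Random(seed)
    keyed = [(rng.random(), subject) for subject in subjects]  # like RV.UNIFORM(0,1)
    keyed.sort(key=lambda pair: pair[0])                       # sort cases by "random"
    total = len(keyed)
    groups = {}
    for position, (_, subject) in enumerate(keyed):
        # first block gets 1, second block gets 2, and so on
        groups[subject] = position * n_groups // total + 1
    return groups

# Hypothetical: 30 subjects split into 3 groups of ten
subjects = [f"subject_{i}" for i in range(1, 31)]
assignment = assign_groups(subjects, 3, seed=2)
print(assignment)
```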