Ch11 Quantitative analysis Flashcards
Codebook
a document that describes the procedure for coding variables and their location in a format for computers
4 ways to get quantitative raw data into computer
- code sheet - paper with printed grid - record info so it can be easily entered
- Direct entry - a method of entering data into a computer by typing data without code or optical scan sheets
- optical scan - gather the information then enter it into optical scan sheets by filling the correct dots
- bar code - gather the information, then convert it into different widths of bars that are associated with specific
possible code cleaning
cleaning data using a computer in which the researcher looks for responses or answer categories that cannot have cases. also called wild code checking
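A minimal sketch of possible code cleaning, assuming a hypothetical codebook (`VALID_CODES`) that lists the only legitimate codes for each variable. The variable names and codes are invented for illustration.

```python
# Hypothetical codebook: the only valid codes for each variable.
VALID_CODES = {
    "sex": {1, 2},
    "marital_status": {1, 2, 3, 4, 5},
}

def find_wild_codes(records, valid_codes):
    """Return (row_index, variable, value) for every value not in the codebook."""
    problems = []
    for i, record in enumerate(records):
        for variable, allowed in valid_codes.items():
            value = record.get(variable)
            if value not in allowed:
                problems.append((i, variable, value))
    return problems

records = [
    {"sex": 1, "marital_status": 2},
    {"sex": 4, "marital_status": 1},  # 4 is not a valid code for sex
    {"sex": 2, "marital_status": 9},  # 9 is not a valid code for marital_status
]
wild = find_wild_codes(records, VALID_CODES)
```

Each flagged row would then be checked against the original questionnaire or code sheet.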
contingency cleaning
cleaning data using a computer in which the researcher looks at the combination of categories for two variables for logically impossible cases
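A minimal sketch of contingency cleaning, assuming one hypothetical rule: a case coded "never married" cannot also have a divorce year. The field names and the rule are illustrative only.

```python
def find_impossible_cases(records, rule):
    """Return the indexes of records for which rule(record) is True."""
    return [i for i, record in enumerate(records) if rule(record)]

# Hypothetical rule: never married AND a recorded divorce year is logically impossible.
def never_married_but_divorced(record):
    return (record["marital_status"] == "never married"
            and record["divorce_year"] is not None)

records = [
    {"marital_status": "married", "divorce_year": None},
    {"marital_status": "never married", "divorce_year": 2009},  # impossible combination
    {"marital_status": "divorced", "divorce_year": 2015},
]
flagged = find_impossible_cases(records, never_married_but_divorced)
```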
descriptive statistics
describe numerical data
univariate statistics
one variable - the easiest way to describe numerical data of one variable is with a frequency distribution, which can be used with any type of data
bimodal
a distribution with two modes
multimodal
distribution with more than one mode
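A small sketch of finding modes from a frequency distribution, using Python's standard `collections.Counter` and `statistics.multimode`. The scores are made-up example data.

```python
from collections import Counter
from statistics import multimode

scores = [1, 2, 2, 3, 3, 4]  # hypothetical data

# Frequency distribution: how many cases fall in each category.
frequencies = Counter(scores)

# multimode returns every value tied for the highest frequency.
modes = multimode(scores)
is_bimodal = len(modes) == 2
```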
skewed distribution
distribution of cases among the categories of a variable that is not normal (not symmetric) - most cases pile up toward one end, with a long tail on the other
z scores
a standardized score used to compare two or more distributions or groups
the number of standard deviations a score is above or below the mean
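A minimal sketch of the z-score calculation, z = (score - mean) / standard deviation. It assumes the population standard deviation (`statistics.pstdev`); a textbook may use the sample version instead. The scores are made up.

```python
import statistics

def z_scores(values):
    """Convert raw scores to z-scores using the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD, not sample SD
    return [(v - mean) / sd for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, population SD is 2
z = z_scores(scores)
```

Here a raw score of 9 is two standard deviations above the mean, so its z-score is 2.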
bivariate statistics
only involve two variables
correlation
means that things go together or are associated
independence
opposite of correlation - no association between two variables
scattergram/ scatterplot
A diagram to display the statistical relationship between two variables based on plotting each case’s values for both of the variables
Precision
Amount of spread in points on the graph
Cross-tabulation
placing data for two variables in a contingency table to show the number or percentage of cases at the intersection of categories of the two variables
contingency table
a table that shows the cross-tabulation of two or more variables.
it usually shows bivariate quantitative data for variables in the form of percentages across rows or down columns for the categories of one variable.
Three ways to percentage a table
by row
by column
for the total - the Total row and Total column are called the marginals
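The three ways of percentaging can be sketched on a small table of counts. The table (age group by attitude) and its numbers are hypothetical.

```python
# Hypothetical counts: rows are age groups, columns are attitudes.
counts = {
    "young": {"agree": 20, "disagree": 30},
    "old": {"agree": 40, "disagree": 10},
}

def percent_by_row(table):
    """Each cell as a percentage of its row total."""
    out = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        out[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return out

def percent_by_column(table):
    """Each cell as a percentage of its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
            for row, cells in table.items()}

def percent_of_total(table):
    """Each cell as a percentage of all cases in the table."""
    grand_total = sum(n for cells in table.values() for n in cells.values())
    return {row: {col: 100 * n / grand_total for col, n in cells.items()}
            for row, cells in table.items()}
```

With these counts, 40% of the young agree (row percentage), while the young make up 75% of those who disagree (column percentage).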
measure of association
a single number that expresses the strength, and often the direction of a relationship.
it condenses information about a bivariate relationship into a single number.
5 measures of association - gamma
used for ordinal level data
based on comparing pairs of variable categories and seeing whether a case has the same rank on each; ranges from -1 to 1, and 0 means no association
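A minimal sketch of gamma, assuming the usual Goodman and Kruskal formula: (concordant pairs - discordant pairs) / (concordant pairs + discordant pairs), with tied pairs ignored. The ranked data are hypothetical.

```python
from itertools import combinations

def goodman_kruskal_gamma(cases):
    """cases: list of (rank_on_x, rank_on_y) tuples, one per case."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # the pair is ordered the same way on both variables
        elif product < 0:
            discordant += 1   # the pair is ordered in opposite ways
        # product == 0 means a tie on at least one variable; gamma ignores it
    # Note: raises ZeroDivisionError if every pair is tied.
    return (concordant - discordant) / (concordant + discordant)

cases = [(1, 1), (2, 3), (3, 2), (3, 3)]  # hypothetical ordinal ranks
gamma = goodman_kruskal_gamma(cases)
```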
5 measures of association - lambda
nominal level data
it is based on a reduction in errors based on the mode and ranges between 0 (no association) and 1 (strongest possible relationship)
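A minimal sketch of lambda as a proportional reduction in error, assuming the Goodman and Kruskal version: guess the mode of the dependent variable for everyone, then guess the mode within each category of the independent variable, and compare the number of errors. The categories and counts are made up.

```python
from collections import Counter

def goodman_kruskal_lambda(cases):
    """cases: list of (independent_category, dependent_category) tuples."""
    # Errors when always guessing the overall mode of the dependent variable.
    dependent_counts = Counter(dep for _, dep in cases)
    errors_without = len(cases) - max(dependent_counts.values())

    # Errors when guessing the mode within each independent category.
    by_independent = {}
    for ind, dep in cases:
        by_independent.setdefault(ind, Counter())[dep] += 1
    errors_with = sum(sum(c.values()) - max(c.values())
                      for c in by_independent.values())

    # Note: raises ZeroDivisionError if the dependent variable never varies.
    return (errors_without - errors_with) / errors_without

cases = ([("A", "x")] * 8 + [("A", "y")] * 2 +
         [("B", "x")] * 3 + [("B", "y")] * 7)  # hypothetical nominal data
lam = goodman_kruskal_lambda(cases)  # 9 errors without, 5 with: (9 - 5) / 9
```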
5 measures of association - tau
ordinal level data
Takes care of problems that occur with gamma
several statistics are named tau; one is Kendall's tau, which ranges from -1 to 1, and 0 means no association
5 measures of association - rho
Pearson's product moment correlation coefficient
what researchers usually mean when they use the term correlation
can only be used for interval and ratio level data
based on the mean and standard deviation of the variables
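A minimal sketch of how the correlation coefficient is built from the mean and standard deviation: convert both variables to z-scores and average the products. It assumes population standard deviations; the data are hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation as the mean product of paired z-scores."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)  # population SDs
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    return sum(products) / len(xs)

hours = [1, 2, 3, 4, 5]        # hypothetical interval-level data
score = [2, 4, 6, 8, 10]       # rises in a perfectly straight line with hours
r = pearson_r(hours, score)
```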
5 measures of association - chi squared
two different uses
Can be used as a measure of association in descriptive statistics
or can be used in inferential statistics
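A minimal sketch of the chi-squared statistic for a contingency table, assuming the standard formula: sum over cells of (observed - expected)^2 / expected, where expected = row total x column total / N. The counts are hypothetical.

```python
def chi_squared(observed):
    """observed: list of rows, each a list of cell counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic

observed = [[20, 30],
            [40, 10]]  # hypothetical 2 x 2 table
chi2 = chi_squared(observed)
```

For inferential use, the statistic is compared against the chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom; that step is not shown here.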
Assumptions of linearity
Pearson's correlation assumes the relationship between the two variables is linear
many relationships are not linear
control variables
a third variable that shows whether a bivariate relationship holds up to alternative explanations
it can occur before or between other variables
trivariate tables
help meet the conditions for causality - to control for a third variable is to rule out an alternative explanation for a causal relationship
consist of multiple bivariate tables - one bivariate table of the independent and dependent variables for each category of the control variable. These new tables are called partials: tables that show the association between the independent and dependent variables for each category of a control variable
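A rough sketch of building partials: split the cases by the control variable, then count the independent-by-dependent combinations within each group. The variable values are invented for illustration.

```python
from collections import Counter, defaultdict

def partial_tables(cases):
    """cases: list of (independent, dependent, control) tuples.

    Returns one bivariate table of counts per category of the control variable.
    """
    partials = defaultdict(Counter)
    for independent, dependent, control in cases:
        partials[control][(independent, dependent)] += 1
    return partials

cases = [  # hypothetical data: (education, voted, residence)
    ("low", "no", "urban"), ("low", "yes", "urban"), ("high", "yes", "urban"),
    ("high", "yes", "rural"), ("low", "no", "rural"), ("low", "no", "rural"),
]
partials = partial_tables(cases)
```

Because every case lands in exactly one partial, a control variable with many categories quickly leaves few cases per cell.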
trivariate tables have three limitations
- difficult to interpret if a control variable has numerous categories
- control variables can be at any level of measurement, but interval or ratio control variables must be grouped and how cases are grouped can affect the interpretation of effects
- total number of cases is a limiting factor because the cases are divided among cells in partials
Linear Regression
interval/ ratio level data
controls for many alternative explanations and variables simultaneously
widely used in social science
Tells reader 2 things:
1. the results have a measure called R^2, which tells how well a set of variables explains a dependent variable - a high R^2 means the independent variables account for a large percentage of variation in the dependent variable
2. the regression results measure the direction and size of the effect of each variable on a dependent variable
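A simplified sketch of the two things regression reports, using ordinary least squares with a single independent variable for brevity (regression with several independent variables follows the same idea). The data are hypothetical.

```python
def simple_regression(xs, ys):
    """Return (slope, intercept, r_squared) for a one-predictor least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # direction and size of the effect
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation / total variation)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared

xs = [1, 2, 3, 4, 5]  # hypothetical independent variable
ys = [2, 4, 5, 4, 5]  # hypothetical dependent variable
slope, intercept, r_squared = simple_regression(xs, ys)
```

Here the slope is positive (0.6 units of y per unit of x), and the R^2 of 0.6 means x accounts for 60% of the variation in y.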
Univariate
frequency distribution
measure of central tendency
standard deviation
z-score
purpose is to describe one variable
bivariate
correlation
percentage table
chi-square
Purpose - describe a relationship or the association between two variables
multivariate
see how several independent variables have an effect on a dependent variable
statistical significance
results not likely due to chance
a way to discuss the likelihood that a statistical relationship found in a sample is due to random factors rather than to the existence of an actual relationship in the entire population
Type 1 error
when a researcher says a relationship exists but in reality none exists - falsely rejecting the null hypothesis
type 2 error
when a researcher says a relationship does not exist but in reality it does - falsely accepting the null hypothesis