SOC200 - Quantitative Analysis (Chapters 14 + 15) Flashcards
QUANTITATIVE ANALYSIS
approach to analyzing social science data in which:
observations represented + manipulated numerically
to describe + explain phenomena represented by those observations
QUANTITATIVE ANALYSIS
increase of cheap processing power in recent decades has increased possibilities of quantitative analysis
QUANTITATIVE ANALYSIS
convenience has increased demand among researchers + governments for quantitatively analyzed data
Computers a must: better tools + ease of execution make them essential
Coding in Quantitative Analysis
For computers to recognize the data you want to analyze, all elements comprising your data must be assigned a distinct number
level of data will affect type of coding + type of analysis you can use
Software needs to be able to sort data into categories
Coding in Quantitative Analysis
Nominal, Ordinal, Interval/Ratio
Coding Nominal Level Data
can code categories of nominal data with any numbers you want, BUT you have to use analyses that will NOT treat values as count data
code each category with numbers
Missing values – fill the empty space in the data with a number: assign nonresponse a unique code
Restricted to certain types of analysis based on data level
Can’t use mean for nominal data
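A minimal Python sketch of nominal coding, assuming a hypothetical marital-status variable; the category names, the codes 1 to 4, and the missing-value code 9 are illustrative choices, not from the chapter. Because the codes are arbitrary labels, the sketch reports the mode instead of a mean.

```python
from collections import Counter

# Hypothetical coding scheme: the numbers are arbitrary labels, not quantities
MARITAL_CODES = {"single": 1, "married": 2, "divorced": 3, "widowed": 4}
MISSING = 9  # unique code reserved for nonresponse

def code_response(answer):
    """Return the numeric code for an answer, or the missing-value code."""
    return MARITAL_CODES.get(answer, MISSING)

raw = ["married", "single", None, "married", "widowed"]  # hypothetical answers
coded = [code_response(a) for a in raw]

# Drop missing cases, then report the most common category (the mode);
# averaging these codes would be meaningless for nominal data.
valid = [c for c in coded if c != MISSING]
mode_code, mode_count = Counter(valid).most_common(1)[0]
print(coded, "mode code:", mode_code)
```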
Coding Ordinal Level Data
can code the categories of ordinal data with any numbers you want BUT for convenience, categories are usually coded with consecutive numbers
makes coding more intuitive + simple
Coding Ordinal Level Data
Though ordinal data have rank, you have to use analyses that will NOT treat the values as count data because ranked categories may not be equal distances apart
Same coding options as nominal
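A small sketch of ordinal coding with consecutive numbers, assuming a hypothetical five-point agreement item and the same missing code 9 as above. It uses `statistics.median_low`, which always returns an actual category code instead of averaging two codes, since averaging would assume equal distances between ranks.

```python
import statistics

# Hypothetical agreement item coded with consecutive numbers that preserve rank
AGREEMENT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
MISSING = 9  # same missing-value convention as for nominal data

responses = ["agree", "neutral", "strongly agree", "disagree", "agree", "no answer"]
coded = [AGREEMENT_CODES.get(r, MISSING) for r in responses]
valid = [c for c in coded if c != MISSING]

# Rank-based summary: the median category depends only on ordering
median_code = statistics.median_low(valid)
labels = {v: k for k, v in AGREEMENT_CODES.items()}
median_label = labels[median_code]
print(median_code, median_label)
```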
Coding Interval/Ratio Data
Each value is itself a continuous number
data has the most potential for analysis
possesses an inherent number that you can use for coding
Coding Interval/Ratio Data
Each number represents an actual value
Can later collapse it into ordinal/nominal form for less sophisticated analysis
Still need to specify a missing value: e.g. a negative number for a variable like age, since it couldn't possibly be part of the data
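A minimal sketch of interval/ratio coding, assuming hypothetical ages in years with -1 as the missing-value code; the age-group cut points are illustrative, not from the chapter. It collapses exact ages into an ordinal form.

```python
# Hypothetical ages in years; -1 marks a missing value, since no real age is negative
MISSING = -1
ages = [23, 41, MISSING, 67, 35, 19]

def collapse_age(age):
    """Collapse an exact age into an ordinal age group (hypothetical cut points)."""
    if age == MISSING:
        return None  # keep missing cases missing
    if age < 30:
        return "under 30"
    if age < 60:
        return "30-59"
    return "60+"

age_groups = [collapse_age(a) for a in ages]
print(age_groups)
```

The collapse only works in one direction: once only the groups are stored, the exact ages cannot be recovered.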
Ultimate Goal of Coding in QA
To reduce broad array of info to more limited + manageable set of attributes that will make up variable
Important Guideline: coding to maintain great deal of detail helps keep your options open in a later analysis
Main Approaches to Coding in Quantitative Analysis: Approach 1
well-developed coding scheme derived from research purpose
using an existing coding scheme developed by someone else can save you time + effort
Main Approaches to Coding in Quantitative Analysis: Approach 2
Generating codes directly from observing data
inductive approach
The Codebook – The Ultimate Reference to your Data
researcher's reference for how to code data they are collecting (when the researcher is actually collecting data)
The Codebook – The Ultimate Reference to your Data
researcher’s reference for locating variables + interpreting codes in data during analysis (when the researcher is analyzing secondary data)
Reference for what the data means
Common Codebook Contents
Variable identified with abbreviated name
Should contain full definition of variable
Should explain attributes comprising each variable
Should indicate numeric label assigned to each attribute for data manipulation purposes
Common Codebook Contents
Name: abbreviated variable name
Label attached to each value
Level: what type of data we have (nominal, ordinal, interval/ratio)
N: Total # of cases + frequencies of each category
Percentages of frequencies
Properties: location in spreadsheet + type of data
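One way to picture a codebook entry and the summary it reports, as a minimal Python sketch; the variable name `EMPSTAT`, its label, its codes, and the data are all hypothetical.

```python
from collections import Counter

# Hypothetical codebook entry mirroring the contents listed above
codebook = {
    "EMPSTAT": {
        "label": "Respondent's current employment status",
        "level": "nominal",
        "values": {1: "full time", 2: "part time", 3: "not employed", 9: "missing"},
    }
}

data = {"EMPSTAT": [1, 2, 1, 3, 9, 1, 2, 1]}  # hypothetical coded responses

def describe(var):
    """Print N, then the frequency and percent for each coded value of a variable."""
    entry = codebook[var]
    values = data[var]
    counts = Counter(values)
    n = len(values)
    print(f"{var}: {entry['label']} ({entry['level']}), N = {n}")
    for code, label in entry["values"].items():
        freq = counts.get(code, 0)
        print(f"  {code} {label:<13} {freq:>3} {100 * freq / n:5.1f}%")
    return counts, n

counts, n = describe("EMPSTAT")
```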
Data “Cleaning”
Detecting, correcting/eliminating coding errors in data
Missing values that shouldn’t be there
1. Values have to be the ones specified in codebook
2. Understand your data: checking for values logically impossible given data
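A minimal sketch of the first cleaning check (values must be the ones specified in the codebook), assuming hypothetical variables `SEX` and `STUDENT` and hypothetical records.

```python
# Hypothetical codebook: the set of legal codes for each variable
VALID_CODES = {
    "SEX": {1, 2, 9},      # 1 = male, 2 = female, 9 = missing
    "STUDENT": {1, 2, 9},  # 1 = student, 2 = not a student, 9 = missing
}

cases = [  # hypothetical coded records
    {"id": 1, "SEX": 2, "STUDENT": 1},
    {"id": 2, "SEX": 3, "STUDENT": 2},  # 3 is not a code in the codebook
    {"id": 3, "SEX": 1, "STUDENT": 7},  # 7 is not a code either
]

def find_invalid_codes(records, valid_codes):
    """Return (case id, variable, value) for every value not in the codebook."""
    errors = []
    for rec in records:
        for var, allowed in valid_codes.items():
            if rec[var] not in allowed:
                errors.append((rec["id"], var, rec[var]))
    return errors

errors = find_invalid_codes(cases, VALID_CODES)
print(errors)
```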
Contingency Cleaning
checking that cases with logical limits on certain responses have data that fall within those limits
Contingency Cleaning
Some programs check for errors during/after data entry
Run frequency distribution to check for outlying/odd frequencies of responses
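A minimal sketch of contingency cleaning plus a frequency check, assuming hypothetical variables `HASKIDS` (1 = yes, 2 = no) and `NUMKIDS`; a respondent who reports no children should have 0 children.

```python
from collections import Counter

# Hypothetical records
cases = [
    {"id": 1, "HASKIDS": 1, "NUMKIDS": 2},
    {"id": 2, "HASKIDS": 2, "NUMKIDS": 0},
    {"id": 3, "HASKIDS": 2, "NUMKIDS": 3},   # contradicts HASKIDS = no
    {"id": 4, "HASKIDS": 1, "NUMKIDS": 40},  # implausible outlier
]

# Contingency check: cases answering "no" must fall within the logical limit of 0
contradictions = [c["id"] for c in cases if c["HASKIDS"] == 2 and c["NUMKIDS"] != 0]

# Frequency distribution: odd or outlying values stand out when listed in order
freq = sorted(Counter(c["NUMKIDS"] for c in cases).items())
print(contradictions, freq)
```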
Univariate Analysis
univariate data: single variable
univariate analysis: report distribution of cases of single variable
Three ways of presenting univariate data:
Distributions – charts/tables showing frequencies of the categories of a single variable
Central Tendency – "typical" value in your variable
Dispersion – how closely the data are clustered around their "typical" value
Distributions
Frequency Distributions: show # of cases that have each attribute of variable
Valid Percent: percentages calculated with missing values left out of the total
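A minimal sketch contrasting percent and valid percent, assuming a hypothetical yes/no item coded 1 = yes, 2 = no, 9 = missing. Valid percent simply removes the missing cases from the denominator.

```python
from collections import Counter

MISSING = 9
answers = [1, 1, 2, 9, 1, 2, 9, 1, 2, 1]  # hypothetical coded answers

counts = Counter(answers)
total = len(answers)                   # all cases, including missing
valid_total = total - counts[MISSING]  # cases with a real answer

# Percent uses every case; valid percent leaves missing cases out
percent = {code: 100 * counts[code] / total for code in counts}
valid_percent = {code: 100 * counts[code] / valid_total
                 for code in counts if code != MISSING}
print(percent, valid_percent)
```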
Frequency Distribution as a Bar Chart
frequencies by themselves are meaningless
need some basis/context for assessing frequencies (percentage of total cases)
Easier to look at
Central Tendency
Which ones you can logically use depends on whether your univariate data is nominal, ordinal, or interval/ratio level + goal of your analysis
Mode – best suited to nominal + ordinal data with few categories
Mean Value
summing values of your observations + dividing by total # of observations
Ideal for continuous (interval/ratio level) data (age, temperature, dollars, speed, height, weight)
The Mean Value
Problem: can become an inaccurate measure of the typical value of a variable if some cases have extreme values
Mode
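A short sketch of the mean (sum of values divided by number of values) and how one extreme value distorts it, assuming hypothetical hourly wages; the median is shown for comparison.

```python
import statistics

wages = [15.0, 17.0, 18.0, 20.0, 22.0]  # hypothetical hourly wages, in dollars
with_outlier = wages + [173.0]          # add one extreme value

mean_before = sum(wages) / len(wages)   # sum of observations / number of observations
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
print(mean_before, round(mean_after, 2), median_after)
```

A single extreme case pulls the mean from 18.4 to about 44, while the median stays at 19.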
expressing “typical” value of your single variable
most frequently observed value
Can be used with any of the four levels of data
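A minimal sketch of finding the mode with Python's `statistics` module (Python 3.8+ for `multimode`), assuming hypothetical coded values; `multimode` returns every tied value when there is more than one mode.

```python
import statistics

religion = [1, 3, 3, 2, 3, 1]  # hypothetical nominal codes
ratings = [2, 4, 4, 5, 2]      # hypothetical ordinal ratings with a tie

most_common_religion = statistics.mode(religion)  # most frequently observed value
tied_ratings = statistics.multimode(ratings)      # all values tied for most frequent
print(most_common_religion, tied_ratings)
```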
Median Value
value such that 50% of the cases in a ranked distribution are above it + 50% are below it
Median Value
- Rank all cases by the value of the variable
- find the case with 50% of cases above it + 50% below it
Easy to find value when list of cases equals odd number because it will be the value of the middle case
Median Value
even number of cases - take the middle pair of numbers + find the value halfway betw them by adding them up + dividing by two
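The median steps above, written out as a small Python function; the input lists are hypothetical. In practice `statistics.median` does the same thing.

```python
def median(values):
    """Median by the steps above: rank the cases, then take the middle value."""
    ranked = sorted(values)  # rank all cases by value
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:           # odd number of cases: the middle case itself
        return ranked[mid]
    # even number of cases: halfway between the middle pair
    return (ranked[mid - 1] + ranked[mid]) / 2

odd_case = median([7, 3, 9, 1, 5])       # ranked: 1 3 5 7 9
even_case = median([7, 3, 9, 1, 5, 11])  # ranked: 1 3 5 7 9 11
print(odd_case, even_case)
```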
Important Note on Using Measures of Central Tendency
important to be familiar with the distribution of the data to help you decide on the most meaningful measure of central tendency. Remember, extreme values can affect the mean
Measures of Dispersion
how closely values of variable are clustered around “typical value” in the variable (a mean, median, or mode)
Scattered/clustered
How tightly the data are distributed
The Range
distance separating min + max
The hourly wages in Canada excluding Toronto ranged from a low of $2.00 to a high of $173.08 in 2009
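The range as a one-line Python function, applied only to the two endpoints from the wage example above; the function name `value_range` is illustrative.

```python
def value_range(values):
    """Range: the distance separating the minimum and maximum values."""
    return max(values) - min(values)

# Low and high hourly wages from the 2009 example, in dollars
wage_range = round(value_range([2.00, 173.08]), 2)
print(wage_range)
```

Because the range depends only on the two most extreme cases, a single outlier can inflate it.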
Standard Deviation
standard error: standard deviation of the sampling distribution; how closely sample statistics (e.g. sample means) are clustered around the pop mean
how closely the values in the sample are clustered around sample mean
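A minimal sketch of both measures, assuming a hypothetical sample of ages. The sample standard deviation uses the n - 1 denominator (`statistics.stdev`), and the standard error of the mean is estimated with the usual formula SD / √n.

```python
import math
import statistics

sample = [22, 25, 31, 28, 40, 34, 27, 33]  # hypothetical ages

sample_mean = statistics.mean(sample)
sd = statistics.stdev(sample)     # spread of values around the sample mean
se = sd / math.sqrt(len(sample))  # estimated spread of sample means around the pop mean
print(sample_mean, round(sd, 2), round(se, 2))
```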
STANDARD DEVIATION AND STANDARD DEVIATION INCREMENTS
Probability theory: a certain proportion of data in the sample will fall within a certain distance from its mean value (for a normal distribution, about 68% within ±1 SD, 95% within ±2 SD, 99.7% within ±3 SD)
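The standard deviation increments can be checked with `statistics.NormalDist` (Python 3.8+), assuming the data follow a normal distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

# Share of cases within ±k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within ±{k} SD: {p:.1%}")
```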
Subgroup Comparisons and Bivariate Analysis
Subgroup comparison of data involves description of 2/more groups simultaneously for comparison purposes
Bivariate: look at relationship between 1 variable + another
Subgroup Comparisons and Bivariate Analysis
subgroup comparison is more descriptive. Bivariate analysis seeks to show empirical relationships.
tables comparing bivariate/multivariate data - contingency tables (pattern in 1 variable is thought to be contingent on other)
Table Preparation and Interpretation in a Bivariate Analysis
general agreement that the independent variable appears in columns along the top row of the table, while the dependent variable appears in rows making up the first column
Depends on what you are comparing
Table Preparation and Interpretation in a Bivariate Analysis
No standard agreement on displaying percentages in a bivariate table, so use the following general guideline
Tables percentaged down (each column = 100%) should be read across
Tables percentaged across (each row = 100%) should be read down
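A minimal sketch of a bivariate table percentaged down, assuming hypothetical cases with sex as the independent variable and employment status as the dependent variable. Each column sums to 100%, so the table is read across the rows.

```python
from collections import Counter

# Hypothetical cases: (sex, employment status)
cases = [
    ("female", "part time"), ("female", "part time"), ("female", "full time"),
    ("female", "part time"), ("male", "full time"), ("male", "full time"),
    ("male", "part time"), ("male", "full time"),
]

cells = Counter(cases)
col_totals = Counter(sex for sex, _ in cases)

# Percentage down: each column (a category of the IV) sums to 100%
pct = {
    (sex, status): 100 * cells[(sex, status)] / col_totals[sex]
    for sex in col_totals
    for status in ("full time", "part time")
}

# IV categories across the top, DV categories down the first column
print(f"{'':<10}{'female':>8}{'male':>8}")
for status in ("full time", "part time"):
    print(f"{status:<10}{pct[('female', status)]:>7.0f}%{pct[('male', status)]:>7.0f}%")
```

Reading across the "part time" row compares 75% of females with 25% of males.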
Logic of Multivariate Analysis
seeing causal/explanatory relationship
relationship betw independent + dependent variable is examined with regard to more than 1 IV
When you add a 3rd variable, does it change the relationship betw DV + IV?
Constructing and Reading Multivariate Tables
What else could determine whether a person is employed part or full time?
Perhaps one's student status also affects this.
Females are still more prevalent among part-time workers, regardless of student status
Student status has a big impact on part-time status
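A minimal sketch of a multivariate table that adds student status as a third variable, assuming hypothetical cases chosen only to illustrate the kind of pattern described; these are not the textbook's data. Part-time percentages are computed within each sex-by-student-status group.

```python
from collections import Counter

# Hypothetical cases: (sex, student status, employment status)
cases = [
    ("female", "student", "part time"), ("female", "student", "part time"),
    ("male", "student", "part time"), ("male", "student", "full time"),
    ("female", "non-student", "part time"), ("female", "non-student", "full time"),
    ("male", "non-student", "full time"), ("male", "non-student", "full time"),
]

cells = Counter(cases)
groups = Counter((sex, student) for sex, student, _ in cases)

def part_time_pct(sex, student):
    """Percent part time within one sex-by-student-status group."""
    return 100 * cells[(sex, student, "part time")] / groups[(sex, student)]

# Compare females and males separately within each student-status group
for student in ("student", "non-student"):
    f = part_time_pct("female", student)
    m = part_time_pct("male", student)
    print(f"{student:<12} female {f:.0f}% part time, male {m:.0f}% part time")
```

If the sex difference holds within both groups, adding the third variable has not removed the original relationship.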