SOC200 - Quantitative Analysis (Chapters 14 + 15) Flashcards by Darlin Veloso

QUANTITATIVE ANALYSIS

Approach to analyzing social science data in which:
observations are represented + manipulated numerically
to describe + explain the phenomena represented by those observations

QUANTITATIVE ANALYSIS

Growth of cheap processing power in recent decades has expanded the possibilities of quantitative analysis

QUANTITATIVE ANALYSIS

Convenience has increased demand among researchers + governments for quantitatively analyzed data
Computers are a must: better tools + faster execution make them essential

Coding in Quantitative Analysis

For computers to recognize the data you want to analyze, all elements comprising your data must be assigned a distinct number
Level of data affects the type of coding + the type of analysis you can use
Software needs to be able to sort the data into categories

Coding in Quantitative Analysis

Nominal, Ordinal, Interval/Ratio

Coding Nominal Level Data

can code categories of nominal data with any numbers you want, BUT you have to use analyses that will NOT treat values as count data
code each category with numbers
Missing values – mark gaps in the data with a number by assigning nonresponse a unique code
Restricted to certain types of analysis based on data level
Can’t use mean for nominal data
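A minimal Python sketch of nominal coding. The marital-status variable, its code numbers, and the missing-value code 9 are all hypothetical, chosen only for illustration. Because the codes are arbitrary labels, the sketch counts categories and reports the mode instead of a mean.

```python
from collections import Counter

# Hypothetical codes: the numbers are arbitrary labels, not quantities.
MARITAL_CODES = {"single": 1, "married": 2, "divorced": 3}
MISSING = 9  # assumed unique code for nonresponse

responses = ["married", "single", None, "married", "divorced"]

# Assign each response its numeric code; nonresponse gets the missing code.
coded = [MARITAL_CODES[r] if r is not None else MISSING for r in responses]

# Valid analysis for nominal data: count categories and report the mode.
valid = [c for c in coded if c != MISSING]
counts = Counter(valid)
mode_code = counts.most_common(1)[0][0]
```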

Coding Ordinal Level Data

can code the categories of ordinal data with any numbers you want BUT for convenience, categories are usually coded with consecutive numbers
Consecutive codes make the data more intuitive + simple to work with

Coding Ordinal Level Data

Though ordinal data have rank, you have to use analyses that will NOT treat the values as count data because the distances between ranked categories may not be equal
Same coding options as nominal
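A short Python sketch of ordinal coding, assuming a hypothetical five-point agreement scale coded with consecutive numbers. It reports the median category, which relies only on rank order and so does not assume equal distances between categories.

```python
from statistics import median_low

# Hypothetical ordinal scale coded with consecutive numbers to mirror rank.
AGREEMENT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
LABELS = {code: label for label, code in AGREEMENT_CODES.items()}

answers = ["agree", "neutral", "strongly agree", "agree", "disagree"]
coded = sorted(AGREEMENT_CODES[a] for a in answers)

# median_low always returns an actual code, so it maps back to a label.
middle_code = median_low(coded)
middle_label = LABELS[middle_code]
```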

Coding Interval/Ratio Data

Each category's value is a continuous number
This data has the most potential for analysis
It possesses an inherent number that you can use for coding

Coding Interval/Ratio Data

Can later collapse it into ordinal/nominal form for less sophisticated analysis
The number represents an actual value
Still need to specify a missing-value code: e.g. a negative number when a negative value couldn’t possibly be part of the data
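A small Python sketch of collapsing interval/ratio data into ordinal form. The age values, the cut points, and the missing-value code -1 are assumptions made for illustration.

```python
MISSING = -1  # assumed missing-value code; a negative age is impossible

ages = [19, 34, MISSING, 52, 67, 25]

def age_group(age):
    """Collapse an exact age into a hypothetical ordinal code."""
    if age == MISSING:
        return MISSING
    if age < 30:
        return 1  # under 30
    if age < 60:
        return 2  # 30 to 59
    return 3      # 60 and over

grouped = [age_group(a) for a in ages]
```

Keeping the exact ages in the data and deriving the groups afterwards preserves the detail, so other cut points remain possible later.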

Ultimate Goal of Coding in QA

To reduce broad array of info to more limited + manageable set of attributes that will make up variable
Important Guideline: coding to maintain great deal of detail helps keep your options open in a later analysis

Main Approaches to Coding in Quantitative Analysis: Approach 1

well-developed coding scheme derived from research purpose

Using an existing coding scheme developed by someone else can save you time + effort

Main Approaches to Coding in Quantitative Analysis: Approach 2

Generating codes directly from observing data

inductive approach

The Codebook – The Ultimate Reference to your Data

researcher’s reference for how to code data they are collecting (when the researcher is actually collecting data)

The Codebook – The Ultimate Reference to your Data

researcher’s reference for locating variables + interpreting codes in data during analysis (when the researcher is analyzing secondary data)
Reference for what the data means

Common Codebook Contents

Variable identified with abbreviated name
Should contain full definition of variable
Should explain attributes comprising each variable
Should indicate numeric label assigned to each attribute for data manipulation purposes

Common Codebook Contents

Name: abbreviated variable name
Label: attached to each value
Level: know what type of data we have (nominal, ordinal, interval/ratio)
N: Total # of cases + frequencies of each category
Percentages of frequencies
Properties: location in spreadsheet + type of data
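One way to picture a codebook entry is as a small lookup structure. The Python sketch below is only an illustration: the variable name EMPSTAT, its definition, and its codes are hypothetical.

```python
# Hypothetical codebook entry; every name and code here is an assumption.
codebook = {
    "EMPSTAT": {
        "definition": "Respondent's employment status",
        "level": "nominal",
        "values": {1: "full time", 2: "part time", 9: "missing"},
        "missing": 9,
    }
}

def label_for(variable, code):
    """Translate a numeric code back into its attribute label."""
    return codebook[variable]["values"][code]
```

For example, `label_for("EMPSTAT", 2)` returns `"part time"`.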

Data “Cleaning”

Detecting, correcting/eliminating coding errors in data
Missing values that shouldn’t be there
1. Values have to be the ones specified in codebook
2. Understand your data: checking for values logically impossible given data
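A minimal Python sketch of the first check, assuming a hypothetical variable whose codebook allows only the codes 1, 2, and 9. Any entered value outside that set is flagged with its position.

```python
ALLOWED = {1, 2, 9}  # hypothetical codes listed in the codebook

entered = [1, 2, 2, 7, 9, 1, 22]

# Flag the position and value of every entry the codebook does not allow.
errors = [(i, v) for i, v in enumerate(entered) if v not in ALLOWED]
```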

Contingency Cleaning

Checking that cases with logical limits on certain responses have data that falls within those limits
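A short Python sketch of a contingency check. The records and the rule are hypothetical: a respondent coded as not employed should not report weekly hours worked.

```python
# Hypothetical records: employed is 1 (yes) or 2 (no); hours is weekly hours worked.
records = [
    {"id": 1, "employed": 1, "hours": 40},
    {"id": 2, "employed": 2, "hours": 0},
    {"id": 3, "employed": 2, "hours": 35},  # contradicts the employed answer
]

# Collect the cases whose answers fall outside the logical limit.
suspect_ids = [r["id"] for r in records if r["employed"] == 2 and r["hours"] > 0]
```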

Contingency Cleaning

Some programs check for errors during/after data entry

Run frequency distribution to check for outlying/odd frequencies of responses

Univariate Analysis

univariate data: single variable

univariate analysis: report distribution of cases of single variable

Three ways of presenting univariate data:

Distributions – charts/tables showing frequencies of the categories of a single variable
Central Tendency – “typical” value in your variable
Dispersion – how close data is clustered around its “typical” value

Distributions

Frequency Distributions: show # of cases that have each attribute of variable
Valid Percent: percentage calculated with missing values left out of the total
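A minimal Python sketch of a frequency distribution with percent and valid percent. The coded responses and the missing-value code 9 are hypothetical.

```python
from collections import Counter

MISSING = 9  # assumed missing-value code
coded = [1, 2, 2, 1, 9, 2, 3, 9, 2, 1]

counts = Counter(coded)                  # frequency of each code
total = len(coded)
valid_total = total - counts[MISSING]    # cases with a real answer

# Percent uses all cases; valid percent leaves the missing cases out.
percent = {code: 100 * n / total for code, n in counts.items()}
valid_percent = {
    code: 100 * n / valid_total
    for code, n in counts.items()
    if code != MISSING
}
```

Here code 2 is 40% of all 10 cases but 50% of the 8 valid cases.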

Frequency Distribution as a Bar Chart

frequencies by themselves are meaningless
need some basis/context for assessing frequencies (percentage of total cases)
Easier to look at

Central Tendency

Which measures you can logically use depends on whether your univariate data is nominal, ordinal, or interval/ratio level + the goal of your analysis
Mode – nominal + ordinal data with few categories

Mean Value

Sum the values of your observations + divide by the total # of observations
Ideal for continuous (interval/ratio level) data (age, temperature, dollars, speed, height, weight)

The Mean Value

Problem: can become an inaccurate measure of the typical value of a variable if some cases have extreme values
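A quick Python illustration of the problem, using made-up hourly wages. One extreme case pulls the mean far from the typical value, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical hourly wages; the added case is an extreme value.
wages = [15, 16, 17, 18, 19]
wages_with_outlier = wages + [175]

typical = mean(wages)                        # 17
pulled_up = mean(wages_with_outlier)         # about 43.3
still_typical = median(wages_with_outlier)   # 17.5
```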

Mode

Expresses the “typical” value of your single variable as the most frequently observed value
Can be used with any of the four levels of data

Median Value

Value at which 50% of the cases in a ranked distribution fall above + 50% fall below

Median Value

1. Rank all cases by the value of the variable
2. Find the case with 50% of cases above it + 50% below
Easy to find when the number of cases is odd because it will be the value of the middle case

Median Value

Even number of cases: take the middle pair of values + find the value halfway between them by adding them up + dividing by two
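The odd and even cases can be sketched in Python as follows. The function name `median_value` is hypothetical; the standard library's `statistics.median` does the same job.

```python
def median_value(values):
    """Rank the cases, then take the middle case or average the middle pair."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd: the middle case
    return (ranked[mid - 1] + ranked[mid]) / 2   # even: halfway between the pair
```

For example, `median_value([7, 1, 5])` gives 5, and `median_value([4, 1, 3, 2])` gives 2.5.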

Important Note on Using Measures of Central Tendency

Important to be familiar with the distribution of the data to help you decide on the most meaningful measure of central tendency
Remember: extreme values can affect the mean

Measures of Dispersion

How closely the values of a variable are clustered around the “typical value” of the variable (a mean, median, or mode)
Scattered vs. clustered: how tightly the data is distributed

The Range

Distance separating the min + max values
Example: hourly wages in Canada excluding Toronto ranged from a low of $2.00 to a high of $173.08 in 2009, a range of $171.08

Standard Deviation

Standard error (of a sampling distribution): how closely sample means are clustered around the population mean
Standard deviation: how closely the values in the sample are clustered around the sample mean

STANDARD DEVIATION AND STANDARD DEVIATION INCREMENTS

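Probability theory: a certain proportion of the data in a sample will fall within a certain distance of its mean value, measured in standard deviation increments (for normally distributed data, about 68% within 1 standard deviation + about 95% within 2)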
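A minimal Python sketch of standard deviation increments, using hypothetical scores. It treats the list as the full set of cases (population standard deviation) and counts the share of cases within one standard deviation of the mean. The data is small and not normally distributed, so the share is only illustrative.

```python
from statistics import mean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

m = mean(values)       # 5
sd = pstdev(values)    # 2.0, treating the list as the full set of cases

# Share of cases within one standard deviation of the mean (3 to 7 here).
within_one_sd = sum(1 for v in values if abs(v - m) <= sd) / len(values)
```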

Subgroup Comparisons and Bivariate Analysis

Subgroup comparison: description of 2/more groups simultaneously for comparison purposes
Bivariate: look at the relationship of 1 variable to another

Subgroup Comparisons and Bivariate Analysis

Subgroup comparison is more descriptive
Bivariate analysis seeks to show empirical relationships
Tables comparing bivariate/multivariate data are contingency tables (the pattern in 1 variable is thought to be contingent on the other)

Table Preparation and Interpretation in a Bivariate Analysis

General agreement that the independent variable appears in columns along the top row of the table, while the dependent variable appears in rows down the first column
Depends on what you are comparing

Table Preparation and Interpretation in a Bivariate Analysis

No standard agreement on displaying percentages in a bivariate table, so use the following general guideline:
Tables percentaged down (each column = 100%) should be read across
Tables percentaged across (each row = 100%) should be read down
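A short Python sketch of a table percentaged down and read across. The cases are made up, with sex as the independent (column) variable and work status as the dependent (row) variable.

```python
from collections import Counter

# Hypothetical cases: (sex, work status). Sex is the independent variable.
cases = [
    ("female", "part time"), ("female", "part time"), ("female", "full time"),
    ("male", "part time"), ("male", "full time"), ("male", "full time"),
    ("male", "full time"),
]

cell_counts = Counter(cases)
column_totals = Counter(sex for sex, _ in cases)

# Percentage down: each column (category of the independent variable) sums to 100.
column_percent = {
    (sex, status): 100 * n / column_totals[sex]
    for (sex, status), n in cell_counts.items()
}

# Read across: compare the part-time percentages between the columns.
female_part_time = column_percent[("female", "part time")]
male_part_time = column_percent[("male", "part time")]
```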

Logic of Multivariate Analysis

Seeing a causal/explanatory relationship: the relationship between the independent + dependent variable is examined with regard to more than 1 IV
When you add a 3rd variable, does it change the relationship between the DV + IV?

Constructing and Reading Multivariate Tables

What else could determine whether a person is employed part or full time? Perhaps one’s student status also affects this:
Females still more prevalent among part-time workers regardless of student status
Student status has a big impact on part-time status
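A minimal Python sketch of adding a third variable. The cases are invented so that the pattern matches the card. It computes the part-time percentage for each sex within each student status.

```python
from collections import defaultdict

# Hypothetical cases: (student status, sex, work status).
cases = [
    ("student", "female", "part time"), ("student", "female", "part time"),
    ("student", "male", "part time"), ("student", "male", "full time"),
    ("non-student", "female", "part time"), ("non-student", "female", "full time"),
    ("non-student", "male", "full time"), ("non-student", "male", "full time"),
]

# Tally [part-time cases, total cases] within each (student status, sex) subgroup.
tally = defaultdict(lambda: [0, 0])
for student, sex, status in cases:
    tally[(student, sex)][1] += 1
    if status == "part time":
        tally[(student, sex)][0] += 1

part_time_percent = {
    group: 100 * part / total for group, (part, total) in tally.items()
}
```

In this made-up data, females have the higher part-time percentage within both student groups, and students have higher part-time percentages than non-students of the same sex.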