Chapter 5 Flashcards
Sample of relevant content rather than census
How the sample is selected determines which statistics can be used (inferential or descriptive).
Social science theory
Describe people’s behaviour and mental processes
Sample: a subset of units from the population, selected to represent the population.
Probability samples (units selected randomly) allow valid inferences about the population.
Inferences from probability samples are subject to sampling error; statistical procedures help estimate that sampling error.
If non-probability: sampling error cannot be calculated.
Universe: all units being considered
Population: all the sampling units to which the study will infer.
Sampling frame: the actual list of units from which the sample is drawn.
Population specified but no sampling frame available: multistage sampling.
Sampling time periods
Cross-sectional studies are the most popular: people are sampled at one point in time to measure behaviours, attitudes, etc. Content, however, appears over time.
For studies across time periods, longitudinal designs are possible.
Concerns about the timing of content posted online and mobile content: the lack of a predictable publication cycle for web content, and the ability to post at any time, make sampling across time more important (and more difficult).
Digital distribution creates time-sampling problems.
Interpersonal communication through writing and phone calls: content changes with no routine schedule.
The impact of time on internet and mobile samples is a major problem when content does not have a timestamp.
Archived content: can be searched and a sampling frame created.
If content is not archived, it must be collected as it is posted. This problem can be addressed by using software to scrape internet content at randomly selected, predetermined times, so that researchers generate their own archive.
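The "randomly selected, predetermined times" idea can be sketched by drawing the scrape schedule before collection starts. This is a minimal sketch under assumptions: the function `random_scrape_times`, the four-week window, and the count of 20 scrapes are hypothetical, and the scraper itself (fetching and storing pages) is not shown.

```python
import random
from datetime import datetime, timedelta

def random_scrape_times(start, end, k, seed=None):
    """Draw k distinct scrape times at random (minute resolution) in [start, end), sorted."""
    rng = random.Random(seed)
    total_minutes = int((end - start).total_seconds() // 60)
    # Sample minute offsets without replacement, so no time is picked twice
    offsets = rng.sample(range(total_minutes), k)
    return sorted(start + timedelta(minutes=m) for m in offsets)

# Hypothetical four-week collection window with 20 scrapes
schedule = random_scrape_times(datetime(2024, 3, 1), datetime(2024, 3, 29), k=20, seed=42)
for t in schedule:
    print(t.isoformat(timespec="minutes"))
```

Fixing the schedule in advance keeps the times random yet predetermined, so the resulting self-made archive is a probability sample of time.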
Make sure that any inference about content producers, time, or both (the content or time dimension) is based on a probability sample.
Sampling techniques
To infer to a population, the sample must be a probability sample; inferences from a non-probability sample are meaningless (no validity).
The problem: drawing a sample that allows valid conclusions without taking too much time.
Census
Every unit in the population is included in the content analysis (CA), e.g., all content about an event or series of events.
Census or sample? Decide how best to use coders' time to meet the research goals.
A census depends on resources and goals: the more content units included, the less bias, but the more resources required.
Non-probability sampling
Used often, sometimes because no other sampling frame is available.
Two types of non-probability sampling: convenience and purposive (purposive is used more often).
Convenience samples
Using content because it is available: effectively a census in which the population is defined by availability rather than by the research question (RQ). This population is a biased representation of the universe of units.
Problems: websites may not be equivalent, and content can be difficult to access.
Convenience samples allow no inference to a population but can be justified under three conditions:
- The material being studied is hard to find.
- Resources (time and money) limit the ability to generate a random sample of the population.
- The researcher is exploring an under-researched but important area about which little is known, and the area's scholarly importance justifies the study.
Consistent results from a large number of convenience samples can contribute to theory.
Purposive sampling
Units are selected for logical or deductive reasons dictated by the nature of the research project.
E.g., studies of particular publications or time periods.
Purposive samples require specific research justifications other than lack of money or availability.
Consecutive-unit sampling: taking a series of content produced during a certain time period, e.g., a two-week period in a consecutive-day sample. Important when studying continuing news (e.g., elections).
Probability sampling
Core: every unit has an equal chance of being included.
Extension of this logic: imagine taking many samples from the same population at one time. The best guess for the value of each sample mean would be the population mean, but the sample means would vary around the population mean.
With an infinite number of samples, the average of all the sample means would equal the population mean. Plotting all the means on a graph would produce a distribution of sample means: the sampling distribution.
The shape of any such sampling distribution is described by the central limit theorem: for sufficiently large samples, the distribution of sample means is approximately normal and centred on the population mean.
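A small simulation can make the sampling-distribution logic concrete. The sketch below assumes a hypothetical, deliberately skewed population (exponential story lengths, mean about 500 words) and approximates "an infinite number of samples" with 2,000 samples of 100; the helper `sample_means` is illustrative, not from the chapter.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical skewed population of 10,000 story lengths (exponential, mean about 500 words)
population = [rng.expovariate(1 / 500) for _ in range(10_000)]
pop_mean = statistics.fmean(population)

def sample_means(pop, sample_size, n_samples, rng):
    """Draw n_samples simple random samples (without replacement) and return each sample's mean."""
    return [statistics.fmean(rng.sample(pop, sample_size)) for _ in range(n_samples)]

# 2,000 samples of 100 stand in for "an infinite number of samples"
means = sample_means(population, sample_size=100, n_samples=2_000, rng=rng)

print(f"population mean:          {pop_mean:.1f}")
print(f"mean of the sample means: {statistics.fmean(means):.1f}")
print(f"SD of the sample means:   {statistics.stdev(means):.1f}")
```

Although the population is skewed, a histogram of `means` would look roughly bell-shaped, as the central limit theorem predicts; the SD of the sample means is the standard error.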
This allows the researcher to estimate the amount of sampling error in a probability sample: one can calculate the probability that a particular random sample's mean is close to the true population mean. The probability can be calculated because the mean of an infinite number of samples equals the population mean.
Sampling error combined with the sample mean allows a researcher to estimate the population mean at a given level of confidence.
Best guess: the sample mean or proportion, plus an estimate of the range of error in that guess.
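The "best guess plus range of error" idea can be sketched as a confidence interval around a sample mean. This assumes the large-sample normal approximation (z = 1.96 for roughly 95% confidence); with a sample as small as the hypothetical 10 stories below, a t critical value would be more accurate. The function name and data are illustrative.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Sample mean plus/minus z standard errors (normal approximation)."""
    n = len(values)
    m = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # stdev uses n - 1 in its denominator
    return m - z * se, m, m + z * se

# Hypothetical data: number of sources cited in 10 sampled stories
sources = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]
low, mean, high = mean_confidence_interval(sources)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Here the best guess is a mean of 4.00 sources, with an estimated range of error of roughly 3.08 to 4.92.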
Key to understanding inference from a probability sample to a population is sampling error: an indication of the accuracy of the sample.
Standard error formulas
Standard error formulas adjust the sample's standard deviation (SD) for sample size, because sample size is one of three factors that affect how good an estimate a sample mean or proportion will be; sample size is the most important.
The larger the sample, the better the estimate of the population: with more cases, the large and small values have a smaller impact on the mean.
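The sample-size adjustment can be sketched with the common textbook forms SE = s / sqrt(n) for a mean and SE = sqrt(p(1 - p) / n) for a proportion. These assume simple random sampling and ignore the finite population correction; the chapter may write them slightly differently, and the SD and proportion values below are hypothetical.

```python
import math

def se_mean(sd, n):
    """Standard error of a mean: the sample SD adjusted for sample size."""
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion p estimated from n cases."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical SD of 12 and proportion of 0.3 at three sample sizes
for n in (100, 400, 1600):
    print(n, round(se_mean(sd=12.0, n=n), 3), round(se_proportion(p=0.3, n=n), 4))
```

Each quadrupling of n halves both standard errors, because n enters under a square root.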
A second factor affecting the accuracy of a sample estimate is the variability of case values, i.e., the homogeneity of the population.
If case values vary widely, the sample will have more error in estimating the population mean or proportion.
Variability results from the presence of large and small values among cases. The larger the sample, the smaller the effect of that variability on the estimate.
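To see how variability alone changes the error, the sketch below compares two hypothetical samples of the same size and the same mean, one homogeneous and one widely spread; the data and the helper `standard_error` are illustrative.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Two hypothetical samples of story lengths (words), both with mean 500
homogeneous = [480, 490, 500, 510, 520, 495, 505, 500]
heterogeneous = [100, 900, 300, 700, 200, 800, 450, 550]

for label, values in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    print(f"{label}: mean={statistics.fmean(values):.0f}, SE={standard_error(values):.1f}")
```

With n held constant, the widely varying sample's standard error is more than twenty times larger, so it estimates the population mean far less precisely.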
The third factor affecting the accuracy of a sample estimate of the population is the proportion of the population in the sample. With a high proportion in the sample, error declines (the sample distribution better approximates the population distribution).
The sample must equal or exceed 20% of the population's cases before this factor matters in estimating sampling error.
Sampling a high proportion of a large population is not necessary to generate a representative sample.
When the sample exceeds 20% of the population, adjust the sampling error using the finite population correction (FPC).
To adjust the standard error for the sample: multiply the standard error formula by the FPC formula.
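A minimal sketch of the finite population correction, assuming the common form FPC = sqrt((N - n) / (N - 1)), where N is the population size and n the sample size. The helper names are hypothetical; the 375-film population reuses the simple random sampling example, and the proportion of 0.4 is made up.

```python
import math

def fpc(N, n):
    """Finite population correction for a sample of n units drawn from N units."""
    return math.sqrt((N - n) / (N - 1))

def corrected_se_proportion(p, n, N):
    """Standard error of a proportion, multiplied by the FPC."""
    return math.sqrt(p * (1 - p) / n) * fpc(N, n)

# Hypothetical: 100 of 375 films coded (about 27% of the population)
N, n, p = 375, 100, 0.4
print(f"FPC = {fpc(N, n):.3f}")
print(f"uncorrected SE = {math.sqrt(p * (1 - p) / n):.4f}")
print(f"corrected SE   = {corrected_se_proportion(p, n, N):.4f}")
```

Coding 100 of 375 films shrinks the standard error by a factor of about 0.86, whereas for a population of a million the correction is essentially 1; that is why it only matters once the sample is a large share of the population.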
All content has a time dimension; this is a particular concern for trend studies over periods longer than a year (natural planning).
Sampling
Simple random sampling
All units have an equal chance of being selected. E.g., from a numbered list of all 375 films, randomly select 100 numbers between 1 and 375.
Simple random sampling can occur under two conditions: units are replaced in the population after they are selected, or they are not replaced (with or without replacement).
With a large population, the small change in selection probability caused by sampling without replacement has a negligible impact on sampling error estimates. Simple random sampling is not good in all situations: if the list is long, another technique is preferred.
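The film example can be sketched with Python's standard `random` module: `random.sample` draws without replacement and `random.choices` draws with replacement. The seed and the numbering of the films as 1 to 375 are assumptions for illustration.

```python
import random

rng = random.Random(2024)
films = list(range(1, 376))  # sampling frame: films numbered 1 to 375

without_replacement = rng.sample(films, 100)  # each film can appear at most once
with_replacement = rng.choices(films, k=100)  # a film can be drawn more than once

print(sorted(without_replacement)[:10])
print(len(set(with_replacement)), "distinct films in the with-replacement draw")
```

With replacement, the count of distinct films is usually a little below 100, because some films are drawn twice.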
Systematic sampling
Selecting every nth unit from the sampling frame.
The interval (n) is found by dividing the number of units in the sampling frame by the sample size.
To sample 1,000 sentences from 10,000 sentences, select every tenth sentence.
The starting point has to be random. Works well when simple random sampling creates problems.
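The sentence example can be sketched as a random start followed by every k-th unit. The function `systematic_sample` is hypothetical, and it assumes the frame size is (close to) a multiple of the sample size; otherwise units at the tail of the list may never be selected.

```python
import random

def systematic_sample(frame, sample_size, rng=random):
    """Every k-th unit after a random start, where k = len(frame) // sample_size."""
    k = len(frame) // sample_size  # sampling interval
    start = rng.randrange(k)       # random start within the first interval (0-based)
    return frame[start::k][:sample_size]

# Hypothetical frame of 10,000 numbered sentences
sentences = [f"sentence {i}" for i in range(1, 10_001)]
sample = systematic_sample(sentences, 1_000, rng=random.Random(7))
print(len(sample), sample[:3])
```

Only the start is random; after that, selection is fixed, which is what makes the method fast on long lists and also what exposes it to periodicity.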
Can have problems under two conditions:
- It requires a listing of all possible units (if the list is incomplete, inferences cannot be made).
- It is subject to periodicity, a bias in the arrangement of units in a list. This is a problem because some periods (e.g., certain months) might not be represented in the sample.
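Periodicity is easy to reproduce with daily media, which follow a weekly cycle. In this hypothetical frame of one issue per day for 2023, an interval of 7 lands on the same weekday every time, so the other six days of the week are never sampled. The frame and interval are illustrative assumptions.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical frame: one issue per day for 2023 (1 January 2023 was a Sunday)
issues = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]

# Systematic sample with an interval of 7, starting at the first issue
every_seventh = issues[0::7]
weekday_counts = Counter(d.weekday() for d in every_seventh)  # Monday = 0 ... Sunday = 6
print(len(every_seventh), weekday_counts)
```

All 53 selected issues are Sunday editions; an interval that shares a factor with a cycle in the list produces this kind of bias.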
Stratified sampling
Breaking a population into smaller groups (strata) and randomly sampling within each group. The groups are more homogeneous than the population with respect to characteristics of importance.
E.g., content can be stratified by year, which makes smaller, homogeneous groups that help ensure a more representative sample.
Two purposes. First, stratifying increases representativeness (knowledge about the distribution of units is used to avoid oversampling and undersampling).
Proportionate sampling: sample sizes within strata are based on each stratum's proportion of the population.
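Proportionate stratified sampling can be sketched as grouping units by stratum and drawing a simple random sample from each, sized by the stratum's share of the population. The function name and the story counts per year are hypothetical.

```python
import random
from collections import Counter, defaultdict

def proportionate_stratified_sample(units, stratum_of, sample_size, rng=random):
    """Simple random sample within each stratum, sized in proportion to the stratum's share."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Rounding can make the allocations sum to slightly more or less than sample_size
        n_h = round(sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical frame: 600 stories from 2021, 300 from 2022, 100 from 2023, stratified by year
stories = [(year, i) for year, count in ((2021, 600), (2022, 300), (2023, 100)) for i in range(count)]
sample = proportionate_stratified_sample(stories, lambda s: s[0], 100, rng=random.Random(3))
print(Counter(year for year, _ in sample))
```

Each year contributes exactly its population share (60, 30 and 10 stories), something a simple random sample of 100 only achieves on average.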
Second, stratifying can increase the number of units in a study when some types of units make up a small proportion of the population.
Disproportionate sampling: selecting a sample from a stratum that is larger than that stratum's proportion of the population.
It oversamples some units to obtain enough cases for valid analysis, so the sample is no longer representative of the population (unless results are weighted).
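Disproportionate allocation, and one common way to restore population proportions afterward, can be sketched as follows. The strata ("national" and "local" outlets), their sizes, and the equal allocation of 40 per stratum are hypothetical; the weight used here is the population share divided by the sample share.

```python
import random

rng = random.Random(11)

# Hypothetical population: 950 stories from national outlets, 50 from local outlets
strata = {
    "national": [("national", i) for i in range(950)],
    "local": [("local", i) for i in range(50)],
}
# Disproportionate allocation: 40 per stratum, so "local" (5% of the population) is oversampled
allocation = {"national": 40, "local": 40}
sample = {name: rng.sample(members, allocation[name]) for name, members in strata.items()}

# Design weight = population share / sample share; applying it restores population proportions
N = sum(len(members) for members in strata.values())
n = sum(allocation.values())
weights = {name: (len(strata[name]) / N) / (allocation[name] / n) for name in strata}
print(weights)
```

Local stories make up half the sample but only 5% of the population, so each counts 0.1 in weighted population estimates while each national story counts 1.9.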
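Mass media produce content on a regular basis; stratified sampling takes advantage of known variations within these production cycles.
Stratified sampling requires adjustments to sampling error estimates.
Sampling from homogeneous groups reduces the standard error.
Standard error of a proportion: the sampling variances of the strata, each weighted by the stratum's squared share of the population, are summed and the square root is taken; it is not a simple sum of the strata's standard errors.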
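The stratified adjustment can be sketched with the standard formula for a stratified random sample: SE = sqrt(sum over strata of (N_h / N)^2 * p_h(1 - p_h) / n_h), ignoring the FPC. The chapter may write it differently, and the weekday/weekend strata and their proportions below are hypothetical.

```python
import math

def stratified_estimate(strata):
    """Return (proportion, standard error) for a stratified random sample.

    strata: list of (N_h, n_h, p_h) = stratum population size, stratum sample size,
    and sample proportion in that stratum. The FPC is ignored here.
    """
    N = sum(N_h for N_h, _, _ in strata)
    p = sum((N_h / N) * p_h for N_h, _, p_h in strata)
    # Weighted stratum variances are summed; the square root is taken at the end
    variance = sum((N_h / N) ** 2 * p_h * (1 - p_h) / n_h for N_h, n_h, p_h in strata)
    return p, math.sqrt(variance)

# Hypothetical strata: 5,000 weekday and 2,000 weekend stories, sampled proportionately
strata = [(5000, 250, 0.20), (2000, 100, 0.60)]
p, se = stratified_estimate(strata)
srs_se = math.sqrt(p * (1 - p) / 350)  # simple random sample of the same size, for comparison
print(f"p = {p:.3f}, stratified SE = {se:.4f}, simple random SE = {srs_se:.4f}")
```

The stratified SE (about 0.023) comes out below the simple random SE (about 0.025) because each stratum is more homogeneous than the population as a whole.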