Chapter 5 Flashcards

1
Q

Sample of relevant content rather than census

A

How the sample is selected determines which kind of statistical analysis (inferential or descriptive) is appropriate

2
Q

Social science theory

A

Describes people’s behaviour and mental processes

3
Q

Sample: a subset of units from the population, selected to represent that population

A

Probability samples (units selected randomly) allow valid inferences about the population.
Probability samples are subject to sampling error, and statistical procedures help to estimate that error.
With non-probability samples, sampling error cannot be calculated.

4
Q

Universe: all units being considered
Population: all sampling units to which the study will infer
Sampling frame: the actual list of units from which the sample is selected
Population specified but no sampling frame available: use multistage sampling

A
5
Q

Sampling Time periods

A

Cross-sectional studies are the most popular: they sample people at one point in time to study behaviours, attitudes etc. Content, by contrast, appears over time.

6
Q

For over time periods:

A

Longitudinal designs are possible.

7
Q

Concerns about the timing of content posted online and of mobile content. The lack of a predictable publication cycle for web content, and the ability to post at any time, make sampling across time more important (and more difficult).

A

Digital distribution: time sampling problems.
Interpersonal communication through writing and phone calls - changing content with no routine.

8
Q

Impact of time on internet and mobile samples is a big problem when content does not have a timestamp.

A

Archived content: it can be searched and a sampling frame created.
If content is not archived: it needs to be collected as it is posted. This problem can be addressed by using software to scrape internet content at randomly selected, predetermined times.
In effect, researchers generate their own archive using software.

9
Q

Decide whether the inference concerns content producers, time or both. Make sure that inference on either dimension (content or time) is based on probability samples.

A
10
Q

Sampling techniques

A

Sampling techniques
For statistical inference, the sample must be a probability sample. Inference from a non-probability sample is meaningless and has no validity.
Problem: drawing a sample that allows valid conclusions without taking too much time.

11
Q

Census

A

Census
Every unit in the population is included in the content analysis (CA), e.g. all content about an event or series of events.
Census or sample? The question is how best to use coders' time to meet the research goals.
Whether to conduct a census depends on resources and goals: the larger the number of content units, the less bias but the more resources required.

12
Q

Non-prob sampling

A

Used often, sometimes because an adequate sampling frame is not available.
Two types of non-probability sampling: convenience and purposive sampling (mostly purposive).

13
Q

Convenience samples

A

Using content because it is available. In effect it is a census in which the population is defined by availability rather than by the RQ. That population is a biased representation of the universe of units.
Problems: websites may not be equivalent, and content can be difficult to access.
Convenience samples allow no inference to a population, but can be justified under 3 conditions:

  1. The material being studied is hard to find.
  2. Resources (time and money) limit the ability to generate a random sample of the population.
  3. The researcher is exploring an under-researched but important area about which little is known - importance of the scholarly.

Consistent results from a large number of convenience samples: contribute to theory

14
Q

Purposive sampling

A

Units are selected for logical or deductive reasons dictated by the nature of the research project.
Example: studies of particular publications or time periods.

Purposive samples require specific research justifications other than lack of money or availability.
One form is consecutive-unit sampling: taking a series of content produced during a certain time period, e.g. a two-week period in a consecutive-day sample. This is important when studying continuing news (elections).

15
Q

Probability sampling

A

Core: every unit has an equal chance of being included.
Extension of the logic: take many samples from the same population at one time. The best guess for the value of each sample mean would be the population mean, although the sample means would vary around the population mean.

16
Q

Infinite number of samples

A

The average of all the sample means would equal the population mean. Plotting all the means on a graph would give a distribution of sample means: the sampling distribution.
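The idea can be checked with a small simulation. This is a minimal sketch, assuming a made-up population of 1,000 story lengths and a finite number of samples (2,000) standing in for the "infinite" case. The function name `sampling_distribution` and all values are hypothetical illustrations, not taken from the chapter.

```python
import random
import statistics

def sampling_distribution(population, sample_size, n_samples, seed=0):
    """Return the means of n_samples simple random samples (without replacement)."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Hypothetical population: story lengths (in paragraphs) for 1,000 articles.
rng = random.Random(42)
population = [rng.randint(1, 40) for _ in range(1000)]

sample_means = sampling_distribution(population, sample_size=50, n_samples=2000)
population_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)

# The two values should be very close; individual sample means vary around them.
print(round(population_mean, 2), round(mean_of_means, 2))
```

With more samples, the mean of the sample means moves closer to the population mean, and a histogram of `sample_means` approximates the sampling distribution.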

17
Q

The sampling distribution obtained when an infinite number of samples is taken: central limit theorem

A

Allows the researcher to estimate the amount of sampling error in a probability sample: one can calculate the probability that a particular sample mean is close to the true population mean. This probability can be calculated because the mean of the sample means from an infinite number of random samples equals the population mean.

18
Q

Sampling error combined with sample mean

A

Allows a researcher to estimate the population mean (at a given level of confidence).
Best guess: the sample mean or proportion. Sampling error estimates the range of error in that guess.
Key to understanding inference from a probability sample to a population is sampling error: an indication of the accuracy of the sample.
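A short sketch of how the sample mean and its sampling error combine into an interval estimate. It assumes a 95% confidence level with the normal-curve value z = 1.96; the function name `confidence_interval` and the ten data values are hypothetical. For a sample this small, a t value would normally be used instead of z.

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) using the standard error of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error: the sample SD adjusted for sample size.
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se

# Hypothetical data: number of sources cited in each of 10 sampled stories.
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 13]
mean, lower, upper = confidence_interval(sample)
print(mean, round(lower, 2), round(upper, 2))
```

The best guess for the population mean is `mean`; `lower` and `upper` give the range of error around that guess at the chosen confidence level.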

19
Q

Standard error formulas

A

Standard error formulas adjust the sample's standard deviation (SD) for sample size, because sample size is one of three factors that affect how good an estimate a sample mean or proportion will be. Sample size is the most important.

A larger sample gives a better estimate of the population. With more cases, the large and small values have a smaller impact on the mean.

The second factor affecting the accuracy of a sample estimate is the variability of case values, i.e. the homogeneity of the population.
If case values vary widely, the sample will have more error in estimating the population mean or proportion.

Variability results from the presence of large and small values for cases. The larger the sample, the more likely case variability will decline.

The third factor affecting the accuracy of a sample estimate is the proportion of the population in the sample. With a high proportion in the sample, error will decline (the sample distribution better approximates the population distribution).

20
Q

The sample must equal or exceed 20% of the population cases before this factor plays a role in estimating sampling error.

Sampling a high proportion of a large population is not necessary to generate a representative sample.

A

When the sample exceeds 20% of the population, adjust the sampling error using the finite population correction (fpc).

To adjust the standard error for the sample: multiply the standard error formula by the fpc formula.
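A minimal sketch of that adjustment for a proportion. It assumes the common textbook form of the fpc, sqrt((N - n) / (N - 1)), which may differ slightly from the form used in the chapter. The 20% threshold comes from the card above; the function names and example numbers are hypothetical.

```python
import math

def standard_error_proportion(p, n):
    """Standard error of a sample proportion p with sample size n."""
    return math.sqrt(p * (1 - p) / n)

def finite_population_correction(N, n):
    """Assumed fpc form: sqrt((N - n) / (N - 1)) for population N, sample n."""
    return math.sqrt((N - n) / (N - 1))

def adjusted_standard_error(p, n, N):
    """Apply the fpc only when the sample is at least 20% of the population."""
    se = standard_error_proportion(p, n)
    if n / N >= 0.20:
        se *= finite_population_correction(N, n)
    return se

# 100 stories sampled from 10,000: the fpc is not applied.
print(adjusted_standard_error(0.5, 100, 10_000))
# 100 stories sampled from 400 (25%): the standard error shrinks.
print(adjusted_standard_error(0.5, 100, 400))
```

The second call returns a smaller standard error than the first, which is the sense in which sampling a high proportion of the population reduces error.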

21
Q

All content involves a time dimension - concept of it concerns trend studies over periods longer than a year (natural planning)

A

Sampling

22
Q

Simple random sampling

A

All units have an equal chance of being selected. Example: from a numbered list of all 375 films, draw 100 random numbers between 1 and 375.

Simple random sampling occurs under two conditions: with replacement (units are returned to the population after they are selected) and without replacement (they are not).

Large population: the small variation in selection probability that arises without replacement has a negligible impact on sampling error estimates. Simple random sampling is not good in all situations: if the list is long, another technique is preferred.
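The film example can be sketched with Python's standard `random` module. The function name `simple_random_sample` and the seed are hypothetical; the frame is assumed to be a list of films numbered 1 to 375.

```python
import random

def simple_random_sample(frame_size, sample_size, replacement=False, seed=None):
    """Draw unit numbers between 1 and frame_size, with or without replacement."""
    rng = random.Random(seed)
    unit_numbers = range(1, frame_size + 1)
    if replacement:
        # A unit goes back into the population and may be drawn again.
        return [rng.choice(unit_numbers) for _ in range(sample_size)]
    # Without replacement: every drawn unit number is distinct.
    return rng.sample(unit_numbers, sample_size)

films = simple_random_sample(375, 100, seed=1)
films_with_replacement = simple_random_sample(375, 100, replacement=True, seed=1)
print(sorted(films)[:5])
```

Each number in `films` identifies one film on the list to be coded.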

23
Q

Systematic sampling

A

Selecting every nth unit from the sampling frame.
n is found by dividing the size of the sampling frame by the sample size.
To sample 1,000 sentences from 10,000 sentences: select every tenth sentence.

The starting point has to be random. Works well when simple random sampling creates problems.

Can have problems under two conditions:
  1. It requires a listing of all possible units (if the list is incomplete, inferences cannot be made).
  2. It’s subject to periodicity, a bias in the arrangement of units in a list. This is a problem because, for example, some months might not be represented in the sample.
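The sentence example can be sketched as follows. This assumes the frame is a complete list held in memory and that the sample size is no larger than the frame; the function name `systematic_sample` and the seed are hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Select every nth unit after a random start within the first interval."""
    interval = len(frame) // sample_size            # the "n" in "every nth unit"
    start = random.Random(seed).randrange(interval)  # random starting point
    return frame[start::interval][:sample_size]

# Hypothetical frame of 10,000 sentences; the interval is 10000 / 1000 = 10.
sentences = [f"sentence {i}" for i in range(1, 10001)]
sample = systematic_sample(sentences, 1000, seed=7)
print(sample[:3])
```

If the frame is ordered by date and the interval lines up with a publication cycle, the same slice would repeatedly hit the same kind of unit, which is the periodicity problem noted above.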

24
Q

Stratified sampling

A

Breaking a population into smaller groups (strata) and random sampling within each group. The groups are more homogeneous than the population with respect to the characteristics of importance.

Can be stratified per year: this makes smaller homogeneous groups, which helps yield a more representative sample.

Two purposes:
  1. It increases representativeness (knowledge about the distribution is used to avoid oversampling and undersampling).
  2. It can increase the number of units in a study when those types of units make up a small proportion of the population.

Proportionate sampling: sample sizes within strata are based on each stratum's proportion of the population.

Disproportionate sampling: selecting a sample from a stratum that is larger than that stratum's proportion of the population. It oversamples some units to obtain enough cases for valid analysis, so the sample is no longer representative of the population.

Mass media produce content on a regular basis: stratified sampling takes advantage of known variations within these production cycles.
Stratified sampling requires adjustments to sampling error estimates.
Sampling from homogeneous groups reduces the standard error.

Standard error of proportion: equals the sum of standard errors for all strata.
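A minimal sketch of proportionate stratified sampling by year. The article counts, the function name `proportionate_stratified_sample` and the seed are hypothetical. Stratum sample sizes are rounded, so in general the total can differ slightly from the requested sample size.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(units, stratum_of, sample_size, seed=None):
    """Randomly sample within each stratum in proportion to its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Stratum sample size follows the stratum's proportion of the population.
        k = round(sample_size * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 articles from 2019, 300 from 2020, 100 from 2021.
articles = [(year, i)
            for year, count in [(2019, 600), (2020, 300), (2021, 100)]
            for i in range(count)]
sample = proportionate_stratified_sample(articles, lambda a: a[0], 50, seed=3)
print(len(sample))
```

Here the 50 sampled articles split 30, 15 and 5 across the three years, mirroring the population. A disproportionate design would raise `k` for the small 2021 stratum instead.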

25
Q

Cluster sampling

A

Lists are sometimes unavailable. Cluster sampling then selects content units from clusters, or groups, of content.

Mass media example: Google News is a cluster of many articles divided into topics (sports, business etc.). Sampling all websites is impossible; however, local websites can be grouped by city (a cluster for sampling when geography is important).

Cluster sampling allows probability selection of groups and then subgroups: random sampling within these groups leads to the specific content units.

Cluster sampling introduces additional sampling error because of intraclass correlation.

Content units may cluster together because they are similar in nature.

Shared characteristics create a positive correlation among the attributes. By selecting clusters, it is easier to exclude units that have different characteristics from the units in the selected clusters, so the sample may not be representative.

26
Q

Multistage sampling

A

Common practice: multistage sampling involves one or several of these techniques.

Sampling mediated content involves three dimensions: titles, issues or dates.

Pure multistage sampling: random sampling at each stage.
Techniques can also be combined to make the sample as representative as possible.

27
Q

Stratified sampling (legacy media)

A

The choice between simple random and stratified sampling is a matter of efficiency.
Legacy media produce predictable variations. If these are known:

The variations can be used to select a more representative sample. They identify subsets of homogeneous content, so a smaller stratified sample can be just as representative.

28
Q

In daily newspapers:

A

Use constructed week - randomly selecting an issue for each day of the week.
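A constructed week can be sketched as a stratified sample in which the strata are the days of the week. This is a minimal illustration, assuming the newspaper publishes every day in the period; the function name `constructed_weeks`, the date range and the seed are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date, timedelta

def constructed_weeks(start, end, n_weeks=1, seed=None):
    """Randomly pick n_weeks dates for each day of the week between start and end."""
    rng = random.Random(seed)
    by_weekday = defaultdict(list)
    day = start
    while day <= end:
        by_weekday[day.weekday()].append(day)  # 0 = Monday ... 6 = Sunday
        day += timedelta(days=1)
    chosen = []
    for weekday in range(7):
        # One random Monday, one random Tuesday, etc., per constructed week.
        chosen.extend(rng.sample(by_weekday[weekday], n_weeks))
    return sorted(chosen)

# Two constructed weeks for a hypothetical study year.
dates = constructed_weeks(date(2023, 1, 1), date(2023, 12, 31), n_weeks=2, seed=5)
print(len(dates))
```

The result is 14 issue dates, two for each day of the week, so weekday-driven variation in content is represented evenly.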

29
Q

Two constructed weeks are sufficient for representing a year's content

A

One constructed week is adequate, and two constructed weeks worked even better

30
Q

2 constructed weeks for a year of content. For longer periods, e.g. 5 years? Nine constructed weeks are representative. 2 from each year.

A

Health stories: 6 constructed weeks instead of two (for 1 or 5 years, but better for 1)

31
Q

Weekly newspapers

A

Either a simple random sample of 14 issues, or one issue randomly selected from each month (12 issues), which is the most efficient.
The first is best when riskier decisions have to be made; the second is best when there is less risk and time and money are important.

32
Q

Magazines

A

One issue randomly selected from each month (for 1 year) is the most efficient; the alternative is 14 issues from a year.

Monthly magazines: examine all issues. For long-term trends, use stratified sampling.

33
Q

Network television news

A

Approaches that have been used:
  • Randomly selecting 12 weeks from 60 months
  • Using the same two weeks from each year
  • Sampling two constructed weeks per quarter for nine years
  • Four consecutive weeks per six-month period

34
Q

Best:

A

Randomly select two days from each month, for a total of 24 days from the year. Simple random sampling needs 35 days for 1 year. Use simple random sampling when the medium is not influenced by weekly, monthly etc. cycles: stratified sampling in such media will introduce bias in the data.

35
Q

sampling digital content

A

New information distribution and networking systems have a great impact on individuals. Assessing the population and drawing probability samples would be difficult because of: the lack of sampling frames, private areas of the web, and difficulties in analysing big data sets.

36
Q

Digital:

A

People can communicate with large numbers (mass) and with a single person (interpersonal).
Sampling depends on the RQ, access to content and cost.
It varies with the type of digital content.

Digital content designed for a mass audience is easier to access. Twitter is easier (Snapchat is difficult).

37
Q

A prime difference between websites and social networking platform

A

Websites tend to represent organizations (for-profit or not) rather than individuals, while social networking platforms represent content of organisations etc. Social networks involve large data sets compared to websites, which complicates drawing a representative sample.

38
Q

Sampling the web
Problems sampling online content:

A
  • Interactivity
  • Immediacy
  • Multimodality
  • Hyperlinks

Online content is unpredictable, which creates sampling challenges: do not use stratified sampling, and use a longer time frame for simple random sampling.
Problem: the absence of sampling frames for populations.
Digital: a large number of studies on a wide range of topics sample news websites.

39
Q

Online

A

Constructed week sampling is most efficient, with two or more weeks. Another finding: six days could represent a year. Not generalisable? More than two weeks may be required.

Online press: 12 constructed weeks (three per quarter)
Associated press website: 8 weeks

40
Q

Warnings

A

First: be aware that the web is both similar and dissimilar to legacy media.

Second: sampling differs from traditional media and is difficult because sampling frames are not available and content changes.

Third: the changing nature of the web makes coding difficult. Content must be captured, or sampling must take change into consideration.

Fourth: the multimedia nature of the web can affect various study units.

Fifth: the changing nature of sites makes reliability testing difficult because coders may not be coding identical content.

41
Q

Process of sampling:

A

Depends on the research conceptualisation. Differs between websites.

Convenience samples pose fewer problems, but results cannot be generalised.

Researchers must ALWAYS be aware of the time element of changing web content.

When there is no exact sampling frame: use multistage sampling.

42
Q

First stage

A

Use a range of search engines and algorithms to generate multiple lists of sites.
The combined lists become a sampling frame once the duplicate sites are removed.

43
Q

Second stage:

A

Select from the sites in the sampling frame. If geography is important, add more stages.
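The two stages can be sketched as below. This is only an illustration: the result lists are typed in by hand (no search engine is queried), the `example.*` addresses are placeholders, sites are identified by host name with a leading "www." stripped, and the function name `build_sampling_frame` is hypothetical.

```python
import random
from urllib.parse import urlparse

def build_sampling_frame(result_lists):
    """Stage 1: merge result lists and drop duplicate sites (by host name)."""
    seen = set()
    frame = []
    for results in result_lists:
        for url in results:
            site = urlparse(url).netloc.lower()
            if site.startswith("www."):
                site = site[4:]
            if site and site not in seen:
                seen.add(site)
                frame.append(site)
    return frame

# Hypothetical result lists from two different search engines.
engine_a = ["https://www.example.org/news", "https://city-a.example.com/",
            "https://example.net/"]
engine_b = ["https://example.org/", "https://city-b.example.com/local",
            "https://example.net/about"]

frame = build_sampling_frame([engine_a, engine_b])
# Stage 2: simple random selection of sites from the frame.
sites = random.Random(11).sample(frame, 2)
print(frame)
```

Further stages (for example, cities first, then sites within cities) would repeat the random selection at each level.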

44
Q

Problems:

A

Search engines and algorithms produce long lists of sites that are not random.

Various engines use different algorithms for generating the order of their lists.

Building sampling frames from search results can be time consuming, and the results can favour certain sites over others.

Content on pages changes at varying rates
- use categories other than topics to classify web pages.

45
Q

How to deal with news websites:

A

Micro-longitudinal sampling with a software program: download specific components of a page (e.g. headlines) every 60 seconds from a news site, downloading only changed content.
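The "only changed content" step can be sketched as follows. The fetching itself is left out: the snapshots are hypothetical hand-written captures one minute apart, and the function name `record_changes` is an illustration rather than the program the card refers to.

```python
import hashlib

def record_changes(snapshots):
    """Keep only snapshots whose headlines differ from the previous capture."""
    archive = []
    last_digest = None
    for timestamp, headlines in snapshots:
        # A hash of the headline text is a cheap way to detect any change.
        digest = hashlib.sha256("\n".join(headlines).encode("utf-8")).hexdigest()
        if digest != last_digest:
            archive.append((timestamp, headlines))
            last_digest = digest
    return archive

# Hypothetical captures taken every 60 seconds from a news front page.
snapshots = [
    ("12:00:00", ["Council votes on budget", "Storm warning issued"]),
    ("12:01:00", ["Council votes on budget", "Storm warning issued"]),
    ("12:02:00", ["Council approves budget", "Storm warning issued"]),
]
archive = record_changes(snapshots)
print([timestamp for timestamp, _ in archive])
```

The 12:01 capture is skipped because nothing changed, so the archive holds only the moments at which the page content actually differed.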

46
Q

Size and complexity of web has led to development of machine learning strategies

A

Topic-specific search engines learn from training documents and filter by URL. Their approach uses both content and structure (links) to collect web content.

After filtering: take a sample from the population obtained by searching the web for topics.

47
Q

Sampling with databases

A

CA is enhanced by increasing storage capacity. Messages are digitised, preserved and available online. This gives the capacity to search and retrieve specific types of content from different databases.

48
Q

Data base:

A

A structured collection of data that can be easily searched and retrieved. Usually text, but also visual and audio.

Can be commercial, created by researchers, or a combination of the two. What content goes into a database is decided by the database creator.

Most databases are searched with keywords (terms). This has limits: it is unlikely that a database contains all content. A database is purposively organised and does not represent the population.

Use more than one database, and ask: are the indexing and archiving software equivalent?

49
Q

Existing literature:

A

There is an absence of information about the process used to generate content samples from databases.
Keywords are crucial in determining the ability of a sample to yield valid results. Searches with one keyword do not always return relevant content. Use strings of keywords instead.

Researchers should conduct and report formal evaluations of the recall and precision of a search string.
Recall: measures a string's capability of retrieving pertinent content.
Precision: is the retrieved content actually relevant to the study’s goals?

50
Q

Recall: the relevant articles retrieved divided by all relevant articles in the database. Precision: the relevant articles retrieved divided by all articles retrieved. The more precise a search string is, the more likely it will miss relevant content.

Relevant content in the data is established with a protocol that has reached acceptable reliability and is applied by two or more coders.

Precision and recall are used to create a correction coefficient that estimates the errors associated with a sample from a database. How: divide precision by recall.

A
51
Q

Correction coefficient less than 1:

A

Correction coefficient less than 1: the search string overestimated the number of articles.
Greater than 1: the string underestimated the number of articles.
The correction coefficient is accurate for longer time periods but not for short time periods.
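A worked sketch of the three measures, using the standard information-retrieval definitions of recall and precision. The counts are hypothetical: coders find 100 relevant articles in the database, and the search string retrieves 160 articles, of which 80 are relevant. The function name `evaluate_search_string` is also hypothetical.

```python
def evaluate_search_string(retrieved_relevant, retrieved_total, relevant_total):
    """Return (recall, precision, correction coefficient) for a search string."""
    recall = retrieved_relevant / relevant_total      # share of relevant content found
    precision = retrieved_relevant / retrieved_total  # share of retrieved content that is relevant
    correction = precision / recall                   # correction coefficient
    return recall, precision, correction

recall, precision, correction = evaluate_search_string(
    retrieved_relevant=80, retrieved_total=160, relevant_total=100)

# Multiplying the number of retrieved articles by the coefficient
# estimates how many relevant articles the database actually holds.
estimated_relevant = 160 * correction
print(recall, precision, correction, estimated_relevant)
```

Here recall is 0.8, precision is 0.5 and the coefficient is 0.625. Because it is below 1, the 160 retrieved articles overestimate the 100 relevant ones.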

52
Q

Using database: provide detailed description of the process

A
  • Discussion of relevant media outlets
  • Report the search strings and the process used to determine them
  • Calculate precision, recall and the correction coefficient, and report them in the article
53
Q

Sampling social media

A

Sampling Twitter: there is greater access to messages than on Facebook.

Research is biased toward Twitter, which over-represents its social impact. Not many studies have used a representative sample.

Social networking sites: studies usually examine one platform at a time, but sometimes using both Twitter and YouTube can yield a more diverse video collection.

54
Q

Public organisations:

A

A good sampling frame for their tweets can be built by searching the internet

55
Q

Organisations:

A

Tend to make their messages available. When Twitter is used interpersonally, access is a problem. Twitter: either a census or a representative sample is useful.

56
Q

The most used way to draw representative samples of tweets from Twitter is access through the firehose (API).

The firehose provides a census within a selected stream of tweets, from which probability samples can be drawn (expensive).

The API stream is available free of cost; however, it estimates the top hashtags but is misleading for smaller numbers. How well the streaming API represents all tweets depends on the coverage and nature of the topic. (Does not change with more than one API.)

A
57
Q

Algorithmic sampling

A

Uses the nodes (web pages) and edges (hyperlinks) of the web graph to generate a sample (all samples equally probable).

Learning algorithms for sampling social media: open-source software for selecting and analysing tweets (API).

58
Q

Sampling suggestions for Digital media

A

Questions for sampling content

Studying content and messages created by organizations or individuals?
Which ones?

Time period

Sampling frame available?

Frame too long for all units? If not, conduct a census

Too long, can simple random be conducted?

Stratified sampling - more representative?

If no sampling frame - sampling studies suggested ways to generate using search engines or specialised?

If a representative sample is impossible, is convenience or purposive sampling justified?

59
Q

Sampling individual communication

A

Mass communication is regular in the way it is created, and records are often available.

60
Q

Individual com is complicated

A

Convenience samples often result. The scientific method is the solution to the inability to sample randomly.

61
Q

SUMMARY:

A

Appropriate selection of content depends on the theoretical issues and practical problems in the RQ.

If the number of units is small: conduct a census of all content.

If it is large: a probability sample is better (it allows inference from the sample to the population).

Appropriate sampling depends on the nature of the RQ. A probability sample is necessary if statistical inference is used.

Efficient sampling of mass media for a given time period: stratified sampling (mass media content varies systematically with time periods). In CA, sampling may involve probability samples based on time, content or both.
