lecture 12 - textual/content analysis + big data Flashcards

1
Q

textual documents

A

= major sources of info in IR/Polsci

  • official/public documents and records = gov., parliament, courts, parties, etc. = official documents
  • personal documents = letters, emails, diaries
  • cultural documents = mass media (newspaper, tv), entertainment, film, literature, art, cartoons
    !important to understand how people think about politics
  • social media (big data) = twitter, facebook, instagram
  • research data: questionnaires, interview transcripts etc.

stored in archives, sometimes online databases

need for textual analysis - different qualitative and quantitative approaches (e.g. counting how often a word is used)

  • discourse analysis
  • content analysis
2
Q

discourse analysis vs content analysis

A

discourse =

  • interpretative
  • starts with assumptions about the real world (e.g. marxism, feminism, post-colonialism)
  • puts the text in its context
  • is very much about the interpretation of the researcher based on how he/she thinks the world works

content =

  • systematic qualitative and/or quantitative analysis
3
Q

discourse analysis

A

not necessarily a fixed set of steps; text is used as an illustration of how the world works according to the researcher

intention of source, effects on audience and context all flow together

  • interpretive and constructivist approach: don’t assume there is an objective reality that can be observed, it is the interpretation that matters
  • idea that texts reflect underlying structures and that discourse analysis allows us to uncover them
  • meanings are socially and discursively constructed (uncover how discursive practices construct meanings through production, dissemination, consumption of texts)
  • interaction of discourse and context is essential

e.g.
post-structuralism, speech act theory, critical discourse analysis

!is not an easy method

validity = seen as plausibility and credibility (like in ethnographic research)

4
Q

textual analysis

A

= quantitative content analysis as starting point

  1. manifest content = what is in the text (e.g. how often a word is used, a name, a topic is mentioned)
  2. description and comparison
    e.g. comparison over time
    is driven by the content that there is

but also more interpretative -> qualitative content analysis

  • intention of source
  • social, political, economic context (to give the content meaning)
  • effects on audience (can’t be told by only looking at the source) = hard for content analysis
5
Q

content analysis

A
  • systematic analysis of textual info
  • unobtrusive method of data collection: data is already there (no need to experiment, do survey)
    -> eliminates some threats to validity, e.g. reactivity (text can’t respond to being observed)
    -> easy access to study objects
    -> not restricted on time dimensions

quantitative = focus on manifest content (what is written literally)
- frequency and valence of words

qualitative content analysis = focus on latent content: interpretation of the meaning of words, of the context

usually: research in between quantitative and qualitative, a combination
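
The manifest side (frequency and valence of words) can be sketched in a few lines. This is only a minimal illustration: the word lists POSITIVE and NEGATIVE and the sample sentence are made up for the example, and a real study would take them from its codebook.

```python
import re
from collections import Counter

# Hypothetical mini valence lexicon (assumption, not from the lecture).
POSITIVE = {"peace", "cooperation", "support"}
NEGATIVE = {"conflict", "crisis", "threat"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Manifest content: how often each word occurs."""
    return Counter(tokenize(text))


def valence_score(text):
    """Number of positive words minus number of negative words."""
    tokens = tokenize(text)
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


sample = ("The crisis deepened, but talks on peace and cooperation "
          "continued. Peace is possible.")
freqs = word_frequencies(sample)
print(freqs["peace"])         # frequency of one word
print(valence_score(sample))  # crude valence of the whole text
```

A plain count like this says nothing about latent content (irony, context), which is where the qualitative side comes in.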

6
Q

quantitative content analysis: definitions

A

content analysis is a research technique for the objective, systematic, and quantitative description of the manifest content of communication
(Berelson 1952)

  • objective as bias free and replicable
  • systematic: explicit and consistent (coding) rules and procedures = how to summarize etc.
  • quantitative = quantifiable, using numerical variables

= positivist approach for content analysis

7
Q

content analysis: steps

A

1

unit of analysis & case selection: type of text, accessibility, unit (size, section, duration)

population & sampling

  • time period
  • if population is too large, selection procedure for representative sample
    probability sampling rarely done
    nonprobability/systematic sampling, e.g. time interval, ‘virtual month’ (= taking one week each from four random months, so that 4 weeks are covered spread across the year -> representative; e.g. take week 1 of February, week 2 of June, week 3 of August, week 4 of November)
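
A possible way to draw such a 'virtual month', as a rough sketch: pick four different months at random and take week 1 from the earliest, week 2 from the next, and so on. The function name virtual_month and the seed value are assumptions made for the example.

```python
import random


def virtual_month(seed=None):
    """Pick four distinct random months and pair them, in calendar
    order, with week 1 to week 4."""
    rng = random.Random(seed)
    months = sorted(rng.sample(range(1, 13), 4))
    return [(month, week) for week, month in enumerate(months, start=1)]


# Fixing the seed makes the draw reproducible for other researchers.
sample = virtual_month(seed=12)
for month, week in sample:
    print(f"month {month}: week {week}")
```

Documenting the seed (or the drawn weeks) in the codebook keeps the selection procedure explicit and replicable.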

categories :
- content categories
mutually exclusive & exhaustive: needs to cover all possible actors/categories + everyone needs to fit in only one category
manifest (e.g. names, roles) vs latent (e.g. words with positive vs negative meaning)
intensity (how often something is mentioned) vs valence (evaluations: positive, negative, needs some form of interpretation)

  1. how do we get categories = development
  • starting with theory: deductive -> a priori codes -> closed coding (apply classes you already have to classify the text)
  • inductive -> grounded codes (dev. through observation) -> open coding (categorize whilst reading)
  2. unit of measurement
    recording unit / unit of content = not just a newspaper, will you code each article, each paragraph, each sentence?
  • physical unit = e.g. one code per square cm
  • symbolic units:
    syntactical (discrete units of language, e.g. words, sentence, paragraph, articles/stories) vs
    referential (physical or temporal units, e.g. every time a certain event, person or object is mentioned) vs
    thematic (topics within messages, e.g. migration)
  3. coding:

a priori coding = defined in advance = typical approach of quantitative content analysis
- create a codebook (lists categories) with coding rules, coding categories and codes
coding by recording codes in a coding sheet

open/grounded coding = dev. by reading, coding in the text = done primarily in qualitative analysis, also in quantitative analysis when wanting to make a codebook
- coding process: creating a coding protocol, creating codes to tag text, coding of content by assigning tags to text

8
Q

coding and categories

A

content categories

  • mutually exclusive: each item needs to fit in only one category (something should not be both positive and negative, male and female, observer and participant, etc.)
  • exhaustive: needs to cover all possible actors/categories
  • manifest = literal (e.g. names, roles)
  • latent = upon interpretation (e.g. are words positive vs negative meaning), need to explain how you categorize
  • can measure intensity (how often something is mentioned)
  • can measure valence (evaluations: positive, negative, needs some form of interpretation)

how do you get categories?

quantitative = deductive - a priori codes - closed coding (apply existing codes to classify the text)

qualitative = inductive - grounded codes - open codes
(develop code through observation/reading the text)

coding:

qualitative = grounded codes = create a coding protocol (e.g. when I encounter a new political actor I will create a code/tag), tag whilst reading and then afterward classify and code/summarize the content

quantitative = a priori coding = codebook (lists categories and codes), apply the codes to the category/unit you want to analyse
*use a code sheet to enter the codes in (a table with the categories, e.g. ID, newspaper and then add the codes from the codebook)
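
To make the codebook / code sheet idea concrete, a small sketch. All categories, codes and newspaper names below are invented for the example; the residual code 9 ("other") is one common way to keep a category exhaustive.

```python
import csv
import io

# Hypothetical codebook: category -> {code: label}.
CODEBOOK = {
    "actor": {1: "government", 2: "opposition", 9: "other"},
    "tone": {1: "positive", 2: "negative", 3: "neutral"},
}


def validate_row(row):
    """Reject any row that uses a code not defined in the codebook."""
    for category, codes in CODEBOOK.items():
        if row[category] not in codes:
            raise ValueError(f"unknown {category} code: {row[category]}")
    return row


# The coding sheet: one row per recording unit (here: one article).
coding_sheet = [
    validate_row({"id": 1, "newspaper": "Paper A", "actor": 1, "tone": 2}),
    validate_row({"id": 2, "newspaper": "Paper B", "actor": 9, "tone": 3}),
]

# Write the sheet as CSV text, as it could be stored digitally.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "newspaper", "actor", "tone"])
writer.writeheader()
writer.writerows(coding_sheet)
print(buffer.getvalue())
```

Because each row holds exactly one code per category, the sheet also enforces mutual exclusivity.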

9
Q

discourse analysis fancy definition to memorize

A

interpretative and constructivist type of analysis that explores the ways in
which discourses (language, ideas, concepts, categories) give legitimacy and
meaning to social practices and institutions in a particular historical situation
or context;

10
Q

example: manifesto project

A

analyzes/summarizes party platforms (56 countries)

  • ideological positions, issues of the parties, policy positions (57 policy categories)
  • manual coding of manifestos (quasi-sentences), later also using wordscores

it checks how much attention is given to certain issues + what sides are picked -> determine what parties find important, what their positions are and what their ideological position is

freely available info. for researchers

NL parties: election scatter plot on left-right and international peace dimensions
US: parties over time on left-right dimension

11
Q

content analysis: humans vs computer

A

manual content analysis

  • coders enter codes in coding sheets (hardcopy or digital)
  • time-consuming activity
  • intercoder reliability necessary (you can calculate this: check how closely different coders' codes for the same texts agree)

computer-assisted content analysis

  • qualitative: help & manage manual coding (e.g. annotating)
    e.g. saves what you enter
  • quantitative:
    dictionary-based automatic computer coding (codes when certain words are mentioned, researcher has to come up with a full dictionary in advance)
    wordscores = coding positions using reference text
    wordfish = uses a statistical model of word frequencies to estimate the positions of texts, without needing reference texts
    AI?

reliability vs validity

  • reliability = computer
  • validity = manual/humans
    (*computer does nothing with latent meaning + can’t add new codes, humans can)
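
Dictionary-based automatic coding, as a minimal sketch: the dictionary below is invented for the example and far too small for real use, and the rule "most hits wins" is an assumption, not the method of any particular software package.

```python
import re

# Hypothetical coding dictionary: category -> trigger words.
DICTIONARY = {
    "migration": {"migration", "asylum", "refugee", "refugees", "border"},
    "economy": {"tax", "taxes", "jobs", "inflation", "budget"},
}


def code_text(text):
    """Count, per category, how many dictionary words occur in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        category: sum(token in words for token in tokens)
        for category, words in DICTIONARY.items()
    }


def dominant_category(text):
    """Assign the category with the most hits, or 'uncoded' if none."""
    counts = code_text(text)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "uncoded"


sentence = "We will lower taxes and create jobs, while securing the border."
print(code_text(sentence))
print(dominant_category(sentence))
```

The same input always gets the same code (reliability), but anything outside the dictionary stays 'uncoded' and latent meaning is ignored (validity problem).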
12
Q

example: CEDS

A

= computational event data system = discontinued now

  • machine coding system for generating event data using pattern recognition and simple linguistic parsing
  • input: Reuters and Agence France Presse (AFP) news agency texts (Lexis-Nexis)
  • processing: identifies source/subject, verb phrase and target/object
  • Use of coding dictionaries

e.g. conflict Israel-Palestine: whether they cooperated + where they were in conflict (cooperation only in the 1990s, around the Oslo peace accords)
- weekly weighted events

13
Q

content analysis: analysis

A
  • quantitative content analysis = tables, figures, statistical analysis
  • qualitative content analysis = quotation, concept maps, narrative
14
Q

content analysis: reliability and validity

A

quantitative = intercoder reliability

  • coder stability = let same coder code the same text at different points of time
    see if the coder is consistent
  • reproducibility: different coders code the same text (using the same coding scheme) consistently
  • objectivity: different coders code/interpret same data consistently
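
One way to put a number on the checks above, as a sketch: simple percent agreement plus Cohen's kappa, which corrects for agreement expected by chance. Kappa is not named in these notes; it is used here only as one common measure, and the two coders' codes are made up.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of units that both coders coded identically."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for the same ten texts by two coders.
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "pos"]

print(percent_agreement(coder_1, coder_2))
print(round(cohens_kappa(coder_1, coder_2), 3))
```

For coder stability, the same functions can be applied to two coding rounds by the same coder at different points in time.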

reliability for qualitative content analysis = plausibility, see if the interpretation/conclusions make sense
