Lecture 3 Flashcards
How the computer changed content analysis
Databases back then - with different data
Modern society is largely online-based > a revolution in content analysis as a method
Manuals and syntax in exam: KALPHA judges= v3.1 v3.2 v3.3/level=4/detail=0/boot=5000. Be able to recognise which value in the output is the intercoder reliability
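To get a feel for what the KALPHA number means, here is a minimal Python sketch of Krippendorff's alpha (1 minus observed disagreement divided by expected disagreement). This is not the KALPHA macro itself: the function names and the example data are made up for illustration, there is no bootstrapping (the boot=5000 part), and only a nominal and a ratio difference function are shown. Level=4 should correspond to ratio-level measurement, but check the manual to be sure.

```python
from itertools import permutations


def delta_nominal(a, b):
    """Nominal difference: 0 when two codes are equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def delta_ratio(a, b):
    """Ratio difference for non-negative numeric codes."""
    return 0.0 if a == b else ((a - b) / (a + b)) ** 2


def krippendorff_alpha(units, delta=delta_nominal):
    """units: one list per coded unit holding the codes given by each coder.

    Use None for a missing code. Returns alpha = 1 - D_o / D_e.
    """
    # Only units coded by at least two coders can be compared.
    pairable = [[v for v in unit if v is not None] for unit in units]
    pairable = [unit for unit in pairable if len(unit) >= 2]
    values = [v for unit in pairable for v in unit]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable codes")

    # Observed disagreement: differences between codes within the same unit.
    d_o = sum(
        sum(delta(a, b) for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in pairable
    ) / n

    # Expected disagreement: differences between all codes, ignoring units.
    d_e = sum(delta(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("no variation in the codes: alpha is undefined")
    return 1.0 - d_o / d_e


# Hypothetical data: four units, each coded by three coders (think v3.1 v3.2 v3.3).
codes = [
    [1, 1, 1],
    [2, 2, 2],
    [1, 1, 2],  # the only unit with a disagreement
    [2, 2, 2],
]
alpha = krippendorff_alpha(codes)
print(round(alpha, 3))  # 0.686
```

One disagreement in four units already pulls alpha down to about 0.69, because alpha corrects for the agreement you would expect by chance.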
CHECK ONLINE
What do we mean with digital media?
Online news
Websites
Blogs
Apps
Online fora
Social media content
Other characteristics than traditional media
Research methods different
Difference
Traditional assumption: content actually exists as a finished, static object. With remixes, mashups etc. this is a problematic assumption.
Which version do you study: the first publication? Extreme content? You have to be quick in collecting it.
Aiming at a moving target
Digital content is ‘chaotic’
Data is - moving (position on site, relation to other articles)
Data is - changing, has different shapes and sizes. Data can grow extensively.
Data is - varying content (comments, hyperlinks)
What will you take into account? Posts, links or likes?
Sampling digital data
More complex than sampling traditional data
Unit of analysis and registration unit are often diverse
Dynamic character of data: extra challenge
More garbage: unrelated content (spam, broken links etc.)
Hard to recognise and exclude unrelated content (irrelevant units) from the sample beforehand
Limits to accessibility of digital data
Commercial data is often protected by terms of service (TOS): ask for permission, e.g. from Meta
Research partnerships lack independence and are not accessible to all
Facebook is now cooperating with researchers.
Proprietary (bought) data: replicability not possible!
Forums are often not public: consent required
How digitalisation changed CA: crowd coding
> traditionally coding team: researchers, students
Why not outsource to the internet crowd?
Cheaper! 2 cents per coded headline. 1500 headlines: 50 dollars.
Faster! Team of 200 coders? Coding at the same time.
More reliable! A small coding team can share biases, even though it codes more systematically.
Signals ambiguity in the data! = new insights in the material.
Use two coders and a third coder for cases of disagreement.
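The two-coders-plus-tie-breaker rule can be sketched in a few lines of Python. Everything here is hypothetical (function name, headline IDs, codes); it only illustrates the logic. The list of disputed units is worth keeping, since disagreement is what signals ambiguity in the data.

```python
def resolve_codes(coder_a, coder_b, ask_third_coder):
    """Combine two coders' decisions; a third coder settles disagreements.

    coder_a, coder_b: dicts mapping unit id -> code.
    ask_third_coder: function that takes a unit id and returns a code.
    Returns (final codes, list of unit ids the first two coders disagreed on).
    """
    final = {}
    disputed = []
    for unit_id, code_a in coder_a.items():
        code_b = coder_b[unit_id]
        if code_a == code_b:
            final[unit_id] = code_a
        else:
            # Only disputed units are sent to the third coder.
            disputed.append(unit_id)
            final[unit_id] = ask_third_coder(unit_id)
    return final, disputed


# Hypothetical crowd codes for three headlines.
coder_a = {"h1": "positive", "h2": "negative", "h3": "neutral"}
coder_b = {"h1": "positive", "h2": "neutral", "h3": "neutral"}
third_coder = {"h2": "negative"}  # the third coder only saw the disputed headline

final, disputed = resolve_codes(coder_a, coder_b, third_coder.__getitem__)
print(final)
print(disputed)
```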
What is CCA?
Three categories
Dictionary approach
Supervised machine learning
Unsupervised machine learning
Computational content analysis (CCA), also known as ATA, ACA or CATA
What do we mean by CCA?
CCA stands for all the content analysis approaches that are aided by the computer when collecting, coding or interpreting data
The role of the computer can be modest or substantial
Advantages of using CCA
Enables coping with data growth
Try to automate it to keep track of the information
More efficient: ACA can save time and money (although developing software is time-consuming)
Computers are 100% reliable, whereas getting a reliable coding instrument with human coders can be difficult.
Why reliable? A computer does what it is told: it applies the instructions you have given in the same way every time.
Discover unknown patterns: ACA can recognise patterns not visible to the human eye
Three types of CCA approaches
Deductive (rules set by the researcher in a codebook) → inductive (rules determined by the computer, no codebook)
Counting and dictionary (deductive) → supervised machine learning → unsupervised machine learning (inductive)
I: Counting and dictionary (We did it at school)
Rule based by researcher
Simple tasks that involve the counting of things
Examples: The number of references to a person or issue
All you need is: a searchable database (e.g. Nexis Uni) and a keyword or combination of keywords
Dictionaries can be short (visibility of US president) or long
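A counting/dictionary approach could look like the Python sketch below. The dictionary, category names and texts are invented for illustration; a real study would use a validated dictionary and texts retrieved from a database.

```python
import re


def count_dictionary_hits(texts, dictionary):
    """Count keyword hits per category for every text.

    dictionary: dict mapping category -> list of keywords or phrases.
    Matching is case-insensitive and on whole words only (no stemming),
    so "president" does not match "presidents".
    """
    patterns = {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
            re.IGNORECASE,
        )
        for category, keywords in dictionary.items()
    }
    return [
        {category: len(pattern.findall(text)) for category, pattern in patterns.items()}
        for text in texts
    ]


# Hypothetical short dictionary: visibility of the US president and the economy.
dictionary = {
    "us_president": ["president", "white house"],
    "economy": ["inflation", "economy"],
}
texts = [
    "The President spoke about inflation. Later the president left the White House.",
    "Football results from last night.",
]
hits = count_dictionary_hits(texts, dictionary)
print(hits)
```

The rules are fully set by the researcher: the computer only counts what the dictionary tells it to count.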
Limitations of dictionary approach:
Not suited to measure latent concepts
Dictionaries are handmade: very labour intensive!
In the case of big data with unknown characteristics, it’s not suitable: you can’t draw a representative sample
Dictionaries are topic-specific: they don’t work well in other domains. A sentiment dictionary for sports news is not good for financial news.
Not so popular anymore.
II: Supervised Machine Learning (in between deductive and inductive)
We do apply rules, but also let the computer work things out itself
Basic idea: the algorithm tries to replicate human coding decisions
There is a training set, which has been manually coded
The computer studies the training set and the decisions made by researchers, and tries to find patterns
Can be used to code genres, frames, sentiment, subjectivity and topics
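The training-set idea can be sketched with a tiny naive Bayes classifier in plain Python. This is only one possible algorithm (the lecture does not name a specific one), and the training texts and labels are invented; real training sets contain hundreds or thousands of manually coded units.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train(training_set):
    """training_set: list of (text, label) pairs coded by human coders."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of training texts
    vocabulary = set()
    for text, label in training_set:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest (log) probability for a new text."""
    word_counts, label_counts, vocabulary = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_texts)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training carry no information
            # Add-one smoothing avoids log(0) for words unseen under this label.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical manually coded training set (sentiment of sports headlines).
training_set = [
    ("great win for the team", "positive"),
    ("a wonderful and happy result", "positive"),
    ("terrible loss and sad fans", "negative"),
    ("awful defeat, a bad day", "negative"),
]
model = train(training_set)
print(predict("a happy win", model))
print(predict("sad defeat", model))
```

The researcher never writes a rule such as "happy = positive": the computer derives the word patterns from the human coding decisions in the training set.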