Quantitative text analysis Flashcards
What is quantitative text analysis
Converting text into numerical values and using statistical analysis to identify patterns, trends & relationships within the text.
Computational text analysis
Using automated and semi-automated computational techniques to process, analyze & interpret textual data.
KWIC
Key words in context.
Returns a list of every occurrence of the keyword with its surrounding words, identifying the source text and the word index number within the source text.
In which context do key words appear?
Can be used to identify and extract paragraphs of interest.
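A minimal sketch of a keyword-in-context lookup, assuming the texts are already tokenized. The function name `kwic`, the `window` parameter and the toy speeches are illustrative assumptions, not part of any particular library.

```python
def kwic(tokens_by_doc, keyword, window=3):
    """Return (source text, word index, left context, keyword, right context) rows."""
    rows = []
    for doc_name, tokens in tokens_by_doc.items():
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                rows.append((doc_name, i, left, tok, right))
    return rows

# Toy corpus (made up for illustration)
docs = {
    "speech_a": "we will cut taxes and we will create jobs".split(),
    "speech_b": "higher taxes fund better schools".split(),
}
hits = kwic(docs, "taxes", window=2)
for row in hits:
    print(row)
```

Each row names the source text and the word index, so the surrounding paragraph can be pulled out afterwards.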
Keyness
Compare the differential associations of keywords in a target and reference group.
Which words are used more by one group, relative to the other one?
The most common words are often similar, but the focus is on the words that distinguish between the two groups.
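One way to sketch keyness is a smoothed log-ratio of relative frequencies between the target and reference groups. This is only one possible statistic (chi-squared and log-likelihood are common alternatives); the function name, the `smoothing` value and the toy texts are assumptions.

```python
import math
from collections import Counter

def keyness(target_tokens, reference_tokens, smoothing=0.5):
    """Rank words by log2 of (relative frequency in target / relative frequency in reference)."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    vocab = set(t) | set(r)
    # Smoothing keeps words that are absent from one group from producing log(0)
    t_total = sum(t.values()) + smoothing * len(vocab)
    r_total = sum(r.values()) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_target = (t[word] + smoothing) / t_total
        p_reference = (r[word] + smoothing) / r_total
        scores[word] = math.log2(p_target / p_reference)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy groups (made up for illustration)
target = "the economy the jobs the economy growth".split()
reference = "the climate the energy the climate future".split()
ranked = keyness(target, reference)
print(ranked[0], ranked[-1])
```

A word such as "the" is frequent in both groups and scores zero, while the words at either end of the ranking are the ones that separate the groups.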
Lexical dispersion plot
Visualize the occurrences of particular terms throughout the text.
Not only how often the term is used, but also WHERE in the speech it is used
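A rough text-only sketch of the idea behind a dispersion plot: record the token positions of a term and mark them on a strip that represents the whole text. The function names and the toy text are assumptions; a real plot would normally be drawn with a plotting library.

```python
def term_positions(tokens, term):
    """Token indices at which the term occurs."""
    return [i for i, tok in enumerate(tokens) if tok.lower() == term.lower()]

def dispersion_strip(tokens, term, width=40):
    """One-line 'plot': '|' marks where the term occurs, '.' everywhere else."""
    strip = ["."] * width
    n = len(tokens)
    for i in term_positions(tokens, term):
        strip[i * width // n] = "|"  # scale the token index to the strip width
    return "".join(strip)

# Toy text (made up for illustration)
tokens = "peace now and war later but peace again at last peace".split()
positions = term_positions(tokens, "peace")
strip = dispersion_strip(tokens, "peace", width=11)
print(positions)
print(strip)
```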
Lexical diversity
Measure of how many different words are used in a text
How rich is the vocabulary?
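A minimal sketch using the type-token ratio (unique words divided by total words), which is one common diversity measure among several. Note that this ratio tends to fall as texts get longer, so it is best compared across texts of similar length. The function name and example sentence are assumptions.

```python
def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    lowered = [t.lower() for t in tokens]
    return len(set(lowered)) / len(lowered) if lowered else 0.0

# "the" appears twice, so 5 types over 6 tokens
ttr = type_token_ratio("the cat sat on the mat".split())
print(round(ttr, 3))
```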
Lexical density
Measure of the proportion of lexical items (i.e. nouns, verbs, adjectives and some adverbs) in the text.
How complex is the text itself?
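A minimal sketch of lexical density, assuming the text has already been part-of-speech tagged elsewhere. The tag names (`NOUN`, `VERB`, `ADJ`, `ADV`) are an assumed tag set, and counting every adverb as lexical is a simplification of "some adverbs".

```python
LEXICAL_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed tag set

def lexical_density(tagged_tokens):
    """Share of tokens whose part-of-speech tag marks a lexical (content) word."""
    if not tagged_tokens:
        return 0.0
    lexical = sum(1 for _, tag in tagged_tokens if tag in LEXICAL_TAGS)
    return lexical / len(tagged_tokens)

# Hand-tagged toy sentence (made up for illustration)
tagged = [
    ("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
    ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN"),
]
density = lexical_density(tagged)
print(density)
```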
Co-occurrence
Measuring co-occurrences of features within a user-defined context:
- A document
- A window within a collection of documents
Can be plotted as a co-occurrence network.
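A minimal sketch of window-based co-occurrence counting over tokenized documents. The function name, the `window` parameter and the toy documents are assumptions; setting the window to the document length gives document-level co-occurrence instead.

```python
from collections import Counter

def cooccurrence(docs, window=2):
    """Count unordered feature pairs that appear within `window` tokens of each other."""
    counts = Counter()
    for tokens in docs:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1:i + 1 + window]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return counts

# Toy documents (made up for illustration)
docs = [
    ["tax", "cuts", "create", "jobs"],
    ["tax", "cuts", "hurt", "schools"],
]
counts = cooccurrence(docs, window=1)
print(counts.most_common(3))
```

The pair counts can serve as edge weights if the result is drawn as a co-occurrence network.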
What is a dictionary
Dictionary – exclusive – one feature linked to one key
Thesaurus – not exclusive – set of features linked to one key
LIWC: Linguistic Inquiry and Word Count
- Uses a dictionary to calculate the percentage of words in the text that match each of up to 82 language dimensions
Coding scheme in dictionary
Hierarchy
First level - domain
Second level - subdomain
Other levels: may be additional sub-domains.
What are dictionaries for
Describe the text
Measure expressed concepts in documents
Identify words that separate different categories, such as policy categories
Measure how often the categories apply in the text
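A minimal sketch of a dictionary lookup that reports the percentage of words matching each key. The keys and word lists are a made-up toy dictionary, not taken from LIWC or any published coding scheme, and each word maps to exactly one key (the exclusive case).

```python
from collections import Counter

def dictionary_shares(tokens, dictionary):
    """Percentage of tokens in the text that match each dictionary key."""
    # Invert key -> words into word -> key (assumes each word belongs to one key)
    lookup = {word: key for key, words in dictionary.items() for word in words}
    hits = Counter(lookup[tok.lower()] for tok in tokens if tok.lower() in lookup)
    n = len(tokens)
    return {key: (100 * hits[key] / n if n else 0.0) for key in dictionary}

# Hypothetical two-category dictionary
toy_dictionary = {
    "economy": {"tax", "jobs", "growth"},
    "environment": {"climate", "emissions"},
}
tokens = "lower tax means more jobs and growth despite climate worries".split()
shares = dictionary_shares(tokens, toy_dictionary)
print(shares)
```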
Supervised classification
Manually code a subset of the data
Use a supervised classifier to learn the relation between the words and the labels/categories.
Infer labels for the rest of the dataset.
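The three steps can be sketched with a small multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is only one of many possible supervised classifiers; the function names, labels and toy documents below are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled_docs):
    """Learn word counts per label from (tokens, label) pairs, i.e. the hand-coded subset."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-probability for an uncoded document."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # prior from label frequencies
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Step 1: manually coded subset (made up for illustration)
coded = [
    ("cut taxes create jobs".split(), "economy"),
    ("jobs growth and wages".split(), "economy"),
    ("reduce emissions protect climate".split(), "environment"),
    ("climate change and clean energy".split(), "environment"),
]
# Step 2: learn the relation between words and labels
model = train_naive_bayes(coded)
# Step 3: infer labels for the rest of the data
uncoded = ["more jobs and higher wages".split(), "clean energy for the climate".split()]
predicted = [predict(model, doc) for doc in uncoded]
print(predicted)
```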
Unsupervised classification
Discover the main themes/topics in an unstructured corpus
- Infer hidden variables
Structural topic model
How are document-level covariates associated with the prevalence of each topic?