Quantitative and computational text analysis Flashcards
What is quantitative text analysis?
Converting text into numerical values and using statistical analysis to identify patterns, trends, and relationships within text
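A minimal sketch of "converting text into numerical values": the snippet builds a document-term matrix (bag-of-words counts), one common numerical representation of text. The tokenizer rule and the example sentences are illustrative assumptions, not part of the original cards.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a simplifying assumption;
    # real pipelines also decide on stemming, stopwords, numbers, etc.)
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] = count of vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The tax cut helps workers.", "Workers want a tax rise, not a cut."]
vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Each row is a document and each column a word, so standard statistical tools can then be applied to the counts.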
What is computational text analysis?
Using automated and semi-automated computational techniques to process, analyse, and interpret textual data
Descriptive methods
Word clouds, descriptive statistics, KWIC, KEYNESS, lexical dispersion plots
KWIC
Key Word In Context: lists every occurrence of a keyword together with its surrounding words, identifying the source text and the keyword's index position within that text; shows in which contexts key words appear
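A minimal KWIC sketch, assuming a fixed window of three words on each side and a simple regex tokenizer; the document names and texts are hypothetical. Each hit records the source text, the keyword's (zero-based) token index, and its left and right context.

```python
import re


def kwic(docs, keyword, window=3):
    """Return (doc_name, index, left, keyword, right) for each keyword hit."""
    hits = []
    for name, text in docs.items():
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                # up to `window` tokens before and after the hit
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((name, i, left, tok, right))
    return hits


docs = {
    "speech_a": "We will protect jobs and create new jobs for everyone.",
    "speech_b": "Jobs matter more than slogans.",
}
hits = kwic(docs, "jobs")
for name, idx, left, kw, right in hits:
    print(f"[{name}, {idx}] {left:>20} | {kw} | {right}")
```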
KEYNESS
Compares the differential association of key words with a target group and a reference group, to identify the words that distinguish the two groups
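One common keyness measure is a chi-squared statistic on a 2x2 table (word vs all other words, target vs reference). The sketch below signs it so that positive scores mark words typical of the target group; it omits any continuity correction, and other measures (e.g. log-likelihood) are also used. The example token lists are hypothetical.

```python
from collections import Counter


def keyness(target_tokens, reference_tokens):
    """Signed chi-squared keyness per word (positive = more typical of target)."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    t_total, r_total = sum(t.values()), sum(r.values())
    n = t_total + r_total
    scores = {}
    for word in set(t) | set(r):
        a, b = t[word], r[word]            # word counts in target / reference
        c, d = t_total - a, r_total - b    # counts of all other words
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # sign by whether the word's relative frequency is higher in the target
        scores[word] = chi2 if a / t_total >= b / r_total else -chi2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


target = "tax tax cut growth jobs".split()
reference = "climate climate green jobs growth".split()
result = keyness(target, reference)
for word, score in result:
    print(f"{word:>8} {score:+.2f}")
```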
LEXICAL DISPERSION PLOT
VISUALISE the occurrences of particular terms throughout a text: shows not only how often a term is used but also where in the text (e.g. where in a speech) it appears
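The data behind a lexical dispersion plot is just the token position of each occurrence. This sketch, under the assumption of a regex tokenizer and a hypothetical example speech, records those positions and draws a dependency-free text strip (one row per term, "|" where the term occurs); in practice a plotting library would draw the same information.

```python
import re


def dispersion(text, terms, width=40):
    """Map each term to its token positions and a strip marking where it occurs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    result = {}
    for term in terms:
        positions = [i for i, tok in enumerate(tokens) if tok == term]
        strip = ["-"] * width
        for i in positions:
            strip[i * width // n] = "|"   # scale token index to a strip column
        result[term] = (positions, "".join(strip))
    return result


text = ("Freedom matters. We defend freedom at home and abroad. "
        "Security first, security always, and freedom for all.")
result = dispersion(text, ["freedom", "security"])
for term, (positions, strip) in result.items():
    print(f"{term:>9} {strip}  {positions}")
```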
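Lexical diversity vs density
Diversity = a measure of how many different words are used in a text (e.g. the type-token ratio: unique words / total words)
Density = the proportion of lexical (content) words, such as nouns, verbs, adjectives, and adverbs, among all words in a text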
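A minimal sketch of both measures. Diversity is computed as the type-token ratio. For density, the set of function words below is a hypothetical, deliberately tiny stand-in; a real analysis would identify content words with part-of-speech tagging.

```python
import re

# Hypothetical, tiny list of function (grammatical) words; everything else
# is treated as a lexical (content) word. Real analyses would use POS tags.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
                  "are", "was", "it", "we", "they", "he", "she", "that", "this", "for"}


def tokens_of(text):
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Lexical diversity: distinct word types / total tokens."""
    toks = tokens_of(text)
    return len(set(toks)) / len(toks) if toks else 0.0


def lexical_density(text):
    """Lexical density: share of tokens that are content (lexical) words."""
    toks = tokens_of(text)
    content = [t for t in toks if t not in FUNCTION_WORDS]
    return len(content) / len(toks) if toks else 0.0


text = "The cat sat on the mat and the dog sat on the rug"
ttr = type_token_ratio(text)
density = lexical_density(text)
print(f"diversity (TTR) = {ttr:.3f}, density = {density:.3f}")
```

The type-token ratio falls as texts get longer, so it is best compared across texts of similar length.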
WHEN do we use descriptive statistics for text?
- when presenting the characteristics of a corpus
- for exploratory analysis
- when comparing different texts
- when measuring the frequency of certain concepts or evaluating changes over time
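For tracking a concept over time, raw counts mislead when some years contain more text than others, so a common descriptive choice is a relative frequency. This sketch, assuming hypothetical (year, text) pairs and a single-word concept, reports occurrences per 1,000 tokens per year.

```python
import re
from collections import defaultdict


def relative_frequency_by_year(docs, term):
    """Occurrences of `term` per 1,000 tokens, aggregated by year."""
    hits, totals = defaultdict(int), defaultdict(int)
    for year, text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())
        hits[year] += tokens.count(term)
        totals[year] += len(tokens)
    return {year: 1000 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


docs = [
    (2019, "The economy is strong and the economy grows"),
    (2020, "Health comes first in a pandemic"),
    (2020, "The economy must recover"),
]
trend = relative_frequency_by_year(docs, "economy")
print(trend)
```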
DICTIONARY METHODS
- group words that hold similar meanings into categories; a dictionary can contain multiple categories
- predefine the words associated with specific meanings
- two components: KEYS = labels for the categories
VALUES = the multiple features (words or patterns) associated with each category
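The key/value structure maps naturally onto a Python dict. In this sketch the categories and patterns are hypothetical; values may use `*` wildcards (matched here with the standard-library `fnmatch`), and a token is counted in every category whose patterns it matches.

```python
import re
from collections import Counter
from fnmatch import fnmatch

# Hypothetical dictionary: KEYS are category labels, VALUES are the word
# patterns (with * wildcards) that count as evidence for that category.
DICTIONARY = {
    "economy": ["econom*", "tax*", "job*", "inflation"],
    "environment": ["climate", "green*", "emission*", "pollut*"],
}


def dictionary_counts(text, dictionary):
    """Count how many tokens in `text` match each category's patterns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter({key: 0 for key in dictionary})
    for tok in tokens:
        for key, patterns in dictionary.items():
            if any(fnmatch(tok, p) for p in patterns):
                counts[key] += 1
    return counts


counts = dictionary_counts(
    "Green jobs cut emissions and grow the economy without new taxes.", DICTIONARY)
print(dict(counts))
```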
What do we use dictionary methods for?
- to measure concept prevalence in text
- to measure the extent to which documents belong to certain categories
- to classify documents into categories, using the matched features to measure similarity
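A sketch of prevalence and classification with a dictionary, assuming a hypothetical two-category dictionary with exact-word matching: each category's prevalence is its share of a document's tokens, and the document is assigned to the category with the highest share (left unclassified if nothing matches).

```python
import re

# Hypothetical two-category dictionary (keys -> values), exact-word matching.
DICTIONARY = {
    "positive": {"good", "great", "improve", "success", "hope"},
    "negative": {"bad", "crisis", "fail", "decline", "fear"},
}


def score_document(text, dictionary):
    """Return each category's share of tokens and the best-matching category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1                      # avoid division by zero
    shares = {key: sum(tok in words for tok in tokens) / n
              for key, words in dictionary.items()}
    best = max(shares, key=shares.get)       # ties go to the first key listed
    # Leave the document unclassified if no dictionary word matched at all.
    label = best if shares[best] > 0 else None
    return shares, label


docs = {
    "doc1": "The crisis will fail us and fear will spread",
    "doc2": "Great success and hope for a good future",
    "doc3": "The meeting starts at noon",
}
results = {name: score_document(text, DICTIONARY) for name, text in docs.items()}
for name, (shares, label) in results.items():
    print(name, label, shares)
```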
Computational text analysis methods
MACHINE LEARNING - supervised classification: manually code (label) a subset of the data, train a model on it, then use the model to classify the remaining documents
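The card does not name a classifier; as an assumption, this sketch uses multinomial naive Bayes with add-one smoothing, one common choice for text. The hand-coded training pairs stand in for the "manually coded subset" and are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(labelled_docs):
    """Fit multinomial naive Bayes on hand-coded (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        class_docs[label] += 1
        toks = tokenize(text)
        word_counts[label].update(toks)
        vocab.update(toks)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


coded = [
    ("cut taxes and boost growth", "economy"),
    ("jobs and wages must rise", "economy"),
    ("protect forests and cut emissions", "environment"),
    ("climate action for clean air", "environment"),
]
model = train_naive_bayes(coded)
print(predict(model, "new jobs and lower taxes"))
print(predict(model, "clean air and fewer emissions"))
```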
Unsupervised classification
Discovers the main themes in an unstructured corpus
- organises the collection according to themes; requires no human annotation or prior information
- the researcher needs to specify the number of topics in advance
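Structural topic model
- allows arbitrary document-level covariates to be included in the generative model
- adding covariates provides structure: covariates can shape topic prevalence (how much a document discusses each topic) and topic content (which words express a topic)
- topics within an STM can be correlated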
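A sketch of the topic-prevalence part of an STM-style generative model only: a document's topic logits are drawn from a normal distribution whose mean depends on its covariates, and whose covariance Σ lets topics be correlated; topic proportions are the softmax of the logits with the last one fixed at 0. Everything here is an assumption for illustration: K = 3 topics, one hypothetical binary covariate (`party`), made-up Γ and Σ values, and no estimation or content covariates. In practice STMs are usually fitted with the R `stm` package.

```python
import math
import random


def sample_topic_proportions(x, gamma, sigma, rng):
    """Draw theta for one document (K = 3): eta ~ N(x @ gamma, sigma), theta = softmax([eta, 0])."""
    # mean of the K-1 = 2 free logits depends on the document's covariates x
    mean = [sum(x[p] * gamma[p][j] for p in range(len(x))) for j in range(len(gamma[0]))]
    # 2x2 Cholesky factor, so the two logits are correlated through sigma
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    eta = [mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2, 0.0]  # last logit fixed at 0
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]   # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


GAMMA = [[0.0, 0.0],    # intercept row
         [1.5, -1.0]]   # hypothetical `party` effect on topics 0 and 1
SIGMA = [[1.0, 0.6],
         [0.6, 1.0]]
rng = random.Random(42)


def mean_theta(party, draws=2000):
    thetas = [sample_topic_proportions([1.0, party], GAMMA, SIGMA, rng) for _ in range(draws)]
    return [sum(t[j] for t in thetas) / draws for j in range(3)]


avg0, avg1 = mean_theta(0.0), mean_theta(1.0)
print("party = 0:", [round(v, 3) for v in avg0])
print("party = 1:", [round(v, 3) for v in avg1])
```

Changing the covariate shifts expected topic prevalence, which is the "structure" covariates add.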