cadc Flashcards
phonetics
sounds that people use in language
phonology
systems of sounds in particular languages
morphology
how words are formed
syntax
how sentences are formed from words
semantics
what sentences mean
pragmatics
how language is used in context
tokenization
splitting an input text into pieces (tokens) that correspond to a chosen token type, such as words, sentences, or characters
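Example (a minimal sketch, not a production tokenizer): the function below takes an input and a token type and splits the input accordingly. The name `tokenize`, the three token types, and the regular expressions are assumptions made for illustration.

```python
import re

def tokenize(text, token_type="word"):
    """Split text into pieces of the requested token type (illustrative only)."""
    if token_type == "word":
        # Runs of letters, digits, or apostrophes count as one word token.
        return re.findall(r"[A-Za-z0-9']+", text.lower())
    if token_type == "sentence":
        # Split at whitespace that follows ., !, or ?
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if token_type == "character":
        return list(text)
    raise ValueError(f"unknown token type: {token_type}")

text = "Language is used in context. Context matters!"
words = tokenize(text, "word")
sentences = tokenize(text, "sentence")
print(words)
print(sentences)
```

The same input yields seven word tokens or two sentence tokens, depending on the token type chosen.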
sparsity
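when most entries in the data are zeros, e.g., in a document-feature matrix where each document uses only a small part of the vocabulary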
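Example (a sketch with made-up counts): sparsity can be expressed as the share of zero cells in a matrix. The helper name `sparsity` and the small matrix are hypothetical.

```python
def sparsity(matrix):
    """Share of cells in a matrix (a list of rows) that are zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)

# Toy matrix: 3 documents (rows) by 4 features (columns), invented counts.
dfm = [
    [2, 0, 0, 1],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
]
share_zero = sparsity(dfm)
print(round(share_zero, 2))  # 8 of 12 cells are zero
```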
accuracy
share of correct classifications overall
precision
probability that a positively coded document is relevant
recall
probability that a relevant document is coded positively
F1-Score
harmonic mean of precision and recall
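Example (a minimal sketch for binary labels, where 1 = relevant / coded positively): the four measures above computed from true and predicted labels. The function name `classification_metrics` and the label vectors are invented for illustration; F1 is computed as the harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical gold labels
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # hypothetical classifier output
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here 2 of 3 positively coded documents are relevant (precision 2/3), and 2 of 4 relevant documents are coded positively (recall 1/2), which gives F1 = 4/7.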
supervised
learning from labeled data: train the algorithm on labeled examples, then use it to classify new data
unsupervised
no labeled data: the algorithm finds the structure (e.g., groups) in the data on its own
independent variables
input features
dependent variables
output class
overfitting
when an algorithm predicts the training data almost perfectly but does not generalize to new data
computational social science
field of social science that uses algorithmic tools and large/unstructured data to understand human and social behavior
text analysis
a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use
feature creation
breaking down text into the features that we want to analyze
feature transformation
involves text cleaning such as stopword removal
feature selection
e.g., frequency trimming (dropping features that occur too rarely or too often)
creation of structured data
e.g., a document-feature matrix (DFM), also called a document-term matrix (DTM)
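Example (a sketch of the four steps above on a toy corpus): feature creation, feature transformation, feature selection, and creation of structured data. The documents, the tiny stopword list, and the `MIN_DOCS` threshold are all made up for illustration; real projects usually rely on a text-analysis library instead.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "is", "and", "in"}  # tiny illustrative list
MIN_DOCS = 2  # hypothetical trimming threshold: keep features in >= 2 documents

docs = [
    "The cat sat in the hat.",
    "A cat and a dog.",
    "The dog is in the garden.",
]

# Feature creation: break each document into lowercase word tokens.
tokens = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]

# Feature transformation: clean the text by removing stopwords.
tokens = [[t for t in doc if t not in STOPWORDS] for doc in tokens]

# Feature selection: frequency trimming by document frequency.
doc_freq = Counter(t for doc in tokens for t in set(doc))
vocab = sorted(t for t, n in doc_freq.items() if n >= MIN_DOCS)

# Structured data: one row per document, one column per retained feature.
dfm = [[Counter(doc)[t] for t in vocab] for doc in tokens]

print(vocab)
for row in dfm:
    print(row)
```

Only "cat" and "dog" survive trimming, so the resulting matrix has three rows and two columns.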