lecture 4 Flashcards
sources of bias
- selection phase (influences data)
- annotation (influences data)
- input representation: how language is encoded and fed to models
- models
- research design
importance of data
- datasets form the basis of model training, evaluation, and benchmarking
- the ways in which we collect/construct/share these datasets inform the kinds of problems the field pursues and the methods explored in algorithm development
- good-quality data helps ensure models perform well, are fair, and generalize across various contexts
text classification
corpora help us with text classification
goal: assign a label or category to a specific piece of text
why use text classification
- categorize language at word, sentence, and document level
- predict future outcomes
- find patterns
sentiment analysis
goal: predict the sentiment expressed in a piece of text (+, -, scale rating)
why is sentiment analysis hard
- sentiment is a measure of a speaker’s private state, which is unobservable
- sometimes words are a good indicator of sentiment, but many times it requires deep world + contextual knowledge
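to see why word counting alone falls short, here is a minimal sketch (not from the lecture) of a naive lexicon-based scorer; the tiny word lists are made-up stand-ins for a real sentiment lexicon, and the second example shows it mishandling negation:

```python
# Minimal sketch of a naive word-counting sentiment scorer.
# The word lists below are hypothetical, not a real lexicon.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}

def naive_sentiment(text):
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "+" if score > 0 else "-" if score < 0 else "neutral"

print(naive_sentiment("a great movie"))     # '+' (correct)
print(naive_sentiment("not great at all"))  # '+' (wrong: negation requires context)
```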
other text classification problems
- language identification: which language the text is in
- spam classification
- authorship attribution
- genre classification
- sentiment analysis: understanding public opinion
questions when building a sentiment classifier
- what is the input for each prediction (e.g., sentence, text, etc.)
- what are the possible outputs (e.g., +, -, scale)
- how will the model decide (model decision mechanism) –> requires substantial data
- how to measure effectiveness (evaluation metrics) –> requires substantial data
data-driven evaluation
choose a dataset for evaluation before you build a system
why is data-driven evaluation important
- controlled experimentation
- benchmarks: serve as reference points to evaluate the performance of a system
- your intuitions about inputs are probably wrong
where to get a corpus
- many corpora are prepared specifically for linguistic/NLP research, with text obtained from providers
- collect a new one by scraping websites
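a minimal sketch of the scraping route, assuming the requests and beautifulsoup4 libraries; the URL is a placeholder, and real sites need site-specific selectors (and a check of their terms of service before scraping):

```python
# Sketch of collecting raw text from a web page; the URL is hypothetical.
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/reviews")  # placeholder page of reviews
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
# Pull visible paragraph text; a real corpus pipeline would also
# deduplicate, clean markup remnants, and record provenance metadata
texts = [p.get_text(strip=True) for p in soup.find_all("p")]
```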
gold labels
annotations used to evaluate and compare sentiment analyzers
these can be
1. derived automatically from the original data artifact (metadata such as star ratings)
2. added by a human annotator who reads the text (raising the question of how annotators decide on labels and how to resolve disagreements between them)
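a sketch of option 1, deriving gold labels from star-rating metadata; the 1-5 scale and the cutoffs below are assumptions for illustration, not from the lecture:

```python
# Sketch: map star-rating metadata to sentiment gold labels.
# The 1-5 scale and thresholds are assumed, not prescribed.
def rating_to_label(stars):
    if stars >= 4:
        return "+"   # 4-5 stars -> positive
    if stars <= 2:
        return "-"   # 1-2 stars -> negative
    return None      # 3 stars is ambiguous; often discarded

reviews = [("loved it", 5), ("it was okay", 3), ("waste of money", 1)]
gold = [(text, rating_to_label(stars))
        for text, stars in reviews
        if rating_to_label(stars) is not None]
# -> [('loved it', '+'), ('waste of money', '-')]
```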
sentiment analysis training data
(X, Y) pairs used to learn a classifier h(X) that predicts Y
–> (input, output)
–> relies heavily on accurately labeled data
–> this is text classification
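a minimal sketch of learning h(X) from labeled (X, Y) pairs; the use of scikit-learn and the toy data are my choices for illustration, not prescribed by the lecture:

```python
# Sketch: fit a text classifier h on toy (X, Y) pairs with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X = ["loved it", "terrible plot", "great acting", "boring and slow"]
Y = ["+", "-", "+", "-"]

h = make_pipeline(CountVectorizer(), LogisticRegression())
h.fit(X, Y)                       # learn h from the (input, output) pairs
print(h.predict(["great plot"]))  # ['+']
```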
accuracy
- #correct / #total
- simplest measure
- not a good measure under class imbalance: a classifier that always predicts the majority class looks accurate but is useless in practice (see the sketch below)
- doesn't show the quality of predictions per class
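a sketch of the imbalance failure mode with made-up labels: a majority-class classifier scores 95% accuracy while never finding a positive:

```python
# Toy demonstration: high accuracy despite a useless classifier.
gold = ["-"] * 95 + ["+"] * 5   # 95% of examples are negative
pred = ["-"] * 100              # always predict the majority class

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy)  # 0.95 -- looks strong, yet no positive is ever found
```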
confusion matrix
- gives more detailed insight into classification
- used to compute precision, recall, and F1 score
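a sketch of the four cells of a binary confusion matrix computed from toy predictions, and the metrics derived from them; the gold/predicted labels are made up:

```python
# Sketch: binary confusion matrix cells and derived metrics on toy data.
gold = ["+", "+", "+", "-", "-", "-", "-", "-"]
pred = ["+", "-", "+", "+", "-", "-", "-", "-"]

tp = sum(g == "+" and p == "+" for g, p in zip(gold, pred))  # true positives: 2
fp = sum(g == "-" and p == "+" for g, p in zip(gold, pred))  # false positives: 1
fn = sum(g == "+" and p == "-" for g, p in zip(gold, pred))  # false negatives: 1
tn = sum(g == "-" and p == "-" for g, p in zip(gold, pred))  # true negatives: 4

precision = tp / (tp + fp)                          # 2/3
recall = tp / (tp + fn)                             # 2/3
f1 = 2 * precision * recall / (precision + recall)  # 2/3
```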
precision
- accuracy of positive predictions (when the model predicts positive, how often is it correct?)
- TP / (TP + FP)
- measure of quality
precise model
might not find all positives, but the ones it does classify as positive are very likely to be correct