CL mini quiz Flashcards
1
Q
Wordlist (3)
A
- The top words (per million) are dominated by high-frequency grammatical words e.g. X
- Top content words are …
- Anything in common?
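A minimal sketch of how a frequency wordlist normalised per million tokens could be produced. The tokenizer (a simple lowercase regex), the function name `wordlist_per_million` and the sample sentence are illustrative assumptions, not the tool used in the course.

```python
from collections import Counter
import re

def wordlist_per_million(text, top=10):
    """Return (word, raw count, frequency per million tokens) for the top words."""
    # Assumption: a crude tokenizer that keeps letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return []
    counts = Counter(tokens)
    return [(w, c, c / total * 1_000_000) for w, c in counts.most_common(top)]

# Hypothetical toy sample; a real corpus would be far larger.
sample = "the cat sat on the mat and the dog sat on the rug"
for word, raw, per_million in wordlist_per_million(sample, top=3):
    print(f"{word}\t{raw}\t{per_million:.0f}")
```

Even in this toy sample the grammatical word "the" comes out on top, which is the pattern the first bullet describes.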
2
Q
Concordance (3)
A
- Do any high-frequency content words have different nuances of meaning?
- Semantic prosody: positive or negative
- Any recurrent phraseological patterns or collocates?
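A rough sketch of a key-word-in-context (KWIC) concordance, the kind of view used to inspect nuances of meaning, semantic prosody and collocates. The function name `kwic`, the `width` parameter and the sample text are assumptions made for illustration.

```python
import re

def kwic(text, node, width=3):
    """Return (left context, node word, right context) for each hit of `node`."""
    # Assumption: a crude tokenizer that keeps letters and apostrophes only.
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, invented for this example.
sample = "The plan caused serious problems. The storm caused widespread damage."
for left, node, right in kwic(sample, "caused", width=2):
    print(f"{left:>20}  {node}  {right}")
```

Reading down the right-hand context ("serious problems", "widespread damage") is how a negative semantic prosody would show up in the concordance lines.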
3
Q
N-grams (2)
A
- Higher-order n-grams are more useful, e.g. 3-grams, 4-grams, 5-grams (frequency cut-off?)
- The n-grams ordered by frequency … contain information about
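One possible way to extract n-grams ordered by frequency with a minimum frequency cut-off, sketched with the standard library only. The function name `ngrams`, the `min_freq` parameter and the sample sentence are assumptions for illustration.

```python
from collections import Counter
import re

def ngrams(text, n=3, min_freq=2):
    """Return (n-gram, count) pairs at or above `min_freq`, most frequent first."""
    # Assumption: a crude tokenizer that keeps letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of length n over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(g) for g in grams)
    return [(g, c) for g, c in counts.most_common() if c >= min_freq]

# Hypothetical sample text, invented for this example.
sample = "at the end of the day it was at the end of the road"
for gram, count in ngrams(sample, n=3, min_freq=2):
    print(count, gram)
```

Raising `n` to 4 or 5 and adjusting `min_freq` is how the cut-off question in the card could be explored.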
4
Q
Keywords (4)
A
- Using the X corpus as a reference corpus, and a minimum frequency cut-off point of X, the keywords were as follows…
- Keywords accounted for by various word classes e.g.
- Following generation of a keyword dispersion plot, we can see whether keywords are general features of the text (global) or concentrated at particular points (local)
- Do any keywords “share collocational space”?
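A hedged sketch of keyword extraction against a reference corpus with a minimum frequency cut-off. The card does not name a keyness statistic; log-likelihood is assumed here because it is a commonly used one. The function names and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness: a, b = word frequency in target / reference;
    c, d = total tokens in target / reference."""
    e1 = c * (a + b) / (c + d)  # expected frequency in target
    e2 = d * (a + b) / (c + d)  # expected frequency in reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, min_freq=2):
    """Return positive keywords (overused in the target), highest keyness first."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in t.items():
        b = r[word]
        # Keep words above the cut-off that are relatively more frequent in the target.
        if a >= min_freq and a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpora, invented for this example.
target = ["tax"] * 5 + ["the"] * 5
reference = ["the"] * 10
print(keywords(target, reference))
```

In the toy data only "tax" is returned, since "the" is relatively less frequent in the target than in the reference.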
5
Q
POS tagging (3)
A
- Ideally should check manually, but time did not allow.
- The POS tags in order of keyness are as follows:
- Are the grammatical categories dominated by a single item?
6
Q
Semantic tagging (4)
A
- The most key semantic categories are as follows.
- Are the top categories semantically linked?
- Are the categories what we might expect? Are any unpredictable? Note that the tagger does not distinguish metaphorical uses.
- Semantic bin
7
Q
Overall linguistic/genre-specific nature of style and rhetoric (4)
A
- Mode: spoken/written
- Domain or genre: academic, political, promotional, legal
- Style: non-content keywords
- Producer’s intentions and intended audience