Human and sentiment Flashcards

1
Q

Tokenization

A

The process of splitting text into meaningful elements (tokens) is called tokenization.

(e.g., splitting a string into a list of words)
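A minimal tokenization sketch in Python. It assumes a simple regular-expression rule (runs of word characters, or single punctuation marks) is good enough. The function name `tokenize` is hypothetical, and real tokenizers handle contractions, URLs and emoji far more carefully.

```python
import re

def tokenize(text: str) -> list[str]:
    """Naive regex tokenizer: word runs and single punctuation marks become tokens."""
    # \w+ matches runs of letters/digits/underscore; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Brexit was happening, it seemed."))
# ['Brexit', 'was', 'happening', ',', 'it', 'seemed', '.']
```

Unlike a plain `str.split()`, which would return `'happening,'` and `'seemed.'`, this keeps punctuation as separate tokens.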

2
Q

Lemmatization

A

Converting each token into a representative lemma (its dictionary form). For example, ‘go’ is the English lemma for words such as ‘gone’, ‘going’, and ‘went’.
(The root of the word)
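A dictionary-lookup sketch of lemmatization. The `LEMMAS` table and the `lemmatize` function are hypothetical and tiny; real lemmatizers (e.g. WordNet-based ones) combine a large dictionary with part-of-speech information and morphological rules.

```python
# Hypothetical hand-written lemma table (illustration only)
LEMMAS = {
    "gone": "go",
    "going": "go",
    "went": "go",
    "goes": "go",
    "students": "student",
    "was": "be",
    "were": "be",
}

def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its lemma, falling back to the lower-cased token itself."""
    return [LEMMAS.get(tok.lower(), tok.lower()) for tok in tokens]

print(lemmatize(["She", "went", "going", "gone"]))
# ['she', 'go', 'go', 'go']
```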

3
Q

Part of speech (POS): the syntactic category of a word (noun, verb, adjective, etc.)

A

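Example of a POS-tagged tweet in word/TAG notation (Penn Treebank tags, e.g. NN = noun, VBD = past-tense verb, PRP$ = possessive pronoun): rt/NN polakpolly/RB yesterday/NN i/FW had/VBD to/TO teach/VB my/PRP$ students/NNS in/IN under/IN hours/NNS what/WP the/DT eu/NN was/VBD and/CC why/WRB brexit/NN was/VBD happening/VBG it/PRP seemed/VBD like/IN an/DT im/NN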
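The tagged example uses word/TAG notation. A small sketch for reading such tagger output back into (word, tag) pairs; the function name `parse_tagged` is hypothetical. It splits on the last `/` so that tags such as `PRP$` stay intact.

```python
def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for tok in text.split():
        word, tag = tok.rsplit("/", 1)  # split on the LAST slash only
        pairs.append((word, tag))
    return pairs

pairs = parse_tagged("yesterday/NN i/FW had/VBD to/TO teach/VB my/PRP$ students/NNS")
print(pairs[:3])  # [('yesterday', 'NN'), ('i', 'FW'), ('had', 'VBD')]

# Penn Treebank verb tags all start with "VB" (VB, VBD, VBG, ...)
verbs = [word for word, tag in pairs if tag.startswith("VB")]
print(verbs)  # ['had', 'teach']
```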

4
Q

Coreference

A

Two or more expressions in a text that refer to the same entity (e.g., ‘Maria’ and ‘she’ in ‘Maria said she would come’).

5
Q

tf–idf

A

In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
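A minimal tf–idf sketch over already-tokenized documents. It assumes raw counts for term frequency and idf = log(N / df). This is only one common variant; libraries often add smoothing or normalization. The function name `tf_idf` and the toy documents are illustrative.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    # df: number of documents in which each term appears at least once
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

docs = [
    ["the", "eu", "and", "brexit"],
    ["the", "students", "and", "the", "teacher"],
    ["the", "eu", "vote"],
]
w = tf_idf(docs)
print(w[0]["the"])               # 0.0, "the" occurs in every document
print(round(w[0]["brexit"], 3))  # 1.099, i.e. log(3/1): rare term, high weight
```

A word that appears everywhere gets weight zero, while a word concentrated in one document is weighted up, which is the "importance" the definition refers to.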

6
Q

Notation of tf–idf

A

https://monkeylearn.com/static/dc103a13ad766591be11bca8774dfc02/e3135/image3.png
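One common way of writing the tf–idf weight of term i in document j. It may differ in detail from the linked image. Here tf_{i,j} is the number of occurrences of term i in document j, df_i is the number of documents containing term i, and N is the total number of documents in the corpus.

```latex
w_{i,j} = \mathrm{tf}_{i,j} \times \log\left(\frac{N}{\mathrm{df}_i}\right)
```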

7
Q

Information retrieval

A

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections.
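A toy IR sketch built on an inverted index, the usual IR data structure mapping each word to the documents that contain it, with a Boolean AND query. The names `build_index` and `search` are hypothetical. A real system would add proper tokenization and ranking (e.g. by tf–idf).

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query word (Boolean AND)."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "Brexit and the EU",
    "d2": "teaching students about the EU",
    "d3": "football results",
}
index = build_index(docs)
print(search(index, "the EU"))     # {'d1', 'd2'} (set order may vary)
print(search(index, "EU brexit"))  # {'d1'}
```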

8
Q

The bag-of-words model

A

A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things:

A vocabulary of known words.
A measure of the presence of known words.
It is called a “bag” of words because any information about the order or structure of words in the document is discarded. The model is only concerned with whether known words occur in the document, not where in the document.
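A minimal bag-of-words sketch covering the two ingredients listed above: a vocabulary of known words and a measure of their presence (here, raw counts). The function name `bag_of_words` is illustrative. The example shows that two sentences with different word order get identical vectors.

```python
def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a vocabulary and one count vector per tokenized document."""
    # 1. Vocabulary of known words (sorted for a stable column order)
    vocab = sorted({word for doc in docs for word in doc})
    position = {word: i for i, word in enumerate(vocab)}
    # 2. Measure of presence: raw count of each vocabulary word
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word in doc:
            vec[position[word]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vectors = bag_of_words([
    ["the", "dog", "bit", "the", "man"],
    ["the", "man", "bit", "the", "dog"],
])
print(vocab)    # ['bit', 'dog', 'man', 'the']
print(vectors)  # [[1, 1, 1, 2], [1, 1, 1, 2]]
```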

9
Q

Stylometric analysis

A

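Characterising writing style through measurable features (e.g., word length, sentence length, vocabulary richness, frequencies of function words), often used for authorship attribution.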
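A sketch of a few simple stylometric features. The selection of features and the name `style_features` are illustrative assumptions, not a standard feature set. The function assumes the text contains at least one word and one sentence.

```python
import re

def style_features(text: str) -> dict[str, float]:
    """Compute average word length, average sentence length and type-token ratio."""
    # Sentences: split on runs of . ! ? and drop empty pieces
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: lower-cased runs of letters and apostrophes
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        # distinct words / total words: a rough measure of vocabulary richness
        "type_token_ratio": len(set(words)) / len(words),
    }

feats = style_features("I came. I saw. I conquered.")
print(feats["avg_sentence_length"])         # 2.0
print(round(feats["type_token_ratio"], 3))  # 0.667
```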

10
Q

Topic models

A

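Statistical models that discover the abstract topics in a collection of documents, where each topic is a group of words that tend to occur together (e.g., Latent Dirichlet Allocation, LDA)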
