Human and sentiment Flashcards
Tokenization
“The process of splitting text into meaningful elements is called tokenization.”
(e.g. splitting strings into lists of tokens)
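A minimal sketch in Python of a regex-based tokenizer (the pattern here is just one illustrative choice, not a standard):

    import re

    def tokenize(text):
        # Match runs of word characters, or single punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    print(tokenize("Brexit was happening, it seemed."))
    # ['brexit', 'was', 'happening', ',', 'it', 'seemed', '.']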
Lemmatization
“Converting each token into a representative lemma. For example, ‘go’ is the English lemma for words such as ‘gone’, ‘going’, and ‘went’.”
(The root of the word)
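A small sketch using NLTK's WordNetLemmatizer (assumes the WordNet data has been fetched with nltk.download('wordnet')):

    from nltk.stem import WordNetLemmatizer  # needs nltk.download('wordnet')

    lemmatizer = WordNetLemmatizer()
    # Lemmatize as verbs (pos='v') so the inflected forms map to the base form 'go'.
    for word in ("gone", "going", "went"):
        print(word, "->", lemmatizer.lemmatize(word, pos="v"))
    # gone -> go
    # going -> go
    # went -> go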
Part of Speech (POS): Syntactic categories of a word
Example (a POS-tagged tweet): rt/NN polakpolly/RB yesterday/NN i/FW had/VBD to/TO teach/VB my/PRP$ students/NNS in/IN under/IN hours/NNS what/WP the/DT eu/NN was/VBD and/CC why/WRB brexit/NN was/VBD happening/VBG it/PRP seemed/VBD like/IN an/DT im/NN
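A sketch of POS tagging with NLTK's default tagger (assumes the NLTK tokenizer and tagger data packages have been downloaded; the tags it assigns may differ from the example above):

    import nltk  # needs the punkt tokenizer and averaged perceptron tagger data

    tokens = nltk.word_tokenize("why brexit was happening")
    print(nltk.pos_tag(tokens))
    # e.g. [('why', 'WRB'), ('brexit', 'NN'), ('was', 'VBD'), ('happening', 'VBG')]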
Coreference
When two or more expressions in a text refer to the same entity, e.g. in “Anna said she was late”, both ‘Anna’ and ‘she’ refer to the same person.
tf–idf
In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
Notation of tf–idf
https://monkeylearn.com/static/dc103a13ad766591be11bca8774dfc02/e3135/image3.png
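A common formulation, written in LaTeX (the linked image may show a slightly different variant); here N is the number of documents in the corpus D and tf(t, d) is the raw count of term t in document d:

    \mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D),
    \qquad
    \mathrm{idf}(t, D) = \log \frac{N}{\lvert \{\, d \in D : t \in d \,\} \rvert}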
Information retrieval
Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections.
The bag-of-words model
A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things:
A vocabulary of known words.
A measure of the presence of known words.
It is called a “bag” of words, because any information about the order or structure of words in the document is discarded. The model is only concerned with whether known words occur in the document, not where in the document.
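A minimal pure-Python sketch of the two components (the vocabulary and the word counts); the example documents are invented for illustration:

    from collections import Counter

    # Toy documents, invented for illustration.
    docs = ["the eu was the topic", "why brexit was happening"]

    # 1. Build the vocabulary of known words.
    vocabulary = sorted({word for doc in docs for word in doc.split()})

    # 2. Count how often each known word occurs in each document (order is discarded).
    vectors = [[Counter(doc.split())[word] for word in vocabulary] for doc in docs]

    print(vocabulary)  # ['brexit', 'eu', 'happening', 'the', 'topic', 'was', 'why']
    print(vectors)     # [[0, 1, 0, 2, 1, 1, 0], [1, 0, 1, 0, 0, 1, 1]]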
Stylometric analysis
Characterising an author's writing style, e.g. for authorship attribution.
Topic models
Discovering the abstract topics that occur in a collection of documents.
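A short sketch using latent Dirichlet allocation (LDA), a common topic-modelling algorithm, via scikit-learn; the toy documents are invented, and get_feature_names_out assumes scikit-learn 1.0 or newer:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Toy documents, invented for illustration.
    docs = ["the eu and brexit", "teaching students about the eu",
            "students going to class", "teaching a class about words"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)  # bag-of-words counts
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    # Print the three highest-weighted words for each of the two topics.
    words = vectorizer.get_feature_names_out()
    for topic in lda.components_:
        print([words[i] for i in topic.argsort()[-3:]])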