Chapter 1 Flashcards

Takeaways from Chapter 1

1
Q

How do you load all the book data in NLTK?

A

>>> from nltk.book import *

2
Q

How do you find out about any of the texts in NLTK?

A

Just type its name at the Python prompt, e.g. text1.

3
Q

What is a concordance?

A

An alphabetical list of the words (especially the important ones) present in a text or texts, usually with citations of the passages concerned or with the context displayed on a computer screen. Example: "a concordance to the Bible".

4
Q

What is concordance in Python?

A

.concordance() is a method of the Text class in NLTK. To use it, you first have to instantiate a Text object and then call the method on that object.

5
Q

What does a concordance permit us to do?

A

A concordance permits us to see words in context. For example, we saw that monstrous occurred in contexts such as the ___ pictures and the ___ size.
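
To make the idea concrete, here is a minimal pure-Python sketch of a concordance over a list of tokens. It is not NLTK's implementation: the function name simple_concordance, the width parameter, and the toy token list are all assumptions made for illustration.

```python
def simple_concordance(tokens, word, width=2):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]      # context before the match
            right = tokens[i + 1:i + 1 + width]     # context after the match
            lines.append(" ".join(left + [tok] + right))
    return lines


# Toy data (hypothetical), not taken from any NLTK text.
tokens = "the monstrous pictures and the monstrous size of the whale".split()
for line in simple_concordance(tokens, "monstrous"):
    print(line)
```

Each printed line shows one occurrence of the word together with its neighbours, which is the "word in context" view a concordance provides.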

6
Q

What other words appear in a similar range of contexts?

A

We can find out by appending the term similar to the name of the text in question, then inserting the relevant word in parentheses: >>> text1.similar("monstrous")

7
Q

What is common_contexts?

A

The term common_contexts allows us to examine just the contexts that are shared by two or more words, such as monstrous and very.
>>> text2.common_contexts(["monstrous", "very"])
be_glad am_glad a_pretty is_pretty a_lucky
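
A rough sketch of the idea, assuming a "context" is simply the pair of words immediately before and after an occurrence. The helper name contexts and the toy sentence are hypothetical, and NLTK's own method may normalize and rank contexts differently.

```python
def contexts(tokens, word):
    """Set of (previous, next) token pairs surrounding each occurrence of `word`."""
    return {
        (tokens[i - 1], tokens[i + 1])
        for i in range(1, len(tokens) - 1)
        if tokens[i] == word
    }


# Toy data (hypothetical).
tokens = "she is very pretty and he is monstrous pretty and am very glad".split()

# Contexts shared by both words are the intersection of the two sets.
shared = contexts(tokens, "very") & contexts(tokens, "monstrous")
print(sorted(shared))
```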

8
Q

How do you find the location of a word in a text?

A

This can be done using a dispersion plot, e.g. >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
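
A dispersion plot is drawn from the positions at which each word occurs. The sketch below only computes those positions, without plotting; the helper name positions and the toy token list are assumptions for illustration.

```python
def positions(tokens, word):
    """Indices (offsets from the start of the text) at which `word` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == word]


# Toy data (hypothetical).
tokens = "citizens value freedom and citizens defend freedom".split()
for word in ["citizens", "freedom", "duties"]:
    print(word, positions(tokens, word))
```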

9
Q

How do you generate random text?

A

The generate() method produces random text in the style of the given text, here text3: >>> text3.generate()

10
Q

Is the generate() method available in NLTK 3.0?

A

No. The generate() method is not available in NLTK 3.0, but will be reinstated in a subsequent version.

11
Q

Is the text produced by generate() truly random?

A

Although the text is random, it reuses common words and phrases from the source text and gives us a sense of its style and content.

12
Q

What is a token?

A

A token is the technical name for a sequence of characters, such as hairy, his, or :), that we want to treat as a group.

13
Q

How do you get the length of a text?

A

len(text3). We use the function len to get the length of something; applied to a text, it counts the tokens (words and punctuation symbols).

14
Q

What is vocabulary of a text?

A

The vocabulary of a text is just the set of tokens that it uses, since in a set, all duplicates are collapsed together.

15
Q

What is the Python command to find the vocabulary of a text?

A

In Python we can obtain the vocabulary items of text3 with the command: set(text3)

16
Q

What is a word type?

A

A word type is the form or spelling of the word independently of its specific occurrences in a text, that is, the word considered as a unique item of vocabulary. It is also called a distinct word.

17
Q

How do you calculate a measure of the lexical richness of a text?

A

Divide the number of distinct words by the total number of words:
>>> len(set(text3)) / len(text3)
0.06230453042623537
This example shows us that the number of distinct words is just 6% of the total number of words, or equivalently that each word is used 16 times on average.

18
Q

How do you count a particular word?

A

Use count() to find how often a word occurs, and divide by the length of the text to get the percentage of the text it takes up:
>>> text3.count("smote")
5
>>> 100 * text4.count('a') / len(text4)
1.464301643393831

19
Q

How do you create a function?

A

>>> def lexical_diversity(text):
...     return len(set(text)) / len(text)
...
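
The same function can be tried in a script on any list of tokens, with no NLTK data required. In this small sketch the toy token list and the percentage helper are illustrative assumptions, not part of the card above.

```python
def lexical_diversity(text):
    """Ratio of distinct tokens (types) to total tokens."""
    return len(set(text)) / len(text)


def percentage(count, total):
    """Hypothetical helper: express `count` as a percentage of `total`."""
    return 100 * count / total


# Toy data (hypothetical): 6 tokens, 4 distinct.
tokens = ["to", "be", "or", "not", "to", "be"]
print(lexical_diversity(tokens))
print(percentage(tokens.count("to"), len(tokens)))
```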

20
Q

What does the ... prompt mean when defining a function in Python?

A

The Python interpreter changes the prompt from >>> to ... after encountering the colon at the end of the first line. The ... prompt indicates that Python expects an indented code block to appear next. It is up to you to do the indentation, by typing four spaces or hitting the tab key. To finish the indented block just enter a blank line.

21
Q

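List operations: how do you append and concatenate?

A

Append a single item with append(), e.g. sent1.append("Some"). Concatenate two lists with the + operator, e.g. ['Monty'] + ['Python'].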

22
Q

How do you do indexing in Python? What are the two types?

A
  1. By position (get the item at a given index): >>> text4[173]
  2. By value (get the index of the first occurrence of an item): >>> text4.index('awaken')
23
Q

What is slicing?

A

Taking some part (a sublist) of a list.

24
Q

How do you do slicing?

A

1) Give a range of positions: text5[16715:16735]
2) The general form is text5[start:end].
3) By convention, m:n means elements m…n-1.
4) We can omit the first number if the slice begins at the start of the list, and we can omit the second number if the slice goes to the end, e.g. sent[:3] gives ['word1', 'word2', 'word3'].
5) text2[141525:] returns everything from position 141525 up to the last item in the list.
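
The slicing conventions above can be checked on a small list. The list sent and its contents are toy values assumed for illustration.

```python
# Toy data (hypothetical).
sent = ["word1", "word2", "word3", "word4", "word5", "word6"]

first_three = sent[:3]   # start omitted: begins at index 0, stops before index 3
middle = sent[2:4]       # m:n covers indices m through n-1, here 2 and 3
tail = sent[4:]          # end omitted: runs to the last item

print(first_three)
print(middle)
print(tail)
```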

25
Q

How do you update a list?

A

Assign to an index to replace one item, or to a slice to replace a whole range: sent[0] = 'First' and sent[1:9] = ['Second', 'Third']
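
A short sketch of both kinds of update on a toy ten-item list (the list contents are assumed for illustration). Note that assigning to a slice can change the length of the list.

```python
# Toy data (hypothetical): ['word1', ..., 'word10'].
sent = ["word" + str(i) for i in range(1, 11)]

sent[0] = "First"                # replace a single item by index
sent[1:9] = ["Second", "Third"]  # replace eight items with two

print(sent)
print(len(sent))
```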

26
Q

In sorting, what is the difference between capitalized and lowercase words?

A

Capitalized words appear before lowercase words in sorted lists.
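
This ordering follows from how Python compares strings: uppercase letters sort before lowercase ones. A minimal sketch with assumed toy words, including a case-insensitive alternative using key=str.lower:

```python
# Toy data (hypothetical).
words = ["apple", "Zebra", "banana", "Apple"]

default_order = sorted(words)                      # capitalized words come first
case_insensitive = sorted(words, key=str.lower)    # compare lowercased forms instead

print(default_order)
print(case_insensitive)
```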

27
Q

Can a Python expression be split across several lines?

A

Python expressions can be split across multiple lines, so long as this happens within any kind of brackets. Python uses the "..." prompt to indicate that more input is expected. It doesn't matter how much indentation is used in these continuation lines, but some indentation usually makes them easier to read.

28
Q

Which list operations also work on strings?

A

Some of the methods we use with lists, such as indexing and slicing, can also be used with strings. For example:
>>> name = 'Monty'
>>> name[0]
'M'
>>> name[:4]
'Mont'

29
Q

How can we do addition and multiplication with strings?

A

>>> name * 2
'MontyMonty'
>>> name + '!'
'Monty!'

31
Q

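What do the join and split methods do with strings?

A

join() combines a list of words into a single string, and split() breaks a string into a list of words:
>>> ' '.join(['Monty', 'Python'])
'Monty Python'
>>> 'Monty Python'.split()
['Monty', 'Python']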

32
Q

What is a frequency distribution?

A

A frequency distribution records how often each vocabulary item occurs in a text. It is a "distribution" because it tells us how the total number of word tokens in the text is distributed across the vocabulary items. Since we often need frequency distributions in language processing, NLTK provides built-in support for them.

33
Q

What is the function for finding a frequency distribution?

A

fdist1 = FreqDist(text1)

34
Q

What is the most_common function?

A

fdist1.most_common(50) returns the 50 most common items together with their counts.

35
Q

Which function gives the words that appear only once?

A

fdist1.hapaxes() returns the hapaxes, i.e. the words that appear only once.
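
NLTK's FreqDist builds on Python's collections.Counter, so the counting behaviour can be sketched with the standard library alone. This is an illustration on an assumed toy token list, not NLTK itself: Counter has most_common() but no hapaxes() method, so the words that occur once are filtered out by hand.

```python
from collections import Counter

# Toy data (hypothetical).
tokens = "the whale and the sea and the ship".split()

fdist = Counter(tokens)                # maps each token to its count
top_two = fdist.most_common(2)         # the two most frequent tokens with counts
hapaxes = sorted(w for w, n in fdist.items() if n == 1)  # tokens seen exactly once

print(top_two)
print(hapaxes)
```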

36
Q

How do you produce a cumulative frequency graph?

A

To see what proportion of the text is taken up by the 50 most frequent words, generate a cumulative frequency plot with fdist1.plot(50, cumulative=True).
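
The numbers behind such a plot are running totals of the counts, taken from the most frequent word downwards. A minimal sketch without any plotting, using the standard library and an assumed toy token list:

```python
from collections import Counter
from itertools import accumulate

# Toy data (hypothetical): 8 tokens.
tokens = "the whale and the sea and the ship".split()

counts = [n for _, n in Counter(tokens).most_common()]  # counts, highest first
cumulative = list(accumulate(counts))                   # running totals
proportions = [c / len(tokens) for c in cumulative]     # share of the text covered

print(cumulative)
print(proportions)
```

The last proportion is always 1.0, because all vocabulary items together account for every token in the text.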

37
Q

How do you write "the set of all w such that w is an element of V (the vocabulary) and w has property P"?

A

In set notation: {w | w ∈ V & P(w)}. In Python: [w for w in V if p(w)]. Note that the Python expression produces a list, not a set, so duplicates are possible.
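
The difference between a list and a set shows up as soon as the selected items can repeat. In this sketch the toy token list and the length test standing in for the property P are assumptions for illustration.

```python
# Toy data (hypothetical).
tokens = ["Whale", "sea", "whale", "Sea", "ship"]

# List comprehension: keeps one entry per matching token, duplicates included.
as_list = [w.lower() for w in tokens if len(w) > 3]

# Set comprehension: same condition, but duplicates collapse.
as_set = {w.lower() for w in tokens if len(w) > 3}

print(as_list)
print(sorted(as_set))
```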

38
Q

Example of using this set representation?

A

>>> V = set(text1)
>>> long_words = [w for w in V if len(w) > 15]

39
Q

Example of words from the chat corpus that are longer than seven characters and that occur more than seven times:

A

>>> fdist5 = FreqDist(text5)
>>> sorted(w for w in set(text5) if len(w) > 7 and fdist5[w] > 7)
Or, without sorting:
>>> R = [w for w in set(text5) if len(w) > 7 and fdist5[w] > 7]

40
Q

What is collocation?

A

A collocation is a sequence of words that occur together unusually often. Thus red wine is a collocation, whereas the wine is not.
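
A first step towards finding collocations is counting adjacent word pairs (bigrams). The sketch below does only that, on an assumed toy token list. Raw counts are not enough to decide what is "unusually often": a real collocation finder, NLTK's included, also weighs how frequent the individual words are, which this sketch does not attempt.

```python
from collections import Counter

# Toy data (hypothetical).
tokens = "red wine and red wine with the wine".split()

# Pair each token with its successor to form bigrams, then count them.
bigram_counts = Counter(zip(tokens, tokens[1:]))

print(bigram_counts.most_common(1))
```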

41
Q

Example of using the collocations function

A

text4.collocations()

42
Q

What is the essence of collocation?

A

The collocations that emerge are very specific to the genre of the texts.

43
Q

Functions provided by NLTK for frequency distributions

A

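FreqDist(text1) creates a frequency distribution; fdist1.most_common(n) returns the n most frequent items; fdist1.hapaxes() returns the items that occur only once; fdist1.plot(n, cumulative=True) plots a cumulative frequency graph.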

44
Q

Word comparison operator

A
45
Q

[w for w in text if condition]

A

We can also create more complex conditions. If c is a condition, then not c is also a condition. If we have two conditions c1 and c2, then we can combine them to form a new condition using conjunction and disjunction: c1 and c2, c1 or c2.
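
A small sketch of combined conditions inside the [w for w in text if condition] pattern. The toy token list and the particular tests chosen are assumptions for illustration.

```python
# Toy data (hypothetical).
tokens = ["The", "whale", "swam", "FAR", "away", "!"]

# Disjunction: keep tokens that are short OR entirely uppercase.
short_or_upper = [w for w in tokens if len(w) < 4 or w.isupper()]

# Conjunction with negation: alphabetic tokens that are NOT titlecase.
alpha_not_title = [w for w in tokens if w.isalpha() and not w.istitle()]

print(short_or_upper)
print(alpha_not_title)
```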

46
Q

How do you operate on every element?

A

[len(w) for w in text1], [w.upper() for w in text1]

○ These expressions have the form [f(w) for …] or [w.f() for …], where f is a function that operates on a word to compute its length, or to convert it to uppercase.
○ The notation just described is called a "list comprehension".

47
Q

What is the relation between a nested block and the extra blank line?

A

When we use the Python interpreter we have to add an extra blank line
in order for it to detect that the nested block is complete.

48
Q

What do Python control structures end with, and why?

A

All Python control structures end with a colon. The colon indicates that the current statement relates to the indented block that follows.
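
A minimal sketch showing two control structures, a for loop and a nested if, each ending its first line with a colon and followed by an indented block. The toy token list is assumed for illustration.

```python
# Toy data (hypothetical).
tokens = ["Call", "me", "Ishmael", ".", "Some", "years", "ago"]

short_words = []
for word in tokens:            # colon: the indented block runs once per token
    if len(word) < 5:          # colon: the nested block runs only if the test is true
        short_words.append(word)

print(short_words)
```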

49
Q

What are the challenges of automatic natural language understanding?

A

On a more philosophical level, a long-standing challenge within artificial intelligence has been to build intelligent machines, and a major part of intelligent behaviour is understanding language. For many years this goal has been seen as too difficult. However, as NLP technologies become more mature, and robust methods for analyzing unrestricted text become more widespread, the prospect of natural language understanding has re-emerged as a plausible goal.

51
Q

What is Word Sense Disambiguation?

A

In word sense disambiguation we want to work out which sense of a word was intended in a given context.

52
Q

How do we automatically disambiguate words?

A

We automatically disambiguate words using context, exploiting the simple fact that nearby words have closely related meanings.

53
Q

Example of disambiguation using neighbouring words

A

Example: the meaning of "by" depends on the words around it.

a. The lost children were found by the searchers (agentive)
b. The lost children were found by the mountain (locative)
c. The lost children were found by the afternoon (temporal)

54
Q

What is Pronoun Resolution?

A

A deeper kind of language understanding is to work out "who did what to whom", i.e., to detect the subjects and objects of verbs. This is difficult, as the following example shows (one case is ambiguous):

a. The thieves stole the paintings. They were subsequently sold.

b. The thieves stole the paintings. They were subsequently caught.
c. The thieves stole the paintings. They were subsequently found.

Answering this question involves finding the antecedent of the pronoun they, either thieves or paintings. Computational techniques for tackling this problem include anaphora resolution (identifying what a pronoun or noun phrase refers to) and semantic role labeling (identifying how a noun phrase relates to the verb, as agent, patient, instrument, and so on).

55
Q

Why is machine translation difficult?

A

For a long time now, machine translation (MT) has been the holy grail of language understanding.

Machine translation is difficult because a given word could have several possible translations (depending on its meaning), and because word order must be changed in keeping with the grammatical structure of the target language. Today these difficulties are being faced by collecting massive quantities of parallel texts from news and government websites that publish documents in two or more languages. Given a document in German and English, and possibly a bilingual dictionary, we can automatically pair up the sentences, a process called text alignment. Once we have a million or more sentence pairs, we can detect corresponding words and phrases, and build a model that can be used for translating new text.

56
Q

What is the chief measure of intelligence in the history of AI?

A

In the history of artificial intelligence, the chief measure of intelligence has been a linguistic one, namely the Turing Test: can a dialogue system, responding to a user’s text input, perform so naturally that we cannot distinguish it from a human-generated response?

57
Q

What are the limitations of NLP?

A

Despite the research-led advances in tasks like recognizing textual entailment (RTE), natural language systems that have been deployed for real-world applications still cannot perform common-sense reasoning or draw on world knowledge in a general and robust manner.

58
Q

Chapter 1 Summary

A
  • Texts are represented in Python using lists: ['Monty', 'Python']. We can use indexing, slicing, and the len() function on lists.
  • A word “token” is a particular appearance of a given word in a text; a word “type” is the unique form of the word as a particular sequence of letters. We count word tokens using len(text) and word types using len(set(text)).
  • We obtain the vocabulary of a text t using sorted(set(t)).
  • We operate on each item of a text using [f(x) for x in text].
  • To derive the vocabulary, collapsing case distinctions and ignoring punctuation, we can write set(w.lower() for w in text if w.isalpha()).
  • We process each word in a text using a for statement, such as for w in t: or for word in text:. This must be followed by the colon character and an indented block of code, to be executed each time through the loop.
  • We test a condition using an if statement: if len(word) < 5:. This must be followed by the colon character and an indented block of code, to be executed only if the condition is true.
  • A frequency distribution is a collection of items along with their frequency counts (e.g., the words of a text and their frequency of appearance).
  • A function is a block of code that has been assigned a name and can be reused. Functions are defined using the def keyword, as in def mult(x, y); x and y are parameters of the function, and act as placeholders for actual data values.
  • A function is called by specifying its name followed by zero or more arguments inside parentheses, like this: texts(), mult(3, 4), len(text1).
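
Several of the summary points can be seen together in a few lines. This sketch runs on an assumed toy token list rather than an NLTK text, and the variable names are illustrative.

```python
# Toy data (hypothetical).
tokens = ["The", "whale", ",", "the", "Whale", "!"]

token_count = len(tokens)        # word tokens, punctuation included
type_count = len(set(tokens))    # word types; case differences still count as distinct

# Vocabulary with case collapsed and punctuation ignored.
vocabulary = sorted(set(w.lower() for w in tokens if w.isalpha()))

print(token_count, type_count)
print(vocabulary)
```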
59
Q

What is collocation?

A

When you eat at a quick-service restaurant, you are eating fast food. You wouldn't say you went and got "quick food". That is because fast food is a collocation, or a pair or set of words that are commonly put together. In a collocation, if you replace one of the words with a synonym, it sounds unnatural to native English speakers.