Chapter 1 Flashcards
Takeaways from Chapter 1
How do you load all the book data in NLTK?
>>> from nltk.book import *
How do you find out about any text in NLTK?
Just type its name at the prompt, e.g. text1.
What is a concordance?
An alphabetical list of the words (especially the important ones) present in a text or texts, usually with citations of the passages concerned or with the context displayed on a computer screen. Example: ‘a concordance to the Bible’.
What is concordance in Python?
.concordance() is a method of the Text class in nltk. To use it, first instantiate a Text object, then call the method on that object, e.g. text1.concordance("monstrous").
What does a concordance permit us to do?
A concordance permits us to see words in context. For example, we saw that monstrous occurred in contexts such as the ___ pictures and the ___ size.
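To make the idea concrete, here is a minimal pure-Python sketch of what a concordance view computes. It is not NLTK's implementation: the function name simple_concordance, the width parameter, and the sample sentence are all illustrative assumptions.

```python
def simple_concordance(tokens, word, width=2):
    """Return each occurrence of `word` with up to `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [tok] + right))
    return lines

# Hypothetical sample tokens, standing in for a loaded text such as text1.
sample = "the monstrous pictures hung beside a door of monstrous size".split()
concordance_lines = simple_concordance(sample, "monstrous")
for line in concordance_lines:
    print(line)
```

Each printed line shows one occurrence of the word with its neighbours, which is the "word in context" view the card describes.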
What other words appear in a similar range of contexts?
We can find out by appending the term similar to the name of the text in question, then inserting the relevant word in parentheses: >>> text1.similar("monstrous")
What is common_contexts?
The term common_contexts allows us to examine just the contexts that are shared by two or more words, such as monstrous and very: >>> text2.common_contexts(["monstrous", "very"]) prints be_glad am_glad a_pretty is_pretty a_lucky
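A rough sketch of the idea behind shared contexts, in plain Python. The helpers contexts and shared_contexts and the sample sentence are hypothetical, and treating a "context" as the pair of immediately neighbouring words is a simplifying assumption of this sketch, not a statement about NLTK's internals.

```python
def contexts(tokens, word):
    """Collect (previous, next) word pairs surrounding each occurrence of `word`."""
    found = set()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == word:
            found.add((tokens[i - 1], tokens[i + 1]))
    return found

def shared_contexts(tokens, words):
    """Return contexts common to every word in `words`, as 'left_right' strings."""
    common = set.intersection(*(contexts(tokens, w) for w in words))
    return sorted(f"{left}_{right}" for left, right in common)

# Hypothetical sample tokens, invented for illustration.
sample = ("i am very glad and a very pretty one said "
          "i am monstrous glad of a monstrous pretty day").split()
shared = shared_contexts(sample, ["monstrous", "very"])
print(shared)
```

The left_right strings mirror the be_glad, a_pretty style of output: the underscore marks the slot where either word can appear.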
How do you find the location of a word in a text?
Use a dispersion plot, which shows where each word occurs across the text, e.g. >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
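A dispersion plot draws one mark per occurrence, positioned by how many words from the start the occurrence falls. The sketch below computes just those offsets in plain Python and does no plotting. The helper word_offsets and the sample sentence are assumptions made for illustration.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of positions where it occurs."""
    offsets = {t: [] for t in targets}
    for position, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(position)
    return offsets

# Hypothetical sample tokens, standing in for a loaded text such as text4.
sample = "citizens value freedom and freedom requires duties of citizens".split()
offsets = word_offsets(sample, ["citizens", "freedom", "duties"])
for word, positions in offsets.items():
    print(word, positions)
```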
How do you generate random text?
The generate() method produces random text in the style of the given corpus, e.g. for text3: >>> text3.generate()
Is the generate() method available in NLTK 3.0?
No. It is not available in NLTK 3.0, but will be reinstated in a subsequent version.
Is the generated text truly random?
Although the text is random, it reuses common words and phrases from the source text and gives us a sense of its style and content.
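One way to see why random output can still echo the source is a small bigram chain, sketched below. This is a swapped-in technique for illustration only, not NLTK's generate(): the helpers build_bigram_table and generate_words, the fixed seed, and the sample sentence are all assumptions.

```python
import random
from collections import defaultdict

def build_bigram_table(tokens):
    """Map each token to the list of tokens that follow it in the source."""
    table = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        table[current].append(following)
    return table

def generate_words(tokens, length=8, seed=0):
    """Random walk over observed bigrams, starting from the first token."""
    rng = random.Random(seed)
    table = build_bigram_table(tokens)
    word = tokens[0]
    output = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:  # dead end: this word is never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return output

# Hypothetical sample tokens, invented for illustration.
sample = "in the beginning god created the heaven and the earth".split()
generated = generate_words(sample)
print(" ".join(generated))
```

Every adjacent pair in the output was seen in the source, so the result is random yet built entirely from the source's own word sequences.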
What is a token?
A token is the technical name for a sequence of characters, such as hairy, his, or :), that we want to treat as a group.
How do you get the length of a text?
Use len(), e.g. >>> len(text3). The length counts every token, that is, every word and punctuation symbol in the text.
What is vocabulary of a text?
The vocabulary of a text is just the set of tokens that it uses, since in a set, all duplicates are collapsed together.
What is the Python command to find the vocabulary of a text?
In Python we can obtain the vocabulary items of text3 with the command: set(text3)
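The difference between counting tokens and counting vocabulary items is easy to check on a small list. The sample sentence below is a made-up stand-in for a loaded text; set() and len() behave the same way on it.

```python
# Hypothetical sample tokens, standing in for a loaded text such as text3.
sample = "the cat saw the dog and the dog saw the cat".split()

total_tokens = len(sample)        # every occurrence counts
vocabulary = sorted(set(sample))  # duplicates collapse; sorted for readable output

print(total_tokens)
print(vocabulary)
print(len(vocabulary))
```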
What is a word type?
A word type is the form or spelling of the word independently of its specific occurrences in a text, that is, the word considered as a unique item of vocabulary. Also called a distinct word.
How to calculate a measure of the lexical richness of the text?
Divide the number of distinct words by the total number of words: >>> len(set(text3)) / len(text3) returns 0.06230453042623537. This example shows us that the number of distinct words is just 6% of the total number of words, or equivalently that each word is used 16 times on average.
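The two readings of that ratio are linked by simple arithmetic: multiplying by 100 gives the percentage of distinct words, and taking the reciprocal gives the average number of uses per word. A short check, using only the ratio quoted for text3 (the variable names are illustrative):

```python
diversity = 0.06230453042623537  # ratio quoted on this card for text3

percent_distinct = 100 * diversity  # share of tokens that are distinct words
average_uses = 1 / diversity        # tokens per distinct word, on average

print(f"{percent_distinct:.2f}% distinct")
print(f"{average_uses:.2f} uses per word on average")
```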
How do you count a particular word?
Use count() to see how often a word occurs, and divide by the length of the text to get its percentage of the text: >>> text3.count("smote") returns 5, and >>> 100 * text4.count('a') / len(text4) returns 1.464301643393831
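The same two steps on a small list, as a sketch. The helper name percentage and the sample sentence are assumptions made for illustration; count() is the ordinary list method and works the same way on a list of tokens.

```python
def percentage(count, total):
    """Express `count` as a percentage of `total`."""
    return 100 * count / total

# Hypothetical sample tokens, invented for illustration.
sample = "a cat and a dog met a bird".split()

occurrences = sample.count("a")
share = percentage(occurrences, len(sample))

print(occurrences)
print(share)
```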
How do you create a function?
>>> def lexical_diversity(text):
...     return len(set(text)) / len(text)
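Written as a script rather than at the prompt, the function might look like the sketch below. The guard for an empty text is an addition of this sketch (it avoids dividing by zero) and is not part of the card's version. The sample sentence is hypothetical.

```python
def lexical_diversity(text):
    """Ratio of distinct tokens to total tokens; 0.0 for an empty text (added guard)."""
    if not text:
        return 0.0
    return len(set(text)) / len(text)

# Hypothetical sample tokens, invented for illustration.
sample = "the cat saw the dog".split()
diversity = lexical_diversity(sample)
print(diversity)
```

Once defined, the function can be reused on any sequence of tokens instead of retyping the expression each time.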
What does the ... prompt mean while defining a function in Python?
The Python interpreter changes the prompt from >>> to ... after encountering the colon at the end of the first line. The ... prompt indicates that Python expects an indented code block to appear next. It is up to you to do the indentation, by typing four spaces or hitting the Tab key. To finish the indented block, just enter a blank line.
List operations: how do you append and concatenate?
Use sent1.append("Some") to add a single item to the end of a list. Use the + operator to concatenate two lists.
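A short sketch of the difference: + builds a new list and leaves its operands alone, while append() changes the existing list in place. The lists sent_a and sent_b are made-up stand-ins for sentences such as sent1.

```python
# Hypothetical sample sentences, standing in for lists such as sent1.
sent_a = ["Call", "me", "Ishmael", "."]
sent_b = ["Hello", "world"]

combined = sent_a + sent_b  # concatenation: a new list, operands unchanged
sent_a.append("Some")       # append: modifies sent_a in place and returns None

print(combined)
print(sent_a)
```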
How do you do indexing in Python? What are the two types?
By position: >>> text4[173] returns the item at index 173. By value: >>> text4.index('awaken') returns the index at which the word first occurs.
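Both directions of lookup, sketched on a small list. The sample tokens are made up for illustration. Note that positions start at 0 and that index() reports only the first occurrence.

```python
# Hypothetical sample tokens, standing in for a loaded text such as text4.
sample = ["we", "must", "awaken", "to", "our", "duties", "awaken"]

word_at_2 = sample[2]                    # by position: item at index 2
first_position = sample.index("awaken")  # by value: index of first occurrence

print(word_at_2)
print(first_position)
```

Asking index() for a word that is not in the list raises ValueError, so check with the in operator first when unsure.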
What is slicing?
Taking a part of a list (a sublist): lst[m:n] returns the items from index m up to, but not including, index n.
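A few slices on a small list, as a sketch. The list tokens is a made-up example. Leaving out the start or the end index means "from the beginning" or "to the end".

```python
# Hypothetical sample list, invented for illustration.
tokens = ["a", "b", "c", "d", "e", "f"]

middle = tokens[2:5]  # items at indexes 2, 3 and 4
head = tokens[:2]     # everything before index 2
tail = tokens[4:]     # everything from index 4 onward

print(middle, head, tail)
```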