NLP Python Tutorial Flashcards

1
Q

Function to download nltk corpora

A

nltk.download()

2
Q

Import to tokenize sentences

A

from nltk.tokenize import sent_tokenize

3
Q

Import to tokenize words

A

from nltk.tokenize import word_tokenize

4
Q

Function to tokenize sentences

A

print(sent_tokenize())

5
Q

Function to tokenize sentences example

A

print(sent_tokenize(text))

6
Q

Function to tokenize words

A

print(word_tokenize())

7
Q

Function to tokenize words example

A

print(word_tokenize(text))

8
Q

Import for stop words

A

from nltk.corpus import stopwords

9
Q

Turning stop words into a variable

A

stop_words = set(stopwords.words('english'))

10
Q

Turning tokenized text into a variable

A

word_tokens = word_tokenize()

11
Q

Turning tokenized text into a variable example

A

word_tokens = word_tokenize(example)

12
Q

Function for removing stop words

A

sentence = [w for w in word_tokens if not w in stop_words]

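The stop-word removal pattern on the cards above can be sketched so it runs without downloading NLTK data: a small hand-made stop set stands in for stopwords.words('english'), and a pre-tokenized list stands in for word_tokenize(example).

```python
# Stand-ins so this runs without NLTK data:
# a tiny stop set instead of stopwords.words('english'),
# a hand-tokenized list instead of word_tokenize(example).
stop_words = {"the", "is", "a", "of", "and"}
word_tokens = ["this", "is", "a", "sample", "of", "text"]

# same list comprehension as the card: keep only non-stop words
filtered_sentence = [w for w in word_tokens if w not in stop_words]
print(filtered_sentence)
```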
13
Q

Stemming import

A

from nltk.stem import PorterStemmer

14
Q

Turning stemmer into a variable

A

ps = PorterStemmer()

15
Q

Function for stemming

A

for w in:

print(ps.stem(w))

16
Q

Function for stemming example

A

for w in example_words:

print(ps.stem(w))

17
Q

Stemming an entire sentence steps

A
  1. Tokenize

2. Stem

18
Q

Stemming an entire sentence example

A

words = word_tokenize(new_text)

for w in words:
print(ps.stem(w))

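The tokenize-then-stem loop above can be sketched with a toy suffix stripper standing in for nltk's PorterStemmer (the real stemmer is far more careful), so it runs without NLTK installed:

```python
# Toy stand-in for PorterStemmer: strips a few common suffixes.
def toy_stem(w):
    for suffix in ("ing", "ly", "ed", "s"):
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[:-len(suffix)]
    return w

# stand-in for word_tokenize(new_text)
words = ["pythoning", "pythoned", "pythonly"]
stems = [toy_stem(w) for w in words]
print(stems)
```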
19
Q

Importing an nltk text

A

from nltk.corpus import

20
Q

Importing an nltk text example

A

from nltk.corpus import udhr

21
Q

Import for the PunktSentenceTokenizer

A

from nltk.tokenize import PunktSentenceTokenizer

22
Q

Training the PunktSentenceTokenizer

A

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

23
Q

Run sample text through PunktSentenceTokenizer

A

tokenized = custom_sent_tokenizer.tokenize(sample_text)

24
Q

Tag each tokenized word with a part of speech steps

A
  1. For loop
  2. Tokenize words
  3. Tag words
  4. Print tagged words
25
Q

Tag each tokenized word with a part of speech example

A

for i in tokenized[:5]:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    print(tagged)
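The tagging loop above needs NLTK's tagger models; its shape can be sketched with a tiny dictionary tagger (toy_tags and the .split() tokenizer here are stand-ins, not NLTK behavior):

```python
# Tiny dictionary tagger standing in for nltk.pos_tag.
toy_tags = {"the": "DT", "dog": "NN", "barks": "VBZ", ".": "."}

tokenized = ["the dog barks ."]   # stand-in for the Punkt tokenizer output
for i in tokenized[:5]:
    words = i.split()             # stand-in for nltk.word_tokenize(i)
    tagged = [(w, toy_tags.get(w, "NN")) for w in words]
    print(tagged)
```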
26
Q

Chunking 3 steps

A
  1. chunkGram =
  2. chunkParser =
  3. chunked =

27
Q

chunkGram

A

chunkGram = r"""Chunk: { }"""

28
Q

chunkGram example

A

chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""

29
Q

chunkParser

A

chunkParser = nltk.RegexpParser()

30
Q

chunkParser example

A

chunkParser = nltk.RegexpParser(chunkGram)

31
Q

chunked

A

chunked = chunkParser.parse(tagged)

32
Q

Print the nltk tree

A

for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
    print(subtree)
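The chunking idea on cards 26 to 32 is to match a regex against the sequence of POS tags; it can be sketched with a plain regex in place of nltk.RegexpParser (the tagged list here is hand-made):

```python
import re

# Hand-tagged sentence standing in for nltk.pos_tag output.
tagged = [("Mr.", "NNP"), ("Smith", "NNP"), ("ran", "VBD"), ("home", "NN")]

# Join the tags and grab runs of proper nouns, like a {<NNP>+} chunkGram.
tag_string = " ".join(tag for _, tag in tagged)
chunks = [m.strip() for m in re.findall(r"(?:NNP ?)+", tag_string)]
print(chunks)
```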
33
Q

Draw chunks with nltk

A

chunked.draw()

34
Q

namedEnt definition

A

Labels named entities such as organizations, people, money, etc.

35
Q

namedEnt steps

A
  1. Tokenize
  2. pos tagging
  3. namedEnt

36
Q

namedEnt example

A

namedEnt = nltk.ne_chunk(tagged, binary=False)
37
Q

chunkGram for chinking

A

chunkGram = r"""Chunk: {<.*>+}
                }<VB.?|IN|DT|TO>+{"""

38
Q

Build a list of documents

A

documents = [(list( .words(fileid)), category) for category in .categories() for fileid in .fileids(category)]

39
Q

Build a list of documents example

A

documents = [(list(movie_reviews.words(fileid)), category) for category in movie_reviews.categories() for fileid in movie_reviews.fileids(category)]
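The comprehension on cards 38 and 39 pairs each review's word list with its label; it can be sketched with a tiny in-memory corpus standing in for movie_reviews:

```python
import random

# Tiny stand-in for the movie_reviews corpus: category -> fileid -> words.
corpus = {
    "pos": {"p1.txt": ["great", "movie"], "p2.txt": ["loved", "it"]},
    "neg": {"n1.txt": ["boring", "plot"]},
}

documents = [(list(words), category)
             for category in corpus                    # stand-in for .categories()
             for words in corpus[category].values()]   # stand-in for .fileids() / .words()

random.shuffle(documents)   # same shuffle step as card 40
print("Number of Documents: {}".format(len(documents)))
```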
40
Q

Shuffle the documents

A

random.shuffle(documents)

41
Q

Print the number of documents

A

print('Number of Documents: {}'.format(len(documents)))

42
Q

Print the first review

A

print('First Review: {}'.format(documents[0]))

43
Q

Write all the words in the reviews

A

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())

44
Q

Create a FreqDist

A

all_words = nltk.FreqDist(all_words)

45
Q

Find the most common words

A

print('Most common words: {}'.format(all_words.most_common(15)))

46
Q

Find how many times the word "happy" is used

A

print('the word happy: {}'.format(all_words["happy"]))

47
Q

Find how many words are in the text

A

print(len(all_words))
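nltk.FreqDist behaves much like collections.Counter, so the frequency cards (44 to 47) can be sketched without NLTK (the word list here is hand-made):

```python
from collections import Counter

all_words = ["Happy", "sad", "happy", "movie", "HAPPY", "movie"]
freq = Counter(w.lower() for w in all_words)   # stand-in for nltk.FreqDist

print(freq.most_common(2))   # top two (word, count) pairs
print(freq["happy"])         # count for a single word
print(len(freq))             # number of distinct words
```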
48
Q

Use the 4000 most common words as features

A

word_features = list(all_words.keys())[:4000]

49
Q

Find features steps

A
  1. define function
  2. prep
  3. for loop
  4. return features

50
Q

Define find features

A

def find_features(document):

51
Q

Find features prep

A

words = set(document)
features = {}

52
Q

Find features for loop

A

for w in word_features:
    features[w] = (w in words)

53
Q

Find features example steps

A
  1. prep
  2. for loop
  3. print(features)

54
Q

Find features example prep

A

features = find_features( .words(' '))

55
Q

Find features example prep example

A

features = find_features(movie_reviews.words('neg/cv000_29416.txt'))

56
Q

Find features example for loop steps

A
  1. for loop
  2. if clause
  3. print(keys)

57
Q

Find features example for loop

A

for key, value in features.items():

58
Q

Find features example if clause

A

if value == True:
    print(key)
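Cards 50 to 58 build and exercise find_features piece by piece; assembled, with a small hand-made word_features list standing in for the 4000 most common words, it runs on its own:

```python
# Hand-made feature vocabulary standing in for the 4000 most common words.
word_features = ["great", "boring", "plot", "acting"]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)   # True if the document contains the word
    return features

features = find_features(["a", "boring", "plot"])
for key, value in features.items():
    if value == True:
        print(key)
```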
59
Q

Find features for all documents

A

featuresets = [(find_features(rev), category) for (rev, category) in documents]

60
Q

Import model selection

A

from sklearn import model_selection

61
Q

Define a seed for reproducibility

A

seed = 1

62
Q

Split featuresets into training and testing datasets

A

training, testing = model_selection.train_test_split(featuresets, test_size=0.25, random_state=seed)
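The split on card 62 uses scikit-learn; the same seeded 75/25 split can be sketched in plain Python (featuresets here is synthetic):

```python
import random

# Synthetic (features, label) pairs standing in for the real featuresets.
featuresets = [({"w{}".format(i): True}, "pos" if i % 2 else "neg")
               for i in range(20)]

seed = 1
random.Random(seed).shuffle(featuresets)   # seeded, so the split is reproducible
cut = int(len(featuresets) * 0.75)         # 75% train, 25% test
training, testing = featuresets[:cut], featuresets[cut:]
print(len(training), len(testing))
```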
63
Q

Import sklearn classifier

A

from nltk.classify.scikitlearn import SklearnClassifier

64
Q

Import SVC

A

from sklearn.svm import SVC

65
Q

Define the SVC model

A

model = SklearnClassifier(SVC(kernel='linear'))

66
Q

Train the model on the training data

A

model.train(training)

67
Q

Find the accuracy of the SVC model steps

A
  1. define accuracy
  2. print

68
Q

Define accuracy

A

accuracy = nltk.classify.accuracy(model, testing)

69
Q

Print accuracy

A

print('SVC Accuracy: {}'.format(accuracy))