NLP Python Tutorial Flashcards

1
Q

Function to download nltk corpora

A

nltk.download()

2
Q

Import to tokenize sentences

A

from nltk.tokenize import sent_tokenize

3
Q

Import to tokenize words

A

from nltk.tokenize import word_tokenize

4
Q

Function to tokenize sentences

A

print(sent_tokenize())

5
Q

Function to tokenize sentences example

A

print(sent_tokenize(text))

6
Q

Function to tokenize words

A

print(word_tokenize())

7
Q

Function to tokenize words example

A

print(word_tokenize(text))

8
Q

Import for stop words

A

from nltk.corpus import stopwords

9
Q

Turning stop words into a variable

A

stop_words = set(stopwords.words('english'))

10
Q

Turning tokenized text into a variable

A

word_tokens = word_tokenize()

11
Q

Turning tokenized text into a variable example

A

word_tokens = word_tokenize(example)

12
Q

Function for removing stop words

A

sentence = [w for w in word_tokens if w not in stop_words]

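The stop-word cards above combine into a short runnable sketch. A tiny hardcoded stop set stands in for `stopwords.words('english')` and `str.split()` stands in for `word_tokenize()`, so it runs without any NLTK downloads:

```python
# Stop-word removal pattern from the cards above, with stand-ins
# (hardcoded stop set, str.split) so no NLTK data is required.
stop_words = {"this", "is", "a", "the", "of"}
word_tokens = "this is a sample sentence".split()
sentence = [w for w in word_tokens if w not in stop_words]
print(sentence)  # ['sample', 'sentence']
```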
13
Q

Stemming import

A

from nltk.stem import PorterStemmer

14
Q

Turning stemmer into a variable

A

ps = PorterStemmer()

15
Q

Function for stemming

A

for w in :
    print(ps.stem(w))

16
Q

Function for stemming example

A

for w in example_words:
    print(ps.stem(w))

17
Q

Stemming an entire sentence steps

A
  1. Tokenize
  2. Stem

18
Q

Stemming an entire sentence example

A

words = word_tokenize(new_text)

for w in words:
    print(ps.stem(w))

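The two-step recipe above (tokenize, then stem each token) can be sketched with a toy suffix-stripper standing in for `PorterStemmer`, so it runs without NLTK installed. The real Porter algorithm is considerably more subtle:

```python
# Toy stemmer standing in for PorterStemmer (illustration only).
def toy_stem(w):
    for suffix in ("ing", "ly", "ed", "s"):
        # Only strip when a reasonable stem remains.
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[:-len(suffix)]
    return w

words = "pythoners are pythoning poorly".split()  # stands in for word_tokenize(new_text)
for w in words:
    print(toy_stem(w))  # pythoner / are / python / poor
```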
19
Q

Importing an nltk text

A

from nltk.corpus import

20
Q

Importing an nltk text example

A

from nltk.corpus import udhr

21
Q

Import for the PunktSentenceTokenizer

A

from nltk.tokenize import PunktSentenceTokenizer

22
Q

Training the PunktSentenceTokenizer

A

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

23
Q

Run sample text through PunktSentenceTokenizer

A

tokenized = custom_sent_tokenizer.tokenize(sample_text)

24
Q

Tag each tokenized word with a part of speech steps

A
  1. For loop
  2. Tokenize words
  3. Tag words
  4. Print tagged words
25
Q

Tag each tokenized word with a part of speech example

A

for i in tokenized[:5]:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    print(tagged)

26
Q

Chunking 3 steps

A
  1. chunkGram =
  2. chunkParser =
  3. chunked =
27
Q

chunkGram

A

chunkGram = r"""Chunk: { }"""

28
Q

chunkGram example

A

chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""

29
Q

chunkParser

A

chunkParser = nltk.RegexpParser()

30
Q

chunkParser example

A

chunkParser = nltk.RegexpParser(chunkGram)

31
Q

chunked

A

chunked = chunkParser.parse(tagged)

32
Q

Print the nltk tree

A

for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
    print(subtree)

33
Q

Draw chunks with nltk

A

chunked.draw()

34
Q

namedEnt definition

A

Marks proper nouns as organizations, people, money, etc.

35
Q

namedEnt steps

A
  1. Tokenize
  2. pos Tagging
  3. namedEnt
36
Q

namedEnt example

A

namedEnt = nltk.ne_chunk(tagged, binary=False)

37
Q

chunkGram for chinking

A

chunkGram = r"""Chunk: {<.*>+}
                       }<VB.?|IN|DT|TO>+{"""

38
Q

Build a list of documents

A

documents = [(list( .words(fileid)), category)
             for category in .categories()
             for fileid in .fileids(category)]

39
Q

Build a list of documents example

A

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

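The nested comprehension above can be exercised against a stub corpus (a plain dict) instead of `movie_reviews`, so it runs without NLTK data. The corpus contents and file ids below are made up:

```python
# Stub corpus standing in for movie_reviews: category -> {fileid: words}.
corpus = {
    "neg": {"neg/001.txt": ["bad", "film"]},
    "pos": {"pos/001.txt": ["great", "film"]},
}
# Same shape as the real comprehension: one (word list, category) pair
# per file, outer loop over categories, inner loop over that category's files.
documents = [(list(words), category)
             for category in corpus                   # .categories()
             for words in corpus[category].values()]  # .words(fileid)
print(documents)  # [(['bad', 'film'], 'neg'), (['great', 'film'], 'pos')]
```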
40
Q

Shuffle the documents

A

random.shuffle(documents)

41
Q

Print the number of documents

A

print('Number of Documents {}'.format(len(documents)))

42
Q

Print the first review

A

print('First Review: {}'.format(documents[0]))

43
Q

Write all the words in the reviews

A

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())

44
Q

Create a FreqDist

A

all_words = nltk.FreqDist(all_words)

45
Q

Find the most common words

A

print('Most common words: {}'.format(all_words.most_common(15)))

46
Q

Find how many times the word “happy” is used

A

print('the word happy: {}'.format(all_words["happy"]))

47
Q

Find how many words are in the text

A

print(len(all_words))

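The frequency cards above can be run without the `movie_reviews` corpus by using `collections.Counter` as a stand-in for `nltk.FreqDist` (FreqDist supports the same `most_common()` and dict-style indexing). The word list is made up:

```python
from collections import Counter

# Counter stands in for nltk.FreqDist; same most_common()/indexing API.
all_words = Counter(w.lower() for w in ["Happy", "sad", "happy", "film", "film", "film"])
print('Most common words: {}'.format(all_words.most_common(2)))  # [('film', 3), ('happy', 2)]
print('the word happy: {}'.format(all_words["happy"]))           # 2
print(len(all_words))                                            # 3 distinct words
```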
48
Q

Use the 4000 most common words as features

A

word_features = [w for (w, count) in all_words.most_common(4000)]

49
Q

Find features steps

A
  1. define function
  2. prep
  3. for loop
  4. return features
50
Q

Define find features

A

def find_features(document):

51
Q

Find features prep

A
words = set(document)
features = {}
52
Q

Find features for loop

A

for w in word_features:
    features[w] = (w in words)

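The `find_features` cards above assemble into one runnable function. A tiny hand-picked `word_features` list stands in for the 4000 most common corpus words:

```python
# Toy feature list standing in for the 4000 most common words.
word_features = ["good", "bad", "plot", "acting"]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        # Each feature is True iff the word appears in the document.
        features[w] = (w in words)
    return features

print(find_features(["good", "plot", "twist"]))
# {'good': True, 'bad': False, 'plot': True, 'acting': False}
```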
53
Q

Find features example steps

A
  1. prep
  2. for loop
  3. print(features)
54
Q

Find features example prep

A

features = find_features( .words(' '))

55
Q

Find features example prep example

A

features = find_features(movie_reviews.words('neg/cv000_29416.txt'))

56
Q

Find features example for loop steps

A
  1. for loop
  2. if clause
  3. print(keys)
57
Q

Find features example for loop

A

for key, value in features.items():

58
Q

Find features example if clause

A

if value == True:
    print(key)

59
Q

Find features for all documents

A

featuresets = [(find_features(rev), category) for (rev, category) in documents]

60
Q

Import model selection

A

from sklearn import model_selection

61
Q

Define a seed for reproducibility

A

seed = 1

62
Q

Split featuresets into training and testing datasets

A

training, testing = model_selection.train_test_split(featuresets, test_size=0.25, random_state=seed)

63
Q

Import sklearn classifier

A

from nltk.classify.scikitlearn import SklearnClassifier

64
Q

import SVC

A

from sklearn.svm import SVC

65
Q

Define the SVC model

A

model = SklearnClassifier(SVC(kernel='linear'))

66
Q

Train the model on the training data

A

model.train(training)

67
Q

Find the accuracy of the SVC model steps

A
  1. define accuracy
  2. print

68
Q

Define accuracy

A

accuracy = nltk.classify.accuracy(model, testing)

69
Q

Print accuracy

A

print('SVC Accuracy: {}'.format(accuracy))
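`nltk.classify.accuracy` simply scores the fraction of (features, label) pairs the model classifies correctly. The stand-in below makes that explicit without needing NLTK or scikit-learn installed; the toy model and test pairs are made up:

```python
# Stand-in for nltk.classify.accuracy: fraction of correct predictions.
def accuracy(classify, testing):
    correct = sum(1 for feats, label in testing if classify(feats) == label)
    return correct / len(testing)

# Toy "model": predicts positive whenever the 'good' feature is present.
toy_model = lambda feats: 'pos' if feats.get('good') else 'neg'
testing = [({'good': True}, 'pos'),
           ({'good': False}, 'neg'),
           ({'good': False}, 'pos')]
print('SVC Accuracy: {}'.format(accuracy(toy_model, testing)))  # 2 of 3 correct
```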