NLP Python Tutorial Flashcards
Function to download nltk corpora
nltk.download()
Import to tokenize sentences
from nltk.tokenize import sent_tokenize
Import to tokenize words
from nltk.tokenize import word_tokenize
Function to tokenize sentences
print(sent_tokenize())
Function to tokenize sentences example
print(sent_tokenize(text))
Function to tokenize words
print(word_tokenize())
Function to tokenize words example
print(word_tokenize(text))
Import for stop words
from nltk.corpus import stopwords
Turning stop words into a variable
stop_words = set(stopwords.words('english'))
Turning tokenized text into a variable
word_tokens = word_tokenize()
Turning tokenized text into a variable example
word_tokens = word_tokenize(example)
Function for removing stop words
sentence = [w for w in word_tokens if w not in stop_words]
Stemming import
from nltk.stem import PorterStemmer
Turning stemmer into a variable
ps = PorterStemmer()
Function for stemming
for w in:
print(ps.stem(w))
Function for stemming example
for w in example_words:
print(ps.stem(w))
Stemming an entire sentence steps
- Tokenize
- Stem
Stemming an entire sentence example
words = word_tokenize(new_text)
for w in words:
print(ps.stem(w))
Importing an nltk text
from nltk.corpus import
Importing an nltk text example
from nltk.corpus import udhr
Import for the PunktSentenceTokenizer
from nltk.tokenize import PunktSentenceTokenizer
Training the PunktSentenceTokenizer
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
Run sample text through PunktSentenceTokenizer
tokenized = custom_sent_tokenizer.tokenize(sample_text)
Tag each tokenized word with a part of speech steps
- For loop
- Tokenize words
- Tag words
- Print tagged words
Tag each tokenized word with a part of speech example
for i in tokenized[:5]:
words = nltk.word_tokenize(i)
tagged = nltk.pos_tag(words)
print(tagged)
Chunking 3 steps
- chunkGram =
- chunkParser =
- chunked =
chunkGram
chunkGram = r"""Chunk: { }"""