NLP Python Tutorial Flashcards
Function to download nltk corpora
nltk.download()
Import to tokenize sentences
from nltk.tokenize import sent_tokenize
Import to tokenize words
from nltk.tokenize import word_tokenize
Function to tokenize sentences
sent_tokenize()
Function to tokenize sentences example
sent_tokenize(example)
Function to tokenize words
word_tokenize()
Function to tokenize words example
word_tokenize(example)
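A minimal sketch of word tokenization, assuming NLTK is installed. `sent_tokenize` depends on the pretrained Punkt model, which must be downloaded first with `nltk.download()` (the resource is named `punkt`, or `punkt_tab` in recent NLTK releases). To run without any download, this sketch calls `word_tokenize(..., preserve_line=True)`, which skips the sentence-splitting step. The `example` string is made up for illustration.

```python
from nltk.tokenize import word_tokenize

# Hypothetical sample text, used only for illustration
example = "Hello Mr. Smith, how are you doing today?"

# preserve_line=True treats the whole string as one sentence, so the
# Punkt sentence model (and its download) is not needed here
word_tokens = word_tokenize(example, preserve_line=True)
print(word_tokens)
```

Punctuation such as the comma and the question mark comes back as separate tokens.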
Import for stop words
from nltk.corpus import stopwords
Turning stop words into a variable
stop_words = set(stopwords.words('english'))
Turning tokenized text into a variable
word_tokens = word_tokenize()
Turning tokenized text into a variable example
word_tokens = word_tokenize(example)
Function for removing stop words
sentence = [w for w in word_tokens if w not in stop_words]
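The stop-word list comprehension can be tried without the stopwords corpus. This sketch assumes a tiny, hand-written stand-in for `set(stopwords.words('english'))` and an already-tokenized sentence, so it needs no downloads. It also lowercases each token before the lookup, because NLTK's English stop-word list is all lowercase and a capitalized word such as "This" would otherwise slip through.

```python
# Hypothetical stand-in for set(stopwords.words('english')); the real
# NLTK list is much longer but, like this one, entirely lowercase
stop_words = {"this", "is", "an", "of", "the", "a"}

# Tokens roughly as word_tokenize would return them (assumed for illustration)
word_tokens = ["This", "is", "an", "example", "of", "stop", "word", "filtering", "."]

# Keep tokens whose lowercase form is not a stop word; a set makes each
# membership test constant time on average
filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]
print(filtered_sentence)  # ['example', 'stop', 'word', 'filtering', '.']
```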
Stemming import
from nltk.stem import PorterStemmer
Turning stemmer into a variable
ps = PorterStemmer()
Function for stemming
for w in words:
    print(ps.stem(w))
Function for stemming example
for w in example_words:
    print(ps.stem(w))
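`PorterStemmer` needs no downloaded data, so the stemming loop runs as-is once NLTK is installed. The word list below is an assumption for illustration; the sketch collects the stems in a list instead of printing them one by one so the result is easy to inspect.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Hypothetical example words built on the same root
example_words = ["python", "pythoner", "pythoning", "pythoned", "pythonly"]

stems = []
for w in example_words:
    # stem() strips suffixes such as -er, -ing and -ed using Porter's rules
    stems.append(ps.stem(w))

print(stems)
```

The first four words should all reduce to `python`; note that a stem is not guaranteed to be a dictionary word.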
Stemming an entire sentence steps
- Tokenize
- Stem
Stemming an entire sentence example
words = word_tokenize(new_text)
for w in words:
    print(ps.stem(w))
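The two steps (tokenize, then stem each token) can be combined as below. This sketch assumes a made-up `new_text` and passes `preserve_line=True` to `word_tokenize` so that no Punkt download is needed.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# Hypothetical sentence for illustration
new_text = "All pythoners have pythoned poorly at least once."

# Step 1: tokenize (preserve_line=True skips the Punkt sentence splitter)
words = word_tokenize(new_text, preserve_line=True)

# Step 2: stem every token
stems = [ps.stem(w) for w in words]
print(stems)
```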
Importing an nltk text
from nltk.corpus import
Importing an nltk text example
from nltk.corpus import udhr
Import for the PunktSentenceTokenizer
from nltk.tokenize import PunktSentenceTokenizer
Training the PunktSentenceTokenizer
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
Run sample text through PunktSentenceTokenizer
tokenized = custom_sent_tokenizer.tokenize(sample_text)
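Training Punkt on your own text needs no downloaded model, so the training and tokenizing cards can be tried together. The `train_text` and `sample_text` strings are assumptions; real training text should be much larger so that Punkt, an unsupervised algorithm, can learn abbreviations and other sentence-boundary statistics.

```python
from nltk.tokenize import PunktSentenceTokenizer

# Hypothetical training text; Punkt learns from unannotated raw text
train_text = (
    "The committee met on Monday. It approved the budget. "
    "The chair presented the report. The meeting ended early."
)
sample_text = "The vote was close. Everyone went home? Nobody complained."

# Train a sentence tokenizer on the raw training text
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

# Split the sample text into sentences with the trained model
tokenized = custom_sent_tokenizer.tokenize(sample_text)
print(tokenized)
```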
Tag each tokenized word with a part of speech steps
- For loop
- Tokenize words
- Tag words
- Print tagged words
Tag each tokenized word with a part of speech example
for i in tokenized[:5]:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    print(tagged)
Chunking 3 steps
- chunkGram =
- chunkParser =
- chunked =
chunkGram = r"""Chunk: { }"""
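The three chunking steps can be filled in as below. This sketch assumes a hand-tagged sentence (standing in for `nltk.pos_tag` output, so no tagger model has to be downloaded) and a hypothetical grammar that groups an optional determiner, any adjectives, and a noun into one `Chunk`. `nltk.RegexpParser` compiles the grammar, and `parse()` returns a tree whose chunks are subtrees.

```python
import nltk

# Hand-tagged tokens standing in for the output of nltk.pos_tag (assumed)
tagged = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

# 1. chunkGram: hypothetical rule, optional determiner + adjectives + noun
chunkGram = r"""Chunk: {<DT>?<JJ>*<NN>}"""

# 2. chunkParser: compile the grammar into a regular-expression chunker
chunkParser = nltk.RegexpParser(chunkGram)

# 3. chunked: parse the tagged tokens into a tree containing Chunk subtrees
chunked = chunkParser.parse(tagged)

# Collect the words inside each Chunk subtree
chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunked.subtrees(filter=lambda t: t.label() == "Chunk")
]
print(chunks)  # ['the little yellow dog', 'the cat']
```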