NLTK Flashcards

1
Q

What is spaCy?

A

spaCy is an advanced Natural Language Processing (NLP) library designed for efficiency and production use.

🔹 Why spaCy?
Generally faster than NLTK for common pipeline tasks
Integrates with deep learning frameworks such as TensorFlow & PyTorch
Handles large text data efficiently

🔹 Features:
Tokenization
Named Entity Recognition (NER)
Part-of-Speech (POS) tagging
Dependency Parsing

2
Q

How to import nltk?

A

import nltk

3
Q

How to import stopwords?

A

from nltk.corpus import stopwords

# Remove stopwords (words is a list of tokens)
stop_words = set(stopwords.words('english'))
words = [word for word in words if word not in stop_words]

4
Q

How to import a tokenizer?

A

from nltk.tokenize import word_tokenize
# Tokenization
words = word_tokenize(text) # text contains text data

5
Q

How to import a lemmatizer?

A

from nltk.stem import WordNetLemmatizer

# Lemmatization
lemmatizer = WordNetLemmatizer()
words = [lemmatizer.lemmatize(word) for word in words]

6
Q

How to lowercase text?

A

text = text.lower() # Lowercasing

7
Q

How to remove numbers from text?

A

import re

text = re.sub(r'\d+', '', text)  # Remove numbers

8
Q

How to remove punctuation from text?

A

import string

text = text.translate(str.maketrans('', '', string.punctuation))  # Remove punctuation

9
Q

How to remove extra spaces from text?

A

text = ' '.join(text.split())  # Remove extra spaces
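
The cleaning steps from cards 6 to 9 can be chained into one helper. This is a minimal sketch using only the standard library; the function name clean_text and the sample string are made up for illustration and do not come from the cards.

```python
import re
import string


def clean_text(text: str) -> str:
    """Apply the four cleaning steps in order (hypothetical helper)."""
    text = text.lower()                     # Lowercasing
    text = re.sub(r"\d+", "", text)         # Remove numbers
    # Remove ASCII punctuation characters listed in string.punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = " ".join(text.split())           # Remove extra spaces
    return text


# Made-up sample input
cleaned = clean_text("  Call NOW!!  Win 1000 dollars,   free.  ")
print(cleaned)
```

Order matters a little here: collapsing whitespace last cleans up the gaps left behind once numbers and punctuation are removed.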

10
Q

How to import TF-IDF Vectorizer?

A

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000)  # Keep at most the 5000 most frequent terms
X = vectorizer.fit_transform(df['cleaned_message']).toarray()  # Dense TF-IDF feature matrix
y = df['label_spam']  # Target labels
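
To see what the vectorizer produces without a DataFrame, here is a small sketch on a made-up list of messages. The messages and labels lists are hypothetical stand-ins for df['cleaned_message'] and df['label_spam'], and scikit-learn is assumed to be installed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for df['cleaned_message'] and df['label_spam']
messages = [
    "win a free prize now",
    "are we still meeting for lunch",
    "free entry in a prize draw",
]
labels = [1, 0, 1]

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(messages).toarray()

# One row per message, one column per term kept in the vocabulary
print(X.shape)
# Maps each term to its column index
print(sorted(vectorizer.vocabulary_))
```

With the default settings, single-character tokens such as "a" are dropped by the tokenizer, so they do not appear in the vocabulary.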
