NLP and CV Flashcards

1
Q

What is tokenization in NLP?

A

Tokenization is the process of splitting text into individual tokens (words, subwords, or characters) for further processing in NLP tasks.
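
A minimal sketch of word-level tokenization in Python (the regular expression and example sentence are illustrative assumptions; production systems typically use subword tokenizers such as BPE or WordPiece):

import re

def tokenize(text):
    # Deliberately simple word-level tokenizer: lowercase, then pull out word-like spans
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Tokenization splits text into tokens!"))
# ['tokenization', 'splits', 'text', 'into', 'tokens']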

2
Q

What are stemming and lemmatization?

A

Techniques used in NLP to reduce words to their base or root forms. Stemming removes affixes, while lemmatization considers the context to reduce words to their dictionary form.

Stemming: faster; used in real-time systems such as search and text classification, where precision is not critical.

Lemmatization: more accurate but slower; used for translation, sentiment analysis, etc.
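
A hedged example using NLTK (assumes NLTK and its WordNet data are installed; the example words are illustrative):

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming chops affixes heuristically; lemmatization maps to a dictionary form.
print(stemmer.stem("studies"))                   # studi
print(lemmatizer.lemmatize("studies"))           # study
print(lemmatizer.lemmatize("better", pos="a"))   # good (needs the part-of-speech hint)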

3
Q

What is TF-IDF?

A

Term Frequency-Inverse Document Frequency is a statistic that reflects the importance of a word in a document relative to a collection of documents.

In a TF-IDF matrix, rows are documents and columns are words, so each document is represented by a vector. These vectors are useful as input for sentiment analysis, spam detection, search engines, and document similarity. TF-IDF can also be used to reduce vocabulary size by filtering out less important words.

For example, a search engine can use TF-IDF vectors to rank which documents are most relevant to a search query.
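
A small sketch with scikit-learn's TfidfVectorizer (the toy corpus is an illustrative assumption):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# tf-idf(t, d) = tf(t, d) * log(N / df(t))  (scikit-learn uses a smoothed variant)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)   # rows = documents, columns = vocabulary terms

print(X.shape)                         # (3, number_of_terms)
print(vectorizer.get_feature_names_out())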

4
Q

What is word embedding?

A

A representation of words as vectors in a continuous vector space, where similar words have similar vectors. Examples: Word2Vec, GloVe.
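
A brief sketch training a toy Word2Vec model with Gensim (the tiny corpus and parameters are illustrative assumptions; real embeddings need much more data):

from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs", "are", "animals"],
]

# vector_size is the embedding dimension; window is the context size
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

vec = model.wv["cat"]                        # 50-dimensional vector for "cat"
print(model.wv.most_similar("cat", topn=3))  # nearest neighbours in embedding space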

5
Q

What is a convolution in CNNs (for CV)?

A

A convolution is a mathematical operation that slides a filter over the input (such as an image) to extract features like edges, textures, etc.
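
A minimal NumPy sketch of a single 2-D convolution (strictly, cross-correlation, as used in most deep learning libraries; the 3x3 edge filter and input size are illustrative assumptions):

import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the current window by the kernel and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(6, 6)
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])   # responds strongly to vertical edges
print(conv2d(image, edge_kernel).shape)   # (4, 4)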

6
Q

What is max pooling in CNNs (for CV)?

A

Max pooling is a down-sampling operation in CNNs that takes the maximum value within a sliding window of the input, reducing spatial dimensions while retaining important information.
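
A small NumPy sketch of 2x2 max pooling with stride 2 (assumes the input size is divisible by the window size):

import numpy as np

def max_pool2d(x, size=2):
    h, w = x.shape
    # Reshape into non-overlapping size x size blocks, then take the max of each block
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool2d(x))
# [[ 5  7]
#  [13 15]]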

7
Q

What is the BLEU score in NLP?

A

The Bilingual Evaluation Understudy (BLEU) score is a metric for evaluating the quality of text generated by machine translation systems by comparing it to reference translations.

Although designed for machine translation, it can be used for other tasks such as summarization and image captioning, as long as there is a human-written reference to compare against.

It favors precision over recall, so it may undervalue outputs that get the point across but are, for example, too short compared to the reference.

Context and synonymy are hard for BLEU: if the exact reference words are not used, the score will be lower even when the meaning is preserved.

For more context-aware evaluation, use METEOR, ROUGE, or BERTScore.
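
A hedged example with NLTK's sentence-level BLEU (the reference and candidate sentences are illustrative; smoothing is used because short sentences otherwise produce zero higher-order n-gram counts):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of reference token lists
candidate = ["the", "cat", "sat", "on", "the", "mat"]

smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smooth)
print(round(score, 3))   # closer to 1.0 means closer n-gram overlap with the reference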

8
Q

What is attention in NLP?

A

Attention is a mechanism that allows models to focus on relevant parts of the input sequence, improving performance in tasks like translation and summarization.

Example: Transformer models.
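
A compact NumPy sketch of scaled dot-product attention, the core operation behind Transformer attention (the shapes are illustrative assumptions):

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to every key
    weights = softmax(scores, axis=-1)  # attention weights sum to 1 per query
    return weights @ V                  # weighted sum of the values

Q = np.random.rand(4, 8)   # 4 query positions, dimension 8
K = np.random.rand(6, 8)   # 6 key positions
V = np.random.rand(6, 8)
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)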

9
Q

What is the transformer architecture in NLP?

A

A neural network architecture based on self-attention mechanisms, commonly used in tasks like translation, summarization, and language modeling.

10
Q

What is an autoencoder in NLP/CV?

A

It is a neural network used for unsupervised learning, where the network tries to compress the input data into a lower-dimensional representation and then reconstruct it.

Autoencoders are primarily used for unsupervised tasks such as dimensionality reduction, data compression, and feature extraction. They learn to reconstruct the input data and capture its essential patterns, making them useful for anomaly detection and image-denoising tasks.
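
A tiny PyTorch sketch of a fully connected autoencoder (the 784-dimensional input, e.g. a flattened 28x28 image, and the layer sizes are illustrative assumptions):

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder compresses the input to a low-dimensional latent code
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # Decoder reconstructs the input from the latent code
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)              # a batch of flattened images
loss = nn.MSELoss()(model(x), x)     # reconstruction loss
loss.backward()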

11
Q

What is image segmentation in computer vision?

A

The task of partitioning an image into regions or objects. Examples include semantic segmentation, where every pixel is labeled with a class.

12
Q

What is named entity recognition (NER) in NLP?

A

A task in NLP that identifies and classifies named entities (such as people, organizations, and locations) in text.
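
A hedged example with spaCy (assumes the en_core_web_sm model has been downloaded; the sentence is illustrative):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Paris, led by Tim Cook.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, Paris GPE, Tim Cook PERSON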

13
Q

What is text classification?

A

The process of assigning a category or label to a given piece of text, used in tasks like sentiment analysis and spam detection.
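
A minimal scikit-learn sketch of sentiment classification (the toy texts and labels are illustrative assumptions; a real model needs far more data):

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great movie, loved it", "terrible plot, waste of time",
         "really enjoyable", "awful acting"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

# TF-IDF features feed a simple linear classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["really loved this movie"]))   # expected to lean positive, [1]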

14
Q

What is sequence-to-sequence (seq2seq) modeling in NLP?

A

A framework in NLP for tasks like translation, where an input sequence is transformed into an output sequence. Often uses RNN or Transformer models in an encoder/decoder structure.

•	The encoder processes the input sequence and converts it into a fixed-length context vector (a learned representation of the sequence).
•	The decoder takes this context vector and generates an output sequence (such as a translation or summary).
•	Early seq2seq models often used RNNs (Recurrent Neural Networks), LSTMs, or GRUs as the encoder and decoder components.

The Transformer architecture can also be considered a type of sequence-to-sequence model, but with an important distinction:

•	Transformers use self-attention mechanisms instead of recurrence (like in RNNs or LSTMs) to model dependencies between tokens, allowing for parallel processing of input data, making them much faster and better at handling long-range dependencies.
•	The Transformer is inherently an encoder-decoder model, where the encoder processes the input sequence and the decoder generates the output sequence, as in traditional seq2seq models.

However, Transformers are more versatile:

•	You can use only the encoder part for tasks like text classification (e.g., BERT).
•	You can use only the decoder part for text generation (e.g., GPT).
•	The full encoder-decoder Transformer is used in tasks like machine translation (e.g., T5 or original Transformer).

Machine translation
For example, if you want your model to translate English into Spanish, you’d first need to convert each sentence into numerical values representing words from both languages.

Text summarization
Encoder-decoder models can help you understand input sequences and generate output sequences.

Question answering
Encoder-decoder models can map sequences of different lengths to each other, which can help you solve problems like question answering.

Sentiment analysis
Encoder-decoder models can understand the meaning and emotions of input sentences, and output a sentiment score. For example, call centers can use this to analyze how clients’ emotions change in response to keywords or discounts.
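
A minimal PyTorch sketch of an RNN-based encoder-decoder (vocabulary sizes, dimensions, and random inputs are illustrative assumptions, not a full training setup):

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)

    def forward(self, src):
        _, h = self.gru(self.embed(src))   # h: final hidden state = context vector
        return h

class Decoder(nn.Module):
    def __init__(self, vocab_size=1000, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tgt, h):
        o, h = self.gru(self.embed(tgt), h)
        return self.out(o), h              # logits over the target vocabulary

src = torch.randint(0, 1000, (2, 7))       # batch of 2 source sequences, length 7
tgt = torch.randint(0, 1000, (2, 5))       # corresponding target sequences, length 5
context = Encoder()(src)
logits, _ = Decoder()(tgt, context)
print(logits.shape)                        # (2, 5, 1000)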

15
Q

What is data augmentation in CV?

A

Techniques used to artificially expand a dataset by applying transformations such as rotations, flips, and color adjustments to input images.
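
A short sketch with torchvision transforms (the specific transforms and their parameters are illustrative assumptions):

from torchvision import transforms

# Each training image is randomly transformed, so the model sees a slightly
# different version every epoch, effectively enlarging the dataset.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# augmented = augment(pil_image)   # apply to a PIL image at load time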

16
Q

What is optical character recognition (OCR)?

A

The process of converting scanned images of text into machine-readable text.
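
A hedged example with pytesseract (assumes the Tesseract engine and the pytesseract and Pillow packages are installed; the filename is an illustrative placeholder):

import pytesseract
from PIL import Image

# Extract text from a scanned page image
text = pytesseract.image_to_string(Image.open("scanned_page.png"))
print(text)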

17
Q

What is object detection in CV?

A

The task of identifying and localizing objects within an image. Examples: YOLO, Faster R-CNN.

18
Q

What is transfer learning in NLP/CV?

A

Using pre-trained models like BERT (for NLP) or ResNet (for CV) on a new, related task, allowing faster training and better performance with limited data.
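
A brief torchvision sketch of fine-tuning a pre-trained ResNet on a new task (the 10-class head and frozen-backbone strategy are illustrative assumptions; older torchvision versions use pretrained=True instead of the weights argument):

import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new task (e.g. 10 classes)
model.fc = nn.Linear(model.fc.in_features, 10)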

19
Q

What is BERT in NLP?

A

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model designed to understand context from both directions in text.

BERTScore:

•	Key Features: BERTScore uses contextual embeddings from BERT (Bidirectional Encoder Representations from Transformers) to evaluate the similarity between generated text and reference text at a contextual level.
•	Context Sensitivity: Unlike BLEU, which is based on surface-level n-gram overlap, BERTScore measures semantic similarity, capturing deeper contextual and meaning-based relationships between texts.
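
A hedged example of obtaining contextual embeddings with the Hugging Face transformers library (downloads pretrained weights on first use; the sentence is illustrative):

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT builds contextual embeddings.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, num_tokens, 768): one vector per token
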
20
Q

What is a GAN (Generative Adversarial Network)?

A

A GAN is a type of neural network architecture consisting of a generator and a discriminator, where the generator creates images and the discriminator attempts to distinguish between real and generated images.

A GAN consists of two neural networks:

1.	Generator: The generator’s job is to create synthetic data that is similar to real data. It takes random noise as input and generates data, such as an image, text, or any other structured form of information.
2.	Discriminator: The discriminator’s task is to differentiate between real data (from the dataset) and the fake data generated by the generator. It outputs a probability score indicating whether the input is real or fake.

How GAN Works:

•	The generator and discriminator are trained together in an adversarial process:
•	The generator tries to improve at producing data that fools the discriminator into thinking it’s real.
•	The discriminator tries to improve at distinguishing real data from the fake data produced by the generator.
•	This back-and-forth process continues until the generator becomes good at generating data that the discriminator can no longer reliably tell apart from real data.
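
A compact PyTorch sketch of the two networks and one adversarial step (the MLP architectures, 784-dimensional data, and noise size are illustrative assumptions):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
bce = nn.BCELoss()

real = torch.rand(32, 784)          # stand-in for a batch of real images
noise = torch.randn(32, 100)
fake = G(noise)

# Discriminator: label real data 1 and generated data 0
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))

# Generator: try to make the discriminator output 1 on generated data
g_loss = bce(D(fake), torch.ones(32, 1))

In practice each loss is minimized by its own optimizer, alternating updates between the two networks.
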
21
Q

What is the ROUGE score in NLP?

A

ROUGE (Recall-Oriented Understudy for Gisting Evaluation):

•	Key Features: ROUGE is commonly used for evaluating text summarization. It measures overlap between n-grams, word sequences, and word pairs between the generated text and reference summaries.
•	Context Sensitivity: ROUGE-L, a variant, considers the longest common subsequence between the generated and reference text, capturing sentence-level structure and context more effectively than BLEU.
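
A hedged example with the rouge-score package (the reference and candidate sentences are illustrative assumptions):

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score("the cat sat on the mat",    # reference
                      "the cat lay on the mat")    # generated candidate
print(scores["rougeL"].fmeasure)   # F-measure of the longest common subsequence
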
22
Q

What is an n-gram?

A

An n-gram is a contiguous sequence of n items (usually words or characters) from a given text or speech. In Natural Language Processing (NLP), n-grams are used to analyze and model the structure and relationships within text.

Types of n-grams:

•	Unigram (1-gram): A single word. For example, in the sentence “The cat sat,” the unigrams would be: ["The", "cat", "sat"].
•	Bigram (2-gram): A sequence of two words. In “The cat sat,” the bigrams would be: ["The cat", "cat sat"].
•	Trigram (3-gram): A sequence of three words. In “The cat sat,” the trigram would be: ["The cat sat"].

Applications of n-grams:

•	Language modeling: n-grams help in predicting the next word in a sequence by analyzing previous n-grams in a corpus. For instance, if the bigram “New York” appears frequently, a model might predict that “City” follows.
•	Text classification: n-grams are used to extract features from text for classification tasks such as sentiment analysis, spam detection, etc.
•	Text generation: n-grams are often used in models for generating coherent sentences by learning common word sequences.

While n-grams are simple and effective for capturing local word patterns, they can struggle with long-range dependencies in text, which is where more sophisticated models like Transformers come in.
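
A tiny Python sketch for extracting n-grams from a tokenized sentence (the example tokens mirror the ones above):

def ngrams(tokens, n):
    # Slide a window of length n over the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["The", "cat", "sat"]
print(ngrams(tokens, 1))   # [('The',), ('cat',), ('sat',)]
print(ngrams(tokens, 2))   # [('The', 'cat'), ('cat', 'sat')]
print(ngrams(tokens, 3))   # [('The', 'cat', 'sat')]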