NLP and CV Flashcards
What is tokenization in NLP?
Tokenization is the process of splitting text into individual tokens (words, subwords, or characters) for further processing in NLP tasks.
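A minimal sketch of word-level tokenization with a regular expression; real pipelines typically use trained subword tokenizers (e.g., BPE) instead.

```python
import re

def tokenize(text):
    # Word-level tokenization: lowercase, then grab runs of word
    # characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

print(tokenize("Tokenization splits text into tokens!"))
# ['tokenization', 'splits', 'text', 'into', 'tokens', '!']
```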
What is stemming and lemmatization?
Techniques used in NLP to reduce words to their base or root forms. Stemming removes affixes, while lemmatization considers the context to reduce words to their dictionary form.
Stemming: faster; used in real-time systems such as search and text classification, where precision is not critical.
Lemmatization: more accurate but slower; used in tasks such as translation and sentiment analysis.
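A small comparison sketch using NLTK (assumes the WordNet data has been downloaded); the example words are arbitrary.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["running", "studies", "better"]:
    # Stemming chops affixes; lemmatization maps to a dictionary form
    # (here treating each word as a verb via pos="v").
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos="v"))
```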
What is TF-IDF?
Term Frequency-Inverse Document Frequency is a statistic that reflects the importance of a word in a document relative to a collection of documents.
In a TF-IDF matrix, rows are documents and columns are vocabulary terms, so each document is represented by a vector. These vectors work well as input for sentiment analysis, spam detection, search engines, and document similarity. TF-IDF can also be used to reduce vocabulary size by filtering out less important words. So many use cases!
For example, a search engine can use a TF-IDF vector to rank which documents are most relevant to a search query.
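A minimal scikit-learn sketch; the toy documents below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "search engines rank documents",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)   # rows = documents, columns = vocabulary terms
print(X.shape)                       # (3, vocabulary size)
print(vectorizer.get_feature_names_out())
```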
What is word embedding?
A representation of words as vectors in a continuous vector space, where similar words have similar vectors. Examples: Word2Vec, GloVe.
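A hedged sketch training tiny Word2Vec vectors with gensim (4.x API); real embeddings are trained on far larger corpora.

```python
from gensim.models import Word2Vec

sentences = [
    ["king", "queen", "royal", "palace"],
    ["dog", "cat", "pet", "animal"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)
print(model.wv["king"].shape)             # (50,) dense vector for one word
print(model.wv.similarity("dog", "cat"))  # cosine similarity between two word vectors
```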
What is a convolution in CNNs (for CV)?
A convolution is a mathematical operation that slides a filter over the input (such as an image) to extract features like edges and textures.
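A minimal NumPy sketch of a single convolution pass (stride 1, no padding); note that deep-learning frameworks actually compute cross-correlation, i.e., the kernel is not flipped.

```python
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the window with the kernel and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # responds to vertical edges
print(convolve2d(image, sobel_x).shape)  # (6, 6) feature map
```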
What is max pooling in CNNs (for CV)?
Max pooling is a down-sampling operation in CNNs that takes the maximum value within a sliding window of the input, reducing spatial dimensions while retaining important information.
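A minimal 2x2 max-pooling sketch with NumPy (stride equal to the window size):

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    h, w = feature_map.shape
    cropped = feature_map[:h - h % size, :w - w % size]        # crop to a multiple of the window
    windows = cropped.reshape(cropped.shape[0] // size, size,
                              cropped.shape[1] // size, size)  # split into non-overlapping windows
    return windows.max(axis=(1, 3))                            # keep the max inside each window

x = np.arange(16).reshape(4, 4)
print(max_pool2d(x))
# [[ 5  7]
#  [13 15]]
```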
What is the BLEU score in NLP?
Bilingual Evaluation Understudy score is a metric for evaluating the quality of text generated by machine translation systems by comparing it to reference translations.
Although designed for machine translation, it can be used for other tasks such as summarization and image captioning, as long as there is a human-written reference to compare against.
It favors precision over recall, so it may undervalue outputs that get the point across well but are, for example, too short compared to the reference.
Context and synonymy are hard for it: if the exact reference words are not used, the BLEU score will be lower.
For better handling of context and synonyms, consider METEOR, ROUGE, or BERT-based metrics such as BERTScore.
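A hedged sketch of sentence-level BLEU with NLTK; the candidate and reference sentences are made up.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # one or more reference token lists
candidate = ["the", "cat", "sat", "on", "the", "mat"]

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))  # closer to 1.0 means higher n-gram overlap with the reference
```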
What is attention in NLP?
Attention is a mechanism that allows models to focus on relevant parts of the input sequence, improving performance in tasks like translation and summarization.
Example: Transformer models.
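A minimal NumPy sketch of scaled dot-product attention, the core operation inside Transformers:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the values

Q = np.random.rand(3, 4)   # 3 query vectors of dimension 4
K = np.random.rand(5, 4)   # 5 key vectors
V = np.random.rand(5, 8)   # 5 value vectors of dimension 8
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)
```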
What is the transformer architecture in NLP?
A neural network architecture based on self-attention mechanisms, commonly used in tasks like translation, summarization, and language modeling.
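A hedged sketch using PyTorch's built-in Transformer encoder layers; the dimensions are arbitrary illustration values.

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

x = torch.rand(8, 10, 64)   # (batch, sequence length, embedding dim)
out = encoder(x)            # self-attention + feed-forward blocks applied in each layer
print(out.shape)            # torch.Size([8, 10, 64])
```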
What is an autoencoder in NLP/CV?
A neural network used for unsupervised learning, in which the network tries to compress the input data into a lower-dimensional representation and then reconstruct it.
Autoencoders are primarily used for tasks such as dimensionality reduction, data compression, and feature extraction. Because they learn to reconstruct the input and capture its essential patterns, they are also useful for anomaly detection and image denoising.
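A minimal PyTorch autoencoder sketch: compress a flattened 28x28 image to 32 dimensions and reconstruct it (the layer sizes are illustrative).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

    def forward(self, x):
        z = self.encoder(x)       # lower-dimensional representation
        return self.decoder(z)    # reconstruction of the input

model = Autoencoder()
x = torch.rand(16, 784)               # batch of flattened images
loss = nn.MSELoss()(model(x), x)      # reconstruction loss used for training
print(loss.item())
```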
What is image segmentation in computer vision?
The task of partitioning an image into regions or objects. Examples include semantic segmentation, where every pixel is labeled with a class.
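A minimal sketch of what semantic segmentation outputs look like: a model produces per-pixel class logits, and argmax turns them into a label map (the random tensor below stands in for a real model's output).

```python
import torch

num_classes, height, width = 21, 64, 64
logits = torch.rand(1, num_classes, height, width)  # stand-in for a segmentation model's output
label_map = logits.argmax(dim=1)                    # class index for every pixel
print(label_map.shape)                              # torch.Size([1, 64, 64])
```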
What is named entity recognition (NER) in NLP?
A task in NLP that identifies and classifies named entities (such as people, organizations, and locations) in text.
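A hedged spaCy sketch (assumes the en_core_web_sm model is installed); the sentence is made up.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple opened a new office in London last year.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. organizations, locations, dates
```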
What is text classification?
The process of assigning a category or label to a given piece of text, used in tasks like sentiment analysis and spam detection.
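A minimal scikit-learn sketch: TF-IDF features plus logistic regression on a tiny made-up spam/ham dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting at 3pm tomorrow",
         "claim your free reward", "lunch with the team today"]
labels = ["spam", "ham", "spam", "ham"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["free prize waiting for you"]))  # likely ['spam']
```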
What is sequence-to-sequence (seq2seq) modeling in NLP?
A framework in NLP for tasks like translation, where an input sequence is transformed into an output sequence. Often built from RNNs or Transformer models arranged as encoder/decoder pairs.
• The encoder processes the input sequence and converts it into a fixed-length context vector (a learned representation of the sequence).
• The decoder takes this context vector and generates an output sequence (such as a translation or summary).
• Early seq2seq models often used RNNs (Recurrent Neural Networks), LSTMs, or GRUs as the encoder and decoder components.
The Transformer architecture can also be considered a type of sequence-to-sequence model, but with an important distinction:
• Transformers use self-attention mechanisms instead of recurrence (as in RNNs or LSTMs) to model dependencies between tokens, which allows parallel processing of the input and makes them much faster and better at handling long-range dependencies.
• The Transformer is inherently an encoder-decoder model, where the encoder processes the input sequence and the decoder generates the output sequence, as in traditional seq2seq models.
However, Transformers are more versatile:
• You can use only the encoder part for tasks like text classification (e.g., BERT).
• You can use only the decoder part for text generation (e.g., GPT).
• The full encoder-decoder Transformer is used in tasks like machine translation (e.g., T5 or the original Transformer).
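A hedged seq2seq sketch with Hugging Face Transformers, using the small pretrained T5 checkpoint (an encoder-decoder Transformer) to translate a sentence.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The encoder reads the prompt; the decoder generates the translation token by token.
inputs = tokenizer("translate English to German: The cat sat on the mat.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```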
Machine translation
For example, if you want your model to translate English into Spanish, you’d first need to convert each sentence into numerical values representing words from both languages.
Text summarization
An encoder-decoder model reads the full input document and generates a shorter output sequence that summarizes it.
Question answering
Encoder-decoder models can map sequences of different lengths to each other, which can help you solve problems like question answering.
Sentiment analysis
Encoder-decoder models can understand the meaning and emotions of input sentences, and output a sentiment score. For example, call centers can use this to analyze how clients’ emotions change in response to keywords or discounts.
What is data augmentation in CV?
Techniques used to artificially expand a dataset by applying transformations such as rotations, flips, and color adjustments to input images.
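A hedged sketch with torchvision transforms; the specific parameter values are arbitrary.

```python
from torchvision import transforms
from PIL import Image

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

image = Image.new("RGB", (224, 224))   # placeholder image; normally loaded from disk
augmented = augment(image)             # a randomly transformed copy of the input
print(augmented.shape)                 # torch.Size([3, 224, 224])
```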