CAIC 9.5 Flashcards
What are common marketing tactics employed by retailers?
Direct marketing emails, digital advertisements, incentives, discounts
These tactics are often based on customer demographics.
What is the goal of using ML models in marketing campaigns?
To optimize effectiveness, target the right customers, achieve high conversion rates, minimize costs
This involves analyzing customer data and demographics.
What is unsupervised clustering in customer segmentation?
A method to group customers based on data, such as basic demographics
This helps create unique marketing campaigns for each segment.
What do highly personalized marketing campaigns utilize?
Accurate individual profiles using behavior data, historical transaction data, social media data
This leads to higher conversion rates.
What is contextual advertising?
A targeted marketing technique that displays ads relevant to web page content
Example: Cooking product ads on cooking recipe websites.
How does generative AI enhance targeted marketing?
By creating dynamically personalized content, such as customized images and text
This is tailored to individual customer preferences and interests.
What is sentiment analysis?
A text classification problem that determines if sentiment is positive, negative, or neutral
It uses labeled text data, such as product reviews.
What techniques do retailers use to assess brand perception?
Soliciting feedback, monitoring social media channels
This helps retailers understand customer emotions and sentiments.
What is the main purpose of inventory planning and demand forecasting?
To manage inventory costs while maximizing revenue and avoiding out-of-stock situations
Traditional methods have limitations in accuracy.
Which techniques do retailers use for demand forecasting?
Statistical techniques, ML techniques such as regression analysis and deep learning
These approaches aim to produce more accurate demand forecasts than traditional methods.
What are the three main stages of the autonomous driving system architecture?
Perception and localization, decision and planning, control
Each stage plays a crucial role in the functioning of autonomous vehicles.
What is the role of the perception stage in autonomous driving?
To gather information about surroundings and determine the vehicle’s position
It uses sensors like RADAR, LIDAR, and cameras.
What is the function of the decision and planning stage in autonomous vehicles?
Controls motion and behavior based on data from the perception stage
It analyzes data to determine the optimal path for the vehicle.
How do AI and ML enhance the control module in autonomous vehicles?
By translating decisions into physical actions and optimizing the vehicle’s performance
This includes adaptive control systems and reinforcement learning.
What is the purpose of Advanced Driver Assistance Systems (ADAS)?
To enhance driving experience and safety by detecting hazards and issuing warnings
Examples include lane departure warnings and automatic emergency braking.
What is the main role of ML solutions architects?
To understand common ML algorithms and design technology infrastructure for deployment
This knowledge helps in selecting suitable data science solutions.
What is an objective function in ML algorithms?
A function whose value the algorithm minimizes or maximizes, such as the disparity between projected and actual sales
It guides the optimization process.
What is the purpose of gradient descent in ML?
To optimize model parameters by calculating the rate of error change
This iterative approach helps reduce errors in predictions.
What is the primary purpose of gradient descent?
To optimize neural networks and various ML algorithms
What does gradient descent calculate to update model parameters?
The rate of error change (gradient) with respect to each model parameter, such as the weight attached to each input variable
What is the role of the learning rate in gradient descent?
Controls the magnitude of parameter updates at each iteration
List the key steps involved in the gradient descent optimization process.
- Initialize the value of W randomly
- Calculate the error (loss) using the assigned value of W
- Compute the gradient of the error (loss) with respect to W
- Update the value of W in the direction that reduces the error
- Repeat until the gradient is close to zero (convergence)
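The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a one-parameter model y = W * x with mean squared error as the loss, and it starts W at a fixed value instead of a random one so the run is reproducible. The names `mse_loss`, `mse_gradient`, and `gradient_descent` are hypothetical.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of mse_loss with respect to w."""
    return 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, w=0.0, learning_rate=0.05, max_iters=1000, tol=1e-9):
    # Assumption: w starts at a fixed value (not random) for reproducibility.
    for _ in range(max_iters):
        grad = mse_gradient(w, xs, ys)
        if abs(grad) < tol:  # gradient close to zero: treat as converged
            break
        w -= learning_rate * grad  # the learning rate scales each update
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated by y = 2x
w_fit = gradient_descent(xs, ys)
print(w_fit, mse_loss(w_fit, xs, ys))
```

A learning rate that is too large for the data would make the updates overshoot and diverge; the value 0.05 was chosen to suit this toy dataset only.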
What is the normal equation in relation to machine learning?
A one-step analytical solution for calculating the coefficients of linear regression models
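As an illustration, in matrix form the normal equation is commonly written as w = (XᵀX)⁻¹Xᵀy. For a single feature plus an intercept it reduces to the closed-form slope and intercept computed below. The function name `fit_line` and the toy data are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Toy data generated by y = 2x + 1; no iteration or learning rate is needed.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```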
What are some factors to consider when selecting an ML algorithm?
- Problem type
- Dataset size
- Number and nature of features
- Computational requirements
- Interpretability of results
- Assumptions about data distribution
What is classification in machine learning?
A task that assigns categories or classes to data points
What is regression in machine learning?
A technique used to predict continuous numeric values
What is linear regression used for?
To predict a scalar output based on a linear function of input variables
What does logistic regression estimate?
The probability of an event occurring
What is the primary output of logistic regression?
A probability score between 0 and 1
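A minimal sketch of how that probability is typically produced: a linear combination of the features is passed through the sigmoid function. The weights and bias below are made-up values for illustration, not fitted ones, and `predict_probability` is a hypothetical name.

```python
import math


def predict_probability(features, weights, bias):
    """Sigmoid of a linear score; the result always lies between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, for illustration only.
p = predict_probability([2.0, 1.0], weights=[0.8, -0.4], bias=-0.5)
print(p)
```

A classifier then usually applies a threshold (0.5 is a common default) to turn the probability into a class label.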
What are the advantages of logistic regression?
- Fast training speed
- Interpretability
What is the main advantage of decision trees over linear models?
Ability to capture non-linear relationships and interactions between features
What criteria are used for splitting data in decision trees?
- Gini impurity index
- Information gain
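Both criteria can be computed directly from the class labels on each side of a candidate split. The sketch below uses the standard definitions (Gini impurity as one minus the sum of squared class proportions, information gain as the reduction in entropy); the function names are illustrative.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
print(gini_impurity(parent))
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))
```

A tree-building algorithm would evaluate many candidate splits this way and keep the one with the lowest impurity or highest gain.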
What is a limitation of decision trees?
Prone to overfitting, especially with noisy data
What is the primary benefit of random forests?
Improved accuracy by combining predictions from multiple trees
How do random forests reduce overfitting?
By introducing randomness and using diverse subsets of features
What distinguishes gradient boosting from random forests?
Gradient boosting sequentially aggregates results while random forests use parallel independent learners
What is a key advantage of gradient boosting?
Ability to handle imbalanced datasets effectively
What is gradient boosting?
An ensemble algorithm that builds trees sequentially, each one correcting the errors of the previous ones; when properly tuned, it can outperform many other algorithms.
What is another key advantage of gradient boosting?
It supports custom loss functions, providing flexibility in modeling real-world applications.
What is one limitation of gradient boosting?
It lacks parallelization capabilities, making it slower in training compared to parallelizable algorithms.
How does gradient boosting handle noisy data?
It is sensitive to noisy data, including outliers, which can lead to overfitting and reduced generalization performance.
What is XGBoost?
A widely used implementation of gradient boosting that enables training a single tree across multiple cores and CPUs.
What are some improvements XGBoost offers over traditional gradient boosting?
Faster training times and powerful regularization techniques to mitigate overfitting.
What is K-NN?
A versatile algorithm used for both classification and regression tasks based on the proximity of data points.
What distance metric is commonly used in K-NN?
Euclidean distance.
What is the majority voting process in K-NN classification?
The most frequent class among the K nearest neighbors is assigned to the new data point.
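A minimal sketch of K-NN classification with Euclidean distance and majority voting follows. The function name `knn_classify` and the toy points are assumptions for illustration; tie-breaking between equally frequent classes is left to `Counter.most_common`.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label the query with the most frequent class among its k nearest neighbors."""
    # math.dist computes the Euclidean distance between two points.
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_classify(points, labels, query=(2, 2)))
```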
What is one advantage of K-NN?
Its simplicity: there is no explicit training phase, and the main hyperparameter to choose is K.
What is a significant drawback of K-NN?
It is not suitable for high-dimensional datasets due to the diminished meaning of proximity.
What is an artificial neuron?
A computational unit that processes inputs and produces an output, similar to a biological neuron.
What does the activation function in an artificial neuron do?
Modifies the output of the linear function, capturing non-linear relationships.
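A single artificial neuron can be sketched as a weighted sum followed by an activation function. ReLU is used here as one common choice; the source does not name a specific activation, so treat it as an assumption, along with the names `relu` and `neuron`.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to zero."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=relu):
    """Linear function of the inputs, followed by a non-linear activation."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


print(neuron([1.0, 2.0], weights=[0.5, 1.0], bias=0.5))   # linear part is positive
print(neuron([1.0, 2.0], weights=[0.5, -1.0], bias=0.0))  # linear part is negative
```

Without the activation, stacking neurons would still yield only a linear function of the inputs.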
What is a Multi-Layer Perceptron (MLP)?
An artificial neural network with multiple layers of interconnected neurons.
What is backpropagation?
The process of adjusting the weights of neurons based on the total error propagated back through the network.
What is the purpose of an MLP in machine learning?
To perform classification and regression tasks by capturing intricate nonlinear patterns.
What is clustering in data mining?
A method of grouping items together based on shared attributes.
What is the K-means clustering algorithm used for?
Grouping similar data points into clusters based on proximity to centroids.
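The assign-then-update loop behind K-means can be sketched on one-dimensional data. To keep the example deterministic, the starting centroids are passed in rather than chosen randomly; that choice and the name `kmeans_1d` are assumptions of this sketch.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


final_centroids, final_clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0.0, 5.0])
print(final_centroids, final_clusters)
```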
What is a time series?
A sequence of data points recorded at successive time intervals.
What are the key characteristics of time series data?
- Trend
- Seasonality
- Stationarity
What does the ARIMA model stand for?
Auto-Regressive Integrated Moving Average.
What is the autoregressive component of ARIMA?
The value of a variable in a given period is influenced by its own previous values.
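As a small illustration of the autoregressive idea, a first-order model (AR(1)) predicts each value from the one before it: x_t = c + phi * x_(t-1). The coefficients below are made up for the example; in practice they would be estimated from data.

```python
def ar1_forecast(last_value, phi, c=0.0, steps=3):
    """Roll an AR(1) model forward: each forecast feeds the next one."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical coefficients, for illustration only.
print(ar1_forecast(last_value=100.0, phi=0.5, c=10.0, steps=3))
```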
What does DeepAR utilize for forecasting?
A recurrent neural network (RNN) to capture patterns in target time series.
What is a major disadvantage of DeepAR?
The black-box nature of deep learning models makes forecasts difficult to explain.
What is a recommender system?
A machine learning system designed to suggest items to users based on various data inputs.
What is the primary function of a recommender system?
Predicting a user’s preference for items based on user or item attribute similarities or user-item interactions.
In which industries has the recommender system gained widespread adoption?
- Retail
- Media and entertainment
- Finance
- Healthcare
What is collaborative filtering?
A recommendation algorithm that predicts user preferences by analyzing the collective experiences and behaviors of different users.
What is a major benefit of collaborative filtering?
It provides highly personalized recommendations matched to each user’s unique interests.
What is the cold-start problem in collaborative filtering?
Collaborative models struggle when new users or items with no ratings are introduced.
What does matrix factorization do in collaborative filtering?
It learns vector representations for both users and items to predict missing entries in the user-item interaction matrix.
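A rough sketch of the idea, assuming plain stochastic gradient descent on squared error with no regularization: each user and item gets a small vector, and the dot product of the two approximates the rating. The ratings, hyperparameters, and function names are all invented for illustration.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating is the dot product of the user and item vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def mean_squared_error(ratings, P, Q):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def factorize(ratings, n_users, n_items, n_factors=2, lr=0.01, epochs=500, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    initial_error = mean_squared_error(ratings, P, Q)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                P[u][f] += lr * err * q_if  # nudge the user vector
                Q[i][f] += lr * err * p_uf  # nudge the item vector
    return P, Q, initial_error


# (user, item, rating) triples; pairs not listed are the "missing entries".
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q, initial_error = factorize(ratings, n_users=3, n_items=3)
final_error = mean_squared_error(ratings, P, Q)
print(initial_error, final_error, predict(P, Q, 0, 2))
```

The last printed value is a prediction for a user-item pair that has no rating in the training data.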
What is the primary function of a convolutional neural network (CNN)?
Processing and analyzing image data.
What role does the pooling layer play in a CNN?
It reduces the dimensionality of the extracted features.
What are two commonly used pooling techniques?
- Max pooling
- Average pooling
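Both techniques can be sketched with one helper that slides a non-overlapping window over a feature map. The 2x2 window with stride 2 is a common configuration, assumed here; `pool2d` is a hypothetical name.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample a 2D list with non-overlapping size x size windows."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            ]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]
print(pool2d(feature_map, mode="max"))      # 4x4 input becomes 2x2
print(pool2d(feature_map, mode="average"))
```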
What is the vanishing gradient problem in deep CNNs?
Gradients shrink as they are propagated back through many layers, so the earliest layers receive very little learning signal.
How does ResNet address the vanishing gradient problem?
By implementing a layer-skipping technique with skip connections.
What does natural language processing (NLP) focus on?
The relationship between computers and human language.
What is the purpose of embedding in NLP?
To generate low-dimensional representations for words or sentences that capture semantic meaning.
What are the two components of TF-IDF?
- TF (Term Frequency)
- IDF (Inverse Document Frequency)
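A small sketch of how the two components combine, using one common variant: TF as the term count divided by document length, and IDF as log(N / document frequency). Libraries often apply smoothing or normalization on top of this, so scores from a real toolkit may differ. The function name `tf_idf` and the sample documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
print(scores[0])
```

A word that appears in every document ("the") scores zero, while a word unique to one document ("sat") scores highest.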
What is a limitation of Bag of Words (BOW) and TF-IDF?
They lack the ability to capture the semantic meaning of words and often result in large and sparse input vectors.
What does the term ‘embedding’ refer to in machine learning?
Creating numerical representations for entities that capture their semantic similarity.
What is the primary advantage of using CNNs for image data?
Efficient training due to high degrees of parallelism.
What is a key challenge faced by multi-armed bandit (MAB) algorithms?
Striking the right balance between exploration and exploitation.
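One simple way to make that trade-off concrete is the epsilon-greedy strategy, sketched below. It is only one of several bandit strategies and is not named in the source; the reward rates, the epsilon value, and the function name are assumptions for illustration.

```python
import random


def epsilon_greedy(true_rates, rounds=1000, epsilon=0.1, seed=0):
    """Pull arms for a number of rounds, exploring with probability epsilon."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    estimates = [0.0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_rates))  # explore: try a random arm
        else:
            arm = max(range(len(true_rates)), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Hypothetical click-through rates for three ad variants.
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, estimates)
```

A larger epsilon spends more pulls gathering information; a smaller one commits sooner to whichever arm currently looks best.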
What are some practical applications of computer vision technology?
- Object identification
- Image classification
- Face recognition
- Activity detection
What is embedding?
A technique used to generate low-dimensional representations (mathematical vectors) for words or sentences that capture the semantic meaning of the text.
What does the underlying idea of embedding suggest?
Words or sentences with similar semantic meanings tend to occur in similar contexts.
How are semantically similar entities represented in embedding space?
They are closer to each other than those with different meanings.
What is cosine similarity?
A metric that measures how similar two vectors are by calculating the cosine of the angle between them.
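The definition translates directly into code: the dot product of the two vectors divided by the product of their lengths. The sketch assumes neither vector is all zeros, since the ratio is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # perpendicular
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Because only the angle matters, two embeddings that point the same way are treated as similar even when their magnitudes differ.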
What are the two techniques for learning embedding in Word2Vec?
- CBOW (Continuous Bag of Words)
- Continuous skip-gram
How does CBOW work?
It tries to predict a word for a given window of surrounding words.
How does continuous skip-gram work?
It tries to predict surrounding words for a given word.
What is the purpose of a sliding window in CBOW?
To slide across running text, choosing one word in each window as the target and the rest as inputs.
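The sliding window can be sketched as a function that emits (context words, target word) training pairs. This covers only the data preparation step, not the network itself; the window size and the name `cbow_pairs` are assumptions for illustration.

```python
def cbow_pairs(tokens, window=1):
    """For each position, pair the surrounding words (inputs) with the center word (target)."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


tokens = "the quick brown fox jumps".split()
for context, target in cbow_pairs(tokens):
    print(context, "->", target)
```

Swapping the roles, so that each target word is used to predict its context words, gives the training pairs for skip-gram.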
What type of network is used to train Word2Vec embeddings?
A simple MLP with a single hidden layer.
What is the purpose of the hidden layer’s weights in Word2Vec?
They serve as the actual embeddings for the words after training.
What is the term for using embeddings as features for downstream tasks?
Transfer learning.
What limitation does Word2Vec have regarding word meanings?
It produces a fixed embedding representation for each word, disregarding contextual variations.
What does BERT stand for?
Bidirectional Encoder Representations from Transformers.
What is the main advantage of contextualized word embeddings?
They consider the surrounding words or overall context, allowing for more nuanced representations.
What are the two main tasks BERT is pre-trained on?
- Predicting randomly masked words in sentences
- Predicting whether one sentence follows another (next sentence prediction)
What is the significance of subword level embeddings in BERT?
They allow BERT to handle out-of-vocabulary (OOV) words more effectively.
What component of the transformer architecture does BERT primarily use?
The encoder part.
What are some NLP tasks that BERT can be used for?
- Question answering
- Named entity extraction
- Text summarization
What is fine-tuning in the context of BERT?
Adding an additional output layer to the BERT network for a specific task and updating the pre-trained model weights.
What is a GAN?
A type of generative model designed to generate realistic data instances, such as images.
What are the two networks in a GAN?
- Generator
- Discriminator
What is the role of the Generator in a GAN?
To generate instances of data.
What does the Discriminator in a GAN do?
Learns to distinguish between real and fake instances generated by the Generator.
What is few-shot learning?
A process where a model learns how to perform a task with just a few examples.
What is the main difference between GPT and BERT?
GPT uses the Transformer decoder block while BERT uses the Transformer encoder block.
What is the primary training approach used by GPT?
Next word prediction.
How many parameters does GPT-3 have?
It has 175 billion parameters after training.
What are foundation models?
Models that are pre-trained on massive datasets and can handle multiple tasks.
What is the name of the system that PaLM (Pathways Language Model) is built on?
Pathways.
What is LLaMA?
An LLM available in multiple sizes from 7 billion to 65 billion parameters.
What is a key advantage of LLaMA compared to larger models?
It requires fewer computational resources.
What is the parameter range of LLaMA?
From 7 billion parameters to 65 billion parameters
What capabilities does LLaMA provide?
- Generating creative text
- Answering questions
- Solving mathematical problems
What type of license has Meta issued for LLaMA?
A noncommercial license emphasizing usage in research contexts
How does LLaMA perform when fine-tuned?
It performs extremely well with additional training data
What is the parameter count of BLOOM?
176 billion parameters
In how many languages can BLOOM generate text?
46 natural languages and 13 programming languages
How many researchers contributed to the development of BLOOM?
More than 1,000 researchers
What does the Responsible AI License associated with BLOOM allow?
Individuals and institutions can use and build upon the model under agreed terms
Where is BLOOM easily accessible?
Within the Hugging Face ecosystem
What are domain-specific LLMs?
Models specifically trained for industries to solve tough domain-focused problems
What is BloombergGPT?
A domain-focused LLM specifically trained for the finance industry
What financial NLP tasks does BloombergGPT enhance?
- Sentiment analysis
- Named entity recognition
- News classification
- Question answering
How many tokens comprise the comprehensive dataset used for BloombergGPT?
Over 700 billion tokens
What are some significant limitations of LLMs?
- Generating misinformation (hallucinations)
- Toxic content
- Potential bias
- High resource consumption
What common problems have existing NLP techniques solved?
- Named entity extraction
- Document classification
- Sentiment analysis
What recent advancements have AI made in image generation?
High-resolution, photorealistic images and generative art
What is the new type of deep learning model used for image generation?
Diffusion model
How does a diffusion model generate realistic data?
By gradually adding noise to input data until it is unrecognizable, then learning to reverse the process and recover data from noise
What is the process of adding noise to data in a diffusion model called?
The forward diffusion process, carried out over a series of diffusion steps
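The noise-adding half of the process can be sketched as follows. The source does not specify the noise type or schedule, so this assumes Gaussian noise with a constant per-step strength `beta`, a common textbook setup; the learned reverse (denoising) process is not shown.

```python
import math
import random


def forward_diffusion(data, steps=10, beta=0.2, seed=0):
    """Return the list of progressively noisier versions of data."""
    rng = random.Random(seed)
    trajectory = [list(data)]
    current = list(data)
    for _ in range(steps):
        # Shrink the signal slightly and mix in fresh Gaussian noise.
        current = [
            math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for x in current
        ]
        trajectory.append(current)
    return trajectory


trajectory = forward_diffusion([1.0, -1.0, 0.5])
print(trajectory[0])   # original data
print(trajectory[-1])  # after all diffusion steps
```

After enough steps the original signal's contribution shrinks toward zero and the values are dominated by noise.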
What method does a diffusion model use to learn data generation?
Optimizing a set of learnable parameters through backpropagation
What does the iterative process of a diffusion model allow for?
Capturing complex dependencies, intricate patterns, and structures