1 - Fundamentals of AI/ML Flashcards
____ data is a dataset where each instance or example is accompanied by a label or target variable that represents the desired output or classification.
Labeled
____ data is a dataset where the instances or examples do not have any associated labels or target variables. The data consists only of input features, without any corresponding output or classification.
Unlabeled
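A minimal Python sketch of the difference between labeled and unlabeled data, assuming a made-up housing example. The records and the `strip_labels` helper are hypothetical and exist only for illustration.

```python
# Hypothetical toy records, made up for illustration only.
# Labeled data: each example pairs input features with a target label.
labeled_data = [
    ({"size_sqft": 850, "bedrooms": 2}, "affordable"),
    ({"size_sqft": 2400, "bedrooms": 4}, "expensive"),
]

# Unlabeled data: the same kind of input features, but no target.
unlabeled_data = [
    {"size_sqft": 1200, "bedrooms": 3},
    {"size_sqft": 600, "bedrooms": 1},
]

def strip_labels(examples):
    """Drop the target from each (features, label) pair, leaving only inputs."""
    return [features for features, _label in examples]

print(strip_labels(labeled_data))
```

Removing the labels turns labeled data into unlabeled data; the reverse requires someone (or some process) to supply the missing targets.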
____ data refers to data that is organized and formatted in a predefined manner, typically in the form of tables or databases with rows and columns. This type of data is suitable for traditional machine learning algorithms that require well-defined features and labels.
Structured
What type of structured data includes data stored in spreadsheets, databases, or CSV files, with rows representing instances and columns representing features or attributes?
Tabular data
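A short Python sketch of tabular data read from CSV text, using the standard-library `csv` module. The column names and values are hypothetical.

```python
import csv
import io

# A hypothetical CSV snippet: each row is an instance, each column a feature.
CSV_TEXT = """customer_id,age,annual_spend
1,34,1200.50
2,45,860.00
3,29,2300.75
"""

reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)  # one dict per row (instance); values are read as strings

print(len(rows), "instances")  # 3 instances
print(reader.fieldnames)       # ['customer_id', 'age', 'annual_spend']
```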
What type of structured data consists of sequences of values measured at successive points in time, such as stock prices, sensor readings, or weather data?
Time-series data
____ data is data that lacks a predefined structure or format, such as text, images, audio, and video. This type of data requires more advanced machine learning techniques to extract meaningful patterns and insights.
Unstructured
What type of unstructured data includes documents, articles, social media posts, and other textual data?
text data
What type of unstructured data includes digital images, photographs, and video frames?
image data
The machine learning process is traditionally divided into what three broad categories?
supervised learning, unsupervised learning, and reinforcement learning
In ____ learning, the algorithms are trained on labeled data. The goal is to learn a mapping function that can predict the output for new, unseen input data.
supervised
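A minimal supervised learning sketch in Python, assuming a one-feature linear model fitted by ordinary least squares. The training pairs are made up so that they follow y = 2x + 1 exactly; `fit_line` and `predict` are hypothetical helper names.

```python
# Labeled training data: (input x, target y) pairs. Values are made up and
# follow y = 2x + 1 exactly so the learned mapping is easy to check.
training_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def fit_line(pairs):
    """Learn a slope and intercept from labeled pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(training_data)

def predict(x):
    """Apply the learned mapping to a new, unseen input."""
    return slope * x + intercept

print(predict(10.0))  # 21.0
```

The input 10.0 never appears in the training data; the model predicts its output from the mapping it learned.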
____ learning refers to algorithms that learn from unlabeled data. The goal is to discover inherent patterns, structures, or relationships within the input data.
Unsupervised
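A small unsupervised learning sketch in Python: plain k-means clustering on one-dimensional unlabeled data. The data points and starting centroids are hypothetical; k-means is just one common unsupervised technique, chosen here for brevity.

```python
# Unlabeled 1-D data (made-up values) with two visible groups.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]

def kmeans_1d(data, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep a centroid in place if no points were assigned to it.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(clusters)  # [[1.0, 1.2, 0.8], [8.0, 8.3, 7.9]]
```

No labels are supplied; the two groups emerge from the structure of the data itself.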
In ____ learning, the machine is given only a performance score as guidance. Feedback is provided in the form of rewards or penalties for its actions, and the machine learns from this feedback to improve its decision-making over time. (This differs from semi-supervised learning, where only a portion of the training data is labeled.)
reinforcement
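A minimal reinforcement learning sketch in Python, assuming a simple multi-armed bandit solved with an epsilon-greedy strategy. The hidden reward probabilities, step count, and exploration rate are hypothetical. The agent never sees the probabilities; it only receives a reward (1 on success, 0 otherwise) after each action.

```python
import random

# Hypothetical environment: three actions with hidden success probabilities.
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]

def pull(action, rng):
    """Return a reward of 1 with the action's hidden probability, else 0."""
    return 1 if rng.random() < TRUE_REWARD_PROB[action] else 0

def epsilon_greedy(steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(TRUE_REWARD_PROB)
    counts = [0] * n_actions
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                     # explore
        else:
            action = max(range(n_actions), key=lambda a: values[a])  # exploit
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental mean update: nudge the estimate toward the new reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = epsilon_greedy()
best = max(range(len(values)), key=lambda a: values[a])
print("best action:", best, "estimated rewards:", [round(v, 2) for v in values])
```

Exploring occasionally keeps the agent from locking onto whichever action happened to pay off first.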
After the model has been trained, it is time to begin the process of using the information that a model has learned to make predictions or decisions. This is called ____.
inferencing
What are two main types of inferencing in machine learning?
batch / real-time
____ inferencing is when the computer takes a large amount of data, such as images or text, and analyzes it all at once to provide a set of results.
Batch
Which type of inferencing in machine learning is often used for tasks like data analysis, where the speed of the decision-making process is not as crucial as the accuracy of the results?
Batch
____ inferencing is when the computer has to make decisions quickly, in response to new information as it comes in.
Real-time
Which type of inferencing in machine learning is critical for applications such as chatbots or self-driving cars, where the computer must process incoming data and make a decision almost instantaneously, without taking the time to analyze a large dataset?
Real-time
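A short Python sketch contrasting batch and real-time inferencing with the same model. The `model` function is a hypothetical stand-in (a fixed threshold rule), not a trained model.

```python
from typing import Iterable, Iterator, List

def model(x: float) -> str:
    """Stand-in for a trained model (a hypothetical fixed threshold rule)."""
    return "high" if x >= 0.5 else "low"

def batch_inference(dataset: List[float]) -> List[str]:
    """Batch: score a whole stored dataset in one pass and return all results together."""
    return [model(x) for x in dataset]

def realtime_inference(stream: Iterable[float]) -> Iterator[str]:
    """Real-time: score each input as it arrives and hand back the result immediately."""
    for x in stream:
        yield model(x)

print(batch_inference([0.1, 0.7, 0.4]))  # ['low', 'high', 'low']
for prediction in realtime_inference(iter([0.9, 0.2])):
    print(prediction)                    # high, then low
```

The model is identical in both cases; what changes is when inputs are collected and how quickly each result must be returned.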
At the core of deep learning are neural networks, which have lots of tiny units called ____ that are connected together.
nodes
The nodes in neural networks are organized into a ____ layer, one or more ____ layers, and an ____ layer.
input / hidden / output
When we show a neural network many examples, like data about customers who bought certain products or used certain services, it figures out how to ____ by adjusting the connections between its nodes.
identify patterns
True/False: When a neural network learns to recognize patterns from examples, it can then look at data for completely new customers that it has never seen before and still make predictions about what they might buy or how they might behave.
True
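A deliberately simplified Python sketch of learning from examples and then predicting for unseen customers. It uses a single node (no hidden layer) with a sigmoid activation, trained by gradient descent; the customer features, labels, and learning rate are hypothetical. A real neural network stacks many such nodes into input, hidden, and output layers.

```python
import math

# Hypothetical customer examples: (visits_per_week, minutes_on_site), both
# scaled to 0..1, with label 1 if the customer bought the product.
examples = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.9), 1),
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0),
]

weights = [0.0, 0.0]  # connection strengths from the two inputs to the node
bias = 0.0

def node_output(x):
    """Weighted sum of inputs passed through a sigmoid activation."""
    z = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1.0 / (1.0 + math.exp(-z))

# Training: nudge each connection weight to reduce the prediction error
# (gradient descent on the log loss).
learning_rate = 1.0
for _ in range(2000):
    for x, label in examples:
        error = node_output(x) - label
        weights[0] -= learning_rate * error * x[0]
        weights[1] -= learning_rate * error * x[1]
        bias -= learning_rate * error

# Inference on customers the node has never seen.
print(round(node_output((0.85, 0.75)), 2))  # close to 1: likely to buy
print(round(node_output((0.15, 0.25)), 2))  # close to 0: unlikely to buy
```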
____ is a field of artificial intelligence that makes it possible for computers to interpret and understand digital images and videos.
Computer Vision
____ is a branch of artificial intelligence that deals with the interaction between computers and human languages.
Natural language processing (NLP)
Generative AI is powered by models that are pretrained on internet-scale data, and these models are called ____ models.
foundation
With ____ models, instead of gathering labeled data for each model and training multiple models as in traditional ML, you can adapt a single model to perform multiple tasks.
foundation
____ models perform tasks including text generation, text summarization, information extraction, image generation, chatbot interactions, and question answering.
Foundation
The foundation model ____ is a comprehensive process that involves several stages, each playing a crucial role in developing and deploying effective and reliable foundation models.
lifecycle
True/False: Foundation models require no training.
False: FMs require training on massive datasets from diverse sources.
____ data can be used at scale for pre-training because it is much easier to obtain compared to ____ data.
Unlabeled / labeled
____ data includes raw data, such as images, text files, or videos, with no labels to provide context.
Unlabeled
Although traditional ML models rely on supervised, unsupervised, or reinforcement learning patterns, ____ are typically pre-trained through self-supervised learning.
foundation models
____ learning makes use of the structure within the data to autogenerate labels.
Self-supervised
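A minimal Python sketch of how self-supervised learning can autogenerate labels, assuming next-word prediction as the objective (one common choice; the cards do not name a specific objective). `next_word_pairs` and the sample sentence are hypothetical.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw, unlabeled text into (context, target) training pairs.
    The 'label' for each pair is simply the word that follows the context."""
    words = text.split()
    return [
        (tuple(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]

pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
# ('the', 'cat') -> sat
# ('cat', 'sat') -> on
# ('sat', 'on') -> the
# ('on', 'the') -> mat
```

No human labeling is involved: every target comes from the structure of the text itself.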
During the initial ____ stage, the FM’s algorithm can learn the meaning, context, and relationship of the words in the datasets. For example, the model might learn whether "drink" means a beverage (the noun) or the act of swallowing a liquid (the verb).
pre-training
After the initial pre-training, the foundation model can be further pre-trained on additional data. This is known as ____ pre-training.
continuous
Pre-trained language models can be ____ through techniques like prompt engineering, retrieval-augmented generation (RAG), and fine-tuning on task-specific data.
optimized
Whether you fine-tune a model or use a pre-trained model off the shelf, the next logical step is to ____ the model.
evaluate
When the foundation model meets the desired performance criteria, it can be ____ in the target production environment.
deployed
After ____, the model’s performance is continuously monitored, and feedback is collected from users, domain experts, or other stakeholders. This feedback, along with model monitoring data, is used to identify areas for improvement, detect potential biases or drift, and inform future iterations of the model. The feedback loop permits continuous enhancement of the foundation model through fine-tuning, continuous pre-training, or re-training, as needed.
deployment
____ are powerful models that can understand and generate human-like text. They are trained on vast amounts of text data from the internet, books, and other sources, and learn patterns and relationships between words and phrases.
large language models
____ are the basic units of text that the model processes. They can be words, parts of words, or individual characters like a period. Tokenization also standardizes the input data, which makes it easier for the model to process.
Tokens
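A toy tokenizer in Python, assuming a simple regular-expression split into words and punctuation marks. Real LLM tokenizers typically use learned subword schemes; `simple_tokenize` and the vocabulary built here are hypothetical.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation-character tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("I like to drink tea.")
print(tokens)  # ['I', 'like', 'to', 'drink', 'tea', '.']

# Standardize the input by mapping each distinct token to an integer ID.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
print(ids)     # [1, 3, 5, 2, 4, 0]
```

Note that the period becomes a token of its own, and every token is reduced to a number the model can work with.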
____ are numerical representations of tokens, where each token is assigned a vector (a list of numbers) that captures its meaning and relationships with other tokens.
Embeddings
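A small Python sketch of embeddings as vectors, assuming hand-picked 3-dimensional values (real models learn vectors with hundreds or thousands of dimensions). Cosine similarity is a common way to compare embeddings; the words and numbers here are hypothetical.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
embeddings = {
    "tea":    [0.9, 0.1, 0.0],
    "coffee": [0.8, 0.2, 0.1],
    "car":    [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; values near 0 mean little relation."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(embeddings["tea"], embeddings["coffee"]), 2))  # 0.98
print(round(cosine_similarity(embeddings["tea"], embeddings["car"]), 2))     # 0.01
```

Related tokens ("tea", "coffee") end up with similar vectors, while unrelated ones ("tea", "car") do not.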
____ use tokens, embeddings, and vectors to understand and generate text.
large language models
____ is a deep learning architecture that starts with pure noise or random data. The model gradually refines this noise, adding more meaningful information at each step, until it produces a clear and coherent output, like an image or a piece of text.
Diffusion
____ models learn through a two-step process of forward diffusion and reverse diffusion.
Diffusion
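A minimal Python sketch of the forward diffusion step only, assuming the DDPM formulation (not named in these cards): x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the product of (1 - beta_s) over steps 1..t, with a linear beta schedule. The schedule values and the tiny "image" are hypothetical. Reverse diffusion needs a trained network that predicts and removes the noise, so it is not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variance added per step, increasing linearly (assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_diffusion(x0, t, betas, rng):
    """Sample x_t directly from x_0 after t noising steps."""
    alpha_bar = 1.0
    for beta in betas[:t]:
        alpha_bar *= 1.0 - beta
    signal = math.sqrt(alpha_bar)            # how much of the original survives
    noise_scale = math.sqrt(1.0 - alpha_bar) # how much Gaussian noise is mixed in
    noisy = [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]
    return noisy, alpha_bar

rng = random.Random(0)
betas = linear_beta_schedule(1000)
clean = [1.0, -1.0, 0.5, 0.0]  # a tiny stand-in for an image's pixel values

for t in (1, 250, 1000):
    noisy, alpha_bar = forward_diffusion(clean, t, betas, rng)
    print(t, round(alpha_bar, 4), [round(v, 2) for v in noisy])
```

As t grows, alpha_bar shrinks toward 0 and the sample approaches pure noise; a trained model learns to run this process in reverse.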