Fundamentals of ML and AI Flashcards

Question 1

Q

What is AI?

Answer

A

AI is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, creation, and image recognition.
AI is a broad field that encompasses the development of intelligent systems capable of performing tasks that typically require human intelligence, such as perception, reasoning, learning, problem-solving, and decision-making. AI serves as an umbrella term for various techniques and approaches, including machine learning, deep learning, and generative AI, among others.

Question 2

Q

What is ML?

Answer

A

ML is a type of AI for understanding and building methods that make it possible for machines to learn. These methods use data to improve computer performance on a set of tasks.

Machines learn from huge datasets. There are no explicit instructions.
For example, a medical application can learn to diagnose cancer from X-rays by training on millions of scanned images and their diagnoses.

Question 3

Q

What is Deep Learning (DL)?

Answer

A

Deep learning uses the concept of neurons and synapses similar to how our brain is wired. An example of a deep learning application is Amazon Rekognition, which can analyze millions of images and streaming and stored videos within seconds.

Neural networks are at the core of deep learning.
A neural network has an input layer -> one or more hidden layers -> an output layer.
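The layer structure above can be sketched as a tiny forward pass. This is only an illustration: the weights, biases, layer sizes, and the choice of a sigmoid activation are hypothetical values picked by hand, whereas a real network learns its weights during training.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    # One fully connected layer: each neuron takes a weighted sum of the
    # inputs, adds its bias, and applies the activation function.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical hand-picked parameters (not learned).
hidden_weights = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]  # 3 hidden neurons, 2 inputs each
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.4, -0.7, 0.2]]                      # 1 output neuron, 3 inputs
output_biases = [0.05]

def forward(inputs):
    # input layer -> hidden layer -> output layer
    hidden = dense(inputs, hidden_weights, hidden_biases)
    return dense(hidden, output_weights, output_biases)

prediction = forward([1.0, 2.0])
print(prediction)  # a single value between 0 and 1
```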

Question 4

Q

What is Gen AI?

Answer

A

Generative AI is a subset of deep learning because it can adapt models built using deep learning, but without retraining or fine tuning.
Generative AI systems are capable of generating new data based on the patterns and structures learned from training data.

Question 5

Q

What are the key AI technologies?

Answer

A

Generative AI is just one of several AI technologies.
Others include:
Natural Language Processing
Computer Vision
Speech Recognition

Question 6

Q

What are the key layers of AI application architecture?

Answer

A

Data Layer
ML framework and algorithm layer - provides functions to build and train AI models, e.g. PyTorch, TensorFlow.
Model layer - implements the AI model using data and algorithms. Components include a) model structure, b) model parameters and functions, c) optimizer, etc.
Application layer - the customer-facing part of AI, where end users interact with AI systems.

Question 7

Q

Challenges in AI implementation?

Answer

A

Data governance - privacy, regulations, and security of the data involved.
Technical difficulties - high processing power requirements.
Data limitations - data quality and accuracy, high storage needs.

Question 8

Q

What are the four steps involved in building an ML model?

Answer

A

Data collection and preparation
Selecting an appropriate algorithm
Training the model on the prepared data
Evaluating performance through testing and iteration
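The four steps can be walked through on a toy problem. The sketch below is an assumption-laden illustration: the dataset is made up, and nearest centroid is just one simple algorithm chosen for brevity, not a recommendation.

```python
# Step 1: collect and prepare data (hypothetical 1-D feature with a label).
data = [
    (1.0, "small"), (1.5, "small"), (2.0, "small"),
    (8.0, "large"), (9.0, "large"), (9.5, "large"),
    (1.2, "small"), (8.8, "large"),
]
train, test = data[:6], data[6:]  # hold out the last two rows for testing

# Step 2: select an algorithm (here: nearest centroid).
def train_centroids(rows):
    # Compute the mean feature value for each label.
    sums, counts = {}, {}
    for x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    # Pick the label whose centroid is closest to x.
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Step 3: train the model on the prepared data.
centroids = train_centroids(train)

# Step 4: evaluate on held-out data, then iterate if the score is poor.
correct = sum(predict(centroids, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"accuracy = {accuracy:.2f}")
```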

Question 9

Q

ML Model - Training Data

Answer

A

Garbage in, garbage out - an ML model is only as good as the data it is trained on.
Labeled data - data where each instance or example is categorized/classified using a label, e.g. an image labeled cat or dog. Labels are normally provided by humans.
Unlabeled data - no associated output classification, e.g. a collection of images.
Structured data - organized in a defined format, e.g. tabular or time series data, database rows/columns. Typically used in conventional ML models.
Unstructured data - no predefined structure or format, e.g. text, images, audio, video. Needs more advanced ML algorithms to extract patterns and insights.

Question 10

Q

What is the ML Process? What are the three types of learning?

Answer

A

Compiled data is fed into algorithms.
Supervised learning - algorithms are trained on labeled data. The goal is to learn a mapping function that can predict output for new, unseen input data.
Unsupervised learning - algorithms learn from unlabeled data. The goal is to discover inherent patterns, structures, or relationships within the input data.
Reinforcement learning - the machine is given only a performance score as guidance. Feedback is provided in the form of rewards or penalties for its actions, and the machine learns from this feedback to improve its decision-making over time.
Semi-supervised learning, where only a portion of the training data is labeled, sits between supervised and unsupervised learning.

Question 11

Q

What is Inferencing?

Answer

A

Inferencing is a trained ML model making predictions or decisions on new data.
There are two types: batch inferencing and real-time inferencing.
Batch inferencing - e.g. data analysis, where accuracy is more important than speed.
Real-time inferencing - e.g. chatbots and self-driving cars, where speed of decision-making is important.
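The difference between the two modes is mostly in how the model is called. In the minimal sketch below, `model` is a hypothetical stand-in for a trained model, and both function names are made up for illustration.

```python
def model(x):
    # Stand-in for a trained model (a hypothetical rule, not a real trained model).
    return 3 * x + 4

def batch_inference(records):
    # Score a whole accumulated dataset in one pass; latency matters less.
    return [model(x) for x in records]

def realtime_inference(request):
    # Score a single incoming request immediately; latency matters most.
    return model(request)

nightly_results = batch_inference([1, 2, 3])
live_result = realtime_inference(7)
print(nightly_results, live_result)
```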

Question 12

Q

What is the lifecycle of a foundation model (FM)?

Answer

A

Data Selection - FMs are trained on unlabeled data from diverse sources
Pre-training - FMs use self-supervised training, may include initial pre-training and additional pre-training
Optimization - prompt engineering, RAG, fine-tuning on task-specific data
Evaluation - performance measurements using metrics. Can it meet the business needs?
Deployment - target production environment, integrating with APIs etc.
Feedback and continuous improvement - identify biases and drift, inform future iterations of the models

Question 13

Q

What are the different types of FMs?

Answer

A

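LLMs - different architectures are possible, but transformer-based models are the state of the art. They understand and generate human-like text. LLMs use tokens, embeddings, and vectors.
Diffusion models - forward diffusion gradually adds noise to training data until only noise remains; reverse diffusion starts with noise/random data and gradually removes the noise, adding more and more meaningful information until a new output emerges.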
Multimodal models - process and generate multiple modes of data - e.g. video and text; they understand how different modes (e.g. image and text) relate to each other.

Question 14

Q

In an LLM, what are tokens, embeddings, and vectors?

Answer

A

Tokens are the basic units of text that an LLM processes.
Embeddings are numerical representations of tokens.
That is, each token is assigned a vector: a list of numbers that captures its meaning and its relationships with other tokens.
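A toy example may help make the three terms concrete. Everything here is a simplifying assumption: the whitespace tokenizer stands in for the subword tokenizers real LLMs use, and the three-dimensional vectors are hand-written, whereas real embeddings are learned and far larger.

```python
import math

# Hypothetical hand-written embeddings: one vector (list of numbers) per token.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def tokenize(text):
    # Simplification: split on whitespace; real LLMs use subword tokenizers.
    return text.lower().split()

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (related meaning).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

tokens = tokenize("Cat dog car")           # text -> tokens
vectors = [embeddings[t] for t in tokens]  # tokens -> embedding vectors

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(sim_cat_dog > sim_cat_car)  # related tokens sit closer together
```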

Question 15

Q

What are GANs?

Answer

A

A type of generative AI model that involves two competing neural networks.
Generator - generates new synthetic data similar to training data distribution
Discriminator - tries to distinguish between synthetic and real data.
Generator iteratively tries to fool the discriminator until the Discriminator can no longer distinguish real from synthetic.

Question 16

Q

What are VAEs?

Answer

A

Variational autoencoders (VAEs)
A type of generative AI model.
Uses an encoder and a decoder.
The encoder encodes essential features of the data into a latent representation.
The decoder generates a reconstruction of the original data from the latent representation.

Question 17

Q

What is Prompt Engineering?

Answer

A

Prompt engineering (PE) is the fastest and lowest-cost way to optimize a model.
Prompt engineering focuses on developing, designing, and optimizing prompts to enhance the output of FMs for your needs.
Prompts include:
a) Instructions - the task for the FM to do.
b) Context - external information to guide the model.
c) Input data - the input for which you want a response
d) Output indicator - the format of the output.
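The four parts can be assembled into a single prompt string. The helper below is a hypothetical sketch: `build_prompt`, its parameter names, and the section labels are illustrative choices, not a format any particular FM requires.

```python
def build_prompt(instructions, context, input_data, output_indicator):
    # Join the four prompt components, separated by blank lines.
    return "\n\n".join([
        f"Instructions: {instructions}",
        f"Context: {context}",
        f"Input: {input_data}",
        f"Output format: {output_indicator}",
    ])

prompt = build_prompt(
    instructions="Summarize the review in one sentence.",
    context="The review is about a pair of wireless headphones.",
    input_data="Great sound, but the battery only lasts three hours.",
    output_indicator="A single plain-text sentence.",
)
print(prompt)
```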

Question 18

Q

What is Fine Tuning?

Answer

A

Fine-tuning (FT) is a supervised learning process that involves taking a pre-trained model and training it further on specific, smaller datasets.

Training on these narrower datasets modifies the weights of the model to better align with the task.

FT can involve a) Instruction Fine Tuning or b) Reinforcement learning from human feedback (RLHF)

Question 19

Q

What is Retrieval-augmented generation?

Answer

A

RAG supplies domain-relevant data as context.
The technique is similar to fine-tuning.
However, RAG will not change the weights of the foundation model, whereas fine-tuning will change model weights.
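A minimal sketch of the idea, under loud assumptions: the documents are made up, retrieval uses naive word overlap instead of the embedding-based search that production systems typically use, and no model is actually called. The point is that domain data reaches the model through the prompt, so no weights change.

```python
import re

# Hypothetical domain documents.
documents = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 5 business days.",
    "Gift cards never expire.",
]

def words(text):
    # Lowercase the text and keep only alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs, top_k=1):
    # Rank documents by how many words they share with the question.
    q = words(question)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:top_k]

def build_rag_prompt(question, docs):
    # Supply the retrieved text as context; the model itself is untouched.
    context = "\n".join(retrieve(question, docs))
    return f"Context:\n{context}\n\nQuestion: {question}"

question = "How long does shipping take?"
rag_prompt = build_rag_prompt(question, documents)
print(rag_prompt)
```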

Question 20

Q

What infrastructure does AWS offer for AI/ML services?

Answer

A

ML Frameworks - SageMaker allows you to build your own ML models.
AI/ML Services - Textract, Polly, Rekognition, Lex, Transcribe etc.
Gen AI - Bedrock, Q, SageMaker JumpStart etc.

Question 21

Q

What are the cost considerations when using AI/ML services on AWS?

Answer

A

Responsiveness and Availability - higher levels = greater cost
Redundancy and Regional Coverage - multi AZ or multi Region costs more
Performance - higher performing GPUs cost more
Token-based pricing - more tokens = more cost
Provisioned throughput -
Custom Models - training your own model incurs cost.

Question 22

Q

How does ML work?

Answer

A

Given a large dataset, the model learns the pattern, and is able to use it to predict output for a given input.

For example:
We ‘train’ the algorithm by giving it the following input/output (i,o) combinations – (2,10), (5,19), and (9,31)
The algorithm computes the relationship between input and output to be: o=3*i+4
We then give it input 7 and ask it to predict the output. It can automatically determine the output as 25.
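The example above can be reproduced with an ordinary least-squares line fit. This is a sketch assuming the relationship is linear; the card does not say which algorithm is used, and least squares is just one that recovers the same rule from these three points.

```python
# Training data from the example: (input, output) pairs.
points = [(2, 10), (5, 19), (9, 31)]

n = len(points)
mean_i = sum(i for i, _ in points) / n
mean_o = sum(o for _, o in points) / n

# Least-squares estimates for the line o = slope * i + intercept.
slope = (
    sum((i - mean_i) * (o - mean_o) for i, o in points)
    / sum((i - mean_i) ** 2 for i, _ in points)
)
intercept = mean_o - slope * mean_i

# Predict the output for the unseen input 7.
prediction = slope * 7 + intercept
print(round(slope, 6), round(intercept, 6), round(prediction, 6))
```

Up to floating-point rounding, the fit recovers a slope of 3 and an intercept of 4, so the prediction for input 7 is 25.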

Question 23

Q

Are ML models deterministic?

Answer

A

No. They are probabilistic.

If a system’s output is predictable, then it is said to be deterministic. Most software applications respond predictably to the user’s action, so you can say: “If the user does this, he gets that.”

However, machine learning algorithms learn through observation along with experiences. Therefore, they are probabilistic in nature. The statement now changes to: “If the user does this, there is an X% chance of that happening.”
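One common way a model expresses "an X% chance" is by turning raw scores into probabilities with a softmax function. The sketch below assumes a two-class classifier; the labels and scores are hypothetical.

```python
import math

def softmax(scores):
    # Convert raw scores into probabilities that sum to 1.
    # Subtracting the max first keeps the exponentials numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog"]
scores = [2.0, 1.0]  # hypothetical raw model outputs for one image
probs = softmax(scores)

for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%} chance")
```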

Get the basics right! (23 cards)