Fundamentals of ML and AI Flashcards

Get the basics right!

1
Q

What is AI?

A
  1. AI is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, creation, and image recognition.
  2. AI is a broad field that encompasses the development of intelligent systems capable of performing tasks that typically require human intelligence, such as perception, reasoning, learning, problem-solving, and decision-making. AI serves as an umbrella term for various techniques and approaches, including machine learning, deep learning, and generative AI, among others.
2
Q

What is ML?

A
  1. ML is a type of AI for understanding and building methods that make it possible for machines to learn. These methods use data to improve computer performance on a set of tasks.

Machines learn from huge datasets. There are no explicit instructions.
e.g. a medical application that learns to diagnose cancer from X-rays by training on millions of scanned images and their diagnoses.

3
Q

What is Deep Learning (DL)?

A

Deep learning uses the concept of neurons and synapses similar to how our brain is wired. An example of a deep learning application is Amazon Rekognition, which can analyze millions of images and streaming and stored videos within seconds.

Neural networks are at the core of deep learning.
A neural network has an input layer -> one or more hidden layers -> an output layer.
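
The layer structure can be sketched in a few lines of plain Python. This is a minimal illustration, not how production frameworks implement it: the weights and biases are hand-picked, hypothetical values (a real network learns them during training), and the helper names `dense` and `forward` are assumptions made for this example.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hypothetical, hand-picked parameters (a real network learns these).
hidden_weights = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]  # 3 hidden neurons, 2 inputs each
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.4, -0.5, 0.9]]  # 1 output neuron, 3 inputs
output_biases = [0.2]

def forward(inputs):
    hidden = dense(inputs, hidden_weights, hidden_biases)  # input layer -> hidden layer
    return dense(hidden, output_weights, output_biases)    # hidden layer -> output layer

prediction = forward([1.0, 2.0])
print(prediction)
```

Adding more hidden layers means chaining more `dense` calls between the input and the output.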

4
Q

What is Gen AI?

A

Generative AI is a subset of deep learning: it builds on models trained with deep learning and can adapt them to new tasks without retraining or fine-tuning.
Generative AI systems are capable of generating new data based on the patterns and structures learned from training data.

5
Q

What are the key AI technologies?

A

Generative AI is just one of several AI technologies.
Others include:
Natural Language Processing
Computer Vision
Speech Recognition

6
Q

What are the key layers of AI application architecture?

A
  1. Data Layer
  2. ML frameworks and algorithm layer - functions to build and train AI models, e.g. PyTorch, TensorFlow, etc.
  3. Model Layer - implements the AI model using data and algorithms. Components include a) model structure, b) model parameters and functions, c) optimizer etc.
  4. Application layer - customer facing part of AI, end-users interact with AI systems
7
Q

Challenges in AI implementation?

A
  1. Data governance - privacy, regulations, security of data involved.
  2. Technical difficulties - training requires high processing power
  3. Data limitations - data quality, accuracy, and high storage needs
8
Q

What are the four steps involved in building an ML model?

A
  1. Data collection and preparation
  2. Selecting an appropriate algorithm
  3. Training the model on the prepared data
  4. Evaluating performance through testing and iteration
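
The four steps can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the data (hours studied -> passed the exam) is invented, and the "algorithm" is a deliberately simple one-parameter threshold classifier rather than anything a real project would choose.

```python
# Step 1: data collection and preparation.
# Hypothetical toy data: (hours studied, passed the exam: 1 or 0).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1), (2.5, 0), (6.5, 1)]
train, test = data[:8], data[8:]  # hold out the last two examples for testing

# Step 2: select an algorithm. Here: predict "pass" at or above a threshold.
def predict(threshold, hours):
    return 1 if hours >= threshold else 0

# Step 3: train the model, i.e. pick the threshold with the fewest training errors.
def train_errors(threshold):
    return sum(predict(threshold, x) != y for x, y in train)

candidates = [x for x, _ in train]
best_threshold = min(candidates, key=train_errors)

# Step 4: evaluate on examples the model has not seen during training.
accuracy = sum(predict(best_threshold, x) == y for x, y in test) / len(test)
print(best_threshold, accuracy)
```

If the accuracy were poor, the iteration step would mean going back to collect better data or pick a different algorithm.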
9
Q

ML Model - Training Data

A
  1. Garbage in, garbage out - an ML model is only as good as the data it is trained on.
  2. Labeled data - data where each instance or example is categorized/classified using a label, e.g. an image with a label of cat, dog, etc. Labels are normally provided by humans.
  3. Unlabeled data - no associated output classification, e.g. a collection of images
  4. Structured data - organized in a certain format, e.g. tabular or time series data, database rows/columns. Typically used in conventional ML models.
  5. Unstructured data - no predefined structure or format, e.g. text, images, audio, video, etc. Needs more advanced ML algorithms to extract patterns and insights.
10
Q

What is the ML process? What are the three types of learning?

A

Compiled data is fed into algorithms.
Supervised learning - algorithms are trained on labeled data. The goal is to learn a mapping function that can predict output for new, unseen input data.
Unsupervised learning - learn from unlabeled data. The goal is to discover inherent patterns, structures, or relationships within the input data
Reinforcement learning - the machine is given only a performance score as guidance. Feedback is provided in the form of rewards or penalties for its actions, and the machine learns from this feedback to improve its decision-making over time.
Semi-supervised learning (a hybrid of supervised and unsupervised learning) - only a portion of the training data is labeled.
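
To make the unsupervised case concrete, here is a minimal sketch of clustering unlabeled numbers into two groups. It uses a simplified one-dimensional k-means with two clusters, chosen for this example only; the data points and the starting centers are assumptions.

```python
# Unlabeled data: no one has said which group each point belongs to.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]

# Simple deterministic starting guess for the two cluster centers.
centers = [min(points), max(points)]

for _ in range(10):
    # Assign every point to its nearest center.
    clusters = [[], []]
    for p in points:
        nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        clusters[nearest].append(p)
    # Move each center to the mean of the points assigned to it.
    centers = [sum(c) / len(c) for c in clusters]

print(clusters)
print(centers)
```

The algorithm discovers the two groups on its own. A supervised version of the same task would instead be given a label for every point and learn to predict it.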

11
Q

What is Inferencing?

A

Inferencing is a trained ML model making predictions or decisions on new data.
There are two types: batch inferencing and real-time inferencing.
Batch inferencing - e.g. data analysis, accuracy more important than speed
Real-time inferencing - e.g. chatbots and self-driving cars, speed of decision making is important
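
The difference is mostly in how the model is called, which a small sketch can show. The `model` function is a hypothetical stand-in for a trained model, and the function names are assumptions for this example.

```python
def model(x):
    # Hypothetical stand-in for a trained model (the learned rule is o = 3*i + 4).
    return 3 * x + 4

def batch_inference(inputs):
    """Score a whole stored dataset in one job; results are consumed later."""
    return [model(x) for x in inputs]

def realtime_inference(x):
    """Score one request as it arrives; the caller is waiting for the answer."""
    return model(x)

nightly_results = batch_inference([1, 2, 3])  # e.g. run on a schedule
live_result = realtime_inference(7)           # e.g. called per user request
print(nightly_results, live_result)
```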

12
Q

What is the lifecycle of a foundation model (FM)?

A
  1. Data Selection - FMs are trained on unlabeled data from diverse sources
  2. Pre-training - FMs use self-supervised training, may include initial pre-training and additional pre-training
  3. Optimization - prompt engineering, RAG, fine-tuning on task-specific data
  4. Evaluation - performance measurements using metrics. Can it meet the business needs?
  5. Deployment - target production environment, integrating with APIs etc.
  6. Feedback and continuous improvement - identify biases and drift, inform future iterations of the models
13
Q

What are the different types of FMs?

A
  1. LLMs - different architectures are possible, but transformer-based is state of the art. They understand and generate human-like text. LLMs use tokens, embeddings, and vectors.
  2. Diffusion models - forward diffusion gradually adds noise to training data; reverse diffusion starts with noise/random data and removes it step by step, adding more and more meaningful information until a new output emerges.
  3. Multimodal models - process and generate multiple modes of data - e.g. video and text; they understand how different modes (e.g. image and text) relate to each other.
14
Q

In an LLM, what are tokens, embeddings, and vectors?

A

Tokens are the basic units of text that an LLM processes (e.g. words or parts of words).
Embeddings are numerical representations of tokens.
i.e. each token is assigned a vector, which is a list of numbers that captures its meaning and its relationships with other tokens.
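
A toy example can show how the three ideas fit together. The assumptions are significant: real LLMs use subword tokenizers rather than splitting on spaces, and the three-dimensional embedding values below are invented for illustration (real models learn vectors with hundreds or thousands of dimensions).

```python
import math

text = "the cat sat"
tokens = text.split()  # simplification: real tokenizers split into subwords

# Hypothetical embedding table: token -> vector (list of numbers).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "the": [0.0, 0.1, 0.9],
    "sat": [0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    """Close to 1.0 when two vectors point in a similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vectors = [embeddings[t] for t in tokens]  # what the model actually processes
cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_the = cosine_similarity(embeddings["cat"], embeddings["the"])
print(cat_dog > cat_the)
```

Related tokens ("cat", "dog") end up with vectors that are closer together than unrelated ones ("cat", "the"), which is how relationships between tokens are captured numerically.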

15
Q

What are GANs?

A

A type of generative AI model that involves two competing neural networks.
Generator - generates new synthetic data similar to training data distribution
Discriminator - tries to distinguish between synthetic and real data.
Generator iteratively tries to fool the discriminator until the Discriminator can no longer distinguish real from synthetic.

16
Q

What are VAEs?

A

Variational autoencoders (VAEs)
A type of generative AI model.
Uses an encoder and a decoder.
The encoder encodes the essential features of the data into a latent representation.
The decoder generates a reconstruction of the original data from the latent representation.

17
Q

What is Prompt Engineering?

A

Prompt engineering (PE) is the fastest and lowest-cost way to optimize a model.
Prompt engineering focuses on developing, designing, and optimizing prompts to enhance the output of FMs for your needs.
Prompts include:
a) Instructions - the task for the FM to do.
b) Context - external information to guide the model.
c) Input data - the input for which you want a response
d) Output indicator - the format of the output.
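
The four components can be assembled into a single prompt string. This is a minimal sketch: the `build_prompt` helper, the section labels, and the example text are all assumptions for illustration, and no particular FM requires this exact layout.

```python
def build_prompt(instructions, context, input_data, output_indicator):
    """Join the four prompt components into one string, one section each."""
    return "\n\n".join([
        f"Instructions: {instructions}",
        f"Context: {context}",
        f"Input: {input_data}",
        f"Output format: {output_indicator}",
    ])

prompt = build_prompt(
    instructions="Classify the sentiment of the customer review.",
    context="Reviews are about a hypothetical online bookstore.",
    input_data="The delivery was late, but the book was in perfect condition.",
    output_indicator="One word: positive, negative, or mixed.",
)
print(prompt)
```

Keeping the components separate makes it easy to vary one of them (for example the output indicator) while testing which wording gives the best results.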

18
Q

What is Fine Tuning?

A

Fine-tuning (FT) is a supervised learning process that involves taking a pre-trained model and training it further on specific, smaller datasets.

Training on these narrower datasets modifies the weights of the model to better align with the task.

FT can involve a) Instruction Fine Tuning or b) Reinforcement learning from human feedback (RLHF)

19
Q

What is Retrieval-augmented generation?

A

RAG supplies domain-relevant data as context in the prompt.
The technique is similar to fine-tuning in that both adapt a model's output to a specific domain.
However, RAG will not change the weights of the foundation model, whereas fine-tuning will change model weights.
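
The retrieve-then-prompt flow can be sketched without any model at all. This is a simplification under stated assumptions: the documents are invented, and retrieval here is a plain keyword overlap, whereas real RAG systems typically compare embeddings in a vector store. The function names are hypothetical.

```python
# Hypothetical domain knowledge base.
documents = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
    "Shipping is free for orders over 50 dollars.",
]

def words(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(".", "").replace("?", "").split())

def retrieve(question, docs):
    """Return the document that shares the most words with the question."""
    return max(docs, key=lambda d: len(words(question) & words(d)))

def build_rag_prompt(question, docs):
    """Put the retrieved text into the prompt as context for the FM."""
    context = retrieve(question, docs)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer using only the context."

question = "How long do refunds take?"
rag_prompt = build_rag_prompt(question, documents)
print(rag_prompt)
```

Nothing in this flow touches the model itself: only the prompt changes, which is why RAG leaves the foundation model's weights as they are.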

20
Q

What infrastructure does AWS offer for AI/ML services?

A
  1. ML Frameworks - SageMaker allows you to build your own ML models.
  2. AI/ML Services - Textract, Polly, Rekognition, Lex, Transcribe etc.
  3. Gen AI - Bedrock, Q, SageMaker JumpStart etc.
21
Q

What are the cost considerations when using AI/ML services on AWS?

A
  1. Responsiveness and Availability - higher levels = greater cost
  2. Redundancy and Regional Coverage - multi AZ or multi Region costs more
  3. Performance - higher performing GPUs cost more
  4. Token-based pricing - more tokens = more cost
  5. Provisioned throughput - reserving dedicated capacity for consistent performance costs more than paying on demand
  6. Custom Models - training your own model incurs cost.
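
Token-based pricing is simple arithmetic, sketched below. The prices are hypothetical placeholders, not actual AWS rates, and the `estimate_cost` helper is an assumption for this example; check the current pricing page of the service you use.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Estimate the cost of one request when priced per 1,000 tokens."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k

# Hypothetical prices in dollars per 1,000 tokens (not real AWS rates).
cost = estimate_cost(
    input_tokens=2000,
    output_tokens=500,
    price_in_per_1k=0.003,
    price_out_per_1k=0.015,
)
print(round(cost, 4))
```

Input and output tokens are often priced differently, so both the prompt length and the response length affect the bill.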
22
Q

How does ML work?

A

Given a large dataset, the model learns the pattern, and is able to use it to predict output for a given input.

For example:
We ‘train’ the algorithm by giving it the following input/output (i,o) combinations – (2,10), (5,19), and (9,31)
The algorithm computes the relationship between input and output to be: o=3*i+4
We then give it input 7 and ask it to predict the output. It can automatically determine the output as 25.
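
The same example can be reproduced with a least-squares line fit. This is one possible way to "learn" the relationship, chosen here as an assumption because the data is linear; the card does not say which algorithm is used.

```python
# Training data from the example: (input, output) pairs.
points = [(2, 10), (5, 19), (9, 31)]

n = len(points)
mean_i = sum(i for i, _ in points) / n
mean_o = sum(o for _, o in points) / n

# Least-squares estimates for the line o = slope * i + intercept.
slope = (
    sum((i - mean_i) * (o - mean_o) for i, o in points)
    / sum((i - mean_i) ** 2 for i, _ in points)
)
intercept = mean_o - slope * mean_i

# Use the learned relationship to predict the output for a new input.
prediction = slope * 7 + intercept
print(round(slope, 6), round(intercept, 6), round(prediction, 6))
```

The fitted slope and intercept come out at approximately 3 and 4, matching o=3*i+4, and the prediction for input 7 is approximately 25.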

23
Q

Are ML models deterministic?

A

No. They are probabilistic.

If a system’s output is predictable, then it is said to be deterministic. Most software applications respond predictably to the user’s action, so you can say: “If the user does this, they get that.”

However, machine learning algorithms learn from observation and experience. Therefore, they are probabilistic in nature. The statement now changes to: “If the user does this, there is an X% chance of that happening.”
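
The contrast can be sketched side by side. Both functions are hypothetical: the discount rule stands in for conventional software, and the spam scorer stands in for a trained classifier, with a weight and bias that are invented rather than learned from real data.

```python
import math

def deterministic_discount(order_total):
    """Conventional software: a fixed rule gives one known outcome."""
    return 10 if order_total >= 100 else 0

def spam_probability(suspicious_word_count):
    """Stand-in for a trained classifier: returns a likelihood, not a certainty.
    The weight 0.8 and bias -2.0 are hypothetical, not learned from data."""
    score = 0.8 * suspicious_word_count - 2.0
    return 1.0 / (1.0 + math.exp(-score))

discount = deterministic_discount(120)  # "if the user does this, they get that"
chance = spam_probability(4)            # "there is an X% chance of that"
print(discount, f"{chance:.0%}")
```

The application built around the model then has to decide what to do with that probability, for example by flagging a message only above a chosen threshold.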