Fundamentals of ML and AI Flashcards
Get the basics right!
What is AI?
- AI is a field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, creation, and image recognition.
- AI is a broad field that encompasses the development of intelligent systems capable of performing tasks that typically require human intelligence, such as perception, reasoning, learning, problem-solving, and decision-making. AI serves as an umbrella term for various techniques and approaches, including machine learning, deep learning, and generative AI, among others.
What is ML?
- ML is a type of AI for understanding and building methods that make it possible for machines to learn. These methods use data to improve computer performance on a set of tasks.
Machines learn from huge datasets. There are no explicit instructions.
e.g. a medical application that learns to diagnose cancer from x-rays by training on millions of scanned images and their diagnoses.
What is Deep Learning (DL)?
Deep learning uses the concept of neurons and synapses similar to how our brain is wired. An example of a deep learning application is Amazon Rekognition, which can analyze millions of images and streaming and stored videos within seconds.
Neural networks are at the core of deep learning.
Neural networks have an input layer -> one or more hidden layers -> an output layer.
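A minimal sketch of the input -> hidden -> output flow, using only the Python standard library. The layer sizes, weights, and the sigmoid activation are hypothetical choices made for illustration. A real network learns its weights during training.

```python
import math

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    # One layer: each row of `weights` belongs to one neuron.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    # Pass the activations through each layer in order.
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 3 hidden neurons -> 1 output.
hidden = ([[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]], [0.0, 0.0, 0.0])
output = ([[1.0, -1.0, 0.5]], [0.0])

prediction = forward([1.0, 2.0], [hidden, output])
print(prediction)
```

Adding more (weights, biases) pairs to the list passed to forward() adds more hidden layers, which is what makes a network "deep".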
What is Gen AI?
Generative AI is a subset of deep learning. It builds on models trained with deep learning, and these pre-trained models can often be adapted to new tasks (for example, through prompting) without retraining or fine-tuning.
Generative AI systems are capable of generating new data based on the patterns and structures learned from training data.
What are the key AI technologies?
Generative AI is just one of several AI technologies.
Others include:
Natural Language Processing
Computer Vision
Speech Recognition
What are the key layers of AI application architecture?
- Data Layer
- ML frameworks and algorithm layer - functions to build and train AI models, e.g. PyTorch, TensorFlow, etc.
- Model Layer - implements the AI model using data and algorithms. Components include a) model structure, b) model parameters and functions, c) optimizer etc.
- Application layer - customer-facing part of AI, where end users interact with AI systems
Challenges in AI implementation?
- Data governance - privacy, regulations, security of data involved.
- Technical difficulties - high processing power
- Data limitations - data quality, high storage, accuracy
What are the four steps involved in building an ML model?
- Data collection and preparation
- Selecting an appropriate algorithm
- Training the model on the prepared data
- Evaluating performance through testing and iteration
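The four steps above can be sketched on a toy problem. This is an illustrative example only: the dataset (hours studied vs. exam score) is made up, and simple linear regression stands in for whichever algorithm a real project would select.

```python
# Step 1: data collection and preparation (hypothetical data, split into train/test).
data = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0), (5.0, 60.0), (6.0, 62.0)]
train, test = data[:4], data[4:]

# Step 2: select an algorithm. Here: least-squares fit of y = slope * x + intercept.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 3: train the model on the prepared data.
slope, intercept = fit_line(train)

# Step 4: evaluate on data the model has not seen during training.
def mean_absolute_error(points, slope, intercept):
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

test_error = mean_absolute_error(test, slope, intercept)
print(slope, intercept, test_error)
```

If the test error were too high, the iteration part of step 4 would mean going back to an earlier step: more or cleaner data, or a different algorithm.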
ML Model - Training Data
- Garbage in, garbage out - an ML model is only as good as the data it is trained on.
- Labeled data - data where each instance or example is categorized/classified using a label, e.g. an image with a label of cat, dog etc. Labels are normally provided by humans.
- Unlabeled data - no associated output classification, e.g. a collection of images
- Structured data - organized in a certain format, e.g. tabular or time series data, database rows/columns. Typically used in conventional ML models.
- Unstructured data - no predefined structure or format, e.g. text, images, audio, video, etc. Needs more advanced ML algorithms to extract patterns and insights.
What is the ML Process? What are the three types of learning?
Compiled data is fed into algorithms.
Supervised learning - algorithms are trained on labeled data. The goal is to learn a mapping function that can predict output for new, unseen input data.
Unsupervised learning - learn from unlabeled data. The goal is to discover inherent patterns, structures, or relationships within the input data
Reinforcement learning - the machine is given only a performance score as guidance. Feedback is provided in the form of rewards or penalties for its actions, and the machine learns from this feedback to improve its decision-making over time.
Note: semi-supervised learning is a related approach in which only a portion of the training data is labeled.
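A rough sketch contrasting the three types of learning on toy data. All function names, data, and reward values here are hypothetical. Nearest neighbor, two-group k-means, and an epsilon-greedy bandit are just simple stand-ins for each category, not the only options.

```python
import random

# Supervised: learn from (input, label) pairs, then predict a label for new input.
def nearest_neighbor_predict(labeled, x):
    closest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return closest[1]

# Unsupervised: no labels. Discover two groups in 1-D points (tiny k-means, k=2).
def two_means(points, iterations=10):
    centers = [min(points), max(points)]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Reinforcement: only a reward signal is given; the agent learns which action pays off.
def run_bandit(rewards, steps=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

labeled = [(1.0, "small"), (2.0, "small"), (9.0, "large"), (10.0, "large")]
label = nearest_neighbor_predict(labeled, 8.5)
groups = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
estimates, counts = run_bandit([0.2, 1.0])
print(label, groups, estimates, counts)
```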
What is Inferencing?
A trained ML model making predictions or decisions on new data.
Two types: batch inferencing and real-time inferencing
Batch inferencing - e.g. data analysis, accuracy more important than speed
Real-time inferencing - e.g. chatbots and self-driving cars, speed of decision making is important
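A small sketch of the difference between the two modes. The predict() function below is a hypothetical stand-in for a trained model (a fixed scoring rule with made-up feature names), used only so the two call patterns can be compared.

```python
import time

def predict(features):
    # Hypothetical "trained model": a fixed scoring rule on made-up features.
    score = 0.8 * features["amount"] / 1000 + 0.2 * features["new_device"]
    return "review" if score >= 0.5 else "approve"

def batch_inference(records):
    # Batch: process a whole collected dataset in one job; latency per record matters less.
    return [predict(r) for r in records]

def realtime_inference(record):
    # Real-time: answer one request immediately and keep an eye on latency.
    start = time.perf_counter()
    result = predict(record)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

records = [
    {"amount": 1000, "new_device": 0},
    {"amount": 100, "new_device": 1},
]
batch_results = batch_inference(records)
single_result, latency_ms = realtime_inference(records[0])
print(batch_results, single_result, latency_ms)
```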
What is the lifecycle of a foundation model (FM)?
- Data Selection - FMs are trained on unlabeled data from diverse sources
- Pre-training - FMs use self-supervised training, may include initial pre-training and additional pre-training
- Optimization - prompt engineering, retrieval-augmented generation (RAG), fine-tuning on task-specific data
- Evaluation - performance measurements using metrics. Can it meet the business needs?
- Deployment - target production environment, integrating with APIs etc.
- Feedback and continuous improvement - identify biases and drift, inform future iterations of the models
What are the different types of FMs?
- LLMs - different architectures are possible, but transformer-based is state of the art. Understand and generate human-like text. LLMs use tokens, embeddings and vectors.
- Diffusion Models - learn through forward and reverse diffusion. Forward diffusion gradually adds noise to the training data. Reverse diffusion starts with noise/random data and gradually removes the noise, adding more and more meaningful information.
- Multimodal models - process and generate multiple modes of data - e.g. video and text; they understand how different modes (e.g. image and text) relate to each other.
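For the diffusion models card, the forward (noising) half can be sketched in a few lines. This assumes one common formulation, in which each step scales the data by sqrt(1 - beta) and adds Gaussian noise scaled by sqrt(beta). The beta values and the tiny three-number "image" are hypothetical. The reverse half needs a trained denoising network and is not shown.

```python
import math
import random

def forward_diffusion(x0, betas, seed=0):
    # Gradually noise the data; returns every intermediate state.
    rng = random.Random(seed)
    x = list(x0)
    trajectory = [x]
    for beta in betas:
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for v in x]
        trajectory.append(x)
    return trajectory

def signal_fraction(betas):
    # How much of the original signal survives after all steps.
    frac = 1.0
    for beta in betas:
        frac *= math.sqrt(1.0 - beta)
    return frac

image = [1.0, -1.0, 0.5]   # hypothetical stand-in for pixel values
betas = [0.1] * 50         # hypothetical noise schedule
trajectory = forward_diffusion(image, betas)
print(trajectory[-1], signal_fraction(betas))
```

After enough steps almost none of the original signal remains, which is why generation can start from pure noise and run the learned reverse process.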
In an LLM, what are tokens, embeddings, and vectors?
Tokens are the basic units of text that an LLM processes.
Embeddings are numerical representations of tokens.
i.e. each token is assigned a vector, which is a list of numbers that captures its meaning and its relationships with other tokens.
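A toy illustration of tokens, embeddings, and vectors. The whitespace tokenizer and the three-number embedding table are hypothetical simplifications. Real LLMs typically use subword tokenizers and learned vectors with hundreds or thousands of dimensions.

```python
import math

def tokenize(text):
    # Assumption: split on whitespace. Real LLM tokenizers work on subword units.
    return text.lower().split()

# Hypothetical, hand-written embedding table: one vector per token.
embedding_table = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}

def embed(tokens):
    # Look up the vector for each token.
    return [embedding_table[t] for t in tokens]

def cosine_similarity(a, b):
    # Close to 1.0 when two vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

tokens = tokenize("Cat dog car")
cat_vec, dog_vec, car_vec = embed(tokens)
print(cosine_similarity(cat_vec, dog_vec), cosine_similarity(cat_vec, car_vec))
```

In this made-up table "cat" and "dog" have similar vectors, so their similarity is higher than that of "cat" and "car". This is the sense in which embeddings capture relationships between tokens.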
What are GANs?
A type of generative AI model that involves two competing neural networks.
Generator - generates new synthetic data similar to training data distribution
Discriminator - tries to distinguish between synthetic and real data.
The generator iteratively tries to fool the discriminator until the discriminator can no longer distinguish real from synthetic data.