Deep Learning Flashcards
Neural Networks
Interconnected nodes, or “neurons,” organized in layers. Neurons are connected by weighted links that are adjusted during training. Can learn complex patterns and relationships in data.
Inspired by animal brains.
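A minimal sketch of one artificial neuron (NumPy, with made-up input and weight values): the output is just an activation function applied to a weighted sum.

```python
import numpy as np

# One artificial neuron: a weighted sum of inputs plus a bias, passed
# through a nonlinearity. The weights and bias are the values adjusted
# during training; the numbers below are arbitrary stand-ins.
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # connection weights
b = 0.2                          # bias

z = np.dot(w, x) + b             # weighted sum
output = 1 / (1 + np.exp(-z))    # sigmoid activation squashes z into (0, 1)
print(output)
```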
Layers of a Neural Network
- Input Layer - receives initial data
- Hidden layer(s) - process the data
- Output Layer - produces the final result
Simple neural networks may only have a few layers.
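A toy forward pass (NumPy, random stand-ins for learned weights) shows how data flows through the three kinds of layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

# Input layer: 4 features; hidden layer: 3 neurons; output layer: 1 neuron.
# The weight matrices here are random placeholders for trained values.
x = rng.normal(size=4)                        # input layer: receives raw data
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

h = relu(W1 @ x + b1)                         # hidden layer: processes the data
y = W2 @ h + b2                               # output layer: final result
print(y)
```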
Deep Learning
Uses neural networks with many layers to progressively extract higher-level features from raw input. Can be supervised, unsupervised, or semi-supervised. Automatically “understands” the data by breaking it down and progressively building up a hierarchy of learned features, from simple patterns to complex, abstract concepts.
Range from a few to thousands of layers.
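In a rough PyTorch sketch, “deep” simply means more hidden layers stacked between input and output; the layer sizes below are arbitrary, for illustration only.

```python
import torch.nn as nn

# Each layer can build on the features the previous one extracted.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # early layers: simple patterns
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),    # deeper layers: more abstract features
    nn.Linear(64, 10),                # output: e.g., 10 class scores
)
```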
Computational Power for Deep Learning
Requires significant computational power, often necessitating the use of GPUs and distributed computing, where multiple computers work together to solve a common problem or perform complex tasks.
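A minimal PyTorch sketch of putting a model and its data on a GPU when one is available:

```python
import torch

# Train on a GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 2).to(device)  # move the model's parameters
x = torch.randn(8, 10, device=device)      # keep the data on the same device
y = model(x)
print(y.shape)                             # torch.Size([8, 2])
```

Distributed training across multiple machines typically layers a wrapper such as torch.nn.parallel.DistributedDataParallel on top of this same pattern.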
Where Deep Learning Excels
Automatically learning representations of data with multiple levels of abstraction. As data passes through the network, learned representations become more abstract and closer to the high-level concepts we care about. This allows the model to understand complex data and handle a wide range of data types and tasks efficiently. It also lets the network learn features by itself, without domain-specific feature engineering.
Example: in an image, first detect edges or colors, then corners …
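A stacked-convolution sketch (PyTorch, arbitrary channel sizes) mirrors that hierarchy; in practice, early convolutional layers tend to learn edge- and color-like filters, while deeper ones respond to corners, textures, and object parts.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # edges, colors
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # corners, textures
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # object parts
)

img = torch.randn(1, 3, 32, 32)   # one fake RGB image
print(features(img).shape)        # torch.Size([1, 64, 32, 32])
```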
Examples of Deep Learning
- Self-driving cars
- Virtual assistants (Siri, Alexa)
- Recommendation systems
- Medical image analysis
- Game-playing AI
Transformers
Type of Deep Learning architecture that has revolutionized AI. Can process entire sequences in parallel, speeding up training and inference. Particularly good at sequences like language or time series data.
Introduced in the seminal paper “Attention Is All You Need” by researchers at Google in 2017.
How Transformers Work
- Take a series of inputs, e.g., words in a sentence
- Use “attention” to focus on different parts of the sequence at once and capture how they relate.
- Process the entire sequence in parallel → faster. E.g., reading an entire paragraph at once.
- Have multiple layers, each adding more depth. The first layers might capture simple patterns like word order, while deeper ones understand complex relationships like grammar or meaning.
- By the end, the Transformer has a rich understanding of the input data.
- Once the Transformer understands the input, it can produce outputs like a translation or the answer to a question (see the sketch after this list).
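A minimal sketch using PyTorch’s built-in encoder (toy dimensions, untrained weights) shows a whole sequence being processed at once by a stack of attention layers:

```python
import torch
import torch.nn as nn

# The whole sequence is fed in at once; attention inside each layer lets
# every position look at every other position. Dimensions are illustrative.
encoder_layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=3)  # stacked layers

tokens = torch.randn(1, 10, 64)   # batch of 1 sequence: 10 tokens, 64-dim embeddings
out = encoder(tokens)             # all 10 positions processed in parallel
print(out.shape)                  # torch.Size([1, 10, 64])
```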
Attention
Focusing on different parts of the sequence at once. Helps the model decide which words in a sentence are important and how they relate to each other, even if they are far apart.
Self-Attention
Allows the model to weigh the importance of different words in a sentence relative to each other, enabling it to capture contextual relationships. Each word is transformed into a vector, and the self-attention mechanism calculates attention scores to determine the relevance of each word to the others.
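A toy NumPy sketch of scaled dot-product self-attention; the projection matrices below are random stand-ins for what a real model would learn during training.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 8                    # 5 words, 8-dim vectors (toy sizes)
X = rng.normal(size=(n, d))    # one vector per word

# Learned projections (random here) produce queries, keys, and values.
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)  # relevance of every word to every other word
weights = softmax(scores)      # rows sum to 1: the attention scores
out = weights @ V              # each word becomes a weighted mix of values
print(weights.round(2))
```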
Diffusers
Gradually add noise to data, such as an image, and then learn to reverse this process to “denoise” the data. Generation starts from random noise and progressively refines it into a clear, structured output, like an image or sound.
Examples: DALL-E, Stable Diffusion
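A NumPy sketch of just the forward (noising) half of the process, with an illustrative noise schedule; the reverse half is a trained denoiser, described only in the comments here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward (noising) process: blend the clean signal with Gaussian noise,
# a little more at every step. Training teaches a network to undo one
# step at a time; generation runs that denoiser from pure noise backward.
x0 = rng.normal(size=(8, 8))             # stand-in for a clean image
alpha_bar = np.linspace(0.99, 0.01, 10)  # fraction of signal surviving each step

for ab in alpha_bar:
    noise = rng.normal(size=x0.shape)
    xt = np.sqrt(ab) * x0 + np.sqrt(1 - ab) * noise  # noisy version at this step
# By the last step, xt is nearly pure noise; a trained denoiser would
# reverse this chain step by step to generate a new sample.
```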
Transformers vs. Diffusers
Transformer models excel at text-based tasks involving sequential data, like translation, chatbots, and content generation, while diffuser models are generally used for generative tasks like producing images, video, or even sound, where you’re creating new data from scratch.
Parameters
Internal variables that determine how an AI model processes input data and generates output. Learned from training data. Act as the “knobs and dials” that the AI model adjusts during training to minimize the difference between its predictions and actual values. Can be tailored and tuned for specific applications.
Examples: weights, biases, and scaling factors.
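A quick PyTorch illustration: every entry of every weight matrix and bias vector counts as one trainable parameter (layer sizes arbitrary).

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Every weight and bias entry is one trainable "knob".
total = sum(p.numel() for p in model.parameters())
print(total)   # 784*256 + 256 + 256*10 + 10 = 203,530
```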
Hyperparameters
External settings that influence the model’s learning process and architecture. Chosen before training rather than learned from data; examples include the learning rate, batch size, and number of layers.
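A small PyTorch sketch separating the two: the values below are hyperparameters chosen by hand, while model.parameters() holds the learned parameters (all numbers illustrative).

```python
import torch
import torch.nn as nn

# Hyperparameters: set before training, never learned.
hidden_size = 128      # architecture choice
learning_rate = 1e-3   # controls the size of each training update
batch_size = 32        # how many examples per update

model = nn.Sequential(nn.Linear(20, hidden_size), nn.ReLU(),
                      nn.Linear(hidden_size, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
# model.parameters() are learned during training;
# hidden_size, learning_rate, and batch_size are not.
```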
Weights
Parameters that determine the strength of connections between neurons. Learned by the model during training.
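In PyTorch terms, a fully connected layer stores one weight per connection and one bias per neuron:

```python
import torch.nn as nn

layer = nn.Linear(3, 2)    # 3 inputs fully connected to 2 neurons
print(layer.weight.shape)  # torch.Size([2, 3]): one weight per connection
print(layer.bias.shape)    # torch.Size([2]): one bias per neuron
```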