Programming Flashcards
Advanced indexing (PyTorch)
A powerful feature that allows users to access and manipulate specific elements or subsets of a tensor using advanced indexing techniques. This includes boolean masking, integer array indexing, and using tensor indices to select elements along specific dimensions.
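The three techniques above can be sketched with a small tensor (the values here are illustrative):

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Boolean masking: keep elements greater than 4
mask = x > 4
print(x[mask])            # tensor([5, 6, 7, 8, 9])

# Integer array indexing: pick rows 0 and 2
rows = torch.tensor([0, 2])
print(x[rows])            # tensor([[1, 2, 3], [7, 8, 9]])

# Tensor indices along a dimension: one element per row
cols = torch.tensor([2, 0, 1])
print(x[torch.arange(3), cols])  # tensor([3, 4, 8])
```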
Application Programming Interface (API)
Set of rules and protocols that allows different software applications to communicate and interact with each other. APIs define the methods and data formats that applications can use to request and exchange information. They facilitate the development of software by providing a standardized way for developers to access functionality or services provided by other applications, libraries, or platforms.
Array
A data structure that stores a collection of elements of the same data type in contiguous memory locations. Arrays offer efficient access to elements using index-based retrieval and support various operations such as insertion, deletion, and traversal. They are fundamental in programming and are used to represent vectors, matrices, and multidimensional data structures in languages like Python, Java, and C++.
Arrays in Python are data structures that store collections of elements of the same data type in contiguous memory locations. Unlike traditional arrays in languages like C or Java, Python arrays are implemented using the array module or the more versatile numpy library. Arrays in Python provide efficient access to elements through index-based retrieval and support various operations such as insertion, deletion, and traversal. They are commonly used to represent vectors, matrices, and multidimensional data structures.
Arrays from the array module are created by calling a constructor, not with the [] literal, which is reserved for lists:
import array
arr = array.array('i', [0, 4, 0, 9, 1])
Assert
Programming construct used to test assumptions or conditions within code. It evaluates a Boolean expression and throws an exception or raises an error if the condition is false, indicating a violation of expected behavior. Assert statements are commonly employed in unit testing to validate program correctness and identify errors early in the development process.
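A minimal sketch (the `mean` function is a made-up example, not from a library):

```python
def mean(values):
    # Guard an assumption; raises AssertionError if it is violated
    assert len(values) > 0, "mean() requires a non-empty sequence"
    return sum(values) / len(values)

print(mean([2, 4, 6]))  # 4.0
# mean([]) would raise: AssertionError: mean() requires a non-empty sequence
```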
Attribute

Represents a piece of data associated with an object. Attributes describe the state of an object. For example, in a Car class, color could be an attribute representing the color of the car.
We access an attribute like this:
object.attribute
Autograd (PyTorch)
Built-in automatic differentiation engine in PyTorch. Autograd is the heart of PyTorch's deep learning capabilities: it efficiently computes gradients (rates of change) for any operation performed on tensors. Here's a breakdown:
Automatic Differentiation: Imagine building a complex mathematical equation with tensors. Normally, calculating the gradients for each variable involved would be tedious and error-prone. Autograd automates this process.
Computational Graph: When you perform operations on tensors with requires_grad=True, PyTorch creates a computational graph behind the scenes. This graph tracks all the operations performed, essentially showing how each tensor depends on others.
Backpropagation: During training, when you calculate a loss function (how well your model performs), autograd uses the computational graph to efficiently backpropagate the error. It starts from the loss and works backward through the graph, calculating the gradients for each tensor involved.
Optimizer: These gradients are then used by an optimizer (like SGD) to update the weights and biases in your neural network, allowing it to learn and improve its predictions.
In simpler terms: Autograd acts like a magical bookkeeper, meticulously tracking every step in your calculations and then efficiently calculating the gradients you need to train your neural network effectively.
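The bookkeeping described above can be seen in a tiny example: tracking one scalar tensor, building a graph, and backpropagating through it (the function y = x² + 2x is chosen purely for illustration):

```python
import torch

# requires_grad=True tells autograd to record operations on x
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x      # y = x^2 + 2x; a computational graph is built

y.backward()            # backpropagate through the graph
print(x.grad)           # dy/dx = 2x + 2 = 8.0 at x = 3
```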
Boolean
Named after the mathematician George Boole, refers to a data type or algebraic system that represents two possible values: True and False. In Boolean algebra, these values are typically denoted as 1 for True and 0 for False. Boolean values are fundamental in computer science for logical operations, decision-making, and binary state representation. In Python, boolean values are represented by the bool type, and logical operations such as AND, OR, and NOT are performed using the keywords and, or, and not, respectively.
Buffer
A temporary storage area in computer memory used to hold data temporarily during input/output operations or between different processes. In the context of neural networks, a buffer refers to a temporary storage area used to hold intermediate or temporary data during the forward and backward passes of the training process. Buffers are commonly used to store activations, gradients, and other intermediate computations at different layers of the network. During the forward pass, input data is propagated through the network, and intermediate results are stored in buffers for subsequent computation. During the backward pass (backpropagation), gradients are computed with respect to the loss function, and intermediate gradients are stored in buffers to update the network parameters (weights and biases) through optimization algorithms such as gradient descent. Buffers play a crucial role in managing data flow and optimizing memory usage in neural network implementations, especially for large-scale models with many layers and parameters.
Casting
Casting in Python, with functions like int(), float(), and str(), ensures data type compatibility and facilitates manipulation. Explicit conversion is common, while implicit casting occurs, such as in arithmetic operations. Handling errors, like incompatible type conversions, is essential for smooth execution. Python provides built-in functions for type conversion, allowing seamless transition between different data types. Care should be taken to ensure data integrity and prevent runtime issues. Overall, casting is a fundamental aspect of Python programming, enabling flexibility and versatility in data processing tasks.
Class
A class is a blueprint for creating objects with specific attributes and behaviors. It encapsulates data (attributes) and behavior (methods) into a cohesive unit, promoting code organization and reusability. Objects are instances of classes, created using the class’s constructor method. Classes support inheritance, allowing subclasses to inherit attributes and methods from their superclass. This enables hierarchical organization of code and facilitates code reuse and modularity, essential principles in object-oriented programming.
Think of an object as a blueprint (class) brought to life. For example, a “Car” blueprint has properties (color, make, model) and behaviors (accelerate, brake, turn). A specific car you see on the street is an object—an instance of the “Car” class.
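The Car blueprint above can be sketched as a class (attribute names and values are illustrative):

```python
class Car:
    def __init__(self, color, make, model):
        # Attributes: the object's state
        self.color = color
        self.make = make
        self.model = model

    def describe(self):
        # Method: behavior associated with the object
        return f"{self.color} {self.make} {self.model}"

# An object is an instance of the class, created via the constructor
my_car = Car("red", "Toyota", "Corolla")
print(my_car.describe())  # red Toyota Corolla
```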
Code interpreter
A software component responsible for executing code statements or instructions interactively. Interpreters translate and execute code directly, line by line, without the need for compilation. In machine learning, code interpreters facilitate rapid prototyping, debugging, and experimentation with algorithms and models, enhancing the development workflow and productivity of practitioners.
Code layouts
Code layout refers to the organization and structure of code within a file or project. It encompasses various aspects such as indentation, spacing, and commenting styles, which significantly affect code readability and maintainability. An effective code layout enhances collaboration among developers and reduces the likelihood of introducing errors during code modifications. Properly structured code layouts adhere to consistent conventions and principles, making it easier for developers to understand, debug, and extend the codebase over time.
Command line argparse
A module in Python’s standard library that facilitates the parsing of command-line arguments passed to Python scripts. It provides a user-friendly interface for creating powerful and flexible command-line interfaces. By defining arguments, options, and their corresponding actions, developers can effortlessly handle user inputs from the command line. Command line argparse simplifies argument parsing by automatically generating help messages and error handling mechanisms. It supports a wide range of argument types and validation rules, making it suitable for building robust and interactive command-line applications.
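A minimal argparse sketch; the argument names are invented for illustration, and a list is passed to parse_args() here instead of the usual sys.argv:

```python
import argparse

parser = argparse.ArgumentParser(description="Greet a user.")
parser.add_argument("name", help="name to greet")
parser.add_argument("--shout", action="store_true", help="uppercase the greeting")

# parse_args() normally reads sys.argv; a list is supplied for demonstration
args = parser.parse_args(["Ada", "--shout"])
greeting = f"Hello, {args.name}!"
print(greeting.upper() if args.shout else greeting)  # HELLO, ADA!
```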
Command lines
Command lines serve as the primary interface for users to interact with computer programs by entering text commands into a terminal or command prompt. These commands typically instruct the operating system to execute specific actions or run programs. Command lines provide a versatile and efficient means of performing tasks such as file manipulation, system configuration, and program execution. Users can leverage command lines to navigate file systems, install software packages, manage processes, and automate repetitive tasks through scripting. Despite the prevalence of graphical user interfaces (GUIs), command lines remain indispensable for advanced users and system administrators due to their flexibility and scripting capabilities.
Example: Running Python scripts or executing system commands using the terminal or command prompt.
Compiler
A software tool that translates source code written in a high-level programming language into machine-readable binary code or executable files. Compilers analyze, optimize, and transform source code into an efficient form that can be executed on a target platform. In machine learning and artificial intelligence, compilers are used to optimize and accelerate code execution, particularly for performance-critical tasks such as training deep neural networks and executing inference on edge devices.
Comprehensions
Comprehensions are concise and expressive syntax constructs in programming languages, such as list comprehensions, dictionary comprehensions, and set comprehensions. Comprehensions enable developers to create new data structures by iterating over existing ones and applying transformations or filters in a single line of code.
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = [number for number in numbers if number % 2 == 0]  # [2, 4, 6]
Conditional breakpoint
A debugging feature that allows developers to pause program execution at specific points in the code only when certain conditions are met. Unlike regular breakpoints, which halt execution unconditionally, conditional breakpoints provide more flexibility by allowing developers to specify criteria for triggering the breakpoint. Common use cases include debugging loops, conditional branches, or complex logic where developers need to inspect variables or evaluate expressions under specific conditions. By setting conditional breakpoints, developers can streamline the debugging process and focus their attention on relevant code paths, thereby accelerating the identification and resolution of software bugs.
Context managers
Objects that enable the management of resources within a block of code by automatically allocating and releasing them. They are typically used with the with statement, which ensures that the necessary setup and teardown actions are performed in a predictable and consistent manner. Context managers abstract away resource management complexities and help prevent resource leaks or conflicts by encapsulating resource-related logic within context manager objects. Common examples of context managers include file handles (open()), database connections, and locks. By using context managers, developers can write cleaner, more robust code that is easier to read, understand, and maintain.
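The file-handle case above can be sketched as follows; a temporary file path is used so the example is self-contained:

```python
import os
import tempfile

# The with statement guarantees the teardown (closing the file)
# runs even if an exception occurs inside the block
path = os.path.join(tempfile.gettempdir(), "demo.txt")
with open(path, "w") as f:
    f.write("hello")
# f is automatically closed when the block exits
print(f.closed)  # True
```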
Control Flow
Control flow refers to the order in which instructions or statements are executed in a program or algorithm. In coding, control flow structures, such as loops, conditional statements, and function calls, govern the flow of execution and decision-making in algorithms and models. Control flow mechanisms enable the implementation of complex logic, iteration, and branching behavior in code, facilitating algorithmic design and problem-solving strategies.
Data Frame
A two-dimensional labeled data structure used for storing and manipulating tabular data in programming languages like Python (Pandas), R, and Julia. It consists of rows and columns, where each column can be of a different data type (e.g., numerical, categorical, or text).
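A small pandas sketch with invented data, showing mixed column types in one DataFrame:

```python
import pandas as pd

# Columns may hold different data types (text, int, float)
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Alan"],
    "age": [36, 45, 41],
    "score": [91.5, 88.0, 95.2],
})
print(df.shape)          # (3, 3)
print(df["age"].mean())  # 40.666...
```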
DataLoader (PyTorch)
Utility class in PyTorch (torch.utils.data.DataLoader) that loads samples from a dataset and assembles them into batches, with optional shuffling and parallel loading.
Debugging
Process of identifying, isolating, and resolving errors, or bugs, in computer programs. It plays a crucial role in software development by ensuring that programs behave as intended and meet the specified requirements. Debugging techniques range from simple print statements and logging to sophisticated debugging tools and techniques provided by integrated development environments (IDEs). Developers use debugging to trace the execution flow, inspect variable values, analyze stack traces, and identify the root causes of software defects. Effective debugging requires a systematic approach, critical thinking skills, and a deep understanding of the programming language and environment.
Decorators
Higher-order functions in Python that modify or enhance the behavior of other functions or methods without altering their core implementation. They achieve this by wrapping the target function with additional functionality, such as logging, caching, authentication, or error handling. Decorators are commonly used to enforce cross-cutting concerns, such as security policies or performance optimizations, across multiple functions within a codebase. They promote code reuse, modularity, and separation of concerns by allowing developers to encapsulate common functionalities within reusable decorator functions. Decorators are a powerful tool in Python’s arsenal, enabling developers to write clean, concise, and expressive code with minimal boilerplate.
Dictionary
Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”. A dictionary is a set of key: value pairs, with the requirement that the keys are unique (within one dictionary).
Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().
A pair of braces creates an empty dictionary: {}.
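The rules above in a short sketch (the keys and values are illustrative):

```python
# Keys must be immutable and unique; values can be anything
capitals = {"France": "Paris", "Japan": "Tokyo"}
capitals["Italy"] = "Rome"          # insert a new pair

print(capitals["Japan"])            # Tokyo
print("Spain" in capitals)          # False

# A tuple of immutables is a valid key; a list is not
grid = {(0, 0): "origin"}
print(grid[(0, 0)])                 # origin
```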
Enumerate
The enumerate() function in Python is a built-in used to iterate over a sequence (such as a list, tuple, or string) while also keeping track of the index or position of each item. It returns an iterator that yields pairs of (index, value) tuples, where index represents the index of the item in the sequence and value represents the corresponding item.
It is commonly used in for loops when you need to access both the index and the value of each item in a sequence simultaneously. It simplifies code by eliminating the need to manually manage index variables. It is also handy for constructing dictionaries or other data structures where you need both keys and values from a sequence.
You can specify a custom starting index for counting by providing the start parameter. For example, enumerate(sequence, start=1) will start counting from 1 instead of 0. The enumerate() function returns an enumerate object, which is an iterator. You can convert it to a list or tuple if needed using the list() or tuple() functions, respectively.
range() is used when you need to iterate over a sequence of numbers, typically for controlling the number of iterations in a loop or generating index values for accessing elements in a sequence.
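The behaviors described above, including the custom start parameter, in a short sketch:

```python
fruits = ["apple", "banana", "cherry"]

# enumerate yields (index, value) pairs; start=1 shifts the counter
for i, fruit in enumerate(fruits, start=1):
    print(i, fruit)  # 1 apple / 2 banana / 3 cherry

# The enumerate object is an iterator; list() materializes it
print(list(enumerate(fruits)))  # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
```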
Evaluation mode in NN
An operational state, or setting, in neural network frameworks (like PyTorch or TensorFlow) used when the model makes predictions or inferences on new, unseen data without updating its parameters. In this mode, the network doesn’t learn from the input data; instead, it applies the learned parameters to produce output, and training-only behaviors such as dropout are disabled. Evaluation mode is typically used during model evaluation, testing, or deployment phases.
model.eval() # Switches the model to evaluation mode
model.train() # Switches back to training mode
Exception handling
Programming paradigm that focuses on managing and responding to errors, or exceptions, that occur during program execution. Exceptions represent abnormal or unexpected conditions that disrupt the normal flow of the program and require special handling to ensure graceful recovery or termination. Python provides robust support for exception handling through the try, except, finally, and raise keywords, allowing developers to intercept, handle, and propagate exceptions as needed. By implementing effective exception handling strategies, developers can improve the reliability, resilience, and maintainability of their software applications. Common exception handling techniques include logging errors, retrying failed operations, and providing informative error messages to users.
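A minimal try/except sketch (safe_divide is a made-up helper for illustration):

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # Recover gracefully with an informative message instead of crashing
        print("Cannot divide by zero")
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # None
```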
Float
Floats: the decimal point can “float”, i.e., appear before or after any number of digits. A floating-point number is a data type used to represent decimal numbers with a fractional component. Floats are typically stored using a fixed number of bits in memory, allowing them to represent a wide range of values, but with limited precision.
Floats are used to represent real numbers in applications where precision is required, such as scientific computing, numerical simulations, and machine learning. However, due to the finite precision of float representation, arithmetic operations on floats may introduce rounding errors, leading to numerical instability in certain computations. Techniques like double precision and arbitrary-precision arithmetic are used to mitigate these issues in critical applications.
7.6
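The rounding errors mentioned above are easy to demonstrate:

```python
import math

# Finite binary precision introduces small rounding errors
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False

# Compare floats with a tolerance instead of exact equality
print(math.isclose(0.1 + 0.2, 0.3))  # True
```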
Generators
Generators in Python are functions or expressions that enable the creation of iterators in a memory-efficient and lazy-evaluation manner. Unlike traditional functions that compute and return all values at once, generators produce values on-the-fly, one at a time, as they are requested by the consumer. This lazy evaluation strategy conserves memory and improves performance, especially when dealing with large or infinite sequences of data. Generators are implemented using the yield keyword, which suspends the function’s execution and yields a value to the caller. By leveraging generators, developers can write concise, expressive code for processing data streams, generating sequences, and implementing custom iterators with minimal overhead.
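A short sketch of yield and lazy evaluation (countdown is an invented example):

```python
def countdown(n):
    # yield suspends execution and produces one value at a time
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))        # 3
print(list(gen))        # [2, 1] -- the remaining values

# Generator expression: a lazy counterpart of a list comprehension
squares = (i * i for i in range(4))
print(sum(squares))     # 0 + 1 + 4 + 9 = 14
```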
Iloc vs loc
In pandas, iloc and loc are methods used for selection and indexing in DataFrame objects. While both methods enable data access based on row and column labels, they differ in their indexing conventions. iloc stands for integer location and is used for selecting rows and columns by their integer position within the DataFrame. In contrast, loc stands for label location and is used for selecting rows and columns by their index or column labels.
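The difference in a small sketch with invented data; note that iloc slices exclude the end position while loc slices include the end label:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]},
                  index=["x", "y", "z"])

print(df.iloc[0, 1])                  # 1  (integer positions)
print(df.loc["x", "b"])               # 1  (labels)

print(df.iloc[0:2]["a"].tolist())     # [10, 20] -- end position excluded
print(df.loc["x":"y", "a"].tolist())  # [10, 20] -- end label included
```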
In what parts, files is NN model saved/exported
Model Architecture:
Structure: The configuration of layers (types, numbers, connections), activation functions, optimizer used, input/output shapes, etc. This blueprint defines the model itself.
Format: Can be a computational graph representation (TensorFlow), or a more declarative, serialization-friendly format.
Learned Weights:
Values: The numerical values of the weights for each connection within the neural network. These are meticulously adjusted during training and are crucial for the model’s ability to make predictions.
Format: Often stored in binary files optimized for fast loading.
What Might Sometimes Be Included
Optimizer State:
Includes information like momentum values, learning rate schedules, etc. This is less essential for basic inference, but useful when you want to resume training later.
Metadata:
Things like the original training dataset’s information, preprocessing steps, or class labels can sometimes be included for easier model management later.
How It’s Saved
Libraries:
TensorFlow: SavedModel format (a whole directory structure), or HDF5 files.
PyTorch: Typically using torch.save() which pickles the model objects.
Keras: HDF5 files are common, or you can integrate TensorFlow’s SavedModel.
Serialization: Saving the model in platform-independent formats like ONNX (Open Neural Network Exchange) for deployment across different frameworks.
It’s very common to save a neural network model in two separate files:
Architecture File:
Contains the model’s structure: layer types, connections, activation functions, etc.
Usually a text-based format like JSON or YAML for human readability.
Weights File:
Contains the learned weights and biases (numerical values) for all the connections in the network.
This is often a binary file optimized for efficient loading and computation.
Why Two Files?
Flexibility: Separating architecture from weights allows you to load the same architecture and initialize it with different weight sets (e.g., fine-tuning a pre-trained model, experimenting with random initializations).
Transferability: You might potentially reuse the model architecture with weights trained on a different dataset.
File 1: Model Architecture
JSON (JavaScript Object Notation): A hierarchical, text-based format widely used due to its simplicity and human readability. Libraries often provide easy ways to define and save the model structure as JSON.
YAML (YAML Ain’t Markup Language): Similar to JSON but often considered slightly more human-friendly for configuration files.
Protocol Buffers: Language-neutral, platform-neutral mechanism by Google for serializing data. Can be more efficient than JSON or YAML in some cases.
Library-Specific Formats:
TensorFlow: Can be part of SavedModel, or its own structure saved within an HDF5 file.
PyTorch: Often uses a Python pickle format.
File 2: Learned Weights
HDF5 (Hierarchical Data Format 5): A common standard for storing scientific data. Allows for organizing weights and related metadata within a single file.
NumPy Arrays (.npy): Simple format for storing raw numerical arrays, often used for individual weight matrices.
Library-specific:
TensorFlow: Checkpoints or within SavedModel (variables).
PyTorch: Often in Python pickle format.
Important Notes
Cross-Framework Formats: Formats like ONNX aim to represent the model in a way that’s portable between different deep learning frameworks.
Compression: Weights files can sometimes be compressed to save space
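The common PyTorch weights-only workflow mentioned above can be sketched as follows; the tiny nn.Linear model and the file name "weights.pt" are placeholders for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # a tiny stand-in for a real architecture

# Save only the learned weights (the state_dict), not the architecture
torch.save(model.state_dict(), "weights.pt")

# To reload: reconstruct the architecture in code, then load the weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("weights.pt"))
```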
Inheritance
Inheritance is a core concept in object-oriented programming that enables code reuse and specialization. A subclass inherits attributes and methods from its superclass, allowing it to extend or modify the superclass’s behavior. This promotes modularity and scalability by facilitating hierarchical organization of classes. Inheritance supports the “is-a” relationship, where a subclass is a specialized version of its superclass. It fosters polymorphism, enabling objects of different subclasses to be treated uniformly based on their common superclass.
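The “is-a” relationship and method overriding described above, in a short sketch (Vehicle and Car are invented example classes):

```python
class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels

    def describe(self):
        return f"vehicle with {self.wheels} wheels"

class Car(Vehicle):               # Car "is-a" Vehicle
    def __init__(self):
        super().__init__(wheels=4)

    def describe(self):           # override: specialize superclass behavior
        return "car, a " + super().describe()

print(Car().describe())          # car, a vehicle with 4 wheels
print(isinstance(Car(), Vehicle))  # True
```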
Intermediate data representations (PyTorch)
Intermediate data representations refer to the transformed versions of your input data as it flows through the different layers of a neural network. Here’s why they’re important:
Hierarchical Learning:
Each layer of a neural network extracts increasingly complex and abstract features from the previous layer’s output.
Early layers might focus on basic edges and patterns. Later layers can build representations of objects, textures, or even higher-level concepts.
Debugging and Understanding:
Examining intermediate representations can help you understand what the network has learned at different stages.
This can be helpful for identifying training problems, diagnosing bottlenecks, or even interpreting how a model makes its decisions.
Not Input, Not Output: It’s the data in its transformed states between the layers of a neural network. It’s neither the raw input nor the model’s final prediction.
Progressive Transformation: Each layer takes its input data, applies weights, biases, and activation functions, and produces a new modified representation. This modified representation is the intermediate data for that layer.
Increasingly Abstract: As data moves deeper into the network:
Early layers: Intermediate data captures low-level features (lines, edges, basic colors).
Later layers: Intermediate data represents more complex concepts and patterns (shapes, objects, or task-specific information).
Why the Term “Representation” Matters
Reframing the Data: Intermediate data isn’t just modified numbers; it’s the evolving way the neural network “understands” or re-represents the input to suit the task.
The Key to Learning: The network’s ability to learn lies in how it successfully modifies these intermediate representations into ones that are highly useful for the final output.
Example: Facial Recognition
Input: Raw pixel values of a face image.
Early Layers: Intermediate data might highlight edges, lines, and color gradients.
Middle Layers: Intermediate data might capture parts of a face (eyes, nose, mouth shapes).
Later Layers: Intermediate data could represent higher-level concepts related to facial identity.
Output: The final classification of the person’s identity.
Integer
Integers (often shortened to ‘int’) represent whole numbers – both positive and negative – without any fractional components. Integers cannot have a decimal point. (3.14 is not an integer, it’s a floating-point number).