Machine learning Flashcards

1
Q

Supervised ML -

A

training with data that includes input variables (x) as well as response variables (y). Supervised learning uses labelled datasets, where each example pairs input features with an output label, to train algorithms to predict outcomes and recognize patterns.

Two main types: classification and regression.

2
Q

Regression vs classification -

A

Classification → determines which group a new data point belongs to. Every data point is placed in one of the predefined classes or categories, e.g. classifying a patient as healthy or ill from temperature and heart rate. y is discrete. (Decision tree, random forest, SVM.)

Regression → predicts continuous numerical values, such as earnings, production orders, or stock prices. y is continuous. (Linear regression.)

3
Q

Decision tree -

A

Flowchart-like classifier that makes decisions step by step.
Nodes = Tests on features/attributes.
Branches = Outcomes of tests.
Leaves = Final classification result.

Built from training data to make predictions. Simple but effective for many tasks. Can handle different types of attributes:
* Discrete-valued (e.g., color: red, blue, green).
* Continuous-valued (e.g., temperature: 10°C).
* Binary (yes/no decisions).
Example: Predicting if a customer will buy a computer based on past data.
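
A minimal sketch of training a decision tree, assuming scikit-learn is available; the iris dataset and the max_depth value are illustrative choices, not part of the card:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                            # features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # each internal node tests one feature
tree.fit(X_train, y_train)                                   # build the flowchart from training data
print("test accuracy:", tree.score(X_test, y_test))          # leaves give the final class
```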

4
Q

Attribute selection measure in DT -

A

What is Attribute Selection? A method to find the most important attribute for splitting data. Goal: Create pure partitions (groups with only one class).

How to Measure Impurity?
* Gini Impurity: measures how mixed the classes are in a group. Lower Gini = better split (purer groups).
* Information Gain (IG): measures how much uncertainty is reduced after splitting. Higher IG = better split.

Choosing the Best Attribute:
Pick the attribute with the lowest Gini Impurity or highest Information Gain. This ensures the best first split for the decision tree.
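
A small NumPy sketch of both measures; the helper names and the toy labels are made up for illustration:

```python
import numpy as np

def gini(labels):
    """Gini impurity of one partition: 1 - sum(p_i^2). 0 means pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy in bits; the uncertainty used by information gain."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, splits):
    """Entropy of the parent minus the weighted entropy of the child partitions."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in splits)
    return entropy(parent) - weighted

parent = np.array(["yes", "yes", "no", "no", "no"])
split = [np.array(["yes", "yes"]), np.array(["no", "no", "no"])]   # a perfectly pure split
print(gini(parent), information_gain(parent, split))
```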

5
Q

What is overfitting and tree pruning in decision trees? -

A

What is Overfitting? When a decision tree fits the training data too closely and does not generalize well to new data. Happens when the tree learns noise or outliers instead of real patterns. More attributes + less training data = higher risk of overfitting.

How to Fix Overfitting? → Pruning. Pruning removes unnecessary branches to make the tree simpler and more accurate.

Types of Pruning:
* Prepruning (early stopping): stop tree growth early based on rules like the Gini index or Information Gain.
* Postpruning (more common): first build the full tree, then remove unhelpful branches. Uses cost complexity (based on misclassification rate & number of leaves).
* Goal: Small, accurate tree that balances size and accuracy.
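
A hedged postpruning sketch using scikit-learn's cost-complexity pruning (requires scikit-learn >= 0.22); the dataset and the sampling of candidate alphas are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Path of candidate alphas: larger alpha prunes more branches, giving a smaller tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas[::10]:                       # try a few candidates
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f} leaves={tree.get_n_leaves()} "
          f"test acc={tree.score(X_test, y_test):.3f}")
```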

6
Q

DT pros and cons -

A

Pros:
* Transparency (easy to understand for humans)
* Does not require parameter setting
* Requires little to no preprocessing.

Cons:
* Scalability (might have trouble with large datasets, due to memory)
* Can be greedy (focus on local optima instead of global)
* Risk of overfitting.

7
Q

Random forest and bagging -

A

An ensemble classifier that combines multiple decision trees.
* Instead of one tree, it includes multiple trees.
* Better at handling overfitting compared to a single DT.
* RF is a type of bagging. Bagging is short for ”bootstrap aggregation”.
* A type of ensemble learning method based on majority voting.
* More robust to the effects of noisy data and overfitting.
* First step of bagging is bootstrap sampling.
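
A minimal random-forest sketch with scikit-learn (dataset and parameter values are assumptions): each tree sees a bootstrap sample and a random subset of features, and the forest predicts by majority vote.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bagged trees
    max_features="sqrt",   # random attribute selection at each split
    bootstrap=True,        # each tree trained on a bootstrap sample
    random_state=0,
)
forest.fit(X_train, y_train)
print("test accuracy (majority vote):", forest.score(X_test, y_test))
```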

8
Q

Bootstrap sampling and attribute selection in RF -

A

Bootstrap Sampling: Create k new training samples from the original dataset by sampling with replacement. Some data points are left out, while others appear multiple times. Each sample trains one decision tree.

Random Attribute Selection: Each tree only considers a random subset of attributes. Reduces correlation between trees. Less sensitive to noise and overfitting.

Final Prediction: Each tree makes a prediction. Majority vote decides the final result.
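
A tiny NumPy illustration of bootstrap sampling (the toy array is an assumption): drawing with replacement is what makes some points repeat while others are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)                              # indices stand in for training samples

n = len(data)
boot = rng.choice(data, size=n, replace=True)     # one bootstrap sample, same size as the original
out_of_bag = np.setdiff1d(data, boot)             # points never drawn for this tree

print("bootstrap sample:", boot)
print("out-of-bag points:", out_of_bag)
```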

9
Q

Support vector machine (SVM) -

A

Creates a classifier by drawing a line (2D) or plane (3D) to separate classes. Good performance in many applications but slow training. Useful as a “first try” ML method when exploring new domains. In higher dimensions (N-dimensional space), the separator is called a hyperplane.

10
Q

Maximum marginal hyperplane, kernel function, soft margin and hard margin -

A

MMH is the hyperplane that maximizes the separation between classes. It is defined by the support vectors (the data samples closest to the hyperplane).

SVM: Kernel Function maps data into a higher-dimensional space to make it separable. Choice of kernel depends on the data. Examples: Linear, Polynomial, Radial Basis Function (RBF).

SVM: Soft vs. Hard Margin
* Hard margin: Strict separation, but fails if data is noisy or non-separable.
* Soft margin: Allows some misclassifications, leading to better generalization.
* Trade-off: A larger margin with minor mistakes is often better than a perfect separation with a narrow margin.
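
A hedged SVM sketch with scikit-learn: the RBF kernel handles non-linearly separable data, and C controls how soft the margin is (smaller C tolerates more misclassifications). The dataset and values are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs; C trades margin width against training errors (soft margin)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```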

11
Q

SVM pros and cons -

A

Pros:
* Good performance for a variety of problems.
* Less prone to overfitting compared to many other ML methods.
* Can often work well even with a small training set.

Cons:
* Sensitive to noise.
* A large dataset can lead to long training time.
* Needs parameter tuning to work properly.

12
Q

Linear regression and logistic regression -

A

Linear Regression: Used for predictive analysis to find a trend line in data. Finds the best fit line by minimizing the error.

Logistic Regression: Classification algorithm used for binary outcomes (e.g., Yes/No). Fits a logistic (S-shaped) curve instead of a straight line. Differences: Linear regression → predicts continuous values (straight line). Logistic regression → predicts a probability (0 to 1) via the sigmoid curve.
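
A side-by-side sketch, assuming scikit-learn; the toy data is made up for illustration. Linear regression fits a straight line to a continuous y; logistic regression returns a probability between 0 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: continuous target (e.g. earnings vs. years of experience)
X = np.array([[1], [2], [3], [4], [5]])
y_cont = np.array([10.0, 19.5, 31.0, 39.0, 52.0])
lin = LinearRegression().fit(X, y_cont)
print("predicted value at x=6:", lin.predict([[6]]))

# Logistic regression: binary target (e.g. buys / does not buy)
y_bin = np.array([0, 0, 0, 1, 1])
log = LogisticRegression().fit(X, y_bin)
print("P(class=1) at x=6:", log.predict_proba([[6]])[0, 1])   # sigmoid output in [0, 1]
```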

13
Q

Validation -

A

Purpose: Fine-tune model parameters during training and assess performance.

How it works: Split data into training and validation sets. Train on training set, test on validation set.

Adjusting Model: Hyperparameters are adjusted based on validation set performance.

Preventing Overfitting: Detects overfitting by ensuring good performance on unseen data.

K-fold Cross-Validation
What it is: Split the data into k subsets (folds). How it works: Train on k-1 folds and test on the remaining fold; repeat k times and average the results.
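
A minimal k-fold cross-validation sketch with scikit-learn; the model choice and k=5 are assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 5-fold CV: train on 4 folds, validate on the held-out fold, repeat 5 times
scores = cross_val_score(model, X, y, cv=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```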

14
Q

Evaluation -

A

Purpose: After training, evaluate the model’s performance on a separate test dataset to assess its ability to generalize to new data.

Data Usage: The test data was not seen during training or validation, providing an unbiased assessment.

Performance Metrics: Metrics like accuracy, precision, recall, and F1 score are used to quantify performance.

Decision Making: Evaluation results help decide whether to deploy the model in real-world scenarios.

Key Point: The goal is for the model to perform well on new, unseen data, not just the data it was trained on.

15
Q

Classification performance measurement -

A

Accuracy: Percentage of correct predictions (true positives plus true negatives out of all predictions).

Recall (Sensitivity): Out of all the actual positive cases, how many did we correctly identify?

Precision: Out of all the cases we predicted as positive, how many were actually correct? (It tells us how accurate our positive predictions are.)

False Alarm Rate: Out of all the negative cases, how many did we wrongly predict as positive? (This shows how often we mistakenly raise an alarm when there’s no real issue.)

F1 Score: A single number that balances precision and recall—useful when you want to be good at both catching real cases and avoiding false alarms.

Confusion matrix: A table to compare predictions vs actual outcomes: True Positive (TP): Correctly predicted positive. True Negative (TN): Correctly predicted negative. False Positive (FP): Negative predicted as positive. False Negative (FN): Positive predicted as negative.
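
A small sketch computing the measures from the confusion-matrix counts; the toy predictions are made up, and the same numbers can also be obtained from sklearn.metrics:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])   # toy ground truth (1 = positive)
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])   # toy predictions

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy  = (tp + tn) / len(y_true)
recall    = tp / (tp + fn)            # sensitivity: share of real positives we found
precision = tp / (tp + fp)            # share of positive predictions that were right
false_alarm_rate = fp / (fp + tn)     # share of negatives wrongly flagged as positive
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, recall, precision, false_alarm_rate, f1)
```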

16
Q

Regression performance measurement -

A

Mean squared error (MSE), Root mean squared error (RMSE), Mean absolute error (MAE).
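
A minimal NumPy sketch of the three measures on made-up predictions:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mse  = np.mean(errors ** 2)        # Mean Squared Error
rmse = np.sqrt(mse)                # Root Mean Squared Error (same unit as y)
mae  = np.mean(np.abs(errors))     # Mean Absolute Error

print(mse, rmse, mae)
```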

17
Q

Generalization, overfitting and underfitting -

A

Generalization is when a model performs well on new, unseen data from the same distribution as the training data. Measures how well the model can apply learned knowledge to make correct predictions on new data after being trained.

Underfitting: What it is: Model is too simple. Signs: High error on both training and test data. Cause: Model can’t capture patterns.

Overfitting: What it is: Model is too complex. Signs: Low error on training data, high error on test data. Cause: Model is too specific to training data and doesn’t generalize well.

18
Q

Interactive ML (IML) -

A

The combination of humans and machines is powerful. Solving real-world problems can often benefit from interaction with the end-users. Sometimes it is even impossible without end-user input, e.g. due to a lack of labelled instances. Labelling instances is often expensive; IML can reduce the need for labelling. Empowers the end-user, who gets more control of the learning process. May increase the user's trust in the output of the ML system. Can make ML more accessible for people who are not ML experts.

19
Q

Classic vs interactive ML -

A

Classical ML: Batch process (one-time pass). Long training times are acceptable. Requires large labeled datasets. Labels/classes are known beforehand. No user feedback during training.

Interactive ML: Iterative process. Sensitive to latency (fast responses needed). Often works with unlabeled datasets. Labels may not be known in advance. User feedback is crucial during training.

20
Q

Triggering interaction in IML -

A

The aim is to request the labels that will be most useful for the ML algorithm while bothering the user as little as possible. Often the ML system has a budget of requests per time unit.

Interactive learning strategies: the user provides a label when triggered by a specified event.

21
Q

Interactive ML strategies -

A

Active learning (AL) strategies: AL triggered by uncertainty, AL triggered by time, AL triggered at random.

Machine teaching (MT) strategies: MT triggered by error, MT triggered by state change, MT triggered by time, MT triggered by user factors.

22
Q

Active learning (AL) strategies -

A

Triggered by uncertainty: the system asks the user for labels when it is uncertain about its predictions. If the model’s certainty is below a set threshold and there is room for more queries, it will request the user’s input. The user is assumed to be always correct.

Triggered by time: Asking the user at certain points in time what the current status is, e.g. once every hour.

Triggered at random: Asking the user at random points in time what the current status is.

23
Q

Machine teaching (MT) strategies -

A

Triggered by error: The user notices that the ML system's estimate is not correct. The user provides the correct value.

Triggered by state change: The user notices that the activity has changed. The user provides the new value.

Triggered by time: The user reports the current activity at certain points in time. This could be, e.g., a security guard continuously patrolling a building or a member of the cleaning staff.

Triggered by user factors: The user reports the current activity based on internal factors, e.g. how busy/stressed the user is at the moment, or how knowledgeable the user is about how to classify the current state.

24
Q

Issues in interactive learning -

A

User Interaction: Should it be reactive (Active Learning), proactive (Mixed Initiative), or both?

Model Communication: Should the system show its current state to the user?

Model Evaluation: How can the user assess the model’s quality?

User Feedback: What is the best way for the user to provide input?

25
The Cold Start problem (bootstrapping) -
Happens when there is little or no labeled data available at the beginning of training. Ways to address it: Incremental Learning: gradually improve the model as new data comes in. Transfer Learning: use knowledge from a similar, pre-trained model. Data Augmentation: create synthetic data to increase the dataset size.
26
Incremental learning -
Batch Learning (Traditional ML): All data is available before training. Model is optimized using the full dataset. Assumes data remains the same over time.

Incremental Learning (Online ML): Model continuously updates with new data. Needs to make accurate predictions at any time. Limited memory (can't store all data). Uses a compact representation of past data (e.g., statistics, recent samples).

Challenges in Incremental Learning:
* Updating the Model: Fully Online: update after every new data point. Mini-Batch: update after collecting small batches of data. Batch Learning: store all data and retrain (not always feasible).
* Concept Drift (Changes in Data Over Time): Gradual Change: adapt smoothly by giving more weight to recent data. Sudden Change: detect shifts (e.g., accuracy drops) and adjust the model. New Data Categories: use clustering to detect new patterns.

Incremental learning helps models stay up-to-date without retraining from scratch.
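
A hedged incremental-learning sketch using scikit-learn's partial_fit; the streaming loop, data, and labelling rule are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])                     # all classes must be declared up front
model = SGDClassifier(random_state=0)          # linear model that supports online updates

# Simulated stream: update on each mini-batch instead of retraining from scratch
for _ in range(20):
    X_batch = rng.normal(size=(32, 5))
    y_batch = (X_batch[:, 0] > 0).astype(int)  # toy labelling rule
    model.partial_fit(X_batch, y_batch, classes=classes)

print("prediction for a new point:", model.predict(rng.normal(size=(1, 5))))
```
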
27
Transfer learning -
To deal with the cold start problem. A model learned for one task is reused as the starting point for learning a model for another task (reuse model learned to recognize activity in one room for another room). Activity Recognition: A model trained to recognize activities in one room can be reused to recognize activities in another room, instead of starting from scratch.
28
Data augmentation -
This means generating additional labeled data points from the data you already have. • For image data, well-developed methods already exist, such as rotation, flipping, brightness adjustments, or using neural networks to generate new images based on existing ones. • For other types of data (such as text, tabular data, or time series), there are not as many well-established methods, making it harder to generate realistic additional examples.
29
Black box vs white box, interactive ML -
Black Box: User feedback is based only on the input and output of the model. White Box: User provides feedback on the internal structure of the model, offering more transparency. Increases user trust by showing how the model works. Requires visualising the model and data, and some ML expertise. Used mainly in offline IML so far.
30
Unsupervised learning -
the algorithms are provided with data that does not contain any labels or explicit instructions on what to do with it. The goal is for the learning algorithm to find structure in the input data on its own. Large datasets are costly, especially with time-consuming and expensive labeling. It’s useful when the number or type of classes in the data is unknown. Applications: market basket analysis, medical diagnosis, marketing, social media. Divided into two problems: Clustering and dimension reduction.
31
Clustering -
Clustering algorithms find hidden patterns in the data based on their similarities or differences. These patterns can relate to the shape, size, or color and are used to group data items or create clusters. Clustering methods: Partitioning clustering (K-means and K-medoids) and Hierarchical clustering (agglomerative clustering and divisive clustering).
32
Partitioning clustering -
Partitioning clustering algorithms group data based on similarities and differences. Key characteristics: * Non-overlapping: each data point belongs to one cluster. * Predefined number of clusters (a hyperparameter). * Center-based: each cluster is described by its center. * Objective function: measures data similarity/dissimilarity. * Iterative optimization: process based on similarity criteria, often using Euclidean distance.
33
K-means -
K-means is a clustering algorithm that groups similar data points together. Each group (cluster) is defined by a centroid (a central point). Goal: Group data into K clusters based on similarity. How It Works: 1. Choose K – pick the number of clusters. 2. Initialize centroids – randomly select K starting points. 3. Assign points – each data point joins the nearest centroid. 4. Update centroids – recalculate the center of each cluster. 5. Repeat – until the centroids stop changing. Finding the Best K: Elbow Method – find the "bend" in a WCSS (Within-Cluster Sum of Squares) plot. Silhouette Score – measures how well points fit in their clusters (closer to 1 = better).
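
A minimal K-means sketch with scikit-learn, including the two ways of judging K mentioned above; the blob data and the range of K values are assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # toy data with 3 true clusters

for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the WCSS used by the elbow method; silhouette closer to 1 is better
    print(f"k={k}  WCSS={km.inertia_:.1f}  silhouette={silhouette_score(X, km.labels_):.2f}")
```
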
34
K-medoids -
Like K-means, but each cluster center is an actual data sample (a medoid) chosen from the cluster, rather than a computed mean. Result: greater resistance to outliers and noise.
35
Hierarchical clustering -
Groups data into a tree structure (dendrogram), showing the hierarchical relationship between clusters. Steps: Start with each sample as a separate cluster. Merge the closest clusters. Repeat merging until all samples are in one large cluster. Strategies: Agglomerative Clustering (Bottom-up): Starts with individual clusters and merges the closest ones until all data points form a single cluster. Uses a distance metric (e.g., Euclidean distance). Divisive Clustering (Top-down): Starts with all data in one cluster and recursively splits it into smaller clusters until each sample is its own cluster.
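
A brief agglomerative (bottom-up) clustering sketch with scikit-learn; the toy data and the cut at 3 clusters are assumptions:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Bottom-up: start from single points and repeatedly merge the closest clusters (Ward linkage here)
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)
print("cluster sizes:", [list(labels).count(c) for c in set(labels)])
```
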
36
Dimensionality reduction -
Dimensionality reduction is used to reduce the number of features in a dataset while retaining as much of the important information as possible. It is a process of transforming high-dimensional data into a lower-dimensional space that still preserves the essence of the original data. Reasons for using: simplifying complex data, removal of excesses, noise reduction. Feature selection and Feature extraction.
37
Feature extraction -
Creating new feature sets from the original features (finding a combination of new features). Different methods: 1. Linear: PCA (principal component analysis, converting the original features into uncorrelated features), LDA (linear discriminant analysis, a technique that uses the class labels to maximize between-class distance and minimize within-class distance). 2. Non-linear: LLE (locally linear embedding, a dimensionality reduction method that preserves the geometric structure of high-dimensional data. It transforms complex structures (e.g., a 3D Swiss roll) into a lower-dimensional space while keeping relationships intact).
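
A short PCA sketch with scikit-learn (iris data assumed) showing linear feature extraction down to two new, uncorrelated components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)               # 4 original features
X_scaled = StandardScaler().fit_transform(X)    # PCA is sensitive to feature scale

pca = PCA(n_components=2)                       # keep 2 principal components
X_2d = pca.fit_transform(X_scaled)

print("new shape:", X_2d.shape)
print("variance explained per component:", pca.explained_variance_ratio_)
```
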
38
Feature selection -
selecting a subset of the main features, removing less important features.
39
ML strategies, Centralized and decentralized learning -
Centralized Learning: Data is collected and processed in one central server. Model training and inference happen on the central server. * Suitable when: Data is centralized. No strict privacy or transfer constraints. E.g. centralized patient data. Decentralized Learning: Data is generated across multiple devices (phones, IoT). Enables smarter models without centralizing data.
40
ML strategies, Distributed learning -
Distributed learning reduces the cost of training a model on a centralized server by using multiple computers (nodes) across a network, like in clusters or cloud systems. The training process is divided into smaller tasks, executed on different machines in parallel. Data is spread across these nodes. Nodes communicate to share information and update the model. Used for large datasets or complex models that need a lot of computing power. Tools like Apache Spark, TensorFlow, and PyTorch are used for distributed learning.
41
ML strategies, Federated learning -
Federated learning is a privacy-focused approach where model training happens on decentralized devices (edge devices or local servers) without sending raw data to a central location. Each device trains the model locally and shares only updates with a central server, which combines these updates to improve the global model. It’s used when data privacy is important, and data cannot be easily centralized. Commonly applied in mobile devices, healthcare, and IoT. When to use Federated Learning: When on-device data is more relevant than data stored on servers. When the data is sensitive or large (e.g., health data, IoT). When labels can be derived from user interactions. Federated Learning Strategies: Centralized Federated Learning: Needs a central server to coordinate clients and gather updates. Decentralized Federated Learning: No central server; devices share updates directly with each other, avoiding a single point of failure.
42
Advantages/disadvantages and how federated learning works -
How it works: The central server sends an untrained model to devices. Each device trains the model with local data. Devices send back trained models (not data) to the server. The server combines the models, often by averaging. This process repeats until the global model improves. The updated model is sent to devices for testing. Pros: Privacy: data remains on the user's device. Reduced latency: the updated model can make predictions on the user's device. Smarter models: collaborative training process. Cons: Implementation cost: higher than collecting the data and processing it centrally. Communication is expensive. Unreliable client availability.
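
A toy NumPy sketch of FedAvg-style rounds: clients train locally (here just a few gradient steps on a linear model) and the server averages their weights. Everything here is a simplified assumption for illustration, not a full federated-learning framework:

```python
import numpy as np

rng = np.random.default_rng(0)
global_w = np.zeros(3)                              # global linear-model weights

def local_update(w, X, y, lr=0.1, epochs=5):
    """Client-side training: a few gradient steps on local data only."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)       # MSE gradient
        w = w - lr * grad
    return w

# Each client keeps its own private data; only model weights travel to the server
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

for _ in range(10):                                 # repeated federated rounds
    local_weights = [local_update(global_w.copy(), X, y) for X, y in clients]
    global_w = np.mean(local_weights, axis=0)       # FedAvg: average the client models

print("learned global weights:", global_w)
```
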
43
Federated learning algorithms -
Federated Stochastic Gradient Descent (FedSGD). Federated Averaging (FedAvg): the common baseline algorithm. Federated Learning with Dynamic Regularization (FedDyn).
44
Distributed computing -
A method where multiple computers work together to solve a problem, appearing as a powerful single computer. It handles complex tasks like encrypting large data, solving equations, and rendering 3D animations. Examples: Cloud Computing. Edge Computing. Fog Computing: a hybrid approach that balances processing between the cloud and edge devices. Pros: - Scalability: can grow by adding more computing devices (nodes) when handling an increased workload. - Availability: the system remains functional even if a computer fails (fault tolerance). - Consistency: automatically manages data consistency across computers, ensuring reliable data without compromising fault tolerance. - Transparency: users interact with it as if it were a single system. - Efficiency: optimizes hardware use for faster performance, handling workloads without system failures.
45
Types of distributed systems -
- Client-Server Architecture: Client requests data from the server; server manages and synchronizes resources. Easy to maintain and secure, but communication can become a bottleneck. - Three-Tier Architecture: Client: same as client-server. Application Server: handles communication and application logic. Database Server: manages data storage. Reduces the communication bottleneck, but more complex than client-server. - N-Tier Architecture: Multiple client-server systems working together to solve a problem. Used in modern distributed systems with various enterprise applications. Increased complexity. - Peer-to-Peer Architecture: All networked computers share equal responsibilities. Popular for content sharing, file streaming, and blockchain networks. No central control; can be harder to manage.
46
How distributed systems work? -
Can use either loose or tight coupling. Loose Coupling: Components are weakly connected and can operate independently. Changes made to one component don’t affect the others. E.g. a client sends a message to a server and does other jobs until it gets a response. Tight Coupling: Components are strongly connected, often relying on each other to work efficiently. Changes in one component can directly impact the others.
47
Parallel computing -
Types of Parallelism: - Task Parallelism: Different tasks run on separate cores/processors. - Data Parallelism: The same task runs on different data chunks at the same time. Hardware: Multi-core Processor: A CPU with multiple cores that can perform tasks simultaneously. GPUs/TPUs: Specialized processors for fast, parallel execution (mainly used in AI). Parallel computing speeds up tasks by running them at the same time, using multi-core CPUs or specialized hardware like GPUs/TPUs.
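
A tiny data-parallelism sketch using Python's standard library: the same function is applied to different data chunks in separate worker processes. The workload and chunking scheme are made-up stand-ins:

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    """Same task applied to one chunk of the data (data parallelism)."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]           # split the data into 4 chunks

    with ProcessPoolExecutor(max_workers=4) as pool:  # one worker per core (illustrative)
        partials = list(pool.map(process_chunk, chunks))

    print("total:", sum(partials))
```
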
48
Distributed vs parallel computing -
Distributed System: Components are located in different places. Multiple computers in a network work together. Each computer has its own memory. Tasks are distributed across different computers. Parallel Computing: Multiple processes run simultaneously using multiple processors. A single computer is used. Processors may share or have separate memory. Tasks are performed within one system.
49
Parallel processing advantages -
* can process large datasets in a fraction of the time. * less memory and compute requirements are needed, as the set of instructions is distributed to smaller execution nodes. * more execution nodes can be added or removed from the processing network depending on the complexity of the problem.
50
Horizontal scaling, embarrassingly parallel problems, data locality and fault tolerance -
Horizontal Scaling: Increases capacity by adding more computers to a cluster. Expands storage and computing power by adding nodes. Embarrassingly Parallel: Tasks are easy to split and run independently. If one process fails, it can be re-run without affecting others. Data Locality: Data is processed where it's stored to improve efficiency. Computation and output happen on the same node to reduce data movement. Fault Tolerance: Allows the system to keep running even if some components fail. Ensures reliability and continuous operation.
51
K-means pros and cons -
Pros: * Scalability: K-means is a scalable algorithm that can handle large datasets with high dimensionality. * Speed: a relatively fast algorithm, making it suitable for real-time or near-real-time applications. * Simplicity: a simple algorithm to implement and understand. Cons: * User-defined K: requires the user to specify the number of clusters (K) beforehand. * Non-convex shaped clusters: assumes clusters are round (spherical); struggles with irregularly shaped clusters. * Can't handle noisy data: sensitive to noisy data or outliers, which can significantly affect the clustering results.