DS Interview 3 Flashcards
“You are given an unsorted list of integers. The list is too large to fit into memory all at once, but you need to sort it as efficiently as possible. What sorting algorithm would you choose, and why?”
Quick Sort
Its average-case time complexity is O(n log n), making it efficient for large datasets, and it sorts in place, requiring only O(log n) additional space for the recursion stack, which is crucial under memory constraints.
You can mention that a randomized pivot helps avoid the O(n²) worst case, and that if the data truly cannot fit in memory at once, the standard fallback is an external merge sort: sort chunks that fit in RAM, then merge the sorted runs from disk.
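As a minimal in-memory sketch (not the external-sorting variant), a randomized-pivot quick sort in Python could look like this; the function name and test list are illustrative:
import random

def quick_sort(arr):
    # Lists of length 0 or 1 are already sorted
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)  # random pivot makes the O(n^2) worst case unlikely
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)

print(quick_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
Note this version builds new lists for clarity; the in-place partitioning variant is what gives the O(log n) space figure above.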
What algorithm is recommended for merging two sorted lists of integers and why?
Merge Sort is recommended because:
* Guarantees a time complexity of O(n log n) in the worst case
* More predictable than Quick Sort
* Useful for large datasets that don’t fit into memory
* Works efficiently with external storage
Trade-offs:
* Higher space complexity of O(n) due to the auxiliary arrays used during merging
* Less space-efficient than in-place algorithms like Quick Sort
Merge Sort is particularly effective in scenarios where data is too large to handle in memory, making it suitable for external sorting tasks.
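A minimal sketch of the merge step, which answers the question above and is also the core of external sorting, assuming two already-sorted Python lists:
import heapq

def merge_sorted(a, b):
    # heapq.merge lazily merges already-sorted inputs in O(len(a) + len(b)) time
    return list(heapq.merge(a, b))

print(merge_sorted([1, 4, 7], [2, 3, 9]))  # [1, 2, 3, 4, 7, 9]
In external sorting, the same idea merges sorted runs streamed from disk instead of in-memory lists.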
what is regularization in one sentence
Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty to the loss function, discouraging overly complex models.
Lasso (L1): adds the sum of absolute weights (λ Σ|w|) to the loss, encouraging some weights to become exactly zero
Ridge (L2): adds the sum of squared weights (λ Σ w²), encouraging smaller weights without eliminating them
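A small sketch using scikit-learn's Lasso and Ridge (alpha is the penalty strength; the synthetic dataset is illustrative):
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: tends to drive some weights to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights but keeps them nonzero

print((lasso.coef_ == 0).sum(), "Lasso coefficients driven to exactly zero")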
Machine Learning Model Deployment Steps
- Understand problem definition and expected output
- Data preparation (handle missing values, duplicates, formatting)
- Feature engineering (create new features, transform raw data into useful formats)
- Select model
- Train model
- Evaluate model (accuracy, precision, recall, etc)
- Save model (pickle or tensorflow)
- Develop API for inference (Flask, FastAPI)
- Containerize model (using Docker for portability)
- Deploy model (cloud / on-premises/ edge)
- Monitor performance (check for model drift)
- Scale and load balancing (using Kubernetes, cloud scaling)
- Model retraining (automate with CI/CD pipelines)
- Security & Governance (authentication, encryption, compliance)
What is problem definition in machine learning, and why is it important?
Problem definition is understanding the business or research question and what the model is expected to predict or classify. It helps in aligning the model’s objective with real-world goals, guiding data collection and model selection.
What is data preparation in machine learning?
Data preparation involves cleaning the data, handling missing values, removing duplicates, and formatting the data correctly. This step ensures the model can learn effectively and the results are reliable.
What is feature engineering, and why is it important?
Feature engineering is the process of creating new features from raw data or transforming existing features into more meaningful formats. It helps improve model performance by ensuring the model can capture the right patterns in the data.
What is model selection in machine learning?
Model selection is the process of choosing an appropriate algorithm for the task at hand, whether it’s regression, classification, or clustering. It depends on the problem, data, and desired output.
What is model training in machine learning?
Model training involves feeding the preprocessed data into the chosen algorithm so it can learn the underlying patterns. This step adjusts model parameters to minimize error and improve predictions.
How do you evaluate a machine learning model?
Model evaluation involves measuring its performance using metrics such as accuracy, precision, recall, F1-score, or RMSE. These metrics indicate how well the model generalizes to new data.
What does saving a model in machine learning mean?
Saving a model involves serializing it to a file so it can be used later for inference. Common formats are Pickle for Python or TensorFlow’s SavedModel for deep learning.
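A minimal Pickle sketch (the stand-in model and file name are illustrative):
import pickle
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # tiny stand-in for a trained model

with open("model.pkl", "wb") as f:   # serialize the trained model to disk
    pickle.dump(model, f)

with open("model.pkl", "rb") as f:   # load it back later for inference
    loaded_model = pickle.load(f)

print(loaded_model.predict([[0.2]]))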
What is an API, and what are some tools to develop it?
An API (Application Programming Interface) allows external systems to interact with a model. Tools like Flask and FastAPI are used to build APIs that serve machine learning models for real-time predictions.
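A minimal FastAPI sketch (the endpoint path, input schema, and the assumption that a model was pickled to model.pkl are all illustrative):
import pickle
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:   # assumes a model was saved earlier, e.g. with Pickle
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]

@app.post("/predict")
def predict(features: Features):
    prediction = model.predict([features.values])   # model expects a 2D array of samples
    return {"prediction": prediction.tolist()}
Run it with an ASGI server such as uvicorn (e.g. uvicorn main:app) and POST JSON like {"values": [0.2]} to /predict.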
What does containerizing a machine learning model mean?
Containerizing a model involves packaging it and its dependencies into a portable container using tools like Docker. This ensures the model can run consistently across different environments.
What does it mean to deploy a model, and where can you deploy it?
Deploying a model means making it accessible for use in production, either on cloud platforms (AWS, GCP, Azure), on-premises servers, or on edge devices like mobile phones or IoT devices.
What is model monitoring in deployment?
Model monitoring tracks the model’s performance over time, ensuring it continues to perform well. It involves checking for issues like model drift, which can happen if data patterns change.
What is scaling and load balancing in machine learning deployment?
Scaling ensures that the deployed model can handle varying amounts of traffic by increasing or decreasing resources. Load balancing distributes the traffic evenly across multiple instances to ensure smooth performance.
What is model retraining in machine learning?
Model retraining involves periodically updating the model with new data to maintain its accuracy. This can be automated using CI/CD pipelines, ensuring the model adapts to changing data.
What are security and governance in machine learning deployment?
Security ensures that only authorized users can access the model, using methods like authentication and encryption. Governance involves ensuring compliance with data protection regulations and managing the model lifecycle.
What is precision in machine learning evaluation?
Precision is the ratio of true positive predictions to the total predicted positives. It measures how many of the positive predictions were actually correct. Formula: Precision = TP / (TP + FP)
TP = True Positives, FP = False Positives
What is recall in machine learning evaluation?
Recall is the ratio of true positive predictions to the total actual positives. It measures how many of the actual positives were correctly identified. Formula: Recall = TP / (TP + FN)
FN = False Negatives
What is a confusion matrix?
A confusion matrix is a table that shows the performance of a classification model. It compares the predicted labels with the true labels, showing the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
What is accuracy in machine learning evaluation?
Accuracy is the ratio of correct predictions (both true positives and true negatives) to the total predictions. It is the most common metric but can be misleading in imbalanced datasets. Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
TN = True Negatives
What is the F1 score in machine learning evaluation?
The F1 score is the harmonic mean of precision and recall. It balances the two metrics and is especially useful when the class distribution is imbalanced. Formula: F1 = 2 * (Precision * Recall) / (Precision + Recall).
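A small worked example with hypothetical counts:
TP, FP, FN = 40, 10, 20                 # hypothetical confusion-matrix counts
precision = TP / (TP + FP)              # 40 / 50 = 0.8
recall = TP / (TP + FN)                 # 40 / 60 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))   # 0.8 0.667 0.727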
What is specificity in machine learning evaluation?
Specificity (or True Negative Rate) measures the proportion of actual negatives that were correctly identified. Formula: Specificity = TN / (TN + FP).
What is area under the ROC curve (AUC-ROC)?
AUC-ROC is a metric used to evaluate the performance of a binary classification model. The ROC curve plots the true positive rate (recall) against the false positive rate. AUC represents the probability that the model ranks a randomly chosen positive instance higher than a randomly chosen negative instance.
What is the ROC curve in machine learning?
The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. The curve plots the true positive rate (recall) against the false positive rate.
What is logarithmic loss (log loss) in machine learning evaluation?
Log loss measures the performance of a classification model where the prediction is a probability value between 0 and 1. It calculates the negative log-likelihood of the true labels given the predicted probabilities. Lower log loss indicates better model performance.
What is mean squared error (MSE) in machine learning evaluation?
Mean Squared Error (MSE) measures the average squared difference between predicted and actual values. It’s commonly used for regression models. A lower MSE indicates better model accuracy. Formula: MSE = (1/n) * Σ(actual - predicted)²
n = number of observations
What is mean absolute error (MAE) in machine learning evaluation?
Mean Absolute Error (MAE) measures the average absolute difference between predicted and actual values. It’s another common metric for regression models. Unlike MSE, it doesn’t penalize larger errors as heavily. Formula: MAE = (1/n) * Σ|actual - predicted|.
What is R-squared (R²) in machine learning evaluation?
R-squared (also known as the coefficient of determination) measures how well the model explains the variability in the target variable. It ranges from 0 to 1, where 1 indicates a perfect fit. Formula: R² = 1 - (Σ(actual - predicted)² / Σ(actual - mean)²)
mean = average of actual values
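A quick sketch computing all three regression metrics with scikit-learn (the actual/predicted values are made up):
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]

print(mean_squared_error(actual, predicted))    # MSE
print(mean_absolute_error(actual, predicted))   # MAE
print(r2_score(actual, predicted))              # R²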
What is an index in SQL, and how does it improve query performance?
An index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional storage and maintenance overhead.
ex:
CREATE INDEX idx_user_email ON Users(email);
- this creates an index on the email column of the Users table.
Write an index to make this query faster
SELECT *
FROM Users
WHERE email = 'doucheturd@email.com';
CREATE INDEX idx_user_email ON Users(email);
this creates an index on the email column of the Users table.
Inference meaning in ML
The process of using a trained model to make predictions on new, unseen data (without labels)
Supervised ML algorithms
Linear Regression
Logistic Regression
k-NN (K nearest neighbors)
Decision Trees
Random Forests
SVM Linear Kernel
SVM RBF Kernel
Unsupervised learning algorithms in ML
K-means
Hierarchical Clustering
PCA
t-SNE
Deep Learning Algorithms
Fully Connected NN (Dense Layers)
CNN (Convolutional Neural Network)
RNN / LSTM
What does Linear Regression do?
Fits a line to minimize the squared error between predictions and actual values.
What is the training complexity of Linear Regression?
O(n d²), where n = number of samples, d = number of features (due to matrix inversion).
What is the inference complexity of Linear Regression?
O(d), since predictions only require multiplying weights by input features.
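A minimal NumPy sketch of the closed-form (normal equation) fit, which is where the matrix-operation cost above comes from; the toy data is illustrative:
import numpy as np

# Normal equation: w = (XᵀX)⁻¹ Xᵀy
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # first column of 1s acts as the intercept
y = np.array([2.0, 3.0, 4.0])
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w)       # ≈ [1, 1]: intercept 1, slope 1
print(X @ w)   # inference is just a dot product per sample: O(d)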
What does Logistic Regression do?
A classification model that applies a sigmoid function to predict probabilities.
What is the training complexity of Logistic Regression?
O(n d), as gradient descent updates weights iteratively.
where n = number of samples, d = number of features
What is the inference complexity of Logistic Regression?
O(d), since predictions only require a dot product and activation function.
d = number of features
What does k-NN (k-Nearest Neighbors) do?
Classifies a new point by voting among its k nearest neighbors.
What is the training complexity of k-NN?
O(n d), as there is no real training phase, only storing data.
where n = number of samples, d = number of features
What is the inference complexity of k-NN?
O(n d), since each prediction requires computing distances to all points.
where n = number of samples, d = number of features
What does a Decision Tree do?
Splits data recursively based on feature conditions to create a tree structure.
What is the training complexity of a Decision Tree?
O(n d log n), since it recursively partitions data.
where n = number of samples, d = number of features
What is the inference complexity of a Decision Tree?
O(log n), as prediction follows a single path from root to leaf.
where n = number of samples
What does Random Forest do?
An ensemble of decision trees that votes on predictions for robustness.
What is the training complexity of Random Forest?
O(t n d log n), where t = number of trees and n = number of samples
What is the inference complexity of Random Forest?
O(t log n), since each prediction traverses t trees, each of depth roughly log n.
What does SVM with a Linear Kernel do?
Finds the optimal linear boundary that separates classes.
What is the training complexity of SVM with a Linear Kernel?
O(n d), since it optimizes a convex function.
where n = number of samples, d = number of features
What is the inference complexity of SVM with a Linear Kernel?
O(d), as classification is based on a simple dot product.
where d = number of features
What does SVM with an RBF Kernel do?
Maps data to a higher-dimensional space for non-linear separation.
What is the training complexity of SVM with an RBF Kernel?
O(n² d), as it requires computing pairwise kernel similarities.
where n = number of samples, d = number of features
What is the inference complexity of SVM with an RBF Kernel?
O(n d), since predictions require summing over all support vectors.
where n = number of samples, d = number of features
What does k-Means do?
Groups data into k clusters by minimizing intra-cluster distance.
What is the training complexity of k-Means?
O(n k d t), where n = number of samples, k = number of clusters, d = number of features, t = number of iterations until convergence.
What is the inference complexity of k-Means?
O(k d), as each new point is compared against the k cluster centers (d = number of features).
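A small scikit-learn sketch matching these costs (the toy points and cluster count are illustrative):
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)   # training: O(n k d t)
print(km.predict([[0.0, 0.0]]))   # inference: distance to each of the k centers, O(k d)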
What does Hierarchical Clustering do?
Builds a hierarchy of clusters via merging or splitting.
What is the training complexity of Hierarchical Clustering?
O(n² log n), due to computing all pairwise distances. where n = number of samples
What is the inference complexity of Hierarchical Clustering?
O(n²), as merging clusters requires distance updates. n is number of samples.
What does PCA (Principal Component Analysis) do?
Reduces dimensionality by projecting data onto principal components.
What is the training complexity of PCA?
O(n d² + d³), as it involves eigen decomposition.
where n = number of samples, d = number of features
What is the inference complexity of PCA?
O(d²), as transformation only requires matrix multiplication.
where d = number of features
What does a Fully Connected Neural Network (Dense Layers) do?
A deep learning model where each neuron is connected to all neurons in the next layer.
What is the training complexity of a Fully Connected Neural Network?
O(n d² e), where n = samples, d = features, e = epochs (due to matrix multiplications in backpropagation).
What is the inference complexity of a Fully Connected Neural Network?
O(d²), since predictions require forward propagation through layers. d is number of features.
What does a Convolutional Neural Network (CNN) do?
A deep learning model designed for image processing using convolutional filters.
What is the training complexity of a CNN?
O(n d² c e), where c = number of filters, d² = image size, e = epochs. n is number of samples
What is the inference complexity of a CNN?
O(d² c), since convolution operations dominate the forward pass. where d² = image size, c = number of filters
What does an RNN (Recurrent Neural Network) do?
A neural network designed for sequential data, where previous states influence the current state.
What is the training complexity of an RNN?
O(n d² e), due to sequential backpropagation through time (BPTT). e = epochs, n = samples, d = features.
What is the inference complexity of an RNN?
O(d²), since each time step requires a matrix multiplication. d= features
What does an LSTM (Long Short-Term Memory) network do?
An advanced RNN that prevents vanishing gradients using memory cells and gates.
What is the training complexity of an LSTM?
O(n d² e), since it extends RNN training with additional gate calculations. n = samples, e = epochs, d = features
What is the inference complexity of an LSTM?
O(d²), as each time step processes multiple gating functions. d= features
What does t-SNE (t-Distributed Stochastic Neighbor Embedding) do?
A dimensionality reduction technique that visualizes high-dimensional data by preserving local structure.
What is the training complexity of t-SNE?
O(n²), since it requires pairwise distance calculations for all points. n = # of samples
What is the inference complexity of t-SNE?
O(n²), making it impractical for large datasets. n = # of samples
What does the meltTable function do, and what is its time and space complexity?
The function meltTable transforms a wide-format DataFrame into a long-format DataFrame by unpivoting columns into row values.
ex: df.melt(
    id_vars='column_i_want_to_be_index',
    var_name='name_of_column_that_stores_og_column_names',
    value_name='store_values_of_var_name_column_here')
(value_vars defaults to every column not listed in id_vars, which is usually what you want)
Time: O(NM), N rows & M columns being melted
Space: O(NM), output frame has NM rows so space usage increases accordingly
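A concrete usage sketch (the column names are made up):
import pandas as pd

df = pd.DataFrame({"student": ["a", "b"], "math": [90, 80], "bio": [70, 60]})
long_df = df.melt(id_vars="student", var_name="subject", value_name="grade")
print(long_df)   # 4 rows: one (student, subject, grade) row per original cell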
Try replacing all the NaN’s in the sf_permits data with the one that comes directly after it and then replacing any remaining NaN’s with 0.
Set the result to a new DataFrame sf_permits_with_na_imputed.
sf_permits_with_na_imputed = sf_permits.fillna(method='bfill', axis=0).fillna(0)
If you removed all of the rows of sf_permits with missing values, how many rows are left?
sf_permits.dropna().shape[0]
- dropna() removes every row containing at least one missing value; .shape[0] counts the rows that remain.
Create a new DataFrame called sf_permits_with_na_dropped that has all of the columns with empty values removed.
How many columns were removed from the original sf_permits DataFrame? Use this number to set the value of the dropped_columns variable below.
sf_permits_with_na_dropped = sf_permits.dropna(axis=1)
dropped_columns = sf_permits.shape[1] - sf_permits_with_na_dropped.shape[1]
get the total number of missing values in a dataframe
df.isnull().sum().sum()
get the total number of missing values (data points) in a dataframe per column
df.isnull().sum()
when is multiplication possible with matrices in Linear Algebra
columns of first matrix must match the rows of the second matrix
ex: first matrix is (1x2) and second matrix is (2x2)
A = [1 2]
B = [2  4
     24 6]
Multiply A*B
C11 = (1x2) + (2x24) = 2 + 48 = 50
C12 = (1x4) + (2x6) = 4 + 12 = 16
C = (50 16)
dimensions of C where
C = A * B, and A is size (mxn) and B is (nxp)
C is (mxp) size, it inherits row size of A and column size of B.
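A quick NumPy check of the example above (shapes shown in the comments):
import numpy as np

A = np.array([[1, 2]])            # shape (1, 2)
B = np.array([[2, 4], [24, 6]])   # shape (2, 2)
C = A @ B                         # inner dimensions match (2 == 2), so C has shape (1, 2)
print(C)                          # [[50 16]]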
Softmax Function
Softmax converts a vector of raw scores (logits) into probabilities by exponentiating and normalizing them. It is used in multi-class classification problems to ensure the output sums to 1, making it interpretable as probabilities.
Use it when assigning probabilities to multiple classes in models like neural networks.
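A minimal NumPy sketch of softmax (subtracting the max is a standard trick for numerical stability):
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # subtract the max so exponentials don't overflow
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities that sum to 1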
ReLU(x) =
ReLU(x) = max(0,x)
Sigmoid activation function:
σ(x) =
σ(x) = 1 / (1 + e^-x)
Input x is mapped to a value between 0 and 1.
As x -> ∞, σ(x) -> 1
Tanh (Hyperbolic Tangent) activation function:
tanh(x) =
tanh(x) = (e^x - e^-x) / (e^x + e^-x)
The function maps any input x to a value between -1 and 1. As x tends to ∞, tanh(x) tends to 1.
Sigmoid vs Softmax
Sigmoid: One output per class (label/target type), independent probabilities (good for binary or multi-label problems).
Softmax: One output per class, probabilities that sum to 1 (good for multi-class problems with exclusive classes).
What makes a function convex
A function is convex if, for any two points on the function, the line segment connecting them lies above or on the function (one global minimum).
It should be shaped like a bowl, with a single minimum.
Why should cost function be convex
If cost function is not convex, it can lead to optimization problems in machine learning algorithms as local minima can trap the algorithm, preventing it from finding the global minimum.
This prevents model from learning optimal parameters and therefore poor performance on the test data.
What is gradient descent
Gradient descent is an iterative optimization algorithm used to find the minimum of a function.
It works by moving in the opposite direction of the gradient (as gradient points in direction of increase) and in optimization we are looking for the minimum.
x_new = x_old - (LR * gradient), where LR is the learning rate. Keep iterating until convergence (x stops changing meaningfully); x represents the weight or parameter being optimized.
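A minimal sketch of the update rule on a toy function, f(x) = (x - 3)², whose gradient is 2(x - 3):
lr = 0.1     # learning rate
x = 0.0      # initial guess for the parameter
for _ in range(100):
    gradient = 2 * (x - 3)   # derivative of (x - 3)^2
    x = x - lr * gradient    # step in the opposite direction of the gradient
print(x)     # converges to ~3, the minimum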
Difference between gradient descent and backpropagation
Gradient Descent is an optimization algorithm that updates model parameters (weights) to minimize a loss function, while Backpropagation is a technique that computes the gradients of the loss function with respect to each weight using the chain rule. Backpropagation provides the gradients, and Gradient Descent uses them to update the weights.
Essentially, backpropagation technique requires doing backpropagation (chain rule to get gradient of loss function for each weight/parameter) AND THEN doing gradient descent to find the optimal parameter/weight by minimizing cost (loss) function.
convert a column of values that should be dates but are objects into dates (ie 1/17/07)
df['date_parsed'] = pd.to_datetime(df['date'], format="%m/%d/%y")