ML Flashcards
Reinforcement learning is what type of learning?
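Often described as a combination of supervised and unsupervised learning; strictly, reinforcement learning is a separate paradigm in which an agent learns from rewards rather than from labelled examples.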
Which of the following is true about supervised learning: is the expected outcome defined or not?
The expected outcome is defined (the training data is labelled).
Which of the following supervised learning problems is a regression problem?
Predicting the price of a car from a given set of parameters
Which of the following types of machine learning algorithms forms a significant part of human learning?
Unsupervised Learning
Clustering algorithms fall under which of the following categories of machine learning models?
Unsupervised Learning
Which of the following is the equation for linear regression?
y = β₀ + β₁x₁
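A minimal sketch, assuming scikit-learn and NumPy are available and using made-up, noise-free data, of estimating β₀ and β₁ by ordinary least squares:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated exactly from y = 3 + 2*x1 (no noise)
x1 = np.arange(10, dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
y = 3.0 + 2.0 * x1.ravel()

model = LinearRegression().fit(x1, y)
beta0 = model.intercept_   # estimate of β0
beta1 = model.coef_[0]     # estimate of β1
print(round(beta0, 6), round(beta1, 6))  # 3.0 2.0
```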
Which of the below-mentioned machine learning algorithms is/are used to predict continuously valued quantities?
Linear Regression
Name a type of unsupervised machine learning
K-means clustering
How does K-means differ from other clustering methods?
In K-means the number of clusters (k) is fixed in advance.
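A small scikit-learn sketch on hypothetical 2-D points; the fixed cluster count is passed up front as `n_clusters` (the `n_init` and `random_state` values are arbitrary choices for reproducibility):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated hypothetical groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# The number of clusters must be chosen before fitting
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # one centroid per cluster
```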
How do you import SVC?
from sklearn.svm import SVC
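How do you create and fit an SVC classifier model?
# Instantiate SVC (X_train and y_train must already exist)
svc = SVC()
# fit() returns the fitted estimator itself, so clf and svc are the same object
clf = svc.fit(X_train, y_train)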
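A self-contained version of the snippet, assuming scikit-learn's bundled iris dataset as a stand-in for `X_train`/`y_train`; the 75/25 split and `random_state` are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

svc = SVC()                      # default RBF kernel
clf = svc.fit(X_train, y_train)  # fit() returns the fitted estimator
y_pred = clf.predict(X_test)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out data
```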
What is the process to activate live trading?
Initialize function
Schedule function
Optional functions
Data fetching
Order placement
What is the persistent namespace in Blueshift?
A persistent namespace (the context object) for storing variables you need to access from one iteration of the algorithm to the next.
What are the arguments of schedule_function()?
schedule_function(func=<>, date_rule=<>, time_rule=<>)
What is the difference between linear regression and logistic regression?
Both are supervised models; however, linear regression is used to solve regression problems, whereas logistic regression is used to solve classification problems.
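A sketch contrasting the two on the same hypothetical one-feature data, assuming scikit-learn: linear regression returns continuous values, logistic regression returns class labels (and probabilities):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_continuous = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0])  # regression target
y_class = np.array([0, 0, 0, 1, 1, 1])                   # classification target

lin_reg = LinearRegression().fit(X, y_continuous)
log_reg = LogisticRegression().fit(X, y_class)

X_new = np.array([[2.5], [5.5]])
print(lin_reg.predict(X_new))        # continuous values
print(log_reg.predict(X_new))        # class labels
print(log_reg.predict_proba(X_new))  # class probabilities
```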
SVMs
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.
KNN
K-Nearest Neighbors (KNN) is one of the simplest algorithms used in machine learning for regression and classification problems. KNN stores the training data and classifies a new data point based on a similarity measure (e.g. a distance function). Classification is done by a majority vote of the point's k nearest neighbors.
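A from-scratch sketch of KNN classification; Euclidean distance, k=3 and the toy data are assumptions for illustration (scikit-learn's `KNeighborsClassifier` implements the same idea):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote of its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]], dtype=float)
y_train = np.array(["A", "A", "A", "B", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # A
print(knn_predict(X_train, y_train, np.array([6.5, 6.5])))  # B
```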
Which of the following is incorrect about the random forest algorithm?
Random forest is a supervised ensemble technique that can be used to solve a regression or a classification problem. A random forest operates by building multiple decision trees and combining their predictions to get a more accurate and stable overall prediction.
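A brief scikit-learn sketch on the iris dataset (the tree count and seed are arbitrary). Note that scikit-learn's classifier combines its trees by averaging their predicted class probabilities:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# 100 decision trees, each trained on a bootstrap sample of the data
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(len(forest.estimators_))  # the individual fitted trees
print(forest.predict(X[:3]))    # combined prediction of all trees
```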
What is the purpose of performing cross-validation?
To estimate how well a model will perform on independent, unseen data. It is one of the methods for assessing a model and choosing the best parameters in a prediction or machine learning task. Cross-validation involves keeping aside a sample of the dataset, training the model on the remaining data, and then using the held-out sample to test how well the model performs.
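A minimal sketch, assuming scikit-learn, the iris dataset and a KNN model chosen only for illustration, of estimating performance with 5-fold cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)
# 5 folds: each fold is held out once while the model trains on the others
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # one accuracy score per held-out fold
print(scores.mean())  # average estimate of performance on unseen data
```

Repeating this for several candidate parameter values (e.g. different `n_neighbors`) and keeping the best mean score is how cross-validation helps choose parameters.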
What does fully connected mean in neural networks?
Every neuron in one layer is connected to every neuron in the next layer. For example, P input neurons feeding H hidden neurons give P × H connections in a feed-forward network.
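A NumPy sketch of one fully connected layer with hypothetical sizes P = 4 and H = 3; the weight matrix holds exactly one weight per connection:

```python
import numpy as np

P, H = 4, 3  # hypothetical sizes: 4 inputs, 3 hidden neurons
rng = np.random.default_rng(0)

# Each of the H hidden neurons has one weight per input
W = rng.normal(size=(H, P))
b = np.zeros(H)             # one bias per hidden neuron

x = rng.normal(size=P)      # one input example
hidden = W @ x + b          # every input contributes to every hidden neuron
print(W.size)               # 12 connections = P * H
```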
What does the activation function do?
The activation function introduces non-linearity into the neural network.
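A NumPy sketch (random weights, no biases, all hypothetical) of why this matters: without an activation, two stacked layers collapse into a single linear map, while a ReLU in between breaks that linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # layer 1: 3 inputs -> 4 neurons
W2 = rng.normal(size=(2, 4))  # layer 2: 4 neurons -> 2 outputs
x = rng.normal(size=3)

# Without an activation, two layers equal the single linear layer W2 @ W1
no_activation = W2 @ (W1 @ x)
single_layer = (W2 @ W1) @ x
print(np.allclose(no_activation, single_layer))  # True

def relu(z):
    return np.maximum(z, 0.0)

def net(v):
    """Two layers with a ReLU activation in between."""
    return W2 @ relu(W1 @ v)

# A linear map would satisfy f(x) + f(-x) = 0; the ReLU network does not
print(np.allclose(net(x) + net(-x), 0.0))  # False
```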
Name two activation functions.
Sigmoid (output between 0 and 1) and tanh (output between -1 and 1).
What is the difference between high bias and high variance?
High bias: the model has not been able to learn the training data well (underfitting).
High variance: the model has over-learned the training data (overfitting) and performs badly on test data.
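A sketch assuming scikit-learn and synthetic noisy sine data: a depth-1 tree is too simple (high bias), while an unlimited-depth tree memorizes the training set (high variance), reaching zero training error but not zero test error:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical noisy data: y = sin(x) + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=200).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
# max_depth=1: too simple; max_depth=None: grows until every leaf is pure
for depth in (1, None):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```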
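What are CNNs?
Convolutional neural networks: networks that slide learned filters over the input to detect local patterns (widely used for images). In trading, they can be used to help find the right indicator.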
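A NumPy sketch of the core operation of a 1-D convolutional layer, using a hypothetical price series and a hand-set filter (in a real CNN the filter weights are learned). CNN "convolution" layers compute a cross-correlation, which is what `np.correlate` does:

```python
import numpy as np

# A hypothetical price series and a 3-tap filter
prices = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])  # in a CNN these weights would be learned

# Slide the filter along the series; each output is a dot product with one window
feature_map = np.correlate(prices, kernel, mode="valid")
print(feature_map)  # [-2. -2. -2.]
```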
If your model requires a labelled target dataset, what is it called?
A supervised model (for example, a classification model). Any model that requires a labelled target dataset is supervised.
Which of the following is not a supervised algorithm?
K-means is a clustering technique and an unsupervised learning method. K-means uses the distance from centroids to cluster the data.
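A from-scratch sketch of one K-means iteration on hypothetical points: assign each point to its nearest centroid by distance, then move each centroid to the mean of its points (this sketch does not handle a cluster that ends up empty):

```python
import numpy as np

def kmeans_step(X, centroids):
    """One K-means iteration: assign points to the nearest centroid, then recompute centroids."""
    # Distance from every point to every centroid, shape (n_points, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(len(centroids))])
    return labels, new_centroids

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = kmeans_step(X, centroids=np.array([[1.0, 1.0], [9.0, 9.0]]))
print(labels)  # [0 0 1 1]
print(centroids)
```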
A Decision Tree divides the input data into
Rectangular regions. A decision tree uses greater-than and less-than comparisons on one feature at a time to split the data into different parts. When plotted, these splits are parallel to the feature axes and so appear at right angles to each other.
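A short scikit-learn sketch (iris data, depth limited to 2 for readability) that prints the learned rules; each rule compares a single feature with a threshold:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
# Every split has the form "feature <= threshold" / "feature > threshold",
# so each split boundary is parallel to one feature axis
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```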
K-Fold cross-validation involves
The K in K-fold represents the number of parts (folds) into which the data is split; K is always an integer. Each fold is used once as the test set while the model is trained on the remaining K − 1 folds.
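A sketch with scikit-learn's `KFold` on 10 hypothetical samples and K = 5, printing which indices land in each train/test split:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 hypothetical samples
kf = KFold(n_splits=5)            # K = 5 folds
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each fold serves as the test set exactly once
    print(fold, "train:", train_idx, "test:", test_idx)
```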
Which of the following metrics is not used in measuring the performance of a classification model?
Accuracy, precision and specificity are used as metrics when solving a classification problem.
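A sketch computing the three metrics for hypothetical binary predictions with scikit-learn; specificity is derived here from the confusion matrix rather than from a dedicated function:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

# Hypothetical true and predicted labels for a binary classifier
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
specificity = tn / (tn + fp)                 # TN / (TN + FP)
print(accuracy, precision, specificity)      # 0.75 0.75 0.75
```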
Which of the following is not a wrapper method?
Principal Component Analysis (PCA) is an unsupervised dimensionality-reduction method, not a wrapper method. Recursive Feature Elimination (RFE), forward selection and backward elimination are wrapper methods of feature selection.
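A sketch of a wrapper method using scikit-learn's `RFE` on iris; the logistic-regression estimator and the target of two features are arbitrary choices:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Wrapper: repeatedly fit the model and drop the weakest remaining feature
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
selector.fit(X, y)
print(selector.support_)  # True for the features that were kept
print(selector.ranking_)  # 1 = selected; higher = eliminated earlier
```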
Logistic Regression is a
Classification algorithm. Although the name suggests regression, it is used to classify data into different labels.
Which of the following is not a technique used in training a neuron?
A dropout layer randomly switches off a fraction of neurons during training, so their weights are not updated in that step.
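A NumPy sketch of "inverted" dropout, one common way of implementing it (assumed here, not taken from a specific framework): dropped units output zero, so no gradient flows to their weights in that step, and kept units are rescaled:

```python
import numpy as np

def dropout(activations, p, rng):
    """Inverted dropout: zero each unit with probability p during training."""
    keep = rng.random(activations.shape) >= p  # True = neuron stays on
    # Rescale kept units so the expected activation is unchanged
    return activations * keep / (1.0 - p)

rng = np.random.default_rng(0)
a = np.ones(10)
out = dropout(a, p=0.5, rng=rng)
print(out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```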
Which of the following is an activation function used in training a neural network?
Tanh
ReLU
Sigmoid
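The three activation functions in NumPy, with their output ranges noted in comments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # output in (0, 1)

def tanh(z):
    return np.tanh(z)                # output in (-1, 1)

def relu(z):
    return np.maximum(z, 0.0)        # output in [0, inf)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(tanh(z))
print(relu(z))  # [0. 0. 2.]
```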
Which of the following is not a part of a neuron?
A neuron consists of kernel weights, an activation function and a bias term. It does not contain a variance.
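A sketch of a single neuron's forward pass in NumPy, assuming a sigmoid activation and hypothetical weights: the weighted sum of the inputs plus the bias, passed through the activation:

```python
import numpy as np

def neuron(x, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid activation."""
    z = np.dot(weights, x) + bias
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, 4.0])
weights = np.array([0.5, -0.25])
print(neuron(x, weights, bias=0.0))  # 0.5, since 0.5*2 - 0.25*4 + 0 = 0
```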
Back-propagation is used in supervised learning?
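Yes. Backpropagation computes the gradient of the loss with respect to each weight so the weights can be updated to improve prediction quality. It requires a target dataset, because the loss compares the predictions with the targets.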
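A NumPy sketch of backpropagation for one sigmoid neuron with a squared-error loss (both assumptions for illustration): the chain-rule gradient is checked against a finite-difference estimate. The target value is what makes this supervised:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, target):
    """Squared error between a single sigmoid neuron's output and the target."""
    return 0.5 * (sigmoid(np.dot(w, x) + b) - target) ** 2

def backprop(w, b, x, target):
    """Gradients of the loss with respect to w and b via the chain rule."""
    out = sigmoid(np.dot(w, x) + b)
    delta = (out - target) * out * (1.0 - out)  # dLoss/dz
    return delta * x, delta                     # dLoss/dw, dLoss/db

w, b = np.array([0.3, -0.2]), 0.1
x, target = np.array([1.0, 2.0]), 1.0
grad_w, grad_b = backprop(w, b, x, target)

# Numerical check with central finite differences
eps = 1e-6
numeric_w = np.array([
    (loss(w + eps * e, b, x, target) - loss(w - eps * e, b, x, target)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(grad_w, numeric_w))  # True
```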
Gradient Descent is used to
Gradient descent is used to reduce the loss (the error in prediction) by repeatedly moving the parameters a small step in the direction of the negative gradient.
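A NumPy sketch of plain gradient descent fitting y = wx + b to hypothetical noise-free data from y = 2x + 1, minimizing mean squared error; the learning rate and iteration count are hand-picked for this data:

```python
import numpy as np

# Hypothetical noise-free data from y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0
learning_rate = 0.5
for _ in range(5000):
    error = w * x + b - y
    # Gradients of the mean squared error with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient to reduce the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4))  # 2.0 1.0
```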
An Artificial Neural Network does not contain
Decision trees. An ANN is built from multiple neurons, not from decision trees.
The Bias of a Neuron is a
Constant term. A constant added to the weighted sum of the inputs (the kernel) in each neuron is known as the bias.
Can neural networks solve nonlinear problems?
Yes
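A NumPy sketch of a tiny network solving XOR, a classic problem no single linear model can solve. The weights are hand-picked for illustration, not trained:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked (not trained) weights for a 2-2-1 network that computes XOR
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
hidden = relu(X @ W1.T + b1)  # shape (4, 2)
output = hidden @ w2
print(output)  # [0. 1. 1. 0.]
```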
Neural Networks can be used in
Supervised Learning
Unsupervised Learning
Reinforcement Learning
A decision tree has
A decision tree ends with leaves and contains branches and a root.
Precision is defined as
The fraction of predicted positive outcomes that were actually positive: Precision = TP / (TP + FP)
The predictor and the category are independent
A t-test is used to check the relationship between the dependent and independent variables.
Which of the following is used in hyperparameter tuning?
Grid search and random search. Scikit-learn provides both (GridSearchCV and RandomizedSearchCV), and each evaluates candidate hyperparameters with cross-validation.
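A sketch of both search methods on iris with an SVC; the parameter grid and `n_iter=4` are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Grid search: tries every combination, scoring each with 5-fold cross-validation
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)

# Random search: samples only a fixed number of combinations
rand = RandomizedSearchCV(SVC(), param_grid, n_iter=4, cv=5, random_state=0).fit(X, y)
print(rand.best_params_)
```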
After solving the Maximum Likelihood Estimate, we get
The coefficients, along with their z-scores and p-values
The DecisionTreeClassifier model in sklearn gives the following outputs:
Classes
Feature Importance
Probability
Accuracy is not stored as a fitted attribute; compute it separately, e.g. with the accuracy_score function (or the model's score method).
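A sketch on iris (the depth limit is an arbitrary choice) showing where each output lives on a fitted scikit-learn `DecisionTreeClassifier`, and accuracy computed separately:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(clf.classes_)              # class labels
print(clf.feature_importances_)  # importance of each feature (sums to 1)
print(clf.predict_proba(X[:2]))  # class probabilities for two samples
# Accuracy is computed from the predictions
print(accuracy_score(y, clf.predict(X)))
```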
In Quadratic Discriminant Analysis
While performing QDA, a separate covariance matrix is estimated for each class in the target dataset.
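A scikit-learn sketch on iris; `store_covariance=True` keeps the per-class covariance matrices so they can be inspected:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)
print(len(qda.covariance_))      # 3: one matrix per class
print(qda.covariance_[0].shape)  # (4, 4): one row/column per feature
```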
Which of the following is true about the ridge regression equation?
Ridge uses an L2 penalty: it minimizes the sum of squared errors plus α times the sum of the squared coefficients. An L1 penalty (lasso) uses the sum of the absolute values of the coefficients instead.
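A sketch on synthetic data checking that scikit-learn's `Ridge` (without an intercept) matches the closed-form L2-penalized solution β = (XᵀX + αI)⁻¹Xᵀy:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # hypothetical features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)
alpha = 1.0

# Ridge minimizes ||y - X beta||^2 + alpha * ||beta||^2
beta_closed_form = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)
beta_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
print(np.allclose(beta_closed_form, beta_sklearn))  # True
```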
Regularization is used to address which problem?
Regularization is very effective at reducing model overfitting. It penalizes large coefficients to address this problem.
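A sketch on synthetic data showing the penalty at work: as ridge's `alpha` grows, the size of the coefficient vector shrinks:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))  # hypothetical features
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=30)

# Larger alpha = stronger penalty = smaller coefficients
norms = []
for alpha in [0.01, 1.0, 10.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))
    print(alpha, np.round(coef, 3))
```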
To improve prediction an ensemble model uses
Multiple models. An ensemble combines several models, typically by averaging or weighting their predictions.
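A sketch of a weighted ensemble using scikit-learn's `VotingClassifier` on iris; the three member models and the weights are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="soft",      # average the predicted class probabilities...
    weights=[2, 1, 1],  # ...giving the logistic regression twice the weight
).fit(X, y)
print(ensemble.predict(X[:3]))
```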
Which of the following is not an ensemble technique?
When samples are taken from the population with replacement then it is called bootstrapping.
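A NumPy sketch of drawing one bootstrap sample from a hypothetical population: same size as the data, drawn with replacement, so some values may repeat and others may be missing:

```python
import numpy as np

rng = np.random.default_rng(0)
population = np.array([10, 20, 30, 40, 50])

# Sample with replacement, same size as the original data
sample = rng.choice(population, size=population.size, replace=True)
print(sample)
```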
Important parameters of boosting
Number of trees
Learning Rate
Depth of the tree
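A scikit-learn sketch mapping the three parameters onto `GradientBoostingClassifier` arguments (the values shown are illustrative, fitted on iris):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
booster = GradientBoostingClassifier(
    n_estimators=100,   # number of trees
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=3,        # depth of each tree
    random_state=0,
).fit(X, y)
print(booster.score(X, y))  # training accuracy
```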