Machine Learning Revision Flashcards
What are the benefits of preprocessing data / data wrangling?
- Transform raw data into a state which the machine can understand and interpret easily.
- Remove redundant information (noise).
- Spot outliers and deal with them to make sure the training is effective and not skewed.
- Real-world data is not perfect to begin with.
What is a feature?
A feature is an individual measurable property or characteristic of a phenomenon being observed. Features can be categorical or numerical.
Name some common steps for data preprocessing in DS/ML?
- Taking care of the missing data
- Encoding Categorical Data
- Feature Scaling
Which library to use for taking care of missing data? Recall the entire code.
import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[:, a:b])                    # learn the column means on numeric columns a..b
X[:, a:b] = imputer.transform(X[:, a:b])  # replace the NaNs with those means
What are some strategies to take care of missing data apart from averaging?
- Mean, median, mode imputation
- Deleting missing data (dropping the full row, or dropping the whole variable (not recommended))
- Time-series-specific methods: Last Observation Carried Forward (LOCF) & Next Observation Carried Backward (NOCB), linear interpolation (not good for data with seasonal oscillations), seasonal adjustment + linear interpolation
- Use regression
- Multiple imputation (explained in a later card)
- Use k-NN classification to impute categorical variables (see the sketch after this list)
- Imputation using deep learning, e.g. Datawig
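A minimal sketch of k-NN imputation using scikit-learn's KNNImputer (note that KNNImputer works on numeric features, so categorical variables would need to be encoded first; the toy matrix below is made up):
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
imputer = KNNImputer(n_neighbors=2)  # impute from the 2 nearest complete rows
X = imputer.fit_transform(X)         # the NaN becomes the average of its neighbours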
What is multiple imputation method for taking care of missing data?
Multiple imputation:
- Imputation: impute the missing entries of the incomplete data set m times (e.g. m = 3). Note that imputed values are drawn from a distribution; simulating simple random draws doesn't account for uncertainty in the model parameters, so a better approach is Markov Chain Monte Carlo (MCMC) simulation. This step results in m complete data sets.
- Analysis: analyze each of the m completed data sets.
- Pooling: integrate the m analysis results into a final result. (See the sketch after this card.)
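A sketch of how multiple imputation can be approximated in scikit-learn with the experimental IterativeImputer (sample_posterior=True draws imputed values from a predictive distribution, MICE-style; the toy data and m = 3 are illustrative):
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([[1.0, 2.0], [3.0, np.nan], [np.nan, 6.0]])
m = 3
imputed_sets = [
    IterativeImputer(sample_posterior=True, random_state=seed).fit_transform(X)
    for seed in range(m)
]  # m complete data sets: analyze each, then pool the results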
Types of missing data?
3 types: MCAR, MAR, NMAR:-
Missing completely at random (MCAR): the fact that a value is missing has nothing to do with its hypothetical value or with the values of other variables.
Missing at random (MAR): the propensity for a data point to be missing is not related to the missing data itself, but it is related to some of the observed data.
Not missing at random (NMAR): the missingness depends on the missing value itself, e.g. people with high salaries often don't want to reveal their salaries.
Difference between MCAR and MAR:
For example, if high school GPA data is missing randomly across all schools in a district, that data will be considered MCAR. However, if data is randomly missing for students in specific schools of the district, then the data is MAR.
How to encode categorical data? Which library to use? Provide code:-
OneHotEncoding: transforms n categories into an n-dimensional binary vector.
Encoding independent variables:-
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), columns_to_encode)],  # e.g. [0]
    remainder='passthrough')  # keep the remaining columns unchanged
X = np.array(ct.fit_transform(X))
Encoding a yes/no variable:-
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
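A toy illustration of the one-hot step above (the country names are made up); three categories become three binary columns, ordered alphabetically:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
print(enc.fit_transform(np.array([['France'], ['Spain'], ['Germany']])).toarray())
# [[1. 0. 0.]   France  -> column 0
#  [0. 0. 1.]   Spain   -> column 2
#  [0. 1. 0.]]  Germany -> column 1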
What are the two types of feature scaling? Give their formulas. Code one of them.
Standardisation and normalisation
Standardisation: X = (X - mean(X)) / sd(X)
Normalisation: X = (X - min(X)) / (max(X) - min(X))
Standardisation:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
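For completeness, normalisation uses MinMaxScaler from the same module:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()      # rescales each feature to [0, 1]
X = sc.fit_transform(X)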
Benefits of splitting the dataset into train and test sets
To make sure the model hasn't overfit: it is evaluated on data it has never seen.
Code to split the dataset to train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # test_size: fraction between 0 and 1; random_state is optional
What are some types of Regression?
- Simple Linear
- Multiple Linear
- Polynomial Linear
- Support Vector
- Decision Tree
- Random Forest
Train Linear regression
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
NOTE: the class is called LinearRegression; there is no "LinearRegressor" in scikit-learn.
What are the five methods of building models?
- All in
- Backward Elimination
- Forward Selection
- Bidirectional Elimination
- Score Comparison
Steps for Backward Elimination
- Step 1: Select a significance level to stay in the model (default: 5%).
- Step 2: Fit the full model with all possible predictors.
- Step 3: Consider the predictor with the highest p-value. If p > significance level, go to step 4; otherwise finish.
- Step 4: Remove that predictor.
- Step 5: Fit the model without this variable, then go back to step 3. (See the sketch after these steps.)
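A minimal sketch of these steps using statsmodels OLS p-values (assumes X and y are numeric NumPy arrays and 0.05 is the chosen significance level):
import numpy as np
import statsmodels.api as sm

SL = 0.05                                    # step 1: significance level to stay
X_opt = sm.add_constant(X)                   # add intercept column
while True:
    model = sm.OLS(y, X_opt).fit()           # step 2/5: fit the current model
    worst = int(np.argmax(model.pvalues))    # step 3: predictor with highest p-value
    if model.pvalues[worst] <= SL:
        break                                # all predictors significant: finish
    X_opt = np.delete(X_opt, worst, axis=1)  # step 4: remove it, then refit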
Train Multiple linear Regression
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
NOTE: identical to simple linear regression; the same LinearRegression class handles multiple features automatically.
Train Polynomial Linear Regression
Idea: treat polynomial regression as multiple linear regression with x2 = x^2, x3 = x^3, ... So we can create a matrix of powered features.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
Reshape a 1D array to a 2D array
Let y be a 1D array.
y = y.reshape(len(y), 1)
Train the Support Vector Regression Model
Note: SVR requires the data to be standardised first (see the scaling sketch after this card).
from sklearn.svm import SVR
regressor = SVR(kernel='rbf')  # select your own kernel
regressor.fit(X, y)
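A sketch of the scaling workflow around SVR (X_new, the points to predict on, is hypothetical):
from sklearn.preprocessing import StandardScaler

sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()  # y must be 2D for the scaler
regressor.fit(X_scaled, y_scaled)
# un-scale predictions back to the original units:
y_pred = sc_y.inverse_transform(
    regressor.predict(sc_X.transform(X_new)).reshape(-1, 1))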
Train the Decision Tree Regression model
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor()
regressor.fit(X, y)
Which regression model(s) requires the data to be standardised?
SVR
Train the Random Forest Regression model
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=10)  # n_estimators = number of trees
regressor.fit(X, y)
Evaluating the Regression models
R squared (R^2); a better metric is adjusted R^2. Code:-
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)
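Adjusted R^2 is not built into scikit-learn; a sketch computing it from r2_score (n = number of test samples, p = number of predictors):
from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)
n, p = X_test.shape                            # n samples, p predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalises extra predictors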
Train Logistic Regression
Data should be scaled first.
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
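A typical follow-up (a sketch, assuming X_test and y_test come from an earlier train/test split): predict on the test set and inspect the confusion matrix and accuracy.
from sklearn.metrics import confusion_matrix, accuracy_score

y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # rows: actual class, columns: predicted class
print(accuracy_score(y_test, y_pred))    # fraction of correct predictions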