4 Flashcards
SVM: SVM maximizes the
Margin between the decision boundary and the nearest points of both classes
SVM: The margin is
The distance between the decision boundary (the line) and the closest data point
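A minimal pure-Python sketch of that definition. It assumes a 2-D decision line w·x + b = 0 with made-up weights and points, not a trained model. The margin is the smallest point-to-line distance |w·x + b| / ||w||.

```python
from math import hypot


def distance_to_line(point, w, b):
    """Perpendicular distance from a 2-D point to the line w[0]*x + w[1]*y + b = 0."""
    x, y = point
    return abs(w[0] * x + w[1] * y + b) / hypot(w[0], w[1])


def margin(points, w, b):
    """Distance from the line to the closest data point."""
    return min(distance_to_line(p, w, b) for p in points)


# Hypothetical line x + y - 4 = 0 and hypothetical points from both sides of it.
w, b = (1.0, 1.0), -4.0
points = [(0, 0), (1, 1), (3, 3), (4, 4)]
print(margin(points, w, b))
```

Here (1, 1) and (3, 3) are the closest points, one on each side, so they set the margin.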
SVM: SVM is an
Algorithm that separates classes of data with a line (a hyperplane when there are more than two features)
SVM: SVM prioritizes
Correct classification over maximizing margin
SVM: To create your SVM classifier, type
from sklearn.svm import SVC
clf = SVC(kernel="linear")
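A short sketch of fitting and using that classifier. The six training points are hypothetical toy data.

```python
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated groups in 2-D.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")  # the kernel name is passed as a string
clf.fit(X, y)

predictions = clf.predict([[0.5, 0.5], [3.5, 3.5]])
print(predictions)           # predicted class for each new point
print(clf.support_vectors_)  # the training points closest to the boundary
```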
SVM: SVM might work poorly if
There are more features than samples
SVM: Can an SVM create a non-linear decision boundary?
Yes, using the kernel trick
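The kernel trick works because some kernels equal a dot product in a higher-dimensional feature space, so the SVM never has to build that space. This pure-Python check uses the degree-2 polynomial kernel k(x, z) = (x·z)² on 2-D inputs. Its explicit feature map is φ(x) = (x1², √2·x1·x2, x2²). The input vectors are arbitrary examples.

```python
from math import isclose, sqrt


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly in the original 2-D space."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))
print(dot(feature_map(x), feature_map(z)))
print(isclose(poly2_kernel(x, z), dot(feature_map(x), feature_map(z))))
```

In scikit-learn, SVC(kernel="poly", degree=2, gamma=1, coef0=0) uses this same kernel. The library's poly kernel is (gamma·<x, z> + coef0)^degree.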
SVM: SVM stands for
Support vector machine
Python: To use variables defined in one script from within another script, the best way is to
Import the first script as a module (e.g. import other_script, then other_script.some_variable)
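A self-contained sketch. It writes a hypothetical settings_script.py into a temporary folder so the example runs anywhere. Normally the two scripts just sit in the same folder and a plain import is enough. Importing runs the other script's top-level code once. Code that should not run on import belongs under if __name__ == "__main__":.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a hypothetical module file so the example is self-contained.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "settings_script.py").write_text('threshold = 0.5\nlabels = ["spam", "ham"]\n')

# Make the folder importable and let the import system notice the new file.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

import settings_script  # runs settings_script.py and exposes its top-level names

print(settings_script.threshold)
print(settings_script.labels)
```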
datetime: To swap the month and day in a date string and replace the slashes with dashes, type
from datetime import datetime
date = datetime.strptime("05/01/15", "%m/%d/%y")
date.strftime("%d-%m-%y")  # returns "01-05-15"
SVM: With a linear kernel, a gamma of 1.0 (or any other gamma) will produce a
Straight decision boundary, because gamma has no effect on the linear kernel
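A quick check of that claim. In scikit-learn, gamma is a coefficient only for the rbf, poly and sigmoid kernels. The toy data here is hypothetical.

```python
from sklearn.svm import SVC

# Hypothetical toy data.
X = [[0, 0], [1, 1], [1, 0], [3, 3], [4, 2], [3, 5]]
y = [0, 0, 0, 1, 1, 1]

# Same linear kernel, very different gamma values.
low = SVC(kernel="linear", gamma=0.001).fit(X, y)
high = SVC(kernel="linear", gamma=1.0).fit(X, y)

# coef_ and intercept_ describe the straight line w·x + b = 0.
print(low.coef_, low.intercept_)
print(high.coef_, high.intercept_)
```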
SVM: Some important parameters for SVM are
Kernel
Gamma
C
SVM: Some common kernels are
rbf and linear
SVM: The C parameter controls
The trade-off between a smooth decision boundary and classifying every training point correctly (how far the boundary will bend to fit the training data)
SVM: Increasing the value of the C parameter will
Classify more training points correctly, but too large a C can overfit the training data so the model no longer predicts well on new data
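A sketch of the effect on synthetic data. The make_moons dataset, the rbf kernel and the C values are assumptions chosen for illustration. Exact scores depend on the data. On noisy data, training accuracy typically rises with C while test accuracy stops improving.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, noisy two-class data.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

train_scores, test_scores = {}, {}
for C in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    train_scores[C] = clf.score(X_train, y_train)  # accuracy on data it was fit on
    test_scores[C] = clf.score(X_test, y_test)     # accuracy on held-out data
    print(f"C={C}: train={train_scores[C]:.2f} test={test_scores[C]:.2f}")
```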
ML: For a very large dataset with a lot of noise and no clear decision boundary, a better model than SVM may be
Naive Bayes
Pandas: Series.replace({"value": "value2"}) replaces
Every occurrence of the whole value "value" with "value2" (exact matches, not substrings)
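A small example with hypothetical data showing that only whole-value matches are replaced:

```python
import pandas as pd

s = pd.Series(["value", "other", "value", "a value"])
replaced = s.replace({"value": "value2"})
print(replaced.tolist())
```

The last element, "a value", is left alone. For substring replacement inside each string, Series.str.replace is the tool instead.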
ML: Accuracy improves with
Training data size, with diminishing returns (in the example data, beginning after about 700 samples)
ML: To see whether your model's prediction accuracy would benefit from more samples, you could
Split your training data into 4 equal slices and train on cumulative subsets (200, 400, 600, then 800 samples), measuring test accuracy for each. If accuracy gains shrink at the larger sizes, more data is giving diminishing returns.
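A sketch of that procedure. It uses a synthetic make_classification dataset and a default SVC, both assumptions. The 200/400/600/800 sizes come from the card.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Synthetic dataset: first 800 rows for training, last 200 held out for testing.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]

accuracies = {}
for size in (200, 400, 600, 800):  # cumulative slices: first 200, first 400, ...
    clf = SVC().fit(X_train[:size], y_train[:size])
    accuracies[size] = accuracy_score(y_test, clf.predict(X_test))
    print(size, round(accuracies[size], 3))
```

scikit-learn's sklearn.model_selection.learning_curve automates the same idea with cross-validation.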
Python: One way to schedule a script is to
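import time
from datetime import datetime

while True:
    if datetime.now().minute == 30:
        print("30 minutes!")
        time.sleep(61)  # move past minute 30 so it fires only once per hour
    time.sleep(2)  # check the clock every 2 seconds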
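Polling every 2 seconds works, but it wakes the process constantly. A sketch of an alternative computes the next run time and sleeps until then. It assumes the goal is to run once an hour at a fixed minute. next_run_time and run_hourly are hypothetical helper names.

```python
import time
from datetime import datetime, timedelta


def next_run_time(now, minute=30):
    """Return the first datetime after `now` whose minute equals `minute` (seconds zeroed)."""
    target = now.replace(minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(hours=1)
    return target


def run_hourly(job, minute=30):
    """Sleep until the target minute, call job(), and repeat forever."""
    target = next_run_time(datetime.now(), minute)
    while True:
        delay = (target - datetime.now()).total_seconds()
        if delay > 0:
            time.sleep(delay)
        job()
        # Schedule from the later of "now" and the last target so a job never runs twice.
        target = next_run_time(max(datetime.now(), target), minute)


# Usage (blocks forever, so it is commented out here):
# run_hourly(lambda: print("30 minutes!"))
```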
ML: Usually, to increase accuracy it is better to have
More data rather than a more finely tuned algorithm.
datetime: To return the current date as a day/month/year string, type
import datetime
datetime.datetime.now().strftime("%d/%m/%y")
ML: A high information gain feature is one that
Splits the data into purer groups, e.g. a feature that is very common in one class and rare in the others
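A pure-Python sketch of information gain: the parent entropy minus the size-weighted entropy of the groups a feature creates. The spam/ham labels and the two feature columns are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


labels = ["spam", "spam", "ham", "ham"]
common_in_spam_only = [1, 1, 0, 0]  # present in every spam, absent from every ham
uninformative = [1, 0, 1, 0]        # equally common in both classes

print(information_gain(labels, common_in_spam_only))
print(information_gain(labels, uninformative))
```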
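Pandas: To convert a newly loaded df's date strings to datetimes and its number strings to numeric types, type
df = df.convert_objects(convert_numeric=True)
(This only works in old pandas: convert_objects was deprecated in pandas 0.17 and has since been removed. In current pandas, use pd.to_datetime for date columns and pd.to_numeric for number columns.)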
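A sketch of the current-pandas approach. The column names, values and date format are hypothetical.

```python
import pandas as pd

# Hypothetical freshly loaded data where every column arrived as strings.
df = pd.DataFrame({
    "date": ["05/01/15", "06/01/15", "07/01/15"],
    "amount": ["1.5", "20", "n/a"],
})

df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")   # month/day/2-digit year
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # unparseable values become NaN

print(df.dtypes)
```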