Data Science - MODULE 3 CODE Flashcards

1
Q

When you use train_test_split, what is the default split?

A

A quarter of your data becomes test data; three quarters are used for training.
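
A minimal sketch, assuming a feature matrix X and a response vector y already exist, showing the default 75/25 split:

from sklearn.model_selection import train_test_split

# With no test_size argument, 25% of the rows become the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(len(X_test) / len(X))  # roughly 0.25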

2
Q

Three imports from scikit-learn when you want to build a tree-based classifier, with pruning

A

from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

3
Q

To read a CSV file

A

import pandas as pd

df = pd.read_csv('naam.csv', delimiter=',')

4
Q

Method to sample a certain number of records?

A

df.sample(10, random_state=0)

5
Q

df.count()

A

Generates a text table with the column headings and the number of non-null entries in each column.

6
Q

If you want the number of rows in a single variable

A

n_rows = len(df)

7
Q

Say you only want to count the rows where a certain column value holds

A

len(df[df.columnName == "yes"])

8
Q

When you split features and responses, always run the head() function afterwards

A

Yes, to make sure you have the right data. The column headings should show up in bold.
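
A minimal sketch of this check, assuming df is the DataFrame; the column names here are hypothetical:

X = df[['age', 'income']]   # hypothetical feature columns
y = df['purchased']         # hypothetical response column

X.head()   # confirm the right columns (headings render in bold) and the first rows
y.head()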

9
Q

Once you have the features matrix as well as the responses, how do you split the data into train and test data?

A

By using the scikit-learn function that was imported:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

10
Q

How does train_test_split select the data?

A

Randomly; it is not just a straight cut. If you run head() on both of the new matrices, for example, you will see that the same rows are referenced.
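
A short sketch, assuming the split from the previous card, showing that X_train and y_train reference the same randomly chosen rows:

X_train.head()   # the index labels are shuffled, not simply 0, 1, 2, ...
y_train.head()   # the same shuffled index labels appear here, in the same order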

11
Q

To create an instance of the DecisionTreeClassifier and do the fit?

A

classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)

12
Q

How do I format a float to show 2 decimals?

A

"{:2.2%}".format(variable)
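
Note that the % spec multiplies by 100 and appends a percent sign, which suits accuracy scores, for example:

print("{:2.2%}".format(0.9137))   # prints 91.37%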

13
Q

How do I get the depth and the number of leaves of a decision tree?

A

classifier.get_depth()
classifier.get_n_leaves()

14
Q

Refresher: how do I generate the list of sample values that I want to test with cross-validation?

A

samples = [sample for sample in range(1, 50)]  # min_samples_leaf must be at least 1

15
Q

What does the for loop look like then?

A

for sample in samples:
    # (indented loop body goes here)

16
Q

How does he initially create the placeholder for the classifiers, and how do you put the new one in?

A

classifiers = []
classifiers.append(temp_classifier)
He also fits it before appending it; see the sketch below.
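
A minimal sketch of the full loop, assuming X_train, y_train and the samples list from the earlier cards (the variable names are illustrative):

classifiers = []
for sample in samples:
    # one tree per candidate min_samples_leaf value
    temp_classifier = DecisionTreeClassifier(random_state=0, min_samples_leaf=sample)
    temp_classifier.fit(X_train, y_train)   # fit before appending
    classifiers.append(temp_classifier)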

17
Q

So there is also an easier way to get the accuracy?

A

Yes, the classifier also has a score method, so it is
classifier.score(X_test, y_test)

18
Q

So after we have generated all the classifiers and set up the models with fit, how can you get all the accuracies?

A

E.g. train_scores = [clf.score(X_train, y_train) for clf in classifiers]

19
Q

How did he initially declare the figure with the two accuracy lines?

A

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

20
Q

How do you set the axis labels on the figure?

A

ax.set_xlabel("x axis name")
Same for the y label: ax.set_ylabel("y axis name")

21
Q

How do you set the title of the figure?

A

ax.set_title("title")

22
Q

How do you then populate the plot with one of the datasets?

A

ax.plot(samples, train_scores, marker="o", label="train", drawstyle="steps-post")

23
Q

If you want to put the legend on a figure?

A

ax.legend()

24
Q

How do I generate an array of the number of leaves for each classifier?

A

nr_leaves = [clf.get_n_leaves() for clf in classifiers]

25
Q

How do you generate the scores for the cross-validation?

A

classifier_temp_cross_val = DecisionTreeClassifier(random_state=1, min_samples_leaf=sample)
score = cross_val_score(estimator=classifier_temp_cross_val, X=X_train, y=y_train, cv=5)
validation_scores.append(score.mean())
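
A minimal sketch of the whole cross-validation loop, assuming the samples list and X_train/y_train from the earlier cards:

validation_scores = []
for sample in samples:
    classifier_temp_cross_val = DecisionTreeClassifier(random_state=1, min_samples_leaf=sample)
    # 5-fold cross-validation on the training data; keep the mean of the 5 fold scores
    score = cross_val_score(estimator=classifier_temp_cross_val, X=X_train, y=y_train, cv=5)
    validation_scores.append(score.mean())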

26
Q

Given the two arrays, how do I then extract, for example, the sample value that corresponds to the maximum score?

A

samples[validation_scores.index(max(validation_scores))]

27
Q

What other methods are there for pruning?

A

max_depth
max_leaf_nodes
min_impurity_decrease
min_samples_split
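
A minimal sketch showing how one of these pruning parameters is passed when the tree is created; the value 5 is only an illustrative choice:

classifier = DecisionTreeClassifier(random_state=0, max_depth=5)   # limit tree depth instead of min_samples_leaf
classifier.fit(X_train, y_train)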

28
Q

What must you be careful of when you work with, for example, categories as input to a decision tree?

A

That no unintended value is ascribed to them. The encoding is just a number and does not carry the further meaning (such as order or magnitude) normally associated with a number. We must do one-hot encoding.

29
Q

How can you explore the unique values within a column?

A

df['kolomNaam'].unique()

30
Q

This is a difficult one: how do you create the one-hot encoder?

A

encoder = OneHotEncoder(categories='auto')
Xd = encoder.fit_transform(df.KolomNaam.values.reshape(-1, 1)).toarray()
df_ohe = pd.DataFrame(Xd, columns=["KolomNaam" + str(int(i)) for i in range(Xd.shape[1])])

31
Q

Importing the one hot encoder

A

from sklearn.preprocessing import OneHotEncoder

32
Q

How do you convert a pandas index to a list?

A

df.index.tolist()

33
Q

How do I save a figure?

A

plt.savefig('naam.png')