Data Science - MODULE 4 CODE Flashcards

1
Q

For neural networks, what gets imported?

A

Also accuracy_score, train_test_split and cross_val_score, plus two new ones:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

2
Q

How do you drop missing entries from a dataframe?

A

df = df.replace("?", np.nan)
df = df.dropna()

The question mark may differ depending on how missing values are marked in the data.
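A small sketch on made-up data (the CSV string below is hypothetical) showing the two lines from the card, plus the na_values option of pandas.read_csv, which treats the marker as missing while the file is being read.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data where "?" marks a missing entry
raw = "a,b\n1,x\n?,y\n3,?\n"

# Approach from the card: replace the marker with NaN, then drop incomplete rows
df = pd.read_csv(io.StringIO(raw))
df = df.replace("?", np.nan)
df = df.dropna()
print(df)  # only the first row (a=1, b=x) survives

# Alternative: let read_csv treat "?" as missing while parsing
df2 = pd.read_csv(io.StringIO(raw), na_values="?").dropna()
print(len(df), len(df2))  # 1 1
```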

3
Q

Once you have split the data into X and y, you need to flatten y for the NN. How do you do that?

A

y = np.ravel(y)
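A quick illustration, assuming y was selected from a DataFrame as a single-column frame (the column name "target" is hypothetical). scikit-learn estimators expect y as a 1-D array, and a single-column 2-D y typically triggers a DataConversionWarning; np.ravel turns it into a flat array.

```python
import numpy as np
import pandas as pd

# Hypothetical target column, as you get it from df[["target"]]
y = pd.DataFrame({"target": [0, 1, 1, 0]})
print(y.shape)  # (4, 1): a 2-D column

y = np.ravel(y)
print(y.shape)  # (4,): a flat 1-D array
```

Selecting the column with single brackets, df["target"], gives a 1-D Series directly and is an alternative to ravel.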

4
Q

What does the code for the scaling look like?

A

scaler = StandardScaler()
scaler.fit(X_train)  # learn the mean and std from the training data only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # reuse the training mean and std
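A sketch on synthetic features (the random data is an assumption) that checks what the card's code does: after scaling, each training column has mean about 0 and standard deviation about 1, while the test set is transformed with the training statistics and so is only approximately centred.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))  # synthetic features
X_train, X_test = train_test_split(X, random_state=0)

scaler = StandardScaler()
scaler.fit(X_train)                  # statistics come from the training split only
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)    # same mean/std applied to the test split

print(X_train.mean(axis=0).round(3))  # about 0 for every column
print(X_train.std(axis=0).round(3))   # about 1 for every column
print(X_test.mean(axis=0).round(3))   # close to, but not exactly, 0
```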

5
Q

How do you create the model (an MLPClassifier) for the neural network?

A

reg = MLPClassifier(max_iter=2000, hidden_layer_sizes=(5, 5), random_state=1)
From there on it is the same as before:
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
accuracy_score(y_test, y_pred)
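An end-to-end sketch of the steps so far, assuming synthetic data from make_classification in place of the real dataset: split, scale, fit the (5, 5) network and score it on the test set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale with statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Two hidden layers with 5 neurons each
reg = MLPClassifier(max_iter=2000, hidden_layer_sizes=(5, 5), random_state=1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.3f}")
```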

6
Q

3 pitfalls of neural networks

A
  • Not reaching the global minimum of the error (it is important to adjust the number of iterations, and to use stochastic gradient descent)
  • Overfitting (cross-validation is very important; regularisation was also mentioned)
  • Interpretation (it is hard to explain what the fitted network has learned)
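A hedged sketch of the MLPClassifier settings that relate to the first two pitfalls: solver="sgd" selects stochastic gradient descent, max_iter caps the number of iterations, and alpha is the L2 regularisation penalty. The synthetic data, learning rate and the two alpha values are assumptions for illustration; comparing mean cross-validation scores is one way to choose between them.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, scaled as in the earlier cards
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X = StandardScaler().fit_transform(X)

results = {}
for alpha in [1e-4, 1.0]:  # weak vs strong L2 penalty (illustrative values)
    clf = MLPClassifier(
        hidden_layer_sizes=(5, 5),
        solver="sgd",             # stochastic gradient descent
        learning_rate_init=0.01,
        alpha=alpha,              # L2 regularisation strength
        max_iter=2000,            # upper bound on training iterations
        random_state=1,
    )
    # Mean 5-fold CV accuracy judges the model on held-out folds, not on its training fit
    results[alpha] = cross_val_score(clf, X, y, cv=5).mean()

for alpha, score in results.items():
    print(f"alpha={alpha}: mean CV accuracy {score:.3f}")
```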
7
Q

Do you ravel before you split?

A

It looks like it. Either order gives the same y values, as long as y is flat before fitting.
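A small check on toy data (the frames below are hypothetical): with the same random_state and the same number of rows, train_test_split shuffles identically, so ravelling before or after the split yields the same arrays.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

X = pd.DataFrame({"f": range(10)})
y = pd.DataFrame({"target": [0, 1] * 5})  # single-column 2-D target

# Order 1: ravel, then split
y_flat = np.ravel(y)
_, _, ytr1, yte1 = train_test_split(X, y_flat, random_state=0)

# Order 2: split, then ravel each part
_, _, ytr2, yte2 = train_test_split(X, y, random_state=0)
ytr2, yte2 = np.ravel(ytr2), np.ravel(yte2)

print(np.array_equal(ytr1, ytr2), np.array_equal(yte1, yte2))  # True True
```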

8
Q

What do we mean by scaling in the context of a NN?

A

Basically, compute the z-scores of every variable (subtract the column mean, divide by the column standard deviation), because the network is very sensitive to the absolute size of the values. If one variable has large values, it will dominate the network.
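A short check, on a made-up matrix, that StandardScaler computes exactly these column-wise z-scores. It uses the population standard deviation, which is also NumPy's default for std. Note how the second column, in the thousands, ends up on the same scale as the first.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: column 2 is about 1000x larger than column 1
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

# z = (x - column mean) / column std
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
z_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(z_manual, z_sklearn))  # True
```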

9
Q

Python has a compact syntax for this: how do you write what is essentially a nested for loop, so that the cross-validation scores can be computed for each pair of layer sizes?

A

for hidden_layer_size in [(i, j) for i in range(3, 7) for j in range(3, 7)]:
(then indent the loop body under it)
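What the list comprehension produces: all 16 pairs (i, j), with i varying slowest. itertools.product gives the same list and is shown only as an equivalent alternative.

```python
from itertools import product

sizes = [(i, j) for i in range(3, 7) for j in range(3, 7)]
print(len(sizes), sizes[:3])  # 16 [(3, 3), (3, 4), (3, 5)]

# Same pairs in the same order
assert sizes == list(product(range(3, 7), range(3, 7)))
```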

10
Q

What does the cross-validation part inside the for loop look like for a neural network with two hidden layers?

A

validation_scores = {}  # create the dictionary once, before the loop
for hidden_layer_size in [(i, j) for i in range(3, 7) for j in range(3, 7)]:
    reg = MLPClassifier(max_iter=2000, hidden_layer_sizes=hidden_layer_size, random_state=1)
    score = cross_val_score(estimator=reg, X=X_train, y=y_train, cv=2)
    validation_scores[hidden_layer_size] = score.mean()

Then you can simply print each value to see how the score changes.
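A runnable sketch of the whole loop, assuming synthetic data from make_classification in place of the real dataset. Note the lowercase cv=2: cross_val_score has no CV parameter.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X_train = StandardScaler().fit(X_train).transform(X_train)

validation_scores = {}
for hidden_layer_size in [(i, j) for i in range(3, 7) for j in range(3, 7)]:
    reg = MLPClassifier(max_iter=2000, hidden_layer_sizes=hidden_layer_size, random_state=1)
    # Two-fold CV on the training data; keep the mean accuracy per layer-size pair
    score = cross_val_score(estimator=reg, X=X_train, y=y_train, cv=2)
    validation_scores[hidden_layer_size] = score.mean()
    print(hidden_layer_size, round(score.mean(), 3))
```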

11
Q

What are the imports for a 3D surface plot?

A

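import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed on older matplotlib versions to register projection="3d"
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter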

12
Q

How do I generate the x, y and z values for the surface plot?

A

px, py = np.meshgrid(np.arange(3, 7), np.arange(3, 7))
pz = np.array([[validation_scores[(i, j)] for i in range(3, 7)] for j in range(3, 7)])
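A check of why the loops in pz run j outside and i inside: np.meshgrid (default "xy" indexing) puts the x values along columns and the y values along rows, so pz[r, c] must belong to the point (px[r, c], py[r, c]). The scores dictionary here is a hypothetical placeholder standing in for the real cross-validation results.

```python
import numpy as np

# Hypothetical placeholder scores, keyed by (layer 1 size, layer 2 size)
validation_scores = {(i, j): i / 10 + j / 100 for i in range(3, 7) for j in range(3, 7)}

px, py = np.meshgrid(np.arange(3, 7), np.arange(3, 7))
pz = np.array([[validation_scores[(i, j)] for i in range(3, 7)] for j in range(3, 7)])
print(px.shape, py.shape, pz.shape)  # (4, 4) (4, 4) (4, 4)

# Every height lines up with its (x, y) grid point
aligned = all(
    pz[r, c] == validation_scores[(int(px[r, c]), int(py[r, c]))]
    for r in range(4)
    for c in range(4)
)
print(aligned)  # True
```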

13
Q

Once everything is imported and you have built the px, py and pz values, how do you plot them?

A

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surf = ax.plot_surface(px, py, pz)
plt.show()
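A fuller sketch that also uses the cm, LinearLocator and FormatStrFormatter imports from the earlier card, as in matplotlib's own surface examples. The scores are hypothetical placeholders, the axis labels are my own, and the Agg backend with savefig replaces plt.show() so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cm
from matplotlib.ticker import FormatStrFormatter, LinearLocator
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, only needed on older matplotlib

# Hypothetical placeholder scores standing in for the cross-validation results
validation_scores = {(i, j): 0.8 + 0.01 * i - 0.005 * j for i in range(3, 7) for j in range(3, 7)}
px, py = np.meshgrid(np.arange(3, 7), np.arange(3, 7))
pz = np.array([[validation_scores[(i, j)] for i in range(3, 7)] for j in range(3, 7)])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
surf = ax.plot_surface(px, py, pz, cmap=cm.coolwarm)  # colour the surface by height
ax.set_xlabel("neurons in layer 1")
ax.set_ylabel("neurons in layer 2")
ax.set_zlabel("mean CV accuracy")
ax.zaxis.set_major_locator(LinearLocator(5))               # 5 evenly spaced z ticks
ax.zaxis.set_major_formatter(FormatStrFormatter("%.02f"))  # two decimals on z ticks
fig.colorbar(surf, shrink=0.5, aspect=5)
fig.savefig("surface.png")
```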

14
Q

How do you get the maximum of the validation scores, which are keyed in (i, j) format?

A

max(validation_scores.values())
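The card's line gives the best score; passing key=validation_scores.get to max also recovers which (i, j) pair produced it. The three scores below are hypothetical.

```python
# Hypothetical scores keyed by (layer 1 size, layer 2 size)
validation_scores = {(3, 3): 0.81, (3, 4): 0.86, (4, 3): 0.84}

best_score = max(validation_scores.values())
best_size = max(validation_scores, key=validation_scores.get)  # key with the largest value
print(best_score, best_size)  # 0.86 (3, 4)
```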

15
Q

What is a response function?

A

Basically, we look at how the predicted value changes when we vary one of the input variables.
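One way to write this down, assuming (as the next card does) that all other inputs are held at their medians. Here the fitted network is written as a function of the inputs, the varied input is input k, and each held-fixed input sits at its median; this notation is mine, not the course's.

```latex
r_k(t) = \hat{f}\big(\tilde{x}_1, \dots, \tilde{x}_{k-1},\, t,\, \tilde{x}_{k+1}, \dots, \tilde{x}_p\big),
\qquad t \in \big[\min x_k,\ \max x_k\big]
```

where \hat{f} is the fitted network, \tilde{x}_j is the median of input j, and t runs over a grid of values of input k.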

16
Q

What are the general steps to generate the response function values?

A

First copy all the X values:
X_design = X.copy()

Then build the first (base) row of the input from the medians:
X_design_vec = pd.DataFrame(X_design.median()).transpose()

Then, row by row, change the value of the variable whose impact you want to test, but first generate the sequence of values:
min_resultant = min(X.loc[:, "Resultant"])
max_resultant = max(X.loc[:, "Resultant"])
seq = np.linspace(start=min_resultant, stop=max_resultant, num=50)

17
Q

Once you have the base row and the vector of different values to feed into the NN, what next?

A

First build the new X:
to_predict = []
for result in seq:  # seq is the vector of different values
    X_design_vec.loc[0, "Resultant"] = result
    to_predict.append(X_design_vec.copy())

At the end, put all of these rows back together:

to_predict = pd.concat(to_predict)
to_predict = scaler.transform(to_predict)
predictions = clf.predict(to_predict)  # clf is the fitted MLPClassifier

Then you can simply plot.
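A self-contained sketch of the two response-function cards on synthetic data. Only "Resultant" comes from the cards; the columns "Speed" and "Load" and the make_classification data are hypothetical stand-ins, and the plot is saved to a file instead of shown.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data; "Speed" and "Load" are hypothetical column names
X_arr, y = make_classification(n_samples=300, n_features=3, n_informative=2,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=["Resultant", "Speed", "Load"])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(max_iter=2000, hidden_layer_sizes=(5, 5), random_state=1)
clf.fit(scaler.transform(X_train), y_train)

# Base row of medians and the sweep of "Resultant" values
X_design_vec = pd.DataFrame(X.median()).transpose()
seq = np.linspace(start=X["Resultant"].min(), stop=X["Resultant"].max(), num=50)

# One copy of the base row per value in seq, with only "Resultant" changed
to_predict = []
for result in seq:
    X_design_vec.loc[0, "Resultant"] = result
    to_predict.append(X_design_vec.copy())
to_predict = pd.concat(to_predict)

# Scale with the training scaler before predicting
predictions = clf.predict(scaler.transform(to_predict))

plt.plot(seq, predictions)
plt.xlabel("Resultant")
plt.ylabel("predicted class")
plt.savefig("response_resultant.png")
```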

18
Q

With the MLP classifier, is there a method to predict the probabilities?

A

Yes, predict_proba:
probabilities = clf.predict_proba(to_predict)  # one column per class, each row sums to 1
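A small sketch on synthetic data (an assumption) showing the shape of predict_proba output: one column per class in the order of clf.classes_, with each row summing to 1. For a response curve of probabilities you would typically plot one column, for example [:, 1].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = MLPClassifier(max_iter=2000, hidden_layer_sizes=(5, 5), random_state=1).fit(X, y)

proba = clf.predict_proba(X[:5])
print(proba.shape)                          # (5, 2)
print(np.allclose(proba.sum(axis=1), 1.0))  # True
p_class1 = proba[:, 1]  # probability of clf.classes_[1]
```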