Data Science MODULE 2 CODE Flashcards

1
Q

Which library do you import for the decision tree regressor?

A

from sklearn.tree import DecisionTreeRegressor, plot_tree

2
Q

How do you load the data?

A

import pandas as pd
df = pd.read_csv('Boston.csv', delimiter=',')
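A minimal sketch of the same call, assuming a tiny in-memory stand-in for Boston.csv (the three columns and their values are made up for illustration, not taken from the real file):

```python
import io
import pandas as pd

# Hypothetical stand-in for Boston.csv: one header row, three data rows.
csv_text = "rm,lstat,medv\n6.5,4.9,24.0\n5.9,9.1,21.6\n7.1,4.0,34.7\n"

# read_csv accepts a file path or any file-like object.
# A comma is already the default delimiter, so delimiter="," is optional.
df = pd.read_csv(io.StringIO(csv_text), delimiter=",")
print(df)
```

With a real file you would pass the path, for example 'Boston.csv', in place of the StringIO object.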

3
Q

How do you see the dimensions of the dataset you loaded?

A

df.shape

Gives you the number of rows and columns. The header row is automatically stored as the column names, so the row count covers only the actual data rows.
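A small sketch of reading the shape, assuming a made-up four-row DataFrame in place of the real data:

```python
import pandas as pd

# Hypothetical data: 4 rows, 2 columns.
df = pd.DataFrame({"rm": [6.5, 5.9, 7.1, 6.0], "medv": [24.0, 21.6, 34.7, 20.1]})

# shape is an attribute (no parentheses) holding a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```

The column names are not counted as a row.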

4
Q

How would you display the first few or the last few records?

A

df.head(5)
df.tail(10)
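A quick sketch on a made-up 20-row DataFrame (the "id" column is hypothetical) to show which rows each call returns:

```python
import pandas as pd

# Hypothetical data: ids 1 to 20.
df = pd.DataFrame({"id": range(1, 21)})

first = df.head(5)   # rows with id 1..5
last = df.tail(10)   # rows with id 11..20
print(first)
print(last)
```

Both methods default to 5 rows when called without an argument.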

5
Q

How do you display the number of null values per column?

A

print(df.isnull().sum())
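A minimal sketch, assuming a made-up DataFrame with one missing value in two of its three columns:

```python
import pandas as pd

# Hypothetical data: None becomes NaN in the numeric columns.
df = pd.DataFrame({
    "rm": [6.5, None, 7.1],
    "medv": [24.0, 21.6, None],
    "age": [65, 78, 61],
})

# isnull() gives a True/False frame; sum() adds up the True values per column.
null_counts = df.isnull().sum()
print(null_counts)
```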

6
Q

How do I check whether there are any null values in the DataFrame?

A

df.isnull().sum().sum()

or df.isnull().values.any()

If there are no null values, the first returns 0 and the second returns False.
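A short sketch comparing the two checks on a frame without nulls and one with nulls (both frames are made up for illustration):

```python
import pandas as pd

# Hypothetical frames: one complete, one with two missing cells.
clean = pd.DataFrame({"rm": [6.5, 5.9], "medv": [24.0, 21.6]})
dirty = pd.DataFrame({"rm": [6.5, None], "medv": [None, 21.6]})

for name, frame in [("clean", clean), ("dirty", dirty)]:
    total_nulls = frame.isnull().sum().sum()  # one number for the whole frame
    has_nulls = frame.isnull().values.any()   # True as soon as one cell is null
    print(name, total_nulls, has_nulls)
```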

7
Q

How do you store the median of a column in a variable?

A

x = df['columnName'].median()
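A minimal sketch, assuming a made-up "rm" column that contains one missing value:

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"rm": [5.0, 6.0, None, 7.0]})

# median() skips missing values by default, so only 5.0, 6.0 and 7.0 count.
x = df["rm"].median()
print(x)
```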

8
Q

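How do you replace all null values in a column with a value?

A

df['columnName'] = df['columnName'].fillna(value)

Assigning the result back is more reliable than fillna(value, inplace=True) on a selected column, which recent pandas versions warn about.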
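A small sketch that fills the gaps in a made-up "rm" column with that column's median (using the median as the fill value is an assumption that follows from the previous card, not something this card prescribes):

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({"rm": [5.0, None, 7.0]})

median_rm = df["rm"].median()            # median of the non-null values
df["rm"] = df["rm"].fillna(median_rm)    # assign the filled column back
print(df)
```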

9
Q

To run the regressor you now have to create two matrices from the original DataFrame? How do you do that?

A

True
x = df.loc[:, ["column1", "column2"]]  # list as many columns as you need
The quotation marks around each column name are important.
The leading colon simply means all rows.
You can also pick columns by integer position instead of by name, but that is done with iloc rather than loc.
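A minimal sketch of building both matrices, assuming a made-up frame in which "rm" and "lstat" are the features and "medv" is the target (the card only shows x, so the y line is an assumption):

```python
import pandas as pd

# Hypothetical data standing in for the Boston dataset.
df = pd.DataFrame({
    "rm": [6.5, 5.9, 7.1],
    "lstat": [4.9, 9.1, 4.0],
    "medv": [24.0, 21.6, 34.7],
})

x = df.loc[:, ["rm", "lstat"]]  # feature matrix: all rows, two named columns
y = df.loc[:, "medv"]           # target: all rows, a single column
print(x.shape, y.shape)
```

Passing a list of names returns a DataFrame, while passing a single name returns a Series.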

10
Q

How do I create an instance of the regressor and fit the data?

A

regressor = DecisionTreeRegressor(random_state=0)
regressor = regressor.fit(x, y)
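A short end-to-end sketch of fitting and then predicting, assuming made-up feature and target values. With no depth limit and distinct rows, the tree is expected to reproduce its training targets exactly, which is what the last lines illustrate:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data.
x = pd.DataFrame({"rm": [5.0, 6.0, 7.0, 8.0], "lstat": [12.0, 9.0, 5.0, 3.0]})
y = pd.Series([18.0, 22.0, 33.0, 45.0])

regressor = DecisionTreeRegressor(random_state=0)
regressor = regressor.fit(x, y)  # fit returns the fitted estimator itself

predictions = regressor.predict(x)
print(predictions)
```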

11
Q

What if you need help?

A

help(regressor)

12
Q

Once you have fitted the regressor, how do you plot the whole decision tree?

A

import matplotlib.pyplot as plt
plt.figure()
plot_tree(regressor, feature_names=list(x.columns))
plt.show()

13
Q

How do you plot, for example, only the first 3 layers of the decision tree?

A

plt.figure(figsize=[5, 5], dpi=100)
plot_tree(regressor, max_depth=3, feature_names=list(x.columns), impurity=False)
plt.show()
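A self-contained sketch of a depth-limited plot, assuming made-up data and the non-interactive "Agg" backend so it can run without a display. It uses max_depth=1 only because the toy tree is small, and plt.show() is left out for the same headless reason:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available, so draw off-screen
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Hypothetical training data.
x = pd.DataFrame({
    "rm": [5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    "lstat": [12.0, 10.0, 9.0, 6.0, 4.0, 3.0],
})
y = pd.Series([18.0, 20.0, 22.0, 26.0, 33.0, 40.0])
regressor = DecisionTreeRegressor(random_state=0).fit(x, y)

fig = plt.figure(figsize=[5, 5], dpi=100)
# max_depth limits only what is drawn; the fitted tree itself is unchanged.
annotations = plot_tree(
    regressor, max_depth=1, feature_names=list(x.columns), impurity=False
)
print(len(annotations), regressor.get_depth())
```

In an interactive session you would follow this with plt.show(), or call fig.savefig with a file name of your choice.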

14
Q

How would you compute the average of a column, but with a filter?

A

x_mean = df.loc[df['rm'] > 6, 'medv'].mean()
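A minimal sketch of the filtered mean on made-up "rm" and "medv" values, where only two rows pass the filter:

```python
import pandas as pd

# Hypothetical data: rows 2 and 3 have rm > 6.
df = pd.DataFrame({
    "rm": [5.5, 6.2, 7.0, 5.9],
    "medv": [20.0, 25.0, 35.0, 19.0],
})

# Boolean mask selects the rows, the label selects the column.
x_mean = df.loc[df["rm"] > 6, "medv"].mean()
print(x_mean)  # mean of 25.0 and 35.0
```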

15
Q

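How do you count with a filter?

A

Exactly the same method, just with count() in place of mean(). Note that count() counts only the non-null values.
There is also len(): pass your filtered matrix in and you get the number of rows.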
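A short sketch contrasting the two approaches, assuming made-up data in which one of the filtered rows has a missing "medv":

```python
import pandas as pd

# Hypothetical data: three rows have rm > 6, one of them lacks medv.
df = pd.DataFrame({
    "rm": [5.5, 6.2, 7.0, 6.8],
    "medv": [20.0, 25.0, None, 31.0],
})

filtered = df.loc[df["rm"] > 6]

non_null_medv = filtered["medv"].count()  # counts non-null values only
n_rows = len(filtered)                    # counts rows, nulls included
print(non_null_medv, n_rows)
```

The two numbers differ only when the counted column has missing values among the filtered rows.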

16
Q

So far we have mostly used column names for the selection. How can you use numeric positions instead?

A

first_three = df.iloc[:, 0:3]
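A minimal sketch on a made-up four-column frame, showing that the end of the slice is excluded:

```python
import pandas as pd

# Hypothetical data with four columns.
df = pd.DataFrame({
    "crim": [0.1, 0.2],
    "rm": [6.5, 5.9],
    "lstat": [4.9, 9.1],
    "medv": [24.0, 21.6],
})

# All rows, columns at positions 0, 1 and 2 (position 3 is excluded).
first_three = df.iloc[:, 0:3]
print(first_three)
```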

17
Q

iloc is used

A

Where we want integer-based (positional) referencing
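A small sketch contrasting iloc and loc, assuming a made-up frame whose row labels (10, 20, 30) deliberately differ from the row positions (0, 1, 2):

```python
import pandas as pd

# Hypothetical data with a non-default index.
df = pd.DataFrame(
    {"rm": [5.5, 6.2, 7.0], "medv": [20.0, 25.0, 35.0]},
    index=[10, 20, 30],
)

by_position = df.iloc[0]  # first row, whatever its label is
by_label = df.loc[10]     # row whose label is 10

cell_by_position = df.iloc[2, 1]    # third row, second column
cell_by_label = df.loc[30, "medv"]  # same cell, addressed by labels

# loc also accepts a boolean condition for the rows.
expensive = df.loc[df["rm"] > 6, "medv"]
print(cell_by_position, cell_by_label, expensive.tolist())
```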

18
Q

With loc you can do condition-based slicing, but you then pass in column labels

A