Scikit-learn Flashcards
How do you load a built-in dataset from scikit-learn?
Import its loader from sklearn.datasets, e.g. from sklearn.datasets import load_iris
How do you generate random clustered data?
from sklearn.datasets import make_blobs
What's the syntax for make_blobs?
X, y = make_blobs(
n_samples=150, n_features=2,
centers=3, cluster_std=0.5,
shuffle=True)
What do X and y represent in make_blobs?
X is the data (an array of shape (n_samples, n_features)); y holds the integer cluster label of each sample
What are the outputs of make_blobs?
X (the samples), y (the labels), and centers (the cluster centres, returned only when return_centers=True)
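To see all three outputs at once, request the centres explicitly. A minimal sketch: random_state=0 is an arbitrary choice for reproducibility, and return_centers assumes a scikit-learn version that has that parameter (as in the signature card below).

```python
import numpy as np
from sklearn.datasets import make_blobs

# return_centers=True adds the cluster centres as a third output
X, y, centers = make_blobs(
    n_samples=150, n_features=2,
    centers=3, cluster_std=0.5,
    shuffle=True, random_state=0,
    return_centers=True)

print(X.shape)        # (150, 2): one row per sample, one column per feature
print(y.shape)        # (150,): one integer label per sample
print(np.unique(y))   # [0 1 2]: one label per cluster
print(centers.shape)  # (3, 2): one row per centre
```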
How do you specify the standard deviation within clusters?
cluster_std= (a single float for all clusters, or one value per cluster; default 1.0)
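How do you shuffle the samples so the clusters are randomly interleaved in the dataset?
shuffle=True (the default; with shuffle=False the samples come out cluster by cluster)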
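The two parameters above can be checked together. A sketch, assuming three clusters with hand-picked spreads (the values 0.2, 0.5, 1.0 and random_state=0 are arbitrary): with shuffle=False the labels come out sorted, and each cluster's spread roughly tracks its cluster_std.

```python
import numpy as np
from sklearn.datasets import make_blobs

# cluster_std can be one float for all clusters or one value per cluster
X, y = make_blobs(n_samples=300, centers=3,
                  cluster_std=[0.2, 0.5, 1.0],
                  shuffle=False, random_state=0)

# With shuffle=False the samples are generated cluster by cluster, so labels are sorted
print(y[:3], y[-3:])  # [0 0 0] [2 2 2]

# Per-feature standard deviation of each cluster, roughly matching cluster_std
for label in range(3):
    print(label, X[y == label].std(axis=0).round(2))
```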
How do you import KMeans?
from sklearn.cluster import KMeans
How do you get the centroid locations?
model.cluster_centers_
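A minimal sketch of fitting KMeans and reading the centroids, using make_blobs data as a stand-in dataset. n_init=10 is set explicitly (an assumption here) because its default has changed between scikit-learn versions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(X)

print(model.cluster_centers_)  # one row per centroid, shape (3, 2)
print(model.labels_[:10])      # cluster index assigned to the first 10 samples
```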
What is the full signature of make_blobs?
make_blobs(n_samples=100, n_features=2, *, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, return_centers=False)
For make_blobs, is it clusters and center_std, or centers and cluster_std?
centers and cluster_std
For make_blobs, is it center or centers?
centers
What is the signature of KMeans?
KMeans(n_clusters=8, *, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='deprecated', verbose=0, random_state=None, copy_x=True, n_jobs='deprecated', algorithm='auto')
(This is an older, 0.23/0.24-era signature; precompute_distances and n_jobs were later removed.)
Which parameter sets the number of clusters in KMeans vs make_blobs?
KMeans: n_clusters
make_blobs: centers
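The naming difference in one place: a sketch where the same k is passed as centers to generate the data and as n_clusters to recover it. k=4 and random_state=1 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

k = 4
# make_blobs: the number of clusters to GENERATE is `centers`
X, y = make_blobs(n_samples=200, centers=k, cluster_std=0.5, random_state=1)
# KMeans: the number of clusters to FIND is `n_clusters`
labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)

print(np.unique(y), np.unique(labels))
```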
How do you import the scaler used to standardise the data?
from sklearn.preprocessing import StandardScaler
How do you standardise the data (zero mean, unit variance per column)?
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X = scaler.fit_transform(df1)
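To confirm what StandardScaler does, check the column means and standard deviations after scaling. A sketch in which a small random array stands in for df1 (the stand-in data is an assumption; any numeric table works the same way).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for df1: two features on very different scales
rng = np.random.default_rng(0)
data = np.column_stack([rng.normal(50, 10, 200), rng.normal(0.5, 0.1, 200)])

scaler = StandardScaler()
X = scaler.fit_transform(data)

print(X.mean(axis=0).round(6))      # each column now has mean ~0
print(X.std(axis=0).round(6))       # and standard deviation ~1
print(scaler.mean_, scaler.scale_)  # per-column mean and std learned during fit
```

The fitted scaler keeps mean_ and scale_, so scaler.transform(new_data) applies the same scaling to later data instead of refitting.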
How do you import DBSCAN?
from sklearn.cluster import DBSCAN
How do you iterate to find a sensible value of epsilon (without using a nearest-neighbours distance plot)?
for epsilon in np.arange(0.1, 1, 0.1):
    model = DBSCAN(eps=epsilon).fit(df1)
    # labels_ are 0..k-1 for clusters and -1 for noise, so max label + 1 = number of clusters
    print(epsilon, model.labels_.max() + 1)
Prints the number of clusters found for each value of epsilon
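A self-contained version of the sweep that also reports noise. A sketch under assumptions: standardised make_blobs data stands in for df1, DBSCAN's default min_samples is kept, and clusters are counted as distinct labels excluding -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Stand-in for df1: standardised blob data
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X = StandardScaler().fit_transform(X)

results = {}
for eps in np.arange(0.1, 1.0, 0.1):
    labels = DBSCAN(eps=eps).fit(X).labels_
    # -1 marks noise, so it is not counted as a cluster
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    results[round(float(eps), 1)] = n_clusters
    print(f"eps={eps:.1f}  clusters={n_clusters}  noise points={n_noise}")
```

A sensible epsilon is usually one where the cluster count is stable across neighbouring values and the noise count is small.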
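Can you call .unique() directly on labels_ (e.g. model.labels_.unique())?
No. labels_ is a NumPy array, which has no .unique() method (that is a pandas method); use np.unique instead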
How do you get the unique labels from a fitted DBSCAN model?
np.unique(model.labels_) (a label of -1 marks noise points)
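np.unique can also count how many samples carry each label. A sketch; the data and the eps=0.5, min_samples=5 settings are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.3, random_state=0)
model = DBSCAN(eps=0.5, min_samples=5).fit(X)

# Unique labels (sorted) and the number of samples with each; -1, if present, is noise
labels, counts = np.unique(model.labels_, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))

n_clusters = int(np.sum(labels >= 0))
print("clusters found:", n_clusters)
```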
Can you call fit_transform on DBSCAN?
No. DBSCAN has no transform method; use fit (then read labels_) or fit_predict
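A quick check of the card above: a sketch showing that fit_transform is absent and that fit_predict returns the same labels as fit followed by labels_. The data and eps=0.5 are arbitrary.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=2, cluster_std=0.4, random_state=0)

# DBSCAN is a clusterer, not a transformer, so it has no fit_transform
print(hasattr(DBSCAN(), "fit_transform"))  # False

# One call: fit and return the labels
labels_a = DBSCAN(eps=0.5).fit_predict(X)
# Two steps: fit, then read labels_
labels_b = DBSCAN(eps=0.5).fit(X).labels_
print(np.array_equal(labels_a, labels_b))  # True
```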
How do you load the iris dataset?
from sklearn.datasets import load_iris
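Once imported, load_iris is called to get the data. A minimal sketch of the two common forms: the dict-like Bunch object with named fields, or just the arrays via return_X_y=True.

```python
from sklearn.datasets import load_iris

iris = load_iris()          # Bunch: dict-like object with attribute access
X, y = iris.data, iris.target

print(X.shape)              # (150, 4)
print(iris.feature_names)   # names of the four measurement columns
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']

# Alternatively, return only the feature array and labels
X, y = load_iris(return_X_y=True)
```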