201 - 250 Flashcards

1
Q

pandas.DataFrame.boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)

A

Make a box plot from DataFrame columns.

import numpy as np
import pandas as pd

data[['GrLivArea']].boxplot()   # data: a house-prices DataFrame with a GrLivArea column, assumed loaded earlier
np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4), columns=['Col1', 'Col2', 'Col3', 'Col4'])
boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])
# group the boxes by the values of column 'X'
df = pd.DataFrame(np.random.randn(10, 2), columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])
boxplot = df.boxplot(by='X')
2
Q

numpy.nan

A

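NaNs can be used as a poor man's mask (if you don't care what the original value was). Equality cannot be used to test for NaN, because NaN compares unequal to everything, itself included; use np.isnan() instead.

myarr = np.array([1., 0., np.nan, 3.])
np.nonzero(myarr == np.nan)
👉 (array([], dtype=int64),)
np.nonzero(np.isnan(myarr))
👉 (array([2]),)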
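A minimal sketch of the "poor man's mask" idea, assuming a hypothetical sentinel value of -999.0 marks missing readings: overwrite those entries with NaN, then use NaN-aware reductions such as np.nanmean, which skip them.

```python
import numpy as np

# hypothetical readings where -999.0 marks a missing value
readings = np.array([1.0, -999.0, 3.0, 5.0])

masked = readings.copy()
masked[masked == -999.0] = np.nan   # the original value is discarded

print(np.isnan(masked))     # [False  True False False]
print(np.nanmean(masked))   # mean of the non-NaN values only: 3.0
print(np.mean(masked))      # a plain mean propagates the NaN: nan
```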
3
Q

numpy.logical_xor(x1, x2, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

A

Compute the truth value of x1 XOR x2, element-wise.

import numpy as np

np.logical_xor(True, False)
👉 True
np.logical_xor([True, True, False, False], [True, False, True, False])
👉 array([False,  True,  True, False])
# non-boolean inputs are converted by truthiness (0 and False count as False)
arr1 = [8, 2, False, 4]
arr2 = [3, 0, False, False]
out_arr = np.logical_xor(arr1, arr2)
print(out_arr)
👉 [False  True False  True]
4
Q

sklearn.base.TransformerMixin

A

Mixin class for all transformers in scikit-learn; it supplies fit_transform() on top of your fit() and transform(). The simplest way to create your own transformer is to use FunctionTransformer from sklearn.preprocessing.

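import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class That(BaseEstimator, TransformerMixin):
    def __init__(self, this=True):
        self.this = this          # store hyperparameters unchanged (scikit-learn convention)
    def fit(self, X, y=None):
        return self               # nothing to learn; returning self allows chaining
    def transform(self, X, y=None):
        # placeholder: a real transformer would modify X here, e.g. depending on self.this
        return np.asarray(X)
transformer = That(this=False)
my_data = np.array([[1, 2], [3, 4]])           # hypothetical input data
my_data = transformer.fit_transform(my_data)    # fit_transform() is provided by TransformerMixin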
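The FunctionTransformer shortcut mentioned above wraps a plain function as a transformer, so no class is needed. A small sketch, using np.log1p (log(1 + x)) as an arbitrary example function:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# wrap a plain NumPy function as a scikit-learn transformer
log_transformer = FunctionTransformer(np.log1p)

X = np.array([[0.0, 1.0], [2.0, 3.0]])
X_log = log_transformer.fit_transform(X)   # fit() learns nothing; transform() calls np.log1p
print(X_log)
```

Because it follows the transformer API, log_transformer can be used as a step in make_pipeline or make_column_transformer.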
5
Q

scipy.stats.uniform(x : array_like, q : array_like, loc : array_like, optional, scale : array_like, optional)

A

A uniform continuous random variable. In the standard form, the distribution is uniform on [0, 1].

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform

fig, ax = plt.subplots(1, 1)
mean, var, skew, kurt = uniform.stats(moments='mvsk')
x = np.linspace(uniform.ppf(0.01), uniform.ppf(0.99), 100)
ax.plot(x, uniform.pdf(x), 'r-', lw=5, alpha=0.6, label='uniform pdf')
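With loc and scale the distribution becomes uniform on [loc, loc + scale]. A short sketch with the arbitrary choice loc=2, scale=6, i.e. the interval [2, 8]:

```python
from scipy.stats import uniform

dist = uniform(loc=2, scale=6)        # frozen distribution on [2, 8]
print(dist.mean(), dist.var())        # mean (a + b) / 2 = 5, variance (b - a)**2 / 12 = 3
print(dist.pdf(4.0))                  # constant density 1 / 6 inside the interval
print(dist.cdf(5.0))                  # half the mass lies below the midpoint: 0.5
samples = dist.rvs(size=1000, random_state=0)   # reproducible random draws
```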
6
Q

sklearn.datasets.make_blobs(n_samples=100, n_features=2, *, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, return_centers=False)

A

Generate isotropic Gaussian blobs for clustering. A convenient way to generate synthetic data.

from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=10, centers=3, n_features=2, random_state=0)

print(X.shape)
👉 (10, 2)

y
👉 array([0, 0, 1, 0, 2, 2, 2, 1, 1, 0])
7
Q

sklearn.compose.make_column_selector(pattern=None, *, dtype_include=None, dtype_exclude=None)

A

Create a callable to select columns to be used with ColumnTransformer. It can select columns by dtype or by column name with a regex.
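A sketch of both selection modes, on a small made-up DataFrame: the selector is a callable that returns column names, and it can be passed to make_column_transformer in place of an explicit list.

```python
import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({"city": ["London", "London", "Paris", "Sallisaw"],
                  "rating": [5, 3, 4, 5]})

numeric = make_column_selector(dtype_include="number")   # select by dtype
by_name = make_column_selector(pattern="^ci")            # select by regex on the column name
print(numeric(X), by_name(X))                            # ['rating'] ['city']

ct = make_column_transformer(
    (StandardScaler(), make_column_selector(dtype_include="number")),
    (OneHotEncoder(), make_column_selector(dtype_exclude="number")))
Xt = ct.fit_transform(X)    # 1 scaled column + 3 one-hot columns
print(Xt.shape)             # (4, 4)
```

Selecting by dtype keeps the transformer valid if columns are added or renamed later, as long as their types stay the same.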

7
Q

enumerate(iterable, start=0)

A

The enumerate() function takes an iterable (e.g. a tuple) and returns an enumerate object that yields (index, element) pairs, i.e. it numbers the elements.

x = ('apple', 'banana', 'cherry')
y = enumerate(x)
print(list(y))
👉 [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
s1 = "geek"

# change the start index from 0 to 2
print(list(enumerate(s1, 2)))
👉 [(2, 'g'), (3, 'e'), (4, 'e'), (5, 'k')]
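The most common use is in a for loop, where it gives the position and the element at the same time without a manual counter. A small sketch:

```python
fruits = ['apple', 'banana', 'cherry']

numbered = []
for position, fruit in enumerate(fruits, start=1):   # unpack each (index, element) pair
    numbered.append(f"{position}. {fruit}")

print(numbered)   # ['1. apple', '2. banana', '3. cherry']
```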
8
Q

iter(object[, sentinel])

A

The iter() function returns an iterator object. An iterator is an object that returns its elements one at a time (each call to next() gives the next one).

x = iter(["apple", "banana", "cherry"])
print(next(x))
👉 apple

print(next(x))
👉 banana
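When an iterator is exhausted, next() raises StopIteration unless a default is passed. iter() also has a two-argument form, iter(callable, sentinel), which keeps calling the callable until it returns the sentinel. A short sketch (the stack list is only illustrative):

```python
x = iter(["apple", "banana", "cherry"])
items = [next(x), next(x), next(x)]
leftover = next(x, "no more items")   # default returned instead of raising StopIteration

try:
    next(x)
except StopIteration:
    print("exhausted")

# two-argument form: call stack.pop() until it returns the sentinel 0
stack = [0, 1, 2, 3]
popped = list(iter(stack.pop, 0))
print(popped)   # [3, 2, 1]  (the sentinel itself is not included)
```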
9
Q

sklearn.compose.make_column_transformer(*transformers, remainder='drop', sparse_threshold=0.3, n_jobs=None, verbose=False, verbose_feature_names_out=True)

A

Construct a ColumnTransformer from the given transformers.

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

make_column_transformer(
    (StandardScaler(), ['numerical_column']),
    (OneHotEncoder(), ['categorical_column']))
👉 ColumnTransformer(transformers=[('standardscaler', StandardScaler(...),
                                 ['numerical_column']),
                                ('onehotencoder', OneHotEncoder(...),
                                 ['categorical_column'])])
10
Q

sklearn.pipeline.make_union(*transformers, n_jobs=None, verbose=False)

A

Construct a FeatureUnion from the given transformers.

from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.pipeline import make_union
make_union(PCA(), TruncatedSVD())

👉 FeatureUnion(transformer_list=[('pca', PCA()), ('truncatedsvd', TruncatedSVD())])
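A FeatureUnion fits every transformer on the same input and concatenates their outputs column-wise. A sketch on random illustrative data, with component counts chosen arbitrarily:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.pipeline import make_union

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))        # 10 samples, 5 features (random illustrative data)

union = make_union(PCA(n_components=2), TruncatedSVD(n_components=1))
X_new = union.fit_transform(X)      # outputs of both transformers, side by side
print(X_new.shape)                  # (10, 3) = 2 PCA columns + 1 SVD column
```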
11
Q

sklearn.feature_selection.mutual_info_regression(X, y, *, discrete_features='auto', n_neighbors=3, copy=True, random_state=None)

A

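Estimate mutual information for a continuous target variable (for a discrete target use mutual_info_classif).
Mutual information (MI) [1] between two random variables is a non-negative value which measures the dependency between the variables. It is zero if and only if the two variables are independent; higher values mean stronger dependency.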
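Unlike correlation, MI also detects non-linear dependencies. A sketch with simulated data (the feature names and the quadratic target are made up for illustration): the target depends on the first feature through x**2, which has almost zero linear correlation with x, while the second feature is pure noise.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 1000
x_informative = rng.uniform(-1, 1, n)
x_noise = rng.uniform(-1, 1, n)                     # unrelated to the target
y = x_informative ** 2 + 0.05 * rng.normal(size=n)  # continuous, non-linear target

X = np.column_stack([x_informative, x_noise])
mi = mutual_info_regression(X, y, random_state=0)
print(mi)   # first value clearly larger; second close to 0
```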

12
Q

numpy.random.uniform(low=0.0, high=1.0, size=None)

A

Draws random samples from a uniform distribution over the half-open interval [low, high) and returns them as a NumPy array (or a single float if size is None).

np.random.uniform(2, 8, (2, 10))   # 2 x 10 samples from [2, 8); values differ on every run (no seed)
👉 array([[ 3.1517914 ,  3.10313483,  2.84007134,  3.21556436,  4.64531786,
            2.99232714,  7.03064897,  4.38691765,  5.27488548,  2.63472454],
          [ 6.39470358,  5.63084131,  4.69996748,  7.07260546,  7.44340813,
            4.10722203,  7.52956646,  4.8596943 ,  3.97923973,  5.64505363]])
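For reproducible output, the newer Generator API can be seeded. A sketch, where the seed 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)                  # seeded Generator: same numbers every run
samples = rng.uniform(low=2, high=8, size=(2, 10))

print(samples.shape)                             # (2, 10)
print(samples.min() >= 2, samples.max() < 8)     # True True: values lie in [2, 8)
```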
13
Q

numpy.random.RandomState(seed=None)

A

Container for the Mersenne Twister pseudo-random number generator.
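RandomState is NumPy's legacy generator (the NumPy documentation recommends np.random.default_rng() for new code), but it remains useful for reproducing older results. A sketch showing that two instances with the same seed produce identical streams:

```python
import numpy as np

rs1 = np.random.RandomState(seed=0)
rs2 = np.random.RandomState(seed=0)

a = rs1.randint(0, 100, size=5)
b = rs2.randint(0, 100, size=5)
print(np.array_equal(a, b))   # True: same seed, same sequence
```

Many scikit-learn functions also accept a RandomState instance (or an int seed) as their random_state argument.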

14
Q

matplotlib.pyplot.quiver(*args, data=None, **kwargs)

A

Plots a 2D field of arrows, e.g. velocity vectors with components (u, v) at the points (x, y).

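import matplotlib.pyplot as plt

# general form: arrows with components (u, v) at points (x, y)
# fig, ax = plt.subplots(); ax.quiver(x, y, u, v); plt.show()

U = [[1, 1, 1, 1], [-2, -2, -2, -2], [3, 3, 3, 3], [-3, -3, -3, -3]]
V = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

fig, ax = plt.subplots()

ax.quiver(U, V)          # with only U and V given, arrows are drawn at the grid points (column, row)
fig.set_figwidth(8)
fig.set_figheight(8)
plt.show()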
15
Q

matplotlib.pyplot.tight_layout(*, pad=1.08, h_pad=None, w_pad=None, rect=None)

A

Adjust the padding between and around subplots.
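A sketch of the typical situation: a grid of subplots whose titles and axis labels would otherwise overlap. The four plotted functions are arbitrary, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend (no window needed)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, axes = plt.subplots(2, 2)
for ax, func in zip(axes.flat, [np.sin, np.cos, np.tan, np.exp]):
    ax.plot(x, func(x))
    ax.set_title(func.__name__)
    ax.set_xlabel("x")

plt.tight_layout(pad=1.08)             # re-space the subplots so titles and labels do not overlap
```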

16
Q

pandas.DataFrame.shift(periods=1, freq=None, axis=0, fill_value=NoDefault.no_default)

A

Shift index by desired number of periods with an optional time freq.

df = pd.DataFrame({"Col1": [10, 20, 15, 30, "Col2": [13, 23, 18, 33, 48], "Col3": [17, 27, 22, 37, 52]},
index=pd.date_range("2020-01-01", "2020-01-05"))

df.shift(periods=3)
            Col1  Col2  Col3
2020-01-01   NaN   NaN   NaN
2020-01-02   NaN   NaN   NaN
2020-01-03   NaN   NaN   NaN
2020-01-04  10.0  13.0  17.0
2020-01-05  20.0  23.0  27.0
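Two common variations, sketched on a single column: fill_value replaces the introduced NaNs, and subtracting a shifted copy gives the change from the previous row (the same as diff()).

```python
import pandas as pd

s = pd.Series([10, 20, 15, 30, 45])

filled = s.shift(1, fill_value=0)   # [0, 10, 20, 15, 30]: no NaN, integer dtype kept
ahead = s.shift(-1)                 # next row's value; the last row becomes NaN
change = s - s.shift(1)             # change relative to the previous row
print(filled.tolist(), ahead.tolist(), change.tolist())
```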
17
Q

pandas.DataFrame.diff(periods=1, axis=0)

A

First discrete difference of elements. Calculates the difference of a DataFrame element compared with another element in the DataFrame (by default, the element in the previous row).

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})
df.diff()
     a    b     c
0  NaN  NaN   NaN
1  1.0  0.0   3.0
2  1.0  1.0   5.0
3  1.0  1.0   7.0
4  1.0  2.0   9.0
5  1.0  3.0  11.0
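The periods and axis parameters change which element is subtracted. A sketch reusing the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 1, 2, 3, 5, 8],
                   'c': [1, 4, 9, 16, 25, 36]})

d2 = df['c'].diff(periods=2)   # difference with the element two rows above
print(d2.tolist())             # [nan, nan, 8.0, 12.0, 16.0, 20.0]

row = df.diff(axis=1).iloc[1]  # difference with the column to the left, shown for row 1
print(row.tolist())            # [nan, -1.0, 3.0]
```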
18
Q

numpy.convolve(a, v, mode='full')

A

Returns the discrete, linear convolution of two one-dimensional sequences.

import numpy as np

a = np.array([0, 1, 2, 3, 2, 1, 0])
b = np.array([1, 2, 2.7, 2.9, 1, 2, 2.7])
np.convolve(a, b)
👉 array([ 0. ,  1. ,  4. ,  9.7, 16.3, 19.9, 20.1, 18.2, 16.3, 13.1,  7.4, 2.7,  0. ])
# mode='same' returns the central part, with length max(len(a), len(b))
np.convolve(a, b, mode='same')
👉 array([ 9.7, 16.3, 19.9, 20.1, 18.2, 16.3, 13.1])
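A common application is a moving average: convolve the data with a kernel of equal weights. A sketch comparing the output lengths of the three modes, with a 3-point window chosen for illustration:

```python
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
window = np.ones(3) / 3                        # 3-point averaging kernel

full = np.convolve(x, window)                  # length 5 + 3 - 1 = 7
same = np.convolve(x, window, mode='same')     # length 5, centred
valid = np.convolve(x, window, mode='valid')   # length 5 - 3 + 1 = 3, only complete overlaps
print(valid)                                   # moving average: 2, 3, 4
```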
19
Q

matplotlib.pyplot.subplot(*args, **kwargs)

A

Add an Axes to the current figure or retrieve an existing Axes. With this function you can draw multiple plots in one figure: subplot(nrows, ncols, index).

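import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)   # 1 row, 2 columns, first plot
plt.plot(x, y)

x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)   # 1 row, 2 columns, second plot
plt.plot(x, y)
plt.show()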
20
Q

numpy.asarray(a, dtype=None, order=None, *, like=None)

A

Convert the input to an array.

a = [1, 2]
np.asarray(a)
👉 array([1, 2])
a = np.array([1, 2], dtype=np.float32)

np.asarray(a, dtype=np.float32) is a
👉 True

np.asarray(a, dtype=np.float64) is a
👉 False
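The "is a" results above mean asarray does not copy an existing array of matching dtype, so changes through the result are visible in the original; np.array copies by default. A sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
view = np.asarray(a)   # already an ndarray with a matching dtype: no copy
copy = np.array(a)     # np.array makes a copy by default

view[0] = 99
print(a[0])            # 99: a and view share memory
print(copy[0])         # 1: the copy is unaffected
```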
21
Q

sklearn.pipeline.make_pipeline(*steps, memory=None, verbose=False)

A

Construct a Pipeline from the given estimators.

from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

make_pipeline(StandardScaler(), GaussianNB(priors=None))
👉 Pipeline(steps=[('standardscaler', StandardScaler()), ('gaussiannb', GaussianNB())])
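The pipeline behaves like a single estimator: fit() fits the scaler and then the classifier on the scaled data, and the step names are the lowercased class names. A sketch on the iris dataset, chosen only as convenient example data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(), GaussianNB())
pipe.fit(X_train, y_train)          # the scaler is fitted on training data only
print(list(pipe.named_steps))       # ['standardscaler', 'gaussiannb']
score = pipe.score(X_test, y_test)  # accuracy on the held-out data
print(score)
```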
22
Q

ipywidgets.interact

A

Automatically creates user interface (UI) controls for exploring code and data interactively. It is the easiest way to get started using IPython’s widgets.

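import ipywidgets as widgets

def say_something(x):
    print(f'Widget says: {x}')

widgets.interact(say_something, x=[0, 1, 2, 3])   # list -> dropdown
widgets.interact(say_something, x=(0, 10, 1))     # int tuple (min, max, step) -> integer slider
widgets.interact(say_something, x=(0, 10, .5))    # float tuple -> float slider
_ = widgets.interact(say_something, x=True)       # bool -> checkbox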
22
Q

sklearn.metrics.classification_report(y_true, y_pred, *, labels=None, target_names=None, sample_weight=None, digits=2, output_dict=False, zero_division='warn')

A

Build a text report showing the main classification metrics.

from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
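With output_dict=True the same metrics come back as a nested dict, which is easier to use in code than the printed table. A sketch on the same labels, with the values worked out by hand in the comments:

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

report = classification_report(y_true, y_pred, target_names=target_names, output_dict=True)
print(report['class 0']['precision'])   # 0.5: one of the two "class 0" predictions is right
print(report['class 2']['recall'])      # 2 of the 3 true "class 2" samples were found
print(report['accuracy'])               # 3 correct out of 5
```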
23
Q

sklearn.tree.export_graphviz(decision_tree, out_file=None, *, max_depth=None, feature_names=None, class_names=None, label='all', filled=False, leaves_parallel=False, impurity=True, node_ids=False, proportion=False, rotate=False, rounded=False, special_characters=False, precision=3, fontname='helvetica')

A

Visualize decision trees. This function generates a GraphViz representation of the decision tree, which is then written into out_file. Once exported, graphical renderings can be generated, e.g. with Graphviz: dot -Tpng tree.dot -o tree.png.

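import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# assumption: the iris dataset, matching the feature and class names below
X, y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, Y_train)

tree.plot_tree(clf);   # quick plot with matplotlib, no Graphviz needed
fn = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
cn = ['setosa', 'versicolor', 'virginica']
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=300)
tree.plot_tree(clf, feature_names=fn, class_names=cn, filled=True);
fig.savefig('imagename.png')

tree.export_graphviz(clf, out_file="tree.dot", feature_names=fn, class_names=cn, filled=True)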
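With out_file=None, export_graphviz returns the DOT source as a string instead of writing a file, which is handy for passing it to a renderer directly. A sketch, again on iris:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

dot_source = export_graphviz(clf, out_file=None,          # return a string
                             feature_names=iris.feature_names,
                             class_names=list(iris.target_names),
                             filled=True)
print(dot_source.splitlines()[0])   # first line of the DOT graph definition
```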
24
Q

sklearn.ensemble.BaggingClassifier(estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)
(estimator was named base_estimator before scikit-learn 1.2)

A

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (by voting or by averaging) to form a final prediction.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# scikit-learn >= 1.2 uses estimator=; older versions use base_estimator=
clf = BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0).fit(X, y)

clf.predict([[0, 0, 0, 0]])
👉 array([1])
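Each base classifier sees a bootstrap sample, so the rows it never saw can be used for an "out-of-bag" accuracy estimate without a separate test set. A sketch with decision trees; the estimator is passed positionally so the call works under both the old and new parameter name:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        oob_score=True, random_state=0).fit(X, y)

print(len(clf.estimators_))   # 25 fitted base classifiers
print(clf.oob_score_)         # accuracy on the samples each tree did not see
```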

25
Q

matplotlib.pyplot.hlines(y, xmin, xmax, colors=None, linestyles='solid', label='', *, data=None, **kwargs)

A

Plot horizontal lines at each y from xmin to xmax.

import matplotlib.pyplot as plt

plt.hlines(y=1, xmin=1, xmax=4, colors='k', label="black line")
plt.hlines(y=1.6, xmin=1.5, xmax=4.5, colors='r')
plt.text(1, 1.6, 'Red line', ha='left', va='center')
plt.hlines(y=2, xmin=2, xmax=5)
plt.legend()
plt.show()
26
Q

numpy.apply_along_axis(func1d, axis, arr, *args, **kwargs)

A

Apply a function to 1-D slices along the given axis.

b = np.array([[8,1,7], [4,3,9], [5,2,6]])
np.apply_along_axis(sorted, 1, b)

👉 array([[1, 7, 8],
          [3, 4, 9],
          [2, 5, 6]])
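The function may also reduce each slice to a scalar, and axis selects whether it receives columns (axis=0) or rows (axis=1). A sketch with a small helper defined just for this example:

```python
import numpy as np

b = np.array([[8, 1, 7], [4, 3, 9], [5, 2, 6]])

def value_range(v):
    # receives one 1-D slice and returns a scalar
    return v.max() - v.min()

per_column = np.apply_along_axis(value_range, 0, b)   # [4 2 3]
per_row = np.apply_along_axis(value_range, 1, b)      # [7 6 4]
print(per_column, per_row)
```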
27
Q

numpy.ndarray.astype(dtype, order='K', casting='unsafe', subok=True, copy=True)

A

Copy of the array, cast to a specified type.

x = np.array([1, 2, 2.5])
x.astype(int)
👉 array([1, 2, 2])
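Casting float to int truncates toward zero rather than rounding, which matters for negative numbers; round first with np.rint if needed. astype can also parse numeric strings. A sketch:

```python
import numpy as np

x = np.array([-1.7, 2.5, 3.9])
truncated = x.astype(int)          # [-1  2  3]: truncation toward zero
rounded = np.rint(x).astype(int)   # [-2  2  4]: np.rint rounds half to even, so 2.5 -> 2

s = np.array(['1.5', '2.25'])
parsed = s.astype(float)           # [1.5  2.25]
print(truncated, rounded, parsed)
```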
28
Q

numpy.cumsum(a, axis=None, dtype=None, out=None)

A

Return the cumulative sum of the elements along a given axis.

a = np.array([[1,2,3], [4,5,6]])

np.cumsum(a)
👉 array([ 1,  3,  6, 10, 15, 21])

np.cumsum(a, dtype=float)     # specifies type of output value(s)
👉 array([  1.,   3.,   6.,  10.,  15.,  21.])
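Without axis the array is flattened first; with axis the running sum goes down the rows (axis=0) or along each row (axis=1). A sketch on the same array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
down = np.cumsum(a, axis=0)    # [[1 2 3] [5 7 9]]
along = np.cumsum(a, axis=1)   # [[ 1  3  6] [ 4  9 15]]
print(down)
print(along)
```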
29
Q

numpy.linalg.eig(a)

A

Compute the eigenvalues and right eigenvectors of a square array.

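import numpy as np
from numpy import linalg as LA

w, v = LA.eig(np.diag((1, 2, 3)))
w   # eigenvalues
👉 array([1., 2., 3.])
v   # eigenvectors, one per column
👉 array([[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]])
A real matrix can also possess complex eigenvalues and eigenvectors; the eigenvalues then come in complex-conjugate pairs.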
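A sketch of such a matrix: [[1, -1], [1, 1]] has characteristic polynomial (1 - λ)² + 1 = 0, so its eigenvalues are 1 ± i. The loop checks the defining relation A v = λ v for each column of v.

```python
import numpy as np
from numpy import linalg as LA

A = np.array([[1., -1.], [1., 1.]])
w, v = LA.eig(A)
print(w)   # 1+1j and 1-1j, a complex-conjugate pair

# each column v[:, i] satisfies A @ v[:, i] == w[i] * v[:, i]
for i in range(len(w)):
    assert np.allclose(A @ v[:, i], w[i] * v[:, i])
```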
30
Q

sklearn.model_selection.cross_val_score(estimator, X, y=None, *, groups=None, scoring=None, cv=None, n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', error_score=nan)

A

Evaluate a model by cross-validation, which also shows how stable its performance is. The training data is split into k parts (folds); each fold in turn is held out as the evaluation set while the model is trained on the remaining (k-1) folds, and the k scores are returned as an array.

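from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)   # the scores below are from the iris example in the scikit-learn user guide
clf = LogisticRegression(max_iter=1000)
cross_val_score(clf, X, y, cv=10)
clf = svm.SVC(kernel='linear', C=1, random_state=42)
scores = cross_val_score(clf, X, y, cv=5)
👉 array([0.96..., 1. , 0.96..., 0.96..., 1. ])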
31
Q

sklearn.set_config(assume_finite=None, working_memory=None, print_changed_only=None, display=None, pairwise_dist_chunk_size=None, enable_cython_pairwise_dist=None)

A

Set global scikit-learn configuration.

import sklearn
sklearn.set_config(display='diagram')
# equivalent:
from sklearn import set_config; set_config(display="diagram")
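set_config changes settings globally; for a temporary change, config_context restores the previous values when the with-block ends, and get_config shows the current ones. A sketch using the assume_finite option:

```python
from sklearn import config_context, get_config

print(get_config()["assume_finite"])   # False by default

with config_context(assume_finite=True):   # temporary change
    inside = get_config()["assume_finite"]

after = get_config()["assume_finite"]
print(inside, after)                    # True False
```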
32
Q

statsmodels.tsa.arima_process.ArmaProcess(ar=None, ma=None, nobs=100)

A

Theoretical properties of an ARMA process for specified lag polynomials. Used in time series analysis and forecasting.

import numpy as np
import statsmodels.api as sm

np.random.seed(12345)
arparams = np.array([.75, -.25])
maparams = np.array([.65, .35])
ar = np.r_[1, -arparams]  # add zero-lag and negate
ma = np.r_[1, maparams]   # add zero-lag
arma_process = sm.tsa.ArmaProcess(ar, ma)

arma_process.isstationary
👉 True

arma_process.isinvertible
👉 True
32
Q

statsmodels.graphics.tsaplots.plot_acf(x, ax=None, lags=None, *, alpha=0.05, use_vlines=True, adjusted=False, fft=False, missing='none', title='Autocorrelation', zero=True, auto_ylims=False, bartlett_confint=True, vlines_kwargs=None, **kwargs), and the analogous plot_pacf

A

Plot the autocorrelation function. Plots lags on the horizontal and the correlations on the vertical axis.

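import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

dta = sm.datasets.sunspots.load_pandas().data
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
del dta["YEAR"]
sm.graphics.tsa.plot_acf(dta.values.squeeze(), lags=40)
sm.graphics.tsa.plot_pacf(dta.values.squeeze(), lags=40)   # partial autocorrelation
plt.show()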
33
Q

statsmodels.tsa.stattools.pacf(x, nlags=None, method='ywadjusted', alpha=None)

A

Partial autocorrelation estimate.
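The partial autocorrelation at lag k is the correlation between x_t and x_{t-k} after removing the effect of the intermediate lags. One way to estimate it (the idea behind pacf's method='ols') is to regress x_t on x_{t-1}, ..., x_{t-k} and keep the last coefficient. A NumPy-only sketch on a simulated AR(1) series with coefficient 0.7; pacf_ols is a hypothetical helper, not the statsmodels function, and it demeans once instead of fitting an intercept in every regression.

```python
import numpy as np

def pacf_ols(x, nlags):
    """Hypothetical helper: PACF via least-squares regressions on lagged values."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # demean once
    n = len(x)
    result = [1.0]                        # lag 0 is 1 by definition
    for k in range(1, nlags + 1):
        # column j-1 holds x[t - j] for t = k, ..., n - 1
        lags = np.column_stack([x[k - j: n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, x[k:], rcond=None)
        result.append(coef[-1])           # coefficient on the farthest lag x[t - k]
    return np.array(result)

# simulate an AR(1) series: x_t = 0.7 * x_{t-1} + e_t
rng = np.random.default_rng(0)
n = 5000
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]

pacf_values = pacf_ols(x, nlags=3)
print(pacf_values)   # roughly [1, 0.7, 0, 0]: an AR(1) PACF cuts off after lag 1
```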

34
Q

statsmodels.tsa.stattools.acf(x, adjusted=False, nlags=None, qstat=False, fft=True, alpha=None, bartlett_confint=True, missing='none')

A

Autocorrelation function for 1d arrays.
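The lag-k sample autocorrelation is the sum of products of deviations from the mean that lie k steps apart, divided by the sum of squared deviations; this is the non-adjusted estimator that acf uses with adjusted=False. A NumPy sketch with a hypothetical helper acf_manual, checked by hand on a tiny series:

```python
import numpy as np

def acf_manual(x, nlags):
    """Hypothetical helper: sample autocorrelation normalised by the lag-0 term."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom for k in range(nlags + 1)])

# deviations of [1, 2, 3, 4, 5] are [-2, -1, 0, 1, 2]; their squares sum to 10
r = acf_manual([1, 2, 3, 4, 5], nlags=2)
print(r)   # lag 1: 4 / 10 = 0.4, lag 2: -1 / 10 = -0.1
```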

35
Q

statsmodels.tsa.stattools.adfuller(x, maxlag=None, regression='c', autolag='AIC', store=False, regresults=False)

A

ADF (Augmented Dickey-Fuller test) is a way to check whether a series is stationary. Its null hypothesis is that the series has a unit root, i.e. is non-stationary; a p-value below the chosen significance level (e.g. 0.05) rejects it. It is a standard test in time series analysis.

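from statsmodels.tsa.stattools import adfuller

# result_add / result_mul: seasonal_decompose results (see the seasonal_decompose card); index [1] is the p-value
adfuller(result_add.resid.dropna())[1]
👉 0.0002852221054737700
adfuller(result_mul.resid.dropna())[1]
👉 1.747259579533223e-07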
36
Q

sklearn.cluster.MiniBatchKMeans(n_clusters=8, *, init='k-means++', max_iter=100, batch_size=1024, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)

A

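Mini-batch variant of KMeans: it updates the cluster centers from small random batches of the data, which makes it much faster on large datasets (e.g. text collections), usually at the cost of slightly worse clusters. Clustering represents a set of objects as vectors and then divides them into groups according to their similarity to each other.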

import numpy as np
from sklearn.cluster import MiniBatchKMeans

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 0], [4, 4], [4, 5], [0, 1], [2, 2], [3, 2], [5, 5], [1, -1]])

kmeans = MiniBatchKMeans(n_clusters=2, random_state=0, batch_size=6)
kmeans = kmeans.partial_fit(X[0:6, :])    # incremental training on the first chunk
kmeans = kmeans.partial_fit(X[6:12, :])   # ...and on the second chunk

kmeans.cluster_centers_
👉 array([[2. , 1. ], [3.5, 4.5]])

kmeans.predict([[0, 0], [4, 4]])
👉 array([0, 1], dtype=int32)
37
Q

scipy.cluster.hierarchy.dendrogram(Z, p=30, truncate_mode=None, color_threshold=None, get_leaves=True, orientation='top', labels=None, count_sort=False, distance_sort=False, show_leaf_counts=True, no_plot=False, no_labels=False, leaf_font_size=None, leaf_rotation=None, leaf_label_func=None, show_contracted=False, link_color_func=None, ax=None, above_threshold_color='C0')

A

Plot the hierarchical clustering as a dendrogram. The dendrogram illustrates how each cluster is composed by drawing a U-shaped link between a non-singleton cluster and its children.
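The input Z is a linkage matrix, typically produced by scipy.cluster.hierarchy.linkage. A sketch on six made-up 2-D points; no_plot=True computes the layout without drawing, so the result can be inspected (drop it to draw the tree with matplotlib):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

Z = linkage(X, method="ward")       # (n - 1) x 4: merged cluster ids, distance, new cluster size
tree = dendrogram(Z, no_plot=True)  # layout only, nothing is drawn
print(tree["ivl"])                  # leaf labels in left-to-right order
```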

38
Q

numpy.exp(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])

A

Calculate the exponential of all elements in the input array.

np_array_1d = np.array([0, 1, 2, 3, 4])
np.exp(np_array_1d)
👉 array([ 1. ,  2.71828183,  7.3890561 , 20.08553692, 54.59815003])
39
Q

pmdarima.arima.auto_arima(y, X=None, start_p=2, d=None, start_q=2, max_p=5, max_d=2, max_q=5, start_P=1, D=None, start_Q=1, max_P=2, max_D=1, max_Q=2, max_order=5, m=1, seasonal=True, stationary=False, information_criterion='aic', alpha=0.05, test='kpss', seasonal_test='ocsb', stepwise=True, n_jobs=1, start_params=None, trend=None, method='lbfgs', maxiter=50, offset_test_args=None, seasonal_test_args=None, suppress_warnings=True, error_action='trace', trace=False, random=False, random_state=None, n_fits=10, return_valid_fits=False, out_of_sample_size=0, scoring='mse', scoring_args=None, with_intercept='auto', sarimax_kwargs=None, **fit_args)

A

Searches for the optimal order and the optimal seasonal order of an ARIMA model based on a given criterion, such as AIC or BIC.

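import pmdarima as pm

# df['linearized']: a time series column assumed to be prepared earlier
smodel = pm.auto_arima(df['linearized'], start_p=1, max_p=2, start_q=1, max_q=2,
                       trend='t', seasonal=False, trace=True)   # trace=True prints every model tried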
40
Q

sklearn.feature_extraction.text.TfidfVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer='word', stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.float64'>, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)

A

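Converts a collection of raw documents to a matrix of TF-IDF features: word counts reweighted so that words frequent in one document but rare across the collection get higher weights.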

from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ['This is the first document.', 'This document is the second document.',
'And this is the third one.', 'Is this the first document?',]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()
👉 array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this'], ...)
print(X.shape)
👉 (4, 9)
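With the default norm='l2', every document row of the TF-IDF matrix is scaled to Euclidean length 1, so rows can be compared directly with dot products (cosine similarity). A sketch checking this on the same corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ['This is the first document.', 'This document is the second document.',
          'And this is the third one.', 'Is this the first document?']
X = TfidfVectorizer().fit_transform(corpus)   # sparse matrix, one row per document

row_norms = np.linalg.norm(X.toarray(), axis=1)
print(row_norms)   # each row has length 1
```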
41
Q

sklearn.feature_extraction.text.CountVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), analyzer='word', max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.int64'>)

A

Converts a collection of text documents to a matrix of token counts (i.e. it counts the words: term frequency, TF).

from sklearn.feature_extraction.text import CountVectorizer

texts=["dog cat fish","dog cat cat","fish bird", 'bird']
cv = CountVectorizer()
cv_fit=cv.fit_transform(texts)

print(cv.get_feature_names_out())   # get_feature_names() was removed in scikit-learn 1.2
print(cv.vocabulary_)               # term -> column index
print(cv_fit.toarray())
42
Q

statsmodels.tsa.seasonal.seasonal_decompose(x, model='additive', filt=None, period=None, two_sided=True, extrapolate_trend=0)

A

Seasonal decomposition using moving averages.

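from statsmodels.tsa.seasonal import seasonal_decompose

# df['value']: a time series with a DatetimeIndex, assumed to be loaded earlier
result_mul = seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')
result_add = seasonal_decompose(df['value'], model='additive', extrapolate_trend='freq')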
43
Q

statsmodels.tsa.statespace.sarimax.SARIMAX(endog, exog=None, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, measurement_error=False, time_varying_regression=False, mle_regression=True, simple_differencing=False, enforce_stationarity=True, enforce_invertibility=True, hamilton_representation=False, concentrate_scale=False, trend_offset=1, use_exact_diffuse=False, dates=None, freq=None, missing='none', validate_specification=True, **kwargs)

A

Seasonal AutoRegressive Integrated Moving Average model with eXogenous regressors (SARIMAX).

from statsmodels.tsa.statespace.sarimax import SARIMAX

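# train / test: a time series split into training and hold-out parts, assumed to exist
sarima = SARIMAX(train, order=(0, 1, 1), seasonal_order=(2, 0, 2, 12))
sarima = sarima.fit(maxiter=75)   # fit() returns a results object

# Forecast len(test) steps ahead with a 95% confidence interval
results = sarima.get_forecast(len(test))
forecast = results.predicted_mean
confidence_int = results.conf_int(alpha=0.05)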
44
Q

sklearn.model_selection.TimeSeriesSplit(n_splits=5, *, max_train_size=None, test_size=None, gap=0)

A

Time series cross-validator. Provides train/test indices to split time series data samples that are observed at fixed time intervals into train/test sets. In each split, the test indices must be higher than before, so shuffling in this cross-validator is inappropriate.

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4, 5, 6])
tscv = TimeSeriesSplit()
print(tscv)
👉 TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
for train_index, test_index in tscv.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

👉
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
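test_size fixes the length of each test window and gap leaves out samples between the end of the training set and the start of the test set (useful when neighbouring observations leak information). A sketch on 6 samples, with the splits worked out from the test windows counted back from the end of the series:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(6, 2)   # 6 time-ordered samples
tscv = TimeSeriesSplit(n_splits=2, test_size=1, gap=1)

splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in splits:
    print("TRAIN:", train, "TEST:", test)
# TRAIN: [0, 1, 2] TEST: [4]      (sample 3 is the gap)
# TRAIN: [0, 1, 2, 3] TEST: [5]   (sample 4 is the gap)
```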
45
Q

statsmodels.tsa.arima_model.ARIMAResults.plot_predict(start=None, end=None, exog=None, dynamic=False, alpha=0.05, plot_insample=True, ax=None)

A

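Plot forecasts. Note: the statsmodels.tsa.arima_model module (ARMA, ARIMA and this method) was removed in statsmodels 0.13; the current equivalent is statsmodels.graphics.tsaplots.plot_predict(result, start, end, ...), used in the code below.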

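import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_predict
from statsmodels.tsa.arima.model import ARIMA

dta = sm.datasets.sunspots.load_pandas().data[['SUNACTIVITY']]
dta.index = pd.date_range(start='1700', end='2009', freq='YE')   # use freq='A' on pandas < 2.2
res = ARIMA(dta, order=(3, 0, 0)).fit()   # AR(3), replaces the removed sm.tsa.ARMA(dta, (3, 0))
fig, ax = plt.subplots()
ax = dta.loc['1950':].plot(ax=ax)
plot_predict(res, '1990', '2012', dynamic=True, ax=ax)

plt.show()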
46
Q

Stationary series

A

A series whose statistical properties (such as mean, variance and autocorrelation) do not change over time.
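A common way to make a non-stationary series stationary is differencing (the d in ARIMA). A minimal sketch with simulated data: a random walk, whose variance grows with time, is the cumulative sum of white-noise steps, and first differencing recovers those stationary steps.

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.normal(size=1000)   # white noise: constant mean and variance, stationary
walk = np.cumsum(steps)         # random walk: variance grows with t, non-stationary

diffed = np.diff(walk)          # first-order differencing undoes the cumulative sum
print(np.allclose(diffed, steps[1:]))   # True
```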

47
Q

statsmodels.tsa.arima.model.ARIMA(endog, exog=None, order=(0, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, enforce_stationarity=True, enforce_invertibility=True, concentrate_scale=False, trend_offset=1, dates=None, freq=None, missing='none', validate_specification=True)

A

One of the most popular and powerful algorithms for analyzing and forecasting time series data. It uses past values (and past forecast errors) to predict the next value.

from statsmodels.tsa.arima.model import ARIMA

# df['linearized']: a time series column assumed to be prepared earlier
arima = ARIMA(df['linearized'], order=(2, 1, 1), trend='t')   # order = (p, d, q)
arima = arima.fit()