Intermediate ML Kaggle Flashcards

1
Q

to speed up (in one word)

A

to accelerate

2
Q

to cope with data types

A

tackle data types

3
Q

to design processing chains for data analysis

A

design pipelines for data analysis

4
Q

A leak in data science, when the model peeks at the correct answer

A

leakage

5
Q

What is a Pipeline?

It combines all data processing and model training into a single chain.

A

A Pipeline combines all data preprocessing and model training into a single workflow.
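
As a rough illustration of this card, here is a minimal sketch that assumes scikit-learn's `Pipeline` is the Pipeline in question. The toy data, the step names (`"imputer"`, `"model"`), and the choice of `SimpleImputer` and `LinearRegression` are assumptions made for the example, not part of the original card.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Toy data, made up for illustration: one feature with a missing value.
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# Preprocessing and the model are bundled into one object.
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),  # fill missing values
    ("model", LinearRegression()),                # then fit the model
])

# A single fit call runs the imputer and then trains the model.
pipeline.fit(X, y)

# A single predict call applies the same preprocessing to new data.
predictions = pipeline.predict(np.array([[3.0], [np.nan]]))
```

Because the imputer is fitted inside the pipeline, its statistics come only from the data passed to `fit`, which is one way such a workflow helps avoid leakage.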

6
Q

Validation and cross-validation

A

In cross-validation, we run our modeling process on different subsets of the data to get multiple measures of model quality.

For example, we could begin by dividing the data into 5 pieces, each 20% of the full dataset. In this case, we say that we have broken the data into 5 “folds”.
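
The fold idea can be sketched in plain Python. This is only an illustration: the helper `k_fold_indices`, the toy targets `y`, and the trivial "predict the training mean" model are hypothetical and not from the card. In practice a library routine (for example scikit-learn's `cross_val_score`) would normally be used instead.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Return a list of (train_indices, validation_indices), one pair per fold."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits

# Toy targets, made up for illustration.
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0, 12.0, 11.0]
splits = k_fold_indices(n_samples=len(y), n_folds=5)

# One quality measure (mean absolute error) per fold.
scores = []
for train, val in splits:
    mean_pred = sum(y[i] for i in train) / len(train)  # "model" fit on train only
    mae = sum(abs(y[i] - mean_pred) for i in val) / len(val)
    scores.append(mae)

average_score = sum(scores) / len(scores)
```

Each fold serves as the validation set exactly once, so with 5 folds there are 5 scores, and their average is typically reported as the overall estimate of model quality.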

7
Q

The most accurate modeling technique for structured data

A

XGBoost (an implementation of gradient boosting)

8
Q

ensemble method

A

We refer to the random forest method as an “ensemble method”. By definition, ensemble methods combine the predictions of several models (e.g., several trees, in the case of random forests).
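
A minimal sketch of "combining the predictions of several models", assuming simple averaging as the combination rule (which is how random forest regressors typically combine their trees). The three toy models and the helper `ensemble_predict` are hypothetical stand-ins for trained trees.

```python
def ensemble_predict(models, x):
    """Average the predictions of several models for a single input x."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)

# Three hypothetical "models": each is slightly off in a different direction.
models = [
    lambda x: 2.0 * x,        # unbiased
    lambda x: 2.0 * x + 1.0,  # predicts a bit too high
    lambda x: 2.0 * x - 1.0,  # predicts a bit too low
]

prediction = ensemble_predict(models, 3.0)  # (6.0 + 7.0 + 5.0) / 3 = 6.0
```

Averaging lets the individual models' errors partly cancel out, which is the intuition behind why an ensemble can be more accurate than any single member.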
