Intermediate ML Kaggle Flashcards
to speed up (in one word)
to accelerate
to deal with data types
tackle data types
to design processing chains for data analysis
design pipelines for data analysis
A leak in data science, when the model peeks at the correct answer
leakage
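One common form of leakage can be sketched in a few lines: a preprocessing statistic computed from all rows, so that validation data influences training. The feature values and the train/validation split below are hypothetical, chosen only to make the difference visible.

```python
# Hypothetical feature column; the last row is held out for validation.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, valid = values[:4], values[4:]

# Leaky: the mean is computed from every row, including the validation row,
# so information from the held-out data reaches the training step.
leaky_mean = sum(values) / len(values)

# Safe: the mean is computed from the training rows only.
safe_mean = sum(train) / len(train)

print(leaky_mean, safe_mean)
```

The two means differ sharply here because the held-out row is an outlier; any step fitted on the full dataset before splitting carries the same risk.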
What is a Pipeline?
A Pipeline combines all data preprocessing and model training into a single workflow.
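The idea can be illustrated with a toy, pure-Python sketch. The classes `MeanImputer`, `LineThroughOrigin`, and `ToyPipeline` are hypothetical names invented for this example, not a library API; scikit-learn's `Pipeline` class offers the same pattern in practice.

```python
from statistics import mean


class MeanImputer:
    """Preprocessing step: replace None with the mean seen during fit."""

    def fit(self, xs):
        self.fill_ = mean(x for x in xs if x is not None)
        return self

    def transform(self, xs):
        return [self.fill_ if x is None else x for x in xs]


class LineThroughOrigin:
    """Model step: least-squares fit of y = slope * x."""

    def fit(self, xs, ys):
        self.slope_ = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, xs):
        return [self.slope_ * x for x in xs]


class ToyPipeline:
    """Chain one preprocessing step and one model behind fit/predict."""

    def __init__(self, preprocessor, model):
        self.preprocessor = preprocessor
        self.model = model

    def fit(self, xs, ys):
        # The preprocessor is fitted on the training data only.
        cleaned = self.preprocessor.fit(xs).transform(xs)
        self.model.fit(cleaned, ys)
        return self

    def predict(self, xs):
        # New data goes through the same, already fitted preprocessing.
        return self.model.predict(self.preprocessor.transform(xs))


# Hypothetical toy data with one missing feature value.
pipeline = ToyPipeline(MeanImputer(), LineThroughOrigin())
pipeline.fit([1.0, 2.0, None, 4.0], [2.0, 4.0, 5.0, 8.0])
predictions = pipeline.predict([3.0, None])
print(predictions)
```

Because preprocessing and the model sit behind one `fit` call, the imputation statistic is learned from training data alone and reapplied unchanged at prediction time.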
Validation and cross-validation
In cross-validation, we run our modeling process on different subsets of the data to get multiple measures of model quality.
For example, we could begin by dividing the data into 5 pieces, each 20% of the full dataset. In this case, we say that we have broken the data into 5 “folds”.
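The 5-fold split described above can be sketched in pure Python. The helper `k_fold_indices`, the toy targets, and the trivial "predict the training mean" model are all assumptions made for illustration; rows are not shuffled here, which real use typically requires.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) for each fold."""
    fold_size = n_samples // n_folds
    indices = list(range(n_samples))
    for fold in range(n_folds):
        start = fold * fold_size
        # The last fold absorbs any leftover rows.
        stop = n_samples if fold == n_folds - 1 else start + fold_size
        yield indices[:start] + indices[stop:], indices[start:stop]


# Hypothetical target values; the "model" just predicts the training mean.
targets = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0, 12.0, 11.0]

scores = []
for train_idx, valid_idx in k_fold_indices(len(targets), n_folds=5):
    train_mean = sum(targets[i] for i in train_idx) / len(train_idx)
    # Mean absolute error on the held-out fold.
    mae = sum(abs(targets[i] - train_mean) for i in valid_idx) / len(valid_idx)
    scores.append(mae)

average_score = sum(scores) / len(scores)
print(scores, average_score)
```

Each row is used for validation exactly once, and the five per-fold scores are the "multiple measures of model quality" that get averaged.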
Often cited as the most accurate modeling technique for structured (tabular) data
XGBoost
ensemble method
We refer to the random forest method as an “ensemble method”. By definition, ensemble methods combine the predictions of several models (e.g., several trees, in the case of random forests).
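A minimal sketch of the averaging idea follows. It is not a real random forest: each "tree" is a hypothetical one-split stump (`fit_stump`) trained on a bootstrap sample of made-up data, and the ensemble prediction is the mean of the individual predictions.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression 'tree': split at the median x and
    predict the mean y on each side of the split."""
    threshold = sorted(xs)[len(xs) // 2]
    overall = sum(ys) / len(ys)
    left = [y for x, y in zip(xs, ys) if x < threshold]
    right = [y for x, y in zip(xs, ys) if x >= threshold]
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return lambda x: left_mean if x < threshold else right_mean


# Hypothetical toy data: y grows linearly with x.
xs = list(range(10))
ys = [2.0 * x for x in xs]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
stumps = []
for _ in range(25):
    # Bootstrap sample: draw row indices with replacement.
    sample = [rng.randrange(len(xs)) for _ in xs]
    stumps.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))


def ensemble_predict(x):
    """Average the predictions of all stumps."""
    return sum(stump(x) for stump in stumps) / len(stumps)


print(ensemble_predict(0), ensemble_predict(9))
```

Each stump alone is a crude model; averaging many of them, each fitted to a slightly different sample, is what makes the combination an ensemble.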