Class One Flashcards
What is data pre-processing in machine learning?
Data pre-processing refers to the techniques and methods used to transform raw data into a clean and meaningful format suitable for machine learning algorithms.
What are the advantages of data pre-processing?
Advantages of data pre-processing include improved data quality, reduced noise and outliers, enhanced model performance, and better interpretability of results.
What are the steps involved in data pre-processing?
The steps in data pre-processing typically include data cleaning, handling missing values, handling outliers, feature scaling, and feature encoding.
What is exploratory data analysis (EDA)?
Exploratory data analysis is the process of analyzing and visualizing data to gain insights, understand the underlying patterns, detect outliers, and make informed decisions about further analysis.
What are the goals of exploratory data analysis?
The goals of exploratory data analysis are to understand the distribution of variables, identify relationships between variables, detect anomalies or outliers, and uncover hidden patterns in the data.
What are some common techniques used in exploratory data analysis?
Some common techniques used in exploratory data analysis include summary statistics, data visualization (e.g., histograms, scatter plots), correlation analysis, and dimensionality reduction.
Why is exploratory data analysis important in machine learning?
Exploratory data analysis helps in understanding the characteristics and structure of the data, identifying data quality issues, selecting appropriate features, and guiding the choice of machine learning models.
When should you use data pre-processing techniques?
Data pre-processing techniques should be used when dealing with raw, noisy, or incomplete data, or when preparing data for machine learning algorithms that have specific requirements
What are some common challenges in data pre-processing?
Common challenges in data pre-processing include handling missing data, dealing with outliers, selecting appropriate feature scaling methods, and determining the best strategy for feature encoding.
How does data pre-processing impact machine learning model performance?
Proper data pre-processing can significantly improve machine learning model performance by reducing noise, removing bias, handling missing values, and ensuring that the data is in a suitable format for the chosen algorithm.
What is Data Science?
The ability to take data - to be able
to understand it, to process it, to
extract value from it, to visualize it,
to communicate it.
What is Precision?
TP/(TP+FP)
What is Recall?
TP/(TP+FN)
What is Accuracy?
(TN+TP) / (TN+FP+FN+TP)
What is F1 Score?
2 ( Precision* Recall)/(Precision+ Recall)