Big Data Projects Flashcards
Identify the steps in a data analysis project.
Conceptualization of the modeling task.
Data collection.
Data preparation and wrangling.
Data exploration.
Model training.
Conceptualization of the modeling task.
define the problem at hand
Data preparation and wrangling
cleaning the data set and preparing it for the model
Data exploration.
feature selection and engineering and initial data analysis
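Data exploration usually starts with descriptive statistics for each feature and then narrows the feature set. The following is a minimal sketch that assumes numeric features are stored as lists in a dict. The function names are hypothetical, and the correlation filter with its 0.5 threshold is only one illustrative selection rule, not a prescribed method. A constant feature (zero spread) would cause a division by zero in `pearson`, so such features should be dropped first.

```python
from statistics import mean, pstdev


def summarize(values):
    """Initial data analysis: basic descriptive statistics for one numeric feature."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "std": pstdev(values)}


def pearson(x, y):
    """Pearson correlation between a feature x and the target y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # undefined for a constant feature (sx == 0)


def select_features(features, target, threshold=0.5):
    """Feature selection: keep features whose |correlation| with the target meets the threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]


# Hypothetical example data.
features = {"x1": [1, 2, 3, 4], "x2": [4, 1, 3, 2]}
target = [2, 4, 6, 8]
print(select_features(features, target))  # x1 is kept; x2 (r = -0.4) is dropped
```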
Model training
determining the appropriate ML algorithm to use, evaluating the algorithm using a training data set, and tuning the model.
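Tuning typically means fitting the model on training data once for each candidate hyperparameter value and keeping the value that performs best on held-out data. The following is a minimal sketch. It assumes a one-feature linear model with no intercept and an L2 (ridge) penalty `lam` as the hyperparameter. The function names and data are hypothetical, and the held-out set is called a validation set here.

```python
def fit_ridge_1d(x, y, lam):
    """Fit y ~ b * x (no intercept) with L2 penalty lam; closed form b = sum(xy) / (sum(x^2) + lam)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def mse(x, y, b):
    """Mean squared error of the fitted slope b on the data (x, y)."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def tune(x_train, y_train, x_val, y_val, lams):
    """Fit on training data for each candidate lam; return (lam, b, val_mse) with the lowest validation error."""
    best = None
    for lam in lams:
        b = fit_ridge_1d(x_train, y_train, lam)
        err = mse(x_val, y_val, b)
        if best is None or err < best[2]:
            best = (lam, b, err)
    return best


print(tune([1, 2, 3], [2, 4, 6], [4], [8], [0.0, 1.0, 10.0]))  # (0.0, 2.0, 0.0)
```

In this toy data the relationship is exactly y = 2x, so no penalty (`lam = 0.0`) wins. With noisy data, a larger penalty can reduce validation error by limiting overfitting.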
Steps in preparing and wrangling data
this critical step involves cleansing and organizing raw data for use in a model; it consists of data cleansing and data wrangling (preprocessing).
Data cleansing
deals with reducing errors in the raw data
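Two common errors in raw data are incomplete records and duplicate records. The following is a minimal sketch of removing both. It assumes each record is a dict with hashable values and that the caller names the required fields. The function name `cleanse` is hypothetical, and other cleansing tasks, such as fixing invalid or inconsistent values, are not covered.

```python
def cleanse(records, required):
    """Drop records missing a required field, then drop exact duplicates (first copy kept)."""
    seen = set()
    clean = []
    for rec in records:
        if any(rec.get(field) in (None, "") for field in required):
            continue  # incomplete: a required value is missing
        key = tuple(sorted(rec.items()))  # order-independent fingerprint of the record
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        clean.append(rec)
    return clean


raw = [{"id": 1, "age": 30}, {"id": 2, "age": None},
       {"id": 1, "age": 30}, {"id": 3, "age": 45}]
print(cleanse(raw, ["id", "age"]))  # [{'id': 1, 'age': 30}, {'id': 3, 'age': 45}]
```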
Data wrangling
involves preprocessing data for model use
Preprocessing includes
data transformation and scaling.
Scaling
shifting and rescaling the range of data features so that they share a common scale
Two common methods of scaling
normalization and standardization.
Normalization
rescales each value of a feature to the range [0, 1]: $X_{i,\text{normalized}} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$
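The normalization formula translates directly into code. The following is a minimal sketch for one feature stored as a list. The function name is hypothetical, and it assumes the feature is not constant, because max equal to min would make the denominator zero.

```python
def normalize(values):
    """Min-max normalization: (X_i - X_min) / (X_max - X_min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot normalize a constant feature (max == min)")
    return [(x - lo) / span for x in values]


print(normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```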
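Standardization
centers each value at the feature mean μ and scales it by the standard deviation σ, so the result has mean 0 and standard deviation 1: $X_{i,\text{standardized}} = \frac{X_i - \mu}{\sigma}$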
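The standardization formula also translates directly into code. The following is a minimal sketch for one feature stored as a list. The function name is hypothetical. This version uses the population standard deviation (`statistics.pstdev`); using the sample standard deviation (`statistics.stdev`) instead is an equally common choice.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: (X_i - mu) / sigma, giving mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature (sigma == 0)")
    return [(x - mu) / sigma for x in values]


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))  # mu = 5, sigma = 2
```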
Text processing
cleansing and preprocessing of text-based data.
Text cleansing involves the following steps:
Remove HTML tags.
Remove punctuation.
Remove numbers.
Remove extra white space.
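The four cleansing steps above can be sketched with regular expressions, applied in the listed order. The following is a minimal sketch. The function name and patterns are illustrative assumptions, and a simple tag regex is not a full HTML parser. Punctuation and numbers are replaced with a space rather than deleted so that neighboring words do not merge, and the final step collapses the leftover white space.

```python
import re


def cleanse_text(text):
    """Remove HTML tags, punctuation, numbers, and extra white space, in that order."""
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML tags such as <p> or </div>
    text = re.sub(r"[^\w\s]", " ", text)      # remove punctuation (anything not a word char or space)
    text = re.sub(r"\d+", " ", text)          # remove numbers
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of white space
    return text


print(cleanse_text("<p>Sales rose 12% in 2023!</p>"))  # Sales rose in
```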
Cleansed text is then normalized using the following steps:
Lowercasing.
Removal of stop words.
Stemming.
Lemmatization.
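The normalization steps can be sketched as a small token pipeline. The following is a toy sketch only. The stop-word set, the suffix rules, and the lemma table are hypothetical and deliberately tiny. The suffix stripper is a crude stand-in for a real stemmer such as NLTK's `PorterStemmer`, and the lookup table is a stand-in for a dictionary-based lemmatizer such as NLTK's `WordNetLemmatizer`. In practice, stemming and lemmatization are usually alternatives, so the function applies one or the other.

```python
# Hypothetical, deliberately tiny resources for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "are", "to"}
LEMMAS = {"were": "be", "was": "be", "better": "good", "mice": "mouse"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude suffix rules, not the Porter algorithm


def stem(token):
    """Stemming: strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Lemmatization: map a word to its dictionary form via a lookup table."""
    return LEMMAS.get(token, token)


def normalize_text(text, use_lemmas=False):
    tokens = text.lower().split()                        # lowercasing, then split into tokens
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    reducer = lemmatize if use_lemmas else stem          # stemming or lemmatization
    return [reducer(t) for t in tokens]


print(normalize_text("The analysts were analyzing the earnings"))
# ['analyst', 'were', 'analyz', 'earning']
```

The output illustrates the usual trade-off between the two methods. Stemming is cheap but can produce non-words such as "analyz". Lemmatization returns real dictionary forms such as "be" for "were", but it needs a vocabulary to do so.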