Big Data Projects Flashcards
Identify the steps in a data analysis project.
Conceptualization of the modeling task.
Data collection.
Data preparation and wrangling.
Data exploration.
Model training.
Conceptualization of the modeling task.
define the problem at hand
Data preparation and wrangling
cleaning the data set and preparing it for the model
Data exploration.
feature selection and engineering and initial data analysis
Model training
determining the appropriate ML algorithm to use, evaluating the algorithm using a training data set, and tuning the model.
steps of preparing and wrangling data.
critical step involves cleansing and organizing raw data for use in a model
Data cleansing
deals with reducing errors in the raw data
Data wrangling
involves preprocessing data for model use
Preprocessing includes
data transformation and scaling.
scaling
Conversion of data features to a common unit of measurement
Two common methods of scaling
normalization and standardization.
Normalization scales
normalized Xi = (Xi − Xmin) / (Xmax − Xmin)
Standardization Scales
standardized Xi = (Xi − μ) / σ
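The two scaling formulas can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample data are made up for the example, and σ is assumed here to be the population standard deviation (a sample standard deviation would give slightly different values).

```python
from statistics import mean, pstdev

def normalize(values):
    """Rescale values to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Center on the mean and divide by the standard deviation.

    Assumption: sigma is the population standard deviation (pstdev).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

data = [2.0, 4.0, 6.0, 8.0]  # illustrative values only
normalized = normalize(data)
standardized = standardize(data)
```

Both functions divide by zero if every value is identical, so a real pipeline would need to handle constant features separately.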
Text processing
cleansing and preprocessing of text-based data.
Text cleansing involves the following steps:
Remove HTML tags.
Remove punctuation.
Remove numbers.
Remove white spaces.
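The four cleansing steps above might look like this with regular expressions. This is a rough sketch: the function name `cleanse_text` and the sample string are hypothetical, and the patterns are deliberately simple (for example, the tag pattern does not handle malformed HTML).

```python
import re

def cleanse_text(raw):
    """Apply the four cleansing steps in order."""
    text = re.sub(r"<[^>]+>", " ", raw)       # 1. remove HTML tags
    text = re.sub(r"[^\w\s]", " ", text)      # 2. remove punctuation
    text = re.sub(r"\d+", " ", text)          # 3. remove numbers
    text = re.sub(r"\s+", " ", text).strip()  # 4. collapse extra white space
    return text

sample = "<p>Revenue rose 12%,  to $5 million!</p>"  # made-up example text
cleaned = cleanse_text(sample)
```

The order matters in this sketch: tags are removed first because the angle brackets would otherwise be stripped as punctuation, leaving the tag names behind as ordinary words.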
Cleansed text is then normalized using the following steps:
Lowercasing.
Removal of stop words
Stemming.
Lemmatization.
Stemming
converts all variations of a word into a common value
Lemmatization
conversion of inflected forms of a word into its lemma
tokenization
is the process of splitting a sentence into tokens
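Tokenization, lowercasing, stop-word removal, and stemming can be chained as in the sketch below. Everything in it is an assumption made for illustration: the stop-word set is a tiny hand-picked list, and `naive_stem` is a crude suffix stripper, not a standard stemming algorithm such as Porter's. Lemmatization is left out because it needs a dictionary of word forms.

```python
# Illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

# Suffixes tried in order by the crude stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(sentence):
    """Split a sentence into tokens on white space."""
    return sentence.split()

def naive_stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def normalize_tokens(sentence):
    """Lowercase, tokenize, drop stop words, then stem each token."""
    tokens = tokenize(sentence.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

tokens = normalize_tokens("The analyst is analyzing the reported earnings")
```

Note that a stem such as "analyz" is not a real word, which is the usual contrast with lemmatization, where the output is a valid lemma.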
Data exploration
seeks to evaluate the data set and determine the most appropriate way to configure it for model training
Steps in data exploration include the following:
Exploratory data analysis (EDA)
Feature selection.
Feature engineering
Exploratory data analysis (EDA)
involves looking at data descriptors
Feature selection
is a process to select only the needed attributes of the data for ML model training.
Feature engineering
is the process of creating new features by transforming existing features
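Feature selection and feature engineering can be illustrated on a toy data set. The column names, values, and the price-to-earnings ratio below are all hypothetical, chosen only to show one feature being dropped and one being derived.

```python
# Made-up records; "note" stands in for an attribute the model does not need.
rows = [
    {"id": 1, "price": 20.0, "earnings": 2.0, "note": "n/a"},
    {"id": 2, "price": 45.0, "earnings": 3.0, "note": "n/a"},
]

SELECTED = ("price", "earnings")  # assumed to be the needed attributes

def select_features(row):
    """Feature selection: keep only the attributes listed in SELECTED."""
    return {name: row[name] for name in SELECTED}

def engineer_features(row):
    """Feature engineering: derive a new feature from existing ones."""
    out = dict(row)
    out["price_to_earnings"] = row["price"] / row["earnings"]
    return out

prepared = [engineer_features(select_features(r)) for r in rows]
```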