Big Data Projects Flashcards
Steps in Big Data Analysis/Projects: Traditional with structured data.
**Conceptualize the task** -> Collect data -> Data preparation & processing -> Data exploration -> Model training.
Steps in Big Data Analysis/Projects: Textual Big Data.
Text problem formulation -> Data curation -> **Text preparation and processing** -> Text exploration -> Classifier output.
Preparation in structured data: Extraction
Creating a new variable from an existing one to ease the analysis.
Example: Date of birth -> Age
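The date-of-birth example can be sketched in a few lines. This is only an illustration: the function name `age_on` and the sample dates are assumptions, not part of the flashcards.

```python
from datetime import date

def age_on(date_of_birth: date, as_of: date) -> int:
    """Return the number of completed years between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # 33
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```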
Preparation in structured data: Aggregation
Two or more variables aggregated into a single variable.
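A minimal sketch of aggregation, assuming hypothetical fields `salary` and `other_income` that are combined into one `total_income` variable:

```python
def aggregate_income(record: dict) -> dict:
    """Replace the two income fields with a single total_income field."""
    # Keep every field except the two that are being combined.
    out = {k: v for k, v in record.items() if k not in ("salary", "other_income")}
    out["total_income"] = record["salary"] + record["other_income"]
    return out

row = {"name": "Ana", "salary": 50000.0, "other_income": 2500.0}
print(aggregate_income(row))  # {'name': 'Ana', 'total_income': 52500.0}
```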
Preparation in structured data: Filtration
Eliminate data rows which are not needed.
[We filter out the information that is not relevant]
Example: keep CFA Level II candidates only.
Preparation in structured data: Selection
Eliminate data columns that are not needed.
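Filtration works on rows and selection works on columns. The sketch below shows both on a made-up table; the field names (`exam_level`, `favorite_color`) and helper names are assumptions chosen for illustration.

```python
candidates = [
    {"id": 1, "exam_level": 2, "favorite_color": "blue"},
    {"id": 2, "exam_level": 1, "favorite_color": "red"},
    {"id": 3, "exam_level": 2, "favorite_color": "green"},
]

def filter_rows(rows, predicate):
    """Filtration: keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def select_columns(rows, keep):
    """Selection: keep only the listed columns in every row."""
    return [{k: row[k] for k in keep} for row in rows]

level_2 = filter_rows(candidates, lambda r: r["exam_level"] == 2)
trimmed = select_columns(level_2, ["id", "exam_level"])
print(trimmed)  # [{'id': 1, 'exam_level': 2}, {'id': 3, 'exam_level': 2}]
```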
Preparation in structured data: Conversion
Convert each variable to its appropriate data type: nominal, ordinal, integer, ratio, categorical.
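One way to picture conversion is turning raw text fields into typed values. The field names and the `RISK_ORDER` mapping below are hypothetical; a real project would define its own.

```python
# Assumed ordinal scale: the integers preserve the ranking low < medium < high.
RISK_ORDER = {"low": 1, "medium": 2, "high": 3}

def convert_record(raw: dict) -> dict:
    """Convert raw string fields into integer, ratio, and ordinal values."""
    return {
        "age": int(raw["age"]),                            # integer
        "income": float(raw["income"].replace(",", "")),   # ratio
        "risk": RISK_ORDER[raw["risk"].strip().lower()],   # ordinal
    }

print(convert_record({"age": "42", "income": "85,000.50", "risk": " High "}))
# {'age': 42, 'income': 85000.5, 'risk': 3}
```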
Cleansing structured data: Incomplete
Missing entries
Cleansing structured data: Invalid
Outside a meaningful range
Cleansing structured data: Inconsistent
Some data conflicts with other data.
Cleansing structured data: Inaccurate
Not a true value
Cleansing structured data: Non-uniform
Data not in an identical format
American date (M/D/Y) vs European (D/M/Y)
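The date example shows why the source format must be known before cleansing: the same text maps to two different dates. A small sketch using the standard library, where the helper name `to_iso` is an assumption:

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y"  # American: month/day/year
EU_FORMAT = "%d/%m/%Y"  # European: day/month/year

def to_iso(date_text: str, source_format: str) -> str:
    """Parse date_text using source_format and return a uniform YYYY-MM-DD string."""
    return datetime.strptime(date_text, source_format).date().isoformat()

print(to_iso("03/04/2021", US_FORMAT))  # 2021-03-04
print(to_iso("03/04/2021", EU_FORMAT))  # 2021-04-03
```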
Cleansing structured data: Duplication
Multiple identical observations
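Three of the cleansing checks (incomplete, invalid, duplication) can be automated with simple rules. The sketch below is illustrative only: the `age` field and the 0 to 120 range are assumptions. Inconsistent and inaccurate values usually need cross-checks against other data, so they are not covered here.

```python
def cleanse(rows, min_age=0, max_age=120):
    """Split rows into clean rows and (row, reason) rejections."""
    seen = set()
    clean, rejected = [], []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if row.get("age") is None:
            rejected.append((row, "incomplete"))   # missing entry
        elif not (min_age <= row["age"] <= max_age):
            rejected.append((row, "invalid"))      # outside a meaningful range
        elif key in seen:
            rejected.append((row, "duplicate"))    # identical to an earlier row
        else:
            seen.add(key)
            clean.append(row)
    return clean, rejected

rows = [
    {"name": "A", "age": 30},
    {"name": "B", "age": None},
    {"name": "C", "age": 250},
    {"name": "A", "age": 30},
]
clean, rejected = cleanse(rows)
print(clean)                                # [{'name': 'A', 'age': 30}]
print([reason for _, reason in rejected])   # ['incomplete', 'invalid', 'duplicate']
```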
Adjusting the range of a feature: Normalization
Rescales values to the range 0 to 1.
Sensitive to outliers.
Normalized Xi = (Xi - Xmin) / Range
              = (Xi - Xmin) / (Xmax - Xmin)
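A short sketch of min-max normalization. The function name and sample values are assumptions; the second call illustrates the outlier sensitivity, since one extreme value squeezes the remaining points toward 0.

```python
def min_max_normalize(values):
    """Rescale values to the range 0 to 1 using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range is zero, so normalization is undefined")
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_normalize([10.0, 20.0, 30.0, 50.0]))  # [0.0, 0.25, 0.5, 1.0]

# With an outlier (1000), the first three values end up close to 0.
with_outlier = min_max_normalize([10.0, 20.0, 30.0, 1000.0])
print(with_outlier)
```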
Adjusting the range of a feature: Standardization
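Centers (subtracts the mean) and rescales (divides by the standard deviation).
Requires normally distributed data.
(Xi - μ) / σ, where μ is the mean and σ is the standard deviation.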
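A sketch of standardization with the standard library. Using the population standard deviation (`statistics.pstdev`) is an assumption made here; the flashcard does not say whether the population or sample version is intended.

```python
import statistics

def standardize(values):
    """Center on the mean and rescale by the (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population, not sample, std dev
    if sigma == 0:
        raise ValueError("standard deviation is zero, so standardization is undefined")
    return [(x - mu) / sigma for x in values]

# Sample data with mean 5 and population standard deviation 2.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardization the values have mean 0 and standard deviation 1, but unlike normalization they are not confined to a fixed range.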