Lecture 4 Flashcards
Data Wrangling
The process of cleaning, structuring, and transforming raw data into a format suitable for analysis.
Aspects of Data Wrangling
(2CTIR)
- Data Collection
- Data Cleaning
- Data Transformation
- Data Integration
- Data Reduction
Data Collection
The process of gathering data from various sources.
Data Cleaning
Address missing values, duplicates, and inconsistencies in the dataset.
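A minimal sketch of the three cleaning tasks named on this card, using only the Python standard library (in practice a library such as pandas is commonly used for this). The records, the field names, and the choice to fill missing ages with the mean are illustrative assumptions, not part of the lecture.

```python
from statistics import mean

# Hypothetical raw records: one duplicate, one missing age, inconsistent city spelling.
raw = [
    {"name": "Ana", "age": 30, "city": "Boston"},
    {"name": "Ana", "age": 30, "city": "Boston"},      # exact duplicate
    {"name": "Ben", "age": None, "city": " boston "},  # missing age, stray spaces
    {"name": "Cy", "age": 40, "city": "BOSTON"},       # inconsistent casing
]

def clean(records):
    # 1. Fix inconsistencies: trim whitespace and normalize city casing.
    normalized = [{**r, "city": r["city"].strip().title()} for r in records]

    # 2. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in normalized:
        key = (r["name"], r["age"], r["city"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Fill missing ages with the mean of the known ages
    #    (an assumed strategy; dropping the row is another option).
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = mean(known)
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in unique]

cleaned = clean(raw)
```

Here `cleaned` holds three records, every city reads "Boston", and Ben's age is filled with 35, the mean of 30 and 40.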
Data Transformation
Convert data types, handle categorical variables, and normalize numerical features.
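The three steps on this card can be sketched as follows, again with only the standard library. The column names `height_cm` and `color` are made up for illustration, and min-max scaling and one-hot encoding are assumed as the normalization and categorical-handling techniques; the lecture does not name specific ones.

```python
# Hypothetical rows: the numeric column arrives as strings, the color column is categorical.
rows = [
    {"height_cm": "150", "color": "red"},
    {"height_cm": "175", "color": "blue"},
    {"height_cm": "200", "color": "red"},
]

def transform(records):
    # 1. Convert data types: parse the string heights into floats.
    heights = [float(r["height_cm"]) for r in records]

    # 2. Normalize the numeric feature to [0, 1] with min-max scaling.
    lo, hi = min(heights), max(heights)
    scaled = [(h - lo) / (hi - lo) if hi > lo else 0.0 for h in heights]

    # 3. Handle the categorical variable with one-hot encoding:
    #    one 0/1 column per distinct category.
    categories = sorted({r["color"] for r in records})
    out = []
    for r, s in zip(records, scaled):
        row = {"height_scaled": s}
        for c in categories:
            row[f"color_{c}"] = 1 if r["color"] == c else 0
        out.append(row)
    return out

transformed = transform(rows)
```

The middle row becomes `{"height_scaled": 0.5, "color_blue": 1, "color_red": 0}`, since 175 sits halfway between 150 and 200.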
Data Integration
Combine data from multiple sources or tables if necessary
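One common way to combine tables is a join on a shared key. The sketch below is a small inner join in plain Python; the `customers` and `orders` tables and the helper `inner_join` are hypothetical, and the helper assumes the key is unique in the left table.

```python
# Two hypothetical tables that share the key "customer_id".
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 3, "total": 15.0},  # no matching customer
]

def inner_join(left, right, key):
    # Index the left table by key so each right row is matched with one lookup.
    index = {row[key]: row for row in left}
    # Keep only right rows whose key exists on the left, merging both records.
    return [{**index[r[key]], **r} for r in right if r[key] in index]

merged = inner_join(customers, orders, "customer_id")
```

Only the two orders for customer 1 survive: order 12 has no matching customer, and Ben has no orders, so an inner join drops both.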
Data Reduction
Reduce the dataset’s dimensionality through techniques like feature selection or extraction.
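As one illustration of feature selection, the sketch below drops columns whose variance does not exceed a threshold, on the assumption that a constant column carries no information. The feature names, the data, and the helper `select_by_variance` are hypothetical, and this is only one of many selection criteria.

```python
from statistics import pvariance

# Hypothetical dataset stored column-wise.
data = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [5.0, 5.0, 5.0, 5.0],   # constant column
    "feature_c": [0.0, 10.0, 0.0, 10.0],
}

def select_by_variance(columns, threshold=0.0):
    # Keep only the columns whose population variance exceeds the threshold.
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }

reduced = select_by_variance(data)
```

With the default threshold, `feature_b` is removed and the dataset goes from three features to two.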
Data Wrangling vs Data Cleaning
Data Wrangling: the broader process of collecting, cleaning, and transforming raw data into a format suitable for analysis.
Data Cleaning: a subset of data wrangling that handles missing values, duplicates, and inconsistencies in the dataset.