M6S1 Flashcards
Process of converting raw data into a usable form
Data Wrangling
Also called data munging or data remediation.
Data Wrangling
Data Wrangling Steps
- Discovery
- Transformation
- Validation
- Publishing
Examining the data to plan the next steps
Discovery
Data structuring, normalization and denormalization, data cleaning, and data enrichment
Transformation
Verifying that the data is consistent, of sufficient quality, and secure
Validation
Delivering the data in the preferred format for sharing with teammates
Publishing
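The four wrangling steps can be sketched with pandas on a tiny made-up table. This is a minimal illustration, not a prescribed workflow: the data, the column names, and the specific cleaning rules (dropping missing names, title-casing text, removing duplicates) are all hypothetical choices, and "publishing" here just means serializing to CSV text.

```python
import pandas as pd

# Hypothetical raw data: a missing name, inconsistent casing, numbers stored as text
raw = pd.DataFrame({
    "name": ["Alice", "bob", "bob", None],
    "score": ["90", "85", "85", "70"],
})

# Discovery: examine types and missing values to decide what needs fixing
print(raw.dtypes)
print(raw.isna().sum())

# Transformation: clean (drop rows with no name), normalize casing, fix types, deduplicate
clean = (
    raw.dropna(subset=["name"])
       .assign(name=lambda df: df["name"].str.title(),
               score=lambda df: df["score"].astype(int))
       .drop_duplicates()
       .reset_index(drop=True)
)

# Validation: check a few simple consistency and quality rules
assert clean["name"].notna().all()
assert clean["score"].between(0, 100).all()
assert not clean.duplicated().any()

# Publishing: serialize to a shareable format (CSV text here instead of a file)
published = clean.to_csv(index=False)
print(published)
```

In practice the validation rules and the publishing format depend on what teammates expect; CSV is used only because it needs no extra dependencies.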
Library that provides several functions for loading datasets.
Pandas
Data structure that holds input datasets
DataFrame
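A quick look at what a DataFrame holds: a two-dimensional table with named columns and a row index. The values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each dict key becomes a column name
df = pd.DataFrame({"product": ["A", "B", "C"], "price": [10.0, 12.5, 8.0]})

print(df.shape)            # (rows, columns)
print(df.columns.tolist()) # column labels
print(df["price"])         # selecting one column returns a Series
```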
Comma-Separated Values (CSV)
- Tabular Format
- Very lightweight
- May not be easy to read visually
CSV to DataFrame
pandas.read_csv('filename.csv')
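A self-contained sketch of `read_csv`: instead of a real `filename.csv`, the CSV text is wrapped in `io.StringIO`, since `read_csv` accepts either a path or a file-like object. The column names and values are hypothetical.

```python
import io
import pandas as pd

# Stand-in for the contents of a file such as 'filename.csv'
csv_text = """id,name,score
1,Alice,90
2,Bob,85
"""

# The first line is used as the header; column types are inferred
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```

With a real file, the call is simply `pd.read_csv('filename.csv')`.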
What reads the text file into chunks?
pandas.io.parsers.TextFileReader (returned by pandas.read_csv when chunksize or iterator=True is passed)
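A minimal sketch of chunked reading, assuming a recent pandas (1.2 or later, where the reader works as a context manager). Passing `chunksize` makes `read_csv` return a `TextFileReader`; iterating over it yields DataFrames of at most that many rows, so a large file never has to fit in memory at once. The generated CSV data is hypothetical.

```python
import io
import pandas as pd
from pandas.io.parsers import TextFileReader

# Hypothetical CSV with 10 data rows: id 0..9, value = id * 10
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

# chunksize=4 returns a TextFileReader instead of a single DataFrame
reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)
print(type(reader))

chunk_sizes = []
total = 0
with reader:
    for chunk in reader:              # each chunk is a DataFrame
        chunk_sizes.append(len(chunk))
        total += chunk["value"].sum() # aggregate incrementally
print(chunk_sizes)
print(total)
```

Aggregating per chunk, as done here with a running sum, is the usual pattern when only a summary of a large file is needed.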
Load and read Excel file
- pandas.io.excel._base.ExcelFile
- pandas.ExcelFile('<Excel>')
- pandas.read_excel(<ExcelFile>, '<sheet>')
Directly Read Excel File
- pandas.read_excel('<Excel>', '<sheet>')
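Both Excel approaches side by side, in a sketch that assumes the `openpyxl` package is installed (pandas uses it to write and read `.xlsx`). To stay self-contained, a small two-sheet workbook with hypothetical data is built in memory rather than loaded from disk. `sheet_name=` is passed as a keyword, which is equivalent to the positional `'<sheet>'` argument on the cards.

```python
import io
import pandas as pd

# Build a hypothetical two-sheet workbook in memory (requires openpyxl)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"item": ["pen", "book"], "qty": [3, 1]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"item": ["desk"], "qty": [2]}).to_excel(
        writer, sheet_name="Sheet2", index=False)
data = buffer.getvalue()

# Option 1: open the workbook once with ExcelFile, then read sheets from it
with pd.ExcelFile(io.BytesIO(data)) as xls:
    sheet_names = xls.sheet_names
    sheet1 = pd.read_excel(xls, sheet_name="Sheet1")

# Option 2: read a single sheet directly from the file
sheet2 = pd.read_excel(io.BytesIO(data), sheet_name="Sheet2")

print(sheet_names)
print(sheet1)
print(sheet2)
```

`ExcelFile` is convenient when several sheets come from the same workbook, since the file is opened only once; the direct `read_excel` call is simpler for a single sheet.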