CPA ISC - S2 M5 Data Life Cycle Flashcards
What are the 8 steps of the data life cycle?
- Definition
- Capture/Creation
- Preparation
- Synthesis
- Analytics and Usage
- Publication
- Archival
- Purging
Data Life Cycle: Definition
This is step 1. Definition is identifying what data the business needs and where to capture or retrieve that data.
Data Life Cycle: Capture/Creation
This is step 2. Capture/Creation is done either by creating data internally or by capturing data that was created externally.
Data Life Cycle: Preparation
This is step 3. Preparation determines whether the data is complete, clean, current, encrypted, and user-friendly.
What are the steps to validate completeness of captured data when data is moved from one location to another?
- Compare the number of records
- Compare descriptive statistics for numeric fields
- Validate that field formats are consistent with the source
- Compare character limits
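A minimal sketch of the four completeness checks above, assuming the source and target records are already loaded as lists of dictionaries. The function names, field names, and sample records are hypothetical and only illustrate the idea.

```python
def numeric_values(rows, field):
    """Collect the values of `field` that are actually numeric."""
    return [row[field] for row in rows if isinstance(row[field], (int, float))]


def summarize(values):
    """Descriptive statistics for one numeric field (empty input gives None)."""
    if not values:
        return None
    total = sum(values)
    return (round(total, 6), min(values), max(values), round(total / len(values), 6))


def validate_transfer(source, target, numeric_fields, text_fields):
    """Return a dict mapping each check name to True (passed) or False (failed)."""
    # 1. Compare the number of records.
    checks = {"record_count": len(source) == len(target)}
    # 2. Compare descriptive statistics for numeric fields.
    for field in numeric_fields:
        checks["stats:" + field] = (
            summarize(numeric_values(source, field))
            == summarize(numeric_values(target, field))
        )
    # 3. Validate that field formats (here: Python types) match the source.
    for field in list(numeric_fields) + list(text_fields):
        src_types = {type(row[field]).__name__ for row in source}
        tgt_types = {type(row[field]).__name__ for row in target}
        checks["format:" + field] = src_types == tgt_types
    # 4. Compare character limits (longest value seen in each text field).
    for field in text_fields:
        src_len = max((len(row[field]) for row in source), default=0)
        tgt_len = max((len(row[field]) for row in target), default=0)
        checks["char_limit:" + field] = src_len == tgt_len
    return checks


# Hypothetical sample data: the bad copy lost a record and truncated a name.
source = [
    {"amount": 100.0, "name": "Alexandria"},
    {"amount": -25.5, "name": "Bo"},
]
good_copy = [dict(row) for row in source]
bad_copy = [{"amount": 100.0, "name": "Alexan"}]

good_report = validate_transfer(source, good_copy, ["amount"], ["name"])
bad_report = validate_transfer(source, bad_copy, ["amount"], ["name"])
```

Any check that comes back False points to records or characters that may have been lost during the move.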
What are the steps to clean data?
- Remove unnecessary headings or subtotals
- Clean leading zeroes and nonprintable characters
- Format negative numbers consistently
- Identify and correct inconsistencies across data in general
- Address inconsistent data types
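The cleaning steps above can be sketched as follows. This is an illustration only: the two-column layout, the helper names, and the sample rows are hypothetical, and it assumes leading zeroes carry no meaning in the account field (in identifiers such as ZIP codes they must be kept).

```python
def clean_text(value):
    """Drop nonprintable characters and surrounding whitespace."""
    return "".join(ch for ch in value if ch.isprintable()).strip()


def parse_amount(text):
    """Convert '(300.50)', '75-', or '-75' style text to a signed float."""
    text = clean_text(text).replace(",", "")
    negative = False
    if text.startswith("(") and text.endswith(")"):
        negative, text = True, text[1:-1]
    elif text.endswith("-"):
        negative, text = True, text[:-1]
    elif text.startswith("-"):
        negative, text = True, text[1:]
    value = float(text)
    return -value if negative else value


def clean_rows(raw_rows):
    """Clean (account, amount) pairs of raw text into typed records."""
    cleaned = []
    for account, amount in raw_rows:
        account = clean_text(account)
        label = account.lower()
        # Remove unnecessary headings and subtotal lines.
        if label in {"", "account"} or label.startswith("subtotal"):
            continue
        # Clean leading zeroes (assumed not meaningful here).
        account = account.lstrip("0") or "0"
        # Store every amount as a float so negatives and types are consistent.
        cleaned.append({"account": account, "amount": parse_amount(amount)})
    return cleaned


# Hypothetical raw export with a heading, a subtotal, and mixed formats.
raw_rows = [
    ("Account", "Amount"),
    ("00123\x00", " 1,200.00 "),
    ("0456", "(300.50)"),
    ("Subtotal", "899.50"),
    ("789", "75-"),
]
cleaned_rows = clean_rows(raw_rows)
```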
Data Life Cycle: Synthesis
This is step 4. Synthesis is a bridge between preparation and usage.
Data Life Cycle: Analytics and Usage
This is step 5. Analytics and usage is when the data is ready for practical use in the organization to create reports and inform decisions.
Data Life Cycle: Publication
This is step 6. Publication is when data can be shared with external users.
Data Life Cycle: Archival
This is step 7. Archival is when data sets are moved from active systems to passive systems. Archiving frees up storage resources for the active systems, enhances active system performance, and reduces security risks.
Data Life Cycle: Purging
This is step 8, the final step. Purging is when data is completely removed (purged) from the company’s storage systems (archived and otherwise).
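A small sketch contrasting archival (step 7) with purging (step 8), assuming records are held in two in-memory lists that stand in for the active and passive systems. The function names, the `last_used` field, and the cutoff date are hypothetical.

```python
from datetime import date


def archive(active, archived, cutoff):
    """Move records last used before `cutoff` from active to archived storage."""
    archived.extend(r for r in active if r["last_used"] < cutoff)
    active[:] = [r for r in active if r["last_used"] >= cutoff]


def purge(active, archived, record_ids):
    """Remove the given records from every store, archived and otherwise."""
    active[:] = [r for r in active if r["id"] not in record_ids]
    archived[:] = [r for r in archived if r["id"] not in record_ids]


active_store = [
    {"id": 1, "last_used": date(2020, 1, 15)},
    {"id": 2, "last_used": date(2024, 6, 1)},
]
archive_store = []

# Archival: record 1 leaves the active system but still exists.
archive(active_store, archive_store, cutoff=date(2023, 1, 1))
ids_after_archive = ([r["id"] for r in active_store], [r["id"] for r in archive_store])

# Purging: record 1 no longer exists anywhere.
purge(active_store, archive_store, record_ids={1})
ids_after_purge = ([r["id"] for r in active_store], [r["id"] for r in archive_store])
```

The point of the contrast: archived data can still be retrieved from the passive system, while purged data cannot.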
Extract, Transform, and Load (ETL)
ETL is used when the data already exists. The data must be extracted from its original source, transformed into useful information, and loaded into the tool chosen for analysis.
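A minimal ETL sketch using only the Python standard library. It assumes the existing data is a small CSV export and that the analysis tool is an in-memory SQLite database; the invoice data, table name, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical existing data, standing in for an export from a source system.
SOURCE_CSV = "invoice_id,amount\nINV-1,100.00\nINV-2,250.50\n"


def extract(csv_text):
    """Extract: read the raw rows from the original source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: convert text amounts to numbers ready for analysis."""
    return [(row["invoice_id"], float(row["amount"])) for row in rows]


def load(records, connection):
    """Load: write the transformed records into the analysis database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS invoices (invoice_id TEXT PRIMARY KEY, amount REAL)"
    )
    connection.executemany("INSERT INTO invoices VALUES (?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), connection)

# The loaded data can now be queried by the analysis tool.
invoice_count = connection.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
invoice_total = connection.execute("SELECT SUM(amount) FROM invoices").fetchone()[0]
```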
Active Data Collection
Active Data Collection is when you intentionally collect new data by asking users for it directly (such as via surveys or forms).
Passive Data Collection
Passive Data Collection is when you gather information without direct permission from the users (such as via cookies or IoT devices).