Stages of the data life cycle Flashcards
What are the stages in the data life cycle?
- Plan
- Capture
- Manage
- Analyse
- Archive
- Destroy
Define the Planning stage of the data life cycle
Deciding what kind of data is needed, how it will be managed throughout its life cycle, who will be responsible for it, and what the optimal outcomes are.
Define the Capture stage of the data life cycle
Where data is collected from a variety of different sources and brought into an organisation. The data could be publicly available or come from the company’s own database.
Define the Manage stage of the data life cycle
How we care for our data, how and where it’s stored, the tools used to keep it safe and secure, and the actions taken to make sure that it’s properly maintained.
Define the Analyse stage of the data life cycle
Data is used to solve problems, make informed decisions, and support business goals.
Define the Archive stage of the data life cycle
Storing relevant data for long-term, future reference.
Define the Destroy stage of the data life cycle
Safely and securely disposing of data, using secure data erasure software and shredding physical documents, to protect the private information of the company and its customers.
What is a database?
A collection of data stored in a computer system.
What is a stakeholder?
People who have invested time and resources into a project and are interested in the outcome.
How do you determine stakeholder expectations?
By working out who the stakeholders are, what they want, when they want it, why they want it, and how best to communicate with them.
What does it mean to define a problem?
Looking at the current state and identifying how it’s different from the ideal state.
What is a spreadsheet formula?
A set of instructions that performs a specific calculation using the data in a spreadsheet.
What is a spreadsheet function?
A preset command that automatically performs a specific process or task using the data in a spreadsheet.
What is the difference between a formula and a function?
A formula is a set of instructions you write to perform a specific calculation (e.g. =A2+A3), whereas a function is a preset command built into the spreadsheet that automatically performs a process or task (e.g. =SUM(A2:A3)), making the work more efficient.
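As a rough analogy in Python (not spreadsheet syntax), the sketch below contrasts spelling out a calculation step by step, as the formula =A1+A2+A3 does, with handing the work to a preset command, as =SUM(A1:A3) does. The cell names and values are made up for illustration.

```python
# A toy "spreadsheet": a mapping from illustrative cell names to values.
cells = {"A1": 10, "A2": 20, "A3": 30}

# Formula-style: every step of the calculation is written out,
# like typing =A1+A2+A3 into a cell.
formula_total = cells["A1"] + cells["A2"] + cells["A3"]

# Function-style: a preset command (Python's built-in sum) does the work,
# like typing =SUM(A1:A3).
function_total = sum(cells[name] for name in ("A1", "A2", "A3"))

print(formula_total, function_total)  # prints: 60 60
```

Both give the same answer; the function version stays short no matter how many cells are added to the range.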
What is an attribute in reference to a spreadsheet?
A characteristic or quality of data used to label a column in a table.
What is an observation in relation to a spreadsheet?
All of the attributes for a single item, contained in one row of a data table.
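A minimal Python sketch of these two ideas, assuming a small table held as a list of dictionaries (the names, ages and cities are invented): each key is an attribute (a column label) and each dictionary is one observation (a row).

```python
# Each dict is one row (an observation); each key is a column label (an attribute).
table = [
    {"name": "Ana", "age": 34, "city": "Leeds"},
    {"name": "Ben", "age": 29, "city": "York"},
]

# The attributes are the column labels shared by every row.
attributes = list(table[0].keys())
print(attributes)  # ['name', 'age', 'city']

# One observation is all of the attribute values for a single row.
first_observation = table[0]
print(first_observation)  # {'name': 'Ana', 'age': 34, 'city': 'Leeds'}
```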
What is a query?
A request for data or information from a database.
What basic syntax forms a typical SQL query?
SELECT: to choose the columns you want to return.
FROM: to choose the table(s) where those columns are located.
WHERE: to filter for rows that meet certain conditions (this clause is optional).
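To see SELECT, FROM and WHERE working together, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The customers table and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Leeds", 120.0), ("Ben", "York", 45.5), ("Cai", "Leeds", 80.0)],
)

# SELECT picks the column, FROM names the table, WHERE filters the rows.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = 'Leeds'"
).fetchall()
conn.close()

# Without ORDER BY, SQL does not promise a row order, so sort before printing.
print(sorted(rows))  # [('Ana',), ('Cai',)]
```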
What is used to separate fields/variables in a SELECT command?
A comma.
What is used to connect conditions in a WHERE clause?
The keyword AND (use OR instead when only one of the conditions needs to be true).
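A short sqlite3 sketch of both answers, using the same kind of hypothetical customers table: commas separate the columns after SELECT, and AND joins two conditions in WHERE that must both hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Leeds", 120.0), ("Ben", "York", 45.5), ("Cai", "Leeds", 80.0)],
)

# Comma: separates the two columns listed after SELECT.
# AND: both conditions must be true for a row to be returned.
rows = conn.execute(
    "SELECT name, spend FROM customers WHERE city = 'Leeds' AND spend > 100"
).fetchall()
conn.close()

print(rows)  # [('Ana', 120.0)]
```

Only Ana lives in Leeds and spent more than 100, so a single row comes back. Swapping AND for OR would also return Cai (Leeds) and nobody else, since Ben matches neither condition.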
What is fairness in data analytics?
Ensuring that your analysis doesn’t create or reinforce bias.
What is self-reporting?
A data collection technique where participants provide information about themselves.
What is oversampling?
The process of increasing the sample size of non-dominant groups in a population.
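One common way to do this on an existing dataset is random oversampling: rows from smaller groups are drawn again at random (with replacement) until each group is as large as the biggest one. The sketch below is a minimal illustration of that technique. The function name, the group labels and the scores are all hypothetical.

```python
import random
from collections import Counter

def random_oversample(records, group_key, seed=0):
    """Duplicate randomly chosen rows from smaller groups until every
    group has as many rows as the largest group (random oversampling)."""
    rng = random.Random(seed)
    groups = {}
    for row in records:
        groups.setdefault(row[group_key], []).append(row)
    target = max(len(rows) for rows in groups.values())
    balanced = []
    for rows in groups.values():
        balanced.extend(rows)
        # Draw extra rows with replacement to close the gap to the target size.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced

# Hypothetical survey sample: group "B" is the non-dominant group.
sample = (
    [{"group": "A", "score": s} for s in (3, 4, 5, 4, 3, 5)]
    + [{"group": "B", "score": s} for s in (2, 4)]
)

print(Counter(row["group"] for row in sample))  # Counter({'A': 6, 'B': 2})
balanced = random_oversample(sample, "group")
print(Counter(row["group"] for row in balanced))  # Counter({'A': 6, 'B': 6})
```

Duplicating rows adds no new information about group B. In survey design, oversampling usually means collecting more responses from the non-dominant group in the first place, which is preferable when it is possible.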
List five best practices to support fair analysis.
- Consider all of the available data.
- Identify surrounding factors.
- Include self-reported data.
- Use oversampling effectively.
- Think about fairness from beginning to end.