Data Standards Flashcards
Data stewardship
Careful, responsible management of something entrusted to one’s care (in this case data) on behalf of others
How can data stewardship be enforced?
By assigning named people responsibility for deciding, and acting on, how data is stored and accessed
Data governance policy
Determines how an organisation collects/stores/uses data
Data governance policies should…
Comply with relevant laws and cover the entire life cycle of data (from collection to deletion)
Four pillars of data governance
- Stewardship
- Quality
- Management
- Use cases
Why is data governance important?
It helps to prevent data breaches, where sensitive data can be leaked and used for things like blackmail or identity theft
General Data Protection Regulation (GDPR)
- You should collect the minimum amount of data needed
- You should only collect relevant data
- Steps should be taken to protect data and report breaches
- Data should be retained for the shortest time possible
- Individuals can request access to the data held about them and request that it be deleted
True or false: Anonymous data is not protected under GDPR
True! Data that can’t be linked to a person isn’t protected. However, pseudonymous data is protected, as it can be reverse engineered (linked back to a person using additional information) to identify someone.
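Why pseudonymous data can still identify someone: a minimal Python sketch, assuming names are replaced with keyed-hash tokens and the organisation keeps a lookup table. The key, the names and the pseudonymise function are all hypothetical, made up for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key held by the organisation (an assumption for this sketch).
SECRET_KEY = b"example-key"

def pseudonymise(name: str) -> str:
    """Replace a name with a short keyed-hash token."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

# Made-up names. Anyone holding this lookup table can map tokens back to people.
people = ["Alice Smith", "Bob Jones"]
lookup = {pseudonymise(person): person for person in people}

token = pseudonymise("Alice Smith")
reidentified = lookup[token]  # the token leads straight back to the person
```

The token hides the name in the shared data set, but whoever holds the lookup table (or the key) can undo it, which is why this kind of data is still treated as personal data.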
True or false: Under the GDPR, organisations cannot share your data with third parties for any reason
False! Data can be shared with other organisations in certain circumstances (e.g. for a criminal investigation).
What makes data valuable?
o Relevance of the data
o Correctness of the data
o Potential to make money
What costs can come with data?
o Storing and retrieving data
o Ensuring the data is appropriately protected
o Hardware and software costs
o Staff costs
o Legal costs
Thematic content analysis
Categorising data based on themes
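Example of categorising by theme: a rough Python sketch in which simple keyword matching stands in for the judgement a person would normally apply when coding responses. The themes, keywords and comments are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical themes and keywords, chosen only for illustration.
THEMES: Dict[str, Set[str]] = {
    "delivery": {"late", "shipping", "courier"},
    "pricing": {"expensive", "cheap", "price"},
}

def assign_themes(comment: str) -> List[str]:
    """Return every theme that has at least one keyword in the comment."""
    words = set(comment.lower().split())
    return [theme for theme, keywords in THEMES.items() if words & keywords]

comments = [
    "Shipping was late again",
    "Too expensive for what you get",
    "Great service",
]
themed = {comment: assign_themes(comment) for comment in comments}
```

Comments that match no theme come back with an empty list, which is a useful prompt to review them by hand or add a new theme.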
Data versioning
Any changes made to data should be recorded and the original copy retained
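Example of data versioning: a minimal Python sketch, assuming a record is a simple dictionary. The VersionedRecord class and its field names are hypothetical. It keeps the original copy untouched and logs every change.

```python
import copy
from datetime import datetime, timezone

class VersionedRecord:
    """Keeps the original data untouched and logs every change made to it."""

    def __init__(self, data: dict):
        self.original = copy.deepcopy(data)  # retained, never modified
        self.current = copy.deepcopy(data)   # working copy
        self.history = []                    # one entry per change

    def update(self, field: str, new_value, changed_by: str) -> None:
        """Record what changed, who changed it and when, then apply the change."""
        self.history.append({
            "field": field,
            "old": self.current.get(field),
            "new": new_value,
            "changed_by": changed_by,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        })
        self.current[field] = new_value

# Made-up record and user name.
record = VersionedRecord({"name": "Alice", "city": "Leeds"})
record.update("city", "York", changed_by="steward_1")
```

After the update, record.original still holds the first version, record.current holds the latest, and record.history shows how one became the other.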
How can data quality be improved?
o Thematic content analysis
o Merging data sources
o Recording relevant metadata
Information life cycle
- Tier 1 - Peak value, should be processed and interpreted to maximise value
- Tier 2 - New, unprocessed data, or older data that may not be as relevant
- Tier 3 - Old data that is to be archived and is unlikely to be useful
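One possible reading of the tiers above as a rule: a Python sketch, assuming data is tiered by its age and by whether it has been processed. The cut-off values and the assign_tier function are hypothetical. A real policy would set its own criteria.

```python
# Hypothetical cut-offs in days (assumptions, not taken from any standard).
STALE_AFTER_DAYS = 90
ARCHIVE_AFTER_DAYS = 365

def assign_tier(age_days: int, processed: bool) -> int:
    """Return 1, 2 or 3 under the assumed cut-offs."""
    if age_days >= ARCHIVE_AFTER_DAYS:
        return 3  # old data to be archived
    if not processed or age_days >= STALE_AFTER_DAYS:
        return 2  # new, unprocessed data, or older data that is less relevant
    return 1      # recent, processed data at peak value

tiers = [
    assign_tier(age_days=10, processed=True),
    assign_tier(age_days=10, processed=False),
    assign_tier(age_days=120, processed=True),
    assign_tier(age_days=400, processed=True),
]
```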