Information Management… Data Quality Flashcards
Why is data quality important? (Benefits of good data and consequences of bad data)
Benefits:
- productivity
- compliance
- decision making
Consequences:
- costs
- unnecessary risks
- errors carried into automated processes and AI
What is the cost of data issues?
Data-related problems cost companies millions every year:
- reputational damage
- lost revenue
- waste of resources
- fines for poor compliance
- missed opportunities
What are the issues with automation, AI and machine learning?
Increasing reliance on automated processes/decision making
Data dependency
Opacity of data and algorithms leads to a lack of awareness that the outcome/recommendation is wrong
What is data quality? (5 dimensions)
A variety of qualities across 5 main dimensions
1. Completeness
Is a truth about the object present in the record?
Accuracy, availability, presence, quality
2. Correctness
Is an element that is present in the record true?
Accuracy, errors, misleading, validity
3. Concordance
Is there agreement between elements in the record, or between records in different data sources?
Agreement, consistency, reliability, variation
4. Plausibility
Does the data make sense in light of other knowledge about the object the data represents?
Accuracy, believability, validity
5. Currency
Is an element in the record a relevant representation of the object's state at a given point in time?
Recency, timeliness
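Worked example: the five dimensions can be turned into simple checks on a single record. This is only a minimal sketch. The record fields, the trusted `reference` source, the height range and the 365-day freshness limit are all assumptions made up for illustration, not part of the course material.

```python
from datetime import date

def assess_record(rec, reference, today, max_age_days=365):
    """Score one record against the five dimensions (illustrative rules only)."""
    # 1. Completeness: is every expected element present?
    required = ("name", "date_of_birth", "age", "height_cm", "last_updated")
    complete = all(rec.get(field) is not None for field in required)

    # 2. Correctness: does a present element match a trusted reference value?
    dob = rec.get("date_of_birth")
    correct = dob is not None and dob == reference.get("date_of_birth")

    # 3. Concordance: do elements within the record agree (age vs date of birth)?
    concordant = False
    if dob is not None and rec.get("age") is not None:
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        expected_age = today.year - dob.year - (0 if had_birthday else 1)
        concordant = rec["age"] == expected_age

    # 4. Plausibility: does the value make sense given general knowledge?
    height = rec.get("height_cm")
    plausible = height is not None and 30 <= height <= 250  # assumed human range

    # 5. Currency: is the record recent enough for the intended use?
    updated = rec.get("last_updated")
    current = updated is not None and 0 <= (today - updated).days <= max_age_days

    return {
        "completeness": complete,
        "correctness": correct,
        "concordance": concordant,
        "plausibility": plausible,
        "currency": current,
    }

# Hypothetical data for the example
today = date(2024, 9, 1)
reference = {"date_of_birth": date(1980, 5, 17)}
good = {"name": "A. Example", "date_of_birth": date(1980, 5, 17), "age": 44,
        "height_cm": 172, "last_updated": date(2024, 6, 1)}
bad = {"name": None, "date_of_birth": date(1981, 5, 17), "age": 30,
       "height_cm": 1720, "last_updated": date(2020, 1, 1)}

print(assess_record(good, reference, today))
print(assess_record(bad, reference, today))
```

Note that correctness needs an outside source of truth, while concordance and plausibility can be checked from the data alone.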
What are the factors that contribute to poor data quality?
- IT infrastructure, e.g. parallel systems, old legacy systems
- changes in data format
- processes/workflows, e.g. workarounds in use of data fields
- weak control systems
What is the first law of informatics?
Data shall be used only for the purpose for which they were collected.
If no purpose was defined prior to the collection of data, then the data should not be used.
Why is data context-dependent?
Data generated and used in a specific context for a specific purpose has a specific meaning. The meaning of a piece of data may then differ across contexts.
Differences across organisations
Implications for merging of data sets
E.g. the same name used for different items within an organisation could cause mismatches, inefficiency and errors. E.g. the meaning of ‘train’ in the railway business differs between people in the business, for example the passenger, regulator, operations planner, operator, staff planning and infrastructure.
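Worked example: a rough sketch of what can go wrong when two data sets share a field name but not its meaning. The data is invented. It assumes one team uses "train" for a timetabled service and another uses it for a physical unit, and that someone who knows both contexts maintains a mapping between them (`service_to_unit` is hypothetical).

```python
# Two hypothetical data sets that both use a field called "train".
passenger_view = [
    {"train": "08:15 service", "delay_min": 4},
    {"train": "09:40 service", "delay_min": 0},
]
fleet_view = [
    {"train": "unit 101", "maintenance_due": True},
    {"train": "unit 102", "maintenance_due": False},
]

def join_on(left, right, key):
    """Naive merge: match rows wherever the shared field has equal values."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def join_with_mapping(left, right, mapping):
    """Merge using an explicit translation between the two meanings of 'train'."""
    index = {row["train"]: row for row in right}
    merged = []
    for row in left:
        unit = mapping.get(row["train"])  # None when no translation is known
        if unit in index:
            merged.append({
                "service": row["train"],
                "unit": unit,
                "delay_min": row["delay_min"],
                "maintenance_due": index[unit]["maintenance_due"],
            })
    return merged

# Same field name, different meaning: the naive merge finds nothing to match.
naive = join_on(passenger_view, fleet_view, "train")

# Assumed mapping, maintained by someone who understands both contexts.
service_to_unit = {"08:15 service": "unit 101", "09:40 service": "unit 102"}
merged = join_with_mapping(passenger_view, fleet_view, service_to_unit)

print(len(naive), len(merged))
```

The naive merge fails quietly rather than raising an error, which is why the meaning of each field needs to be agreed before data sets are combined.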
Give example or