Chapter 5 - Data Governance Key to Unlocking Dark Data Flashcards
What is dark data?
Data that a company holds but does not use to acquire knowledge or make decisions, because they have not been brought to light.
In the July 2022 survey, what percentage of organizations stated that most of their data is dark data? In addition, what percentage of their data is dark?
20% of organizations said that 70% of their data was dark data.
What type of data is the dark data mentioned in the survey, and why is it an issue?
It is typically unstructured data such as images, videos, and emails, which cannot be easily analyzed because they do not fit neatly into a database.
Why do companies collect dark data if they cannot utilize them?
1) To comply with governmental regulations.
2) Fear that they may need this information in the future even if it is not that helpful.
What can dark data be, and where can dark data come from? Who generates them?
Dark data can be anything and can come from anywhere. They are generated by customers, employees, and business processes.
What can dark data be used for? What is the example provided in the book?
They can be potentially useful information that can increase productivity and boost growth. In the book's example, sensors within a building keep track of gas leaks and have collected some information about them, but because the data have not been structured they remain in the dark. An accident then happens, and the company realizes that if it had detected and used this dark data, the issue could have been mitigated.
What are the steps of the IT solution to dark data?
1) Dark data is unstructured and unused, so the first step is to understand the data available and the issues they may present.
2) Create connections between the dark data and the good operations data. Doing so creates valuable insights that drive business operations.
3) A company may choose to leave the data where they currently reside, but if it does, it should add a metadata layer to make the data easier to search.
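Step 2 can be illustrated with a minimal Python sketch that borrows the gas-sensor example. Everything in it is an assumption made for illustration: the log line format, the `WorkOrder` record, the `link_readings` helper, and the 100 ppm threshold do not come from the book.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    """A structured operations record (hypothetical schema)."""
    building: str
    date: str
    description: str


# Hypothetical unstructured sensor log lines: the "dark" data.
raw_sensor_log = [
    "2022-07-03 bldg=A gas_ppm=12",
    "2022-07-04 bldg=A gas_ppm=480",
    "2022-07-04 bldg=B gas_ppm=9",
]

# Hypothetical operations data that the business already uses.
work_orders = [
    WorkOrder("A", "2022-07-04", "Boiler maintenance"),
    WorkOrder("B", "2022-07-04", "Routine inspection"),
]


def parse_reading(line):
    """Step 1: give one raw log line a structure."""
    date, bldg, ppm = line.split()
    return {
        "date": date,
        "building": bldg.split("=")[1],
        "gas_ppm": int(ppm.split("=")[1]),
    }


def link_readings(log_lines, orders, threshold=100):
    """Step 2: connect high readings to operations records by building and date."""
    by_key = {(o.building, o.date): o for o in orders}
    linked = []
    for reading in map(parse_reading, log_lines):
        if reading["gas_ppm"] >= threshold:  # assumed alert threshold
            order = by_key.get((reading["building"], reading["date"]))
            linked.append((reading, order.description if order else None))
    return linked


alerts = link_readings(raw_sensor_log, work_orders)
```

Here `alerts` holds the single high reading from building A together with the maintenance work order from the same day, which is the kind of connection that turns an unused log into an insight.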
What is metadata?
Metadata is data that defines data. It intelligently classifies data by project, customer, workflow, status, or some other criterion relating to critical business components. This takes the data out of the dark.
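A metadata layer can be pictured as a small searchable catalog that describes each item while the item itself stays where it is. The sketch below is only an illustration under assumed names: `MetadataRecord`, its fields, the file paths, and the `search` helper are hypothetical, not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Describes one piece of stored data without moving it."""
    location: str  # where the data currently resides
    project: str
    customer: str
    status: str


# Hypothetical catalog entries pointing at unstructured files.
catalog = [
    MetadataRecord("/archive/emails/0001.eml", "Tower HVAC", "Acme", "closed"),
    MetadataRecord("/archive/video/lobby.mp4", "Tower HVAC", "Acme", "open"),
    MetadataRecord("/archive/images/roof.png", "Roof Repair", "Globex", "open"),
]


def search(records, **criteria):
    """Return the records whose fields match every given criterion."""
    return [
        r for r in records
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]


open_acme = search(catalog, customer="Acme", status="open")
```

In this sketch `open_acme` contains only the lobby video record, found by its classification rather than by opening the file.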