Chapter 5 Flashcards
5.1 Enumerate the 10 points of difficulty in managing data.
- Amount - high volumes and the variety of data (big data) being collected increase complexity
- Placement - data are scattered throughout the organization
- Its generation - data increase exponentially over time, and new sources of data keep appearing
- Time - data become less current and outdated over time
- Data Rot - storage media age and physically degrade
- The law - legal requirements relating to data (storage methods, management procedures, security, quality, and integrity) differ among countries and among industries, and they change frequently.
- Lack of unity and cross-departmental cooperation - repetition (redundancy) and conflicts (inconsistency) across the organization’s departments; information systems do not communicate with each other
- Government regulations - ex.: Bill 198
- Unstructured data - companies are drowning in data, much of which is unstructured. The amount of data is increasing exponentially
- Big Data
5.1 What are the multiple sources of data (one of the difficulties of managing data)? Give examples. Hint: IPEN
Internal sources
• Corporate databases, company documents
Personal sources
• Personal thoughts, opinions, experiences
External sources
• Commercial databases, government reports, corporate websites, clickstream data
New sources
• Blogs, Tweets, videos, sensor tags
5.1 What is the main solution to the difficulties of managing data?
Solutions to these difficulties include effective data governance.
5.1 What is data governance?
Data governance is an approach to managing information across an entire organization. It involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion.
5.1 What are the objectives of data governance? How do organizations accomplish these? Hint: ATU
- To make information available
- To ensure transparency of information
- To enhance usefulness of information
How?
Using business processes and policies for handling data in a certain well-defined way. Following unambiguous rules to create, collect, handle and protect data.
5.1 What strategy do organizations use to implement sound data governance?
Master Data Management
5.1 What is Master Data Management?
Master data management is a process that spans all of an organization’s business processes and applications.
It provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for the company’s master data.
5.1 Why aren’t master data and transactional data the same?
Master data are a set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span the enterprise’s information systems.
Transactional data, which are generated and captured by operational systems, describe the business’s activities, or transactions.
Master data are applied to multiple transactions, and they are used to categorize, aggregate, and evaluate the transactional data.
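Sketch: a minimal illustration of how master data categorize and aggregate transactional data. The product records, the sale records, and the function name sales_by_category are made up for this card and do not come from the chapter.

```python
from collections import defaultdict

# Master data: core records shared across systems (hypothetical products).
PRODUCTS = {
    "P1": {"name": "42-inch TV", "category": "Electronics"},
    "P2": {"name": "Headphones", "category": "Electronics"},
    "P3": {"name": "Desk lamp", "category": "Home"},
}

# Transactional data: one record per business event (hypothetical sales).
SALES = [
    {"product_id": "P1", "store": "Store 1", "amount": 500.0},
    {"product_id": "P2", "store": "Store 1", "amount": 80.0},
    {"product_id": "P3", "store": "Store 2", "amount": 30.0},
    {"product_id": "P1", "store": "Store 2", "amount": 500.0},
]


def sales_by_category(sales, products):
    """Total the transaction amounts under each master-data category."""
    totals = defaultdict(float)
    for sale in sales:
        # Each transaction points at a master record, which supplies the category.
        category = products[sale["product_id"]]["category"]
        totals[category] += sale["amount"]
    return dict(totals)


print(sales_by_category(SALES, PRODUCTS))
```

The same product record (P1) is applied to two different sales, which is the sense in which master data are applied to multiple transactions.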
5.1 What are the resulting benefits of master data management? Hint: ASF = E
Master data management leads to
- Increasing the accuracy of data
- This helps with streamlining new product entry into the database management system
- Thus, it is a way to facilitate the processing of transactions (e.g. at retail stores)
- In short, we have data that makes us effective – while reaching our goal to serve customers seamlessly.
5.3 What is structured data? What is unstructured data? Give examples.
Structured data fits into predefined fields and can be organized into a spreadsheet or a relational database.
Examples: names, dates, addresses, credit card numbers, etc.
Unstructured data is heterogeneous and does not fall within standard fields.
Examples: email messages, audio files, Facebook posts, ratings, recommendations.
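Sketch: one way to see the difference in practice. A structured record exposes each value through a predefined field, while a value inside free text has to be searched for. The Customer class, the sample email, and the find_dates helper are hypothetical and only illustrate the idea.

```python
import re
from dataclasses import dataclass


@dataclass
class Customer:
    # Structured: each value lives in a predefined, named field.
    name: str
    signup_date: str
    city: str


customer = Customer(name="Ana Lee", signup_date="2024-03-15", city="Toronto")

# Unstructured: free text with no fields (hypothetical email message).
email_body = (
    "Hi, I moved to Toronto last month and my order "
    "from 2024-03-15 never arrived."
)


def find_dates(text):
    """Return every YYYY-MM-DD string found in free text."""
    return re.findall(r"\d{4}-\d{2}-\d{2}", text)


print(customer.signup_date)    # read directly from a known field
print(find_dates(email_body))  # has to be pattern-matched out of the text
```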
5.3 Define Big Data.
We refer to the superabundance of data available today as Big Data. Big Data is a collection of data that is so large and complex that it is difficult to manage using traditional database management systems.
Essentially, Big Data is about predictions that come from applying mathematics to huge quantities of data to infer probabilities.
5.3 Where do Big Data come from (sources)?
- traditional enterprise data
- machine-generated/sensor data
- social data
- images captured by billions of devices around the world (digital cameras, camera phones, medical scanners, security cameras)
5.3 What are the three distinct characteristics of Big Data?
Volume + Velocity + Variety
- Volume: We noted the huge volume of Big Data. Consider machine-generated data, which are generated in much larger quantities than nontraditional data. Smart electrical meters, sensors in heavy industrial equipment, and telemetry from automobiles compound the volume problem.
- Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners. Companies can gain a competitive advantage if they can quickly use that information.
- Variety: Traditional data formats tend to be structured and relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly.
5.3 What are the three issues with Big Data?
- Big Data can come from untrusted sources (the sources can be internal or external to the organization, a source may be unverified, and the reported data themselves may be false or misleading)
- Big Data is dirty (inaccurate, incomplete, incorrect, duplicate, or erroneous data, ex.: misspelling of words, duplicate data like retweets)
- Big Data changes, especially in data streams (data quality in an analysis can change, or the data themselves can change because the conditions under which the data are captured can change)
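Sketch: a minimal example of cleaning "dirty" social data by dropping empty posts and duplicates such as retweets. The sample posts and the rule that a retweet starts with "RT " are assumptions made for illustration, and clean_posts is a hypothetical helper.

```python
def clean_posts(posts):
    """Drop empty posts and duplicates that differ only by case, spacing, or an 'RT ' prefix."""
    seen = set()
    cleaned = []
    for post in posts:
        text = " ".join(post.split())  # collapse repeated whitespace
        if text.lower().startswith("rt "):
            text = text[3:]  # assumed retweet marker, removed before comparing
        key = text.lower()
        if not key or key in seen:
            continue  # skip empty posts and repeats
        seen.add(key)
        cleaned.append(text)
    return cleaned


posts = [
    "Great service today!",
    "RT Great service today!",
    "great  service today!",
    "",
    "Delivery was late.",
]
print(clean_posts(posts))
```

Misspellings are harder to catch than exact duplicates and are not handled in this sketch.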
5.3 Name 5 functional areas of the organization where Big Data is used. Hint: HP + OMG
- Human resources (managing benefits to reduce cost, hiring)
- Product development (Ford’s work with auto-enthusiast sites/forums for information on turn indicators)
- Operations (UPS reduced fuel consumption by 32M liters)
- Marketing (used to better understand customers and target marketing efforts, so that companies can craft more personalized messages)
- Government operations (United Kingdom congestion example)