Chapter 5 Flashcards

1
Q

5.1 Enumerate the 10 difficulties of managing data.

A
  1. Amount - high volumes and the variety of data (big data) being collected increase complexity
  2. Placement - data are scattered throughout organization
  3. Its generation - the amount of data increases exponentially over time, and new sources of data keep emerging
  4. Time - data become less current and outdated over time
  5. Data rot - storage media degrade physically over time, and the machines needed to read old media become hard to find
  6. The law - legal requirements relating to data (e.g., data storage methods, management procedures, data security, quality, and integrity) differ among countries and among industries, and they change frequently.
  7. Lack of unity and cross-departmental cooperation - repetition (redundancy) and conflicts (inconsistency) across the organization’s departments; information systems do not communicate with each other
  8. Government regulations - ex.: Bill 198
  9. Unstructured data - companies are drowning in data, much of which is unstructured, and the amount of data is increasing exponentially
  10. Big Data
2
Q

5.1 What are the multiple sources of data (one of the difficulties of managing data)? Give examples. Hint: IPEN

A

Internal sources
• Corporate databases, company documents

Personal sources
• Personal thoughts, opinions, experiences

External sources
• Commercial databases, government reports, corporate websites, clickstream data

New sources
• Blogs, Tweets, videos, sensor tags

3
Q

5.1 What is the main solution to the difficulties of managing data?

A

Solutions to these difficulties include effective data governance.

4
Q

5.1 What is data governance?

A

Data governance is an approach to managing information across an entire organization. It involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion.

5
Q

5.1 What are the objectives of data governance? How do organizations accomplish these? Hint: ATU

A
  • To make information available
  • To ensure transparency of information
  • To enhance usefulness of information

How?

By using business processes and policies for handling data in a well-defined way, and by following unambiguous rules to create, collect, handle, and protect data.

6
Q

5.1 What strategy can organizations use to implement sound data governance?

A

Master Data Management

7
Q

5.1 What is Master Data Management?

A

Master data management is a process that spans all of an organization’s business processes and applications.

It provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for the company’s master data.

8
Q

5.1 Why aren’t master data and transactional data the same?

A

Master data are a set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span the enterprise’s information systems.

Transactional data, which are generated and captured by operational systems, describe the business’s activities, or transactions.

Master data are applied to multiple transactions, and they are used to categorize, aggregate, and evaluate the transactional data.
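A minimal Python sketch of the distinction, using hypothetical product master data and sales transactions (all names and values are illustrative assumptions): each transaction only references a product key, and the master data supply the category used to categorize and aggregate the transactions.

```python
from collections import defaultdict

# Master data: core entities shared across systems (hypothetical products)
products = {
    "P1": {"name": "Coffee", "category": "Beverages"},
    "P2": {"name": "Tea", "category": "Beverages"},
    "P3": {"name": "Bagel", "category": "Bakery"},
}

# Transactional data: one record per sale, pointing at master data by key
sales = [
    {"product_id": "P1", "amount": 4.50},
    {"product_id": "P3", "amount": 2.25},
    {"product_id": "P2", "amount": 3.00},
    {"product_id": "P1", "amount": 4.50},
]

def revenue_by_category(sales, products):
    """Use master data (category) to aggregate many transactions."""
    totals = defaultdict(float)
    for sale in sales:
        category = products[sale["product_id"]]["category"]
        totals[category] += sale["amount"]
    return dict(totals)

print(revenue_by_category(sales, products))  # {'Beverages': 12.0, 'Bakery': 2.25}
```

The same master record ("P1") is applied to two different transactions, which is why keeping master data consistent matters more than any single sale.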

9
Q

5.1 What are the resulting benefits of master data management? Hint: ASF = E

A

Master data management leads to:

  • Increased data accuracy
  • Streamlined entry of new products into the database management system
  • Faster, easier processing of transactions (e.g., at retail stores)
  • In short, effective data that help the organization reach its goal of serving customers seamlessly
10
Q

5.3 What is structured data? What is unstructured data? Give examples.

A

Structured data fits into predefined fields and can be organized into a spreadsheet or a relational database.
Examples: names, dates, addresses, credit card numbers, etc.

Unstructured data is heterogeneous and does not fall within standard fields.
Examples: email messages, audio files, Facebook posts, ratings, recommendations.
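A small Python sketch contrasting the two, assuming hypothetical customer rows and a hypothetical email: structured data sits in predefined fields of a relational table (here an in-memory SQLite database), so a precise query is easy; unstructured text has no fields, so even a simple question needs text processing.

```python
import sqlite3

# Structured data: every value belongs to a predefined field (hypothetical customers)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, birth_date TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana Silva", "1990-04-12", "Toronto"), ("Li Wei", "1985-11-30", "Montreal")],
)

# Because the fields are predefined, a precise query is straightforward
toronto_customers = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Toronto",)
).fetchall()

# Unstructured data: free text with no standard fields (hypothetical email message)
email = "Hi, I loved the product, but delivery to Toronto took two weeks."

# With no fields to query, we fall back on crude text processing (keyword search)
mentions_toronto = "toronto" in email.lower()

print(toronto_customers)  # [('Ana Silva',)]
print(mentions_toronto)   # True
```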

11
Q

5.3 Define Big Data.

A

We refer to the superabundance of data available today as Big Data. Big Data is a collection of data that is so large and complex that it is difficult to manage using traditional database management systems.

Essentially, Big Data is about predictions that come from applying mathematics to huge quantities of data to infer probabilities.

12
Q

5.3 Where do Big Data come from (sources)?

A
  • traditional enterprise data
  • machine-generated/sensor data
  • social data
  • images captured by billions of devices around the world (digital cameras, camera phones, medical scanners, security cameras)
13
Q

5.3 What are the three distinct characteristics of Big Data?

A

Volume + Velocity + Variety

  1. Volume: Big Data volumes are huge. Consider machine-generated data, which are generated in much larger quantities than traditional data. Smart electrical meters, sensors in heavy industrial equipment, and telemetry from automobiles compound the volume problem.
  2. Velocity: The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners. Companies can gain a competitive advantage if they can quickly use that information.
  3. Variety: Traditional data formats tend to be structured and relatively well described, and they change slowly. Traditional data include financial market data, point-of-sale transactions, and much more. In contrast, Big Data formats change rapidly.
14
Q

5.3 What are the three issues with Big Data?

A
  1. Big Data can come from untrusted sources (the sources can be internal or external to the organization, the data can come from an unverified source, and the reported data themselves may be false or misleading)
  2. Big Data is dirty (inaccurate, incomplete, incorrect, duplicate, or erroneous data, ex.: misspelling of words, duplicate data like retweets)
  3. Big Data changes, especially in data streams (data quality in an analysis can change, or the data themselves can change because the conditions under which the data are captured can change)
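To make "dirty" data concrete, here is a minimal Python cleaning sketch over hypothetical social-media posts. The misspelling table, the `RT ` retweet prefix rule, and the post fields are illustrative assumptions, not a standard cleaning procedure: the function drops incomplete records, fixes known misspellings, and removes duplicates such as retweets.

```python
# Hypothetical raw social-media posts showing typical "dirty" Big Data
raw_posts = [
    {"id": 1, "text": "Great servise today!"},
    {"id": 2, "text": "RT Great servise today!"},  # retweet: duplicate content
    {"id": 3, "text": None},                       # incomplete record
    {"id": 4, "text": "Long wait at checkout"},
]

# Hypothetical lookup table of known misspellings
CORRECTIONS = {"servise": "service"}

def clean(posts):
    """Drop incomplete records, fix known misspellings, and remove duplicates."""
    seen = set()
    cleaned = []
    for post in posts:
        text = post["text"]
        if not text:                   # incomplete: nothing to analyze
            continue
        if text.startswith("RT "):     # strip the retweet marker so copies match
            text = text[3:]
        text = " ".join(CORRECTIONS.get(word, word) for word in text.split())
        key = text.lower()
        if key in seen:                # duplicate of a post already kept
            continue
        seen.add(key)
        cleaned.append({"id": post["id"], "text": text})
    return cleaned

print(clean(raw_posts))
# [{'id': 1, 'text': 'Great service today!'}, {'id': 4, 'text': 'Long wait at checkout'}]
```

Real pipelines need far more robust rules; the point is that each kind of dirt listed on the card (incomplete, misspelled, duplicate) needs its own explicit handling step.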
15
Q

5.3 Name 5 functional areas of the organization where Big Data is used. Hint: HP + OMG

A
  1. Human resources (managing benefits to reduce cost, hiring)
  2. Product development (Ford’s work with auto-enthusiast sites/forums for information on turn indicators)
  3. Operations (UPS reduced fuel consumption by 32M liters)
  4. Marketing (used to better understand customers and target marketing efforts, in order to craft more personalized messages)
  5. Government operations (United Kingdom congestion example)
16
Q

5.4 What are the elements of a generic warehouse environment?

A
  1. Source systems: where the data are collected from (e.g., a website or an ERP system)
  2. Data integration technology and processes that prepare the data for use
  3. ETL process: extract data, transform it, and load it to the warehouse
  4. Different architectures for storing data: data warehouses* or data marts*
  5. Different tools and applications for the variety of users
  6. Metadata*: data about data (ex.: what format are the data in?). It is up to data governance to ensure that the data warehouse or data mart meets its purposes.
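The ETL step can be sketched in Python as follows. The two source systems, their formats (a website CSV export with amounts in cents, ERP rows with DD/MM/YYYY dates and amounts in dollars), and the `sales` table are hypothetical assumptions; an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source systems: a website's CSV export and rows from an ERP system
WEB_CSV = "order_id,date,amount_cents\n101,2024-03-01,1999\n102,2024-03-02,500\n"
ERP_ROWS = [("7", "01/03/2024", 12.50)]  # ERP uses DD/MM/YYYY and dollars

def extract():
    """Extract: pull raw records out of each source system."""
    web_rows = list(csv.DictReader(io.StringIO(WEB_CSV)))
    return web_rows, ERP_ROWS

def transform(web_rows, erp_rows):
    """Transform: give every record the same ID style, date format, and currency unit."""
    rows = []
    for r in web_rows:
        rows.append((f"WEB-{r['order_id']}", r["date"], int(r["amount_cents"]) / 100))
    for order_id, date, amount in erp_rows:
        day, month, year = date.split("/")
        rows.append((f"ERP-{order_id}", f"{year}-{month}-{day}", amount))
    return rows

def load(rows, conn):
    """Load: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(*extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY sale_date, order_id").fetchall())
```

Keeping the three steps as separate functions mirrors the E, T, and L of the card; in practice each step is usually handled by dedicated data integration tools.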
17
Q

5.4 What is a data warehouse? What is a data mart?

A

A data warehouse is a repository of historical data that are organized by subject to support decision makers within the organization. Because data warehouses are so expensive, they are used primarily by large companies.
A data mart is a low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or an individual department. Data marts can be implemented more quickly than data warehouses, often in less than 90 days. They support local rather than central control by conferring power on the user group.
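A short Python sketch of the relationship, assuming a design where the data mart is filled from the warehouse (marts can also be built independently). The `SalesFact` type, business units, and values are hypothetical: the warehouse holds history for every business unit, and the mart keeps only what one SBU needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesFact:
    """One historical fact in the warehouse's hypothetical 'sales' subject area."""
    year: int
    business_unit: str
    product: str
    revenue: float

# Enterprise-wide warehouse: history for every business unit (hypothetical values)
WAREHOUSE_SALES = [
    SalesFact(2022, "Consumer", "Laptop", 500.0),
    SalesFact(2023, "Consumer", "Phone", 300.0),
    SalesFact(2023, "Enterprise", "Server", 900.0),
]

def build_data_mart(facts, business_unit):
    """Scaled-down copy holding only the rows one SBU or department needs."""
    return [fact for fact in facts if fact.business_unit == business_unit]

consumer_mart = build_data_mart(WAREHOUSE_SALES, "Consumer")
print(sum(fact.revenue for fact in consumer_mart))  # 800.0
```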

18
Q

5.4 What is metadata?

A

Metadata is data about the data in a repository
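One concrete place to see metadata is a relational database's own description of its tables. The Python sketch below uses SQLite's `PRAGMA table_info`, which returns one row per column as `(cid, name, type, notnull, dflt_value, pk)`; the `employees` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, hire_date TEXT)"
)
conn.execute("INSERT INTO employees VALUES (1, 'Sam Lee', '2021-06-01')")

# The data: the values stored in the repository
data = conn.execute("SELECT * FROM employees").fetchall()

# The metadata: data describing those values (column names and declared types)
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employees)")]

print(data)      # [(1, 'Sam Lee', '2021-06-01')]
print(metadata)  # [('emp_id', 'INTEGER'), ('name', 'TEXT'), ('hire_date', 'TEXT')]
```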

19
Q

5.4 What is a data lake?

A

A data lake is a vast pool of raw data, the purpose for which is not yet defined.

A data lake includes structured and unstructured data, whereas a data warehouse is a repository for structured, filtered data that have already been processed for a specific purpose.
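A minimal Python sketch of the contrast, assuming a hypothetical lake of raw items and a hypothetical "sensor temperature" purpose: the lake stores everything as-is, while only items that can be parsed into structured records are processed into the warehouse-style table.

```python
import json

# Data lake: raw data kept in native form, purpose not yet defined (hypothetical items)
data_lake = [
    '{"sensor": "T1", "temp_c": 21.5}',                      # semi-structured JSON
    "Customer call transcript: the app keeps crashing...",   # unstructured text
    '{"sensor": "T2", "temp_c": 19.0}',
]

def load_temperature_table(lake):
    """Warehouse-style load: keep only structured, filtered data for one specific purpose."""
    table = []
    for item in lake:
        try:
            record = json.loads(item)
        except json.JSONDecodeError:
            continue  # not structured; it simply stays in the lake
        table.append((record["sensor"], record["temp_c"]))
    return table

print(load_temperature_table(data_lake))  # [('T1', 21.5), ('T2', 19.0)]
```

Nothing is deleted from the lake; a later purpose (for example, analyzing call transcripts) could still use the raw text.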

20
Q

5.2 What is a data file?

A

A data file is a collection of logically related records. (ex.: a shopping list)
In a file management environment, each application has a specific data file related to it.

21
Q

5.2 What are the three issues with file management systems? Hint: RICK

A
  • data redundancy
  • data isolation
  • data inconsistency
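A toy Python model of a file management environment, assuming two hypothetical applications (registrar and library) that each own a data file of student records. It shows how redundancy sets up inconsistency, while isolation comes from each application only ever seeing its own file.

```python
# Hypothetical file management environment: each application owns its own data file,
# and each file is a collection of logically related records keyed by student ID.
registrar_file = {"S1": {"name": "Kim Park", "address": "12 Elm St"}}
library_file = {"S1": {"name": "Kim Park", "address": "48 Oak Ave"}}  # separate copy

def redundant_ids(file_a, file_b):
    """Data redundancy: records about the same entity stored in both files."""
    return sorted(file_a.keys() & file_b.keys())

def inconsistent_ids(file_a, file_b, field):
    """Data inconsistency: redundant copies whose values for `field` disagree."""
    return [i for i in redundant_ids(file_a, file_b) if file_a[i][field] != file_b[i][field]]

# Data isolation is structural here: the library application only ever receives
# library_file, so on its own it cannot check its copy against the registrar's.
print(redundant_ids(registrar_file, library_file))                # ['S1']
print(inconsistent_ids(registrar_file, library_file, "address"))  # ['S1']
```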
22
Q

5.2 What is a database and which problems does it minimize? Hint: RICK

A

Database:

  • An organized collection of data, generally stored and accessed electronically from a computer system.
  • Provides all users with access to all data

Databases minimize the following problems:

  • Data redundancy: The same data are stored in many places.
  • Data isolation: Applications cannot access data associated with other applications.
  • Data inconsistency: Various copies of the data do not agree.
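The same two hypothetical applications can be sketched against one shared database (an in-memory SQLite table here; the schema and function names are illustrative). There is a single copy of each record, both applications read it, and one update is visible everywhere.

```python
import sqlite3

# One shared database replaces the per-application files (hypothetical schema)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (student_id TEXT PRIMARY KEY, name TEXT, address TEXT)")
db.execute("INSERT INTO students VALUES ('S1', 'Kim Park', '12 Elm St')")
db.commit()

def registrar_address(conn, sid):
    """Registrar application reads the shared table."""
    return conn.execute("SELECT address FROM students WHERE student_id = ?", (sid,)).fetchone()[0]

def library_address(conn, sid):
    """Library application reads the same table: no isolation, no second copy."""
    return conn.execute("SELECT address FROM students WHERE student_id = ?", (sid,)).fetchone()[0]

# A single update is seen by every application, so copies cannot disagree
db.execute("UPDATE students SET address = '48 Oak Ave' WHERE student_id = 'S1'")
db.commit()

print(registrar_address(db, "S1"), "|", library_address(db, "S1"))  # 48 Oak Ave | 48 Oak Ave
```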
23
Q

5.4 What are the differences between data warehouses and databases? Hint: think about content + search time + goal

A

Database:

  • data content is from current operations, normally updated in real time, with high volumes.
  • running many searches against a database can slow down its operations
  • content changes frequently due to transaction processing and changes to master data
  • it is optimized for online processing of single transactions

Data warehouse

  • data content from past and current data that is updated at regular intervals (e.g. hourly, daily)
  • searching in a data warehouse can be done with a long turnaround time or even overnight
  • content is read-only: data cannot be changed, only added to
  • goal is to support business intelligence applications, such as complex manipulations of arrays
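A minimal Python sketch of the split, assuming hypothetical `orders` and `sales_history` tables in two in-memory SQLite databases: the operational database takes single real-time transactions, while the warehouse is appended to only by a periodic load and is then used for aggregate, BI-style queries.

```python
import sqlite3

# Operational database: current data, many small real-time transactions
ops = sqlite3.connect(":memory:")
ops.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def record_sale(region, amount):
    """Online processing of a single transaction."""
    with ops:  # commits this one transaction
        ops.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))

# Data warehouse: historical data, loaded at regular intervals and only appended to
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_history (order_id INTEGER, region TEXT, amount REAL)")

def periodic_load():
    """Copy orders not yet in the warehouse (e.g., run hourly or daily); never updates rows."""
    last_id = dw.execute("SELECT COALESCE(MAX(order_id), 0) FROM sales_history").fetchone()[0]
    new_rows = ops.execute(
        "SELECT order_id, region, amount FROM orders WHERE order_id > ?", (last_id,)
    ).fetchall()
    with dw:
        dw.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    return len(new_rows)

def revenue_by_region():
    """Aggregate query run against the warehouse, not the operational database."""
    return dw.execute(
        "SELECT region, SUM(amount) FROM sales_history GROUP BY region ORDER BY region"
    ).fetchall()

record_sale("East", 10.0)
record_sale("West", 5.0)
periodic_load()
record_sale("East", 2.5)    # not in the warehouse until the next periodic load
print(revenue_by_region())  # [('East', 10.0), ('West', 5.0)]
```

The lag between the last sale and the warehouse total is deliberate: it reflects the "updated at regular intervals" point on the card.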
24
Q

5.2 What do Database Management Systems maximize? Hint: ISIS

A

Data security:

  • Because data are stored in one place, strong security measures are in place to deter mistakes and attacks.

Data integrity:

  • Data must meet certain constraints,
    E.g., no alphabetic characters in a SIN field.

Data independence:

  • Applications and data are independent of one another; all applications can access the same data.
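The data integrity example above can be expressed as a constraint the DBMS enforces itself. This Python sketch uses an SQLite `CHECK` constraint on a hypothetical `employees` table; it rejects any SIN that is not exactly 9 digits (Canadian SINs are 9 digits), so alphabetic characters never get stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,
        sin    TEXT NOT NULL
               CHECK (length(sin) = 9 AND sin NOT GLOB '*[^0-9]*')  -- digits only
    )
""")

def add_employee(emp_id, sin):
    """Return True if the DBMS accepted the row, False if the constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO employees VALUES (?, ?)", (emp_id, sin))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_employee(1, "123456789"))  # True
print(add_employee(2, "12345678A"))  # False: alphabetic character in the SIN field
```

Because the rule lives in the database rather than in each application, every application that writes to this table gets the same check.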
25
Q

5.2 What is the data hierarchy?

A

[From lowest to highest]

  • Bit: (binary digit) represents the smallest unit of data a computer can process (1 or 0).
  • Byte: represents a single character, often composed of 8 bits
  • Field: A logical grouping of related characters
  • Record: A logical grouping of related fields
  • File (or table): A logical grouping of related records
  • Database: A logical grouping of related files
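The hierarchy can be traced in a few lines of Python, from the bits of one character up to a database. The student and course records are hypothetical, and the byte sizes assume ASCII characters (one byte each in UTF-8).

```python
# Bit and byte: the character "K" is one byte, i.e., 8 bits
byte_bits = format(ord("K"), "08b")
print(byte_bits)  # 01001011

# Field: a logical grouping of related characters
first_name = "Kim"

# Record: a logical grouping of related fields (hypothetical student record)
record = {"student_id": "S1", "first_name": first_name, "city": "Toronto"}

# File (or table): a logical grouping of related records
students = [record, {"student_id": "S2", "first_name": "Li", "city": "Montreal"}]

# Database: a logical grouping of related files (with a hypothetical second table)
courses = [{"course_id": "C100", "title": "Information Systems"}]
database = {"students": students, "courses": courses}

# Each ASCII character in a field takes one byte, so "Kim" occupies 3 bytes = 24 bits
field_bits = len(first_name.encode("utf-8")) * 8
print(field_bits)  # 24
```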