Chapter 5 Flashcards

1
Q

what does high quality mean in terms of data

A

accurate, complete, timely, consistent, accessible, relevant, and concise

2
Q

Why is managing data difficult (general)?

A

data are processed in several stages and often in multiple locations

3
Q

Why is managing data difficult (specific)?

A
  • the amount of data increases exponentially with time
  • data are scattered throughout organizations (collected by different individuals using different methods, so data are stored in many locations and servers and in different systems, databases, formats, and languages, both human and computer)
  • data are generated from multiple sources
  • new sources of data are constantly being developed
  • data are subject to data rot
  • data security, quality, and integrity are critical but easily jeopardized
  • orgs have different ISs for specific business processes, and these impose unique requirements on data
  • federal gov regulation
  • companies are drowning in unstructured data
4
Q

What are some sources where data comes from?

A
  • internal sources (ex. corporate databases and company documents)
  • personal sources (ex. personal thoughts, opinions and experiences)
  • external sources (ex. commercial databases, gov reports, corporate websites)
  • the web (clickstream data)
5
Q

def. clickstream data

A

Data collected about user behaviour and browsing patterns by monitoring users’ activities when they visit a website (e.g., the hyperlinks they click).

6
Q

What are some examples of data degrading over time?

A

customers move to new addresses/change names, companies go out of business, new products are developed, companies expand into new countries, employees are hired or fired

7
Q

What is data rot? What are its two aspects?

A

refers primarily to problems with the media on which the data are stored

Aspects
Physical problems: Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access the data.

Difficulty finding the machines needed to access the data

8
Q

What is the impact on data of having ISs develop over time?

A

Information systems that support specific business processes impose unique requirements on data, which results in repetition and conflicts across the organization

-ex. the marketing function might maintain information on customers, sales territories, and markets. These data might be duplicated within the billing or customer service functions. This situation can produce inconsistent data within the enterprise

9
Q

What does inconsistent data prevent a company from developing?

A

a unified view of core business information (data concerning customers, products, finances, etc.) across the org and its ISs

10
Q

What is the most significant government regulation affecting data?

A

Bill 198
requires:
(1) public companies evaluate and disclose the effectiveness of their internal financial controls
(2) independent auditors for these companies agree to this disclosure

-also holds CEOs and CFOs personally responsible for these disclosures

11
Q

How do gov regulations impact data?

A

they require companies to account for how information is being managed within their organizations

12
Q

What must companies do with the amount of data to be able to profit?

A

companies must develop a strategy for managing these data effectively

13
Q

def. data governance

A

An approach to managing information across an entire organization which involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion

14
Q

What happens in data governance (general)?

A

the organization follows unambiguous rules for creating, collecting, handling, and protecting its information

15
Q

What is the goal of data governance?

A

make information available, transparent, and useful for the people who are authorized to access it, from the moment it enters an organization until it is outdated and deleted

16
Q

def. master data management

A

A process that provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for a company’s core master data.

17
Q

What is a strategy for implementing data governance?

A

master data management

18
Q

def. master data

A

A set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span an enterprise’s information systems.

19
Q

What is the difference between transaction data and master data?

A

Transaction data, which are generated and captured by operational systems, describe the business’s activities or transactions. In contrast, master data are applied to multiple transactions and are used to categorize, aggregate, and evaluate the transaction data
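
A minimal sketch of the distinction, assuming made-up product records and sales rows (none of these names or values come from the chapter): the master data are looked up for every transaction and used to categorize and aggregate the transaction data.

```python
# Master data: hypothetical core product records shared across systems.
products = {
    "P1": {"name": "Widget", "category": "Hardware"},
    "P2": {"name": "Gadget", "category": "Hardware"},
    "P3": {"name": "Planner", "category": "Software"},
}

# Transaction data: one row per sale, as (product_id, quantity, unit_price).
sales = [("P1", 2, 10.0), ("P2", 1, 30.0), ("P3", 4, 5.0), ("P1", 1, 10.0)]

def revenue_by_category(products, sales):
    """Use master data to categorize and aggregate the transaction data."""
    totals = {}
    for product_id, quantity, unit_price in sales:
        category = products[product_id]["category"]
        totals[category] = totals.get(category, 0.0) + quantity * unit_price
    return totals

revenue = revenue_by_category(products, sales)
print(revenue)
```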

20
Q

How did businesses manage their data during the first adopted computer applications era?

A

file management environment

21
Q

def. data file (table)

A

a collection of logically related records

22
Q

What happens in a file management environment?

A

each application has a specific data file related to it, which contains all of the data records the application requires

over time, orgs developed numerous applications, each with an associated, application-specific data file

23
Q

What can the use of databases solve? (6)

A

minimize:

  • data redundancy
  • data isolation
  • data inconsistency

maximize

  • data security
  • data integrity
  • data independence
24
Q

how are databases arranged?

A

arranged so that one set of software programs—the database management system—provides all users with access to all of the data.

25
Q

def. data redundancy

A

The same data are stored in multiple locations.

26
Q

def. data isolation

A

Applications cannot access data associated with other applications

27
Q

def. data inconsistency

A

Various copies of the data do not agree.

28
Q

How do database systems maximize data security?

A

Because data are “put in one place” in databases, there is a risk of losing a lot of data at one time. Therefore, databases must have extremely high security measures in place to minimize mistakes and deter attacks.

29
Q

How do database systems maximize data integrity?

A

Data meet certain constraints; for example, there are no alphabetic characters in a Social Insurance Number field.
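
One way such a constraint might be enforced, sketched with Python's built-in sqlite3 module. The table name, the column names, and the nine-digit rule are assumptions made for illustration; the card only says the field holds no alphabetic characters.

```python
import sqlite3

def make_employee_table():
    """Create an in-memory table whose SIN column accepts exactly nine digits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE employee (
            name TEXT NOT NULL,
            sin  TEXT NOT NULL
                 CHECK (length(sin) = 9 AND sin NOT GLOB '*[^0-9]*')
        )
        """
    )
    return conn

def try_insert(conn, name, sin):
    """Return True if the row is accepted, False if the DBMS rejects it."""
    try:
        conn.execute("INSERT INTO employee (name, sin) VALUES (?, ?)", (name, sin))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_employee_table()
print(try_insert(conn, "Avery", "123456789"))  # digits only
print(try_insert(conn, "Blake", "12345678A"))  # contains a letter
```

The check lives in the database rather than in each application, so every program that writes to the table is held to the same rule.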

30
Q

How do database systems maximize data independence?

A

Applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.

31
Q

how are data arranged to make them more understandable and useful?

A

in a hierarchy

32
Q

What does a data hierarchy begin with?

A

bits

33
Q

def. bit

A

(binary digit) represents the smallest unit of data a computer can process (0 or 1)

34
Q

def. byte

A

A group of eight bits that represents a single character

35
Q

def. field

A

A characteristic of interest that describes an entity; a field can also contain data other than text and numbers

36
Q

def. record

A

A grouping of logically related fields

37
Q

describe the data hierarchy

A
bit
byte
field
record
data file/table
database
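
The levels above can be sketched in a few lines of Python. The student records are made-up illustration data, and plain dictionaries and lists stand in for a real file or database.

```python
# One character is stored as one byte, which is eight bits.
char = "A"
byte = char.encode("ascii")    # b'A'
bits = format(byte[0], "08b")  # the eight bits of that byte, as a string

# A field groups characters, a record groups related fields,
# a data file (table) groups related records, a database groups related files.
field = "Smith"
record = {"student_id": 1001, "last_name": field, "major": "MIS"}
data_file = [record, {"student_id": 1002, "last_name": "Chen", "major": "Finance"}]
database = {"students": data_file}

print(bits)
```
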
38
Q

def. database management system (DBMS)

A

The software program (or group of programs) that provides access to a database.

39
Q

What does managing a database involve?

A

adding, deleting, accessing, modifying, and analyzing data stored in a database

40
Q

How can an org access data in a database?

A

by using query and reporting tools that are part of the DBMS or by using application programs specifically written to perform this function

41
Q

DBMSs provide mechanisms for ________, _________, and _______

A

maintaining the integrity of stored data, managing security and user access, and recovering information if the system fails

42
Q

What is a type of database architecture that is popular and easy to use?

A

relational database model

ex. Oracle, Microsoft Access

43
Q

How were most data traditionally organized?

A

into simple tables consisting of columns and rows
-Tables allow people to compare information quickly by row or column. In addition, users can retrieve items rather easily by locating the point of intersection of a particular row and column.

44
Q

def. relational database model

A

A data model based on the simple concept of tables in order to capitalize on characteristics of rows and columns of data.

-a relational database is generally not one big table (flat file) that contains all of the records and attributes; instead, it is usually designed with a number of related tables. Each of these tables contains records (listed in rows) and attributes (listed in columns).

45
Q

What must a relational database do to be valuable?

A

must be organized so that users can retrieve, analyze, and understand the data they need

46
Q

what is key to designing an effective database?

A

data model

47
Q

def. data model

A

A diagram that represents entities in the database and their relationships

48
Q

def. entity

A

person, place, thing, or event—such as a customer, an employee, or a product—about which information is maintained

49
Q

def. instance (of an entity)

A

Each row in a relational table, which is a specific, unique representation of the entity

50
Q

def. attribute

A

Each characteristic or quality of a particular entity

51
Q

def. primary key

A

A field (or attribute) of a record that uniquely identifies that record so that it can be retrieved, updated, and sorted
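
A small sketch using Python's built-in sqlite3 module, with a hypothetical student table: two students may share a last name, but the DBMS refuses a second row with the same primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO student VALUES (1001, 'Smith')")
conn.execute("INSERT INTO student VALUES (1002, 'Smith')")  # same name, different key

def add_student(student_id, last_name):
    """Return True if the row is stored, False if the primary key already exists."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, last_name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = add_student(1001, "Jones")  # key 1001 is already taken

# The primary key retrieves exactly one record.
one_row = conn.execute(
    "SELECT last_name FROM student WHERE student_id = 1002"
).fetchall()
print(duplicate_accepted, one_row)
```

Here last_name behaves like a secondary key: it helps narrow a search but does not identify a single record.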

52
Q

What must every record in the database contain so that it can be retrieved, updated and sorted? What is it called?

A

must contain at least one field that uniquely identifies that record so that it can be retrieved, updated, and sorted

primary key

53
Q

def. secondary key

A

A field that has some identifying information, but typically does not uniquely identify a record with complete accuracy

54
Q

def. foreign key

A

A field (or group of fields) in one table that uniquely identifies a row (or record) of another table

-used to establish and enforce a link between two tables
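
A sketch of how a foreign key might enforce that link, again with sqlite3 and hypothetical customer and sales_order tables. Note that SQLite enforces foreign keys only after the pragma below is switched on; other DBMSs behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        total       REAL
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd.')")
conn.execute("INSERT INTO sales_order VALUES (10, 1, 250.0)")  # customer 1 exists

def order_for_unknown_customer_rejected():
    """Try to store an order for customer 99, who is not in the customer table."""
    try:
        conn.execute("INSERT INTO sales_order VALUES (11, 99, 80.0)")
        return False
    except sqlite3.IntegrityError:
        return True

print(order_for_unknown_customer_rejected())
```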

55
Q

orgs implement databases to ___________

A

efficiently and effectively manage their data

56
Q

What are the three main operations performed on databases?

A

query languages, normalization, and joins
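
A minimal sketch that touches all three, assuming two hypothetical tables: customer details are kept once in their own table (the kind of design normalization leads to), and an SQL query joins them back to the orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_order (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Acme Ltd.'), (2, 'Birch Inc.');
    INSERT INTO sales_order VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """
)

# Query language (SQL) plus a join: total order value per customer name.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM customer AS c
    JOIN sales_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```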

57
Q

why is it not practical to allow users access to databases?

A

Because databases typically process data in real time, the data would change while the user is looking at them

58
Q

def. big data

A

A collection of data so large and complex that it is difficult to manage using traditional database management systems

59
Q

What is big data about?

A

predictions, which come from applying mathematics to huge quantities of data to infer probabilities

60
Q

Why do big data systems perform well?

A

because they contain huge amounts of data on which to base their predictions, and they are configured to improve themselves over time by searching for the most valuable signals and patterns as more data are input

61
Q

What is Gartner’s description of big data?

A

defines Big Data as diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization

62
Q

what does The Big Data Institute describe big data as?

A

defines Big Data as vast data sets that perform the following:
  • Exhibit variety.
  • Include structured, unstructured, and semi-structured data.
  • Are generated at high velocity with an uncertain pattern.
  • Do not fit neatly into traditional, structured, relational databases.
  • Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.

63
Q

what does big data generally consist of?

A
  • traditional enterprise data
  • machine-generated/sensor data
  • social data
  • images
64
Q

What are some examples of traditional enterprise data

A

(ex. customer info from CRM, transactional enterprise resource planning data, web store transactions, operations data, general ledger data)

65
Q

examples of machine-generated/sensor data?

A

smart meters; manufacturing sensors; sensors integrated into smart phones, automobiles, airplane engines, and industrial machines; equipment logs; and trading systems data.

66
Q

examples of social data?

A

Examples are customer feedback comments; microblogging sites such as Twitter; and social media sites such as Facebook, YouTube, and LinkedIn.

67
Q

What are the three distinct characteristics of Big Data (general)?

A

Volume, velocity, variety

68
Q

what is unique about volume in big data

A

-huge volume of Big Data

69
Q

what is unique about the velocity of big data?

A

The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners

70
Q

what is unique about variety of big data?

A

Where traditional data formats are structured, well described, and change slowly (financial market data, point-of-sale transactions, and much more), Big Data formats change rapidly
-include: satellite imagery, broadcast audio streams, digital music files, web page content, scans of government documents, and comments posted on social networks.

71
Q

Why do certain types of data appear to have no value today?

A

because we have not yet been able to analyze them effectively

72
Q

What are the three big issues with Big Data? (general)

A
  • big data can come from untrusted sources
  • Big Data is “dirty”
  • Big Data changes, especially in data streams
73
Q

Describe the issue with big data coming from untrusted sources

A

since Big Data comes from a wide variety of sources (internal or external), it is hard to know if all of these sources are reliable. Further, the data itself, reported by the source, can be false or misleading

74
Q

describe what it means to say Big Data is “dirty”

A

Dirty data are data that are inaccurate, incomplete, incorrect, duplicate, or erroneous

ex. misspellings of words and duplicate data such as retweets or company press releases that appear numerous times in social media
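
A rough sketch of one cleaning step, removing duplicates such as retweets. The sample posts and the "RT " prefix rule are assumptions for illustration, and the sketch does nothing about misspellings.

```python
# Hypothetical social media posts containing a retweet and repeated text.
posts = [
    "Great service!",
    "RT Great service!",
    "great service!",
    "Late delivery",
    "Late delivery ",
]

def remove_duplicates(posts):
    """Keep the first occurrence of each post, ignoring case, padding, and an 'RT ' prefix."""
    seen, unique = set(), []
    for text in posts:
        if text.startswith("RT "):
            text = text[3:]
        key = text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

cleaned = remove_duplicates(posts)
print(cleaned)
```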

75
Q

describe why Big Data changing presents an issue

A

Organizations must be aware that data quality in an analysis can change, or the data itself can change, because the conditions under which the data are captured can change

76
Q

What can big data reveal?

A

valuable patterns and information that were previously hidden because of the amount of work required to discover them

(ex. spot business trends more rapidly and accurately, prevent disease, track crime)

77
Q

What are the first two steps most orgs take toward managing data?

A

integrate information silos into a database environment and then to develop data warehouses for decision making

-organizations turned their attention to the business of information management—making sense of their proliferating data

78
Q

In addition to existing data-management systems, what else do orgs employ to process big data?

A

NoSQL databases

Think of them as “not only SQL” (structured query language) databases

79
Q

Why are NoSQL databases good for big data?

A

they can manipulate structured as well as unstructured data and inconsistent or missing data.

80
Q

making big data available for relevant stakeholders can do what?

A

help orgs gain value

81
Q

How does Big Data allow orgs to improve?

A

by conducting controlled experiments

82
Q

What are A/B experiments?

A

experiments that have only two possible outcomes

83
Q

what is microsegmentation

A

dividing customers into very small groups, or even down to the individual level

Big Data can allow this

84
Q

What are the general ways big Data adds value?

A
  • making it available to stakeholders adds value for them
  • Enabling Organizations to Conduct Experiments
  • Microsegmenting Customers
  • Creating New Business Models
  • Being Able to Analyze More Data
85
Q

What is an example of Big Data making a new business model?

A

-transportation
a company put sensors on all of its trucks, which collect data on vehicle usage

analyzing this data allowed the company to improve the condition of its trucks and the driving skills of its operators
-also could do risk analysis to lower insurance

86
Q

How does the concept of being able to analyze big data help add value?

A
  • organizations can use Big Data to process all the data relating to a particular phenomenon, meaning that they do not have to rely as much on sampling
  • random sampling works well but isn’t as effective as looking at the entire data set
  • random sampling has weaknesses (ensuring true randomness is difficult because of systemic biases in the data-collection process)
87
Q

How is big data useful in human resources (employee benefits)?

A
  • better manage employee benefits (particularly health care)
  • analyzing where people use their health care can help the companies save money, as they can issue reminders and resources to use cheaper treatment
88
Q

How is big data useful in human resources (hiring)?

A
  • if an online assessment is used, the company can analyze HOW candidates answer and not only WHAT they answer
  • it recognizes that people bring different skills to the table and that there is no one-size-fits-all person for any job
  • analyzing millions of data points can reveal which attributes candidates bring to specific situations
89
Q

How can Big Data help with product development?

A

Big Data can help capture customer preferences and put that information to work in designing new products

ex. using text mining to see consumer feedback

–use of text-mining algorithms was critical in this effort [Ford example] because they provided the company with a complete picture that would not have been available using traditional market research.

90
Q

How can Big Data help operations?

A

companies have been using information technology to make their operations more efficient

ex. sensors placed on delivery vehicles to capture truck’s speed, location, how many times the truck reversed, whether seatbelt is buckled, etc. (UPS)
-using these data reduced fuel consumption and cut kilometres off routes

91
Q

How can Big Data help marketing?

A
  • for a while, managers have used data to better understand their customers and to target their marketing efforts more directly.
  • Big Data enables marketers to craft much more personalized messages.

ex. analyzing UK’s InterContinental Hotels Group guests
- looked at Priority Club rewards program members (income levels, what kind of accommodation they prefer)
- combined this data with info obtained from social media into one single data warehouse
- launched new marketing campaign that was able to generate more than 1500 personalized marketing messages (compared to 7-15)
- campaign generated a 35-percent higher rate of customer conversions, or acceptances, than previous similar campaigns.

92
Q

How can big data help gov operations? (dutch example)

A
  • water management is critical
  • gov operates a sophisticated water management system, managing a network of dykes or levees, canals, locks, harbours, dams, rivers, storm-surge barriers, sluices, and pumping stations
  • government makes use of a vast number of sensors embedded in every physical structure used for water control (generate huge amounts of data)
  • dykes specifically: sensors in dykes can provide information on the structure of the dyke, how well it is able to handle the stress of the water it controls, and whether it is likely to fail
  • Dutch authorities have reduced the costs of managing water by 15 percent
93
Q

What do the most successful companies do? What is a key to this response? What is a challenge?

A

they respond quickly and flexibly to market changes and opportunities

A key to this response is the effective and efficient use of data and information by analysts and managers

The challenge is providing users with access to corporate data so that they can analyze the data to make better decisions

94
Q

In general, data warehouses and data marts support __________ applications

A

business intelligence (BI)

95
Q

def. data warehouse

A

A repository of historical data that are organized by subject to support decision makers in the organization

96
Q

Who uses data warehouse primarily and why?

A

large companies, because data warehouses are expensive

97
Q

def. data mart

A

A low-cost, scaled-down version of a data warehouse that is designed for the end-user needs in a strategic business unit (SBU) or a department

98
Q

Which can be implemented more quickly: data warehouses or data marts?

A

data marts (often in less than 30 days)

99
Q

How do data marts support local control over central control?

A

they confer power on the user group

100
Q

What are the basic characteristics of data warehouses and data marts?

A
  • Organized by business dimension or subject (not by business process like transactional systems)
  • Use online analytical processing (OLAP)
  • integrated
  • Time variant
  • Nonvolatile
  • Multidimensional
101
Q

Describe this basic characteristic of data warehouses and data marts: use online analytical processing

A

involves the analysis of accumulated data by end users, rather than processing business transactions as they occur (OLTP)

102
Q

Describe this basic characteristic of data warehouses and data marts: integrated

A

Data are collected from multiple systems and then integrated around subjects

ex. customer data may be extracted from internal (and external) systems and then integrated around a customer identifier, thereby creating a comprehensive view of the customer

103
Q

Describe this basic characteristic of data warehouses and data marts: time variant

A

Data warehouses and data marts maintain historical data (i.e., data that include time as a variable)

  • can store years of data
  • Organizations use historical data to detect deviations, trends, and long-term relationships.
104
Q

Describe this basic characteristic of data warehouses and data marts: nonvolatile

A

users cannot change or update the data.

  • the warehouse or mart reflects history, which is critical for identifying and analyzing trends
  • Warehouses and marts are updated, but through IT-controlled load processes rather than by users.
105
Q

Describe this basic characteristic of data warehouses and data marts: multidimensional

A

have a multidimensional structure

store data in more than two dimensions

106
Q

def. multidimensional structure

A

Storage of data in more than two dimensions; a common representation is the data cube.
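
A toy data cube, sketched as a Python dictionary keyed by three business dimensions (product, region, period). The figures are invented, and real warehouses use dedicated OLAP tools rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, period) -> units sold.
cube = {
    ("nuts", "east", "2023"): 50,
    ("nuts", "west", "2023"): 60,
    ("bolts", "east", "2023"): 40,
    ("nuts", "east", "2024"): 70,
    ("bolts", "west", "2024"): 90,
}

def roll_up(cube, axis):
    """Total the cube along one dimension (0 = product, 1 = region, 2 = period)."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)

by_product = roll_up(cube, 0)
by_period = roll_up(cube, 2)
print(by_product, by_period)
```

The same facts can be viewed from any edge of the cube, which is the point of storing them by dimension.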

107
Q

How are the data in data warehouses and marts organized?

A

by business dimensions (subjects such as product, geographic area, and time period that represent the edges of the data cube)

108
Q

What can business dimensions do for the user in analyzing data?

A
  • Users can view and analyze data from the perspective of these business dimensions.
  • This analysis is intuitive because the dimensions are presented in business terms that users can easily understand.
109
Q

What does the environment for data warehouses and marts include? (5 things)

A
  1. Source systems that provide data to the warehouse or mart.
  2. Data-integration technology and processes that prepare the data for use.
  3. Different architectures for storing data in an organization’s data warehouse or data marts.
  4. Different tools and applications for the variety of users.
  5. Metadata, data quality, and governance processes that ensure that the warehouse or mart meets its purposes.
110
Q

What causes an org to develop its BI capabilities? What does it lead to?

A

“organizational pain”

this pain leads to information requirements, BI applications, and source system data requirements

111
Q

What are some different source systems modern orgs can choose from?

A
  • operational/transactional systems,
  • enterprise resource planning (ERP) systems,
  • website data,
  • third-party data (e.g., customer demographic data)
112
Q

what is the trend to include with source systems

A

more types of data

113
Q

what is a common source for the data in data warehouses?

A

company’s operational databases, which can be relational databases

114
Q

What should happen to counteract the “bad data” that is present in source systems that have been in use for many years?

A

data-profiling software should be used at the beginning of a warehousing project to better understand the data

115
Q

What other things must orgs do to address source system issues?

A
  • Often there are multiple systems that contain some of the same data and the best system must be selected as the source system
  • must also decide how granular (i.e., detailed) the data should be
116
Q

What is the conventional wisdom about how granular data should be?

A

it is best to store data at a highly granular level because someone will likely request the data at some point

117
Q

What does the data integration process look like?

A

organizations need to extract the data, transform them, and then load them into a data mart or warehouse (this order referred to as ETL)
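
A compact ETL sketch, assuming a made-up list of source rows and an in-memory SQLite database standing in for the warehouse. Commercial data-integration software does far more than this.

```python
import sqlite3

# Extract: rows as they might arrive from a hypothetical source system.
source_rows = [
    {"cust": " Acme Ltd. ", "amount": "250.00"},
    {"cust": "acme ltd.", "amount": "100.00"},
    {"cust": "Birch Inc.", "amount": "75.50"},
]

def transform(rows):
    """Standardize customer names (format change) and total sales per customer (aggregation)."""
    totals = {}
    for row in rows:
        name = row["cust"].strip().title()
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return sorted(totals.items())

def load(conn, rows):
    """Load the transformed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_by_customer (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_customer VALUES (?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(source_rows))
loaded = warehouse.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer"
).fetchall()
print(loaded)
```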

118
Q

How can data extraction be performed? What do most companies use?

A

by handwritten code (e.g., SQL queries) or by commercial data-integration software

  • most companies employ commercial software
  • it makes it relatively easy to specify the tables and attributes in the source systems that are to be used, map and schedule the movement of the data to the target (e.g., a data mart or warehouse), make the required transformations, and ultimately load the data
119
Q

Why are data transformed

A

to make them more useful

120
Q

What are some examples of kinds of transformations?

A
  • format changes
  • aggregations
  • data-cleansing (ex. eliminating duplicate records for the same customer)
121
Q

How are data loaded into warehouses or marts?

A

loaded into the warehouse or mart during a specific period known as the “load window”

122
Q

What is happening to the “load window”? What are companies doing because of it?

A

it is becoming smaller as companies seek to store ever-fresher data in their warehouses

-many companies have moved to real-time data warehousing where data are moved (using data-integration processes) from source systems to the data warehouse or mart almost instantly

123
Q

What is the most common data storing architecture? Why?

A

one central enterprise data warehouse, without data marts

-Most organizations use this approach, because the data stored in the warehouse are accessed by all users and represent the single version of the truth.

124
Q

Describe the independent data marts architecture

A
  • stores data for a single application or a few applications, such as marketing and finance
  • Limited thought is given to how the data might be used for other applications or by other functional areas in the organization.
  • This is a very application-centric approach to storing data.
  • not particularly effective, does not represent an enterprise-wide approach to data management
  • expensive to build and maintain, and can contain inconsistent data
125
Q

describe the hub and spoke data warehouse architecture

A
  • contains a central data warehouse that stores the data plus multiple dependent data marts that source their data from the central repository
  • Because the marts obtain their data from the central repository, the data in these marts still compose the single version of the truth for decision-support purposes
  • dependent data marts store the data in a format that is appropriate for how the data will be used and for providing faster response times to queries and applications, making it easier for users to view and analyze the data
126
Q

What is metadata?

A

data about the data

127
Q

Who needs metadata?

A

the IT personnel who operate and manage the data warehouse and the users who access the data need metadata

IT: need information about data sources; database, table, and column names; refresh schedules; and data-usage measures

User: needs include data definitions, report/query tools, report distribution information, and contact information for the help desk

128
Q

What happens if the quality of the data in the warehouse does not meet users’ needs?

A

-users will not trust the data and ultimately will not use it

129
Q

What are the two main solutions to improve data quality?

A
  • data-cleansing software
  • better, long-term solution: improve the quality at the source system level
  • This approach requires the business owners of the data to assume responsibility for making any necessary changes to implement this solution
130
Q

What do orgs need to implement to plan and control BI activities?

A

governance

131
Q

What does governance require?

A

that people, committees, and processes be in place

132
Q

What do companies that are effective in BI governance usually do?

A

often create a senior-level committee composed of vice presidents and directors who (1) ensure that the business strategies and BI strategies are aligned, (2) prioritize projects, and (3) allocate resources

also establish middle-management-level committee that oversees various projects in BI portfolio

lower-level operational committees perform tasks such as creating data definitions

133
Q

At what point can orgs begin to obtain business value from BI?

A

Once the data are loaded in a data mart or warehouse, and they can be accessed by users

134
Q

What is an information producer?

A

primary role is to create information for other users

IT developers and analysts typically fall into this category

135
Q

What is an information consumer?

A

they use information created by others

managers and executives fall into this category

136
Q

What are some benefits of data warehousing? (3)

A
  • End users can access needed data quickly and easily via web browsers because these data are located in one place.
  • End users can conduct extensive analysis with data in ways that were not previously possible.
  • End users can obtain a consolidated view of organizational data.
137
Q

How do the end user benefits of data warehousing benefit a business?

A

they can improve business knowledge, provide competitive advantage, enhance customer service and satisfaction, facilitate decision making, and streamline business processes

138
Q

What are the limitations of data warehouses?

A
  • very expensive to build and maintain
  • incorporating data from obsolete mainframe systems can be difficult and expensive
  • people in one department might be reluctant to share data with other departments
139
Q

Where is most of a company’s knowledge located? What arises from this?

A

dispersed in emails, word processing documents, spreadsheets, presentations on individual computers, and people’s heads

  • This arrangement makes it extremely difficult for companies to access and integrate this knowledge.
  • The result frequently is less-effective decision making.
140
Q

def. knowledge management (KM)

A

A process that helps organizations identify, select, organize, disseminate, transfer, and apply information and expertise that are part of the organization’s memory and that typically reside within the organization in an unstructured manner

141
Q

for an org to be successful, what must happen with knowledge?

A

knowledge, as a form of capital, must exist in a format that can be exchanged among persons.
–In addition, it must be able to grow.

142
Q

What is knowledge?

A

information that is contextual, relevant, and useful

–it is information in action

143
Q

def. intellectual capital (or intellectual assets)

A

another term for knowledge

144
Q

What distinguishes knowledge from information?

A

knowledge has strong experiential and reflective elements that distinguish it from information in a given context

knowledge can be used to solve a problem

145
Q

def. explicit knowledge

A

more objective, rational, and technical types of knowledge

146
Q

What does explicit knowledge consist of in organizations?

A
  • policies
  • procedural guides
  • reports
  • products,
  • strategies,
  • goals,
  • core competencies
  • IT infrastructure of the enterprise.
147
Q

In other words, explicit knowledge is the knowledge that has been codified (documented) in a form _________________________

A

that can be distributed to others or transformed into a process or a strategy

148
Q

def. tacit knowledge

A

The cumulative store of subjective or experiential learning, which is highly personal and hard to formalize

149
Q

What does tacit knowledge consist of in orgs?

A
  • organization’s experiences,
  • insights
  • expertise,
  • know-how,
  • trade secrets,
  • skill sets,
  • understanding,
  • learning
  • org culture
150
Q

Tacit knowledge is generally _______ (precise or imprecise) and _____ (costly or not costly) to transfer

A

imprecise, costly

151
Q

T or F: Tacit knowledge is personal

A

T

152
Q

Is tacit knowledge difficult to codify/formalize? Why or why not?

A

yes because it is unstructured

153
Q

what is the goal of knowledge management?

A

to help an organization make the most productive use of the knowledge it has accumulated

154
Q

def. knowledge management systems (KMSs)

A

Information technologies used to systematize, enhance, and expedite intra- and inter-firm knowledge management.

155
Q

What are KMSs intended to help orgs cope with? How do they do this?

A

turnover, rapid change, and downsizing

by making the expertise of the organization’s human capital widely accessible

156
Q

what is the most important benefit of KMSs?

A

they make the best practices readily available to a wide range of employees

-which improves overall org performance

157
Q

def. best practice

A

The most effective and efficient ways to do things

158
Q

What are the challenges with implementing effective KMSs?

A
  1. employees must be willing to share their personal tacit knowledge
    - To encourage this behaviour, organizations must create a knowledge management culture that rewards employees who add their expertise to the knowledge base
  2. the organization must continually maintain and upgrade its knowledge base
  3. companies must be willing to invest in the resources needed to carry out these operations
159
Q

What are the six steps of a functioning KMS cycle?

A
  1. Create knowledge: Knowledge is created as people determine new ways of doing things or develop know-how. Sometimes external knowledge is brought in.
  2. Capture knowledge: New knowledge must be identified as valuable and be represented in a reasonable way.
  3. Refine knowledge: New knowledge must be placed in context so that it is actionable. This is where tacit qualities (human insights) must be captured along with explicit facts.
  4. Store knowledge: Useful knowledge must then be stored in a reasonable format in a knowledge repository so that other people in the organization can access it.
  5. Manage knowledge: Like a library, the knowledge must be kept current. It must be reviewed regularly to verify that it is relevant and accurate.
  6. Disseminate knowledge: Knowledge must be made available in a useful format to anyone in the organization who needs it, anywhere and anytime.
160
Q

T or F: the knowledge in an effective KMS can be finalized

A

F: The knowledge in an effective KMS is never finalized because the environment changes over time and knowledge must be updated to reflect these changes

it’s a cyclical system

161
Q

what feature allows great flexibility in the variety of queries that can be made?

A

the uniqueness of the primary key tells the DBMS which records are joined with others in related tables

162
Q

What is the main disadvantage of the relational database model?

A

because large-scale databases can be composed of many interrelated tables, the overall design can be complex, leading to slow search and access times

163
Q

what is the most commonly performed database operation?

A

searching for information

164
Q

def. structured query language (SQL)

A

The most popular query language for requesting information from a relational database
-allows people to perform complicated searches by using relatively simple statements or key words

165
Q

what are typical key words for SQL?

A
  • select (to choose a desired attribute)
  • from (to specify the table or tables to be used)
  • where (to specify conditions to apply in the query)
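
A minimal sketch of the three key words in a single query, run through Python's built-in `sqlite3` module. The `student` table, its columns, and the rows are hypothetical, invented only for this example.

```python
import sqlite3

# In-memory database with a hypothetical table (made-up names and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", 3.8), (2, "Bob", 2.9), (3, "Cy", 3.5)],
)

# SELECT chooses the attributes, FROM names the table, WHERE applies the condition.
rows = conn.execute(
    "SELECT name, gpa FROM student WHERE gpa > 3.4 ORDER BY name"
).fetchall()
conn.close()
```

With the sample rows above, only the students whose `gpa` exceeds 3.4 come back.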
166
Q

def. query by example (QBE)

A

Obtaining information from a relational database by filling out a grid or template—also known as a form—to construct a sample or a description of the data desired

167
Q

def. entity-relationship (ER) modelling

A

The process of designing a database by organizing data entities to be used and identifying the relationships among them

168
Q

def. entity-relationship (ER) diagram

A

A document that shows data entities and attributes and relationships among them

169
Q

What do ER diagrams consist of?

A

entities, attributes, and relationships

170
Q

what is the first step in properly identifying entities, attributes, and relationships

A

database designers must identify the business rules for the particular data model

171
Q

def. business rules

A

precise descriptions of policies, procedures, or principles in any organization that stores and uses data to generate information

172
Q

What are business rules derived from?

A

a description of an organization’s operations; they help create and enforce business processes in that organization (you determine these rules, not the MIS department)

173
Q

def. data dictionary

A

A collection of definitions of data elements; data characteristics that use the data elements; and the individuals, business functions, applications, and reports that use these data elements

can also provide info about why the attribute is needed and how often the attribute should be updated

174
Q

why is ER modelling valuable?

A

because it allows database designers to communicate with users throughout the organization to ensure that all entities and the relationships among the entities are represented

175
Q

What are the ER diagram pictures/representations?

A

Entities are pictured in rectangles, and relationships are described on the line between two entities. The attributes for each entity are listed, and the primary key is underlined.

176
Q

def. relationships

A

Operators that illustrate an association between two entities

177
Q

def. degree of relationship

A

indicates the number of entities associated with a relationship

178
Q

def. unary relationship

A

exists when an association is maintained within a single entity

179
Q

def. binary relationship

A

exists when two entities are associated

180
Q

def. ternary relationship

A

exists when three entities are associated

181
Q

what are the most common relationships?

A

binary

182
Q

how can entity relationships be classified?

A

one-to-one, one-to-many, or many-to-many

183
Q

def. connectivity

A

The classification of a relationship: one-to-one, one-to-many, or many-to-many.

184
Q

def. cardinality

A

The uniqueness of data values within a column in a database. High cardinality means that the column has mostly unique values. Low cardinality means that the column has several “repeats” in its data range
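
One rough way to see the difference between high and low cardinality is to count the distinct values in a column. The helper `column_cardinality` and the sample columns below are hypothetical, written only to illustrate the definition on this card.

```python
def column_cardinality(values):
    """Return (distinct count, distinct / total) for one column of values."""
    values = list(values)
    if not values:
        return 0, 0.0
    distinct = len(set(values))
    return distinct, distinct / len(values)


# Made-up columns: an ID column (all unique) and a province column (many repeats).
customer_ids = [101, 102, 103, 104]
provinces = ["ON", "ON", "BC", "ON"]

id_cardinality = column_cardinality(customer_ids)
province_cardinality = column_cardinality(provinces)
```

A ratio near 1 suggests high cardinality; a ratio well below 1 suggests low cardinality.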

185
Q

What are cardinality and connectivity established by?

A

the business rules of a relationship

186
Q

Cardinality can be _____, _____, ______, or _______

A

mandatory single, optional single, mandatory many, or optional many

187
Q

entity instances have ____ or _____ which are _______ that are unique to that entity instance

A

identifiers or primary keys

are attributes

188
Q

entities have _____________ that describe the entity’s characteristics

A

attributes (or properties)

189
Q

What is a one-to-one entity relationship (1:1)?

A

a single-entity instance of one type is related to a single-entity instance of another type

190
Q

def. normalization

A

a method for analyzing and reducing a relational database to its most streamlined form to ensure minimum redundancy, maximum data integrity, and optimal processing performance

191
Q

what is data normalization?

A

Data normalization is a methodology for organizing attributes into tables so that redundancy among the non-key attributes is eliminated

192
Q

what is the result of the data normalization process?

A

a properly structured relational database
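
As a small illustration of removing redundancy among non-key attributes, the sketch below splits one flat table into two. The field names, the data, and the `normalize` function are assumptions made for the example; real normalization works through the normal forms rather than a single split.

```python
# Hypothetical flat table: customer details repeat on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Ann", "customer_city": "Toronto", "total": 50.0},
    {"order_id": 2, "customer_id": 10, "customer_name": "Ann", "customer_city": "Toronto", "total": 20.0},
    {"order_id": 3, "customer_id": 11, "customer_name": "Bob", "customer_city": "Halifax", "total": 75.0},
]


def normalize(rows):
    """Split the flat rows into a customers table and an orders table."""
    customers = {}  # keyed by customer_id, so each customer is stored once
    orders = []
    for row in rows:
        customers[row["customer_id"]] = {
            "name": row["customer_name"],
            "city": row["customer_city"],
        }
        # The order keeps only the customer's key, not the repeated details.
        orders.append(
            {"order_id": row["order_id"], "customer_id": row["customer_id"], "total": row["total"]}
        )
    return customers, orders


customers, orders = normalize(flat_orders)
```

After the split, a change of city for a customer is made in one place instead of on every order row.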

193
Q

def. functional dependencies

A

A means of expressing that the value of one particular attribute is associated with, or determines, a specific single value of another attribute
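
A functional dependency can be checked against sample data: the dependency holds if each value of the determining attribute always appears with the same value of the other attribute. The function `holds` and the rows below are hypothetical and only illustrate the idea; passing on sample data does not prove the dependency is a business rule.

```python
def holds(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up enrolment rows.
enrolments = [
    {"student_id": 1, "name": "Ann", "course": "MIS"},
    {"student_id": 1, "name": "Ann", "course": "ACC"},
    {"student_id": 2, "name": "Bob", "course": "MIS"},
]

id_determines_name = holds(enrolments, "student_id", "name")
id_determines_course = holds(enrolments, "student_id", "course")
```

In this sample, `student_id` determines `name` but not `course`, because student 1 appears with two different courses.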

194
Q

def. join operation

A

A database operation that combines records from two or more tables in a database
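
A minimal sketch of a join using Python's built-in `sqlite3` module. The `customer` and `orders` tables, their columns, and the rows are made up for the example; the join matches each order to its customer through the shared `customer_id` key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: orders.customer_id refers to customer.customer_id.
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        total REAL
    );
    INSERT INTO customer VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO orders VALUES (1, 10, 50.0), (2, 10, 20.0), (3, 11, 75.0);
    """
)

# Combine records from the two tables wherever the key values match.
joined = conn.execute(
    """
    SELECT c.name, o.order_id, o.total
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()
conn.close()
```

Each result row carries attributes from both tables, which neither table could supply on its own.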