Chapter 5 Flashcards
What does high quality mean in terms of data?
accurate, complete, timely, consistent, accessible, relevant, and concise
Why is managing data difficult (general)?
data are processed in several stages and often in multiple locations
Why is managing data difficult (specific)?
- the amount of data increases exponentially with time
- data are scattered throughout organizations (collected by different individuals using different methods, so data are stored in many locations and servers and in different systems, databases, formats, and languages, both human and computer)
- data are generated from multiple sources
- new sources of data are constantly being developed
- data are subject to data rot
- data security, quality and integrity are critical but easily jeopardized
- orgs have different ISs for specific business processes, and these impose unique requirements on data
- federal gov regulation
- companies are drowning in unstructured data
What are some sources where data comes from?
- internal sources (ex. corporate databases and company documents)
- personal sources (ex. personal thoughts, opinions and experiences)
- external sources (ex. commercial databases, gov reports, corporate websites)
- the web (clickstream data)
def. clickstream data
Data collected about user behaviour and browsing patterns by monitoring users’ activities when they visit a website. (click on hyperlinks)
What are some examples of data degrading over time?
customers move to new addresses/change names, companies go out of business, new products are developed, companies expand into new countries, employees are hired or fired
What is data rot? What are its two aspects?
refers primarily to problems with the media on which the data are stored
Aspects
Physical problems: Over time, temperature, humidity, and exposure to light can cause physical problems with storage media and thus make it difficult to access the data.
Difficulty finding the machines needed to access the data
What is the impact on data of having ISs develop over time?
Information systems that specifically support these processes impose unique requirements on data, which results in repetition and conflicts across the organization
-ex. The marketing function might maintain information on customers, sales territories, and markets. These data might be duplicated within the billing or customer service functions. This situation can produce inconsistent data within the enterprise
What does inconsistent data prevent a company from developing?
a unified view of core business information (data concerning customers, products, finances, etc.) across the org and its ISs
What is the most significant government regulation affecting data?
Bill 198
requires:
(1)public companies evaluate and disclose the effectiveness of their internal financial controls
(2)independent auditors for these companies agree to this disclosure
-also holds CEOs and CFOs personally responsible for these disclosures
How do gov regulations impact data?
they require companies to account for how information is being managed within their organizations
What must companies do with the amount of data to be able to profit?
companies must develop a strategy for managing these data effectively
def. data governance
An approach to managing information across an entire organization which involves a formal set of business processes and policies that are designed to ensure that data are handled in a certain, well-defined fashion
What happens in data governance (general)?
the organization follows unambiguous rules for creating, collecting, handling, and protecting its information
What is the goal of data governance?
make information available, transparent, and useful for the people who are authorized to access it, from the moment it enters an organization until it is outdated and deleted
def. master data management
A process that provides companies with the ability to store, maintain, exchange, and synchronize a consistent, accurate, and timely “single version of the truth” for a company’s core master data.
What is a strategy for implementing data governance?
master data management
def. master data
A set of core data, such as customer, product, employee, vendor, geographic location, and so on, that span an enterprise’s information systems.
What is the difference between transaction data and master data?
Transaction data, which are generated and captured by operational systems, describe the business’s activities or transactions. In contrast, master data are applied to multiple transactions and are used to categorize, aggregate, and evaluate the transaction data
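The distinction can be sketched in a few lines of Python. Everything here is a made-up illustration: the product records stand in for master data, the sales records stand in for transaction data, and the names `products`, `sales`, and `total_by_category` are hypothetical.

```python
from collections import defaultdict

# Hypothetical master data: one shared record per product,
# reused by every transaction that mentions that product.
products = {
    "P1": {"name": "Desk lamp", "category": "Lighting"},
    "P2": {"name": "Floor lamp", "category": "Lighting"},
    "P3": {"name": "Office chair", "category": "Furniture"},
}

# Hypothetical transaction data: one record per sale,
# captured by an operational system.
sales = [
    {"product_id": "P1", "amount": 40.0},
    {"product_id": "P3", "amount": 150.0},
    {"product_id": "P2", "amount": 90.0},
    {"product_id": "P1", "amount": 40.0},
]


def total_by_category(products, sales):
    """Use master data to categorize and aggregate transaction data."""
    totals = defaultdict(float)
    for sale in sales:
        category = products[sale["product_id"]]["category"]
        totals[category] += sale["amount"]
    return dict(totals)


totals = total_by_category(products, sales)
```

The sales list keeps growing as business happens, while the product records change rarely and give each sale its meaning (which category it belongs to).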
How did businesses manage their data during the first adopted computer applications era?
file management environment
def. data file (table)
a collection of logically related records
What happens in a file management environment?
each application has a specific data file related to it, which contains all of the data records the application requires
over time, orgs developed numerous applications, each with an associated, application-specific data file
What can the use of databases solve? (6)
minimize:
- data redundancy
- data isolation
- data inconsistency
maximize:
- data security
- data integrity
- data independence
how are databases arranged?
arranged so that one set of software programs—the database management system—provides all users with access to all of the data.
def. data redundancy
The same data are stored in multiple locations.
def. data isolation
Applications cannot access data associated with other applications
def. data inconsistency
Various copies of the data do not agree.
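As a rough illustration of redundancy versus inconsistency, the sketch below compares two application-specific files that both store customer addresses. The `marketing` and `billing` dictionaries and the helper `find_conflicts` are hypothetical names invented for this example.

```python
# Hypothetical application-specific files, each keeping its own copy
# of customer addresses (customer ID -> address).
marketing = {"C100": "12 King St", "C200": "5 Elm Ave"}
billing = {"C100": "98 Queen St", "C200": "5 Elm Ave", "C300": "7 Oak Rd"}


def find_conflicts(file_a, file_b):
    """Return IDs stored in both files whose values do not agree."""
    shared = file_a.keys() & file_b.keys()
    return sorted(key for key in shared if file_a[key] != file_b[key])


# Redundancy: the same customers are stored in more than one place.
redundant = sorted(marketing.keys() & billing.keys())

# Inconsistency: the redundant copies disagree.
conflicts = find_conflicts(marketing, billing)
```

Customer C200 is redundant but consistent; customer C100 is redundant and inconsistent, which is the case a shared database is meant to prevent.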
How do database systems maximize data security?
Because data are “put in one place” in databases, there is a risk of losing a lot of data at one time. Therefore, databases must have extremely high security measures in place to minimize mistakes and deter attacks.
How do database systems maximize data integrity?
Data meet certain constraints; for example, there are no alphabetic characters in a Social Insurance Number field.
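One way a DBMS can enforce such a constraint is a CHECK rule on the column. The sketch below uses Python's built-in `sqlite3` module; the `employee` table, the `sin` column, and the nine-digit rule are assumptions made for illustration, not details from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed rule for this sketch: a SIN is exactly nine characters
# and contains no character outside 0-9.
conn.execute(
    """
    CREATE TABLE employee (
        sin TEXT NOT NULL
            CHECK (length(sin) = 9 AND sin NOT GLOB '*[^0-9]*')
    )
    """
)


def try_insert(sin):
    """Return True if the database accepts the value, False if it rejects it."""
    try:
        conn.execute("INSERT INTO employee (sin) VALUES (?)", (sin,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("123456789")   # digits only
rejected = try_insert("12345A789")   # contains a letter
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Because the rule lives in the database rather than in each application, every program that writes to the table is held to the same constraint.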
How do database systems maximize data independence?
Applications and data are independent of one another; that is, applications and data are not linked to each other, so all applications are able to access the same data.
how are data arranged to make them more understandable and useful?
in a hierarchy
What does a data hierarchy begin with?
bits
def. bit
(binary digit) represents the smallest unit of data a computer can process (0 or 1)
def. byte
A group of eight bits that represents a single character
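A quick way to see bits and bytes is to print the bit pattern behind one character. This sketch assumes ASCII encoding, where one character fits in exactly one byte; other encodings can use more than one byte per character.

```python
char = "A"

# Encode the character as ASCII: the result is a single byte.
encoded = char.encode("ascii")

# Show that byte as its eight individual bits.
bits = format(encoded[0], "08b")

print(char, "->", bits)
```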
def. field
A characteristic of interest that describes an entity; a field can also contain data other than text and numbers
def. record
A grouping of logically related fields
describe the data hierarchy
bit → byte → field → record → data file/table → database
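The upper levels of the hierarchy can be mirrored with ordinary Python structures. This is only an analogy under assumed names (`Student`, `student_file`, `database`), not how a DBMS stores data internally.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    """A record: a grouping of logically related fields."""
    student_id: int   # field
    name: str         # field
    major: str        # field


# A data file (table): a collection of logically related records.
student_file = [
    Student(1, "Ana", "Finance"),
    Student(2, "Raj", "Marketing"),
]

# A database: a collection of related files (tables).
database = {"students": student_file}

field_names = [f.name for f in fields(Student)]
```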
def. database management system (DBMS)
The software program (or group of programs) that provides access to a database.
What does managing a database involve?
adding, deleting, accessing, modifying, and analyzing data stored in a database
How can an org access data in a database?
by using query and reporting tools that are part of the DBMS or by using application programs specifically written to perform this function
DBMSs provide mechanisms for ________, _________, and _______
maintaining the integrity of stored data, managing security and user access, and recovering information if the system fails
What is a type of database architecture that is popular and easy to use?
relational database model
ex. Oracle, Microsoft Access
How were most data traditionally organized?
into simple tables consisting of columns and rows
-Tables allow people to compare information quickly by row or column. In addition, users can retrieve items rather easily by locating the point of intersection of a particular row and column.
def. relational database model
A data model based on the simple concept of tables in order to capitalize on characteristics of rows and columns of data.
-generally not one big table (flat file) that contains all of the records and attributes; instead, a relational database is usually designed with a number of related tables. Each of these tables contains records (listed in rows) and attributes (listed in columns).
What must a relational database do to be valuable?
must be organized so that users can retrieve, analyze, and understand the data they need
what is key to designing an effective database?
data model
def. data model
A diagram that represents entities in the database and their relationships
def. entity
person, place, thing, or event—such as a customer, an employee, or a product—about which information is maintained
def. instance (of an entity)
Each row in a relational table, which is a specific, unique representation of the entity
def. attribute
Each characteristic or quality of a particular entity
def. primary key
A field (or attribute) of a record that uniquely identifies that record so that it can be retrieved, updated, and sorted
What must every record in the database contain so that it can be retrieved, updated and sorted? What is it called?
must contain at least one field that uniquely identifies that record so that it can be retrieved, updated, and sorted
primary key
def. secondary key
A field that has some identifying information, but typically does not uniquely identify a record with complete accuracy
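The difference between a primary key and a secondary key shows up when you search on each. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `student` table, where `student_id` plays the primary key and `major` plays a secondary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ana", "Finance"), (2, "Raj", "Marketing"), (3, "Lee", "Finance")],
)

# Primary key lookup: identifies exactly one record.
by_primary = conn.execute(
    "SELECT name FROM student WHERE student_id = ?", (3,)
).fetchall()

# Secondary key lookup: narrows the search but can match several records.
by_secondary = conn.execute(
    "SELECT name FROM student WHERE major = ? ORDER BY student_id",
    ("Finance",),
).fetchall()

# The database refuses a second record with an existing primary key.
try:
    conn.execute("INSERT INTO student VALUES (?, ?, ?)", (1, "Sam", "Law"))
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```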
def. foreign key
A field (or group of fields) in one table that uniquely identifies a row (or record) of another table
-used to establish and enforce a link between two tables
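A minimal sketch of a foreign key enforcing a link between two tables, using Python's built-in `sqlite3` module. The `customer` and `invoice` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE invoice ("
    "invoice_id INTEGER PRIMARY KEY, "
    # Foreign key: each invoice must point at an existing customer.
    "customer_id INTEGER NOT NULL REFERENCES customer(customer_id), "
    "amount REAL NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 250.0)")  # valid link

# An invoice for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO invoice VALUES (11, 99, 80.0)")
    orphan_inserted = True
except sqlite3.IntegrityError:
    orphan_inserted = False

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```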
orgs implement databases to ___________
efficiently and effectively manage their data
What are the three main operations performed on databases?
query languages, normalization, and joins
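A query and a join can be sketched with SQL run through Python's built-in `sqlite3` module. The `student` and `enrollment` tables are hypothetical; they are kept separate so that each student's name is stored only once, which is the idea behind normalization, and the join recombines them when a question spans both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course TEXT);

    INSERT INTO student VALUES (1, 'Ana'), (2, 'Raj');
    INSERT INTO enrollment VALUES
        (1, 'MIS 301'), (1, 'ACC 201'), (2, 'MIS 301');
    """
)

# Query + join: which students are enrolled in a given course?
rows = conn.execute(
    """
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.student_id
    WHERE e.course = ?
    ORDER BY s.name
    """,
    ("MIS 301",),
).fetchall()
```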
why is it not practical to allow users access to databases?
Databases typically process data in real time, so the data would change while the user is looking at them
def. big data
A collection of data so large and complex that it is difficult to manage using traditional database management systems
What is big data about?
predictions, which come from applying mathematics to huge quantities of data to infer probabilities
Why do big data systems perform well?
because they contain huge amounts of data on which to base their predictions, and they are configured to improve themselves over time by searching for the most valuable signals and patterns as more data are input
What is Gartner’s description of big data?
defines Big Data as diverse, high-volume, high-velocity information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization
what does The Big Data Institute describe big data as?
defines Big Data as vast data sets that perform the following:
•Exhibit variety.
•Include structured, unstructured, and semi-structured data.
•Are generated at high velocity with an uncertain pattern.
•Do not fit neatly into traditional, structured, relational databases.
•Can be captured, processed, transformed, and analyzed in a reasonable amount of time only by sophisticated information systems.
what does big data generally consist of?
- traditional enterprise data
- machine-generated/sensor data
- social data
- images
What are some examples of traditional enterprise data?
(ex. customer info from CRM systems, transactional enterprise resource planning data, web store transactions, operations data, general ledger data)
examples of machine-generated/sensor data?
smart meters; manufacturing sensors; sensors integrated into smart phones, automobiles, airplane engines, and industrial machines; equipment logs; and trading systems data.
examples of social data?
Examples are customer feedback comments; microblogging sites such as Twitter; and social media sites such as Facebook, YouTube, and LinkedIn.
What are the three distinct characteristics of Big Data (general)?
Volume, velocity, variety
what is unique about volume in big data
-huge volume of Big Data
what is unique about the velocity of big data?
The rate at which data flow into an organization is rapidly increasing. Velocity is critical because it increases the speed of the feedback loop between a company, its customers, its suppliers, and its business partners
what is unique about variety of big data?
Where traditional data formats are structured, well described, and change slowly (ex. financial market data and point-of-sale transactions), Big Data formats change rapidly
-include: satellite imagery, broadcast audio streams, digital music files, web page content, scans of government documents, and comments posted on social networks.
Why do certain types of data appear to have no value today?
because we have not yet been able to analyze them effectively
What are the three big issues with Big Data? (general)
- big data can come from untrusted sources
- Big Data is “dirty”
- Big Data changes, especially in data streams
Describe the issue with big data coming from untrusted sources
since Big Data comes from a wide variety of sources (internal or external), it is hard to know if all of these sources are reliable. Further, the data itself, reported by the source, can be false or misleading
describe what it means to say Big Data is “dirty”
Dirty data are data that are inaccurate, incomplete, incorrect, duplicate, or erroneous
ex. misspellings of words and duplicate data such as retweets or company press releases that appear numerous times in social media
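A very small cleaning step for duplicate social media posts might look like the sketch below. The sample posts and the helpers `normalize` and `deduplicate` are invented for illustration; treating a leading "RT " as a retweet marker is an assumption, and real cleaning pipelines handle far more cases.

```python
# Hypothetical raw posts containing duplicates and a retweet.
posts = [
    "Great product!",
    "great product!  ",
    "RT Great product!",
    "Shipping was slow",
]


def normalize(text):
    """Reduce a post to a comparable form: trimmed, lowercase, no 'RT ' prefix."""
    text = text.strip().lower()
    if text.startswith("rt "):
        text = text[3:]
    return text


def deduplicate(posts):
    """Keep the first post for each normalized form, preserving order."""
    seen = set()
    unique = []
    for post in posts:
        key = normalize(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


unique_posts = deduplicate(posts)
```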
describe why Big Data changing presents an issue
Organizations must be aware that data quality in an analysis can change, or the data itself can change, because the conditions under which the data are captured can change
What can big data reveal?
valuable patterns and information that were previously hidden because of the amount of work required to discover them
(ex. spot business trends more rapidly and accurately, prevent disease, track crime)
What are the first two steps most orgs take toward managing data?
integrate information silos into a database environment and then to develop data warehouses for decision making
-organizations turned their attention to the business of information management—making sense of their proliferating data