Database and Information Management Flashcards

1
Q

Effective Information Systems

A

Provide users with information that is accurate, timely and relevant

Accurate: free of errors

Timely: available when decision makers need it

Relevant: useful and appropriate for the types of work and decisions that require it

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Bit

A

the smallest unit of data a computer can handle

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Byte

A

a group of bits. Represents a single character, which can be a letter, a number or another symbol

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Field

A

a grouping of characters into a word, a group of words, a complete number (eg. Person’s name or age)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Record

A

a group of related fields, such as a student’s name, the course taken, the date and the grade.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

File

A

a group of records of the same type

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Entity

A

a person, place, thing or event on which we store and maintain information. (a record describes an entity)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Attribute

A

each characteristic or quality describing a particular entity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Data Redundancy and Inconsistency (problems with traditional file processing)

A

Data Redundancy: duplicate data in multiple files

Data Inconsistency: the same attribute has different values because it’s only updated in some systems but not others – or the same attribute has different names (e.g. Student_ID vs just ID)

Difficult to implement CRM, SCM and ERP systems that integrate data from different sources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Program-Data Dependence (problems with traditional file processing)

A

coupling of data stored in files and the specific programs required to update and maintain those files – changes to the programs requires changes to the data

e.g. a new software program, which requires changes in the data accessed by the program, e.g. Requires nine-digit ZIP codes instead of five-digit ZIP codes, then when you change the data, other programs which required the five-digit ZIP code will no longer work properly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Lack of Flexibility (problems with traditional file processing)

A

Traditional file systems can deliver scheduled routine reports after extensive programming efforts, but it cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Poor Security (problems with traditional file processing)

A

Little control and management of data, access to and dissemination of information - management might not know who has access to or who make changes to the organization’s data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Lack of Data Sharing and Availability (problems with traditional file processing)

A

Information is in different files in different departments, so it is impossible to share and access the information in a timely manner –> and users can find different values for the same piece of information in two different systems, which create distrust to both systems (Data Redundancy and Inconsistency)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Database

A

A collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data

E.g. a human resource database with multiple views (so one big file instead of having different files for personnel, payroll and benefits and instead extracting that information as multiple views)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Database Management Systems (DBMS)

A

A software that enables an organization to centralize data, manage them efficiently and provide access to the stored data by application programs

DBMS provides an interface between application programs and physical data files, i.e. when the application programs calls for a data item, the DBMS finds the item and presents it in the application program

Separates logical and physical view of data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Logical view of data

A

Presents data as it would be perceived by end users

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Physical view of data

A

How data is actually organized and structured on physical storage media

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

How does DBMS solve problems of the traditional file environment?

A

Reduces data redundancy and inconsistency by minimizing isolated files where the same data is repeated (even if there’s redundancy, it can eliminate inconsistency by ensuring it has the same value)

Easier data sharing because the data is in one single location

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Relational DBMS

A

Most popular type of DBMS

Presents data as relations (two-dimensional tables), where each table contains data on an entity and its attributes

Fields in a relational database is also called columns and rows are called records

Key field: the unique identifier for all the information in any row of the table
Primary key: unique key, cannot be duplicated
Foreign key: a lookup field to look up data, and it will be the primary key in another table

Three basic operations of a relational DBMS:
Select: creates subset with the records that meet state criteria
Join: combines relational tables
Project: creates subset consisting of columns in a table, allows the user to create new tables with only the info required

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Key field

A

the unique identifier for all the information in any row of the table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Primary key

A

unique key, cannot be duplicated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Foreign key

A

a lookup field to look up data, and it will be the primary key in another table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Three basic operations of a Relational DBMS

A

Select: creates a subset consisting of all records in the file that meet stated criteria
owe want to select records (rows) from the PART table where the Part_Number equals 137 or 150.

Join: combines relational tables to provide the user with more information than is available in individual tables.
we want to join the now-shortened PART table (only parts 137 or 150 will be presented) and the SUPPLIER table into a single new table.

Project: creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required
we want to extract from the new table only the following columns: Part_Number, Part_Name, Supplier_Number, and Supplier_Name

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Data definition (as a tool for organizing, managing, and accessing data in the database)

A

capability to specify the structure of the content of the database

used to create database tables to define the characteristics of the fields in each table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Data dictionary (as a tool for organizing, managing, and accessing data in the database)

A

where the information about the database is documented

automated or manual file that stores definitions of data elements and their characteristics

26
Q

Data manipulation language (as a tool for organizing, managing, and accessing data in the database)

A

used to add, change, delete, and retrieve data in the database

extract data from the database to satisfy information requests

27
Q

Querying and reporting (as a tool for organizing, managing, and accessing data in the database)

A

most prominent data manipulation is Structured Query Language (SQL)

Query: a request for data from a database

28
Q

Conceptual design and physical design (database design)

A

Conceptual/logical design: an abstract model of the database from a business perspective

Physical design: shows how the database is arranged on direct-access storage devices

29
Q

Database Normalization (database design)

A

Structuring a database to reduce redundancy and inconsistency by splitting it up into different logical groups which can be connected with primary / foreign keys to avoid duplicates and data inconsistency

The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data

See model

We split up the different logical groups (of the ORDER before normalization) into separate tables, e.g. Parts, suppliers, orders and line items: for each order, we might have multiple parts so we can store each order number with multiple parts in the line items –> using foreign keys

30
Q

Referential integrity rules

A

To ensure relationships between coupled tables remain consistent, relational database systems try to enforce referential integrity rules

For example: When you have a table with a foreign key that points to another table, you may not add a record to the table with the foreign key unless there is a corresponding record in the linked table

31
Q

Entity-relationship diagram (database design)

A

How database designers document their data model

See model:
provide
SUPPLIER || —– |< PART || —- |< LINE ITEM >| —- || ORDER

the relationship between the entities SUPPLIER, PART, LINE_ITEM, and ORDER. The boxes represent entities. The lines connecting the boxes represent relationships. A line connecting two entities that ends in two short marks designates a one-to-one relationship. A line connecting two entities that ends with a crow’s foot topped by a short mark indicates a one-to-many relationship

One ORDER can contain many LINE_ITEMs. (A PART can be ordered many times and appear many times as a line item in a single order.) Each PART can have only one SUPPLIER, but many PARTs can be provided by the same SUPPLIER.

32
Q

Entity-relationship diagram (database design)

A

How database designers document their data model

See model:
provide is ordered belongs to
SUPPLIER || —– |< PART || —- |< LINE ITEM >| —- || ORDER
is supplied by contains include

the relationship between the entities SUPPLIER, PART, LINE_ITEM, and ORDER. The boxes represent entities. The lines connecting the boxes represent relationships. A line connecting two entities that ends in two short marks designates a one-to-one relationship. A line connecting two entities that ends with a crow’s foot topped by a short mark indicates a one-to-many relationship

One ORDER can contain many LINE_ITEMs. (A PART can be ordered many times and appear many times as a line item in a single order.) Each PART can have only one SUPPLIER, but many PARTs can be provided by the same SUPPLIER.

33
Q

Nonrelational Database Management System

A

Uses a more flexible data model and are designed for large data sets across many distributed machines and for easily scaling up or down

Useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics and other forms of data that are difficult to analyze with traditional SQL tools

34
Q

Cloud Databases

A

Lower cost than in-house database products

Cloud computing vendors who provide relational database engines, such as Amazon Relational Database Service, which offers MySQL, Oracle Database or Amazon Aurora Database Engines

35
Q

Distributed Databases

A

One that is stored in multiple physical locations – some parts or copies are physically stored in one location, whereas other parts or copies are maintained in other locations

36
Q

Blockchain

A

Distributed database technology that enables firms and organizations to verify transactions on a network nearly instantaneously without a central authority

High encryption

The blockchain maintains a continuously growing list of records called blocks. Each block contains a timestamp and a link to a previous block, and once a block of data is recorded on the blockchain ledger, it cannot be altered retroactively. When someone wants to add a transaction, participants in the network (all of whom have copies of the existing blockchain) run algorithms to evaluate and verify the proposed transaction. Legitimate changes to the ledger are recorded across the blockchain in a matter of seconds or minutes and records are protected through cryptography

37
Q

Big Data

A

big data sets with volumes so huge they are beyond the ability of typical DBMS to capture, store and analyze

Extreme VOLUME of data
Wide VARIETY of data
VELOCITY of which data must be processed

No specific quantity to big data, but usually data in the petabyte and exabyte range (billions to trillions records, many from different sources)

38
Q

Business Intelligence Infrastructure

A

Data Warehouse: database that stores current and historical data of potential interest to decision makers throughout the company

Data Marts: a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users –> like a decentralized warehouse

Data Lake: handles unstructured (e.g. video data), semi structured (e.g. Twitter feeds) and structured data (e.g. transactional data)
Hadoop: open source software framework (managed by the Apache Software Foundation), which enables distributed parallel processing of huge amounts of data across inexpensive computers

In-Memory Computing: relies primarily on RAM memory (main memory) for data storage, where users’ access data is stored in system primary memory which shortens query response time

Analytic Platforms: Commercial database vendors have developed specialized high-speed analytic platforms using both relational and nonrelational technology that are optimized for analyzing large data sets

39
Q

Data Warehouse

A

database that stores current and historical data of potential interest to decision makers throughout the company

consolidates and standardizes information for use across enterprise, but data cannot be altered

40
Q

Data Marts

A

a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users –> like a decentralized warehouse

41
Q

Hadoop Key Services (3 services)

A

Hadoop Distributed File System (HDFS): for data storage, it links together the file systems on the numerous nodes in a Hadoop cluster to turn them into one big file system

MapReduce: for high performance parallel data processing, breaks down processing of huge data sets and assigns work to the various nodes in a cluster

Hbase: Hadoop’s nonrelational database, provides rapid access to the data stored on HDFS and a transactional platform for running high-scale real time application

42
Q

Online Analytical Processing (OLAP)

A

Support multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions

Requires you to already know what you want from the data (which question you want answered)  users can answer ad hoc questions quickly: e.g. a product manager could use a multidimensional data analysis tool to learn how many washers were sold in the East in June, how that compares with the previous month and the previous June, and how it compares with the sales forecast

See model

43
Q

Data Pipelines (analytical application)

A

Data engineers build to make sure you can access specific information

44
Q

Data mining

A

Provides insight into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior

45
Q

Associations (information available from data mining)

A

Occurrences linked to a single event

E.g. study of supermarket purchasing patterns might reveal that, when corn chips are purchased, a cola drink is purchased 65 percent of the time, but when there is a promotion, cola is purchased 85 percent of the time

46
Q

Sequences (information available from data mining)

A

Events are linked over time

if a house is purchased, a new refrigerator will be purchased within two weeks 65 percent of the time, and an oven will be bought within one month of the home purchase 45 percent of the time

47
Q

Classifications (information available from data mining)

A

recognize patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules

businesses such as credit card or telephone companies worry about the loss of steady customers. Classification helps discover the characteristics of customers who are likely to leave and can provide a model to help managers predict who those customers are so that the managers can devise special campaigns to retain such customers

48
Q

Clusters (information available from data mining)

A

similar to classification, when no groups have yet been defined

A data mining tool can discover different groupings within data, such as finding affinity groups for bank cards or partitioning a database into groups of customers based on demographics and types of personal investments.

49
Q

Forecasts (information available from data mining)

A

Uses a series of existing values to forecast what other values will be

forecasting might find patterns in data to help managers estimate the future value of continuous variables, such as sales figures

50
Q

Text mining tools

A

helps businesses analyze unstructured data by extracting key elements from unstructured natural language text, discovering patterns and relationships and summarizing the information

51
Q

Sentiment analysis software (web mining tool)

A

mine text comments in an email message, blog, social media conversation or survey forms to detect favorable and unfavorable opinions about specific subjects

E.g. Use sentiment analysis to tune in to consumer conversations about its products across social networks, blogs and other websites

52
Q

Web mining (+ three types: content-, structure-, and usage mining)

A

The discovery and analysis of useful patterns and information from the World Wide Web

Content mining: process of extracting knowledge from the content of web pages - includes text, image, audio and video data

Structure mining: examines data related to the structure of a particular website (E.g. links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document)

Usage mining: examines user interaction data recorded by a web server whenever requests for a website’s resources are received

53
Q

Linking internal databases to the web and the advantages of using web access to internal databases

A

Users access an organization’s internal database through the web using their desktop PC browsers or mobile apps

See model

The web browser software is easier to use than proprietary query tools

The web interface requires few or no changes to the internal database
It costs much less to add a web interface in front of a legacy system than to redesign and rebuild the system to improve user access.

54
Q

Data Governance (Information Policy)

A

Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations

(Encompasses policies and procedures to manage data – the rules for sharing, disseminating, acquiring, standardizing, classifying and inventorying information)

55
Q

Database administration (Information Policy)

A

Creating and maintaining databases

56
Q

Data administration (Information Policy)

A

establishes policies and procedures to manage data

57
Q

Information Policy

A

Rules, procedures, roles for sharing, managing, standardizing data

Data administration: establishes policies and procedures to manage data

Data Governance: Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations

Database administration: Creating and maintaining databases

58
Q

Data quality audit

A

A structured survey of the accuracy and level of completeness of the data in an information system

59
Q

Data cleaning / data scrubbing

A

Activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant

You can get data cleansing software for automatically surveying data files and correcting errors and integrating a consistent companywide format

60
Q

Structured Query Language (SQL)

A

Created to manipulate databases

Standard way of interacting with RDBs (Relational Databases