Database and Information Management Flashcards
Effective Information Systems
Provide users with information that is accurate, timely and relevant
Accurate: free of errors
Timely: available when decision makers need it
Relevant: useful and appropriate for the types of work and decisions that require it
Bit
the smallest unit of data a computer can handle
Byte
a group of bits. Represents a single character, which can be a letter, a number or another symbol
Field
a grouping of characters into a word, a group of words, or a complete number (e.g., a person's name or age)
Record
a group of related fields, such as a student’s name, the course taken, the date and the grade.
File
a group of records of the same type
Entity
a person, place, thing or event on which we store and maintain information. (a record describes an entity)
Attribute
each characteristic or quality describing a particular entity.
Data Redundancy and Inconsistency (problems with traditional file processing)
Data Redundancy: duplicate data in multiple files
Data Inconsistency: the same attribute has different values because it’s only updated in some systems but not others – or the same attribute has different names (e.g. Student_ID vs just ID)
Difficult to implement CRM, SCM and ERP systems that integrate data from different sources
Program-Data Dependence (problems with traditional file processing)
coupling of data stored in files and the specific programs required to update and maintain those files – changes to the programs require changes to the data
e.g. if a new software program requires nine-digit ZIP codes instead of five-digit ZIP codes, the data must be changed to match – and other programs that still expect five-digit ZIP codes will then no longer work properly
Lack of Flexibility (problems with traditional file processing)
Traditional file systems can deliver scheduled routine reports after extensive programming efforts, but they cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion
Poor Security (problems with traditional file processing)
Little control over or management of data – access to and dissemination of information are largely unmanaged, so management might not know who has access to, or who makes changes to, the organization’s data
Lack of Data Sharing and Availability (problems with traditional file processing)
Information sits in different files in different departments, so it is difficult to share and access it in a timely manner – and users can find different values for the same piece of information in two different systems, which creates distrust in both systems (data redundancy and inconsistency)
Database
A collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data
E.g. a human resources database with multiple views – one database instead of separate files for personnel, payroll, and benefits, with that information extracted as multiple views for different uses
Database Management Systems (DBMS)
Software that enables an organization to centralize data, manage it efficiently, and provide access to the stored data by application programs
The DBMS provides an interface between application programs and physical data files, i.e. when an application program calls for a data item, the DBMS finds the item and presents it to the application program
Separates logical and physical view of data
Logical view of data
Presents data as it would be perceived by end users
Physical view of data
How data is actually organized and structured on physical storage media
How does DBMS solve problems of the traditional file environment?
Reduces data redundancy and inconsistency by minimizing isolated files in which the same data is repeated (even if some redundancy remains, the DBMS can eliminate inconsistency by ensuring the repeated data has the same value)
Easier data sharing because the data is in one single location
Relational DBMS
Most popular type of DBMS
Presents data as relations (two-dimensional tables), where each table contains data on an entity and its attributes
Fields in a relational database are also called columns, and rows are called records
Key field: the unique identifier for all the information in any row of the table
Primary key: unique key, cannot be duplicated
Foreign key: a lookup field used to find related data; it is the primary key of another table
Three basic operations of a relational DBMS:
Select: creates a subset with the records that meet stated criteria
Join: combines relational tables
Project: creates subset consisting of columns in a table, allows the user to create new tables with only the info required
Key field
the unique identifier for all the information in any row of the table
Primary key
unique key, cannot be duplicated
Foreign key
a lookup field used to find related data; it is the primary key of another table
Three basic operations of a Relational DBMS
Select: creates a subset consisting of all records in the file that meet stated criteria
we want to select records (rows) from the PART table where the Part_Number equals 137 or 150.
Join: combines relational tables to provide the user with more information than is available in individual tables.
we want to join the now-shortened PART table (only parts 137 or 150 will be presented) and the SUPPLIER table into a single new table.
Project: creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required
we want to extract from the new table only the following columns: Part_Number, Part_Name, Supplier_Number, and Supplier_Name
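A minimal sketch of the three operations expressed in SQL and run through Python's built-in sqlite3 module; the sample suppliers and parts below are invented for illustration.

```python
# Select, join, and project in SQL, run via Python's sqlite3 module.
# The sample rows are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT,
                   Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number));
INSERT INTO SUPPLIER VALUES (8259, 'CBM Inc.'), (8261, 'B. R. Molds');
INSERT INTO PART VALUES (137, 'Door latch', 8259),
                        (150, 'Door handle', 8261),
                        (152, 'Compressor', 8261);
""")

# SELECT:  only rows where Part_Number is 137 or 150 (the WHERE clause)
# JOIN:    combine PART with SUPPLIER on the shared Supplier_Number
# PROJECT: keep only the four columns the user actually needs
query = """
    SELECT p.Part_Number, p.Part_Name, s.Supplier_Number, s.Supplier_Name
    FROM PART p JOIN SUPPLIER s ON p.Supplier_Number = s.Supplier_Number
    WHERE p.Part_Number IN (137, 150)
"""
for row in conn.execute(query):
    print(row)
```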
Data definition (as a tool for organizing, managing, and accessing data in the database)
capability to specify the structure of the content of the database
used to create database tables to define the characteristics of the fields in each table
Data dictionary (as a tool for organizing, managing, and accessing data in the database)
where the information about the database is documented
automated or manual file that stores definitions of data elements and their characteristics
Data manipulation language (as a tool for organizing, managing, and accessing data in the database)
used to add, change, delete, and retrieve data in the database
extract data from the database to satisfy information requests
Querying and reporting (as a tool for organizing, managing, and accessing data in the database)
the most prominent data manipulation language is Structured Query Language (SQL)
Query: a request for data from a database
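A short sketch showing data definition, data manipulation, and a query together, again via sqlite3; the STUDENT table and its rows are hypothetical.

```python
# Data definition, data manipulation, and querying with SQL,
# run through Python's sqlite3 module. Table and rows are made up.
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the structure of the database content.
conn.execute("""CREATE TABLE STUDENT (
    Student_ID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Course     TEXT,
    Grade      TEXT)""")

# Data manipulation: add, change, and delete data.
conn.execute("INSERT INTO STUDENT VALUES (101, 'Anna Lee', 'IS 101', 'B')")
conn.execute("INSERT INTO STUDENT VALUES (102, 'Ben Ode', 'IS 101', 'A')")
conn.execute("UPDATE STUDENT SET Grade = 'A' WHERE Student_ID = 101")
conn.execute("DELETE FROM STUDENT WHERE Student_ID = 102")

# Querying: a request for data from the database.
for row in conn.execute("SELECT Student_ID, Name, Grade FROM STUDENT"):
    print(row)
```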
Conceptual design and physical design (database design)
Conceptual/logical design: an abstract model of the database from a business perspective
Physical design: shows how the database is arranged on direct-access storage devices
Database Normalization (database design)
Structuring a database to reduce redundancy and inconsistency by splitting it into separate logical groups that can be connected with primary/foreign keys, avoiding duplicates and data inconsistency
The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data
See model
We split up the different logical groups (of the ORDER before normalization) into separate tables, e.g. parts, suppliers, orders, and line items: each order may include multiple parts, so we store each order number with multiple parts in the LINE_ITEM table, linked by foreign keys – see the sketch below
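A sketch of that normalized design in SQL (via sqlite3): the wide pre-normalization ORDER record is split into SUPPLIER, PART, ORDER, and LINE_ITEM tables connected by primary/foreign keys. The exact column names are assumptions.

```python
# Normalized schema: one table per logical group, linked by keys.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (
    Supplier_Number INTEGER PRIMARY KEY,
    Supplier_Name   TEXT);
CREATE TABLE PART (
    Part_Number     INTEGER PRIMARY KEY,
    Part_Name       TEXT,
    Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number));
CREATE TABLE "ORDER" (
    Order_Number    INTEGER PRIMARY KEY,
    Order_Date      TEXT);
CREATE TABLE LINE_ITEM (
    Order_Number    INTEGER REFERENCES "ORDER"(Order_Number),
    Part_Number     INTEGER REFERENCES PART(Part_Number),
    Quantity        INTEGER,
    PRIMARY KEY (Order_Number, Part_Number));
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print("Normalized tables:", tables)
```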
Referential integrity rules
To ensure relationships between coupled tables remain consistent, relational database systems try to enforce referential integrity rules
For example: when a table has a foreign key that points to another table, you may not add a record to that table unless there is a corresponding record in the linked table (see the sketch below)
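A tiny demonstration of a referential integrity rule being enforced, using sqlite3 (which only checks foreign keys once PRAGMA foreign_keys = ON is issued); the tables and rows are hypothetical.

```python
# Referential integrity: an orphan foreign-key value is rejected.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # sqlite3 enforces FKs only after this
conn.executescript("""
CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT);
CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT,
                   Supplier_Number INTEGER,
                   FOREIGN KEY (Supplier_Number)
                       REFERENCES SUPPLIER(Supplier_Number));
""")
conn.execute("INSERT INTO SUPPLIER VALUES (8259, 'CBM Inc.')")
conn.execute("INSERT INTO PART VALUES (137, 'Door latch', 8259)")   # OK

try:
    # Fails: supplier 9999 has no corresponding record in SUPPLIER.
    conn.execute("INSERT INTO PART VALUES (152, 'Compressor', 9999)")
except sqlite3.IntegrityError as err:
    print("Rejected by referential integrity rule:", err)
```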
Entity-relationship diagram (database design)
How database designers document their data model
See model:
SUPPLIER ||—|< PART ||—|< LINE_ITEM >|—|| ORDER
(relationship labels: provides / is supplied by; is ordered / contains; belongs to / includes)
the relationship between the entities SUPPLIER, PART, LINE_ITEM, and ORDER. The boxes represent entities. The lines connecting the boxes represent relationships. A line connecting two entities that ends in two short marks designates a one-to-one relationship. A line connecting two entities that ends with a crow’s foot topped by a short mark indicates a one-to-many relationship
One ORDER can contain many LINE_ITEMs. (A PART can be ordered many times and appear many times as a line item in a single order.) Each PART can have only one SUPPLIER, but many PARTs can be provided by the same SUPPLIER.
Nonrelational Database Management System
Uses a more flexible data model and is designed for large data sets spread across many distributed machines and for easily scaling up or down
Useful for accelerating simple queries against large volumes of structured and unstructured data, including web, social media, graphics and other forms of data that are difficult to analyze with traditional SQL tools
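A toy illustration of the flexible, schema-less document model many nonrelational stores use; plain Python dictionaries stand in for documents here, and no particular product's API is implied.

```python
# Documents in the same collection need not share a fixed set of fields.
customers = [
    {"id": 1, "name": "Anna Lee", "email": "anna@example.com"},
    {"id": 2, "name": "Ben Ode", "twitter": "@benode",
     "recent_posts": ["Loving the new phone!", "Battery could be better"]},
]

# A simple "query" across documents with different fields: who has social data?
with_social = [c["name"] for c in customers if "twitter" in c or "recent_posts" in c]
print(with_social)   # ['Ben Ode']
```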
Cloud Databases
Lower cost than in-house database products
Cloud computing vendors provide relational database engines, such as Amazon Relational Database Service (RDS), which offers MySQL, Oracle Database, or Amazon Aurora database engines
Distributed Databases
One that is stored in multiple physical locations – some parts or copies are physically stored in one location, whereas other parts or copies are maintained in other locations
Blockchain
Distributed database technology that enables firms and organizations to verify transactions on a network nearly instantaneously without a central authority
Highly secure: records are protected through strong encryption (cryptography)
The blockchain maintains a continuously growing list of records called blocks. Each block contains a timestamp and a link to a previous block, and once a block of data is recorded on the blockchain ledger, it cannot be altered retroactively. When someone wants to add a transaction, participants in the network (all of whom have copies of the existing blockchain) run algorithms to evaluate and verify the proposed transaction. Legitimate changes to the ledger are recorded across the blockchain in a matter of seconds or minutes and records are protected through cryptography
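A heavily simplified, single-machine sketch of the hash-chained ledger idea (each block stores a timestamp, its data, and a link to the previous block); real blockchains add network-wide verification by participants, consensus, and much more.

```python
# Toy hash chain: tampering with an earlier block breaks the later links.
import hashlib
import json
import time

def block_hash(block):
    core = {k: block[k] for k in ("timestamp", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(core, sort_keys=True).encode()).hexdigest()

def make_block(data, prev_hash):
    block = {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block

def chain_is_valid(chain):
    return all(cur["prev_hash"] == prev["hash"] and cur["hash"] == block_hash(cur)
               for prev, cur in zip(chain, chain[1:]))

ledger = [make_block("genesis", "0")]
ledger.append(make_block({"from": "A", "to": "B", "amount": 10}, ledger[-1]["hash"]))
ledger.append(make_block({"from": "B", "to": "C", "amount": 4}, ledger[-1]["hash"]))
print(chain_is_valid(ledger))        # True

ledger[1]["data"]["amount"] = 1000   # retroactive tampering...
print(chain_is_valid(ledger))        # ...is detected: False
```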
Big Data
Data sets with volumes so huge that they are beyond the ability of a typical DBMS to capture, store, and analyze
Extreme VOLUME of data
Wide VARIETY of data
VELOCITY at which data must be processed
No specific quantity defines big data, but it usually involves data in the petabyte and exabyte range (billions to trillions of records, many from different sources)
Business Intelligence Infrastructure
Data Warehouse: database that stores current and historical data of potential interest to decision makers throughout the company
Data Marts: a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users –> like a decentralized warehouse
Data Lake: handles unstructured (e.g. video data), semi structured (e.g. Twitter feeds) and structured data (e.g. transactional data)
Hadoop: open source software framework (managed by the Apache Software Foundation), which enables distributed parallel processing of huge amounts of data across inexpensive computers
In-Memory Computing: relies primarily on a computer’s main memory (RAM) for data storage; the data users access is held in primary memory, which shortens query response times
Analytic Platforms: Commercial database vendors have developed specialized high-speed analytic platforms using both relational and nonrelational technology that are optimized for analyzing large data sets
Data Warehouse
database that stores current and historical data of potential interest to decision makers throughout the company
consolidates and standardizes information for use across the enterprise, but the data cannot be altered
Data Marts
a subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specific population of users –> like a decentralized warehouse
Hadoop Key Services (3 services)
Hadoop Distributed File System (HDFS): for data storage, it links together the file systems on the numerous nodes in a Hadoop cluster to turn them into one big file system
MapReduce: for high performance parallel data processing, breaks down processing of huge data sets and assigns work to the various nodes in a cluster
HBase: Hadoop’s nonrelational database; provides rapid access to the data stored on HDFS and a transactional platform for running high-scale, real-time applications
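A pure-Python sketch of the MapReduce programming model (word counting); actual Hadoop distributes the map and reduce work across the nodes of a cluster, which this single-machine toy does not do.

```python
# map -> shuffle/group -> reduce, on one machine, for illustration only.
from collections import defaultdict

documents = ["big data big insight", "big data everywhere"]

# Map: emit (key, value) pairs from each input record.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all values by key (in Hadoop this moves data between nodes).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: combine the values for each key into a final result.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)   # {'big': 3, 'data': 2, 'insight': 1, 'everywhere': 1}
```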
Online Analytical Processing (OLAP)
Support multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions
Requires you to already know what you want from the data (which question you want answered), but users can answer such ad hoc questions quickly: e.g. a product manager could use a multidimensional data analysis tool to learn how many washers were sold in the East in June, how that compares with the previous month and the previous June, and how it compares with the sales forecast
See model
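A rough sketch of one "slice" of a multidimensional analysis using sqlite3, echoing the washers-in-the-East example; the SALES table and its figures are invented.

```python
# Slice the sales data by product, region, and month.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (product TEXT, region TEXT, month TEXT, units INTEGER)")
conn.executemany("INSERT INTO SALES VALUES (?, ?, ?, ?)", [
    ("Washer", "East", "May", 120), ("Washer", "East", "June", 150),
    ("Washer", "West", "June", 90),  ("Dryer",  "East", "June", 75),
])

# One view of the cube: washers sold in the East, by month.
for row in conn.execute("""
    SELECT month, SUM(units)
    FROM SALES
    WHERE product = 'Washer' AND region = 'East'
    GROUP BY month"""):
    print(row)
```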
Data Pipelines (analytical application)
A series of data processing steps that data engineers build to make sure users can access the specific information they need
Data mining
Provides insight into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior
Associations (information available from data mining)
Occurrences linked to a single event
E.g. study of supermarket purchasing patterns might reveal that, when corn chips are purchased, a cola drink is purchased 65 percent of the time, but when there is a promotion, cola is purchased 85 percent of the time
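A toy calculation of the confidence of that kind of association rule ("when corn chips are purchased, cola is purchased X percent of the time") over made-up baskets.

```python
# Association rule confidence: P(cola in basket | corn chips in basket).
baskets = [
    {"corn chips", "cola", "salsa"},
    {"corn chips", "cola"},
    {"corn chips", "bread"},
    {"milk", "cola"},
    {"corn chips", "cola", "beer"},
]

with_chips = [b for b in baskets if "corn chips" in b]
with_chips_and_cola = [b for b in with_chips if "cola" in b]

confidence = len(with_chips_and_cola) / len(with_chips)
print(f"When corn chips are purchased, cola is also purchased "
      f"{confidence:.0%} of the time")   # 75% for this sample
```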
Sequences (information available from data mining)
Events are linked over time
if a house is purchased, a new refrigerator will be purchased within two weeks 65 percent of the time, and an oven will be bought within one month of the home purchase 45 percent of the time
Classifications (information available from data mining)
recognize patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules
businesses such as credit card or telephone companies worry about the loss of steady customers. Classification helps discover the characteristics of customers who are likely to leave and can provide a model to help managers predict who those customers are so that the managers can devise special campaigns to retain such customers
Clusters (information available from data mining)
works like classification, but is used when no groups have yet been defined
A data mining tool can discover different groupings within data, such as finding affinity groups for bank cards or partitioning a database into groups of customers based on demographics and types of personal investments.
Forecasts (information available from data mining)
Uses a series of existing values to forecast what other values will be
forecasting might find patterns in data to help managers estimate the future value of continuous variables, such as sales figures
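A minimal sketch of trend-based forecasting: fit a least-squares line to past monthly sales and extrapolate one month ahead. The sales figures are invented, and real forecasting tools use far richer models.

```python
# Fit y = slope*x + intercept to past sales and project the next month.
sales = [100, 110, 123, 135, 141, 155]        # units sold in the past six months
xs = list(range(len(sales)))

x_mean = sum(xs) / len(xs)
y_mean = sum(sales) / len(sales)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, sales))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

forecast = slope * len(sales) + intercept      # value for the next month
print(f"Forecast for month {len(sales) + 1}: {forecast:.0f} units")
```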
Text mining tools
help businesses analyze unstructured data by extracting key elements from unstructured natural language text, discovering patterns and relationships, and summarizing the information
Sentiment analysis software (web mining tool)
mines text comments in email messages, blogs, social media conversations, or survey forms to detect favorable and unfavorable opinions about specific subjects
E.g. a company can use sentiment analysis to tune in to consumer conversations about its products across social networks, blogs, and other websites
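A toy, keyword-lexicon sketch of sentiment scoring; commercial sentiment analysis software relies on much larger lexicons and machine learning, and the word lists and comments here are only illustrative.

```python
# Count positive vs. negative words to label each comment.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "broken", "disappointed"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "favorable" if score > 0 else "unfavorable" if score < 0 else "neutral"

comments = [
    "I love the new model great battery",
    "Arrived broken and support was terrible",
]
for c in comments:
    print(sentiment(c), "-", c)
```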
Web mining (+ three types: content-, structure-, and usage mining)
The discovery and analysis of useful patterns and information from the World Wide Web
Content mining: process of extracting knowledge from the content of web pages - includes text, image, audio and video data
Structure mining: examines data related to the structure of a particular website (E.g. links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document)
Usage mining: examines user interaction data recorded by a web server whenever requests for a website’s resources are received
Linking internal databases to the web and the advantages of using web access to internal databases
Users access an organization’s internal database through the web using their desktop PC browsers or mobile apps
See model
The web browser software is easier to use than proprietary query tools
The web interface requires few or no changes to the internal database
It costs much less to add a web interface in front of a legacy system than to redesign and rebuild the system to improve user access.
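A bare-bones sketch of putting a web interface in front of an internal database, using only the Python standard library (http.server plus sqlite3); the table, port, and data are hypothetical, and a real deployment would add authentication, input validation, and a proper web framework.

```python
# Minimal web front end: a browser request triggers a query on the database.
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the internal database (check_same_thread=False is defensive).
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE PART (Part_Number INTEGER, Part_Name TEXT)")
db.executemany("INSERT INTO PART VALUES (?, ?)",
               [(137, "Door latch"), (150, "Door handle")])

class PartHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        rows = db.execute("SELECT Part_Number, Part_Name FROM PART").fetchall()
        body = json.dumps([{"number": n, "name": name} for n, name in rows])
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    # Browse to http://localhost:8080/ to query the database over the web.
    HTTPServer(("localhost", 8080), PartHandler).serve_forever()
```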
Data Governance (Information Policy)
Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations
(Encompasses policies and procedures to manage data – the rules for sharing, disseminating, acquiring, standardizing, classifying and inventorying information)
Database administration (Information Policy)
Creating and maintaining databases
Data administration (Information Policy)
establishes policies and procedures to manage data
Information Policy
Rules, procedures, roles for sharing, managing, standardizing data
Data administration: establishes policies and procedures to manage data
Data Governance: Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations
Database administration: Creating and maintaining databases
Data quality audit
A structured survey of the accuracy and level of completeness of the data in an information system
Data cleaning / data scrubbing
Activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant
Data cleansing software can automatically survey data files, correct errors, and integrate the data in a consistent, companywide format
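A small sketch of data cleansing/scrubbing steps – standardizing formats and removing redundant records; the customer records and the rules (e.g. 5-digit ZIP codes) are illustrative assumptions.

```python
# Detect and fix incomplete, improperly formatted, and redundant records.
records = [
    {"name": "  Anna Lee ", "zip": "2138"},
    {"name": "Ben Ode",     "zip": "02139-1234"},
    {"name": "anna lee",    "zip": "02138"},      # duplicate of the first row
]

def clean(record):
    name = " ".join(record["name"].split()).title()   # trim and standardize case
    zip5 = record["zip"].split("-")[0].zfill(5)        # keep 5 digits, pad if short
    return {"name": name, "zip": zip5}

cleaned, seen = [], set()
for rec in map(clean, records):
    key = (rec["name"], rec["zip"])
    if key not in seen:                                # drop redundant duplicates
        seen.add(key)
        cleaned.append(rec)

print(cleaned)
```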
Structured Query Language (SQL)
Created to manipulate databases
Standard way of interacting with relational databases (RDBs)