05 - DB Sys, DCntrs, Bus. Intel Flashcards
Database
An organized collection of data.
Database Management System
DBMS
A group of programs that manipulate the database and provide an interface between the database and the user of the database and other application programs.
Users -> Applications -> DBMS -> DB
The Digital Universe
- 8 Zettabytes
1. 8 trillion gigabytes
Character
A basic building block of most information, consisting of upper and lower case letters, numeric digits and special symbols.
Field
Typically a name, number, or combination of characters that describes an aspect of a business object or activity.
Record
A collection of data fields all related to one object, activity, or individual.
File
A collection of related records.
Databases are a collection of integrated files.
Hierarchy of Data
Bits
Characters - bytes
Fields - keyed and computed types
Records
Files
Database
Data Scientist
Helps analyze what is stored in vast corporate databases.
Back End Interaction
Entering metadata.
For example - entering survey responses.
Entity
A general class of people, places or things for which data is collected, stored and maintained.
Employees
Inventory
Customers
Records contain the data items pertinent to an entity.
Attributes
A characteristic of an entity.
Employee number
Last name
Hire date
The records contain fields to hold the data points for the attributes.
Data Item
The specific value of an attribute. Found in the fields of the record describing an entity.
Data items are entered into fields.
Key
A field or set of fields in a record that is used to identify a record.
Primary Key
A field or set of fields that UNIQUELY identifies the record.
No two records can share a primary key.
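A minimal sketch of the uniqueness rule, using Python's built-in sqlite3 module. The employee table, its columns and the sample values are hypothetical, chosen only to match the Employees example earlier in the deck.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER PRIMARY KEY,"
    " last_name TEXT,"
    " hire_date TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1001, 'Nguyen', '2015-03-02')")

# A second record that reuses the same primary key is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1001, 'Okafor', '2018-07-16')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, row_count)
```

Only the first record is stored; the DBMS refuses the second because its key is already in use.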
Traditional Approach to Data Management
Where each distinct operational system uses data files dedicated to that system.
A spreadsheet for each data set.
Database Approach to data management
Where multiple information systems share a pool of related data.
Requires a DBMS so a record may only be manipulated by one application program at a time.
Data Modeling Considerations
Content - what data should be collected at what costs.
Access - what data should be provided to which users and when.
Logical structure - how should data be arranged so that it makes sense to users
Physical organization - where is data physically located.
Data Center, Modular Data Center and Green Data Centers
A climate-controlled building or set of buildings that houses database servers and the systems that deliver mission-critical information and services.
Modular data centers like the HP EcoPOD are built inside shipping containers. A 700,000 sq ft modular data center in Northlake, IL covers 16 football fields and holds 220 shipping containers.
North Carolina - Apple Google Facebook
De-duplication
Eliminating undesired data redundancy.
Only about 1/3 of information is secure.
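One simple way to picture de-duplication is to keep the first record seen for each key value. This is a sketch under assumptions: the customer records and the `deduplicate` helper are hypothetical, and real de-duplication tools usually also match near-duplicates, which this does not attempt.

```python
# Hypothetical customer records; the first and third rows repeat customer 17.
records = [
    {"customer_id": 17, "name": "Ada Park"},
    {"customer_id": 23, "name": "Luis Romero"},
    {"customer_id": 17, "name": "Ada Park"},
]


def deduplicate(rows, key):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


unique_records = deduplicate(records, "customer_id")
print(unique_records)
```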
Data Model
A diagram of data entities and their relationships.
Enterprise data modeling is done at the level of the entire enterprise.
Entity-relationship (ER) diagrams are models that use basic graphical symbols to show the organization of and relationships between data.
Development of ER diagrams helps ensure that the logical structure of application programs is consistent with the data relationships in the database.
Database Models
Flat Files (spreadsheets)
Hierarchical
Network Models
Relational
The relational model has become the most popular and is normally easier for managers to understand.
Relational Database Model logic
All data elements are placed in two-dimensional tables, or relations. As long as they share at least one common element, these relations can be linked to output useful information.
Relational Model
A database model that describes data in which all data elements are placed in two-dimensional tables called relations, which are the logical equivalent of files.
IBM DB2
Oracle - leader with over 1/2 of market
Sybase
MS SQL Server, MS Access and MySQL
Manipulating Data
Basic database manipulations include:
Selecting
Projecting
Joining
Domain
Allowable values for data attributes.
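In SQL, a domain can be approximated with a CHECK constraint. The sketch below uses Python's built-in sqlite3 module; the inventory table and its allowed values are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: quantity must be non-negative, status must be one of two values.
conn.execute(
    "CREATE TABLE inventory ("
    " item_id INTEGER PRIMARY KEY,"
    " quantity INTEGER CHECK (quantity >= 0),"
    " status TEXT CHECK (status IN ('in_stock', 'backordered')))"
)
conn.execute("INSERT INTO inventory VALUES (1, 40, 'in_stock')")

# A value outside the domain (negative quantity) is rejected.
try:
    conn.execute("INSERT INTO inventory VALUES (2, -5, 'in_stock')")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]
print(out_of_domain_rejected, stored_rows)
```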
Selecting
Manipulating data to eliminate rows according to certain criteria.
Isolating a record in a table.
Projecting
Manipulating data to eliminate columns in a table.
Joining
Manipulating data to combine two or more tables.
Linking
The ability to combine two or more tables through common data attributes to form a new table with only the unique data attributes.
Linking ability is a primary advantage of the relational database model.
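Selecting, projecting and joining can be sketched in SQL through Python's built-in sqlite3 module. The department and employee tables and their contents are hypothetical; the shared dept_id column is the common data attribute that links them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    last_name TEXT,
    dept_id INTEGER
);
INSERT INTO department VALUES (10, 'Sales'), (20, 'Finance');
INSERT INTO employee VALUES
    (1001, 'Nguyen', 10),
    (1002, 'Okafor', 20),
    (1003, 'Silva', 10);
""")

# Selecting: keep only the rows that meet a criterion.
selected = conn.execute(
    "SELECT * FROM employee WHERE dept_id = 10 ORDER BY employee_number"
).fetchall()

# Projecting: keep only some of the columns.
projected = conn.execute(
    "SELECT last_name FROM employee ORDER BY employee_number"
).fetchall()

# Joining: combine the two tables through the shared dept_id attribute.
joined = conn.execute(
    "SELECT e.last_name, d.dept_name"
    " FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id"
    " ORDER BY e.employee_number"
).fetchall()

print(selected, projected, joined, sep="\n")
```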
Data Cleanup
The process of looking for and fixing inconsistencies to ensure that data is accurate and complete.
Valuable data - accurate, complete, economical, flexible, reliable, relevant, simple, timely, verifiable, accessible and secure. Cleanup along with proper design helps develop data with these characteristics.
Database Normalization
MS OneNote
Store of random notes that are accessible from other applications like word processors and spreadsheets.
EverNote is a freeware alternative that can store photos, voice and handwritten notes.
Database Types
Flat File
Some spreadsheet and word-processing apps; records are unrelated to one another
MS OneNote, EverNote
Single User
MS Access and FileMaker Pro
Multiple User
Oracle, MS, Sybase and IBM
Some single-user solutions can be deployed for multi-user access over a network but usually have limitations.
Schema
A description of the entire database. It is accessed by the DBMS to find where to access required data in relation to the other data.
Used to define the tables and other database features associated with a group of users.
A description that involves “telling” the DBMS the logical and physical structure of the data and the relationships among the data of each user.
Data Definition Language
A collection of instructions and commands used to define and describe data and relationships in a specific database.
Used to enter and tie schemas together.
Describes logical access paths
File, area, record and set description are terms the DDL defines and uses.
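In SQL-based systems, the CREATE TABLE statement plays the DDL role. The sketch below uses Python's built-in sqlite3 module; the customer and purchase tables are hypothetical, and the REFERENCES clause is one way to describe a relationship between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# DDL: define the data and the relationship between the two tables.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount REAL
);
""")

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

conn.execute("INSERT INTO customer VALUES (1, 'Ada Park')")
# A purchase that points at a nonexistent customer violates the defined relationship.
try:
    conn.execute("INSERT INTO purchase VALUES (1, 99, 25.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(tables, orphan_rejected)
```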
Data Dictionary
A detailed description of all the data used in the database.
Name, alias, value range, type (alpha or numeric), storage required, creator and user and access dates and information. Also who created, who is responsible and who can access the data. Also lists what reports can make use of the data.
Helps improve information reliability and reduces redundancy.
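Many DBMSs can report part of this information themselves. As a rough sketch, SQLite's `PRAGMA table_info` returns the name, type, required flag and key flag of each column; the employee table is hypothetical, and a full data dictionary would also record items such as aliases, owners and access rights that this does not cover.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER PRIMARY KEY,"
    " last_name TEXT NOT NULL,"
    " hire_date TEXT)"
)

# Each row of table_info is (cid, name, type, notnull, default_value, pk).
dictionary = [
    {
        "name": name,
        "type": col_type,
        "required": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(employee)"
    )
]

for entry in dictionary:
    print(entry)
```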
Logical Data Access Path
LAP
The path applications use to locate data when accessing a DBMS.
Physical Data Access Path
PAP
The path DBMS uses to locate data on a storage device.
Concurrency Control
A method of dealing with a situation in which two or more users or applications need to access the same record at the same time.
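One possible method is optimistic concurrency control, in which an update is accepted only if the record has not changed since it was read. Locking the record is another common method. The in-memory record, its version counter and the `read`/`write` helpers below are all hypothetical; this is a sketch of the idea, not how any particular DBMS implements it.

```python
# Hypothetical record with a version counter that increases on every update.
record = {"balance": 100, "version": 1}


def read(rec):
    """Return a snapshot of the record as the application saw it."""
    return dict(rec)


def write(rec, snapshot, new_balance):
    """Apply the update only if nobody changed the record since the snapshot."""
    if rec["version"] != snapshot["version"]:
        return False  # stale snapshot: the caller must re-read and retry
    rec["balance"] = new_balance
    rec["version"] += 1
    return True


# Two applications read the same record at the same time.
snapshot_a = read(record)
snapshot_b = read(record)

first_ok = write(record, snapshot_a, snapshot_a["balance"] - 30)
second_ok = write(record, snapshot_b, snapshot_b["balance"] - 50)

print(first_ok, second_ok, record)
```

The second write is refused because it was based on an outdated copy, so the first update is not silently overwritten.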
Query by Example (QBE)
A visual approach to developing database queries or requests. This feature provides a menu and graphical method to perform database manipulation and report setup.
Data Manipulation Language
DML
A specific language, provided with a DBMS, which allows users to access and modify the data, to make queries and to generate reports.
In the 1970s, D. D. Chamberlin and others at IBM developed Structured Query Language (SQL). ANSI adopted it in 1986.
Structured Query Language (SQL)
Industry-leading data manipulation language.
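The core SQL data manipulation statements (INSERT, UPDATE, DELETE, SELECT) can be tried with Python's built-in sqlite3 module. The inventory table and its values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")

# INSERT: add records.
conn.execute("INSERT INTO inventory VALUES ('widget', 40)")
conn.execute("INSERT INTO inventory VALUES ('gadget', 5)")

# UPDATE: modify an existing record.
conn.execute("UPDATE inventory SET quantity = quantity - 10 WHERE item = 'widget'")

# DELETE: remove records that match a condition.
conn.execute("DELETE FROM inventory WHERE quantity < 10")

# SELECT: query what is left.
remaining = conn.execute("SELECT item, quantity FROM inventory").fetchall()
print(remaining)
```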
Database Admin (DBA)
Position that typically requires a degree in computer science or IS and work experience. Helps users decide the optimal design and attributes for desired entities.
Data Administrator
A nontechnical position responsible for defining and implementing consistent principles for a wide variety of data issues.
Sets standards for consistent nomenclature, attribute meaning and security. Usually a high level position.
DBAs report to the DA in larger firms. The DA reports to the CIO or CTO.
Open Source DB
PostgreSQL
MySQL
CouchDB
Couchbase
Apache Hadoop - can manage unstructured and relational DBs
Database as a Service (DaaS)
Database 2.0
When the database and data are stored on equipment and managed off site.
Emerging solutions.
Administration is provided by service provider.
Database stored at the provider's site.
DB Virtualization
Uses virtual servers and operating systems to allow two or more database systems, including servers and DBMSs to act like a single, unified DB system.
Allows more efficient use of computing resources, reduces costs and provides better access to critical information.
Special-Purpose Databases
Offer ability to store forms of data such as music and images that do not fit well in conventional tables.
NoSQL (Not Only SQL)
Examples: Hadoop, Cassandra, Hypertable
Front End Interaction
Data query activity based on key word search through the front end application.
DBMS can act as Front End or Back End applications.
Back End Application
Indirectly interacts with users. Frequently the database that feeds information to the front end application.
DBMS can act as Front End or Back End applications.
Applications
Database Applications manipulate the content of a database to produce useful information.
Common manipulations:
Searching, filtering, synthesizing and assimilating data.
Big Data
Large amounts of unstructured data (various types) that is difficult or impossible to capture, store and manipulate using traditional database management systems.
Hadoop - open source
Oracle - Big Data Appliance
SAS
IBM InfoSphere BigInsights - based on Hadoop
IBM BigSheets
Semantic Web
Developing a seamless integration of a database with the Internet.
A Semantic Web captures metadata with all Web content using technology called the Resource Description Framework (RDF)
This has helped the entire Web develop into a giant database.
Heightened by the increasing use of smartphones and tablet computers to connect to DBs.
Data Warehouse
A large database that collects business information from many sources (relational databases, flat files, spreadsheets) in the enterprise. It covers all aspects of the company’s processes, products and customers in support of management decision making.
Oracle’s warehouse management can accept RFID signals as data.
Data Mart
A subset of a data warehouse that is used by small to medium sized businesses and departments within large corporations to support decision making.
Data Mining
An information-analysis tool that involves the automated discovery of patterns and relationships in a data warehouse.
Methods and tools to support bottom-up, discovery driven analysis. Requires no assumptions but identifies facts and conclusions based on patterns discovered.
Predictive Analysis
aka - Business Analytics
A form of data mining that combines historical data with assumptions about future conditions to predict outcomes of events, such as future product sales or the probability that a customer will default on a loan.
Used to upgrade occasional customers into frequent purchasers.
Used to predict future sales up to a year in the future.
Business Intelligence (BI)
The process of gathering enough of the right information in a timely manner and usable form and analyzing it so that it has a positive impact on business strategy, tactics or operations.
Gathering via Data Mining
Competitive Intelligence
One aspect of business intelligence limited to information about competitors and the ways that knowledge affects strategy, tactics and operations.
All legal tactics to create whole picture from bits of freely available information.
Counterintelligence
The steps an organization takes to protect information sought by “hostile” intelligence gatherers.
Define and manage “Trade Secret” intelligence assets.
Online Analytical Processing (OLAP)
Software that allows users to explore data from a number of perspectives.
The tools that support top-down, query driven analysis. Requires repetitive testing of user-originated theories.
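Exploring data "from a number of perspectives" can be pictured as summarizing the same facts along different dimensions. The sketch below uses SQL GROUP BY through Python's built-in sqlite3 module; the sales table and its figures are made up for illustration, and dedicated OLAP tools offer far more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
# Hypothetical sales facts.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Widget", 100),
        ("East", "Gadget", 50),
        ("West", "Widget", 70),
        ("West", "Gadget", 30),
    ],
)

# Perspective 1: total sales by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Perspective 2: the same data, totaled by product.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

print(by_region)
print(by_product)
```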
Data Loss Prevention (DLP)
Systems designed to lock down, identify, monitor and protect data within an organization.
Supports counterintelligence efforts.
A necessity in complying with government regulations that require companies to safeguard private customer data.
Distributed Database
A database in which the data can be spread across several smaller databases connected through telecommunications devices.
Replicated Database
A DB that holds a duplicate set of frequently used data.
Changes made in satellite DBs are written back to a master DB through data synchronization.
Object-Oriented DB
Method - a procedure to compute some example.
Message - a request to execute or run a method.
A DB that stores both data and its processing instructions.
Object-Oriented DB Management System (OODBMS): A group of programs that manipulate an object-oriented DB and provide a user interface and connections to other application programs.
Object-Relational DB Management System (ORDBMS): a DBMS capable of manipulating audio, video and graphical data. Allows 3rd parties to add data.
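The method and message terms can be illustrated with an ordinary Python class, which keeps data and its processing instructions together. This is only an analogy: the `Invoice` class is hypothetical, and nothing here is stored in an actual OODBMS.

```python
class Invoice:
    """An object that holds both data and the instructions for processing it."""

    def __init__(self, line_amounts):
        self.line_amounts = list(line_amounts)  # the data

    def total(self):
        """A method: a procedure that works on the object's own data."""
        return sum(self.line_amounts)


invoice = Invoice([20, 30, 50])

# Calling invoice.total() corresponds to sending the object a message
# that asks it to run its total method.
invoice_total = invoice.total()
print(invoice_total)
```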
Spatial Data Technology
Using a DB to store and access data according to the locations it describes and permit spatial queries and analysis.
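A very simple spatial query asks which records fall inside a rectangular area. The store locations, coordinates and the `within_box` helper below are hypothetical; real spatial databases use dedicated indexes and distance functions rather than a plain scan.

```python
# Hypothetical store locations (latitude and longitude in decimal degrees).
stores = [
    {"name": "Downtown", "lat": 41.88, "lon": -87.63},
    {"name": "Airport", "lat": 41.98, "lon": -87.90},
    {"name": "Suburb", "lat": 42.30, "lon": -88.20},
]


def within_box(rows, min_lat, max_lat, min_lon, max_lon):
    """Return the rows whose coordinates fall inside the bounding box."""
    return [
        row
        for row in rows
        if min_lat <= row["lat"] <= max_lat and min_lon <= row["lon"] <= max_lon
    ]


nearby = within_box(stores, 41.8, 42.0, -88.0, -87.5)
nearby_names = [row["name"] for row in nearby]
print(nearby_names)
```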
Tools that database designers use to show the logical relationships among data:
Data Models
Enterprise Data Modeling
Entity-Relationship (ER) Diagrams
Relational Models