Slide 2 Flashcards
Database Fundamentals
Database Fundamentals & Advantages
Relational Databases
Database Management Systems (DBMS)
Practice designing databases
Understanding Data as the Backbone of Information
Data Quality
Granularity, redundancy
Data integrity constraints
Access controls
The data hierarchy
Database - file- record-field-byte-bit
Database
Collection of data organized to serve many applications by centralizing data and controlling redundant data
Database management system (DBMS) - software through which users and application programs interact with a database (e.g. Microsoft Access)
Software that manages one or more databases Interfaces between application programs and physical data files Separates logical and physical views of data ( How a data looks to the user (logical) How data is actually stored (physical) )
Enables organization to centralize data management and security
Files are maintained separately by different departments (or apps) - if every app manages its own data
Data redundancy: Presence of duplicate data in multiple files
Data inconsistency: Same attribute has different values in different applications
Lack of flexibility
Poor security
Lack of data sharing and availability
Database examples:
Web applications like Wordpress use a database (typically MySQL)
Search engines use a (massive) database to store pages and indexes
iPhone Apps can use the SQLite database to store and manage their data
Enterprise Applications used by accounting, marketing, and operations use databases to centralize and share information
Database fundamentals
Database – maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
Relational database (dominant model) – stores information in the form of logically related twodimensional tables
Table
Record(row)
Field(columns)
Table – a collection of similar records
Record (row) – a person, place, thing, transaction, or event about which information is stored
Fields (columns) – characteristics or properties of an record The columns in each table contain the fields
Database Structure
- Represent data as two-dimensional tables called relations or files
- Each table contains data on entity and attributes
Primary key
Field in table used for key fields
Foreign key
Primary key used in second table as look-up field to identify records from original table
• (When a field is used in one table to create a relationship, but it is NOT a primary key, it is then called a foreign key)
Row (tuples)
records for different entities
fields (columns)
Represents attribute for entity
The big three in database
table, queries, reports
Parts of table
Column aka Field, Attribute, or Property Row aka Record or Tuple
Primary Key Foreign Keys
Relationships
Use Foreign Key to link
One-to-One, One-to-Many (Many-to-Many)
Database advantages from a business perspective includ
Increased flexibility
Increased scalability (Reduce and Increase)and performance
Reduced information redundancy (Repeating information)
Increased information integrity (quality)
Increased information security
Information granularity尺度
refers to the extent of detail within the information
Factors affecting granularity
Level of abstraction
(fine or detailed, coarse or summarized)
Frequency of gathering data items
Characteristics of high quality Information:
Accuracy Completeness Consistency Uniqueness Timeliness
The four primary sources of low quality information include
- Online customers intentionally enter inaccurate information to protect their privacy 2. Information from different systems have different entry standards and formats 3. Call center operators enter abbreviated or erroneous information by accident or to save time 4. Third party and external information contains inconsistencies, inaccuracies, and errors
Potential business effects resulting from low quality information include:
Inability to accurately track customers
Difficulty identifying valuable customers
Inability to identify selling opportunities
Marketing to nonexistent customers
Difficulty tracking revenue due to inaccurate invoices
Inability to build strong customer relationships
A well-designed database should: (increased flexibility)
Handle changes quickly and easily
Provide users with different views
Have only one physical view (Physical view – deals with the physical storage of information on a storage device)
Have multiple logical views (Logical view – focuses on how users logically access information)
Databases reduce information redundancy
Inconsistency is one of the primary problems with redundant information
Databases offer several security features including
Password – provides authentication of the user
Access level – determines who has access to the different types of information
Access control – determines types of user access, such as read-only acces
Data quality audit:
Structured survey of the accuracy and level of completeness of the data in an information system
Data cleansing :
Software to detect and correct data that are incorrect, incomplete, improperly formatted, or redundant
Enforces consistency among different sets of data from separate information system
Ensuring data quality:
More than 25% of critical data in Fortune 1000 company databases are inaccurate or incomplete
Most data quality problems stem from faulty input
Before new database in place, need to:
Identify and correct faulty data
Establish better routines for editing data once database in operation
How do organizations use data? Data management is always a work-in-progress!
As senses and alerts to identify threats and opportunities - Relevant and timely
As evidence and input for decisions and analysis - Standardized and legitimate As common language for collaboration and communication - Consistent across different users and departments
As organization of work and responsibilities - Clear roles and authorizations for access and manipulation
Integrity constrain
rules embedded in the database management system that help ensure the quality of information
Relational integrity constraint – rule that enforces basic and fundamental information-based constraints
Business - critical integrity constraint – rule that enforce business rules vital to an organization’s success and often require more insight and knowledge than relational integrity constraints
Increased scalability and performance
A database must scale to meet increased demand, while maintaining acceptable performance levels
Scalability – refers to how well a system can adapt to increased demands
Performance – measures how quickly a system performs a certain process or transaction
Data standardization/centralization allows for consistency, but it is not a trivial task
Different systems may use different data standards (e.g. SCM partners) Data from third parties may not be standardized or aligned with an organization’s data standards (e.g. sources on the Internet) Online customers may feed the system with incorrect data due to privacy concerns Employees may seek shortcuts in data entry, such as abbreviations Different parties may wish to defend their own standard
Most DBMS enforce (or at least give the option to enforce) rules to achieve consistency
Validation rules for data entry
Business rules for entity relationships
Validation rules
work to standardize data structure (formatting, character limitations, mandatory data input, etc.)
eg. “Phone number of a customer must include area code”
Business rules
enforce logical relationships across entities.
eg.“Each part can be provided by only one supplier”