Chapter 6: Foundations of Business Intelligence Flashcards
database management systems, tools for improvement, managing firm's data resources
What are the problems of managing data resources in a traditional file environment?
- hard to keep track
- difficult to organize
- different functional areas develop files independently
- data redundancy & inconsistency
- program-data dependence
- inflexibility
- poor security
- lack of data sharing & availability
What is a database management system (DBMS)?
Software that permits centralization of data (management) so that businesses have a single consistent source for all their data needs
–> minimizing redundant & inconsistent files
What are the capabilities of a DBMS?
- data definition capability: specifying the structure & content of the database
- data dictionary: automated or manual file that stores information about the data in the database
- data manipulation language: specialized language for accessing & manipulating data in the database (e.g. SQL)
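Example (a minimal sketch, not tied to any particular DBMS product): the three capabilities above can be tried out with Python's built-in sqlite3 module. The supplier table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database; the supplier table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Data definition: specify the structure & content of the database.
conn.execute(
    "CREATE TABLE supplier ("
    " supplier_number INTEGER PRIMARY KEY,"
    " supplier_name TEXT NOT NULL,"
    " supplier_city TEXT)"
)

# Data dictionary: the DBMS stores information about the data itself.
# PRAGMA table_info returns one row per column (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(supplier)")]

# Data manipulation: add and retrieve records with SQL.
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [(8259, "CBM Inc.", "Dayton"), (8261, "B. R. Molds", "Cleveland")],
)
names = [
    row[0]
    for row in conn.execute(
        "SELECT supplier_name FROM supplier WHERE supplier_city = ?",
        ("Dayton",),
    )
]

print(columns)
print(names)
```

Here columns plays the role of a (very small) data dictionary lookup, and names is the result of a data manipulation query.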
What is a relational database?
- organization of data in two-dimensional tables called relations with rows & columns
- each table contains data about an entity & its attributes
- each row = a record
- each column = an attribute/ field
- each table with a key field to uniquely identify each record for retrieval or manipulation
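Example (a sketch under assumptions; the part table and its values are made up): a key field lets one record be retrieved unambiguously, and the DBMS refuses a second record with the same key. Shown with Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table = one entity (part); columns = attributes; rows = records.
# part_number is the key field (hypothetical schema).
conn.execute(
    "CREATE TABLE part ("
    " part_number INTEGER PRIMARY KEY,"
    " part_name TEXT,"
    " unit_price REAL)"
)
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 22.00)")
conn.execute("INSERT INTO part VALUES (145, 'Side mirror', 12.00)")

# Retrieval by key field returns exactly one record.
record = conn.execute(
    "SELECT part_name, unit_price FROM part WHERE part_number = ?", (145,)
).fetchone()

# A second record with an existing key value is rejected.
try:
    conn.execute("INSERT INTO part VALUES (137, 'Door seal', 6.00)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(record, duplicate_rejected)
```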
What are the characteristics of a relational database?
- flexible & easily accessible
- typically scaled vertically (a more powerful server) rather than horizontally (more servers)
- tables can be combined (joined) easily to deliver specific data
- normalization: process of creating small, stable, flexible & adaptive data structures from complex groups of data when designing a relational DB
- a well-designed relational DB has no many-to-many relationships, keeps each attribute with the one entity it describes & enforces referential integrity rules
- entity-relationship diagrams (ERD) graphically depict the relationships between entities (tables) in a relational DB
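Example (a minimal sketch; table names and data are hypothetical): supplier details are stored once in their own table, parts point to a supplier through a foreign key, and the DBMS enforces referential integrity. Uses Python's built-in sqlite3 module; note that SQLite only checks foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Normalized design: supplier data lives in one place,
# part records refer to it by key instead of repeating it.
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,
    supplier_name   TEXT NOT NULL
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    part_name       TEXT NOT NULL,
    supplier_number INTEGER NOT NULL REFERENCES supplier(supplier_number)
);
INSERT INTO supplier VALUES (8259, 'CBM Inc.');
INSERT INTO part VALUES (137, 'Door latch', 8259);
""")

# Referential integrity: a part may not point to a supplier that does not exist.
try:
    conn.execute("INSERT INTO part VALUES (150, 'Door seal', 9999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The two tables are combined again on demand with a join.
joined = conn.execute(
    "SELECT p.part_name, s.supplier_name"
    " FROM part AS p JOIN supplier AS s"
    " ON s.supplier_number = p.supplier_number"
).fetchall()

print(orphan_rejected, joined)
```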
What are nonrelational databases?
Non-relational databases are often used when large quantities of complex and diverse data need to be organized (exceeding relational DB capabilities)
What are distributed databases?
- database that runs and stores data across multiple computers
- typically, distributed databases operate on two or more interconnected servers on a computer network (incl. cloud computing services)
What are the requirements for designing a database?
- logical design: models the DB from a business perspective, reflects key business processes & decision-making requirements
- physical design: shows how DB is arranged on direct-access storage devices
Where is big data stored in vast quantities?
- data warehouses & data marts
- Hadoop (open-source Apache framework, modeled on Google's MapReduce & distributed file system)
- in-memory computing
- analytical platforms
What tools & tech are there to access information from databases?
- Online Analytical Processing (OLAP): relationships among data visualized as data cubes & cubes within data cubes
- data mining: finding patterns & rules in large pools of data to predict future behaviour
- text mining tools
- web mining tools (website structure & content, user activity)
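Example (a toy sketch of the OLAP idea only, not how a real OLAP server is built): the same sales facts are summed along different combinations of dimensions, which is what viewing one data cube from different sides amounts to. The sales figures and the roll_up helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units_sold)
sales = [
    ("nuts", "East", "Q1", 50),
    ("nuts", "West", "Q1", 60),
    ("bolts", "East", "Q1", 40),
    ("nuts", "East", "Q2", 70),
    ("bolts", "West", "Q2", 30),
]


def roll_up(facts, dims):
    """Sum units_sold over the chosen dimensions (given as column indexes)."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in dims)
        totals[key] += fact[3]
    return dict(totals)


# Two views of the same cube: product x region, and region only.
by_product_region = roll_up(sales, dims=(0, 1))
by_region = roll_up(sales, dims=(1,))

print(by_product_region)
print(by_region)
```

Adding the quarter index to dims would give the full three-dimensional cube.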
What is data governance?
Organizational policies & procedures for maintenance, distribution & use of information in the organization
–> crucial to ensure high data quality, which prevents operational & financial problems caused by flawed data, faulty software etc.
What steps can be taken to ensure high data quality?
- enterprise-wide data standards
- DBs designed to minimize inconsistent & redundant data
- data quality audits
- data cleansing software
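Example (a rough sketch of what a data quality audit plus cleansing step might look like; the customer records, field names and the standardize helper are all hypothetical): values are first brought to one standard format, then the audit lists missing values and likely duplicates.

```python
# Hypothetical customer records with typical quality problems.
records = [
    {"customer_id": 1, "name": "Ann Lee ", "state": "ny"},
    {"customer_id": 2, "name": "ann lee", "state": "NY"},
    {"customer_id": 3, "name": "Bo Chen", "state": ""},
]


def standardize(record):
    """Cleansing: apply one data standard (trimmed, title-case names; upper-case states)."""
    return {
        "customer_id": record["customer_id"],
        "name": " ".join(record["name"].split()).title(),
        "state": record["state"].strip().upper(),
    }


cleaned = [standardize(r) for r in records]

# Audit 1: records with a missing state.
missing_state = [r["customer_id"] for r in cleaned if not r["state"]]

# Audit 2: records that describe the same customer after standardization.
seen = {}
duplicates = []
for r in cleaned:
    key = (r["name"], r["state"])
    if key in seen:
        duplicates.append((seen[key], r["customer_id"]))
    else:
        seen[key] = r["customer_id"]

print(missing_state)
print(duplicates)
```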