Chapter 2 - Data Models Flashcards
Data Modeling
The first step in designing a database. The process of creating a specific data model for a determined problem domain. Iterative, progressive process.
Problem Domain
Clearly defined area within the real-world environment, with well-defined scope and boundaries that will be systematically addressed.
Data Model
Relatively simple representation of more complex real-world data structures. An abstraction.
What should an implementation-ready data model contain?
1) A description of the data structure that will store end-user data.
2) A set of enforceable rules to guarantee data integrity.
3) A data manipulation methodology to support the real-world data transformations.
Entity
A person, place, thing or event about which data will be collected and stored. Each entity occurrence is unique and distinct.
Attribute
A characteristic of an entity. Equivalent of fields in file systems.
Relationship
Describes an association among entities. There are three types of relationships: one-to-many, many-to-many, and one-to-one. Relationships are bidirectional.
One-to-many (1:M or 1..*) relationship
Example: A painter creates many different paintings, but each painting has only one painter.
Many-to-Many (M:N or *..*) relationship
Example: An employee may learn many job skills, and each job skill may be learned by many employees.
One-to-One (1:1 or 1..1) relationship
Example: Each store is managed by a single employee, and each employee manages only a single store.
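The three relationship examples above can be sketched as tables. This is only an illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are assumptions made for the sketch and do not come from the cards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 1:M: each painting references exactly one painter.
CREATE TABLE painter (painter_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE painting (
    painting_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    painter_id INTEGER NOT NULL REFERENCES painter(painter_id)
);
-- M:N: a bridge table pairs employees with job skills.
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee_skill (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    skill_id INTEGER NOT NULL REFERENCES skill(skill_id),
    PRIMARY KEY (employee_id, skill_id)
);
-- 1:1: UNIQUE on the foreign key lets an employee manage at most one store.
CREATE TABLE store (
    store_id INTEGER PRIMARY KEY,
    manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(employee_id)
);
""")

# One painter, many paintings (hypothetical sample rows).
conn.execute("INSERT INTO painter VALUES (1, 'Painter A')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(1, "Work 1", 1), (2, "Work 2", 1)],
)
painting_count = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]

# One employee, one store: a second store for the same manager is rejected.
conn.execute("INSERT INTO employee VALUES (1, 'Employee A')")
conn.execute("INSERT INTO store VALUES (1, 1)")
try:
    conn.execute("INSERT INTO store VALUES (2, 1)")
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False
```

The M:N case needs a third (bridge) table because a single foreign key column can point to only one row on the other side.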
Constraint
A restriction placed on the data. Constraints help ensure data integrity.
Example: Student’s GPA must be between 0.00 and 4.00.
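One possible way to enforce the GPA rule is a CHECK constraint. The sketch below uses Python's sqlite3 module; the student table, its columns, and the try_insert helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        gpa REAL NOT NULL CHECK (gpa BETWEEN 0.00 AND 4.00)
    )
""")

def try_insert(student_id, gpa):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, gpa))
        return True
    except sqlite3.IntegrityError:
        return False

valid_ok = try_insert(1, 3.25)    # inside the allowed range
invalid_ok = try_insert(2, 4.50)  # outside the allowed range
```

Because the rule lives in the table definition, every application that writes to the table is held to it.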
Business rule
A brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization. Used to define entities, attributes, relationships and constraints.
Hierarchical model
Developed in the 1960s to manage large amounts of data for complex manufacturing projects. Its basic logical structure is represented by an upside-down tree. Contains segments.
Segment
The equivalent of a file system’s record type. A higher layer is perceived as the parent of the segment directly beneath it, which is called the child.
Network model
Created to represent complex data relationships more effectively than the hierarchical model, improve database performance, and impose a database standard. Unlike the hierarchical model, it allows a record to have more than one parent.
Schema
Conceptual organization of the entire database as viewed by the database administrator.
Subschema
The portion of the database “seen” by the application programs that actually produce desired information from the data within the database.
Data Manipulation Language (DML)
Defines the environment in which data can be managed. Used to work with the data in the database.
Data Definition Language (DDL)
Enables the database administrator to define the schema components.
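The split between DDL and DML can be illustrated with SQL, used here only as a familiar stand-in since the cards describe the two languages in general terms. The product table and its values are assumptions for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a schema component (a table and its columns).
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, price REAL NOT NULL)")

# DML: work with the data stored under that schema.
conn.execute("INSERT INTO product VALUES ('P1', 10.0)")
conn.execute("UPDATE product SET price = price * 2 WHERE code = 'P1'")
price = conn.execute(
    "SELECT price FROM product WHERE code = 'P1'"
).fetchone()[0]
conn.execute("DELETE FROM product WHERE code = 'P1'")
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```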
Relational Model
Introduced in 1970 by E.F. Codd of IBM. Based on mathematical set theory and represents data as independent relations.
Relation (AKA Table)
Matrix composed of intersecting rows and columns.
Tuple
Each row in a relation in a database.
Relational Database Management System (RDBMS)
Collection of programs that manages a relational database. Translates user’s logical requests (queries) into commands that physically locate and retrieve requested data.
Relational Diagram
A representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities.
Entity Relationship (ER) Model
A data model that describes relationships among entities at the conceptual level with the help of ER diagrams. Developed by P. Chen in 1975.
3 Vs
Three basic characteristics of Big Data databases: Volume, Velocity, and Variety.
American National Standards Institute (ANSI)
The group that accepted the DBTG (Database Task Group) recommendations and augmented database standards in 1975 through its SPARC committee.
Big Data
A movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.
Chen notation
ER model notation by Peter Chen. Connectivities are written next to each entity box. Relationships are represented by a diamond connected to the related entities, with the relationship name in the diamond.
class
A collection of similar objects with shared structure (attributes) and behavior (methods). A class encapsulates an object’s data representation and a method’s implementation. Classes are organized in a class hierarchy.
class diagram
Diagram used to represent data and their relationships in UML object notation.
class diagram notation
Set of symbols used in the creation of class diagrams.
class hierarchy
The organization of classes in a hierarchical tree in which each parent class is a Superclass and each child class is a Subclass.
client node
One of three types of nodes used in the HDFS. Client node acts as the interface between the user application and the HDFS.
conceptual model
The output of the conceptual design process. Provides a global view of an entire database and describes the main data objects, avoiding details.
conceptual schema
Representation of the conceptual model, usually expressed graphically.
connectivity
The classification of the relationship between entities, such as 1:1, 1:M, and M:N.
Crow’s Foot notation
Representation of the entity relationship diagram that uses a three-pronged symbol to represent the “many” sides of the relationship.
Data Node
One of three types of HDFS nodes. The data node stores fixed-size data blocks, which can be replicated to other data nodes.
entity instance
In ER modeling, a specific table row. Also known as an entity occurrence.
entity relationship diagram (ERD)
A diagram that depicts an entity relationship model’s entities, attributes, and relations.
entity set
In a relational model, a grouping of related entities.
eventual consistency
A model for database consistency in which updates to the database will propagate through the system so that all data copies will be consistent eventually. A feature in some NoSQL databases.
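A toy simulation can make the idea concrete. This is a minimal sketch, not how any particular NoSQL product works; the Replica class and the write and propagate functions are hypothetical names invented for the illustration.

```python
class Replica:
    """A hypothetical replica holding its own copy of the data."""

    def __init__(self):
        self.data = {}

def write(primary, pending, key, value):
    """Apply a write to one replica and queue it for the others."""
    primary.data[key] = value
    pending.append((key, value))

def propagate(replicas, pending):
    """Deliver queued updates to every replica, in order."""
    for key, value in pending:
        for replica in replicas:
            replica.data[key] = value
    pending.clear()

replicas = [Replica(), Replica(), Replica()]
pending = []

write(replicas[0], pending, "stock", 5)
stale_read = replicas[1].data.get("stock")  # update not delivered yet

propagate(replicas, pending)
consistent = all(r.data == {"stock": 5} for r in replicas)
```

Between the write and the propagation step, a read from another replica can return old data; once propagation finishes, all copies agree.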
extended relational data model (ERDM)
A model that includes the object-oriented model’s best features in an inherently simpler relational database structural environment.
Extensible Markup Language (XML)
A metalanguage used to represent and manipulate data elements. Permits manipulation of a document’s data elements. Facilitates exchange of structured documents such as orders and invoices over the internet.
external model
Application programmer’s view of the data environment. An external model works with a data subset of the global database schema.
external schema
Specific representation of an external view. The end user’s view of the data environment.
Hadoop
A Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Uses low-cost hardware to create clusters of thousands of computer nodes to store and process data.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
Hardware Independence
A condition in which a model does not depend on the hardware used in the model’s implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level.
Inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy.
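Inheritance can be shown with a two-level class hierarchy. The sketch below uses Python classes as a rough analogy for the object-oriented data model; Person, Employee, and the sample values are assumptions made for the example.

```python
class Person:
    """Superclass: defines shared structure (name) and behavior (describe)."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} ({type(self).__name__})"

class Employee(Person):
    """Subclass: inherits name and describe, and adds a salary attribute."""

    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary

worker = Employee("Ana", 50000)
description = worker.describe()  # method inherited from Person
```

Employee defines no describe method of its own, yet the call works because the method is found in the class above it in the hierarchy.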
Internal model
In database modeling, a level of data abstractions that adapts the conceptual model to a specific DBMS model for implementation. It is the representation of a database as “seen” by the DBMS. It requires a designer to match the conceptual model’s characteristics and constraints to those of the selected implementation model.
Internal Schema
A representation of an internal model using the database constructs supported by the chosen database.
key-value
A data model based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. AKA the associative or attribute-value data model.
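A Python dictionary gives a rough feel for the key-value structure. This is a minimal in-memory sketch, not a real key-value database; the put and get helpers and the sample keys are hypothetical.

```python
store = {}

def put(key, value):
    """Associate a value (or a set of values) with a key."""
    store[key] = value

def get(key, default=None):
    """Look up the value stored under a key."""
    return store.get(key, default)

# A key may map to a single value or to a set of values.
put("customer:1:name", "Customer A")
put("customer:1:phones", {"555-0100", "555-0101"})

name = get("customer:1:name")
phones = get("customer:1:phones")
missing = get("customer:2:name")
```

The store attaches no meaning to the values; interpreting them is left to the application.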
logical design
Stage in the design phase that matches the conceptual design to the requirements of the selected DBMS. Software-dependent. Used to translate the conceptual design into the internal model for a selected DBMS, such as DB2, SQL Server, Oracle, IMS, Informix, Access, or Ingres.
logical independence
A condition in which the internal model can be changed without affecting the conceptual model. The internal model is hardware-independent because it is unaffected by the computer on which the software is installed. A change in storage devices or operating system will not affect the internal model.
MapReduce
An open-source application programming interface (API) that provides fast data analytics services. One of the main Big Data technologies that allows organizations to process massive data stores.
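The programming pattern behind MapReduce can be sketched in plain Python with a word count. This is not the Hadoop MapReduce API, only a single-process illustration of the map, shuffle, and reduce steps; the function names and sample documents are assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data big insight", "data at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```

In a real cluster the map and reduce steps would run in parallel on many nodes; here they run sequentially to keep the data flow visible.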
method
In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions and are invoked through messages.
name node
One of three types of nodes used in HDFS. Stores all the metadata about the file system.
NoSQL
A new generation of database management systems that is not based on the traditional relational database model.
object
An abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.
object/relational database management system (O/R DBMS)
A DBMS based on the extended relational data model (ERDM). The ERDM is the relational model’s response to the OODM. This model includes many of the object-oriented model’s best features within an inherently simpler relational database structure.
object-oriented data model (OODM)
A data model whose basic modeling structure is an object. Both data and their relationships are contained in an object.
physical independence
A condition in which the physical model can be changed without affecting the internal model.
physical model
A model in which physical characteristics such as location, path, and format are described for the data. Is both hardware- and software-dependent.
semantic data model
Data model that more closely represented the real world, modeling both data and their relationships in a single structure known as an object. Semantic indicates meaning. The SDM was published in 1981 by M. Hammer and D. McLeod.
software independence
A property of any model or application that does not depend on the software used to implement it.
sparse data
A case in which the number of table attributes is very large but the number of actual data instances is low.
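A small sketch can show what sparse data looks like and why storing only the populated attributes is attractive. The 100 attribute names and the two sample values are invented for the illustration.

```python
# Hypothetical wide table: 100 attributes, almost all of them empty.
attributes = [f"attr_{i}" for i in range(1, 101)]

# Table-style row: every attribute has a slot, even when no value exists.
dense_row = {name: None for name in attributes}
dense_row["attr_7"] = "x"
dense_row["attr_42"] = 3.5

# Attribute-value style: keep only the attributes that hold data.
sparse_row = {name: value for name, value in dense_row.items() if value is not None}

fill_ratio = len(sparse_row) / len(dense_row)
```

With only 2 of 100 attributes populated, the row-per-entity layout is mostly empty slots, while the attribute-value form stores just the two populated pairs.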
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.