Ch 2 Flashcards
Data modeling
Data modeling, the first step in designing a database, refers to the process of creating a specific data model for a determined problem domain. A problem domain is a clearly defined area within the real-world environment, with a well-defined scope and boundaries, that will be systematically addressed.
Data model
A data model is a relatively simple representation, usually graphical, of more complex real-world data structures. In general terms, a model is an abstraction of a more complex real-world object or event. A model’s main function is to help you understand the complexities of the real-world environment. Database designers use data models to communicate with end users and programmers.
Entity
An entity is a person, place, thing or event about which data will be collected and stored.
Attribute
An attribute is a characteristic of an entity. E.g. a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, etc.
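As a rough illustration of the entity and attribute cards, an entity can be sketched as a Python dataclass whose fields are its attributes. The class name, field names and sample values below are hypothetical and chosen only to mirror the CUSTOMER example.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """One occurrence of the CUSTOMER entity; each field is an attribute."""
    last_name: str
    first_name: str
    phone: str


# Hypothetical sample data, for illustration only.
customer = Customer(last_name="Ramirez", first_name="Ana", phone="555-0100")

# The attribute names that describe the entity.
attribute_names = [f.name for f in fields(Customer)]
```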
Relationship
A relationship describes an association among entities. E.g. a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent.
One-to-many (1:M or 1..*) relationship
E.g. a painter creates many different paintings, but each is painted by only one painter.
Many-to-many (M:N or *..*) relationship
E.g. an employee may learn many job skills, and each job skill may be learned by many employees.
One-to-one (1:1 or 1..1) relationship
E.g. a retail company’s management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store.
Constraint
A constraint is a restriction placed on data. Constraints are important because they help to ensure data integrity. They are normally expressed in the form of rules, e.g. an employee’s salary must be between 6,000 and 350,000.
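One way to see a constraint at work is a SQL CHECK clause, sketched here with Python's built-in sqlite3 module. The table and column names are assumptions made for the example; only the salary range comes from the card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical EMPLOYEE table; the CHECK clause encodes the salary rule.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        salary REAL NOT NULL CHECK (salary BETWEEN 6000 AND 350000)
    )
""")


def try_insert(emp_id, salary):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, salary))
        return True
    except sqlite3.IntegrityError:
        return False


in_range_ok = try_insert(1, 52000)    # inside the allowed range
boundary_ok = try_insert(2, 350000)   # BETWEEN includes both bounds
too_high_ok = try_insert(3, 400000)   # above the upper bound
```

The database rejects the out-of-range row itself, so the rule holds no matter which program inserts the data.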
Business rule
A business rule is a brief, precise and unambiguous description of a policy, procedure or principle within a specific organization.
Hierarchical model
The hierarchical model’s basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments. A segment is the equivalent of a file system’s record type. The hierarchical model depicts a set of one-to-many relationships between a parent segment and its child segments.
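The upside-down tree can be sketched as a small Python structure in which every segment has at most one parent and any number of children. The `Segment` class and the segment names are hypothetical; this is an illustration of the shape, not how a hierarchical DBMS stores data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A node in the tree: one parent (or none for the root), many children."""
    name: str
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        # Each new child points back to exactly one parent (a 1:M relationship).
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path_from_root(self) -> List[str]:
        # Walk up the single parent chain, then reverse it.
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


# Hypothetical segment names, for illustration only.
root = Segment("FINAL_ASSEMBLY")
component = root.add_child("COMPONENT_A")
part = component.add_child("PART_1")
other = root.add_child("COMPONENT_B")
```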
Network model
In the network model, the user perceives the network database as a collection of records in 1:M relationships. However, unlike the hierarchical model, the network model allows a record to have more than one parent.
Schema
The schema is the conceptual organization of the entire database as viewed by the database administrator.
Subschema
The subschema defines the portion of the database “seen” by the application programs that actually produce the desired information from the data within the database.
Data manipulation language (DML)
A data manipulation language (DML) defines the environment in which data can be managed and is used to work with the data in the database.
Data definition language (DDL)
A schema data definition language (DDL) enables the database administrator to define the schema components.
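The DDL/DML split can be illustrated with SQL statements run through Python's sqlite3 module. SQL is used here only as a familiar stand-in and is an assumption of this sketch, as are the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a schema component (a table and its columns).
conn.execute(
    "CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# DML: work with the data stored in that structure.
conn.execute("INSERT INTO agent (agent_id, name) VALUES (1, 'Agent A')")
conn.execute("UPDATE agent SET name = 'Agent B' WHERE agent_id = 1")
rows = conn.execute("SELECT agent_id, name FROM agent").fetchall()
```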
Relational model
The relational model is the current database implementation standard. In the relational model, the end user perceives the data as being stored in tables. Tables are related to each other by means of common values in common attributes. The entity relationship (ER) model is a popular graphical tool for data modeling that complements the relational model. The ER model allows database designers to visually present different views of the data (as seen by database designers, programmers and end users) and to integrate the data into a common framework.
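"Common values in common attributes" can be made concrete with two small tables and a join, sketched with Python's sqlite3 module. The table names, column names and rows are hypothetical; they follow the agent/customer example from the relationship card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables that share the attribute agent_code (hypothetical sample data).
conn.executescript("""
    CREATE TABLE agent (
        agent_code INTEGER PRIMARY KEY,
        agent_name TEXT
    );
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        cus_name   TEXT,
        agent_code INTEGER
    );
    INSERT INTO agent VALUES (501, 'Agent A');
    INSERT INTO agent VALUES (502, 'Agent B');
    INSERT INTO customer VALUES (1, 'Customer X', 501);
    INSERT INTO customer VALUES (2, 'Customer Y', 502);
    INSERT INTO customer VALUES (3, 'Customer Z', 501);
""")

# Rows are matched wherever the shared attribute holds the same value.
pairs = conn.execute("""
    SELECT c.cus_name, a.agent_name
    FROM customer AS c
    JOIN agent AS a ON c.agent_code = a.agent_code
    ORDER BY c.cus_code
""").fetchall()
```

Agent A appears with two customers, which is the 1:M relationship between agents and customers expressed purely through matching values.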
Relational database management system (RDBMS)
The relational data model is implemented through a very sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions provided by the hierarchical and network DBMSs in addition to other functions that make the relational data model easier to understand and implement. Arguably the most important advantage of the RDBMS is its ability to hide the complexities of the relational model from the user. The RDBMS manages all of the physical details, while the user sees the relational database as a collection of tables in which data are stored.
Relational diagram
A relational diagram is a representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities.
Entity relationship diagram (ERD)
ER models are normally represented in an entity relationship diagram (ERD), which uses graphical representations to model database components.
Object-oriented data model (OODM)
The object-oriented data model (OODM) uses objects as the basic modeling structure. Like the relational model’s entity, an object is described by its factual content. Unlike an entity, however, the object also includes information about relationships between the facts, as well as relationships with other objects, thus giving its data more meaning.
Semantic data model
The OODM is said to be a semantic data model because semantic indicates meaning.
Extended relational data model (ERDM)
The relational model has adopted many object-oriented extensions to become the extended relational data model (ERDM). Object/relational database management systems (O/R DBMS) were developed to implement the ERDM. At this point, the OODM is largely used in specialized engineering and scientific applications, while the ERDM is primarily geared to business applications.
Hadoop
Hadoop is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop has several modules, but the two main components are the Hadoop Distributed File System (HDFS) and MapReduce.
Hadoop technologies provide a framework for Big Data analytics in which data (structured or non-structured) is distributed, replicated, and processed in parallel using a network of low-cost commodity hardware.
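The idea behind MapReduce can be sketched in plain Python as a word count. This is a single-process toy under stated assumptions: it does not use Hadoop or HDFS, the function names are hypothetical, and the two text chunks stand in for data blocks that a real cluster would spread across nodes and process in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input; each string plays the role of one distributed block.
chunks = ["big data big clusters", "data nodes store data"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
```

Because each chunk is mapped independently, the map step is the part that could run on many nodes at once.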