Ch 2 Flashcards by Yvonne Rogell

Data modeling

Data modeling, the first step in designing a database, refers to the process of creating a specific data model for a determined problem domain. A problem domain is a clearly defined area within the real-world environment, with a well-defined scope and boundaries, that will be systematically addressed.

Data model

A data model is a relatively simple representation, usually graphical, of more complex real-world data structures. In general terms, a model is an abstraction of a more complex real-world object or event. A model’s main function is to help you understand the complexities of the real-world environment. Database designers use data models to communicate with end users and programmers.

Entity

An entity is a person, place, thing or event about which data will be collected and stored.

Attribute

An attribute is a characteristic of an entity. E.g. a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, etc.
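
As a minimal sketch of the idea, the CUSTOMER entity and its attributes can be written as a Python dataclass. The field names and the sample values are illustrative assumptions, not part of the card.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """One CUSTOMER entity occurrence; each field is an attribute."""
    last_name: str   # customer last name
    first_name: str  # customer first name
    phone: str       # customer phone number


# Hypothetical sample values, used only to show one entity instance.
customer = Customer(last_name="Doe", first_name="Jane", phone="555-0100")
attribute_names = list(Customer.__dataclass_fields__)
```

Here the class plays the role of the entity, and each field plays the role of one attribute.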

Relationship

A relationship describes an association among entities. E.g. a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent.

One-to-many (1:M or 1..*) relationship

E.g. a painter creates many different paintings, but each is painted by only one painter.
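
One common way to implement a 1:M relationship in a relational database is a foreign key on the "many" side. The sketch below uses Python's built-in `sqlite3` module; the table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE painter (
    painter_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE painting (
    painting_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    -- Each painting points at exactly one painter (the "1" side).
    painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
);
""")

conn.execute("INSERT INTO painter VALUES (1, 'Painter A')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Painting X", 1), (11, "Painting Y", 1)],
)

# One painter, many paintings.
paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]

# A painting that names a nonexistent painter is rejected.
try:
    conn.execute("INSERT INTO painting VALUES (12, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```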

Many-to-many (M:N or *..*) relationship

E.g. an employee may learn many job skills, and each job skill may be learned by many employees.
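
In a relational database, an M:N relationship is usually broken into two 1:M relationships through a linking (bridge) table. The card does not mention a bridge table, so the `employee_skill` table, the other names and the sample rows below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skill    (skill_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Bridge table: one row per (employee, skill) pairing.
CREATE TABLE employee_skill (
    emp_id   INTEGER NOT NULL REFERENCES employee(emp_id),
    skill_id INTEGER NOT NULL REFERENCES skill(skill_id),
    PRIMARY KEY (emp_id, skill_id)
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO skill VALUES (?, ?)", [(1, "Welding"), (2, "Wiring")])
conn.executemany("INSERT INTO employee_skill VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# One employee, many skills.
skills_of_ann = [row[0] for row in conn.execute(
    "SELECT s.name FROM skill s "
    "JOIN employee_skill es ON es.skill_id = s.skill_id "
    "WHERE es.emp_id = 1 ORDER BY s.name"
)]

# One skill, many employees.
learners_of_welding = [row[0] for row in conn.execute(
    "SELECT e.name FROM employee e "
    "JOIN employee_skill es ON es.emp_id = e.emp_id "
    "WHERE es.skill_id = 1 ORDER BY e.name"
)]
```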

One-to-one (1:1 or 1..1) relationship

E.g. a retail company’s management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store.
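
A 1:1 relationship can be sketched as a foreign key that is also declared UNIQUE, so no employee can appear as the manager of two stores. The schema and sample rows are assumptions for illustration; other designs are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE store (
    store_id   INTEGER PRIMARY KEY,
    -- NOT NULL: every store has a manager; UNIQUE: a manager has one store.
    manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
);
INSERT INTO employee VALUES (1, 'Ann');
INSERT INTO store VALUES (100, 1);
""")

# Assigning the same employee to a second store violates UNIQUE.
try:
    conn.execute("INSERT INTO store VALUES (101, 1)")
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False
```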

Constraint

A constraint is a restriction placed on the data. Constraints are important because they help to ensure data integrity. Constraints are normally expressed in the form of rules; for example, an employee’s salary must have a value between 6,000 and 350,000.
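
The salary rule can be sketched as a SQL CHECK constraint, shown here through Python's `sqlite3` module. The table layout is an assumption, and the range is treated as inclusive at both ends, which the card does not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    -- Assumed inclusive bounds for the rule on the card.
    salary INTEGER NOT NULL CHECK (salary BETWEEN 6000 AND 350000)
)
""")

conn.execute("INSERT INTO employee VALUES (1, 50000)")  # inside the range

try:
    conn.execute("INSERT INTO employee VALUES (2, 400000)")  # outside the range
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True

stored_rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The database itself refuses the invalid row, so the rule holds no matter which program writes the data.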

Business rule

A business rule is a brief, precise and unambiguous description of a policy, procedure or principle within a specific organization.

Hierarchical model

The hierarchical model’s basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments. A segment is the equivalent of a file system’s record type. The hierarchical model depicts a set of one-to-many relationships between a parent and its child segments.
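
A rough sketch of the tree structure in Python: each segment has at most one parent and any number of children. The `Segment` class and the segment names are hypothetical and do not reflect how a hierarchical DBMS stores data internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    name: str
    # repr/compare are disabled on the links to avoid parent<->child recursion.
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: list["Segment"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, name: str) -> "Segment":
        """Create a child segment whose single parent is this segment."""
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Names from the root down to this segment."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


root = Segment("Final assembly")
component = root.add_child("Component A")
root.add_child("Component B")
part = component.add_child("Part A1")
part_path = part.path()
```

Reaching `Part A1` means walking down from the root through its one parent, which is the access pattern the upside-down tree implies.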

Network model

In the network model, the user perceives the network database as a collection of records in 1:M relationships. However, unlike the hierarchical model, the network model allows a record to have more than one parent.
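
The difference from the hierarchical model can be sketched with a small Python structure in which a record may be linked to several parents. The `NetworkDB` class and the record names are hypothetical.

```python
from collections import defaultdict


class NetworkDB:
    """Toy structure: each link is one parent-to-child connection."""

    def __init__(self) -> None:
        self.children = defaultdict(list)  # parent record -> child records
        self.parents = defaultdict(list)   # child record  -> parent records

    def connect(self, parent: str, child: str) -> None:
        self.children[parent].append(child)
        self.parents[child].append(parent)


db = NetworkDB()
# The same invoice record has two parents, which a strict tree cannot express.
db.connect("Customer C1", "Invoice I1")
db.connect("Sales rep S1", "Invoice I1")
db.connect("Customer C1", "Invoice I2")

invoice_parents = db.parents["Invoice I1"]
customer_children = db.children["Customer C1"]
```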

Schema

The schema is the conceptual organization of the entire database as viewed by the database administrator.

Subschema

The subschema defines the portion of the database “seen” by the application programs that actually produce the desired information from the data within the database.
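
A loose modern analogy, not the network model's own subschema facility, is a SQL view that exposes only part of a table to an application. The table, the view and the sample row below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cust_id      INTEGER PRIMARY KEY,
    name         TEXT,
    phone        TEXT,
    credit_limit INTEGER
);
INSERT INTO customer VALUES (1, 'Ann', '555-0100', 5000);
-- The application is given this view, not the full table.
CREATE VIEW customer_contact AS
    SELECT cust_id, name, phone FROM customer;
""")

cursor = conn.execute("SELECT * FROM customer_contact")
visible_columns = [column[0] for column in cursor.description]
visible_rows = cursor.fetchall()
```

The `credit_limit` column exists in the database but is not part of what this application "sees".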

Data manipulation language (DML)

A data manipulation language (DML) defines the environment in which data can be managed and is used to work with the data in the database.

Data definition language (DDL)

A schema data definition language (DDL) enables the database administrator to define the schema components.
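
The DDL and DML cards describe the network model's languages, but the same split survives in SQL and is easy to see there. In the sketch below, run through Python's `sqlite3` module, the CREATE TABLE statement defines structure (DDL) and the INSERT, UPDATE and SELECT statements work with the data (DML). The `agent` table and its row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines a schema component (a table and its columns).
conn.execute("CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML: adds, changes and reads the data stored in that structure.
conn.execute("INSERT INTO agent VALUES (1, 'Ann')")
conn.execute("UPDATE agent SET name = 'Ann B.' WHERE agent_id = 1")
agent_rows = conn.execute("SELECT agent_id, name FROM agent").fetchall()
```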

Relational model

The relational model is the current database implementation standard. In the relational model, the end user perceives the data as being stored in tables. Tables are related to each other by means of common values in common attributes. The entity relationship (ER) model is a popular graphical tool for data modeling that complements the relational model. The ER model allows database designers to visually present different views of the data (as seen by database designers, programmers and end users) and to integrate the data into a common framework.
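
"Related by common values in common attributes" can be sketched with two tables that share an `agent_code` column and a join that matches rows on it. The tables, column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent    (agent_code INTEGER PRIMARY KEY, agent_name TEXT);
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT, agent_code INTEGER);
INSERT INTO agent    VALUES (501, 'Agent A'), (502, 'Agent B');
INSERT INTO customer VALUES (1, 'Customer X', 501),
                            (2, 'Customer Y', 501),
                            (3, 'Customer Z', 502);
""")

# Rows are linked wherever the agent_code values match; no pointers are stored.
customer_agent_pairs = conn.execute(
    "SELECT c.cust_name, a.agent_name "
    "FROM customer c JOIN agent a ON a.agent_code = c.agent_code "
    "ORDER BY c.cust_id"
).fetchall()
```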

Relational database management system (RDBMS)

The relational data model is implemented through a very sophisticated relational database management system (RDBMS). The RDBMS performs the same basic functions provided by the hierarchical and network DBMSs in addition to other functions that make the relational data model easier to understand and implement. Arguably the most important advantage of the RDBMS is its ability to hide the complexities of the relational model from the user. The RDBMS manages all of the physical details, while the user sees the relational database as a collection of tables in which data are stored.

Relational diagram

A relational diagram is a representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities.

Entity relationship diagram (ERD)

ER models are normally represented in an entity relationship diagram (ERD), which uses graphical representations to model database components.

Object-oriented data model (OODM)

The object-oriented data model (OODM) uses objects as the basic modeling structure. Like the relational model’s entity, an object is described by its factual content. Unlike an entity, however, the object also includes information about relationships between the facts, as well as relationships with other objects, thus giving its data more meaning.

Semantic data model

The OODM is said to be a semantic data model because semantic indicates meaning.

Extended relational data model (ERDM)

The relational model has adopted many object-oriented extensions to become the extended relational data model (ERDM). Object/relational database management systems (O/R DBMS) were developed to implement the ERDM. At this point, the OODM is largely used in specialized engineering and scientific applications, while the ERDM is primarily geared to business applications.

Hadoop

Hadoop is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop has several modules, but the two main components are Hadoop Distributed File System (HDFS) and MapReduce.

Hadoop technologies provide a framework for Big Data analytics in which data (structured or non-structured) is distributed, replicated, and processed in parallel using a network of low-cost commodity hardware.

Hadoop Distributed File System (HDFS)

Hadoop Distributed File System (HDFS) is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. In order to achieve high throughput, HDFS uses the write-once, read-many model. This means that once the data is written, it cannot be modified. HDFS uses three types of nodes: a name node that stores all the metadata about the file system, a data node that stores the fixed-size data blocks (that could be replicated to other data nodes), and a client node that acts as the interface between the user application and the HDFS.
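
The idea of fixed-size blocks replicated across data nodes can be sketched in plain Python. This is a toy model, not the HDFS API or its real placement policy: the tiny block size, the round-robin placement, the node names and the function names are all assumptions.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, data_nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(b"abcdefghij", block_size=4)
# The mapping from block number to nodes is the kind of metadata a name node keeps.
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3"], replication=2)
```

Because every block lives on more than one data node in this sketch, losing a single node does not lose any block.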

MapReduce

MapReduce is an open source application programming interface (API) that provides fast data analytics services. MapReduce distributes the processing of the data among thousands of nodes in parallel. MapReduce works with structured and non-structured data. The MapReduce framework provides two main functions: Map and Reduce. In general terms, the Map function takes a job and divides it into smaller units of work; the Reduce function collects all the output results generated from the nodes and integrates them into a single result set.
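
The Map and Reduce steps can be sketched with the classic word-count example in plain, single-process Python. This does not use Hadoop's API; the function names and input chunks are assumptions, and the grouping step in the middle is written out explicitly here although a real framework handles it.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: turn one chunk of input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def group_by_key(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the mapped pairs by word so each word can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine each word's values into a single result set."""
    return {word: sum(values) for word, values in grouped.items()}


# Each chunk stands in for the unit of work one node would process.
chunks = ["big data big", "data model"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(group_by_key(mapped))
```

In a cluster, each call to `map_phase` could run on a different node; only the grouped results need to be brought together for the reduce step.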

NoSQL

NoSQL is a large-scale distributed database system that stores structured and non-structured data in efficient ways. NoSQL refers to a new generation of databases that address the specific challenges of Big Data and have the following general characteristics: they are not based on the relational model and SQL, hence the name NoSQL; they support distributed database architectures; they provide high scalability, high availability and fault tolerance; they support very large amounts of sparse data; and they are geared toward performance rather than transaction consistency.
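
"Sparse data" can be illustrated with a toy key-value store in plain Python, where each record keeps only the attributes it actually has. This is not any particular NoSQL product; the keys, field names and values are assumptions.

```python
# Each key maps to a record that stores only the attributes it actually has.
records = {
    "cust:1": {"name": "Ann", "phone": "555-0100"},
    "cust:2": {"name": "Bob", "twitter": "@bob", "loyalty_tier": "gold"},
}

# A fixed-column table would need one column per attribute seen anywhere.
all_fields = sorted({name for record in records.values() for name in record})
table_cells = len(records) * len(all_fields)

# The key-value layout stores only the values that exist.
stored_values = sum(len(record) for record in records.values())
```

With many attributes that apply to only a few records, the gap between `table_cells` and `stored_values` grows, which is the situation the sparse-data characteristic refers to.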

Big Data Technologies

Emerging Big Data Technologies such as Hadoop, MapReduce and NoSQL provide distributed, fault-tolerant and cost-efficient support for Big Data analytics. NoSQL databases are a new generation of databases that do not use the relational model and are geared to support the very specific needs of Big Data organizations. NoSQL databases offer distributed data stores that provide high scalability, availability, and fault tolerance by sacrificing data consistency and shifting the burden of maintaining relationships and data integrity to the program code.

Data-modeling requirements

Data-modeling requirements are a function of different data views (global vs. local) and the level of data abstraction. The first three levels of data abstraction are external, conceptual and internal. The fourth and lowest level of data abstraction, called the physical level, is concerned exclusively with physical storage methods.

Ch 2 Flashcards

(29 cards)