Quiz 1 Flashcards

1
Q

Types of Databases (5)

A

Single user:

It runs on a single computer or device and doesn’t allow concurrent access by multiple users or applications.
These databases are typically used for personal applications or small-scale scenarios where only one user needs to access the data simultaneously.

Multi-user:

It is designed to handle multiple requests and transactions simultaneously.
Multi-user databases are suitable for scenarios where data needs to be accessed and modified by multiple users concurrently, such as business applications and web services.

Centralized Database:

A database system where all data is stored in a single location or server.
Users and applications access the data from this central server.
Centralized databases are easier to manage and maintain but may have limitations in terms of scalability and fault tolerance.

Enterprise Centralized Database:

An enterprise centralized database is a centralized database system designed to serve the needs of an entire organization.
It stores all the organization’s data in a single central repository, making it accessible to various departments and functions.
Are common in large organizations and are used for critical business operations.

Enterprise Decentralized Database:

An enterprise decentralized database is a database system where data is distributed across multiple locations or servers.
Each location may have its own database instance or shard of the data.
Decentralized databases are often used to improve scalability, reduce latency, and enhance fault tolerance in large, geographically distributed organizations.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Data Anomaly

A

A data anomaly refers to an abnormal or unexpected occurrence or inconsistency in a dataset that deviates from the normal patterns or rules.

Data anomalies can include errors, outliers, inconsistencies, or missing information in the data and can result from various factors such as data entry mistakes, system issues, or incomplete records.

Example: In a sales database, a product sold for a negative price is a data anomaly. This violates the expected rule that prices should be non-negative.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Metadata

A

Metadata is data about data. It provides information about the characteristics, structure, and context of the data, helping users understand and manage the data effectively.

Metadata can include descriptions, definitions, data types, relationships, and other attributes that help users and systems interpret and use the data correctly.

Example: In a document management system, metadata for a document may include information like its title, author, creation date, and file format. This metadata helps users locate and categorize documents.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Data Inconsistency

A

Data inconsistency refers to a condition in which data stored in a database or dataset does not match or align with the expected or intended values or rules.

It can occur when the same data is stored differently in different parts of a database or when data violates integrity constraints, leading to contradictions or errors in the dataset.

Example: In a customer database, if a customer’s address is recorded differently in different records (e.g., “123 Main St.” in one record and “123 Main Street” in another), it represents data inconsistency.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Data Integrity:

A

Data integrity refers to the accuracy, consistency, and reliability of data in a database or dataset.
It ensures that data is complete, valid, and adheres to predefined rules, constraints, and relationships.

Data integrity is maintained through mechanisms such as data validation, constraints, and transaction management in a database system.

Example: In an employee database, a data integrity constraint might ensure that every employee record has a non-null employee ID and that the employee’s birthdate is within a reasonable range, preventing the entry of inconsistent or incomplete data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Explain why database design is important?

A

Data Organization:
Effective database design structures and organizes data in a way that reflects the relationships between different pieces of information. It ensures that data is stored logically and efficiently, making it easier to understand and use.

Data Integrity:
A well-designed database includes constraints and rules that enforce data integrity, preventing the entry of incorrect or inconsistent data. This ensures the accuracy and reliability of the information stored.

Efficient Data Retrieval:
Properly designed databases are optimized for data retrieval. By using indexes, appropriate data types, and normalization techniques, databases can quickly retrieve the required data without unnecessary processing, resulting in faster query performance.

Scalability:
Database design considerations also impact the scalability of a system. A good design can accommodate growing volumes of data and increased user loads, allowing for system expansion without significant redesign.

Data Security:
Database design can incorporate security measures to protect sensitive information. It enables role-based access control and encryption to safeguard data from unauthorized access and breaches.

Reduced Redundancy: A well-designed database minimizes data redundancy by normalizing the data. Reducing redundancy not only saves storage space but also reduces the risk of data inconsistencies and update anomalies.

Maintenance and Extensibility:
A good database design makes it easier to maintain and extend the database over time. Changes to the data structure or business requirements can be accommodated more smoothly, reducing downtime and disruption.

Data Consistency:
Database design ensures that data remains consistent across all related tables. Changes made to data in one place are automatically reflected in all other relevant areas, reducing the risk of discrepancies.

Data Analysis:
A well-structured database is essential for effective data analysis and reporting. It supports complex queries, data transformations, and reporting needs, allowing organizations to derive valuable insights from their data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Identify the difference between data and information?

A

Data and information are related concepts, but they have distinct differences:

Nature:
    Data: Data refers to raw facts, figures, symbols, or observations that lack context or meaning on their own. Data can be in the form of numbers, text, images, or other representations.
    Information: Information is data that has been processed, organized, or structured in a way that it becomes meaningful and useful. Information provides context and answers "what," "when," "where," "who," and "why."

Context:
    Data: Data is context-neutral and doesn't inherently convey meaning or significance. It requires interpretation to become useful.
    Information: Information is context-specific and has been processed or analyzed to provide meaningful insights or knowledge.

Example:
    Data: "12345" is data. It could represent anything, such as a postal code, a part number, or an employee ID, without additional context.
    Information: "Employee ID: 12345" is information. It provides context by specifying that it is an employee ID.

Usefulness:
    Data: Data is the input for generating information. It serves as the raw material from which information is derived.
    Information: Information is the output of processing or interpreting data. It is what people use to make decisions, gain insights, or perform actions.

Subjectivity:
    Data: Data is objective and factual. It doesn't carry opinions, judgments, or interpretations.
    Information: Information can be subjective to some extent because it involves the interpretation of data, which can vary depending on the context and the person interpreting it.

Volume:
    Data: Data can be vast in volume, including large datasets or unprocessed records.
    Information: Information is typically more condensed and focused, presenting relevant details in a concise manner.

Lifecycle:
    Data: Data is often the starting point in the information lifecycle. It may undergo various stages of processing and analysis to become information.
    Information: Information is a later stage in the lifecycle of data, representing the outcome of processing and analysis.

Example:  Consider a set of temperature readings:
    Data: A list of temperature measurements (e.g., 72°F, 68°F, 75°F) without any context or labels is data.
    Information: A summary report that includes these temperature readings along with context, such as "Daily Average Temperatures for September," is information.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are some basic database functions that a spreadsheet can not perform?

A

Here are some basic database functions that a spreadsheet may struggle to perform efficiently:

Data Integrity and Validation:
    Databases enforce data integrity through constraints, ensuring that data conforms to predefined rules. Spreadsheets lack robust data validation and integrity mechanisms, making them susceptible to data entry errors and inconsistencies.

Concurrent Multi-User Access:
    Databases are designed to support multiple users accessing and modifying data simultaneously. Spreadsheets can be accessed by multiple users, but concurrent editing often leads to conflicts and data integrity issues.

Data Normalization:
    Databases use normalization techniques to minimize data redundancy and improve data integrity. Spreadsheets often contain duplicated data, making them less efficient in managing large datasets.

Advanced Query and Reporting:
    Databases offer powerful query languages (e.g., SQL) and reporting tools for complex data retrieval and analysis. Spreadsheets have limited query capabilities, making them less suitable for complex reporting tasks.

Scalability:
    Databases are designed to handle large datasets efficiently. Spreadsheets may become slow or unmanageable when dealing with extensive data, resulting in performance issues.

Data Security and Access Control:
    Databases provide robust security features, allowing administrators to control user access at a granular level. Spreadsheets often lack strong access control and are more susceptible to unauthorized access.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is data independence, and why it is lacking in file systems?

A

Data independence is a concept in database management systems (DBMS) that separates the physical aspects of data storage and access from the logical representation of data. It allows changes to be made to the underlying data storage structure without affecting the way data is accessed or represented at a higher level. There are two types of data independence: physical data independence and logical data independence.

Physical Data Independence:
    Physical data independence refers to the ability to modify the physical storage structure of data without affecting the application programs or queries that interact with the data.
    This means that you can change how data is stored, such as moving it to a different disk, reorganizing storage files, or changing indexing methods, without impacting the way users or applications retrieve and manipulate the data.
    Changes to the physical storage structure should not require modifications to the application's code or queries.

Logical Data Independence:
    Logical data independence refers to the ability to change the logical structure of the data (schema) without affecting the external schema or user applications.
    This allows you to modify the database schema, add or remove tables and columns, or redefine relationships between tables without impacting the programs or queries that use the data.
    Applications remain unaffected by changes in the data model.

Why Data Independence is Lacking in File Systems:

Data independence is lacking in traditional file systems for several reasons:

No Abstraction Layer:  File systems primarily deal with files and directories, and they don't provide an abstraction layer for data and its structure. As a result, the physical and logical aspects of data are tightly coupled in file systems.

Data Organization:  In file systems, the organization of data is closely tied to how data is stored on physical storage devices. Changing the physical layout of data often necessitates changes to how it is accessed.

Application Dependency:  File systems are typically used by applications that manage their data storage directly. These applications are tightly coupled with the file system and rely on specific file paths and formats.

Limited Data Schema:  File systems do not define a structured schema for data, as databases do. Consequently, changes to data organization can have a cascading effect on application code that depends on specific file structures.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Discuss the properties of a relation?

A

In the context of relational databases, a relation refers to a table that stores data. Relations are fundamental to the relational model and possess several important properties that help maintain data integrity and ensure efficient data management. Here are the key properties of a relation:

Uniqueness of Rows:
    In a relation, each row (or tuple) must be unique. This means that no two rows in the table can be identical. This property ensures that each row represents a distinct entity or record.

Atomic Values:
    Each cell (or attribute) in a relation must hold a single, indivisible value. This ensures that data is stored in its most granular form and eliminates the need for complex nested structures within a single cell.

Attribute Names:
    Each column in a relation has a unique name that identifies the attribute it represents. These attribute names serve as headers for the columns and are used to reference specific fields in queries and operations.

Ordering of Rows:
    The rows in a relation have no inherent order. This means that the order in which rows are stored or retrieved is not guaranteed. To impose a specific order, you must use the ORDER BY clause in SQL queries.

No Duplicate Rows:
    A relation cannot contain duplicate rows. Every row must be distinct, ensuring that there are no redundant records in the table.

No Duplicate Columns:
    Each attribute (column) in a relation must have a unique name, and duplicate columns are not allowed within the same relation.

Homogeneous Data Types:
    All values in a column must be of the same data type. This ensures data consistency and facilitates efficient storage and retrieval.

Null Values:
    Relations can contain null values, which represent missing or unknown data. Nulls are distinct from other values and are used to indicate the absence of a known value.

Fixed Number of Columns:
    A relation has a fixed number of columns, and each row must have a value for each column, whether that value is a concrete data value or a null.

Unique Primary Key:
    Every relation should have a primary key, which is one or more columns that uniquely identify each row. The primary key enforces the uniqueness constraint and ensures that each row is uniquely identifiable.

Foreign Keys:
    Relations can have foreign keys that establish relationships between different tables. Foreign keys maintain referential integrity by ensuring that values in one table correspond to values in another.

Constraints:
    Relations can have additional constraints beyond primary keys and foreign keys, such as check constraints and unique constraints, which enforce data integrity rules and business logic.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Key of a relation. Explain what is meant by a foreign key. How do foreign keys of relations relate to candidate keys? Also explain alternate key vs super key vs vs composite key.Give examples to illustrate your answer

A

Key of a Relation:

In the context of a relational database, a key is a set of one or more attributes (columns) that are used to uniquely identify rows (records) in a relation (table).
The primary purpose of a key is to ensure data integrity and enable relationships between tables.

Foreign Key:

A foreign key is an attribute or set of attributes in a relation that refers to the primary key (or a candidate key) in another relation. It establishes a relationship between two tables.
The foreign key enforces referential integrity by ensuring that values in the referencing table (the one with the foreign key) match values in the referenced table (the one with the primary key).
The foreign key constraint ensures that the values in the referencing table's foreign key column(s) exist as values in the referenced table's primary key or candidate key column(s).

Relationship between Foreign Keys and Candidate Keys:

A foreign key relates to a candidate key in the referenced table. While a primary key is a specific type of candidate key, there can be multiple candidate keys in a table.
The foreign key references one of the candidate keys in the referenced table to establish a relationship.
This relationship ensures that each value in the foreign key corresponds to a unique value in the referenced table, ensuring referential integrity.

Alternate Key vs. Super Key vs. Composite Key:

Super Key:
    A super key is any set of attributes (columns) in a relation that can uniquely identify a row.
    It may include extra attributes that are not strictly necessary for uniqueness.
    A super key can be a candidate key, but it may also contain additional attributes.

Example:
In a "Customers" table, {CustomerID} and {Email} are super keys because they can uniquely identify customers. However, {CustomerID, FirstName} is also a super key because it contains extra information.

Candidate Key:
    A candidate key is a minimal super key, meaning it is a set of attributes that uniquely identifies each row, and no subset of those attributes can also uniquely identify the row.

Example:
In the same "Customers" table, {CustomerID} and {Email} are candidate keys because they are minimal and uniquely identify customers.

Alternate Key:
    An alternate key is a candidate key that is not chosen as the primary key.
    In a relation, you can have multiple candidate keys, and one of them is selected as the primary key.

Example:
If {CustomerID} is chosen as the primary key in the "Customers" table, then {Email} becomes an alternate key because it's a candidate key but not the primary key.

Composite Key:
    A composite key is a key formed by combining two or more attributes to uniquely identify rows.
    It is a special case of a candidate key where multiple attributes are used together.

Example:
In an "Orders" table, {OrderID, ProductID} is a composite key because it combines both attributes to ensure uniqueness, meaning that the same product can be ordered multiple times in different orders, and the combination guarantees uniqueness.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Define the two principal integrity rules for the relational model. Discuss why it is desirable to enforce these rules?

A

Entity Integrity Rule:
The Entity Integrity Rule states that in a relational table, each row (tuple) must have a unique identifier, which is referred to as the primary key.
This means that no two rows in the table can be identical in terms of their primary key values.
The primary key uniquely identifies each entity (record) within the table.

Why It's Desirable:
    Enforcing the Entity Integrity Rule ensures that each row in the table represents a distinct entity or record. This is essential for data accuracy and integrity, as it prevents duplicate or redundant records.
    It allows for reliable data retrieval and modification, as each record can be uniquely identified and accessed.

Referential Integrity Rule:
    The Referential Integrity Rule establishes and enforces relationships between tables. It specifies that foreign keys in a referencing table must have corresponding values in the referenced table's primary key (or a candidate key).
    In other words, it ensures that relationships between entities are maintained consistently.
    If a value exists in a foreign key column of one table, there must be a matching value in the primary key (or candidate key) column of another table.

Why It's Desirable:
    Enforcing the Referential Integrity Rule helps maintain data consistency and accuracy when dealing with related data in multiple tables. It prevents orphaned or invalid references.
    It ensures that data relationships are well-defined and that changes to related data are controlled and coordinated, preventing data anomalies.
    Referential integrity is particularly important in complex databases with multiple tables and relationships, as it helps avoid data inconsistencies and maintain the integrity of the entire database.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What does it mean to say that a database displays both entity integrity and referential integrity?

A

a database that displays both entity integrity and referential integrity follows the core principles of the relational model, ensuring that data is accurately represented, relationships are well-defined, and data consistency is maintained throughout the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Why are entity integrity and referential integrity important in a database?

A

Entity integrity and referential integrity are crucial principles in database management systems (DBMS) for several important reasons:

Entity Integrity (Primary Key):

Data Accuracy: Entity integrity ensures that each row in a table represents a unique and distinct entity or record. This uniqueness prevents duplicate or redundant data, ensuring data accuracy and eliminating data anomalies.

Data Identification: A primary key serves as a unique identifier for each row, making it easier to locate, access, and modify specific records within the table. It simplifies data retrieval and management.

Data Consistency: Without entity integrity, duplicate or inconsistent data can lead to confusion and errors. Entity integrity enforces consistency by requiring a primary key for each entity.

Data Quality: Maintaining entity integrity improves data quality and reliability, which is essential for data-driven decision-making and reporting.

Referential Integrity (Foreign Key):

Data Relationships: Referential integrity establishes and enforces relationships between tables. It ensures that related data in different tables remains consistent and accurate.

Data Consistency Across Tables: In a database with multiple tables, referential integrity ensures that data in related tables is synchronized. If a change occurs in one table, it can be coordinated and reflected in related tables, preventing data inconsistencies.

Orphaned Data Prevention: Referential integrity prevents orphaned records by ensuring that foreign keys in one table match primary keys in another. Orphaned records are records that no longer have a valid relationship with other data, which can lead to data anomalies.

Data Integrity: Enforcing referential integrity helps maintain overall data integrity in the database. It ensures that data relationships are well-defined, and data remains reliable and trustworthy.

Complex Data Models: In databases with complex data models, such as those used in enterprise systems, maintaining referential integrity becomes even more critical. It allows for the correct representation of complex data relationships and dependencies.

Data Integrity Constraints: Referential integrity is implemented using data integrity constraints, which are rules that govern how data can be inserted, updated, or deleted in related tables. These constraints prevent actions that would compromise data integrity.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the requirements that two relations must satisfy to be considered union-compatible?

A

To be considered union-compatible, two relations (tables) in a relational database must satisfy specific requirements, as follows:

Arity Compatibility:
    The two relations must have the same number of attributes (columns). In other words, they must have the same arity.
    Each attribute in one relation should have a corresponding attribute in the other relation, meaning that they should have matching data types and meanings.

Attribute (Column) Type Compatibility:
    For each corresponding pair of attributes (columns) in the two relations, their data types must be compatible or convertible to a common data type.
    For example, if one relation has an attribute of type integer and the other has an attribute of type decimal, these can be considered compatible if you can convert one to the other (e.g., by casting).

Attribute (Column) Name Compatibility:
    While not always a strict requirement for union compatibility, it is typically good practice for corresponding attributes in the two relations to have the same or semantically equivalent names.
    Having similar attribute names makes it easier to understand the results when performing operations like union.

Attribute (Column) Position Compatibility:
    The order of attributes in both relations must be the same. In other words, the corresponding attributes in both relations should appear in the same position (column order).
    This requirement ensures that the values from corresponding attributes are aligned correctly when performing union operations.

Domain Compatibility:
    The domains of the corresponding attributes in both relations should be compatible. This means that the values in the attributes should belong to the same or compatible domains or value sets.
    For example, if one relation has an attribute for "CustomerID," it should contain values that are compatible with the values in the corresponding attribute in the other relation.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q
A