Database Architecture Flashcards by Tony Lu

What is ETL?

ETL stands for Extract, Transform, and Load. In ETL, data flows from the data source to staging to the data destination. ETL can help with data privacy and compliance, cleansing sensitive data before loading into the data destination.

How well did you know this?

Not at all

Perfectly

What is ELT? How does it differ from ETL?

ELT stands for Extract, Load, and Transform. ELT lets the data destination do the transformation, eliminating the need for data staging. ELT is simpler than ETL and is for companies with minor data needs.

How well did you know this?

Not at all

Perfectly

What is a data warehouse? Where is the data from?

A data warehouse is a type of data management system that is designed to enable and support business intelligence (BI) activities, especially analytics. Data warehouses are solely intended to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications.

How well did you know this?

Not at all

Perfectly

Name common types of data warehouse architecture (4 types).

The architecture of a data warehouse is determined by the organization’s specific needs. Common architectures include:

Simple: All data warehouses share a basic design in which metadata, summary data, and raw data are stored within the central repository of the warehouse.

Simple with a staging area: Operational data must be cleaned and processed before being put in the warehouse. Many data warehouses add a staging area for data before it enters the warehouse, to simplify data preparation

Hub and spoke: Adding data marts between the central repository and end users allows an organization to customise its data warehouse to serve various lines of business. When the data is ready for use, it is moved to the appropriate data mart

Sandboxes: Sandboxes are private, secure, safe areas that allow companies to quickly and informally explore new datasets or ways of analyzing data without having to conform to or comply with the formal rules and protocol of the data warehouse

How well did you know this?

Not at all

Perfectly

Name main types of data warehouse (3 types).

Enterprise Data Warehouse (EDW): Centralised warehouse providing decision support service across the enterprise. It offers a unified approach for organising and representing data

Operational Data Store (ODS): Only supports daily operations, so their view of historical data is very limited. Works well with current data as it is refreshed in real time

Data Mart: A data mart is a subset of the data warehouse. It specially designed for a particular line of business, such as sales, finance, sales or finance

How well did you know this?

Not at all

Perfectly

What are the three levels of database architecture?

Physical/internal level: Lowest level in the three level architecture, describes how data is actually stored in the database, as well as compression and encryption techniques

Conceptual/logical level: Higher level than the physical level. It describes how the database appears to the users conceptually and the relationships between various data tables.

External/view level: Highest level and closest to the user. The external level only shows the relevant database content to the users in the form of views and hides the rest of the data

How well did you know this?

Not at all

Perfectly

What is staging?

Staging is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses. Data staging areas are often transient in nature, with their contents being erased prior to running an ETL process or immediately following successful completion of an ETL process.

How well did you know this?

Not at all

Perfectly

What are the differences between OLTP and OLAP?

OLTP stands for Online Transaction Process System. It is known for maintaining transactional level data of the organization and generally is highly normalised (3NF). Main operations are insert, update and delete.

OLAP stands for Online Analytical Process System. It is known for a lot of analysis and fulfils reporting purposes. It may not be normalised. Main operation is to extract multidimensional data for analysis.

How well did you know this?

Not at all

Perfectly

What are the benefits of a data warehouse (name 4)?

Subject-oriented: They can analyze data about a particular subject or functional area (such as sales)

Integrated: Data warehouses create consistency among different data types from disparate sources

Nonvolatile: Once data is in a data warehouse, it’s stable and doesn’t change

Time-variant: Data warehouse analysis looks at change over time

How well did you know this?

Not at all

Perfectly

What is a virtual data warehouse?

A virtual data warehouse is a set of separate databases, which can be queried together, so a user can effectively access all the data as if it was stored in one data warehouse.

How well did you know this?

Not at all

Perfectly

Explain the different data models that are available in detail (3 types).

Conceptual data model: As the name itself implies, this data model depicts the high-level design of the available physical data.

Logical data model: Within the logical model, the entity names, entity relationships, attributes, primary keys, and foreign keys will show up.

Physical data model: Based on this data model, the view will give out more information and showcases how the model is implemented in the database. All the primary keys, foreign keys, table names, and column names will show up.

How well did you know this?

Not at all

Perfectly

What are the differences between normalisation and denormalisation?

Normalisation and denormalisation are both methods used in databases.

Normalisation is the process of creating a set schema to store non-redundant and consistent data in order to reduce data redundancy and inconsistency. It optimises disk space and increase number of tables.

Denormalisation is the process of combining the data so that it can be queried speedily in order to achieve faster execution of queries through introducing redundancy. It use more disk space but decreases number of tables.

How well did you know this?

Not at all

Perfectly

What is the difference between star schema and snowflake schema, and what are the advantages and disadvantages of each?

The star schema and snowflake schema are two ways to structure a data warehouse.

The star schema has a centralised data repository, stored in a fact table. The schema splits the fact table into a series of denormalised dimension tables. where there is only one link between fact table and dimension. The star schema’s simpler design makes it much easier to write complex queries. However, it uses a lot of disk space and isn’t the best option for data integrity.

The snowflake schema is different because it normalises the data. The snowflake schema uses less disk space and better preserves data integrity. The main disadvantage is the complexity of queries required to access data—each query must dig deep to get to the relevant data because there are multiple joins.

How well did you know this?

Not at all

Perfectly

What are the rules of 1NF?

Each table cell should contain a single value

Each record needs to be unique

How well did you know this?

Not at all

Perfectly

What are the rules of 2NF?

Must be in 1NF

There are no partial dependencies i.e. a non-prime attribute is functionally dependent to a candidate key

How well did you know this?

Not at all

Perfectly

What are the rules of 3NF?

Study These Flashcards

Be in 2NF

Have no transitive functional dependencies i.e. X->Z where X->Y and Y->Z

What are the rules of BCNF (Boyce-Codd Normal Form)?

Study These Flashcards

Be in 3NF

For any dependency A->B, A should be a super key

What is a super key? How is it different from a candidate key?

Study These Flashcards

A super key is one or more attributes which can uniquely identify a row in a table.

A candidate key is a type of super key with no reduudant attributes.

What is a primary key? How is it different to a foreign key?

Study These Flashcards

A primary key is selected from a set of candidate keys to be the main identifying key for that table (minimal super key).

A foreign key is a key which provides a link between data in two tables. It is a column which references the primary key of another table.

What is a DBMS? What is it used for (4 general functions)?

Study These Flashcards

A Database Management System effectively helps manage data and derive relevant info. It is used to:

Allow definition, creation, querying, update and administration of databases

Define rules to validate data

Convert an existing database or archive a large and growing one

Run business applications which perform tasks of managing business processses, interacting with end-users and other applications

What are the five components of a database?

Study These Flashcards

Hardware, software, data, procedures, database access language

What is SQL?

Study These Flashcards

SQL stands for Structured Query Language. It is used for storing and managing data in relational database management system (RDMS). It allows a user to create, read, update and delete relational database and tables.

What is NoSQL?

Study These Flashcards

NoSQL databases (aka “not only SQL”) are non tabular, and store data differently than relational tables. NoSQL data models allow related data to be nested within a single data structure. An example is MongoDB, which stores data in documents similar to JSON objects.

What are the types of joins in SQL (name 4)?

Study These Flashcards

(INNER) JOIN: Returns records that have matching values in both tables

LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table

RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table

FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table

What are constraints in SQL? List 6.

Constraints in SQL are used to specify the limit on the data type of the table. It can be specified while creating or altering the table statement. The sample of constraints are: ``` NOT NULL CHECK DEFAULT UNIQUE PRIMARY KEY FOREIGN KEY ```

What is the difference between DROP and TRUNCATE commands?

DROP command removes a table and it cannot be rolled back from the database whereas TRUNCATE command removes all the rows from the table.

Are NULL values same as that of zero or a blank space?

A NULL value is not at all same as that of zero or a blank space. NULL value represents a value which is unavailable, unknown, assigned or not applicable whereas a zero is a number and blank space is a character.

What are prime and non-prime attributes in DBMS?

A prime attribute is an attribute that forms a candidate key. All the other attributes of the relation are non-prime.

What is a subquery in SQL?

A subquery is a query inside another query where a query is defined to retrieve data or information back from the database. Subqueries are always executed first and the result of the subquery is passed on to the main query.

What is a view?

A view is a virtual table which consists of a subset of data contained in a table. Since views are not present, it takes less space to store. View can have data of one or more tables combined.

Database Architecture Flashcards

(30 cards)