Design and Development Flashcards

1
Q

How do you implement one-to-one relationships while designing tables?

A

One-to-one relationships are usually implemented as columns in a single table; only rarely are they implemented as two separate tables. For each instance in the first entity there is one and only one instance in the second entity, and vice versa.

Here are some examples of one-to-one relationships in the home:
One family lives in one house, and the house contains one family.
One person has one passport, and the passport can only be used by one person.
One person has one ID number, and the ID number is unique to one person.
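When a one-to-one relationship is split across two tables, a common way to enforce it is to make the child table's primary key also a foreign key to the parent. A minimal sketch (table and column names are illustrative):

```sql
-- One person has exactly one passport: Passport's primary key is
-- also a foreign key to Person, so at most one passport row can
-- exist per person.
CREATE TABLE Person (
    person_id INT PRIMARY KEY,
    full_name VARCHAR(100) NOT NULL
);

CREATE TABLE Passport (
    person_id INT PRIMARY KEY,               -- shared key enforces 1:1
    passport_no VARCHAR(20) NOT NULL UNIQUE,
    CONSTRAINT FK_Passport_Person
        FOREIGN KEY (person_id) REFERENCES Person (person_id)
);
```

Because person_id is both the primary key of Passport and a foreign key to Person, the database itself prevents a second passport for the same person.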

2
Q

How do you implement one-to-many relationships while designing tables?

A

For each instance in the first entity there can be one or more in the second entity. For each instance in the second entity there can be one and only one instance in the first entity.

In a one-to-many relationship, one record in a table can be associated with one or more records in another table. For example, each customer can have many sales orders.
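A one-to-many relationship is implemented by placing a foreign key on the "many" side. A sketch with illustrative names:

```sql
-- Each customer can have many orders; each order belongs to
-- exactly one customer.
CREATE TABLE Customer (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE SalesOrder (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,                -- the "many" side carries the FK
    order_date DATE NOT NULL,
    CONSTRAINT FK_SalesOrder_Customer
        FOREIGN KEY (customer_id) REFERENCES Customer (customer_id)
);
```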

3
Q

How do you implement many-to-many relationships while designing tables?

A

For each instance in the first entity there can be one or more instances in the second entity, and moreover, for each instance in the second entity there can be one or more instances in the first entity.

A many-to-many relationship occurs when multiple records in a table are associated with multiple records in another table. For example, a many-to-many relationship exists between customers and products: customers can purchase various products, and products can be purchased by many customers.
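Relational databases cannot store a many-to-many relationship directly; it is resolved with a junction (bridge) table holding a foreign key to each side. A sketch, assuming illustrative Customer and Product tables already exist:

```sql
-- The junction table turns one many-to-many relationship into
-- two one-to-many relationships.
CREATE TABLE CustomerProduct (
    customer_id INT NOT NULL,
    product_id  INT NOT NULL,
    PRIMARY KEY (customer_id, product_id),   -- composite key
    FOREIGN KEY (customer_id) REFERENCES Customer (customer_id),
    FOREIGN KEY (product_id)  REFERENCES Product (product_id)
);
```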

4
Q

Explain Primary Key.

A

A primary key is a column (or set of columns) in a table whose values uniquely identify the rows in the table. The primary key is chosen from the table's candidate keys based on its perceived value to the business as an identifier.

Enforces uniqueness of a column in a table.
Default clustered index.
Does not allow Nulls.
Only one primary key per table.
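These points can be illustrated with a small T-SQL sketch (names are illustrative):

```sql
CREATE TABLE Employee (
    employee_id INT NOT NULL,
    ssn CHAR(11) NOT NULL,
    -- Explicitly named primary key; in SQL Server a primary key
    -- creates a clustered index by default and disallows NULLs.
    CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (employee_id)
);
```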

5
Q

Explain Unique Key.

A

Enforces uniqueness of the column in a table.
Default non-clustered index.
Allows one Null value.
Can have more than one unique key per table.
Cannot have repeating values.
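A sketch of a unique constraint alongside a primary key (illustrative names):

```sql
-- email must be unique but is not the primary key; a unique
-- constraint creates a non-clustered index by default and,
-- in SQL Server, permits a single NULL value.
CREATE TABLE AppUser (
    user_id INT PRIMARY KEY,
    email VARCHAR(255) NULL,
    CONSTRAINT UQ_AppUser_Email UNIQUE (email)
);
```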

6
Q

Explain Candidate Key.

A

Candidate Key - Key value which ‘can’ uniquely identify a row in a table. Any of the identified candidate keys can be used as the table’s primary key. One can describe a candidate key as a super key that contains only the minimum number of columns necessary to determine uniqueness.

A primary key is a column in a table whose values uniquely identify the rows in the table. The primary key is chosen from this list of candidates based on its perceived value to the business as an identifier.

Candidate keys that are not part of the primary key are called alternate keys.

Prime attributes are the attributes of a candidate key, which define its uniqueness (e.g., the SSN column in an employee database).

A non-prime attribute is an attribute that is not part of a Candidate key.

7
Q

Explain Alternate Key.

A

Alternate Key - If a table has more than one candidate key, then when one becomes the primary key the remaining candidate keys become alternate keys.

8
Q

Explain Composite Key.

A

Composite Key - Two or more columns used in combination to uniquely identify a row in a table.
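For example (illustrative schema), an order line is identified only by the combination of its order and its line number:

```sql
-- Neither order_id nor line_no is unique on its own,
-- but the combination is.
CREATE TABLE OrderLine (
    order_id INT NOT NULL,
    line_no  INT NOT NULL,
    quantity INT NOT NULL,
    CONSTRAINT PK_OrderLine PRIMARY KEY (order_id, line_no)
);
```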

9
Q

How does database design contribute to the performance of a SQL based application?

A

Database design plays a major part.

  • When building a new system or adding to an existing system it is crucial the design is correct.
  • Ensure the correct data is selected and placed in the correct tables.
  • Ensure that the right relationships exist between the tables.
  • Eliminating data redundancy is an ultimate goal.
  • Planning a design is an iterative process and constantly reviewed as an application is developed.
  • It is rare for the initial design and system goals to survive unaltered, however slightly, though that is what everyone should aim for. A designer therefore has to be thorough and ensure the design of a database remains efficient.
10
Q

What can a developer do during the logical and physical design of a database in order to ensure proper performance?

A
  • A developer must investigate the volumes of data (Capacity Planning), what types of information will be stored and how the data will be accessed.
  • While upgrading an existing system, analyze the present data and where existing data volumes occur, how it is accessed and where current response bottlenecks are occurring to help assess existing problem areas in the design.
  • A new system requires a thorough investigation of what data will be captured and looking at volumes of data held in other formats will also aid design.
  • Knowing your data is as important as knowing the users of your data.
  • Constantly revisit the design, frequently review relationships, volumes of data and indexes to ensure the design remains optimal.
  • Use SQL Server Profiler for analysis.
11
Q

What are the main steps in Database Modeling?

A
  1. Logical - Planning, Analysis and Design
  2. Physical - Design, Implementation and Maintenance

12
Q

Describe the difference between a Data Warehouse and a Data Lake.

A

Data lakes and data warehouses are both widely used for storing big data, but they are not interchangeable terms. A data lake is a vast pool of raw data, the purpose for which is not yet defined. A data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose.

                  Data Lake                             Data Warehouse
Data Structure    Raw                                   Processed
Purpose of Data   Not yet determined                    Currently in use
Users             Data scientists                       Business professionals
Accessibility     Highly accessible, quick to update    More complicated and costly to change

13
Q

Describe the key benefits of Data Modeling.

A

Higher quality data.
The visual depiction of requirements and business rules allows developers to foresee what could become large-scale data corruption before it happens. Plus, data models allow developers to define rules that monitor data quality, which reduces the chance of errors.

Increased internal communication about data and data processes.
Creating data models is a forcing function for the business to define how data is generated and moved throughout applications.

Reduced development and maintenance costs.
Because data modeling surfaces errors and inconsistencies early on in the process, they are far easier and cheaper to correct.

Improved performance.
An organized database operates more efficiently; a well-modeled schema lets queries avoid needless searching and return results faster.

14
Q

Describe the three main types of Data Models.

A

Conceptual data model
A conceptual data model is a visual representation of database concepts and the relationships between them. Typically, a conceptual data model won’t include details of the database itself but instead focuses on establishing entities, characteristics of an entity, and relationships between them. These data models are created for a business audience, especially key business stakeholders.

Logical data model
A logical data model is often the next step after conceptual data modeling. This data model further defines the structure of the data entities and sets the relationships between them. The attributes of each data entity are clearly defined. Usually, a logical data model is used for a specific project since the project would have certain requirements for the structure. The model can still be integrated into other logical models to provide a better understanding of the scope. For this level of data modeling, the normalization process is applied to 3NF, but no secondary or primary key is needed.

Physical data model
A physical data model is used for database-specific modeling. Just like with the logical model, a physical model is used for a specific project but can be integrated with other physical models for a comprehensive view. The model goes into more detail with column keys, restraints, and primary and foreign keys. The columns will include exact types and attributes in this model, and the data should be normalized as well. A physical model designs the internal schema.

15
Q

What are Window functions ?

A

Window functions operate on a set of rows and return a single aggregated value for each row. The main advantage of using Window functions over regular aggregate functions is: Window functions do not cause rows to become grouped into a single output row, the rows retain their separate identities and an aggregated value will be added to each row.

We define the Window (the set of rows on which the function operates) using an OVER() clause.

Types of Window functions:
Aggregate Window Functions - SUM(), MAX(), MIN(), AVG(), COUNT()
Ranking Window Functions - RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
Value Window Functions - LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

e.g.:

SELECT order_id, order_date, customer_name, city, order_amount,
       ROW_NUMBER() OVER (ORDER BY order_id) AS [row_number]
FROM [dbo].[Orders];

16
Q

What are the primary procedural languages for MS SQL Server, Oracle, MySQL and PostgreSQL?

A

T-SQL - MS SQL Server
PL/SQL - Oracle
MySQL stored program language (based on SQL/PSM) - MySQL
PL/pgSQL - PostgreSQL (psql is the PostgreSQL command-line client); also see
PL/Tcl - Tcl Procedural Language - PostgreSQL
PL/Perl - Perl Procedural Language - PostgreSQL
PL/Python - Python Procedural Language - PostgreSQL

17
Q

What is a Data Lake?

A

A data lake is a scalable data storage repository that can quickly ingest large amounts of raw data and make it available on-demand. Users accessing the data lake can explore, transform, and analyze subsets of data as they need it to meet their specific requirements.

Data lakes offer several advantages. One of the biggest is their ability to support advanced analytics and other forms of data science.

Data lakes can also help break down silos by providing a centralized repository for storing and working with data from different systems.

Data lakes can incorporate any type of data from any source. They’re also highly scalable systems, especially when implemented in the cloud. In addition, data cleansing and transformation are typically performed when the data is needed, rather than when the data is first ingested into the data lake.

One of the most important characteristics of a data lake is its ability to store all types of data from any source:

Structured data is clearly defined and formatted, such as the data found in relational databases.

Unstructured data adheres to no specific format, such as social media posts or data generated by IoT devices.

Semistructured data falls somewhere between structured and unstructured data, such as CSV and JSON files.

Early data lakes were built on Hadoop technologies, particularly MapReduce and the Hadoop distributed file system (HDFS). Unfortunately, the first attempts at data lakes often turned into dumping grounds of undocumented and disorganized data, making it challenging to work with the data in any meaningful way. Since then, data lake technologies and methodologies have steadily improved, and they’re now better suited to handle the influx of growing data volumes.

An effective data lake can ingest any type of data in its native format from any source, whether databases, applications, file servers, data streams, or systems such as customer relationship management (CRM) or enterprise resource planning (ERP). A data lake also provides the mechanisms necessary to securely access data and carry out data-related operations without needing to move data to a separate storage system (although this is still an option if required).

18
Q

What is the difference between a Temp Table, a CTE and a Table Variable?

A

Temp Tables are physically created in the tempdb database. They act like normal tables and can have constraints and indexes just like normal tables.

A CTE (common table expression) is a named temporary result set used to simplify complex subqueries. It exists only for the scope of a single statement and is not materialized as a separate object; you cannot create an index on a CTE.

A Table Variable acts like a variable and exists for a particular batch of query execution. It is dropped once the batch completes. Table variables are also backed by tempdb, not held purely in memory.
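The three constructs can be sketched side by side (assuming an illustrative dbo.Orders table):

```sql
-- Temp table: physical object in tempdb; supports indexes and constraints.
CREATE TABLE #RecentOrders (order_id INT PRIMARY KEY, order_date DATE);

-- Table variable: scoped to the batch; dropped automatically at batch end.
DECLARE @RecentOrders TABLE (order_id INT, order_date DATE);

-- CTE: exists only for the single statement that follows it.
WITH RecentOrders AS (
    SELECT order_id, order_date
    FROM dbo.Orders
    WHERE order_date >= DATEADD(DAY, -30, GETDATE())
)
SELECT * FROM RecentOrders;
```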

19
Q

What is Referential Integrity?

A

Referential integrity refers to the consistency that must be maintained between primary keys and foreign keys, i.e., every foreign key value must have a corresponding primary key value.

20
Q

What is the difference between Scope_Identity(), Identity(), @@Identity, and Ident_Current()?

A
The @@identity function returns the last identity created in the same session.
The scope_identity() function returns the last identity created in the same session and the same scope.
The ident_current(name) returns the last identity created for a specific table or view in any session.
The identity() function is not used to get an identity, it's used to create an identity in a select...into query.
The session is the database connection. The scope is the current query or the current stored procedure.

A situation where scope_identity() and @@identity differ is when you have a trigger on the table. If a query inserts a record, causing the trigger to insert another record somewhere, scope_identity() will return the identity created by the query, while @@identity will return the identity created by the trigger.

So, normally you would use the scope_identity() function.
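The differences can be sketched as follows; T1 and T2 are hypothetical tables with IDENTITY columns, where T1 has a trigger that inserts into T2:

```sql
INSERT INTO T1 (col) VALUES ('x');

SELECT SCOPE_IDENTITY();     -- identity created by this INSERT into T1
SELECT @@IDENTITY;           -- last identity in this session: the trigger's
                             -- insert into T2, not the insert into T1
SELECT IDENT_CURRENT('T2');  -- last identity created for T2 in any session
```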

21
Q

Explain the difference between a Stored Procedure and a TSQL query.

A
  • Stored procedures are compiled and stored in a database.
  • A stored procedure is invoked as a function call instead of a SQL query.
  • Stored procedures can have parameters for both passing values into the procedure and returning values from the call.
  • Results can be returned as a result set, or as an OUT parameter cursor.
  • Only one stored procedure may be selected per query.
  • Only one dataset is allowed in a stored procedure.
  • In Oracle, only one cursor is allowed and it has to be the last parameter in the stored procedure.
  • In Oracle, only standard SQL types should be used in the parameters (user defined types are not allowed).
  • In MySQL, older drivers do not return procedure meta-data.
  • The procedure calls are quick and efficient as stored procedures are compiled ‘once’ and stored in executable form. Hence the response is quick.
  • Since the same piece of code is used again and again so, it results in higher productivity.

Advantages:
  • It is faster.
  • It is pre-compiled.
  • It reduces network traffic.
  • It is reusable.
  • Its security is high.

Disadvantages:
  • It is difficult to debug.
  • It needs an expert developer, since the code is difficult to write.
  • It is database dependent.
  • It is non-portable.
  • It is expensive.

22
Q

Explain the difference between a Stored Procedure and a Function in SQL Server.

A

Stored Procedures are pre-compiled objects: they are compiled the first time they are called and the compiled form is saved, which then executes whenever the procedure is called.

A function is compiled and executed every time it is called. A function must return a value and cannot modify the data received as parameters.

A function must return a value, but for a stored procedure a return value is optional; a procedure can return zero or n values.

Functions can have only input parameters for it whereas Procedures can have input or output parameters.

Functions can be called from Procedure whereas Procedures cannot be called from a Function.

The procedure allows SELECT as well as DML(INSERT/UPDATE/DELETE) statement in it whereas Function allows only SELECT statement in it.

Procedures cannot be utilized in a SELECT statement whereas Function can be embedded in a SELECT statement.

Stored Procedures cannot be used in the SQL statements anywhere in the WHERE/HAVING/SELECT section whereas Function can be.

Functions that return tables can be treated as another rowset. This can be used in JOINs with other tables.

Inline functions can be thought of as views that take parameters, and can be used in JOINs and other rowset operations.

An exception can be handled by try-catch block in a Procedure whereas try-catch block cannot be used in a Function.

We can use Transactions in Procedure whereas we can’t use Transactions in Function.
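A minimal sketch contrasting the two (dbo.Orders and dbo.OrderLines are illustrative):

```sql
-- A scalar function: must return a value, SELECT only.
CREATE FUNCTION dbo.fn_OrderTotal (@order_id INT)
RETURNS DECIMAL(10, 2)
AS
BEGIN
    RETURN (SELECT SUM(line_total)
            FROM dbo.OrderLines
            WHERE order_id = @order_id);
END;
GO

-- A stored procedure: may modify data and use OUTPUT parameters.
CREATE PROCEDURE dbo.usp_CloseOrder
    @order_id INT,
    @rows_updated INT OUTPUT
AS
BEGIN
    UPDATE dbo.Orders SET status = 'CLOSED' WHERE order_id = @order_id;
    SET @rows_updated = @@ROWCOUNT;
END;
GO

-- A function can be embedded in a SELECT; a procedure cannot.
SELECT order_id, dbo.fn_OrderTotal(order_id) AS total
FROM dbo.Orders;
```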

23
Q

What are some techniques to increase stored procedure performance?

A

Work with as short a transaction area as possible. Helps to decrease lock contention.

Recompile stored procedures after index alterations or additions so that the optimizer recognizes and selects the proper indexes.

Break down joins to help isolate performance problems.

Check a stored procedure’s performance by using SHOWPLAN commands.

Try to use EXISTS rather than JOINs. An EXISTS probes the table only until one matching record is found, rather than joining all the records.

Look at using subqueries when trying to find a handful of values in the subquery statement and there is no key on the column you are looking upon.

24
Q

What are the different types of Cursors?

A
  1. Static - Makes a temporary copy of the data in tempdb; modifications made to the base table are not reflected in the data returned by fetches made by the cursor.
  2. Dynamic - Reflects all changes made to the base table.
  3. Forward Only - Specifies that the cursor can only fetch sequentially from first to last.
  4. Keyset Driven - The keyset (the set of key values that uniquely identify the rows) is built in tempdb when the cursor is opened.
25
Q

Explain the disadvantages of a Cursor.

A

Each time you fetch a row from the cursor it results in a network roundtrip, whereas, a normal SELECT query makes only one roundtrip no matter how large the result set is.

Cursors also require more resources and temporary storage (resulting in more IO operations)

There are restrictions on the SELECT statements that can be used with some types of cursors.

Most of the time set-based operations can be used instead of cursors; for example, a single UPDATE statement can apply the same changes a cursor would iterate over.

When a unique key is available a WHILE loop can be used in lieu of a cursor.

Try to use TempTables to reduce the number of records fed into cursor processing.
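The WHILE-loop alternative keyed on a unique column can be sketched like this (dbo.Orders with a unique order_id is illustrative):

```sql
-- Walk the table row by row without declaring a cursor, by
-- repeatedly fetching the next key greater than the current one.
DECLARE @id INT;
SELECT @id = MIN(order_id) FROM dbo.Orders;

WHILE @id IS NOT NULL
BEGIN
    -- ... process the row identified by @id here ...

    SELECT @id = MIN(order_id)
    FROM dbo.Orders
    WHERE order_id > @id;
END;
```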

26
Q

In general, what steps can you take to improve the performance of a poor performing query?

A

Audit the execution plan

Check for missing indexes

Check for table scans

Check for improper joins

Check for larger record sets than required

Table contention

Missing or out-of-date statistics

Blocking

Locking

Excess recompilations of stored procedures

27
Q

What data quality checks should be considered for all ETL pipelines?

A

Not Null Checks
Duplication Checks
Format / regex validation
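These checks can be expressed as simple validation queries; the staging table and columns below are illustrative:

```sql
-- Not-null check: fail the pipeline if any key column is NULL.
SELECT COUNT(*) AS null_keys
FROM dbo.StagingOrders
WHERE order_id IS NULL;

-- Duplication check: any key occurring more than once.
SELECT order_id, COUNT(*) AS cnt
FROM dbo.StagingOrders
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Format check: e.g., references expected to start with four digits
-- (T-SQL LIKE supports character ranges).
SELECT COUNT(*) AS bad_format
FROM dbo.StagingOrders
WHERE order_ref NOT LIKE '[0-9][0-9][0-9][0-9]%';
```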

28
Q

What data quality checks should be considered for only critical ETL pipelines with lots of downstreams?

A

weekly row count anomaly detection
seasonal row count anomaly detection
complex custom validation queries

29
Q

What is an ER Diagram?

A

Entity Relationship Diagram.

It is a pictorial representation of the entities (tables) in a database. From a top-down perspective, the ER diagram displays the requirements and assumptions in a database system.

30
Q

What is a relationship?

A

The way in which two or more concepts/entities are connected, or the state of being connected.

31
Q

Explain Foreign Key.

A

The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables. Enforces referential integrity.

A FOREIGN KEY is a field (or collection of fields) in one table, that refers to the PRIMARY KEY in another table.

The table with the foreign key is called the child table, and the table with the primary key is called the referenced or parent table. The FOREIGN KEY constraint prevents invalid data from being inserted into the foreign key column, because it has to be one of the values contained in the parent table.
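Adding such a constraint to an existing child table can be sketched as follows (table and column names are illustrative):

```sql
-- SalesOrder (child) references Customer (parent); inserts into
-- SalesOrder with an unknown customer_id will now be rejected.
ALTER TABLE dbo.SalesOrder
ADD CONSTRAINT FK_SalesOrder_Customer
    FOREIGN KEY (customer_id) REFERENCES dbo.Customer (customer_id);
```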

32
Q

Explain Cascading Referential Integrity.

A

By using cascading referential integrity constraints, you can define the actions that the Database Engine takes when a user tries to delete or update a key to which existing foreign keys point. The following cascading actions can be defined.

CASCADE, SET NULL, SET DEFAULT and NO ACTION
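A sketch of two of these actions on an illustrative child table (assumes a Customer parent table exists):

```sql
CREATE TABLE SalesOrder (
    order_id INT PRIMARY KEY,
    customer_id INT NULL,               -- must be NULL-able for SET NULL
    FOREIGN KEY (customer_id) REFERENCES Customer (customer_id)
        ON DELETE SET NULL              -- deleting a customer orphans
                                        -- orders with a NULL customer_id
        ON UPDATE CASCADE               -- key changes propagate to children
);
```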

33
Q

What is a Sparse Column?

A

A Sparse Column is a column that is optimized for holding NULL values.

34
Q

Which is faster: a table variable or a temporary table?

A

A table variable is faster in most cases since it is held in memory while a temporary table is stored on disk. However, when the table variable’s size exceeds memory size the two table types tend to perform similarly.

35
Q

How big are the data types TINYINT, SMALLINT, INT and BIGINT?

A

1 byte, 2 bytes, 4 bytes and 8 bytes.

36
Q

What are some examples of Control Flow statements in SQL?

A

WHILE, IF...ELSE, CASE, BEGIN...END, BREAK, CONTINUE, GOTO, WAITFOR, RETURN, etc.

37
Q

What is the EXISTS operator used for?

A

It is used to determine whether a query returns one or more rows. If it does, EXISTS returns TRUE; otherwise, EXISTS returns FALSE.
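A typical use, with illustrative tables:

```sql
-- Customers that have at least one order; the subquery stops
-- probing as soon as one matching row is found.
SELECT c.customer_id, c.customer_name
FROM dbo.Customer AS c
WHERE EXISTS (SELECT 1
              FROM dbo.SalesOrder AS o
              WHERE o.customer_id = c.customer_id);
```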

38
Q

What are some of the advantages of XML data type in SQL Server?

A

XML stores unstructured data in a relational database model

XML itself validates the information inside the file with tags and its values using DTD (Document Type Definition) and schema

XML data type stores independent data structure, hence it can be integrated or used in other database sources easily

In a way, the XML data type reduces the back-end application burden as the XML can be easily used with a UI

The XML data type can be used with the input parameter in a function or stored procedure

39
Q

What are some of the disadvantages of XML data type in SQL Server?

A

The XML data type consumes more space when compared with the relational (row/column) format in SQL Server data pages because the XML format is redundant with the XML tag-value set

The query execution time and data manipulation process could take longer compared to a normalized structure

A large XML document will use more server resources (CPU, memory, I/O and others) in SQL Server

XML query structure is complex

40
Q

What are some of the limitations of XML data type in SQL Server?

A

Primary, unique or foreign constraints cannot be applied on an XML-based column

The XML data type cannot be used in remote query execution; it needs to be cast or converted to VARCHAR in a query

Index performance cannot be achieved on the key of XML with a pre-defined XML path

To perform any action (SELECT, INSERT, UPDATE or DELETE) on the XML column of the table, a user needs to follow the XML query standard

41
Q

How is the FOR XML Clause used to Return Query Results as XML in SQL Server?

A

SQL Server lets you retrieve data as XML by supporting the FOR XML clause, which can be included as part of your query. You can use the FOR XML clause in the main (outer) query as well as in subqueries. The clause supports numerous options that let you define the format of the XML data.

When you include the FOR XML clause in your query, you must specify one of the four supported modes: RAW, AUTO, EXPLICIT, or PATH. The options available to each mode vary according to that mode; however, many of the options are shared among the modes.

42
Q

How is the FOR XML clause in RAW mode used in SQL Server?

A

The RAW mode generates a single XML element for each row in the result set returned by the query.

To use the FOR XML clause in RAW mode, you simply append the clause and RAW keyword to your SELECT statement, such as: FOR XML RAW;

With the addition of the FOR XML clause, the statement returns each row of the result set as an XML element.
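A minimal sketch (table and columns are illustrative; the comment indicates the general shape of RAW-mode output):

```sql
SELECT order_id, customer_name
FROM dbo.Orders
FOR XML RAW;
-- Each row becomes one element with columns as attributes, e.g.:
-- <row order_id="1" customer_name="Acme" />
```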

43
Q

How is the FOR XML Clause in AUTO Mode used in SQL Server?

A

The AUTO mode in a FOR XML clause is slightly different from the RAW mode in the way that it generates the XML result set. The AUTO mode generates the XML by using heuristics based on how the SELECT statement is defined.

AUTO mode in a FOR XML clause is used when the required XML contains multi-level nesting. AUTO mode generates the nesting automatically, naming the elements after the tables and columns in the SELECT list.

44
Q

How is the FOR XML clause in EXPLICIT Mode used in SQL Server?

A

The EXPLICIT mode provides very specific control over your XML, but this mode is much more complex to use than the RAW or AUTO modes.

The EXPLICIT mode transforms the rowset that results from the query execution into an XML document. In order for EXPLICIT mode to produce the XML document, the rowset must have a specific format. This requires that you write the SELECT query to produce the rowset, the universal table, with a specific format so the processing logic can then produce the XML you want.

45
Q

How is the FOR XML clause in PATH Mode used in SQL Server?

A

Adding FOR XML PATH to the end of a query allows you to output the results of the query as XML elements, with the element name contained in the PATH argument.

When you specify the PATH mode in the FOR XML clause, column names (or their aliases) are treated as XPath expressions that determine how the data values will be mapped to the XML result set. By default, XML elements are defined based on column names. You can modify the default behavior by using the at (@) symbol to define attributes or the forward slash (/) to define the hierarchy.

Also see the ‘STUFF’ function.
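A common use of PATH mode together with STUFF is building a comma-separated list per group, a pre-STRING_AGG idiom (table and column names are illustrative):

```sql
-- FOR XML PATH('') concatenates the inner rows into one string;
-- STUFF removes the leading comma.
SELECT c.customer_id,
       STUFF((SELECT ',' + o.product
              FROM dbo.Orders AS o
              WHERE o.customer_id = c.customer_id
              FOR XML PATH('')), 1, 1, '') AS products
FROM dbo.Customer AS c;
```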