Design and Development Flashcards
How do you implement one-to-one relationships while designing tables?
One-to-one relationships are usually implemented as columns within a single table; it is rare to implement them as two separate tables. For each instance in the first entity there is one and only one instance in the second entity, and vice versa.
Here are some everyday examples of one-to-one relationships:
One family lives in one house, and the house contains one family.
One person has one passport, and the passport can only be used by one person.
One person has one ID number, and the ID number is unique to one person.
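A minimal T-SQL sketch of the two-table form (table and column names are illustrative): the UNIQUE constraint on the foreign key is what enforces the one-to-one rule.
CREATE TABLE Person (
    person_id INT PRIMARY KEY,
    full_name VARCHAR(100) NOT NULL
);
CREATE TABLE Passport (
    passport_id INT PRIMARY KEY,
    person_id   INT NOT NULL UNIQUE,  -- UNIQUE limits each person to one passport
    FOREIGN KEY (person_id) REFERENCES Person(person_id)
);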
How do you implement one-to-many relationships while designing tables?
For each instance in the first entity there can be one or more in the second entity. For each instance in the second entity there can be one and only one instance in the first entity.
In a one-to-many relationship, one record in a table can be associated with one or more records in another table. For example, each customer can have many sales orders.
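As a sketch (illustrative names), the "many" side carries a foreign key back to the "one" side:
CREATE TABLE Customer (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);
CREATE TABLE SalesOrder (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,  -- many orders may reference the same customer
    order_date  DATE NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);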
How do you implement many-to-many relationships while designing tables?
For each instance in the first entity there can be one or more instances in the second entity, and for each instance in the second entity there can be one or more instances in the first entity.
A many-to-many relationship occurs when multiple records in a table are associated with multiple records in another table. For example, a many-to-many relationship exists between customers and products: customers can purchase various products, and products can be purchased by many customers.
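In SQL this is typically implemented with a junction (bridge) table whose composite primary key pairs the two foreign keys. A sketch reusing the illustrative Customer table above:
CREATE TABLE Product (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL
);
CREATE TABLE CustomerProduct (
    customer_id INT NOT NULL,
    product_id  INT NOT NULL,
    PRIMARY KEY (customer_id, product_id),  -- each customer/product pairing stored once
    FOREIGN KEY (customer_id) REFERENCES Customer(customer_id),
    FOREIGN KEY (product_id) REFERENCES Product(product_id)
);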
Explain Primary Key.
A primary key is a column (or set of columns) in a table whose values uniquely identify the rows in the table. The primary key is chosen from among the table's candidate keys based on its perceived value to the business as an identifier.
Enforces uniqueness of a column in a table.
Default clustered index.
Does not allow Nulls.
Only one primary key per table.
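For example (illustrative table), in SQL Server the primary key constraint creates a clustered index by default:
CREATE TABLE Employee (
    employee_id INT NOT NULL,
    ssn         CHAR(9) NOT NULL,
    CONSTRAINT PK_Employee PRIMARY KEY (employee_id)  -- unique, non-NULL, clustered by default
);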
Explain Unique Key.
Enforces uniqueness of the column in a table.
Default non-clustered index.
Allows one Null value.
Can have more than one unique key per table.
Cannot have repeating values.
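Continuing the illustrative Employee sketch above, an additional identifier can be enforced with a unique constraint, which creates a non-clustered index by default:
ALTER TABLE Employee
ADD CONSTRAINT UQ_Employee_SSN UNIQUE (ssn);  -- a second uniqueness rule alongside the primary key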
Explain Candidate Key.
Candidate Key - a key whose value can uniquely identify a row in a table. Any of the identified candidate keys can be used as the table's primary key. A candidate key can be described as a super key that contains only the minimum number of columns necessary to determine uniqueness.
The primary key is chosen from this list of candidates based on its perceived value to the business as an identifier.
Candidate keys that are not part of the primary key are called alternate keys.
Prime attributes are the attributes of a candidate key, which define its uniqueness (e.g., the SSN in an employee database).
A non-prime attribute is an attribute that is not part of a Candidate key.
Explain Alternate Key.
Alternate Key - If a table has more than one candidate key, then once one of them becomes the primary key, the remaining candidate keys become alternate keys.
Explain Composite Key.
Composite Key - a combination of two or more columns that together uniquely identify a row in a table.
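A sketch tying these together (illustrative names): OrderLine below has two candidate keys; the composite of order_id and line_number is chosen as the primary key, leaving the barcode as an alternate key enforced with a unique constraint.
CREATE TABLE OrderLine (
    order_id     INT NOT NULL,
    line_number  INT NOT NULL,
    line_barcode VARCHAR(30) NOT NULL,
    quantity     INT NOT NULL,
    CONSTRAINT PK_OrderLine PRIMARY KEY (order_id, line_number),  -- composite primary key
    CONSTRAINT UQ_OrderLine_Barcode UNIQUE (line_barcode)         -- alternate (candidate) key
);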
How does database design contribute to the performance of a SQL based application?
Database design plays a major part.
- When building a new system or adding to an existing system it is crucial the design is correct.
- Ensure the correct data is selected and placed in the correct tables.
- Ensure that the right relationships exist between the tables.
- Eliminating data redundancy is an ultimate goal.
- Planning a design is an iterative process and constantly reviewed as an application is developed.
- It is rare, although it is the ideal everyone should aim for, that the initial design and system goals remain unaltered, even slightly. A designer therefore has to be thorough and ensure the design of the database remains efficient.
What can a developer do during the logical and physical design of a database in order to ensure proper performance?
- A developer must investigate the volumes of data (Capacity Planning), what types of information will be stored and how the data will be accessed.
- While upgrading an existing system, analyze the present data: where existing data volumes occur, how the data is accessed, and where current response bottlenecks are occurring. This helps assess existing problem areas in the design.
- A new system requires a thorough investigation of what data will be captured; looking at volumes of data held in other formats will also aid design.
- Knowing your data is as important as knowing the users of your data.
- Constantly revisit the design, frequently review relationships, volumes of data and indexes to ensure the design remains optimal.
- Use SQL Server Profiler for analysis.
What are the main steps in Database Modeling?
1. Logical - Planning, Analysis and Design
2. Physical - Design, Implementation and Maintenance
Describe the difference between a Data Warehouse and a Data Lake.
Data lakes and data warehouses are both widely used for storing big data, but they are not interchangeable terms. A data lake is a vast pool of raw data, the purpose for which is not yet defined. A data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose.
- Data Structure: Data Lake - raw; Data Warehouse - processed.
- Purpose of Data: Data Lake - not yet determined; Data Warehouse - currently in use.
- Users: Data Lake - data scientists; Data Warehouse - business professionals.
- Accessibility: Data Lake - highly accessible and quick to update; Data Warehouse - more complicated and costly to make changes to.
Describe the key benefits of Data Modeling.
Higher quality data.
The visual depiction of requirements and business rules allows developers to foresee what could become large-scale data corruption before it happens. Plus, data models allow developers to define rules that monitor data quality, which reduces the chance of errors.
Increased internal communication about data and data processes.
Creating data models is a forcing function for the business to define how data is generated and moved throughout applications.
Reduced development and maintenance costs.
Because data modeling surfaces errors and inconsistencies early on in the process, they are far easier and cheaper to correct.
Improved performance.
An organized database is a more efficiently operated one; a well-modeled schema avoids endless searching and returns results faster.
Describe the three main types of Data Models.
Conceptual data model
A conceptual data model is a visual representation of database concepts and the relationships between them. Typically, a conceptual data model won’t include details of the database itself but instead focuses on establishing entities, characteristics of an entity, and relationships between them. These data models are created for a business audience, especially key business stakeholders.
Logical data model
A logical data model is often the next step after conceptual data modeling. This data model further defines the structure of the data entities and sets the relationships between them. The attributes of each data entity are clearly defined. Usually, a logical data model is used for a specific project since the project would have certain requirements for the structure. The model can still be integrated into other logical models to provide a better understanding of the scope. For this level of data modeling, the normalization process is applied to 3NF, but no secondary or primary key is needed.
Physical data model
A physical data model is used for database-specific modeling. Just like with the logical model, a physical model is used for a specific project but can be integrated with other physical models for a comprehensive view. The model goes into more detail with column keys, restraints, and primary and foreign keys. The columns will include exact types and attributes in this model, and the data should be normalized as well. A physical model designs the internal schema.
What are Window functions ?
Window functions operate on a set of rows and return a single aggregated value for each row. The main advantage of using window functions over regular aggregate functions is that window functions do not cause rows to become grouped into a single output row; the rows retain their separate identities, and an aggregated value is added to each row.
We define the window (the set of rows on which the function operates) using an OVER() clause.
Types of Window functions:
Aggregate Window Functions - SUM(), MAX(), MIN(), AVG(), COUNT()
Ranking Window Functions - RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()
Value Window Functions - LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
e.g.:
SELECT order_id, order_date, customer_name, city, order_amount,
       ROW_NUMBER() OVER (ORDER BY order_id) AS [row_number]
FROM [dbo].[Orders];
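Similarly (a sketch assuming the same illustrative [dbo].[Orders] columns), an aggregate window function with PARTITION BY adds a per-group value to every row without collapsing the rows:
SELECT order_id, customer_name, order_amount,
       SUM(order_amount) OVER (PARTITION BY customer_name) AS customer_total  -- per-customer total repeated on each of that customer's rows
FROM [dbo].[Orders];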
What are the primary procedural languages for MS SQL Server, Oracle, MySQL and PostGreSQL?
T-SQL - MS SQL Server
PL/SQL - Oracle
SQL/PSM - MySQL (MySQL's stored-procedure language implements a subset of the SQL/PSM standard)
PL/pgSQL - PostgreSQL (note: psql is the PostgreSQL command-line client, not a procedural language)
PostgreSQL also ships additional procedural languages:
PL/Tcl - Tcl Procedural Language
PL/Perl - Perl Procedural Language
PL/Python - Python Procedural Language
What is a Data Lake?
A data lake is a scalable data storage repository that can quickly ingest large amounts of raw data and make it available on-demand. Users accessing the data lake can explore, transform, and analyze subsets of data as they need it to meet their specific requirements.
Data lakes offer several advantages. One of the biggest is their ability to support advanced analytics and other forms of data science.
Data lakes can also help break down silos by providing a centralized repository for storing and working with data from different systems.
Data lakes can incorporate any type of data from any source. They’re also highly scalable systems, especially when implemented in the cloud. In addition, data cleansing and transformation are typically performed when the data is needed, rather than when the data is first ingested into the data lake.
One of the most important characteristics of a data lake is its ability to store all types of data from any source:
- Structured data is clearly defined and formatted, such as the data found in relational databases.
- Unstructured data adheres to no specific format, such as social media content or data generated by IoT devices.
- Semistructured data falls somewhere between structured and unstructured data, such as CSV and JSON files.
Early data lakes were built on Hadoop technologies, particularly MapReduce and the Hadoop distributed file system (HDFS). Unfortunately, the first attempts at data lakes often turned into dumping grounds of undocumented and disorganized data, making it challenging to work with the data in any meaningful way. Since then, data lake technologies and methodologies have steadily improved, and they’re now better suited to handle the influx of growing data volumes.
An effective data lake can ingest any type of data in its native format from any source, whether databases, applications, file servers, data streams, or systems such as customer relationship management (CRM) or enterprise resource planning (ERP). A data lake also provides the mechanisms necessary to securely access data and carry out data-related operations without needing to move data to a separate storage system (although this is still an option if required).
What is the difference between a Temp Table, a CTE and a Table Variable?
Temp Tables are physically created in the tempdb database. They act like normal tables and, like normal tables, can have constraints and indexes.
A CTE (common table expression) is a named temporary result set used to simplify complex sub-queries. It exists only for the scope of a single statement and is not physically created in tempdb; you cannot create an index on a CTE.
A Table Variable acts like a variable and exists for a particular batch of query execution. It gets dropped once the batch ends. Like a temp table, it is created in the tempdb database, not in memory.
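A minimal side-by-side sketch (assuming the same illustrative [dbo].[Orders] table as above):
-- Temp table: physically created in tempdb; supports constraints and indexes
CREATE TABLE #RecentOrders (order_id INT PRIMARY KEY, order_amount MONEY);
INSERT INTO #RecentOrders SELECT order_id, order_amount FROM [dbo].[Orders];

-- CTE: a named result set visible only to the single statement that follows it
WITH BigOrders AS (
    SELECT order_id, order_amount FROM [dbo].[Orders] WHERE order_amount > 1000
)
SELECT * FROM BigOrders;

-- Table variable: scoped to the batch and dropped automatically when it ends
DECLARE @Totals TABLE (customer_name VARCHAR(100), total MONEY);
INSERT INTO @Totals SELECT customer_name, SUM(order_amount) FROM [dbo].[Orders] GROUP BY customer_name;
SELECT * FROM @Totals;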