TSQL - MS SQL Server Flashcards

1
Q

What are the different authentication modes in SQL Server?

A

Windows authentication is the default, and is often referred to as integrated security because this SQL Server security model is tightly integrated with Windows. …
Mixed mode supports authentication both by Windows and by SQL Server.

(Judge)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are Default Constraints?

A

A value used by a column if no value is supplied to that column while inserting data.

A Default value cannot be assigned for identity and timestamp values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are User Defined Data Types?

A

A custom data types created to fit the user’s needs. Allows you to extend the base SQL Server Data Types by providing a descriptive name and format to the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are Cursors? What is the syntax for a Cursor in SQL Server?

A

A programming structure which allows row-by-row processing of the result sets.

Explanation of Cursor Syntax in SQL Server
Based on the example above, cursors include these components:

DECLARE statements - Declare variables used in the code block
SET\SELECT statements - Initialize the variables to a specific value
DECLARE CURSOR statement - Populate the cursor with values that will be evaluated
NOTE - There are an equal number of variables in the DECLARE CURSOR FOR statement as there are in the
SELECT statement. This could be 1 or many variables and associated columns.
OPEN statement - Open the cursor to begin data processing
FETCH NEXT statements - Assign the specific values from the cursor to the variables to match the DECLARE CURSOR FOR and SELECT statement
NOTE - This logic is used for the initial population before the WHILE statement and then again during each loop in
the process as a portion of the WHILE statement
WHILE statement - Condition to begin and continue data processing
BEGIN…END statement - Start and end of the code block
NOTE - Based on the data processing, multiple BEGIN…END statements can be used
Data processing - In this example, this logic is to backup a database to a specific path and file name, but this
could be just about any DML or administrative logic
CLOSE statement - Releases the current data and associated locks, but permits the cursor to be re-opened
DEALLOCATE statement - Destroys the cursor

Cursor Example to backup databases:

– 1 - Declare Variables
DECLARE @name VARCHAR(50) – database name
DECLARE @path VARCHAR(256) – path for backup files
DECLARE @fileName VARCHAR(256) – filename for backup
DECLARE @fileDate VARCHAR(20) – used for file name

– Initialize Variables

SET @path = ‘C:\Backup'

SELECT @fileDate = CONVERT(VARCHAR(20),GETDATE(),112)

– 2 - Declare Cursor
DECLARE db_cursor CURSOR FOR

SELECT name
FROM MASTER.dbo.sysdatabases
WHERE name NOT IN (‘master’,’model’,’msdb’,’tempdb’)

– Open the Cursor
OPEN db_cursor

– 3 - Fetch the next record from the cursor
FETCH NEXT FROM db_cursor INTO @name

– Set the status for the cursor
WHILE @@FETCH_STATUS = 0

BEGIN
– 4 - Begin the custom business logic
SET @fileName = @path + @name + ‘_’ + @fileDate + ‘.BAK’
BACKUP DATABASE @name TO DISK = @fileName

– 5 - Fetch the next record from the cursor
FETCH NEXT FROM db_cursor INTO @name
END

– 6 - Close the cursor
CLOSE db_cursor

– 7 - Deallocate the cursor
DEALLOCATE db_cursor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the Logical Order of Operations in SQL? (MS SQLServer)

A
  1. FROM, JOIN
  2. WHERE
  3. GROUP BY
  4. aggregate functions
  5. HAVING
  6. window functions
  7. SELECT
  8. DISTINCT
  9. UNION/INTERSECT/EXCEPT
  10. ORDER BY
  11. OFFSET
  12. LIMIT/FETCH/TOP

You can use window functions in SELECT and ORDER BY. However, you can’t put window functions anywhere in the FROM, WHERE, GROUP BY, or HAVING clauses.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is s JOIN? What are the types of Joins?

A
  • Joins are used in queries to explain the relationship between two or more tables. Joins also allow you to select data from a table depending upon data from another table.
  • INNER JOIN -returns records that have matching values in both tables.
  • LEFT JOIN or LEFT OUTER JOIN -returns all record from the LEFT table, and the matched record from the RIGHT table.
  • RIGHT JOIN or RIGHT OUTER JOIN -returns all record from the RIGHT table, and the matched record from the LEFT table.
  • FULL JOIN or FULL OUTER JOIN -retuins all records when there is a match in either the left or the right table.
  • CROSS JOINs (the Cartesian Product)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What system function can be used to get the current User’s ID? (MS SQL Server)

A

USER_ID().

Other System Functions include 
USER_NAME()
SYSTEM_USER()
SESSION_USER,
CURRENT_USER,
USER,
SUSER_SUD(),
HOST_NAME()
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a Trigger?

A

Triggers are special kinds of Stored Procedures that get executed automatically when INSERT, UPDATE or DELETE operation takes place on a table. Multiple triggers may be created for each action.

Triggers are used to implement business rules, auditing.

Triggers can also be used to extend referential integrity checks, but whenever possible, use CONSTRAINTS for this purpose instead of triggers as constraints are much faster.

The VIRTUAL TABLES: INSERTED and DELETED form the basis of Trigger architecture.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How do you invoke a trigger on demand?

A

Triggers can not be invoked on demand. They get triggered only when an associated INSERT, UPDATE, or

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a SELF JOIN?

A

A SELF JOIN is the same as any other Join except that the two instances of the same table are joined in the query.

eg.

SELECT t1.employeename as employee, COALESCE (t2.employeename, ‘No Manager’) as manager
FROM employee t1
LEFT OUTER JOIN employee t2 ON t1.mgrid = t2.emplid

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What SQL query will return the name of the first day of the month?

A

SELECT DATENAME(dw, DATEADD(dd, - DATEPART(dd, GETDATE() ) +1, GETDATE() ))

DATENAME
DATEADD
DATEPART
GETDATE()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Write a SQL query to Select the 6th highest value (salary, etc…). This is a favorite interview question.

A

SELECT MIN(salary)
FROM EmployeeTable
WHERE salary IN (SELECT TOP 6 salary from EmployeeTable ORDER BY
salary DESC))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you remove row duplicates by using the Row_Number() Over() Partitioning clause?

A
with cte AS
(
SELECT Emp_Name, Company, Join_Date
,ROW_NUMBER() OVER (Partition By Emp_Name, Company, Join_Date Order By Emp_Name, Company, Join_Date ) RowNumber 
FROM Emp_Details
)
Select * FROM cte Where RowNumber > 1
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Explain the difference between window functions RANK and DENSE RANK.

A

RANK skips the number of positions after records with the same rank number. The ranking RANK_DENSE returns position numbers from 1 to 6 because it doesn’t skip records with the same rank number:

Unlike DENSE_RANK, RANK skips positions after equal rankings. The number of positions skipped depends on how many rows had an identical ranking. For example, Mary and Lisa sold the same number of products and are both ranked as #2. With RANK, the next position is #4; with DENSE_RANK, the next position is #3.

Both RANK and RANK_DENSE work on partitions of data:

SELECT RANK() OVER(ORDER BY sold products DESC) AS r,
DENSE_RANK() OVER(ORDER BY sold products DESC) AS dr,
first_name,
last_name,
month,
sold products
FROM sales_assistant;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Write a query to enforce the use of a particular index.

A
SELECT lastname
FROM authors (index=aunmind)

– use (index=index_name) after the table name

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is ORDER BY and how is it different from a CLUSTERED INDEX?

A

The ORDER BY sorts query results by one or more columns up to 8,060 bytes. The Order By happens when we retrieve data from the database. Clustered indexes physically sort data while inserting/updating the table.

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is the difference between a UNION and a JOIN?

A

A JOIN selects columns from two or more tables. A UNION selects rows.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are different ways to delete duplicate rows from a SQL Table?

A
  1. SQL delete duplicate Rows using Group By and having clause
    In this method, we use the SQL GROUP BY clause to identify the duplicate rows. The Group By clause groups data as per the defined columns and we can use the COUNT function to check the occurrence of a row. For example:
SELECT [FirstName],
[LastName],
[Country],
COUNT(*) AS CNT
FROM [SampleDB].[dbo].[Employee]
GROUP BY [FirstName],
[LastName],
[Country]
HAVING COUNT(*) > 1;
  1. Delete duplicate Rows using Common Table Expressions (CTE) with a SQL ROW_NUMBER function.

WITH CTE([firstname], [lastname], [country], duplicatecount)
AS (SELECT [firstname], [lastname], [country],
ROW_NUMBER() OVER(PARTITION BY [firstname], [lastname], country]
ORDER BY id) AS DuplicateCount
FROM [SampleDB].[dbo].[employee] )
SELECT * FROM CTE;

  1. Delete duplicate Rows using the RANK function.
SELECT E.ID, E.firstname, E.lastname, E.country, T.rank
FROM [SampleDB].[dbo].[Employee] E
INNER JOIN
(
SELECT *,
RANK() OVER(PARTITION BY firstname, lastname, country
ORDER BY id) rank
FROM [SampleDB].[dbo].[Employee]
) T ON E.ID = t.ID;
  1. Use Sort Operator in an SSIS package for removing duplicating rows.
  2. SQL UNION function. To remove the duplicate rows, use the same table and UNION it with an empty result set returned by the same table as shown below

SELECT col1,col2
FROM DuplicateRcordTable

UNION

SELECT col1,col2
FROM DuplicateRcordTable
WHERE 1=0

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Provide a sample script to apply Table Permissions.

A

GRANT SELECT ON ‘project’ TO ‘Jim’

REVOKE SELECT ON ‘project’ TO ‘Jim’

DENY CREATE TABLE TO ‘David’

GRANT VIEW DEFINITION ON SCHEMA :: dbo TO ‘Alex’

GRANT SELECT ON ‘dbo’.’Customer’ TO ‘Alex’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Use the ANY predicate in a query.

A

SELECT * FROM emp_table WHERE date > ANY (SELECT date FROM works_on_table)

21
Q

Use the ALL predicate in a query.

A

SELECT * FROM emp_table WHERE date > ALL (SELECT date FROM works_on_table)

22
Q

What SQL Join is used to retrieve matching records from both tables?

A

INNER JOIN

23
Q

Write a TSQL query that uses the BETWEEN operator.

A

SELECT * FROM Employee_Details WHERE Age BETWEEN 18 AND 22;

SELECT * FROM Employee_Details WHERE Age NOT BETWEEN 30 AND 40;

24
Q

Write a TSQL query that uses the LIKE operator.

A

SELECT * FROM Employee_Details WHERE Name LIKE ‘%R G%’;

SELECT * FROM Employee_Details WHERE Name NOT LIKE ‘A%’;

25
Q

Explain the difference between the LEN() and DATALENGTH() functions in Transact-SQL.

A

LEN() Returns the number of characters of the specified string expression, excluding trailing blanks. DATALENGTH() Returns the number of bytes used to represent any expression.

26
Q

Explain the difference between the COUNT() and COUNT_BIG() functions in Transact-SQL.
Give some examples of the use of the COUNT() function.

A

This function returns the number of items found in a group.

These functions differ only in the data types of their return values. COUNT always returns an int data type value. COUNT_BIG always returns a bigint data type value.

COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.

COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.

COUNT(DISTINCT expression) evaluates expression for each row in a group, and returns the number of unique, nonnull values.

27
Q

How is the CHAR() function used in Transact-SQL?

A

The CHAR() function converts an INT ASCII code to a character value.

In other words, you pass in an integer, and the function interprets it as the code value for a string character and returns the corresponding string character.

28
Q

Explain TRUNCATE TABLE in SQL Server.

A

The TRUNCATE TABLE command deletes the data inside a table, but not the table itself.

Truncating a table is removing all the records in an entire table or a table partition. TRUNCATE table is functionally similar to DELETE table with no WHERE clause. However, TRUNCATE table is much faster than DELETE with respect to the time and the resource consumptions which we will look at in this article. TRUNCATE statement removes the data by de-allocating the data pages in the table data. This means that TRUNCATE is similar to drop and re-create the table. Also, it records only the page de-allocations in the transaction log, not the row-wise as in DELETE statement.

Truncate Table does not write to the Transaction Log.

Truncate Table does remove data pages.

Use DELETE FROM to delete both the data and the table structure.

29
Q

What is the difference between DDL and DML commands in SQL Server?

A

DDL is Data Definition Language which is used to define data structures. For example: create table, alter table are instructions in SQL. DML: DML is Data Manipulation Language which is used to manipulate data itself

(Judge)

30
Q

What are the different types of Joins in SQL Server?

A

There are four main types of JOINs in SQL: INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF JOIN. However, remember that OUTER JOINS have two subtypes: LEFT OUTER JOIN and RIGHT OUTER JOIN.

(Judge)

31
Q

What are indexes in SQL Server?

A

Indexes are special data structures associated with tables or views that help speed up the query. SQL Server provides two types of indexes: clustered index and non-clustered index. In this section, you will learn everything you need to know about indexes to come up with a good index strategy and optimize your queries.

(Judge)

32
Q

What are different types of indexes in SQL Server?

A
There are various types of indexes in SQL server:
Clustered Index.
Non-Clustered Index.
Column Store Index.
Filtered Index.
Hash Index.
Unique Index.
33
Q

What is a Check constraint in SQL Server?

A

The CHECK constraint is used to limit the value range that can be placed in a column.

If you define a CHECK constraint on a column it will allow only certain values for this column.

If you define a CHECK constraint on a table it can limit the values in certain columns based on values in other columns in the row.

(Judge)

34
Q

What is SQL Injection, and what are ways to prevent it?

A

An SQL injection is a technique that attackers apply to insert SQL query into input fields to then be processed by the underlying SQL database. These weaknesses are then able to be abused when entry forms allow user-generated SQL statements to query the database directly

(Judge)

35
Q

What is parameter sniffing in SQL Server?

A

SQL Server uses a process called parameter sniffing when it executes stored procedures that have – you guessed it – parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create an execution plan. That value is then stored with the execution plan in the plan cache. On subsequent executions, that same value – and same plan – is used.

This is a normal, expected behavior in SQL Server. Because compiling queries is expensive, you want plans stored in the cache. You want SQL Server to re-use them as much as possible.

But what happens when the values in a table you’re querying aren’t evenly distributed? What if one value would return 10 rows and another value would return 10,000 rows, or 10 million rows? I call this the elephant and the mouse problem. You would handle one animal differently than the other; SQL Server might create different plans for the queries. But it doesn’t, because you’re using parameters. elephant

What will happen is that the first time the procedure is run and the plan is compiled, whatever value is passed in is stored with the plan. Every time it’s executed, until it’s recompiled, the same value and plan will be used – regardless of whether it is the fastest or best plan for that value. If this is happening to you, and causing performance problems, there are ways to deal with it. ..Brent Ozar.

(Judge)

36
Q

In SQL Server, explain the difference between an index seek vs an index scan.

A

Index Scan:
Since a scan touches every row in the table, whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate.

Index Seek:
Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table.

Index Scan is nothing but scanning on the data pages from the first page to the last page. If there is an index on a table, and if the query is touching a larger amount of data, which means the query is retrieving more than 50 percent or 90 percent of the data, and then the optimizer would just scan all the data pages to retrieve the data rows. If there is no index, then you might see a Table Scan (Index Scan) in the execution plan.

Index seeks are generally preferred for the highly selective queries. What that means is that the query is just requesting a fewer number of rows or just retrieving the other 10 (some documents says 15 percent) of the rows of the table.

In general query optimizer tries to use an Index Seek which means that the optimizer has found a useful index to retrieve recordset. But if it is not able to do so either because there is no index or no useful indexes on the table, then SQL Server has to scan all the records that satisfy the query condition.

37
Q

Ranking the Data | RANK()

Management likes to see rankings for absolutely anything: products sold, salaries, employees per department, money earned per any imaginable segment – they’ll always ask to see it ranked. Show an example of how to rank things in SQL, Use the sales table. It has the following columns:

product - The name of the product.
product_price - The price of the product.
items_sold - The number of items sold.

The idea is to calculate the revenue for every product and rank it using the RANK() function.

A

SELECT product,
product_price,
items_sold,
product_price * items_sold AS revenue,
RANK() OVER (ORDER BY product_price * items_sold DESC) AS revenue_rank
FROM sales;

38
Q

Calculating Delta Values | LAG()

The table consists of two rows:

month - The month of the year.
revenue - The revenue for that month.

Your task is to calculate the difference between each month’s revenue and the previous month (i.e. the monthly revenue delta).

A

SELECT month,
revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month) AS monthly_delta
FROM revenue;

39
Q

Calculating Running Totals (also known as cumulative sums)

Use a table named budget. It consists of these columns:

month - The month of the cash flow.
client - The client name.
cash_flow - The budgeted cash flow.

There are three clients. The budget contains monthly projections of the yearly cash flow that your company will collect from them. You need to calculate the cumulative cash flow for each client.

A

SELECT month,
client,
cash_flow,
SUM (cash_flow) OVER (PARTITION BY client ORDER BY month) AS running_total

FROM budget;

40
Q

Creating a Report Based on Multiple Conditions | The CASE Statement

The CASE statement goes through conditions and returns a value when the first condition is met (like an if-then-else statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause.

If there is no ELSE part and no conditions are true, it returns NULL.

A

SELECT OrderID, Quantity,
CASE
WHEN Quantity > 30 THEN ‘The quantity is greater than 30’
WHEN Quantity = 30 THEN ‘The quantity is 30’
ELSE ‘The quantity is under 30’
END AS QuantityText
FROM OrderDetails;

also, CASE can be used in the ORDER clause:
SELECT CustomerName, City, Country
FROM Customers
ORDER BY
(CASE
    WHEN City IS NULL THEN Country
    ELSE City
END);
41
Q

GROUP BY ROLLUP

Adding Subtotals to a Report

You have the table warehouse with the following columns:

warehouse - The name of the warehouse.
brand - The product’s brand.
product - The product’s name.
quantity - The quantity of this product in the warehouse.

There are two different brands with five products between them. And there are two warehouses.

Calculate the total product quantity for both brands in both warehouses. You also need the grand total of all the products in both warehouses. And finally, you need to do everything in one table with one query.

A

SELECT warehouse,
brand,
SUM (quantity) AS sum_product

FROM warehouse
GROUP BY ROLLUP (warehouse, brand);

Select the columns warehouse and brand from the table. Also sum the column quantity, which will be shown in the new table sum_product. Then ROLLUP comes in! It’s used to get totals for multiple data grouping levels. The GROUP BY ROLLUP (warehouse, brand) part will group the data by the warehouse and brand columns. After that, it will sum the data according to each grouping:

warehouse	brand	sum_product
Amsterdam	Brando	1105
Amsterdam	Ostap	62934
Amsterdam	NULL	64039
Berlin	        Brando	67356
Berlin	        Ostap	13451
Berlin	        NULL	80807
NULL	        NULL	144846

The table contains totals for the Brando and Ostap brands in the Amsterdam and Berlin warehouses and a grand total. The subtotal for both products in the Amsterdam warehouse is shown in the first row with the NULL brand value. It amounts to 64 039, the sum of the two previous rows.

Next, you can see the totals for both brands in the Berlin warehouse. After that, there’s another line with a NULL brand value; this is actually the Berlin subtotal amounting to 80 807. The last row shows the grand total of all products in all warehouses, which is 144 846.

42
Q

GROUP BY CUBE

GROUP BY CUBE is an extension of the GROUP BY clause similar to GROUP BY ROLLUP. In addition to producing all the rows of a GROUP BY ROLLUP, GROUP BY CUBE adds all the “cross-tabulations” rows. Sub-total rows are rows that further aggregate whose values are derived by computing the same aggregate functions that were used to produce the grouped rows.

A CUBE grouping is equivalent to a series of grouping sets and is essentially a shorter specification. The N elements of a CUBE specification correspond to 2^N GROUPING SETS.

Run a cube query that shows profit by city, state, and total across all states. The example below shows a query that has three “levels”:

Each city.
Each state.
All revenue combined.

A

This example uses ORDER BY state, city NULLS LAST to ensure that each state’s rollup comes immediately after all of the cities in that state, and that the final rollup appears at the end of the output.

select state, city, sum((s.retail_price - p.wholesale_price) * s.quantity) as profit 
 from products as p, sales as s
 where s.product_id = p.product_id
 group by cube (state, city)
 order by state, city nulls last
 ;
\+-------+---------+--------+
| STATE | CITY    | PROFIT |
|-------+---------+--------|
| CA      | SF          |     13 |
| CA      | SJ          |     26 |
| CA      | NULL     |     39 |
| FL       | Miami     |     48 |
| FL       | Orlando |     96 |
| FL       | NULL     |    144 |
| PR       | SJ          |    192 |
| PR       | NULL     |    192 |
| NULL  | Miami     |     48 |
| NULL  | Orlando |     96 |
| NULL  | SF          |     13 |
| NULL  | SJ          |    218 |
| NULL  | NULL     |    375 |
\+-------+---------+--------+
43
Q

What are GROUPING SETS in SQL Server?

A

GROUPING SET means you are asking SQL to group the result several times. You can use the GROUPING SETS syntax to specify precisely which aggregations to compute. Here’s an example.

SELECT t.[Group] AS region, t.name AS territory, sum(TotalDue) AS revenue,
datepart(yyyy, OrderDate) AS [year], datepart(mm, OrderDate) AS [month]
FROM Sales.SalesOrderHeader s
INNER JOIN Sales.SalesTerritory T ON s.TerritoryID = T.TerritoryID
GROUP BY t.[GROUP], GROUPING SETS(ROLLUP(t.name),
ROLLUP(datepart(yyyy, OrderDate), datepart(mm, OrderDate)))

Here, you are asking for the breakdown by territory group for every month of every year with month and year totals, followed by a summary total by territory name, but without a grand total. Unlike the ROLLUP, you get the same result whatever the order of the columns within each GROUPING SET and the order of the GROUPING SETS.

GROUPING SETs can give you precisely what CUBE and ROLLUP gives you and a lot more besides. As you can see with this last example, you can use standard ‘table d’hôte’ CUBE and ROLLUP mixed together with directly-expressed ‘à la carte ‘GROUPING SETs.

44
Q

What is the difference between ROLLUP and CUBE operator?

A

There is only one major difference between the functionality of the ROLLUP operator and the CUBE operator. ROLLUP operator generates aggregated results for the selected columns in a hierarchical way. On the other hand, CUBE generates a aggregated result that contains all the possible combinations for the selected columns.

To understand this, look at the result set for the ROLLUP operator where the sum of the salaries of the employees were grouped by department and gender:

Row Number Department Gender Salary_Sum
1 Finance Female 11800
2 Finance Male 5000
3 Finance All Genders 16800
4 HR Female 6000
5 HR Male 14200
6 HR All Genders 20200
7 IT Female 21200
8 IT All Genders 21200
9 Marketing Female 12200
10 Marketing Male 6500
11 Marketing All Genders 18700
12 Sales Male 18700
13 Sales All Genders 18700
14 All Departments All Genders 95600

Here data is aggregated in hierarchical manner. In rows 1, 2, 4, 5, 7, 9, 10 and 12, salaries are grouped by department and gender. In rows 3, 6, 8, 11 and 13, salaries are grouped by Department only.

Finally, in row 14 we have the grand total of the salaries of all of the employees of all genders from all departments. Here we have three combinations that are hierarchical in nature. They are as follows:

Department and Gender
Department
Grand Total
We do not have salary grouped by Gender only. This is because gender is lowest in hierarchy.

On the other hand, if you look at the aggregated result of the CUBE operator where the sum of the salaries of the employees were grouped by department and gender, we had all four possible combinations:

1- Department and Gender
2- Department only
3- Gender Only
4- Grand Total.
o

Note: It is important to mention here that the result of both the ROLLUP and the CUBE operators will be similar if your data is grouped by only one column.

Which One Should I Use?
ROLLUP and CUBE are performance tools. You should use ROLLUP if you want your data hierarchically and CUBE if you want all possible combinations.

For example, if you want to retrieve the total population of a country, state and city. ROLLUP would sum the population at three levels. First it would return the sum of population at Country-State-City Level. Then it would sum the population at Country-State level and finally it would sum the population at Country level. It would also provide a grand total level.

CUBE groups data in all possible combinations of columns so the population would be summed up in following levels:

Country-State-City
State-City
City
Country-State
State
Country-City
Country
All
It all depends what you need as to which you would choose. A simple rule of thumb is that if you have hierarchical data (for example, country->state->city or Department->Manager-Salesman, etc.), you usually want hierarchical results, and you use ROLLUP to group the data.

If you have non-hierarchical data (for example, City-Gender-Nationality), then you don’t want hierarchical results and so you use CUBE as it will provide all possible combinations.

45
Q

What are SQL window functions?

A

SQL window functions allow you to perform operations that are often required for creating reports, e.g. ranking data, calculating running totals and moving averages, finding the difference between rows, etc. Not only that, but you can also divide data into windows, which enables you to perform operations on data subsets rather than the data as a whole.

SQL Server Window Functions calculate an aggregate value based on a group of rows and return multiple rows for each group.

Ranking Functions
     row_number()
     rank()
     dense_rank()
Distribution Functions
     percent_rank()
     cume_dist()
Analytic Functions
     lead()
     lag()
     ntile()
     first_value()
     last_value()
    nth_value()
Aggregate Functions
    avg()
    count()
    max()
    min()
    sum()
46
Q

What are Common Table Expressions (CTEs)?

A

If you ever wanted query a query, that’s when CTEs come into play — CTEs essentially create a temporary table.

Using common table expressions (CTEs) is a great way to modularize and break down your code, the same way that you would break down an essay into several paragraphs.

with toronto_ppl as (
   SELECT DISTINCT name
   FROM population
   WHERE country = "Canada"
         AND city = "Toronto"
), avg_female_salary as (
   SELECT AVG(salary) as avgSalary
   FROM salaries
   WHERE gender = "Female")

SELECT name, salary
FROM People
WHERE name in (SELECT DISTINCT FROM toronto_ppl)
AND salary >= (SELECT avgSalary FROM avg_female_salary)

Now it’s clear that the WHERE clause is filtering for names in Toronto. If you noticed, CTEs are useful because you can break down your code into smaller chunks, but they’re also useful because it allows you to assign a variable name to each CTE (i.e. toronto_ppl and avg_female_salary)

47
Q

What are Recursive CTEs?

A

A Recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful which it comes to querying hierarchical data like organization charts, file systems, a graph of links between webpages, etc…

There are 3 parts to a recursive CTE:

The anchor member: An initial query that returns the base result of the CTE
The recursive member: A recursive query that references the CTE. this is UNION ALL’ed with the anchor member
A termination condition that stops the recursive member
Here’s an example of a recursive CTE that gets the manager id for each staff id:

with org_structure as (
SELECT id , manager_id
FROM staff_members
WHERE manager_id IS NULL

UNION ALL

SELECT sm.id, sm.manager_id
FROM staff_members sm
INNER JOIN org_structure os ON os.id = sm.manager_id

48
Q

What are Temporary Functions in SQL Server?

A

Temporary functions are important for several reasons:

It allows you to break down chunks of code into smaller chunks of code
It’s useful for writing cleaner code
It prevents repetition and allows you to reuses code similar to using functions in Python.

Consider the following example:
SELECT name
, CASE WHEN tenure < 1 THEN “analyst”
WHEN tenure BETWEEN 1 and 3 THEN “associate”
WHEN tenure BETWEEN 3 and 5 THEN “senior”
WHEN tenure > 5 THEN “vp”
ELSE “n/a”
END AS seniority
FROM employees

Instead, you can leverage a temporary function to capture the CASE clause.

CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
CASE WHEN tenure < 1 THEN “analyst”
WHEN tenure BETWEEN 1 and 3 THEN “associate”
WHEN tenure BETWEEN 3 and 5 THEN “senior”
WHEN tenure > 5 THEN “vp”
ELSE “n/a”
END
);

SELECT name, get_seniority(tenure) as seniority
FROM employees

With a temporary function, the query itself is much simpler, more readable, and you can reuse the seniority function!