TSQL - MS SQL Server Flashcards

Question

Explain the difference between the LEN() and DATALENGTH() functions in Transact-SQL.

Answer 1

LEN() returns the number of characters in the specified string expression, excluding trailing blanks. DATALENGTH() returns the number of bytes used to represent any expression.
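A minimal sketch of the difference, assuming an NVARCHAR variable (the variable name @s is only illustrative). NVARCHAR stores two bytes per character here, so the byte count is double the stored character count:

```sql
-- Three letters followed by two trailing spaces, stored as NVARCHAR
DECLARE @s NVARCHAR(20) = N'abc  ';

SELECT LEN(@s)        AS char_count,  -- 3: trailing spaces are not counted
       DATALENGTH(@s) AS byte_count;  -- 10: 5 stored characters * 2 bytes each
```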

Answer 2

COUNT returns the number of items found in a group. COUNT and COUNT_BIG differ only in the data types of their return values: COUNT always returns an int, and COUNT_BIG always returns a bigint. COUNT(*) returns the number of items in a group, including NULL values and duplicates. COUNT(ALL expression) evaluates expression for each row in a group and returns the number of non-null values. COUNT(DISTINCT expression) evaluates expression for each row in a group and returns the number of unique, non-null values.
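The variants can be compared side by side. This is a small sketch using a hypothetical table variable @t with one duplicate and one NULL:

```sql
-- Hypothetical sample data: a duplicate value and a NULL
DECLARE @t TABLE (val INT);
INSERT INTO @t (val) VALUES (1), (1), (2), (NULL);

SELECT COUNT(*)            AS all_rows,           -- 4: NULLs and duplicates included
       COUNT(ALL val)      AS non_null_rows,      -- 3: the NULL is skipped
       COUNT(DISTINCT val) AS distinct_non_null,  -- 2: values 1 and 2
       COUNT_BIG(*)        AS all_rows_bigint     -- 4, returned as bigint
FROM @t;
```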

Answer 3

The CHAR() function converts an INT ASCII code to a character value. In other words, you pass in an integer, and the function interprets it as the code value for a string character and returns the corresponding string character.
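For illustration, a short sketch of CHAR() with two well-known ASCII codes, plus ASCII() for the reverse direction:

```sql
SELECT CHAR(65)   AS upper_a,  -- 'A'
       CHAR(97)   AS lower_a,  -- 'a'
       ASCII('A') AS code_a;   -- 65, the inverse of CHAR(65)
```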

Answer 4

The TRUNCATE TABLE command deletes the data inside a table, but not the table itself. Truncating removes all the records in an entire table or a table partition. TRUNCATE TABLE is functionally similar to DELETE with no WHERE clause, but it is much faster and consumes fewer resources. TRUNCATE TABLE removes the data by deallocating the data pages that hold the table's data, which makes it similar in effect to dropping and re-creating the table. It records only the page deallocations in the transaction log, not each deleted row as DELETE does, so it is minimally logged rather than unlogged. Use DROP TABLE, not DELETE FROM, to remove both the data and the table structure.
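The three commands can be contrasted in a short sketch. The table dbo.DemoOrders is hypothetical and exists only for this illustration:

```sql
-- Hypothetical demo table with an identity column
CREATE TABLE dbo.DemoOrders (
    id   INT IDENTITY(1, 1) PRIMARY KEY,
    note VARCHAR(50) NOT NULL
);
INSERT INTO dbo.DemoOrders (note) VALUES ('a'), ('b'), ('c');

DELETE FROM dbo.DemoOrders;     -- removes rows one by one, logging each; identity keeps counting
TRUNCATE TABLE dbo.DemoOrders;  -- deallocates the data pages; identity resets to its seed
DROP TABLE dbo.DemoOrders;      -- removes the data and the table definition
```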

Answer 5

DDL is Data Definition Language, which is used to define data structures. For example, CREATE TABLE and ALTER TABLE are DDL statements. DML is Data Manipulation Language, which is used to manipulate the data itself, for example with INSERT, UPDATE, and DELETE.

Answer 6

There are four main types of JOINs in SQL: INNER JOIN, OUTER JOIN, CROSS JOIN, and SELF JOIN. OUTER JOINs have three subtypes: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. A SELF JOIN is not a separate keyword; it is any join of a table to itself.
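A compact sketch of how the join types differ in the rows they return, using two hypothetical table variables (@a and @b) that share only the value 2:

```sql
DECLARE @a TABLE (id INT);
DECLARE @b TABLE (id INT);
INSERT INTO @a (id) VALUES (1), (2);
INSERT INTO @b (id) VALUES (2), (3);

-- INNER JOIN: only the matching pair (2, 2) -> 1 row
SELECT a.id AS a_id, b.id AS b_id FROM @a AS a INNER JOIN @b AS b ON a.id = b.id;

-- LEFT OUTER JOIN: every row of @a, NULL where @b has no match -> 2 rows
SELECT a.id AS a_id, b.id AS b_id FROM @a AS a LEFT OUTER JOIN @b AS b ON a.id = b.id;

-- FULL OUTER JOIN: every row of both sides -> 3 rows
SELECT a.id AS a_id, b.id AS b_id FROM @a AS a FULL OUTER JOIN @b AS b ON a.id = b.id;

-- CROSS JOIN: every combination -> 2 * 2 = 4 rows
SELECT a.id AS a_id, b.id AS b_id FROM @a AS a CROSS JOIN @b AS b;
```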

Answer 7

Indexes are special data structures associated with tables or views that help speed up queries. SQL Server provides two main types of indexes: clustered and non-clustered.

Answer 8

There are various types of indexes in SQL Server: clustered index, non-clustered index, columnstore index, filtered index, hash index, and unique index.

Answer 9

The CHECK constraint is used to limit the value range that can be placed in a column. If you define a CHECK constraint on a column, it allows only certain values for that column. If you define a CHECK constraint on a table, it can limit the values in certain columns based on values in other columns in the row.
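Both forms can be sketched on one hypothetical table (dbo.DemoBookings and the constraint name are assumptions made for this example): a column-level CHECK on guests and a table-level CHECK that compares two columns.

```sql
CREATE TABLE dbo.DemoBookings (
    guests     INT  NOT NULL CHECK (guests BETWEEN 1 AND 10),            -- column-level
    start_date DATE NOT NULL,
    end_date   DATE NOT NULL,
    CONSTRAINT CK_DemoBookings_Dates CHECK (end_date >= start_date)      -- table-level
);

-- Satisfies both constraints
INSERT INTO dbo.DemoBookings (guests, start_date, end_date)
VALUES (2, '2024-01-01', '2024-01-05');

-- Would be rejected: guests = 0 violates the column-level CHECK
-- INSERT INTO dbo.DemoBookings (guests, start_date, end_date)
-- VALUES (0, '2024-01-01', '2024-01-05');
```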

Answer 10

SQL injection is a technique in which attackers insert SQL code into input fields so that it is processed by the underlying SQL database. The weakness can be exploited when an application builds SQL statements directly from user input and runs them against the database.
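The contrast between concatenated and parameterized SQL can be sketched as follows. The temp table #Users and its contents are hypothetical; sp_executesql is the built-in procedure for running parameterized dynamic SQL.

```sql
CREATE TABLE #Users (name NVARCHAR(50), secret NVARCHAR(50));
INSERT INTO #Users (name, secret) VALUES (N'alice', N's1'), (N'bob', N's2');

-- Attacker-controlled input:  ' OR 1 = 1 --
DECLARE @input NVARCHAR(50) = N''' OR 1 = 1 --';

-- Unsafe: the input becomes part of the statement text,
-- producing  WHERE name = '' OR 1 = 1 --'  which matches every row.
DECLARE @unsafe NVARCHAR(400) =
    N'SELECT name, secret FROM #Users WHERE name = ''' + @input + N'''';
EXEC (@unsafe);

-- Safer: the input is passed as a typed parameter and is only ever compared as data,
-- so this returns no rows (no user has that literal name).
EXEC sp_executesql
    N'SELECT name, secret FROM #Users WHERE name = @n',
    N'@n NVARCHAR(50)',
    @n = @input;

DROP TABLE #Users;
```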

Answer 11

SQL Server uses a process called parameter sniffing when it executes stored procedures that have parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create an execution plan. That value is then stored with the execution plan in the plan cache. On subsequent executions, that same value, and the same plan, is used. This is normal, expected behavior in SQL Server. Because compiling queries is expensive, you want plans stored in the cache, and you want SQL Server to reuse them as much as possible. But what happens when the values in a table you're querying aren't evenly distributed? What if one value would return 10 rows and another value would return 10,000 rows, or 10 million rows? I call this the elephant and the mouse problem. You would handle one animal differently than the other, and SQL Server might create different plans for the two queries. But it doesn't, because you're using parameters. The first time the procedure is run and the plan is compiled, whatever value is passed in is stored with the plan. Every time the procedure is executed, until it is recompiled, the same value and plan are used, regardless of whether that is the fastest or best plan for the current value. If this is happening to you and causing performance problems, there are ways to deal with it. (Brent Ozar)
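One commonly used way to deal with it is the OPTION (RECOMPILE) query hint, which trades compile time for a plan built from the current parameter value. This is a sketch under assumptions: the table dbo.DemoSales, its index, and the procedure name are hypothetical, and GO is the batch separator used by tools such as SSMS and sqlcmd.

```sql
CREATE TABLE dbo.DemoSales (
    id     INT IDENTITY(1, 1) PRIMARY KEY,
    region VARCHAR(10) NOT NULL
);
CREATE INDEX IX_DemoSales_region ON dbo.DemoSales (region);
GO

CREATE PROCEDURE dbo.GetSalesByRegion
    @region VARCHAR(10)
AS
BEGIN
    SELECT id, region
    FROM dbo.DemoSales
    WHERE region = @region
    OPTION (RECOMPILE);  -- compile a fresh plan for each call instead of reusing the sniffed one
END;
GO

-- Each call now gets a plan based on its own parameter value
EXEC dbo.GetSalesByRegion @region = 'mouse';
EXEC dbo.GetSalesByRegion @region = 'elephant';
```

Whether the extra compilations are acceptable depends on how often the procedure runs; this hint is one option among several, not a universal fix.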

Answer 12

Index Scan: Since a scan touches every row in the table, whether or not it qualifies, the cost is proportional to the total number of rows in the table. Thus, a scan is an efficient strategy if the table is small or if most of the rows qualify for the predicate. Index Seek: Since a seek only touches rows that qualify and pages that contain these qualifying rows, the cost is proportional to the number of qualifying rows and pages rather than to the total number of rows in the table. An Index Scan reads the data pages from the first page to the last. If a table has an index but the query touches a large share of the data (figures of more than 50 percent or 90 percent are often quoted), the optimizer simply scans all the data pages to retrieve the rows. If there is no index, you will see a Table Scan rather than an Index Scan in the execution plan. Index Seeks are generally preferred for highly selective queries, that is, queries that request only a small share of the table's rows (about 10 percent; some documents say 15 percent). In general, the query optimizer tries to use an Index Seek, which means it has found a useful index for retrieving the result set. If it cannot, because the table has no index or no useful index, SQL Server has to scan all the records to find those that satisfy the query condition.
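A small sketch of queries that typically lead to each operator. The table dbo.DemoItems is hypothetical, and the optimizer makes the final choice based on statistics and table size, so the plan comments describe the usual outcome rather than a guarantee.

```sql
CREATE TABLE dbo.DemoItems (
    id       INT PRIMARY KEY,            -- the primary key creates a clustered index by default
    category INT NOT NULL,
    payload  CHAR(100) NOT NULL DEFAULT 'x'
);

-- Highly selective predicate on the index key: usually a Clustered Index Seek
SELECT id, category
FROM dbo.DemoItems
WHERE id = 42;

-- No predicate, so every row qualifies: usually a Clustered Index Scan
SELECT id, category
FROM dbo.DemoItems;
```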

Answer 13

SELECT product,
       product_price,
       items_sold,
       product_price * items_sold AS revenue,
       RANK() OVER (ORDER BY product_price * items_sold DESC) AS revenue_rank
FROM sales;

Answer 14

SELECT month,
       revenue,
       revenue - LAG(revenue, 1) OVER (ORDER BY month) AS monthly_delta
FROM revenue;

Answer 15

SELECT month,
       client,
       cash_flow,
       SUM(cash_flow) OVER (PARTITION BY client ORDER BY month) AS running_total
FROM budget;

Answer 16

```
SELECT OrderID,
       Quantity,
       CASE
           WHEN Quantity > 30 THEN 'The quantity is greater than 30'
           WHEN Quantity = 30 THEN 'The quantity is 30'
           ELSE 'The quantity is under 30'
       END AS QuantityText
FROM OrderDetails;

-- CASE can also be used in the ORDER BY clause:
SELECT CustomerName, City, Country
FROM Customers
ORDER BY (CASE WHEN City IS NULL THEN Country ELSE City END);
```

Answer 17

SELECT warehouse,
       brand,
       SUM(quantity) AS sum_product
FROM warehouse
GROUP BY ROLLUP (warehouse, brand);

Select the columns warehouse and brand from the table. Also sum the column quantity, which will be shown in the new column sum_product. Then ROLLUP comes in! It's used to get totals for multiple data grouping levels. The GROUP BY ROLLUP (warehouse, brand) part will group the data by the warehouse and brand columns. After that, it will sum the data according to each grouping:

```
warehouse   brand    sum_product
Amsterdam   Brando          1105
Amsterdam   Ostap          62934
Amsterdam   NULL           64039
Berlin      Brando         67356
Berlin      Ostap          13451
Berlin      NULL           80807
NULL        NULL          144846
```

The table contains totals for the Brando and Ostap brands in the Amsterdam and Berlin warehouses and a grand total. The subtotal for both products in the Amsterdam warehouse is shown in the first row with the NULL brand value. It amounts to 64,039, the sum of the two previous rows. Next, you can see the totals for both brands in the Berlin warehouse. After that, there's another line with a NULL brand value; this is the Berlin subtotal, amounting to 80,807. The last row shows the grand total of all products in all warehouses, which is 144,846.

Answer 18

This example uses ORDER BY state, city NULLS LAST to ensure that each state's rollup comes immediately after all of the cities in that state, and that the final rollup appears at the end of the output. Note that NULLS LAST is not T-SQL syntax: SQL Server sorts NULLs first in ascending order, so the query as written runs only on dialects that support NULLS LAST.

```
select state, city, sum((s.retail_price - p.wholesale_price) * s.quantity) as profit
from products as p, sales as s
where s.product_id = p.product_id
group by cube (state, city)
order by state, city nulls last;

+-------+---------+--------+
| STATE | CITY    | PROFIT |
|-------+---------+--------|
| CA    | SF      |     13 |
| CA    | SJ      |     26 |
| CA    | NULL    |     39 |
| FL    | Miami   |     48 |
| FL    | Orlando |     96 |
| FL    | NULL    |    144 |
| PR    | SJ      |    192 |
| PR    | NULL    |    192 |
| NULL  | Miami   |     48 |
| NULL  | Orlando |     96 |
| NULL  | SF      |     13 |
| NULL  | SJ      |    218 |
| NULL  | NULL    |    375 |
+-------+---------+--------+
```
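SQL Server has no NULLS LAST keyword, so the same ordering is usually emulated with a CASE expression in ORDER BY. The sketch below is a simplification, not the original query: it assumes a hypothetical table variable @profit that already holds the per-city profit figures shown in the output above.

```sql
-- Hypothetical pre-aggregated input, taken from the city-level rows of the sample output
DECLARE @profit TABLE (state VARCHAR(2), city VARCHAR(10), profit INT);
INSERT INTO @profit (state, city, profit)
VALUES ('CA', 'SF', 13), ('CA', 'SJ', 26),
       ('FL', 'Miami', 48), ('FL', 'Orlando', 96),
       ('PR', 'SJ', 192);

SELECT state, city, SUM(profit) AS profit
FROM @profit
GROUP BY CUBE (state, city)
ORDER BY CASE WHEN state IS NULL THEN 1 ELSE 0 END, state,  -- NULL states sort last
         CASE WHEN city  IS NULL THEN 1 ELSE 0 END, city;   -- NULL cities sort last within a state
```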

Answer 19

GROUPING SETS means you are asking SQL to group the result several times. You can use the GROUPING SETS syntax to specify precisely which aggregations to compute. Here's an example.

SELECT t.[Group] AS region,
       t.name AS territory,
       sum(TotalDue) AS revenue,
       datepart(yyyy, OrderDate) AS [year],
       datepart(mm, OrderDate) AS [month]
FROM Sales.SalesOrderHeader s
INNER JOIN Sales.SalesTerritory t ON s.TerritoryID = t.TerritoryID
GROUP BY t.[Group],
         GROUPING SETS(ROLLUP(t.name),
                       ROLLUP(datepart(yyyy, OrderDate), datepart(mm, OrderDate)))

Here, you are asking for the breakdown by territory group for every month of every year with month and year totals, followed by a summary total by territory name, but without a grand total. Unlike with ROLLUP, you get the same result whatever the order of the columns within each GROUPING SET and whatever the order of the GROUPING SETS. GROUPING SETS can give you precisely what CUBE and ROLLUP give you and a lot more besides. As you can see with this last example, you can use standard 'table d'hôte' CUBE and ROLLUP mixed together with directly expressed 'à la carte' GROUPING SETS.

Answer 20

There is only one major difference between the functionality of the ROLLUP operator and the CUBE operator. The ROLLUP operator generates aggregated results for the selected columns in a hierarchical way. CUBE, on the other hand, generates an aggregated result that contains all the possible combinations for the selected columns. To understand this, look at the result set for the ROLLUP operator where the sum of the salaries of the employees was grouped by department and gender:

```
Row Number  Department       Gender       Salary_Sum
1           Finance          Female            11800
2           Finance          Male               5000
3           Finance          All Genders       16800
4           HR               Female             6000
5           HR               Male              14200
6           HR               All Genders       20200
7           IT               Female            21200
8           IT               All Genders       21200
9           Marketing        Female            12200
10          Marketing        Male               6500
11          Marketing        All Genders       18700
12          Sales            Male              18700
13          Sales            All Genders       18700
14          All Departments  All Genders       95600
```

Here data is aggregated in a hierarchical manner. In rows 1, 2, 4, 5, 7, 9, 10 and 12, salaries are grouped by department and gender. In rows 3, 6, 8, 11 and 13, salaries are grouped by department only. Finally, in row 14 we have the grand total of the salaries of all of the employees of all genders from all departments. So we have three combinations that are hierarchical in nature: department and gender, department only, and the grand total. We do not have salary grouped by gender only, because gender is lowest in the hierarchy.

On the other hand, if you look at the aggregated result of the CUBE operator where the sum of the salaries of the employees was grouped by department and gender, you get all four possible combinations: (1) department and gender, (2) department only, (3) gender only, and (4) the grand total.

Note: the results of the ROLLUP and CUBE operators will be the same if your data is grouped by only one column.

Which One Should I Use? ROLLUP and CUBE are performance tools. You should use ROLLUP if you want your data hierarchically and CUBE if you want all possible combinations. For example, suppose you want to retrieve the total population by country, state and city. ROLLUP would sum the population at three levels: first at the Country-State-City level, then at the Country-State level, and finally at the Country level. It would also provide a grand total. CUBE groups data in all possible combinations of columns, so the population would be summed at the following levels:

```
Country-State-City
State-City
City
Country-State
State
Country-City
Country
All
```

Which one you choose depends on what you need. A simple rule of thumb is that if you have hierarchical data (for example, Country -> State -> City or Department -> Manager -> Salesman), you usually want hierarchical results, and you use ROLLUP to group the data. If you have non-hierarchical data (for example, City-Gender-Nationality), then you don't want hierarchical results, so you use CUBE, as it will provide all possible combinations.

Answer 21

SQL window functions allow you to perform operations that are often required for creating reports, e.g. ranking data, calculating running totals and moving averages, finding the difference between rows, etc. Not only that, but you can also divide data into windows, which enables you to perform operations on data subsets rather than the data as a whole. SQL Server window functions calculate an aggregate value based on a group of rows and, unlike GROUP BY, return multiple rows for each group.

```
Ranking Functions
    row_number()
    rank()
    dense_rank()
Distribution Functions
    percent_rank()
    cume_dist()
Analytic Functions
    lead()
    lag()
    ntile()
    first_value()
    last_value()
    nth_value()   (standard SQL; not available in SQL Server)
Aggregate Functions
    avg()
    count()
    max()
    min()
    sum()
```
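The three ranking functions differ only in how they treat ties. A minimal sketch with a hypothetical table variable @scores that contains one tie:

```sql
DECLARE @scores TABLE (player VARCHAR(10), score INT);
INSERT INTO @scores (player, score) VALUES ('a', 90), ('b', 90), ('c', 80);

SELECT player,
       score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,    -- 1, 2, 3 (order within the tie is arbitrary)
       RANK()       OVER (ORDER BY score DESC) AS rnk,        -- 1, 1, 3 (gap after the tie)
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk   -- 1, 1, 2 (no gap)
FROM @scores;
```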

Answer 22

If you ever wanted to query a query, that's when CTEs come into play. A CTE essentially creates a temporary, named result set. Using common table expressions (CTEs) is a great way to modularize and break down your code, the same way that you would break down an essay into several paragraphs.

```
WITH toronto_ppl AS (
    SELECT DISTINCT name
    FROM population
    WHERE country = 'Canada'
      AND city = 'Toronto'
),
avg_female_salary AS (
    SELECT AVG(salary) AS avgSalary
    FROM salaries
    WHERE gender = 'Female'
)
SELECT name, salary
FROM People
WHERE name IN (SELECT name FROM toronto_ppl)
  AND salary >= (SELECT avgSalary FROM avg_female_salary);
```

Now it's clear that the WHERE clause is filtering for names in Toronto. CTEs are useful because you can break down your code into smaller chunks, but they're also useful because they allow you to assign a name to each CTE (i.e. toronto_ppl and avg_female_salary).

Answer 23

A recursive CTE is a CTE that references itself, just like a recursive function in Python. Recursive CTEs are especially useful when it comes to querying hierarchical data like organization charts, file systems, a graph of links between webpages, etc.

There are 3 parts to a recursive CTE:
The anchor member: an initial query that returns the base result of the CTE.
The recursive member: a recursive query that references the CTE. This is UNION ALL'ed with the anchor member.
A termination condition that stops the recursive member. In practice, recursion stops when the recursive member returns no more rows.

Here's an example of a recursive CTE that gets the manager id for each staff id:

WITH org_structure AS (
    SELECT id, manager_id
    FROM staff_members
    WHERE manager_id IS NULL
    UNION ALL
    SELECT sm.id, sm.manager_id
    FROM staff_members sm
    INNER JOIN org_structure os ON os.id = sm.manager_id
)
SELECT id, manager_id
FROM org_structure;
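A self-contained variation of the same idea, as a sketch: the table variable @staff and the added depth column are assumptions for illustration, and MAXRECURSION 100 is stated explicitly although it is also SQL Server's default limit.

```sql
-- Hypothetical three-level reporting chain: 1 manages 2, 2 manages 3
DECLARE @staff TABLE (id INT, manager_id INT NULL);
INSERT INTO @staff (id, manager_id) VALUES (1, NULL), (2, 1), (3, 2);

WITH org AS (
    -- Anchor member: the top of the hierarchy
    SELECT id, manager_id, 0 AS depth
    FROM @staff
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: staff whose manager is already in the result
    SELECT s.id, s.manager_id, o.depth + 1
    FROM @staff AS s
    INNER JOIN org AS o ON o.id = s.manager_id
)
SELECT id, manager_id, depth   -- rows: (1, NULL, 0), (2, 1, 1), (3, 2, 2)
FROM org
OPTION (MAXRECURSION 100);     -- safety cap in case the data contains a cycle
```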

Answer 24

Temporary functions are important for several reasons: they allow you to break down chunks of code into smaller chunks, they are useful for writing cleaner code, and they prevent repetition and allow you to reuse code, similar to using functions in Python. Note that the syntax below (CREATE TEMPORARY FUNCTION, INT64, double-quoted strings) is not T-SQL; it comes from another SQL dialect, and SQL Server has no temporary functions. Consider the following example:

SELECT name,
       CASE
           WHEN tenure < 1 THEN "analyst"
           WHEN tenure BETWEEN 1 and 3 THEN "associate"
           WHEN tenure BETWEEN 3 and 5 THEN "senior"
           WHEN tenure > 5 THEN "vp"
           ELSE "n/a"
       END AS seniority
FROM employees

Instead, you can leverage a temporary function to capture the CASE clause.

CREATE TEMPORARY FUNCTION get_seniority(tenure INT64) AS (
    CASE
        WHEN tenure < 1 THEN "analyst"
        WHEN tenure BETWEEN 1 and 3 THEN "associate"
        WHEN tenure BETWEEN 3 and 5 THEN "senior"
        WHEN tenure > 5 THEN "vp"
        ELSE "n/a"
    END
);

SELECT name, get_seniority(tenure) AS seniority
FROM employees

With a temporary function, the query itself is much simpler and more readable, and you can reuse the seniority function!
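SQL Server does not support CREATE TEMPORARY FUNCTION, so the closest T-SQL equivalent is a regular scalar user-defined function that you drop when finished. The following is a sketch under assumptions: the dbo schema, the table variable @employees, and its rows are hypothetical, and GO is the batch separator used by SSMS and sqlcmd.

```sql
CREATE FUNCTION dbo.get_seniority (@tenure INT)
RETURNS VARCHAR(10)
AS
BEGIN
    -- Same branch order as the CASE expression above; the first matching branch wins
    RETURN CASE
               WHEN @tenure < 1 THEN 'analyst'
               WHEN @tenure BETWEEN 1 AND 3 THEN 'associate'
               WHEN @tenure BETWEEN 3 AND 5 THEN 'senior'
               WHEN @tenure > 5 THEN 'vp'
               ELSE 'n/a'
           END;
END;
GO

DECLARE @employees TABLE (name VARCHAR(20), tenure INT);
INSERT INTO @employees (name, tenure) VALUES ('Ann', 0), ('Bo', 3), ('Cy', 7);

SELECT name, dbo.get_seniority(tenure) AS seniority  -- Ann: analyst, Bo: associate, Cy: vp
FROM @employees;
GO

DROP FUNCTION dbo.get_seniority;  -- clean up, since the function is permanent, not temporary
```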

(48 cards)