Post midterm concepts Flashcards

1
Q

What is the SQL syntax order and the SQL execution order?

A

SYNTAX: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
EXECUTION: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Difference between filtering in HAVING and WHERE?

A

HAVING: quicker because it sorts attributes in the SELECT statement. Often it is optimal to use over WHERE.
WHERE: slower because it sorts attributes across the entire base table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What should you do if you want to update two kinds of values and the update of one relies on the updated values of the other.

A

You must separate them into two UPDATE statements. Otherwise, it will not work as multiple updates in UPDATE happen simultaneously, not in order.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is an equi-join?

A

an inefficient inner join. It’s a cross join and then non matching records are deleted. It is done by putting all tables in the FROM clause and specifying PK-FK relations in the WHERE clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Cross Join

A

All possible row joins are joined.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Inner Join

A

table records overlapping on an attribute are joined.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Left Join

A

All overlapping records on an attribute are joined as well as all other records from the Left table. The non matching records are matched with null records.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Full Join

A

All overlapping records on an attribute are joined as well as all other records from both tables. Those that do not have a matching row are a totally null record.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Union

A

two tables’ records are stacked on top of each other.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Outer Join

A

records from each table that do not match.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

3 prerequisites for union?

A
  1. all query results must have the same number of columns.
  2. All columns must have the same sequence.
  3. All corresponding columns must share a common datatype.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

UNION vs. UNION ALL

A

UNION removes duplicate rows while UNION ALL is more inclusive. UNION ALL is more efficient and faster as it cuts out a process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Which two syntax elements only appears once when a UNION is used?

A

The ORDER BY clause only appears at the end and the aliases only appear in the first SELECT clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Non-correlated Subquery

A

Inner query is executed first and once. The outer query is then compared to the Inner query. The attribute of comparison and the attribute selected in the inner query must match. IN is less optimal than EXISTS because it checks all records in the inner query.
Ex: “WHERE EmpID IN (…);” or
“WHERE EmpID = (…);”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Correlated Subquery

A

Inner query is executed with every record of the outer query. EXISTS is more optimal than IN because it looks for matching values and when a match is found, it stops looking.
Ex: “WHERE EXISTS (…);”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How would you do an inner join using subqueries?

A

Put the other table in a subquery and find overlapping records.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Views, CTEs, and Derived Tables all come from what data language?

A

DML; data manipulation language.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is a view? Why should we use it? Does it saved across multiple sessions? What is the Syntax for creating/dropping it?

A
  1. A view is a pre-compiled, saved query that can be referenced as a table.
  2. It enhances security and productivity while reducing training.
  3. Yes it persists across multiple sessions.
  4. CREATE VIEW view_name AS (…) & DROP VIEW view_name.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is a CTE?

A

A CTE is a common table expression that exists in memory only in the instance that the query is run.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a derived table?

A

A subquery; usually in parenthesis and in the FROM clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How are RANK() and ROW_NUMBER() used syntactically? What is the difference? How are they used?

A
  1. ROW_NUMBER() COUNT(*) OVER(PARTITION BY attribute) AS ‘Row_no.’
  2. RANK() can have ties while ROW_NUMBER() just orders results in the order they are generated.
  3. They are typically put in an inner query, CTE, or view and attributes are put in the SELECT statement and ‘Row_no’ is put in the ORDER BY or WHERE statement of outer query.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Difference between Data Warehouse and Database?

A

Data Warehouse has fewer uses, read-only, not 3NF, and derived from the database.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Information Data vs. Operational Data?

A

Operational Data: has many users and is used for tactical decision-making. INSERT, DELETE, and UPDATE statements are used on it. It contains detailed and dynamic data.
Informational Data: has few users and is used for strategic decision-making. They only use SELECT statements. Data is aggregate and static data.

24
Q

What is a Fragmented Application Landscape?

A

Usually organizations have a mixed bag of systems and software that is duct-taped together. There are home-grown and bought systems, cloud-computing and onsite DBs/applications, servers and software from M&A, and overall lack of enterprise governance.

24
Q

SCD?

A

Slowly Changing Dimension: data warehouses have to adapt as data changes over time. This is typically done by making surrogate key take over the job of the original PK.

25
Q

Data Cubes are made of two kinds of data. What are they?

A

quantities or amounts and groups or classes.

26
Q

What is ETL?

A

Extract: pull data from sources. Also clean the data.
Transform: de-normalize, aggregate, compute, reformat, etc.
Load: put data that’s ready into the data warehouse at regular intervals.

27
Q

What is Star Schema?

A

a central Fact Table has all the facts and quantities. Branching Dimension Tables pertain to different classes. A Time dimension table must exist. Not 3NF.

28
Q

What is Snowflake Schema?

A

Star schema but data branches further off of the dimension tables for niche use.

29
Q

All 5 Multi-dimensional cube operations and what they do?

A
  1. Slicing: specify one dimension and observe all dimensions and facts; ex: year=2021.
  2. Dicing: specify multiple dimensions and observe all other facts and dimensions.
  3. Pivot: transpose.
  4. Drill-down: breaking down informational into more detail.
  5. Roll-up: putting small data points together for less detail.
30
Q

OLAP functions and their outputs. Which clause are they used in?

A
  1. ROLLUP(attr1, attr2): groups by attr1, (attr1 & attr2), and the total.
  2. CUBE(attr1, attr2): groups by all combinations. Most computationally expensive.
  3. GROUPING SETS(attr1, attr2): groups by attr1, attr2. “()” gives the total. Most flexible.

They are used in the GROUP BY clause.

31
Q

Who is Nassim Nicholas Taleb?

A

Quant that predicted the 2008 stock market crash and made lots of money.

32
Q

4 Types of Analytics in timeline order and 5th new one.

A

Descriptive, Explanatory, Predictive, and Prescriptive. Also Generative.

33
Q

Test-train-split

A

data is split such the train data is used to train the model and test data is used to evaluate the predictive power of the model.

34
Q

5 V’s of Big Data

A

Volume: cloud computing.
Variety: a NoSQL database has audio, image, and video data.
Velocity: speed of information moving.
Veracity: data is high quality.
Value: data and processes have value.

35
Q

Data triangle does from DB to Data Warehouse to Data Lake. Where does structured and unstructured data go in each?

A

structured: all 3.
unstructured: data lake.

36
Q

4 NoSQL DBs

A

Column, Key-value, Document, and Graph.

37
Q

Column DB

A

Data is column stored meaning data is fetched according to columns. More efficient than row stored.

38
Q

Key-value DB

A

All information is related by key-value pairs. Saves space as Nulls are not included. It is “hashed” meaning everything is retrievable by keys.

39
Q

Document DB

A

Like Key-value DB but information exists across documents. Kind of like JSON.

40
Q

Graph DB

A

Nodes and Edges. Nodes are accessible by edges. Edges are relationships between nodes. Popular type of DB in social media. It’s easy to delete nodes.

41
Q

IT solutions are On-prem, PaaS, SaaS, and IaaS. In what order does the vendor have the most control? Which is the most capital-intensive service to offer? In what order do you have the most control?

A
  1. On-prem, IaaS, PaaS, SaaS.
  2. IaaS.
  3. SaaS, PaaS, IaaS, On-prem.
42
Q

Order of Virtual Machines to Physical Storage?

A

Virtual Machines (3), Virtual Server, Physical Server (3), Physical Storage.

43
Q

Four types of clouds and who uses them?

A

Public Cloud: everyone.
Private Cloud: businesses.
Hybrid Cloud.
Community Cloud: both businesses and customers.

44
Q

1:10:100 Rule

A

$1 to prevent.
$10 to correct.
$100 if you don’t prevent or correct.

45
Q

Major components of defining a procedure

A

CREATE, input variables and their data types, Begin Try, Declare local variables, Commit, End Try, Begin Catch, Rollback, End Catch, EXEC.

46
Q

Difference between primary index and secondary index?

A

Primary Index: done by default assigning unique pointers to PKs. Clustered meaning that the index rows correspond with the table rows.
Secondary Index: must be done manually with usually non-unique pointers. Non-clustered.

47
Q

Query Optimization Steps

A
  1. Examine multiple options of executing the same query.
  2. Create a query plan and calculate execution time for each option.
  3. Choose the fastest option for execution.
48
Q

Query Plan

A

different query plans are created based on different RA along with cost/time.

49
Q

Index Disadvantages

A
  • speeds up search by sacrificing INSERT, DELETE, and UPDATE speed (rebalancing).
  • more than 15 indexes in a table strains memory.
  • should not be used on columns that contain many Null values.
50
Q

Optimization Strategies

A
  • Use indexes.
  • specify column names in the SELECT statement.
  • avoid use of unnecessary functions.
  • Use UNION ALL over UNION
  • avoid unnecessary table joins
  • Use left join and full join only when necessary.
51
Q

Factors other than DB and its processes that effect query time?

A

Hardware, network trafficking, permissions, scale of resources, application design.

52
Q

ACID Compliance

A

Atomicity
Consistency
Isolation
Durability

Data integrity for a DB.

53
Q

Pessimistic Locking via.

A

Locking: locks data from changing.
Blocking: locked by one transaction and blocks others from using it.
Deadlock: 2 are waiting in queue and one becomes the victim.

54
Q

Versioning vs. Locking & Blocking

A

multiple users vs. one user gets priority.
optimistic vs. pessimistic.

55
Q

What order to create tables and to drop tables?

A

Creates: tables without FKs first and get closer to complex; bottom-up.

Drop: start complex and work your way out.

56
Q

Who made Crow’s foot notation.

A

Gordon Everest