Post midterm concepts Flashcards

Question

SCD?

Answer 1

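Slowly Changing Dimension: dimension data in a data warehouse changes over time, and the warehouse has to adapt. This is typically done by making a surrogate key take over the job of the original PK, which lets the warehouse keep more than one version of a row for the same original key.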
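
A minimal sketch of one common way to do this (often called a Type 2 SCD), assuming a hypothetical customer dimension. Every change adds a new row with a fresh surrogate key, so the original key (customer_id) can repeat. All table and column names are illustrative.

```python
from itertools import count

# Hypothetical customer dimension; surrogate_key acts as the warehouse PK.
next_key = count(1)
dim_customer = []

def apply_change(customer_id, city):
    """Expire the current row for this customer and add a new version."""
    for row in dim_customer:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == city:
                return row  # nothing changed, keep the current version
            row["is_current"] = False
    new_row = {
        "surrogate_key": next(next_key),
        "customer_id": customer_id,  # original (natural) key, may repeat
        "city": city,
        "is_current": True,
    }
    dim_customer.append(new_row)
    return new_row

apply_change(42, "Minneapolis")
apply_change(42, "Chicago")  # customer moved: the old row is kept as history

history = [(r["surrogate_key"], r["city"], r["is_current"]) for r in dim_customer]
```

Both rows share customer_id 42, and only the surrogate key tells them apart.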

Answer 2

Facts: quantities or amounts. Dimensions: groups or classes.

Answer 3

Extract: pull data from sources. Also clean the data. Transform: de-normalize, aggregate, compute, reformat, etc. Load: put data that's ready into the data warehouse at regular intervals.
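
The three steps can be sketched as a tiny pipeline. This is an illustration only: the raw rows, the sales_by_region table, and the in-memory SQLite "warehouse" are all assumptions, and a real job would run on a schedule.

```python
import sqlite3

# Hypothetical raw rows from a source system (note the messy values).
raw_rows = [
    {"region": " east ", "amount": "10.50"},
    {"region": "West", "amount": "4.00"},
    {"region": "east", "amount": None},  # missing amount, dropped in cleaning
    {"region": "EAST", "amount": "2.25"},
]

def extract():
    """Pull rows from the source and drop the ones that cannot be used."""
    return [r for r in raw_rows if r["amount"] is not None]

def transform(rows):
    """Reformat region names and aggregate amounts per region."""
    totals = {}
    for r in rows:
        region = r["region"].strip().lower()
        totals[region] = totals.get(region, 0.0) + float(r["amount"])
    return sorted(totals.items())

def load(conn, totals):
    """Put the finished rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", totals)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```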

Answer 4

A central Fact Table holds all the facts (quantities). Dimension Tables branch off it, each pertaining to a different class. A Time dimension table must exist. Not in 3NF.
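
A small sketch of this layout, using Python's sqlite3 module with made-up table names (fact_sales, dim_time, dim_product) and made-up data. The fact table holds the quantities and joins out to each dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity   INTEGER,
    revenue    REAL
);
INSERT INTO dim_time VALUES (1, 2021, 1), (2, 2021, 2), (3, 2022, 1);
INSERT INTO dim_product VALUES (10, 'books'), (11, 'games');
INSERT INTO fact_sales VALUES (1, 10, 2, 20.0), (2, 10, 1, 10.0),
                              (2, 11, 1, 60.0), (3, 11, 2, 120.0);
""")

# Facts are summed; dimensions supply the groups.
revenue_by_year_category = conn.execute("""
    SELECT t.year, p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_time    AS t ON t.time_id = f.time_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY t.year, p.category
    ORDER BY t.year, p.category
""").fetchall()
```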

Answer 5

Like a star schema, but the dimension tables are normalized further: sub-dimension tables branch off the dimension tables for niche use.

Answer 6

1. Slicing: fix one dimension to a single value and observe all other dimensions and facts; ex: year=2021.
2. Dicing: fix values on multiple dimensions and observe all other facts and dimensions.
3. Pivot: transpose (swap rows and columns).
4. Drill-down: break information down into more detail.
5. Roll-up: put small data points together for less detail.
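
Four of these operations can be sketched on a small in-memory data set. The sales rows and the helper names select and aggregate are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: year, region, product.
sales = [
    {"year": 2020, "region": "east", "product": "A", "amount": 5},
    {"year": 2021, "region": "east", "product": "A", "amount": 7},
    {"year": 2021, "region": "west", "product": "A", "amount": 3},
    {"year": 2021, "region": "west", "product": "B", "amount": 4},
]

def select(rows, **fixed):
    """Keep rows whose dimensions match every fixed value."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]

def aggregate(rows, *dims):
    """Sum amounts grouped by the given dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["amount"]
    return dict(totals)

slice_2021 = select(sales, year=2021)                     # slicing: one dimension fixed
dice_2021_west = select(sales, year=2021, region="west")  # dicing: several fixed
rolled_up = aggregate(sales, "year")                      # roll-up: less detail
drilled_down = aggregate(sales, "year", "region")         # drill-down: more detail
```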

Answer 7

1. ROLLUP(attr1, attr2): groups by (attr1, attr2), by attr1 alone, and the grand total.
2. CUBE(attr1, attr2): groups by all combinations: (attr1, attr2), attr1, attr2, and the grand total. Most computationally expensive.
3. GROUPING SETS(attr1, attr2): groups by attr1 and by attr2 separately. Adding "()" gives the total. Most flexible.
They are used in the GROUP BY clause.
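
To see which groupings each option produces, here is a rough Python emulation. It is not SQL, and the function names are made up for this sketch. ROLLUP and CUBE are treated as shorthand for a list of grouping sets.

```python
from collections import defaultdict
from itertools import combinations

rows = [
    {"region": "east", "year": 2020, "amount": 5},
    {"region": "east", "year": 2021, "amount": 7},
    {"region": "west", "year": 2021, "amount": 3},
]

def rollup_sets(attrs):
    """ROLLUP(a, b) -> (a, b), (a,), ()"""
    return [tuple(attrs[:i]) for i in range(len(attrs), -1, -1)]

def cube_sets(attrs):
    """CUBE(a, b) -> (a, b), (a,), (b,), ()"""
    return [c for n in range(len(attrs), -1, -1) for c in combinations(attrs, n)]

def group_by_sets(rows, grouping_sets):
    """Sum amount once per grouping set, like GROUP BY GROUPING SETS (...)."""
    result = {}
    for gs in grouping_sets:
        totals = defaultdict(int)
        for r in rows:
            totals[tuple(r[a] for a in gs)] += r["amount"]
        result[gs] = dict(totals)
    return result

attrs = ("region", "year")
rollup_result = group_by_sets(rows, rollup_sets(attrs))
cube_result = group_by_sets(rows, cube_sets(attrs))
```

CUBE yields the ("year",) grouping that ROLLUP skips, which is why it costs more.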

Answer 8

Quant that predicted the 2008 stock market crash and made lots of money.

Answer 9

Descriptive, Explanatory, Predictive, and Prescriptive. Also Generative.

Answer 10

Data is split so that the training data is used to train the model and the test data is used to evaluate the predictive power of the model.
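
A minimal sketch of such a split in plain Python. The 25% test fraction and the fixed seed are arbitrary assumptions, and the function name is made up here.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))  # stand-in for 20 labeled examples
train, test = train_test_split(data)
```

The model never sees the test rows during training, so its score on them estimates how it handles new data.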

Answer 11

Volume: the amount of data; ex: cloud computing. Variety: different types of data; ex: a NoSQL database with audio, image, and video data. Velocity: the speed at which information moves. Veracity: the data is high quality. Value: the data and processes have value.

Answer 12

structured: all 3. unstructured: data lake.

Answer 13

Column, Key-value, Document, and Graph.

Answer 14

Data is stored by column, so data is fetched column by column. More efficient than row storage for queries that read a few columns across many rows.
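
A toy comparison of the two layouts, with made-up records. It only shows which data each layout has to touch. It is not a benchmark.

```python
# The same three records in two hypothetical layouts.
row_store = [
    {"id": 1, "name": "Ann", "age": 30},
    {"id": 2, "name": "Bob", "age": 41},
    {"id": 3, "name": "Cy", "age": 25},
]
column_store = {
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "age": [30, 41, 25],
}

# Analytical query: average age.
# Row layout: every full record is visited to pick out one field.
avg_age_rows = sum(r["age"] for r in row_store) / len(row_store)
# Column layout: only the "age" column is read.
avg_age_cols = sum(column_store["age"]) / len(column_store["age"])

# Fetching one whole record favors the row layout instead:
# the column layout has to gather one value from every column.
record_2_from_columns = {col: values[1] for col, values in column_store.items()}
```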

Answer 15

All information is stored as key-value pairs. Saves space because Nulls are not stored. It is "hashed", meaning everything is retrievable by key.
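
A Python dict is itself a hash table, so it works as a rough stand-in for a key-value store. The keys and records below are invented for illustration.

```python
# Hypothetical user records; only attributes that exist are stored.
store = {}
store["user:1"] = {"name": "Ann", "email": "ann@example.com"}
store["user:2"] = {"name": "Bob"}  # no email key at all, rather than a NULL

def get(key, default=None):
    """Hash lookup by key."""
    return store.get(key, default)

bob = get("user:2")
missing = get("user:99")
```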

Answer 16

Like a key-value DB, but each value is a document with its own fields. Documents are kind of like JSON.

Answer 17

Nodes and Edges. Nodes are accessible by edges. Edges are relationships between nodes. Popular type of DB in social media. It's easy to delete nodes.
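
A minimal sketch of nodes and edges as adjacency sets, assuming a hypothetical "follows" graph. Deleting a node means removing it and every edge that points at it.

```python
# Hypothetical "follows" graph: each node maps to the nodes it points at.
graph = {
    "ann": {"bob", "cy"},
    "bob": {"cy"},
    "cy": set(),
}

def neighbors(node):
    """Nodes reachable from this node over one edge."""
    return sorted(graph[node])

def delete_node(node):
    """Remove the node and every edge that points at it."""
    graph.pop(node, None)
    for targets in graph.values():
        targets.discard(node)

ann_follows_before = neighbors("ann")
delete_node("cy")
ann_follows_after = neighbors("ann")
```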

Answer 18

1. On-prem, IaaS, PaaS, SaaS. 2. IaaS. 3. SaaS, PaaS, IaaS, On-prem.

Answer 19

Virtual Machines (3), Virtual Server, Physical Server (3), Physical Storage.

Answer 20

Public Cloud: open to everyone. Private Cloud: used by a single business. Hybrid Cloud: a combination of two or more of the other models. Community Cloud: shared by several organizations with common concerns.

Answer 21

$1 to prevent. $10 to correct. $100 if you don't prevent or correct.

Answer 22

CREATE, input variables and their data types, BEGIN TRY, DECLARE local variables, COMMIT, END TRY, BEGIN CATCH, ROLLBACK, END CATCH, EXEC.
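
The try/commit and catch/rollback shape can be sketched in Python with sqlite3. This is an analogy, not a stored procedure: the account table, the CHECK constraint, and the transfer function are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; all changes apply or none do."""
    try:  # plays the role of BEGIN TRY
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.commit()  # COMMIT
        return True
    except sqlite3.IntegrityError:  # plays the role of BEGIN CATCH
        conn.rollback()  # ROLLBACK undoes the first UPDATE as well
        return False

ok = transfer(conn, 1, 2, 60)      # succeeds
failed = transfer(conn, 1, 2, 60)  # would overdraw account 1, so it is rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
```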

Answer 23

Primary Index: created by default on the PK, with unique pointers. Clustered, meaning that the index order corresponds to the order of the table rows. Secondary Index: must be created manually, usually with non-unique pointers. Non-clustered.

Answer 24

1. Examine multiple options for executing the same query.
2. Create a query plan and estimate the execution cost for each option.
3. Choose the cheapest (fastest) option for execution.
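
One way to watch an optimizer pick a plan is SQLite's EXPLAIN QUERY PLAN, shown here through Python's sqlite3 module. The orders table and index name are made up, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust{i % 50}", float(i)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer = 'cust7'"

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's chosen plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_without_index = plan(conn, query)  # only option: scan the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_with_index = plan(conn, query)     # a cheaper option now exists
```

The same query gets a different plan once the index gives the optimizer a cheaper option.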

Answer 25

Different query plans are created from different relational algebra (RA) expressions, each with its own cost/time.

Answer 26

- Speeds up search by sacrificing INSERT, DELETE, and UPDATE speed (rebalancing).
- More than 15 indexes in a table strains memory.
- Should not be used on columns that contain many Null values.

Answer 27

- Use indexes.
- Specify column names in the SELECT statement.
- Avoid unnecessary functions.
- Use UNION ALL over UNION when duplicates do not need to be removed.
- Avoid unnecessary table joins.
- Use LEFT JOIN and FULL JOIN only when necessary.
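
A short illustration of the UNION ALL tip, run through Python's sqlite3 module with two invented tables. UNION has to remove duplicates, which is extra work. UNION ALL returns every row as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE online_orders (customer TEXT);
CREATE TABLE store_orders  (customer TEXT);
INSERT INTO online_orders VALUES ('ann'), ('bob');
INSERT INTO store_orders  VALUES ('bob'), ('cy');
""")

# The column is named explicitly instead of using SELECT *.
union_rows = conn.execute(
    "SELECT customer FROM online_orders UNION SELECT customer FROM store_orders"
).fetchall()
union_all_rows = conn.execute(
    "SELECT customer FROM online_orders UNION ALL SELECT customer FROM store_orders"
).fetchall()
```

Here 'bob' appears once in union_rows and twice in union_all_rows, so UNION ALL is only a safe swap when duplicates are acceptable or impossible.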

Answer 28

Hardware, network traffic, permissions, scale of resources, application design.

Answer 29

Atomicity, Consistency, Isolation, Durability. Together they ensure data integrity for a DB.

Answer 30

Locking: locks data so it cannot be changed. Blocking: data is locked by one transaction, which blocks others from using it. Deadlock: 2 transactions each wait on a lock the other holds, and one becomes the victim.

Answer 31

Optimistic: multiple users can work on the data at once. Pessimistic: one user gets priority.
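
A rough sketch of the optimistic side, assuming a version counter on each record. This is one common way to detect conflicts, not the only one. A pessimistic scheme would lock the record for one user before any edit.

```python
# Hypothetical record with a version counter for optimistic concurrency.
record = {"value": "draft", "version": 1}

def optimistic_update(record, new_value, version_read):
    """Apply the write only if nobody changed the record since it was read."""
    if record["version"] != version_read:
        return False  # conflict: the caller should re-read and retry
    record["value"] = new_value
    record["version"] += 1
    return True

# Two users read version 1 at the same time, without locking.
user_a_version = record["version"]
user_b_version = record["version"]

a_ok = optimistic_update(record, "edited by A", user_a_version)
b_ok = optimistic_update(record, "edited by B", user_b_version)
```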

Answer 32

Create: start with tables that have no FKs and work toward the more complex tables; bottom-up. Drop: start with the most complex tables and work your way out.
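
The ordering can be worked out mechanically with a topological sort. The sketch below uses Python's graphlib on a made-up schema in which each table lists the tables its FKs reference.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its FKs reference.
fk_dependencies = {
    "customer": set(),
    "product": set(),
    "orders": {"customer"},
    "order_item": {"orders", "product"},
}

# Referenced tables come first, so every FK target exists at creation time.
create_order = list(TopologicalSorter(fk_dependencies).static_order())
# Dropping runs the other way: the most dependent table goes first.
drop_order = list(reversed(create_order))
```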

Answer 33

Gordon Everest

(57 cards)