Week 3: Advanced SQL and Data Integration Flashcards

1
Q

REVERSED

CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)

A

How do you create a table in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

REVERSED

DROP TABLE Students

A

How do you delete an entire table including schema in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

REVERSED

DELETE FROM Students

A

How do you delete the contents of a table but keep the relation in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

REVERSED

DELETE FROM Students
WHERE …

A

How do you delete certain rows from a table in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

REVERSED

ALTER TABLE Students
ADD COLUMN firstYear integer

A

How do you add an empty column to an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

REVERSED

ALTER TABLE Students
DROP firstYear

A

How do you delete an attribute from an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

REVERSED

INSERT INTO Students
VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)

INSERT INTO Students(name, Sid, login, age, gpa)
VALUES (‘Smith’, 53688, ‘smith@ee’, 18, 3.2)

A

How do you add a row into an SQL table? (2 ways depending on the order)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

REVERSED

UPDATE Students
SET grade = grade*1.2
WHERE …

A

How do you change the value in an attribute for all or some tuples?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

REVERSED

CREATE TABLE test (sid INTEGER PRIMARY KEY, name VARCHAR(30), major VARCHAR(30))

CREATE TABLE test (sid INTEGER, name VARCHAR(30), major VARCHAR(30), PRIMARY KEY (sid))

A

How do you set a primary key when creating a table in SQL? (2 ways)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

REVERSED

CREATE TABLE Works_In(ssn CHAR(11), did INTEGER, since DATE,
PRIMARY KEY (ssn, did),
FOREIGN KEY (ssn)
REFERENCES Employees)

A

How do you set a foreign key when creating a table in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

REVERSED

organise a group of entity sets into a parent/child hierarchy

A

What are entity hierarchies?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

REVERSED

specify that the children of an entity do/don’t overlap

A

What are overlap constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

REVERSED

instances of the children of an entity include all instances of their parent (ie cover it)

A

What are covering constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

REVERSED

A view is just a relation, but we store a definition, rather than a set of tuples
Views can be used to present necessary information (or a summary) while hiding details in underlying relations

A

What is a view in SQL and what is it used for?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

REVERSED

CREATE VIEW YoungActiveStudents (name, grade)
AS SELECT S.name, E.grade
FROM Students S, Enrolled E
Where S.Sid = E.Sid and S.age <21

A

How do you create a view in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

REVERSED

DROP VIEW

A

How can you delete a view in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

REVERSED

a constraint is a relationship among data elements that the DBMS is required to enforce

A

What is an integrity constraint?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

REVERSED

Integrity constraints guard against accidental damage to the database, by ensuring that authorised changes do not result in a loss of data consistency

A

What is the purpose of integrity constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

REVERSED

one that satisfies all specified ICs

A

What is a legal instance of a relation?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

REVERSED

  • Primary key
  • Foreign key (referential integrity)
  • Value-based
  • Tuple-based
A

What are the 4 types of integrity constraints?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

REVERSED

Means the primary key cannot be null

A

What is entity integrity?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

REVERSED

a key that could be the primary key but is not classified as the primary key

A

What is an alternate key?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

REVERSED

  • NO ACTION - delete/update is rejected
  • CASCADE - make the same changes to all tuples that refer to the updated/deleted tuple
  • SET NULL / SET DEFAULT - sets foreign key value of referencing tuple to NULL or a default value
A

What are the three ways of enforcing referential integrity in SQL?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

REVERSED

CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY(sid,cid), FOREIGN KEY(sid) REFERNCES Students
ON DELETE CASCADE
ON UPDATE SET DEFAULT)

A

How do you set the method of enforcing referential integrity when creating an SQL table?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
# REVERSED it will store the three characters followed by 2 spaces
What happens if you enter 3 characters into an attribute with CHAR(5)?
26
# REVERSED defines constraints on the values of a particular attributes
What is the check clause?
27
# REVERSED CREATE TABLE section( semester VARCHAR(6) CHECK(semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’)), year NUMERIC(4,0) CHECK(year\>1990))
How do you add a check into an SQL table?
28
# REVERSED CHECK(semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’) AND (year\>1990))
What is a tuple based check?
29
# REVERSED CHECK(time\_slot\_id IN (SELECT time\_slot\_id FROM time\_slot))
What is a complex check clause?
30
# REVERSED name VARCHAR(20) NOT NULL
How do you declare that an attribute can't be null?
31
# REVERSED - 1 table for each entity - 1 table for each relationship - Each cell contains a single value - If you have big relations, decompose them
What is the normalisation theory for designing a good database? (4)
32
# REVERSED means when you decompose a table and then join you get the original table back
What is lossless decomposition?
33
# REVERSED let R a relation schema (with constraints). A decomposition R1, R2, … Rn of R is called lossless iff. for each valid relation (instance) r(R): r = piR1(r) join piR2(r) join …
What is the official definition of a lossless decomposition?
34
# REVERSED Attribute(R1) UNION Attribute(R2) = Attribute(R) Attribute(R1) INTERSECTION Attribute(R2) is not empty Attribute(R1) INTERSECTION Attribute(R2) -\> Attribute(R1) or Attribute(R1) INTERSECTION Attribute(R2) -\> Attribute(R2)
What are the 3 rules to identify lossless decomposition?
35
# REVERSED - the values of a set of attributes X determine the values of another set of attributes Y - Denoted by X -\> Y (X determines Y)
What is functional dependence?
36
# REVERSED K is a superkey for relation R iff. K -\> R
What is a superkey in terms of functional dependence?
37
# REVERSED K is a candidate key for R iff. K -\> R AND for any X proper subset of K, X -/-\> R
What is a candidate key in terms of functional dependence?
38
# REVERSED functional dependencies allow us to express constraints that cannot be expressed using superkeys
What is the purpose of functional dependencies?
39
# REVERSED The set of functional dependencies logically implied by F is the closure of F, denoted F+ ie. to find the closure B+, find all the attributes that can be determined starting with B
What is the closure of a set of functional dependencies?
40
# REVERSED - If X subset of Y, then Y -\> X (reflexivity) - If X -\> Y, then AX -\> AY (augmentation) (A is a set of attributes) - If X -\> Y and Y -\> W, then X -\> W (transitivity)
What are Armstrongs axioms? (3)
41
# REVERSED The minimal set of attributes for which their closure is all the set of attributes in a relation
What is a candidate key in terms of closure?
42
# REVERSED if X -\> Y and X -\> W, then X -\> YW if X -\> YW, then X -\> Y and X -\> W if X -\> Y and WY -\> Z, then XW -\> Z If X is a candidate key, then X -\> Y for all Y
What functional dependencies can be derived from armstrongs axoims? (4)
43
# REVERSED For relational schema R(XYZ), the following holds: If X -\> Y then the decomposition R1(XY) R2(XZ) is lossless.
How do you check if a decomposition is lossless using functional dependencies?
44
# REVERSED A functional dependency that holds only under some contraint | ([country = UK, zip] -\> street)
What is a conditional functional dependency? How is it expressed?
45
# REVERSED - personalisation: content adapted to user - Customisation: structure adapted to user
What are personalisation and customisation in terms of data?
46
# REVERSED - a set of local databases, each with a local schema and a local instance - A global integrated schema - A set of mappings between the global and local schema
What is an information integration system
47
# REVERSED -leave the data in the sources -When a query comes in: \* Determine their relevant sources to the query \* Break down the query into sub-queries for the sources \* Get the answers from the sources, filter them if needed and combine them appropriately
what is visual integration architecture / on demand integration?
48
# REVERSED -extension of jaro-similarity that gives higher weight to matching prefix Jw(S1, S2) = JaroSim + P\*L\*(1-JaroSim) - P is a scaling factor, 0.1 by default - L is length of common prefix up to max 4
What is jaro-winkler similarity?
49
# REVERSED Text variations, local knowledge, evolving nature of data, new functionalities
What are reasons for discrepancies between entities? (4)
50
# REVERSED identify the different structures/records that model the same real-world object
What is entity resolution?
51
# REVERSED improves data quality and integrity, fosters re-use of existing data sources, optimises space
Why is entity resolution useful? (3)
52
# REVERSED - number of operations to convert from 1st string to 2nd string - delete and insert character with cost 1 - substitute character with cost 2
What is the levenshtein edit distance?
53
# REVERSED - overcome limitation of edit distance with shortened strings (abbreviations) - Considered two extra operations: open gap and extend gap (with small cost) Cost = insert + open + extend
What is the gap distance?
54
# REVERSED JaroSim(S1, S2) = (1/3)\*(C/|S1| + C/|S2| + (C-T)/C) C=common/matching characters in S1 and S2 T=transpositions/2 (transposition=matching but different sequence order) characters are considered matching when they are the same and not further than [max(|S1|, |S2|)/2]-1 apart
What is jaro-similarity?
55
# REVERSED Represent document as a set of its k-shingles Use jaccard set similarity
What is a similarity metric for documents?
56
# REVERSED - algorithm that indexes names by their sounds when pronounced in English - consists of the first letter of the name followed by three numbers - Remove all W, H - B,F,P,V encoded as 1 - C,G,J,K,Q,S,X,Z encoded as 2 - D,T encoded as 3 - L encoded as 4 - M,N encoded as 5 - R encoded as 6 - Remove vowels
What is the phonetic algorithm/sound encoding algorithm?
57
# REVERSED -The Jaccard similarity of two sets is the size of their intersection divided by the size of their union: sim(C1,C2) = |C1 intersection C2| / |C1 union C2|
What is jaccard similarity?
58
# REVERSED d(C1,C2) = 1 - sim(C1,C2)
What is jaccard distance?
59
# REVERSED Jaccard similarity but counts repetition of elements Max is 0.5
What is jaccard bag similarity?
60
# REVERSED -a sequence of k-tokens (characters or words) that appear in the document E.g. for document D1 = abcab, Set of 2-shingles S(D1) = {ab, bc, ca}
What is a k-shingle / k-gram?
61
# REVERSED 1. Shingling: convert documents to sets 2. Min-hashing: convert large sets to short signatures, while preserving similarity 3. Locality-sensitive hashing: focus on pairs of signatures likely to be from similar documents (candidate pairs)
What are the 3 steps for finding similar documents?
62
# REVERSED - get set of shingles - represent as a matrix where rows are shingles and columns are documents - matrix has 1 if the shingle is in the document
What is the process of shingling?
63
# REVERSED no. of rows where both columns are 1 / no. of rows where either column is 1
How do you find the similarity of two sets(documents) from the shingle matrix?
64
# REVERSED find similar columns while computing small signatures
What is the goal of hashing?
65
# REVERSED “hash”each column C to a small signature h(C), such that: 1. H(C) is small enough that the signature fits in RAM 2. sim(C1,C2) is the same as the “similarity” of signatures h(C1) and h(C2)
What is the key idea of hashing?
66
# REVERSED - start with a permutation for each row of the boolean matrix - get the first occurring row in the permuted order that correlates to a 1 in the document and store this as the new column value in the Signature matrix - repeat for many permutations
How do you perform min-hashing?
67
# REVERSED Define a minhash function h\_pi(C) = the index of the first(in the permuted order pi) row which column C has value 1
How do you define a minhash function?
68
# REVERSED Pr[h\_pi(C1) = h\_pi(C2)] = sim(C1, C2) the probability that the minhash function for a random permutation of rows produces the same value for two sets equals the Jaccard similarity of those sets
What is the property of the minhash function?
69
# REVERSED the fraction of the hash functions in which they agree (for which signatures have the same value)
What is the similarity of two signatures?
70
# REVERSED Smaller expected error
What happens as the signatures get longer?
71
# REVERSED generate from the collection of all elements (signatures) a small list of candidate pairs: pairs of elements whose similarity must be evaluated
What is locality sensitive hashing?
72
# REVERSED Divide the signature matrix into b bands, consisting of r rows. When rows are the same in the band, they are hashed to the same bucket. They have b opportunities to be hashed to the same bucket -signatures hashed to the same bucket are compared
How do you perform locality sensitive hashing?
73
# REVERSED 1. Select the most appropriate attribute name(s) with respect to noise and distinctiveness 2. Transform the corresponding value(s) into a blocking key (BK) 3. For each BK, create one block that contains all entities having this BK in their transformation \* works as a hash function - blocks on the equality of BKs
What is the standard blocking algorithm?