Week 3: Advanced SQL and Data Integration Flashcards

1
Q

What is the check clause?

A

defines constraints on the values of a particular attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is jaccard bag similarity?

A

Jaccard similarity but counts repetition of elements
Max is 0.5

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are personalisation and customisation in terms of data?

A
  • personalisation: content adapted to user
  • Customisation: structure adapted to user
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the official definition of a lossless decomposition?

A

let R a relation schema (with constraints). A decomposition R1, R2, … Rn of R is called lossless iff. for each valid relation (instance) r(R): r = piR1(r) join piR2(r) join …

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What functional dependencies can be derived from armstrongs axoims? (4)

A

if X -> Y and X -> W, then X -> YW
if X -> YW, then X -> Y and X -> W
if X -> Y and WY -> Z, then XW -> Z
If X is a candidate key, then X -> Y for all Y

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How do you create a view in SQL?

A

CREATE VIEW YoungActiveStudents (name, grade)
AS SELECT S.name, E.grade
FROM Students S, Enrolled E
Where S.Sid = E.Sid and S.age <21

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Why is entity resolution useful? (3)

A

improves data quality and integrity, fosters re-use of existing data sources, optimises space

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the standard blocking algorithm?

A
  1. Select the most appropriate attribute name(s) with respect to noise and distinctiveness
  2. Transform the corresponding value(s) into a blocking key (BK)
  3. For each BK, create one block that contains all entities having this BK in their transformation
    * works as a hash function - blocks on the equality of BKs
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the similarity of two signatures?

A

the fraction of the hash functions in which they agree (for which signatures have the same value)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is entity integrity?

A

Means the primary key cannot be null

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do you delete an attribute from an SQL table?

A

ALTER TABLE Students
DROP firstYear

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a legal instance of a relation?

A

one that satisfies all specified ICs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a view in SQL and what is it used for?

A

A view is just a relation, but we store a definition, rather than a set of tuples
Views can be used to present necessary information (or a summary) while hiding details in underlying relations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is locality sensitive hashing?

A

generate from the collection of all elements (signatures) a small list of candidate pairs: pairs of elements whose similarity must be evaluated

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a similarity metric for documents?

A

Represent document as a set of its k-shingles
Use jaccard set similarity

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the three ways of enforcing referential integrity in SQL?

A
  • NO ACTION - delete/update is rejected
  • CASCADE - make the same changes to all tuples that refer to the updated/deleted tuple
  • SET NULL / SET DEFAULT - sets foreign key value of referencing tuple to NULL or a default value
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is the purpose of integrity constraints?

A

Integrity constraints guard against accidental damage to the database, by ensuring that authorised changes do not result in a loss of data consistency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is a candidate key in terms of closure?

A

The minimal set of attributes for which their closure is all the set of attributes in a relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you find the similarity of two sets(documents) from the shingle matrix?

A

no. of rows where both columns are 1 / no. of rows where either column is 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

How do you add a check into an SQL table?

A

CREATE TABLE section( semester VARCHAR(6) CHECK(semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’)), year NUMERIC(4,0) CHECK(year>1990))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is functional dependence?

A
  • the values of a set of attributes X determine the values of another set of attributes Y
  • Denoted by X -> Y (X determines Y)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are covering constraints?

A

instances of the children of an entity include all instances of their parent (ie cover it)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is the key idea of hashing?

A

“hash”each column C to a small signature h(C), such that:

  1. H(C) is small enough that the signature fits in RAM
  2. sim(C1,C2) is the same as the “similarity” of signatures h(C1) and h(C2)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is the gap distance?

A
  • overcome limitation of edit distance with shortened strings (abbreviations)
  • Considered two extra operations: open gap and extend gap (with small cost)

Cost = insert + open + extend

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
What is jaccard similarity?
-The Jaccard similarity of two sets is the size of their intersection divided by the size of their union: sim(C1,C2) = |C1 intersection C2| / |C1 union C2|
26
How do you perform min-hashing?
- start with a permutation for each row of the boolean matrix - get the first occurring row in the permuted order that correlates to a 1 in the document and store this as the new column value in the Signature matrix - repeat for many permutations
27
What is jaccard distance?
d(C1,C2) = 1 - sim(C1,C2)
28
What is jaro-similarity?
JaroSim(S1, S2) = (1/3)\*(C/|S1| + C/|S2| + (C-T)/C) C=common/matching characters in S1 and S2 T=transpositions/2 (transposition=matching but different sequence order) characters are considered matching when they are the same and not further than [max(|S1|, |S2|)/2]-1 apart
29
What are reasons for discrepancies between entities? (4)
Text variations, local knowledge, evolving nature of data, new functionalities
30
How do you check if a decomposition is lossless using functional dependencies?
For relational schema R(XYZ), the following holds: If X -\> Y then the decomposition R1(XY) R2(XZ) is lossless.
31
What is the normalisation theory for designing a good database? (4)
- 1 table for each entity - 1 table for each relationship - Each cell contains a single value - If you have big relations, decompose them
32
What is a conditional functional dependency? How is it expressed?
A functional dependency that holds only under some contraint | ([country = UK, zip] -\> street)
33
How do you set the method of enforcing referential integrity when creating an SQL table?
CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY(sid,cid), FOREIGN KEY(sid) REFERNCES Students ON DELETE CASCADE ON UPDATE SET DEFAULT)
34
What is the property of the minhash function?
Pr[h\_pi(C1) = h\_pi(C2)] = sim(C1, C2) the probability that the minhash function for a random permutation of rows produces the same value for two sets equals the Jaccard similarity of those sets
35
What is the closure of a set of functional dependencies?
The set of functional dependencies logically implied by F is the closure of F, denoted F+ ie. to find the closure B+, find all the attributes that can be determined starting with B
36
what is visual integration architecture / on demand integration?
-leave the data in the sources -When a query comes in: \* Determine their relevant sources to the query \* Break down the query into sub-queries for the sources \* Get the answers from the sources, filter them if needed and combine them appropriately
37
What are overlap constraints?
specify that the children of an entity do/don’t overlap
38
What are Armstrongs axioms? (3)
- If X subset of Y, then Y -\> X (reflexivity) - If X -\> Y, then AX -\> AY (augmentation) (A is a set of attributes) - If X -\> Y and Y -\> W, then X -\> W (transitivity)
39
What is a k-shingle / k-gram?
-a sequence of k-tokens (characters or words) that appear in the document E.g. for document D1 = abcab, Set of 2-shingles S(D1) = {ab, bc, ca}
40
What is the goal of hashing?
find similar columns while computing small signatures
41
What is the purpose of functional dependencies?
functional dependencies allow us to express constraints that cannot be expressed using superkeys
42
How do you change the value in an attribute for all or some tuples?
UPDATE Students SET grade = grade\*1.2 WHERE ...
43
How do you define a minhash function?
Define a minhash function h\_pi(C) = the index of the first(in the permuted order pi) row which column C has value 1
44
How do you add a row into an SQL table? (2 ways depending on the order)
INSERT INTO Students VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2) INSERT INTO Students(name, Sid, login, age, gpa) VALUES (‘Smith’, 53688, ‘smith@ee’, 18, 3.2)
45
What happens if you enter 3 characters into an attribute with CHAR(5)?
it will store the three characters followed by 2 spaces
46
What is a candidate key in terms of functional dependence?
K is a candidate key for R iff. K -\> R AND for any X proper subset of K, X -/-\> R
47
How do you set a foreign key when creating a table in SQL?
CREATE TABLE Works\_In(ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees)
48
What is lossless decomposition?
means when you decompose a table and then join you get the original table back
49
How can you delete a view in SQL?
DROP VIEW
50
What are the 4 types of integrity constraints?
- Primary key - Foreign key (referential integrity) - Value-based - Tuple-based
51
What is a tuple based check?
CHECK(semester IN (‘Fall’, ‘Winter’, ‘Spring’, ‘Summer’) AND (year\>1990))
52
What is the process of shingling?
- get set of shingles - represent as a matrix where rows are shingles and columns are documents - matrix has 1 if the shingle is in the document
53
How do you create a table in SQL?
CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)
54
What is an alternate key?
a key that could be the primary key but is not classified as the primary key
55
What is an integrity constraint?
a constraint is a relationship among data elements that the DBMS is required to enforce
56
What is a complex check clause?
CHECK(time\_slot\_id IN (SELECT time\_slot\_id FROM time\_slot))
57
What is a superkey in terms of functional dependence?
K is a superkey for relation R iff. K -\> R
58
What is an information integration system
- a set of local databases, each with a local schema and a local instance - A global integrated schema - A set of mappings between the global and local schema
59
How do you delete an entire table including schema in SQL?
DROP TABLE Students
60
How do you delete the contents of a table but keep the relation in SQL?
DELETE FROM Students
61
How do you declare that an attribute can't be null?
name VARCHAR(20) NOT NULL
62
How do you delete certain rows from a table in SQL?
DELETE FROM Students WHERE ...
63
What are the 3 rules to identify lossless decomposition?
Attribute(R1) UNION Attribute(R2) = Attribute(R) Attribute(R1) INTERSECTION Attribute(R2) is not empty Attribute(R1) INTERSECTION Attribute(R2) -\> Attribute(R1) or Attribute(R1) INTERSECTION Attribute(R2) -\> Attribute(R2)
64
How do you add an empty column to an SQL table?
ALTER TABLE Students ADD COLUMN firstYear integer
65
What is the phonetic algorithm/sound encoding algorithm?
- algorithm that indexes names by their sounds when pronounced in English - consists of the first letter of the name followed by three numbers - Remove all W, H - B,F,P,V encoded as 1 - C,G,J,K,Q,S,X,Z encoded as 2 - D,T encoded as 3 - L encoded as 4 - M,N encoded as 5 - R encoded as 6 - Remove vowels
66
How do you perform locality sensitive hashing?
Divide the signature matrix into b bands, consisting of r rows. When rows are the same in the band, they are hashed to the same bucket. They have b opportunities to be hashed to the same bucket -signatures hashed to the same bucket are compared
67
What is jaro-winkler similarity?
-extension of jaro-similarity that gives higher weight to matching prefix Jw(S1, S2) = JaroSim + P\*L\*(1-JaroSim) - P is a scaling factor, 0.1 by default - L is length of common prefix up to max 4
68
What is the levenshtein edit distance?
- number of operations to convert from 1st string to 2nd string - delete and insert character with cost 1 - substitute character with cost 2
69
How do you set a primary key when creating a table in SQL? (2 ways)
CREATE TABLE test (sid INTEGER PRIMARY KEY, name VARCHAR(30), major VARCHAR(30)) CREATE TABLE test (sid INTEGER, name VARCHAR(30), major VARCHAR(30), PRIMARY KEY (sid))
70
What happens as the signatures get longer?
Smaller expected error
71
What are the 3 steps for finding similar documents?
1. Shingling: convert documents to sets 2. Min-hashing: convert large sets to short signatures, while preserving similarity 3. Locality-sensitive hashing: focus on pairs of signatures likely to be from similar documents (candidate pairs)
72
What are entity hierarchies?
organise a group of entity sets into a parent/child hierarchy
73
What is entity resolution?
identify the different structures/records that model the same real-world object