Chapters 13 & 14 Knowledge Testers Flashcards

Question 1

Q

Can you explain why joins in relational databases are slow, especially
in the context of traversal?

Answer

A

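Joins do not scale well for traversal. Each hop follows a foreign key, which means matching it against primary keys through an index lookup (or a scan). A traversal over many hops therefore becomes a chain of joins whose cost grows with the size of the tables and with the number of hops. Traversing a high number of relationships, or looking for patterns in relationships, takes a long time.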
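A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `knows(src, dst)` edge table: every extra hop of the traversal adds one more self-join that has to match keys again.

```python
import sqlite3

# Hypothetical edge table: one row per "src knows dst" relationship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO knows VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("bob", "erin")],
)

# One hop: a plain selection.
one_hop = conn.execute("SELECT dst FROM knows WHERE src = ?", ("alice",)).fetchall()

# Two hops: one self-join.
two_hops = conn.execute(
    """
    SELECT k2.dst
    FROM knows AS k1
    JOIN knows AS k2 ON k2.src = k1.dst
    WHERE k1.src = ?
    """,
    ("alice",),
).fetchall()

# Three hops: two self-joins, and so on for every further hop.
three_hops = conn.execute(
    """
    SELECT k3.dst
    FROM knows AS k1
    JOIN knows AS k2 ON k2.src = k1.dst
    JOIN knows AS k3 ON k3.src = k2.dst
    WHERE k1.src = ?
    """,
    ("alice",),
).fetchall()

print(sorted(r[0] for r in one_hop))     # ['bob']
print(sorted(r[0] for r in two_hops))    # ['carol', 'erin']
print(sorted(r[0] for r in three_hops))  # ['dave']
```

The query text itself grows with the number of hops, and a traversal of unknown depth cannot be written as a fixed number of joins at all.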

Question 2

Q

Can you explain why, while denormalizing the data partly addresses
the issue of slow joins, this is still not satisfactory?

Answer

A

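Denormalizing means precomputing joins by nesting related records inside each other, which produces trees (e.g., JSON documents). This works for hierarchical data, but trees have no cycles: a graph with cycles cannot be nested into a finite tree without cutting it somewhere and duplicating records, which causes redundancy and update anomalies.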
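A small sketch of the problem, assuming a hypothetical adjacency dict and helper names `nest` and `count_ids`: embedding each neighbor inside its parent terminates for a tree, but on a cycle the same record reappears at every other level, so the nesting must be cut off at an arbitrary depth.

```python
from typing import Dict, List


def nest(adj: Dict[str, List[str]], node: str, depth: int) -> dict:
    """Denormalize by embedding each neighbor inside its parent, up to `depth` levels."""
    if depth == 0:
        return {"id": node, "children": None}  # cut off: on a cycle this would never end
    return {"id": node, "children": [nest(adj, n, depth - 1) for n in adj.get(node, [])]}


def count_ids(tree: dict) -> Dict[str, int]:
    """Count how many copies of each record the nested document contains."""
    counts = {tree["id"]: 1}
    for child in tree["children"] or []:
        for key, value in count_ids(child).items():
            counts[key] = counts.get(key, 0) + value
    return counts


tree_like = {"a": ["b", "c"], "b": [], "c": []}  # no cycle
cyclic = {"a": ["b"], "b": ["a"]}                # a -> b -> a

print(count_ids(nest(tree_like, "a", 5)))  # {'a': 1, 'b': 1, 'c': 1}
print(count_ids(nest(cyclic, "a", 5)))     # {'a': 3, 'b': 3}
```

The number of duplicated copies grows with the chosen depth, which is exactly the redundancy that normalization was meant to avoid.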

Question 3

Q

Can you explain why reverse traversals are also tricky to support in
a relational database?

Answer

A

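A foreign key is stored on one side of a relationship only, so following it forward is a cheap primary-key lookup. Traversing it in reverse (e.g., finding all employees of a given manager) requires either a full table scan or an additional index on the foreign-key column. Such indexes must be created for every relationship one wants to traverse backwards, and maintaining them slows down writes.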
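One way to observe this is SQLite's `EXPLAIN QUERY PLAN`, sketched below with a hypothetical `employee` table. The exact wording of the plan text varies between SQLite versions, so the sketch only looks for the keywords `SEARCH`, `SCAN`, and the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the foreign key manager_id is stored only on the employee side.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Bo", 1), (3, "Cy", 1), (4, "Di", 2)],
)


def plan(sql: str, *params) -> str:
    """Return SQLite's plan description (last column of each EXPLAIN QUERY PLAN row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


forward = "SELECT manager_id FROM employee WHERE id = ?"  # follow the FK: primary-key lookup
reverse = "SELECT id FROM employee WHERE manager_id = ?"  # reverse: who reports to this manager?

forward_plan = plan(forward, 4)         # a SEARCH on the primary key
reverse_plan_before = plan(reverse, 1)  # no index on manager_id: a full SCAN

# Supporting the reverse direction takes an extra index, which every write must now maintain.
conn.execute("CREATE INDEX idx_manager ON employee(manager_id)")
reverse_plan_after = plan(reverse, 1)   # a SEARCH using idx_manager

reports = sorted(r[0] for r in conn.execute(reverse, (1,)))
print(forward_plan, reverse_plan_before, reverse_plan_after, reports, sep="\n")
```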

Question 4

Q

Can you explain what index-free adjacency is and why it solves the
above problems?

Answer

A

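With index-free adjacency, every node physically stores direct pointers to its adjacent relationships (and, through them, to its neighboring nodes). Following a relationship is a pointer dereference rather than an index lookup or a join, so each hop costs constant time, independent of the total size of the graph. Because relationships are reachable from both endpoints, reverse traversal is just as cheap, and traversing data that has cycles poses no particular problem.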
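A minimal sketch of the idea, in which Python object references stand in for the physical record pointers a graph database would use (the `Node` class and `reachable` function are hypothetical): a traversal never consults an index, it only follows references stored in each node.

```python
from collections import deque
from typing import List


class Node:
    """A node that stores direct references to its neighbors (no index involved)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.out: List["Node"] = []  # outgoing relationships
        self.inc: List["Node"] = []  # incoming relationships: reverse traversal is just as cheap

    def link(self, other: "Node") -> None:
        self.out.append(other)
        other.inc.append(self)


def reachable(start: Node) -> List[str]:
    """Breadth-first traversal: each hop dereferences a stored reference; `seen` handles cycles."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node.name)
        for nxt in node.out:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


a, b, c = Node("a"), Node("b"), Node("c")
a.link(b)
b.link(c)
c.link(a)  # a cycle: a -> b -> c -> a

print(reachable(a))            # ['a', 'b', 'c']
print([n.name for n in a.inc]) # ['c'], found without any index
```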

Question 5

Q

Can you define a graph mathematically? What about a directed/undirected graph?

Answer

A

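A graph is a pair G = (V, E), where V is a set of nodes (vertices) and E is a set of edges. In a directed graph, edges have a direction: E ⊆ V × V is a set of ordered pairs (u, v). In an undirected graph, edges are unordered pairs {u, v}. A graph can equivalently be given by an adjacency matrix, or by a map from each node to its neighbors.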
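A small worked example on three nodes (chosen for illustration), showing the directed and undirected versions and their adjacency matrices:

```latex
% Directed graph: edges are ordered pairs
V = \{a, b, c\}, \qquad E = \{(a,b),\ (b,c),\ (c,a)\} \subseteq V \times V

% Undirected version: edges are unordered pairs, i.e. 2-element subsets of V
E' = \{\{a,b\},\ \{b,c\},\ \{c,a\}\}

% Adjacency matrices, rows and columns ordered a, b, c: A_{ij} = 1 iff there is an edge from i to j
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}, \qquad
A' = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} = A + A^{\mathsf{T}}
```

The undirected adjacency matrix is always symmetric, since {u, v} and {v, u} are the same edge.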

Question 6

Q

Can you give at least three examples of ways that graphs can be
represented in memory?

Answer

A

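Adjacency list: each node maps to the list of its neighbors. Adjacency matrix: nodes on both rows and columns, with a 1 (or a weight) where an edge exists. Incidence matrix: nodes on rows and edges on columns.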
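A sketch building all three representations from the same small directed edge list (node and edge names are illustrative). For the incidence matrix it uses one common convention for directed graphs, -1 for the edge's origin and +1 for its target; other texts flip the signs.

```python
from typing import Dict, List, Tuple

nodes = ["a", "b", "c"]
edges: List[Tuple[str, str]] = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]

# 1. Adjacency list: each node maps to the list of nodes it points to.
adj_list: Dict[str, List[str]] = {n: [] for n in nodes}
for src, dst in edges:
    adj_list[src].append(dst)

# 2. Adjacency matrix: nodes on both rows and columns; 1 where an edge exists.
idx = {n: i for i, n in enumerate(nodes)}
adj_matrix = [[0] * len(nodes) for _ in nodes]
for src, dst in edges:
    adj_matrix[idx[src]][idx[dst]] = 1

# 3. Incidence matrix: nodes on rows, edges on columns (-1 = origin, +1 = target).
inc_matrix = [[0] * len(edges) for _ in nodes]
for j, (src, dst) in enumerate(edges):
    inc_matrix[idx[src]][j] = -1
    inc_matrix[idx[dst]][j] = 1

print(adj_list)    # {'a': ['b', 'c'], 'b': ['c'], 'c': ['a']}
print(adj_matrix)  # [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
print(inc_matrix)  # [[-1, 0, 1, -1], [1, -1, 0, 0], [0, 1, -1, 1]]
```

The adjacency list stays small for sparse graphs, while the adjacency matrix always takes |V|² cells but answers "is there an edge from u to v?" in one lookup.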

Question 7

Q

Can you explain what a labeled property graph is and what it is
made of?

Answer

A

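A labeled property graph consists of nodes and relationships (directed edges). Each node can have zero or more labels, each relationship has a type (its label), and both nodes and relationships can carry properties, i.e., key-value pairs. Relational tables can be stored as labeled property graphs: records become nodes, the table name becomes the node label, attributes become properties, and foreign keys become relationships.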
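A minimal data model for a labeled property graph, sketched with dataclasses. The class names, the hypothetical `Person`/`Company` rows, and the `since` property are illustrative only; the mapping follows the answer above (table name to label, columns to properties, foreign key to relationship).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass
class Node:
    id: int
    labels: FrozenSet[str]  # zero or more labels, e.g. {"Person"}
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str  # exactly one type, e.g. "WORKS_FOR"
    start: int     # id of the origin node (relationships are directed)
    end: int       # id of the target node
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical relational rows: Person has a foreign key company_id into Company.
person_row = {"id": 1, "name": "Ada", "company_id": 10}
company_row = {"id": 10, "name": "Acme"}

# Table name -> label, columns -> properties, foreign key -> relationship.
ada = Node(person_row["id"], frozenset({"Person"}), {"name": person_row["name"]})
acme = Node(company_row["id"], frozenset({"Company"}), {"name": company_row["name"]})
works_for = Relationship("WORKS_FOR", ada.id, person_row["company_id"], {"since": 2020})

print(ada, acme, works_for, sep="\n")
```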

Question 8

Q

Can you explain the difference between graph databases such as
Neo4j on the one hand, and RDF triple stores on the other hand?

Answer

A

Neo4j uses labeled property graphs made of nodes, relationships, and properties; it is optimized for graph traversal and has a flexible schema.

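Triple stores: the graph is a set of triples, i.e., a list of edges, each given by the label of its origin node (subject), the label of the edge (predicate), and the label of its target node (object). Labels can be URIs, atomic values (literals), or missing (blank nodes). Properties are not stored inside nodes: they are expressed as further triples whose object is a literal.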
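To contrast the two models, here is a sketch expressing the same small labeled property graph as RDF-style triples. The `http://example.org/` namespace and all node names are hypothetical; `rdf:type` is the standard RDF vocabulary IRI. Note that the relationship property has nowhere to go in a plain triple.

```python
from typing import List, Tuple

EX = "http://example.org/"  # hypothetical namespace for the IRIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Labeled property graph: properties live inside nodes and relationships.
lpg_nodes = {
    "ada": {"labels": ["Person"], "props": {"name": "Ada"}},
    "acme": {"labels": ["Company"], "props": {"name": "Acme"}},
}
lpg_rels = [("ada", "WORKS_FOR", "acme", {"since": 2020})]

# RDF: everything becomes a (subject, predicate, object) triple.
triples: List[Tuple[str, str, object]] = []
for node_id, node in lpg_nodes.items():
    for label in node["labels"]:
        triples.append((EX + node_id, RDF_TYPE, EX + label))  # label -> rdf:type triple
    for key, value in node["props"].items():
        triples.append((EX + node_id, EX + key, value))       # property -> literal object
for src, rel_type, dst, _props in lpg_rels:
    triples.append((EX + src, EX + rel_type, EX + dst))       # relationship -> IRI object
    # _props ("since") has no place here: a plain triple cannot carry edge properties,
    # so RDF needs extra modelling (e.g. reification) to represent them.

for t in triples:
    print(t)
```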

Question 9

Q

Can you explain the RDF data model?

Answer

A

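An RDF graph is a set of triples (subject, predicate, object), i.e., a list of edges, each given by the label of its origin node (subject), the label of the edge (predicate), and the label of its target node (object). Labels can be URIs, atomic values (literals), or missing (blank nodes).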

Question 10

Q

Can you give the usual constraints on RDF graphs and explain how they can be lifted into generalized graphs?

Answer

A

Standard RDF imposes strict rules on the subject, predicate, object format: the subject must be a URI (IRI) or a blank node, the predicate must be a URI, and only the object may be a literal. Generalized RDF lifts these constraints by allowing URIs, blank nodes, and literals in any position (e.g., literals as subjects), which increases flexibility.
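The two sets of constraints can be written as two validity checks. The term classes `IRI`, `Literal`, and `BlankNode` below are hypothetical stand-ins for the three kinds of RDF terms, not a real library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class BlankNode:
    label: str


def is_valid_rdf(s, p, o) -> bool:
    """Standard RDF: subject is an IRI or blank node, predicate is an IRI, object is any term."""
    return (isinstance(s, (IRI, BlankNode))
            and isinstance(p, IRI)
            and isinstance(o, (IRI, BlankNode, Literal)))


def is_valid_generalized_rdf(s, p, o) -> bool:
    """Generalized RDF: IRIs, blank nodes and literals are allowed in every position."""
    terms = (IRI, BlankNode, Literal)
    return all(isinstance(t, terms) for t in (s, p, o))


t = (Literal(42), IRI("http://example.org/isAnswerTo"), BlankNode("q"))  # literal as subject
print(is_valid_rdf(*t))              # False
print(is_valid_generalized_rdf(*t))  # True
```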

Question 11

Q

Can you explain how query-by-example is different from a classical query language?

Answer

A

Query-by-example lets users describe the results they want by giving an example of them: a template or pattern with some values filled in and blanks elsewhere, often entered through a graphical interface, and the system returns all the data that matches it. Classical query languages like SQL or SPARQL instead require users to write structured queries using a defined syntax.
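A toy illustration of the query-by-example idea over a list of records, with a hypothetical helper `query_by_example`: the user fills in a template record, and fields left blank (here `None` or omitted) match anything.

```python
from typing import Any, Dict, List


def query_by_example(rows: List[Dict[str, Any]], example: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return every row that agrees with the example on all fields the example fills in.
    Fields set to None (or left out of the example) are blanks that match anything."""
    return [
        row for row in rows
        if all(value is None or row.get(key) == value for key, value in example.items())
    ]


people = [
    {"name": "Ada", "city": "Zurich", "role": "engineer"},
    {"name": "Bo", "city": "Basel", "role": "engineer"},
    {"name": "Cy", "city": "Zurich", "role": "manager"},
]

# "Show me rows that look like this": an engineer in Zurich, any name.
engineers_in_zurich = query_by_example(people, {"name": None, "city": "Zurich", "role": "engineer"})
print(engineers_in_zurich)
```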

Question 12

Q

Can you name a few graph querying languages and relate them to products like Neo4j or RDF triple stores?

Answer

A

Cypher is used in Neo4j to query property graphs, while SPARQL is used in RDF triple stores to query RDF data. Both are specialized for graph traversal and manipulation but align with different graph models.

Question 13

Q

Can you sketch the physical architecture of Neo4j?

Answer

A

Neo4j’s architecture includes a storage engine (for nodes, relationships, and properties), a transaction log, a query engine, and clustering for high availability. Data is stored in a graph-native format optimized for traversal.

Question 14

Q

Can you explain how properties and relationships are physically stored in Neo4j?

Answer

A

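In Neo4j's record-based store, nodes, relationships, and properties are kept in separate store files made of fixed-size records, so a record is located directly from its ID (ID × record size gives its offset). A node record points to its first relationship and its first property. A relationship record points to its start and end nodes, its type, its first property, and the previous and next relationships of both its start and end nodes; this forms a doubly linked chain of relationships per node, which enables efficient traversal. Properties are stored as key-value pairs in property records, chained as a linked list from the node or relationship that owns them.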
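A simplified sketch of this layout, with Python lists standing in for store files so that an ID is simply an offset. Assumptions: the class names are hypothetical, only "next" pointers are kept (the real relationship chains are doubly linked), and property records are omitted.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class NodeRecord:
    first_rel: Optional[int] = None  # ID of the first relationship in this node's chain


@dataclass
class RelRecord:
    start: int
    end: int
    rel_type: str
    start_next: Optional[int]  # next relationship in the start node's chain
    end_next: Optional[int]    # next relationship in the end node's chain


class RecordStore:
    def __init__(self) -> None:
        self.nodes: List[NodeRecord] = []  # "node store": ID = position in the list
        self.rels: List[RelRecord] = []    # "relationship store"

    def create_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def create_relationship(self, start: int, end: int, rel_type: str) -> int:
        rel_id = len(self.rels)
        # Prepend the new record to the relationship chains of both endpoints.
        self.rels.append(RelRecord(start, end, rel_type,
                                   start_next=self.nodes[start].first_rel,
                                   end_next=self.nodes[end].first_rel))
        self.nodes[start].first_rel = rel_id
        self.nodes[end].first_rel = rel_id
        return rel_id

    def relationships(self, node_id: int) -> Iterator[int]:
        """Walk the node's chain by following record pointers, never an index."""
        rel_id = self.nodes[node_id].first_rel
        while rel_id is not None:
            rel = self.rels[rel_id]
            yield rel_id
            rel_id = rel.start_next if rel.start == node_id else rel.end_next


store = RecordStore()
a, b, c = store.create_node(), store.create_node(), store.create_node()
store.create_relationship(a, b, "KNOWS")  # relationship 0
store.create_relationship(c, a, "KNOWS")  # relationship 1
store.create_relationship(b, c, "LIKES")  # relationship 2
print(list(store.relationships(a)))       # [1, 0]
```

Because each relationship record sits in the chains of both endpoints, the same walk works for incoming and outgoing relationships alike.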

Question 15

Q

Can you explain why sharding graphs is a hard problem (even though it is now largely solved)?

Answer

A

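Sharding graphs is hard because relationships often span shard boundaries: traversing such a relationship requires a network hop, which leads to high communication overhead and complex query execution. Finding a partition that is balanced and cuts as few relationships as possible is NP-hard in general. Solutions like partitioning by community detection or replication have reduced this complexity.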
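The quantity a graph partitioner tries to minimize is the number of cut edges. A toy comparison on a hypothetical graph made of two tightly knit groups shows how much a structure-aware placement can save over a placement that ignores the edges.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def cut_edges(edges: List[Edge], shard_of: Dict[str, int]) -> List[Edge]:
    """Edges whose endpoints live on different shards: each costs a network hop to traverse."""
    return [(u, v) for u, v in edges if shard_of[u] != shard_of[v]]


# Two groups {a, b, c} and {d, e, f}, joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]

# A placement that ignores the structure (e.g. alternating or hash-based)...
naive = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
# ...versus a community-aware placement that keeps each group on one shard.
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(len(cut_edges(edges, naive)))      # 5
print(len(cut_edges(edges, community)))  # 1
```

Real graphs rarely split this cleanly, which is why heuristics and replication of frequently crossed nodes are used in practice.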

Question 16

Q

Can you explain what OWL is and briefly explain how it can leverage RDF to build ontologies and semantic reasoning?

Answer

A

OWL (Web Ontology Language) extends RDF to define complex relationships, classes, and constraints, enabling richer semantics. It supports reasoning engines to infer new knowledge from existing data, enhancing the semantic web.
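To give a feel for what "inferring new knowledge" means, here is a toy forward-chaining sketch that applies only the two RDFS entailment rules for `rdfs:subClassOf` (the class hierarchy is transitive, and instances inherit the types of superclasses), which OWL reasoners also honour. Prefixed names are plain strings and the `ex:` resources are hypothetical; a real OWL reasoner supports far more constructs.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two subClassOf rules until nothing new can be derived (a fixpoint)."""
    closure = set(triples)
    while True:
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                if p2 == SUBCLASS and b == c:
                    if p1 == SUBCLASS:
                        new.add((a, SUBCLASS, d))  # A subClassOf B, B subClassOf D => A subClassOf D
                    elif p1 == TYPE:
                        new.add((a, TYPE, d))      # x type B, B subClassOf D => x type D
        if new <= closure:
            return closure
        closure |= new


facts = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", TYPE, "ex:Dog"),
}
inferred = infer(facts) - facts
print(sorted(inferred))
```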

Question 17

Q

Can you explain the main differences between an OLTP environment and an OLAP environment?

Answer

A

OLTP (Online Transaction Processing): day-to-day operations that must sustain a high pace of updates while maintaining consistency. Lots of writes and updates. Better suited to normalized data. Answers are needed quickly.
OLAP (Online Analytical Processing): mostly reads. A frozen, read-only view of the data is sufficient for analysis purposes, e.g., business intelligence. Better suited to denormalized data. Queries are heavier and slower.

Question 18

Q

Can you describe the data cube model?

Answer

A

A data cube consists of cells, called facts, arranged across multiple dimensions (e.g., time, region, product). Each cell corresponds to a combination of dimension values (its dimensional coordinates) and holds a value such as a sales amount. Usually dense.

Question 19

Q

Can you describe the operations performed on data cubes?

Answer

A

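slicing: restricting the cube to one value of a dimension (e.g., year = 2024); the fixed dimension is called a slicer.
dicing: arranging the facts into a grid (cross-tabulation) by choosing which dimensions go on the rows and columns; these dimensions are called dicers. In practice there are usually 2 dicers, but you can have one, or three.
roll-up: aggregation of facts along one dimension, e.g., summing months into years.
drill-down: going to a finer level of granularity, e.g., drilling down on years to show months, weeks, etc.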
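The four operations can be sketched on a cube stored as a list of facts. The sales data and the function names `slice_`, `roll_up`, and `dice` are hypothetical; revenue is summed when a dimension is aggregated away.

```python
from collections import defaultdict

# Hypothetical sales cube: three dimensions (year, region, product) and one measure (revenue).
facts = [
    {"year": y, "region": r, "product": p, "revenue": v}
    for (y, r, p, v) in [
        (2023, "EU", "tea", 10), (2023, "EU", "coffee", 20),
        (2023, "US", "tea", 5), (2023, "US", "coffee", 30),
        (2024, "EU", "tea", 12), (2024, "EU", "coffee", 25),
        (2024, "US", "tea", 6), (2024, "US", "coffee", 35),
    ]
]


def slice_(facts, **fixed):
    """Slicing: keep only the facts whose coordinates match the fixed (slicer) values."""
    return [f for f in facts if all(f[dim] == value for dim, value in fixed.items())]


def roll_up(facts, keep):
    """Roll-up: sum the measure over every dimension that is not in `keep`."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[dim] for dim in keep)] += f["revenue"]
    return dict(totals)


def dice(facts, row_dim, col_dim):
    """Dicing: lay the facts out as a grid, one dicer on the rows and one on the columns."""
    cells = roll_up(facts, (row_dim, col_dim))
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    return cols, {r: [cells.get((r, c), 0) for c in cols] for r in rows}


eu = slice_(facts, region="EU")             # slicer: region = EU
print(dice(eu, "year", "product"))          # dicers: year (rows) x product (columns)
print(roll_up(facts, ("year",)))            # roll-up: yearly totals
print(roll_up(facts, ("year", "product")))  # drill-down: back to year and product
```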

Question 20

Q

Can you explain how to store a cube in relational tables?

Answer

A

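Using a star schema: a main fact table with one column per dimension and one column per measure, plus additional relational tables, usually one per dimension. These satellite tables contain additional information about the dimensions (e.g., the city and country of a store) and are linked to the fact table through foreign keys.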
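A minimal star schema sketched in SQLite through Python's `sqlite3` module. All table and column names and the data are hypothetical: one fact table with a foreign key per dimension and a measure, plus one satellite table per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Satellite (dimension) tables: one per dimension, with extra information about each value.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Fact table: one foreign key per dimension, plus the measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'tea', 'drinks'), (2, 'scone', 'food');
    INSERT INTO dim_store   VALUES (1, 'Zurich', 'CH'), (2, 'Lyon', 'FR');
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (2, 1, 4.0), (1, 2, 7.0);
""")

# Joining the fact table with a satellite table makes its extra attributes usable (here: country).
rows = conn.execute("""
    SELECT s.country, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON f.store_id = s.store_id
    GROUP BY s.country
    ORDER BY s.country
""").fetchall()
print(rows)  # [('CH', 14.0), ('FR', 7.0)]
```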

Question 21

Q

Can you express slicing and dicing queries using SQL?

Answer

A

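slicing: a WHERE clause that fixes the value of each slicer dimension.
dicing: a GROUP BY clause on the dicer dimensions (the rows and columns of the grid), with the measures aggregated, e.g., with SUM.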
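A sketch of a slicing-and-dicing query run through Python's `sqlite3` module, assuming a hypothetical denormalized `sales` table: `region` is the slicer (fixed in WHERE), while `year` and `product` are the dicers (in GROUP BY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table with three dimensions and one measure.
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2023, "EU", "tea", 10), (2023, "EU", "coffee", 20),
    (2023, "US", "tea", 5), (2023, "US", "coffee", 30),
    (2024, "EU", "tea", 12), (2024, "EU", "coffee", 25),
    (2024, "US", "tea", 6), (2024, "US", "coffee", 35),
])

# Slicer: region (WHERE). Dicers: year and product (GROUP BY).
rows = conn.execute("""
    SELECT year, product, SUM(revenue) AS revenue
    FROM sales
    WHERE region = 'EU'
    GROUP BY year, product
    ORDER BY year, product
""").fetchall()
for row in rows:
    print(row)
```

SQL returns the grid in flat form, one row per cell; turning it into a cross-tab with years on rows and products on columns is a presentation step.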

Question 23

Q

If somebody else gives you a SQL query that slices and dices, can you find out which dimensions are slicers and which ones are dicers?

Answer

A

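slicers: the dimensions that appear in the WHERE clause, where they are fixed to a value.
dicers: the dimensions that appear in the GROUP BY clause. Dimensions that appear in neither clause are neither slicers nor dicers: they are aggregated away (rolled up).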

Question 24

Q

Can you explain what roll-up and drill-down mean?

Answer

A

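Roll-up: aggregating facts along a dimension, either moving to a coarser level of its hierarchy (e.g., from months to years) or compressing the dimension away entirely.
Drill-down: the opposite, changing the granularity of the data to a finer level (e.g., from years to months).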
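In SQL, rolling up and drilling down amount to grouping by fewer or more levels of a dimension's hierarchy. A sketch with Python's `sqlite3` module and a hypothetical `sales` table whose time dimension has a year and month level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table whose time dimension has a year > month hierarchy.
conn.execute("CREATE TABLE sales (year INTEGER, month INTEGER, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (2023, 1, 10), (2023, 1, 5), (2023, 2, 7),
    (2024, 1, 3), (2024, 3, 9),
])

# Drill-down: finer granularity (year and month).
by_month = conn.execute(
    "SELECT year, month, SUM(revenue) FROM sales GROUP BY year, month ORDER BY year, month"
).fetchall()

# Roll-up: coarser granularity (year only); the month level is aggregated away.
by_year = conn.execute(
    "SELECT year, SUM(revenue) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Rolling up completely removes the dimension: a single grand total.
total = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]

print(by_month)  # [(2023, 1, 15), (2023, 2, 7), (2024, 1, 3), (2024, 3, 9)]
print(by_year)   # [(2023, 22), (2024, 12)]
print(total)     # 34
```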