Big Data Lecture 12 Querying Trees Flashcards

Question 1

Q

Why is it not true that we should always build as many indices as possible?

Answer

A

An index on an attribute takes up storage space and costs computational resources to build and to keep up to date on every write. So even though it may make queries on that attribute faster, that alone does not mean we should build it.
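
The trade-off can be sketched with a toy in-memory table. This is only an illustration: the `Table` class, its `index_writes` counter, and the field names are hypothetical, and a real database keeps its indices in far more elaborate structures.

```python
class Table:
    """Toy table with optional per-field indices (hypothetical sketch)."""

    def __init__(self, indexed_fields=()):
        self.rows = []
        # One dict per indexed field: value -> list of row positions.
        self.indexes = {field: {} for field in indexed_fields}
        self.index_writes = 0  # counts the extra work done on inserts

    def insert(self, row):
        self.rows.append(row)
        position = len(self.rows) - 1
        # Every index must be updated on every insert: this is the cost.
        for field, index in self.indexes.items():
            index.setdefault(row[field], []).append(position)
            self.index_writes += 1

    def find(self, field, value):
        if field in self.indexes:
            # Indexed lookup: jump straight to the matching rows.
            return [self.rows[i] for i in self.indexes[field].get(value, [])]
        # No index: scan every row.
        return [row for row in self.rows if row[field] == value]


table = Table(indexed_fields=("city", "age"))
table.insert({"name": "Ann", "city": "Zurich", "age": 31})
table.insert({"name": "Bob", "city": "Basel", "age": 17})
```

Lookups on `city` or `age` avoid a full scan, but each of the two inserts paid for two index updates, and the index dictionaries occupy memory next to the rows themselves.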

Question 2

Q

What is the goal of JSONiq?

Answer

A

To query unstructured JSON data in a data-independent way, that is, without caring how the data is physically stored.

Question 3

Q

What does RumbleDB actually do?

Answer

A

RumbleDB is an engine that maps the logical expressions of JSONiq onto the physical implementation underneath (HDFS, WCS, you name it!).

Question 4

Q

What kind of language is JSONiq?

Answer

A

<ul><li>Declarative (we say what we want, and it just happens),</li><li>functional (you can nest it like a boss),</li><li>set-based (everything is a sequence).</li></ul>

Question 5

Q

What is a data lakehouse?

Answer

A

A combination of a data lake and a data warehouse: something that can be queried like a warehouse from the top level, but at the lower level it is just a data lake.

Question 6

Q

How is data represented in JSONiq?

Answer

A

Everything is just a sequence of items, even nothing () and a single item (“lol”).

Question 7

Q

What happens when you enter valid JSON into JSONiq?

Answer

A

It evaluates to itself.

Question 8

Q

Can JSONiq item sequences be nested?

Answer

A

No, it naturally removes the nesting, e.g. ((1), 2) == (1, 2).
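
The flattening can be mimicked in Python. This is a minimal sketch that assumes Python tuples stand in for JSONiq sequences; the helper `seq` is hypothetical and not part of any JSONiq engine.

```python
def seq(*parts):
    """Build a flat sequence; nested sequences (tuples) are spliced in."""
    flat = []
    for part in parts:
        if isinstance(part, tuple):
            # A sequence inside a sequence contributes its items, not itself.
            flat.extend(seq(*part))
        else:
            flat.append(part)
    return tuple(flat)


nested = seq(seq(1), 2)   # models ((1), 2)
flat = seq(1, 2)          # models (1, 2)
same = nested == flat
```

Under this model the empty sequence simply disappears when it is placed inside another sequence, so `seq((), "lol")` has a single item.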

Question 9

Q

Are all types in JSONiq comparable like in MongoDB?

Question 10

Q

Can we compare two arrays in JSONiq?

Answer

A

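Yes, with the general comparison operators (=, <, ...): the predicate is evaluated as an existential quantifier, so it is true as soon as one pair of items, one taken from each side, satisfies it.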
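
Existential semantics can be sketched in a few lines of Python. The function `general_compare` is a hypothetical stand-in that models the two operands as tuples of atomic values; it is not how any engine implements the operator.

```python
import operator
from itertools import product


def general_compare(left, right, op=operator.eq):
    """True if at least one (a, b) pair with a in left, b in right satisfies op."""
    return any(op(a, b) for a, b in product(left, right))


shares_an_item = general_compare((1, 2, 3), (3, 4))          # 3 = 3 holds
no_common_item = general_compare((1, 2), (3, 4))             # no pair is equal
some_smaller = general_compare((5,), (1, 9), operator.lt)    # 5 < 9 holds
```

One consequence of this reading is that comparing against an empty operand is always false, because there is no pair to satisfy the predicate.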

Question 11

Q

What are FLWOR expressions?

Answer

A

<ul><li>For,</li><li>Let,</li><li>Where,</li><li>Order by,</li><li>Return.</li></ul>

This is the typical query pattern in JSONiq; under the hood, FLWOR expressions are nicely optimized.
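
To make the five clauses concrete, here is a rough Python analogue of a small FLWOR query. The data, the function name `adults_by_age`, and the query in the comments are made up for illustration.

```python
people = [
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 17},
    {"name": "Cid", "age": 45},
]


def adults_by_age(records):
    # Roughly corresponds to the JSONiq query:
    #   for $p in <records>
    #   let $age := $p.age
    #   where $age ge 18
    #   order by $age descending
    #   return $p.name
    bindings = [(p, p["age"]) for p in records]                # for + let
    kept = [(p, age) for p, age in bindings if age >= 18]      # where
    ordered = sorted(kept, key=lambda b: b[1], reverse=True)   # order by
    return [p["name"] for p, _ in ordered]                     # return


names = adults_by_age(people)
```

The Python version spells out each step as a separate list, whereas a JSONiq engine is free to reorder or fuse the clauses as long as the result is the same.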

Question 12

Q

What are tuple streams in JSONiq?

Answer

A

A formalism that explains how the clauses of a FLWOR expression combine: each clause consumes a stream of tuples (variable bindings) and produces a new one, so the clauses chain together and are applied in order.
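
A tuple stream can be pictured as a stream of dictionaries that map variable names to values. The sketch below assumes that model; the clause functions are hypothetical names chosen for illustration and do not mirror any engine's classes.

```python
def for_clause(tuples, var, items_of):
    """Emit one output tuple per input tuple and per item bound to var."""
    for t in tuples:
        for item in items_of(t):
            yield {**t, var: item}


def let_clause(tuples, var, value_of):
    """Extend every tuple with one extra binding."""
    for t in tuples:
        yield {**t, var: value_of(t)}


def where_clause(tuples, predicate):
    """Drop the tuples that do not satisfy the predicate."""
    return (t for t in tuples if predicate(t))


def return_clause(tuples, expr):
    """Leave the tuple world: evaluate expr once per tuple."""
    return [expr(t) for t in tuples]


stream = [{}]  # the initial stream holds a single empty tuple
stream = for_clause(stream, "x", lambda t: (1, 2, 3))
stream = let_clause(stream, "sq", lambda t: t["x"] * t["x"])
stream = where_clause(stream, lambda t: t["sq"] > 1)
result = return_clause(stream, lambda t: t["sq"])
```

Every clause except the last maps a tuple stream to a tuple stream, which is why the clauses can be chained in almost any order; only the return clause turns tuples back into a sequence of items.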

Question 13

Q

How are types formally expressed in JSONiq?

Answer

A

Using an item type followed by an occurrence indicator: + (one or more), * (zero or more), ? (zero or one), or nothing (exactly one).
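
How the quantifier constrains a sequence can be sketched as a small checker. It assumes the usual reading of the indicators (`+` one or more, `*` zero or more, `?` zero or one, nothing exactly one), uses Python classes in place of JSONiq item types, and the function name `matches` is hypothetical.

```python
def matches(sequence, item_type, indicator=""):
    """Check a tuple of items against an item type plus occurrence indicator."""
    # (minimum length, maximum length); None means unbounded.
    bounds = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}
    low, high = bounds[indicator]
    n = len(sequence)
    if n < low or (high is not None and n > high):
        return False
    return all(isinstance(item, item_type) for item in sequence)


two_ints_plus = matches((1, 2), int, "+")   # two integers fit "integer+"
empty_plus = matches((), int, "+")          # "+" needs at least one item
empty_optional = matches((), int, "?")      # "?" allows the empty sequence
two_ints_exact = matches((1, 2), int)       # no indicator: exactly one item
```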

Question 14

Q

How is a JSONiq query processed in the background?

Answer

A

<ol><li>An abstract syntax tree (AST) is created,</li><li>it is converted to an expression tree using the visitor pattern,</li><li>the expression tree is optimized for execution,</li><li>it is converted to an iterator tree (volcano iterators).</li></ol>

Question 15

Q

What are volcano iterators?

Answer

A

Objects with a pull-based interface: we open() them, ask hasNext(), fetch the next item with next(), and finally close() them.
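
The interface can be sketched in Python as follows. The class names `RangeIterator` and `FilterIterator` and the helper `drain` are invented for this illustration, and the methods use Python's snake_case (`has_next`) instead of `hasNext()`.

```python
class RangeIterator:
    """Leaf iterator producing the integers start, ..., stop - 1."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop
        self.current = None

    def open(self):
        self.current = self.start

    def has_next(self):
        return self.current < self.stop

    def next(self):
        value = self.current
        self.current += 1
        return value

    def close(self):
        self.current = None


class FilterIterator:
    """Pulls items from a child iterator and keeps those passing a predicate."""

    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
        self.pending, self.has_pending = None, False

    def _advance(self):
        # Look ahead in the child until the next matching item is found.
        self.pending, self.has_pending = None, False
        while self.child.has_next():
            candidate = self.child.next()
            if self.predicate(candidate):
                self.pending, self.has_pending = candidate, True
                break

    def open(self):
        self.child.open()
        self._advance()

    def has_next(self):
        return self.has_pending

    def next(self):
        value = self.pending
        self._advance()
        return value

    def close(self):
        self.child.close()


def drain(iterator):
    """Run the full open / has_next / next / close protocol."""
    iterator.open()
    items = []
    while iterator.has_next():
        items.append(iterator.next())
    iterator.close()
    return items


evens = drain(FilterIterator(RangeIterator(0, 6), lambda n: n % 2 == 0))
```

Because every iterator exposes the same four methods, iterators can be stacked into a tree in which a parent only ever pulls from its children.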

Question 16

Q

Is the iterator tree composed only of volcano iterators?

Answer

A

No, there can also be tuple streams in between.

Question 17

Q

What are the 3 possible modes of execution?

Answer

A

<ol><li>Materialized execution (the full result of each step is materialized in memory before the next step is applied),</li><li>streamed execution (items are passed on one at a time, so only what the next step needs is materialized),</li><li>parallel execution (the work is split across different machines, and each of them materializes its own part).</li></ol>
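
The difference between the first two modes can be shown with plain Python lists and generators. This is only an analogy under stated assumptions: the functions and the `log` list, which records how many items the source was asked to produce, are hypothetical, and the parallel mode is not covered because it needs a cluster.

```python
from itertools import islice


def source(n, log):
    """Produce 0, ..., n - 1 and record every item that is actually produced."""
    for i in range(n):
        log.append(i)
        yield i


def first_three_materialized(n, log):
    items = list(source(n, log))          # the whole input is held in memory
    doubled = [x * 2 for x in items]      # so is the whole intermediate result
    return doubled[:3]


def first_three_streamed(n, log):
    doubled = (x * 2 for x in source(n, log))   # nothing is computed yet
    return list(islice(doubled, 3))             # pull only three items


log_materialized, log_streamed = [], []
result_materialized = first_three_materialized(1000, log_materialized)
result_streamed = first_three_streamed(1000, log_streamed)
```

Both calls return the same three values, but the materialized version produced all 1000 source items while the streamed version produced only 3, at the price of one function call per item pulled.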

Question 18

Q

What are the (dis)advantages of each type of execution?

Answer

A

<ul><li>Materialized: memory overhead,</li><li>streamed: function call overhead (time),</li><li>parallel: incompleteness of data, YARN executor initialization overhead. </li></ul>