Question 1

How is JSON/XML stored in memory?

Accepted Answer

Using a tree.
JSON: nodes are either values, objects, or arrays; edges are labeled with the keys of objects (the graph is directed downwards).
XML: nodes are tags whose value is the text inside; the attributes and their values are attached to the respective node (the graph is undirected).
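
As a rough illustration, both formats can be parsed into trees and walked recursively. The sketch below assumes Python's standard json and xml.etree.ElementTree modules; the sample documents and the helper names json_depth and xml_tags are made up for this card.

```python
import json
import xml.etree.ElementTree as ET

def json_depth(node):
    """Depth of a parsed JSON tree; atomic values are leaves (depth 0)."""
    if isinstance(node, dict):
        # Edges out of an object node are labeled with its keys.
        return 1 + max((json_depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((json_depth(v) for v in node), default=0)
    return 0

def xml_tags(element):
    """Pre-order list of the tag names in an XML element tree."""
    tags = [element.tag]
    for child in element:
        tags.extend(xml_tags(child))
    return tags

json_tree = json.loads('{"name": "Ada", "langs": ["en", {"code": "fr"}]}')
xml_tree = ET.fromstring('<person id="1"><name>Ada</name><lang>en</lang></person>')

print(json_depth(json_tree))   # nesting depth of the JSON tree
print(xml_tags(xml_tree))      # tags in document order
print(xml_tree.attrib)         # attributes are attached to their node
```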

Question 2

What is the XML Information Set?

Accepted Answer

The way the parse tree is stored in memory, as information items:
Document Information Item (its child is the main element; stores version metadata),
Element Information Item (stores local name, children, attributes, parent),
Text Information Item (stores the character string and its owner element),
Attribute Information Item (stores local name, normalized value (without quotes), and owner element).
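
A loose sketch of these items, using Python's xml.dom.minidom. The DOM is a different specification from the Infoset, so treat the mapping in the comments as an approximation; the sample document is invented.

```python
from xml.dom import minidom

doc = minidom.parseString('<book lang="en"><title>Dune</title></book>')

root = doc.documentElement             # document item -> its main element
attr = root.getAttributeNode("lang")   # attribute item
title = root.firstChild                # element item, child of root
text = title.firstChild                # text (character) item

print(root.tagName, title.tagName)     # element names
print(attr.name, attr.value)           # attribute value is stored without quotes
print(attr.ownerElement is root)       # attribute knows its owner element
print(text.data, text.parentNode is title)  # text knows its owner element
print(title.parentNode is root)        # element knows its parent
```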

Question 3

What is validation and how does it relate to well-formedness?

Accepted Answer

Well-formedness is checked against the language itself (its syntax). Validation is done later, on a well-formed document: it checks our own constraints on the structure and values within the language.

Question 4

What is the difference between validation and annotation?

Accepted Answer

Validation is just True/False, while annotation screens the data and adds metadata or changes values and structure to prepare the data. Annotation also includes converting the data into the correct data types in storage.
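
A minimal sketch of the distinction, assuming a hypothetical two-field schema (SCHEMA, validate and annotate are names invented here, not part of any library): validate only answers yes or no, while annotate returns the data converted to typed values.

```python
from datetime import date

# Hypothetical schema: field name -> converter from lexical string to typed value.
SCHEMA = {"id": int, "born": date.fromisoformat}

def annotate(record):
    """Return a copy of the record with every field converted to its declared type."""
    return {field: convert(record[field]) for field, convert in SCHEMA.items()}

def validate(record):
    """True if the record has exactly the schema's fields and every value converts."""
    if set(record) != set(SCHEMA):
        return False
    try:
        annotate(record)
    except (TypeError, ValueError):
        return False
    return True

raw = {"id": "7", "born": "1815-12-10"}
print(validate(raw))   # a plain yes/no answer
print(annotate(raw))   # typed values ready for storage
```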

Question 5

Why do we do data validation?

Accepted Answer

When data is validated against a schema, it is no longer heterogeneous: it is homogeneous (with respect to the schema), and we can use that to pre-load and query the data faster.

Question 6

What are cardinality markers?

Accepted Answer

Required (no marker): exactly once,
repeated (*): zero or more,
optional (?): zero or one,
no name (+): one or more.
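
The markers can be read as (minimum, maximum) occurrence bounds. A small sketch, assuming the usual DTD-style symbols; the CARDINALITY table and the allowed helper are made up for illustration.

```python
# Marker -> (minimum, maximum) occurrences; None means unbounded.
CARDINALITY = {
    "":  (1, 1),     # required: exactly once
    "*": (0, None),  # repeated: zero or more
    "?": (0, 1),     # optional: zero or one
    "+": (1, None),  # one or more
}

def allowed(marker, count):
    """True if `count` occurrences satisfy the given cardinality marker."""
    low, high = CARDINALITY[marker]
    return count >= low and (high is None or count <= high)

print(allowed("?", 2))  # two occurrences of an optional element
print(allowed("+", 3))  # three occurrences of a one-or-more element
```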

Question 7

What is JSound?

Accepted Answer

A validation schema language for JSON:
write the wanted type in quotes: "string", "integer",
everything is optional by default; put a "!" up front to make it required,
use "item" if you want to allow any value,
you can nest [] and {} just like in any item.
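
To make the rules concrete, here is a toy checker for a JSound-like schema. It is not the real JSound implementation: it covers only the features named on this card, it ignores keys that the schema does not declare, and PERSON_SCHEMA, TYPES and matches are hypothetical names.

```python
# Toy subset: only the type names mentioned on this card.
TYPES = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "item": lambda v: True,  # "item" accepts any value
}

def matches(schema, value):
    """Check a parsed JSON value against a JSound-like schema (toy subset)."""
    if isinstance(schema, str):        # atomic type name in quotes
        return TYPES[schema](value)
    if isinstance(schema, list):       # [member_schema]: array of such members
        return isinstance(value, list) and all(matches(schema[0], v) for v in value)
    if not isinstance(value, dict):    # {...}: object schema
        return False
    for key, sub_schema in schema.items():
        required = key.startswith("!")
        name = key[1:] if required else key
        if name not in value:
            if required:
                return False           # a "!" key must be present
            continue                   # everything else is optional
        if not matches(sub_schema, value[name]):
            return False
    return True                        # undeclared keys are ignored in this sketch

PERSON_SCHEMA = {
    "!name": "string",
    "age": "integer",
    "tags": ["string"],
    "extra": "item",
    "address": {"!city": "string"},
}

print(matches(PERSON_SCHEMA, {"name": "Ada", "tags": ["math"]}))
print(matches(PERSON_SCHEMA, {"age": 36}))
```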

Question 8

When is a data set heterogeneous or homogeneous?

Accepted Answer

If it follows a schema, it is homogeneous with respect to that schema; otherwise it is heterogeneous.

Question 9

HBase runs on top of HDFS, so how is it still fast?

Accepted Answer

It keeps data in the MemStore and a cache, which are fast,
and it short-circuits the DataNodes in HDFS.

Question 10

What is the difference between atomic and structured types?

Accepted Answer

Atomic types cannot be reduced further, e.g. int; structured types are nested, e.g. arrays and objects.

Question 11

What atomic types are there? (7)

Accepted Answer

Strings, numbers, booleans, dates and times, time intervals, binary, null.

Question 12

What is the difference between lexical and value space?

Accepted Answer

The value space holds the actual values; the lexical space holds their encodings in characters. Many lexical values can map to one actual value.
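
A quick illustration with Python's decimal module (the list of strings is just an example): several different character strings all denote the same decimal value.

```python
from decimal import Decimal

# Five distinct lexical forms...
lexical_forms = ["1", "1.0", "1.00", "01", "1e0"]

# ...that all map to a single element of the value space.
values = {Decimal(s) for s in lexical_forms}

print(len(set(lexical_forms)))  # number of distinct lexical values
print(len(values))              # number of distinct actual values
```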

Question 13

What is the difference and relation between a subtype and a supertype?

Accepted Answer

A subtype's value space is a subset of its supertype's value space.

Question 14

What structured types are there? (2)

Accepted Answer

Maps (e.g. JSON objects) and lists (e.g. JSON arrays).

Question 15

When are the default values from the schema added?

Accepted Answer

During the annotation phase.

Big Data Lecture 07 Data Models Flashcards

(26 cards)