Big Data Lecture 07 Data Models Flashcards
How is JSON/XML stored in memory?
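As a tree:<br></br><ul><li>JSON: nodes are values, objects or arrays. Each edge leaving an object is labeled with a key (edges leaving an array correspond to positions), so the labels sit on the edges and the tree is directed from the root downwards.</li><li>XML: nodes are elements (labeled with their tag name) and their text content, and attributes with their values are attached to the respective element node. The labels thus sit on the nodes, and the edges themselves are unlabeled.</li></ul>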
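A small sketch of the two tree shapes, using Python's standard `json` and `xml.etree.ElementTree` parsers. The helper names `json_edges` and `xml_nodes` are hypothetical. The code only walks the parsed structures and does not show how any particular engine lays them out in memory.

```python
import json
import xml.etree.ElementTree as ET

def json_edges(value, path="$"):
    """Yield (parent path, edge label, child) for every edge of a JSON tree.
    Object edges are labeled with keys, array edges with positions."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield (path, key, child)
            yield from json_edges(child, f"{path}.{key}")
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield (path, i, child)
            yield from json_edges(child, f"{path}[{i}]")

def xml_nodes(elem, depth=0):
    """Yield (depth, tag, attributes, text) for every element node.
    The label (tag name) sits on the node itself; edges carry no label."""
    yield (depth, elem.tag, dict(elem.attrib), (elem.text or "").strip())
    for child in elem:
        yield from xml_nodes(child, depth + 1)

doc = json.loads('{"name": "Ada", "langs": ["en", "fr"]}')
for parent, label, child in json_edges(doc):
    print(parent, "--", label, "->", child)

root = ET.fromstring('<person id="1"><name>Ada</name><lang>en</lang></person>')
for depth, tag, attrs, text in xml_nodes(root):
    print("  " * depth + tag, attrs, text)
```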
What is the XML Information Set (Infoset)?
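The abstract model (a W3C Recommendation) describing how the parse tree of an XML document is represented in memory:<br></br><ul><li>Document Information Item (its child is the document element, i.e. the root element; it also stores metadata such as the XML version),</li><li>Element Information Item (stores the local name, children, attributes and parent),</li><li>Character Information Item (the text content: stores the character and its parent element; the specification has one such item per character, which implementations usually group into strings),</li><li>Attribute Information Item (stores the local name, the normalized value (without quotes) and the owner element).</li></ul>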
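The Infoset is an abstract model, not an API. Python's `xml.dom.minidom` implements the DOM, a different but closely related tree model, so the mapping in the comments below is approximate. In particular, the DOM groups consecutive characters into a single text node instead of one item per character.

```python
from xml.dom import minidom

doc = minidom.parseString('<book lang="en">Big Data</book>')

# Document node (~ document information item): its child is the document element.
root = doc.documentElement
# Attribute node (~ attribute information item): name, value, owner element.
attr = root.getAttributeNode("lang")
# Text node (~ a run of character information items): characters and parent.
text = root.firstChild

# Element node (~ element information item): name, attributes, children, parent.
print(root.tagName, root.parentNode is doc)              # book True
print(attr.name, attr.value, attr.ownerElement is root)  # lang en True
print(text.data, text.parentNode is root)                # Big Data True
```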
What is validation and how does it relate to well-formedness?
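Well-formedness is checked against the syntax of the language itself (e.g. XML or JSON). Validation comes afterwards and only applies to well-formed documents: it checks our own constraints (a schema) on the structure and values within that language.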
What is the difference between validation and annotation?
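Validation only answers true or false (valid or not). Annotation goes further: while checking the data against the schema, it attaches type information (metadata) to the values and may change their representation to prepare the data, including converting values into the correct data types for storage (e.g. the string "42" into the integer 42).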
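A minimal sketch of the difference, assuming raw records whose fields all arrive as strings (as from a CSV file) and a hypothetical flat schema `SCHEMA` mapping field names to Python types. `validate` only returns a boolean, while `annotate` returns the data converted to typed values.

```python
# Hypothetical flat schema: field name -> expected atomic type.
SCHEMA = {"name": str, "age": int}

def validate(record: dict[str, str]) -> bool:
    """Validation: answers only yes or no."""
    if set(record) != set(SCHEMA):
        return False
    for key, typ in SCHEMA.items():
        try:
            typ(record[key])          # can the lexical string be read as this type?
        except ValueError:
            return False
    return True

def annotate(record: dict[str, str]) -> dict:
    """Annotation: validates, then returns the record with typed values."""
    if not validate(record):
        raise ValueError(f"invalid record: {record!r}")
    return {key: typ(record[key]) for key, typ in SCHEMA.items()}

print(validate({"name": "Ada", "age": "36"}))   # True
print(annotate({"name": "Ada", "age": "36"}))   # {'name': 'Ada', 'age': 36}
print(validate({"name": "Ada", "age": "old"}))  # False
```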
Why do we validate data?
Once data has been validated against a schema, it is no longer heterogeneous: it is known to be homogeneous (with respect to that schema), and we can exploit this to store or pre-load the data more efficiently and to query it faster.
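One way homogeneity pays off, sketched under the assumption that every record has already been validated against a schema with the fields `name` and `age`: the records can be stored column-wise, as columnar formats such as Parquet do, so keys are stored once and a query on one field scans only that column. `to_columns` is a hypothetical helper; on heterogeneous data it would fail with a `KeyError` as soon as a record lacks a field.

```python
def to_columns(records: list[dict], fields: list[str]) -> dict[str, list]:
    """Store homogeneous records column-wise: each key is kept once, not per record."""
    return {f: [r[f] for r in records] for f in fields}

records = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]
columns = to_columns(records, ["name", "age"])

# A query touching one field only needs to scan that column.
print(sum(columns["age"]) / len(columns["age"]))  # 38.5
```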
What are cardinality markers?
<ul><li>required (no marker): exactly once,</li><li>repeated, *: zero or more,</li><li>optional, ?: zero or one,</li><li>+ (no keyword name): one or more.</li></ul>
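The four markers can be read as bounds on how many times a field may occur. A sketch with a hypothetical `satisfies` helper, where `None` stands for "no upper bound":

```python
# Allowed (min, max) occurrences per cardinality marker; None means unbounded.
CARDINALITY = {
    "": (1, 1),      # no marker: exactly once (required)
    "?": (0, 1),     # optional: zero or one
    "*": (0, None),  # repeated: zero or more
    "+": (1, None),  # one or more
}

def satisfies(marker: str, count: int) -> bool:
    """Does a field occurring `count` times respect the given marker?"""
    low, high = CARDINALITY[marker]
    return count >= low and (high is None or count <= high)

print(satisfies("?", 2))  # False
print(satisfies("*", 0))  # True
```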
What is JSound?
A schema language for validating JSON. In its compact syntax:<br></br><ul><li>give the expected type as a quoted string, e.g. "string", "integer",</li><li>fields are optional by default; prefix the key with "!" to make it required, e.g. "!name",</li><li>use "item" to allow any value,</li><li>arrays ([ ]) and objects ({ }) can be nested just like in the data itself, e.g. [ "string" ] is an array of strings.</li></ul>
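A toy validator for the rules listed above. It is a sketch of a small, hypothetical subset of JSound compact syntax (only `"string"`, `"integer"`, `"boolean"`, `"item"`, `"!"` for required keys, and nested arrays and objects), not a real JSound implementation. Rejecting undeclared keys is an assumption of this sketch.

```python
import json

def valid(value, schema) -> bool:
    """Check a parsed JSON value against a tiny JSound-like schema (hypothetical subset)."""
    if isinstance(schema, str):
        if schema == "item":          # any value is accepted
            return True
        if schema == "string":
            return isinstance(value, str)
        if schema == "integer":       # bool is a subclass of int in Python, so exclude it
            return isinstance(value, int) and not isinstance(value, bool)
        if schema == "boolean":
            return isinstance(value, bool)
        raise ValueError(f"unsupported type name: {schema!r}")
    if isinstance(schema, list):      # [T]: an array whose members all match T
        return isinstance(value, list) and all(valid(v, schema[0]) for v in value)
    if isinstance(schema, dict):      # {key: T}: an object; "!key" marks a required field
        if not isinstance(value, dict):
            return False
        fields = {}
        for key, typ in schema.items():
            required = key.startswith("!")
            fields[key[1:] if required else key] = (required, typ)
        if any(req and name not in value for name, (req, _) in fields.items()):
            return False
        # Assumption of this sketch: keys not declared in the schema are rejected.
        return all(k in fields and valid(v, fields[k][1]) for k, v in value.items())
    raise ValueError(f"unsupported schema: {schema!r}")

schema = json.loads('{"!name": "string", "age": "integer", "tags": ["string"], "extra": "item"}')
print(valid(json.loads('{"name": "Ada", "tags": ["math"]}'), schema))  # True
print(valid(json.loads('{"age": 36}'), schema))                        # False: "name" is required
print(valid(json.loads('{"name": "Ada", "age": "36"}'), schema))       # False: "36" is a string
```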
When is a dataset heterogeneous or homogeneous?
A dataset is homogeneous with respect to a schema if all of its records are valid against that schema; otherwise it is heterogeneous (with respect to that schema).
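HBase runs on top of HDFS. How is it still fast?
<ol><li>It serves many requests from memory: writes are buffered in the MemStore and frequently read blocks are kept in the block cache,</li><li>it uses HDFS short-circuit reads: a RegionServer running on the same machine as the DataNode that holds the data reads the block files directly from local disk, bypassing the DataNode.</li></ol>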
What is the difference between atomic and structured types?
<ul><li>atomic values cannot be decomposed further, e.g. an integer or a string,</li><li>structured values are built from other values and can be nested, e.g. arrays and objects.</li></ul>
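A sketch over Python's parsed JSON (`json.loads`), where `str`, `int`, `float`, `bool` and `None` are the atomic values and `dict`/`list` the structured ones. `kind` and `depth` are hypothetical helpers; `depth` shows that only structured values introduce nesting.

```python
import json

def kind(value) -> str:
    """Classify a parsed JSON value as atomic or structured."""
    return "structured" if isinstance(value, (dict, list)) else "atomic"

def depth(value) -> int:
    """Atomic values have depth 0; each level of array/object nesting adds 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0

print(kind(json.loads("42")), depth(json.loads("42")))  # atomic 0
doc = json.loads('{"a": [1, {"b": true}]}')
print(kind(doc), depth(doc))                            # structured 3
```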
What atomic types are there? (7)
<ul><li>Strings,</li><li>numbers,</li><li>booleans,</li><li>dates and times,</li><li>time intervals,</li><li>binary,</li><li>null.</li></ul>
What is the difference between lexical and value space?
The value space is the set of actual (abstract) values of a type; the lexical space is the set of character strings that encode them.<br></br><br></br>Many lexical representations can map to the same value, e.g. "1", "01" and "+1" all denote the integer 1.
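A short illustration of the many-to-one mapping from lexical space to value space. The integer case uses Python's built-in `int`. `parse_xs_boolean` is a hypothetical helper following the XML Schema `xs:boolean` type, whose lexical space is {"true", "false", "1", "0"} and whose value space is {true, false}. This sketch ignores the whitespace collapsing that XML Schema also applies.

```python
# Several lexical forms (strings) denote one value in the value space.
lexical_forms = ["1", "01", "+1", "0001"]
values = {int(s) for s in lexical_forms}
print(values)  # {1}

def parse_xs_boolean(lexical: str) -> bool:
    """Map an xs:boolean lexical form to its value (hypothetical helper)."""
    mapping = {"true": True, "1": True, "false": False, "0": False}
    return mapping[lexical]  # KeyError for strings outside the lexical space

print(parse_xs_boolean("1") == parse_xs_boolean("true"))  # True
```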
What is the relation between a subtype and its supertype?
A subtype's value space is a subset of its supertype's value space, so every value of the subtype is also a valid value of the supertype.
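A sketch using three XML Schema integer types whose value spaces are contiguous ranges: `xs:byte` (-128 to 127), `xs:short` (-32768 to 32767) and `xs:unsignedByte` (0 to 255). `is_subset` is a hypothetical helper that only works for non-empty, step-1 ranges. Since `xs:byte` is derived from `xs:short`, its range must lie inside it, while `xs:unsignedByte` cannot be a subtype of `xs:byte`.

```python
# Value spaces as Python ranges (upper bound exclusive).
XS_BYTE = range(-128, 128)
XS_SHORT = range(-32768, 32768)
XS_UNSIGNED_BYTE = range(0, 256)

def is_subset(sub: range, sup: range) -> bool:
    """True if every value of `sub` is in `sup` (non-empty, step-1 ranges only)."""
    return sub.start >= sup.start and sub.stop <= sup.stop

print(is_subset(XS_BYTE, XS_SHORT))           # True
print(is_subset(XS_SHORT, XS_BYTE))           # False
print(is_subset(XS_UNSIGNED_BYTE, XS_BYTE))   # False: 128..255 are not bytes
```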
What structured types are there? (2)
Maps (e.g. JSON objects), which associate keys with values,<br></br>and lists (e.g. JSON arrays), which are ordered sequences of values.
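When are default values from the schema added to the data?
During the annotation phase: validation only checks the data, whereas annotation may modify it, e.g. by filling in missing fields with their default values.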
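A minimal sketch of defaults being filled in during annotation. The schema format `SCHEMA` (field mapped to a type and an optional default) and the helper `annotate_with_defaults` are hypothetical. A pure validation step would only report whether the record is acceptable; this step returns a completed copy.

```python
# Hypothetical schema: field -> (expected type, default); None means "no default".
SCHEMA = {"name": (str, None), "country": (str, "CH"), "active": (bool, True)}

def annotate_with_defaults(record: dict) -> dict:
    """Type-check each field and fill in missing ones from their defaults."""
    out = {}
    for field, (typ, default) in SCHEMA.items():
        if field in record:
            value = record[field]
        elif default is not None:
            value = default  # filled in during annotation
        else:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(value, typ):
            raise TypeError(f"{field}: expected {typ.__name__}")
        out[field] = value
    return out

print(annotate_with_defaults({"name": "Ada"}))
```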