Big Data Lecture 07 Data Models Flashcards

1
Q

How is JSON/XML stored in memory?

A

Using a tree:<br></br><ul><li>JSON: nodes are objects, arrays, or atomic values; the edges leaving an object are labeled with its keys, and the edges leaving an array with the positions of its members (the tree is rooted and read top-down).</li><li>XML: nodes are elements, labeled with their tag names, whose children are other elements or the text inside; attributes and their values are attached to the element node they belong to.</li></ul>
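A minimal Python sketch of the two tree shapes, using the standard-library `json` and `xml.etree.ElementTree` parsers. The helper names `json_edges` and `xml_nodes` and the sample documents are hypothetical. ElementTree is only an approximation of the tree described above: it stores an element's text in `.text` (and text following it in `.tail`) instead of separate text nodes.

```python
import json
import xml.etree.ElementTree as ET


def json_edges(node, path="$"):
    """Yield (parent_path, edge_label, child) for every edge of a parsed JSON tree.

    Object members hang off key-labelled edges, array members off
    position-labelled edges; atomic values are leaves.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield path, key, child
            yield from json_edges(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield path, index, child
            yield from json_edges(child, f"{path}[{index}]")


def xml_nodes(element, depth=0):
    """Yield (depth, tag, attributes, text) for every element of an XML tree.

    In XML the label (the tag name) sits on the node itself,
    and the attributes are attached to that node.
    """
    text = (element.text or "").strip()
    yield depth, element.tag, dict(element.attrib), text
    for child in element:
        yield from xml_nodes(child, depth + 1)


doc = json.loads('{"name": "Ada", "langs": ["en", "fr"]}')
for parent, label, child in json_edges(doc):
    print(parent, "--", repr(label), "->", child)

root = ET.fromstring('<person id="1"><name>Ada</name><lang>en</lang></person>')
for depth, tag, attrs, text in xml_nodes(root):
    print("  " * depth + tag, attrs, text)
```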

2
Q

What is the XML Information Set?

A

The abstract data model (the tree) that a parsed XML document is represented as, made of information items:<br></br><ul><li>Document Information Item (its child is the document element; stores metadata such as the XML version),</li><li>Element Information Item (stores local name, children, attributes, parent),</li><li>Character Information Item (one per character of text; stores the character code and the parent element),</li><li>Attribute Information Item (stores local name, normalized value (without quotes), owner element).</li></ul>

3
Q

What is validation and how does it relate to well-formedness?

A

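Well-formedness is checked against the syntax of the language (JSON, XML). Validation comes later: it checks the document against a schema, i.e. our own constraints on the structure and values allowed within that language. A document must be well-formed before it can be validated.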
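The two steps can be sketched in Python: the standard `json` parser decides well-formedness, and a separate predicate decides validity. The helper `check` and the rule `age_rule` (an object whose "age" is a non-negative integer) are hypothetical, standing in for a real schema.

```python
import json


def check(text, is_valid):
    """Return 'malformed', 'invalid' or 'valid' for a JSON text.

    Well-formedness is decided by the JSON parser alone (the grammar of
    the language); validity is decided afterwards by our own constraint.
    """
    try:
        value = json.loads(text)          # step 1: well-formedness
    except json.JSONDecodeError:
        return "malformed"
    return "valid" if is_valid(value) else "invalid"   # step 2: validation


def age_rule(value):
    """Hypothetical constraint: an object whose "age" is a non-negative integer."""
    age = value.get("age") if isinstance(value, dict) else None
    return isinstance(age, int) and not isinstance(age, bool) and age >= 0


print(check('{"age": 36', age_rule))    # malformed (missing closing brace)
print(check('{"age": -1}', age_rule))   # invalid
print(check('{"age": 36}', age_rule))   # valid
```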

4
Q

What is the difference between validation and annotation?

A

Validation only answers True/False (valid or not). Annotation additionally enriches the data to prepare it for storage and processing: it attaches type information, converts lexical values into values of the correct data types (e.g. the string "42" into the integer 42), and can add metadata such as default values.
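A minimal sketch of the difference, assuming a hypothetical schema that maps each field to a type name and an optional default; the names `SCHEMA`, `PARSERS`, `validate` and `annotate` are illustrative only. Validation returns a boolean and leaves the record untouched, while annotation returns a new record with typed values and defaults filled in.

```python
from datetime import date

# Hypothetical schema: field -> (type name, default). A default of None
# means the field is required.
SCHEMA = {
    "id": ("integer", None),
    "joined": ("date", None),
    "active": ("boolean", True),
}

# Each type name maps to a function turning a lexical string into a typed value.
PARSERS = {
    "integer": int,
    "date": date.fromisoformat,
    "boolean": {"true": True, "false": False}.__getitem__,
}


def annotate(record):
    """Annotation: return a new record with typed values and defaults added."""
    result = {}
    for field, (type_name, default) in SCHEMA.items():
        if field in record:
            result[field] = PARSERS[type_name](record[field])
        elif default is not None:
            result[field] = default              # default value added here
        else:
            raise KeyError(f"missing required field {field!r}")
    return result


def validate(record):
    """Validation: only answer True or False; the record is not modified."""
    try:
        annotate(record)
        return True
    except (KeyError, ValueError, TypeError):
        return False


raw = {"id": "42", "joined": "2024-03-01"}
print(validate(raw))   # True
print(annotate(raw))   # {'id': 42, 'joined': datetime.date(2024, 3, 1), 'active': True}
print(validate({"id": "forty-two", "joined": "2024-03-01"}))   # False
```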

5
Q

Why do we do data validation?

A

Once data has been validated against a schema, it is no longer heterogeneous: it is homogeneous with respect to that schema. Storage and query engines can then rely on the schema (known fields, known types) to lay out, pre-load and query the data faster.

6
Q

What are cardinality markers?

A

<ul><li>required (no marker): exactly once,</li><li>repeated, *: zero or more,</li><li>optional, ?: zero or one,</li><li>+ (no dedicated name): one or more.</li></ul>
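The markers translate directly into occurrence bounds. A minimal sketch with a hypothetical helper `occurrences_ok`; the empty string stands for "no marker" (required).

```python
# Allowed (min, max) number of occurrences for each cardinality marker;
# None as max means "unbounded". The empty marker stands for "required".
CARDINALITY = {
    "": (1, 1),      # required: exactly once
    "?": (0, 1),     # optional: zero or one
    "*": (0, None),  # repeated: zero or more
    "+": (1, None),  # one or more
}


def occurrences_ok(marker, count):
    """Check whether `count` occurrences are allowed by `marker`."""
    low, high = CARDINALITY[marker]
    return count >= low and (high is None or count <= high)


print(occurrences_ok("?", 2))   # False
print(occurrences_ok("*", 0))   # True
```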

7
Q

What is JSound?

A

Validation schema language for JSON (compact syntax):<br></br><ul><li>write the expected type as a string in quotes, e.g. “string”, “integer”,</li><li>fields are optional by default; put a “!” in front of a field name to make it required,</li><li>use “item” to allow any value,</li><li>you can nest [] (arrays) and {} (objects) just like in any JSON value.</li></ul>
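A toy checker for the subset of the compact syntax listed above, written as a sketch rather than a JSound implementation. The function name `conforms` and the `person` schema are hypothetical, only a few atomic types are covered, and treating objects as closed (unknown keys rejected) is an assumption of this sketch.

```python
# A few atomic type names and how to recognise them in parsed JSON.
ATOMIC_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "item": lambda v: True,        # "item" accepts anything
}


def conforms(value, schema):
    """Check `value` against a small subset of the JSound compact syntax."""
    if isinstance(schema, str):                     # atomic type name
        return ATOMIC_CHECKS[schema](value)
    if isinstance(schema, list):                    # [T]: every member matches T
        return isinstance(value, list) and all(conforms(v, schema[0]) for v in value)
    if isinstance(schema, dict):                    # {...}: object type
        if not isinstance(value, dict):
            return False
        fields = {}
        for key, sub in schema.items():
            required = key.startswith("!")          # "!" marks a required field
            fields[key[1:] if required else key] = (sub, required)
        for name, (sub, required) in fields.items():
            if name not in value:
                if required:
                    return False
            elif not conforms(value[name], sub):
                return False
        # Assumption of this sketch: objects are closed, unknown keys are rejected.
        return all(name in fields for name in value)
    raise TypeError(f"unsupported schema: {schema!r}")


person = {"!name": "string", "age": "integer", "tags": ["string"], "extra": "item"}
print(conforms({"name": "Ada", "tags": ["math"]}, person))   # True
print(conforms({"age": 36}, person))                         # False: name missing
print(conforms({"name": "Ada", "age": "36"}, person))        # False: age not an integer
```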

8
Q

When is a dataset heterogeneous or homogeneous?

A

A dataset is homogeneous with respect to a schema if all of its items are valid against that schema; otherwise it is heterogeneous.

9
Q

HBase runs on top of HDFS, so how is it still fast?

A

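<ol><li>Writes go to the MemStore, an in-memory buffer, and reads are served from the MemStore and the block cache whenever possible, so many requests never touch HDFS,</li><li>with short-circuit reads, a RegionServer running on the same machine as a DataNode reads the HDFS block files directly from local disk, bypassing the DataNode.</li></ol>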

10
Q

What is the difference between atomic and structured types?

A

<ul><li>Atomic values cannot be decomposed further, e.g. integers,</li><li>structured values nest other values, e.g. arrays and objects.</li></ul>

11
Q

What atomic types are there? (7)

A

<ul><li>Strings,</li><li>numbers,</li><li>booleans,</li><li>dates and times,</li><li>time intervals,</li><li>binary,</li><li>null.</li></ul>

12
Q

What is the difference between lexical and value space?

A

The value space is the set of actual values; the lexical space is the set of character strings that encode them.<br></br><br></br>Several lexical representations can map to the same value, e.g. “1”, “01” and “+1” all denote the integer 1.
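Parsing is the map from lexical space to value space, which can be seen with Python's own parsers. This is only an analogy: Python's `int()` and `Decimal()` are used as stand-ins, and their accepted lexical forms are not identical to those of XML Schema types (for example, `int()` also accepts underscores).

```python
from decimal import Decimal

# Different lexical representations (character strings) ...
integer_forms = ["1", "01", "+1", "001"]
decimal_forms = ["1.5", "1.50", "01.500", "+1.5"]

# ... are mapped by parsing to values in the value space.
integer_values = {int(s) for s in integer_forms}
decimal_values = {Decimal(s) for s in decimal_forms}

print(integer_values)        # {1}: four lexical forms, one value
print(len(decimal_values))   # 1: four lexical forms, one value

# Serialising goes back the other way, usually to one canonical form.
print(str(int("+1")))        # 1
```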

13
Q

What is the relation between a subtype and its supertype?

A

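A subtype’s value space is a subset of its supertype’s value space: every value of the subtype is also a value of the supertype (e.g. every integer is also a decimal).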

14
Q

What structured types are there? (2)

A

Maps (e.g. JSON Object),<br></br>and Lists (e.g. JSON Array).

15
Q

When are default values from the schema added?

A

During the annotation phase.

16
Q

On which data representation is schema validation done?

A

On the in-memory representation (the parsed tree / data model), i.e. after the document has been parsed and found well-formed, not on the raw text.

17
Q

What are the rules for JSON Schema?

A

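{<br></br>    "type": "object",<br></br>    "required": ["fake_name"],<br></br>    "properties": {<br></br>        "fake_name": { "type": "give type here" }<br></br>    },<br></br>    "additionalProperties": false<br></br>}<br></br><ul><li>"type": the expected type ("object", "array", "string", "number", "integer", "boolean" or "null"),</li><li>"required": an array with the names of the properties that must be present,</li><li>"properties": a schema for each named property,</li><li>"additionalProperties": whether properties not listed in "properties" are allowed (true by default; it can also be a schema for them).</li></ul>A schema can also be just true or false: true accepts any value, false accepts nothing. So "properties": { "fake_name": false } means that fake_name must not exist.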
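A toy validator for the keywords on this card, as a sketch only: it covers `type` (a single type name, not an array of types), `required`, `properties`, `additionalProperties` (boolean or schema) and boolean schemas, and nothing else (no `items`, `$ref`, etc.). The function name `is_valid` and the example schema are hypothetical; real use calls for a complete JSON Schema validator.

```python
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    # JSON Schema counts numbers with a zero fractional part (e.g. 2.0) as integers.
    "integer": lambda v: (isinstance(v, int) and not isinstance(v, bool))
    or (isinstance(v, float) and v.is_integer()),
}


def is_valid(instance, schema):
    """Validate `instance` against a tiny subset of JSON Schema keywords."""
    if schema is True:           # the schema `true` accepts everything
        return True
    if schema is False:          # the schema `false` accepts nothing
        return False
    if "type" in schema and not TYPE_CHECKS[schema["type"]](instance):
        return False
    if isinstance(instance, dict):   # object keywords only apply to objects
        if any(name not in instance for name in schema.get("required", [])):
            return False
        properties = schema.get("properties", {})
        extra = schema.get("additionalProperties", True)
        for name, value in instance.items():
            # Listed properties use their own schema, the rest use additionalProperties.
            if not is_valid(value, properties.get(name, extra)):
                return False
    return True


schema = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "password": False,       # this property must not exist
    },
    "additionalProperties": False,
}
print(is_valid({"name": "Ada", "age": 36}, schema))        # True
print(is_valid({"age": 36}, schema))                       # False: name is required
print(is_valid({"name": "Ada", "password": "x"}, schema))  # False: forbidden property
```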

18
Q

What is impedance mismatch?

A

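When the values (and types) stored in the file do not correspond well to the values and types of the programming language, e.g. a JSON date is just a string, and a decimal number in the file may become an inexact binary float.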
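Two small instances of the mismatch with Python's standard `json` module; the sample record is hypothetical. JSON has no date type, and its decimal number literals are parsed into binary floats by default, while `parse_float=Decimal` keeps them exact.

```python
import json
from datetime import date
from decimal import Decimal

text = '{"name": "Ada", "born": "1815-12-10", "balance": 0.1}'
record = json.loads(text)

# JSON has no date type, so the date arrives as a plain string
# and must be converted explicitly into a Python date.
print(type(record["born"]).__name__)        # str
born = date.fromisoformat(record["born"])

# The JSON number 0.1 becomes a binary float, which cannot represent it exactly.
print(record["balance"] + 0.2 == 0.3)      # False

# Asking the parser for Decimal keeps the decimal value exact.
exact = json.loads(text, parse_float=Decimal)
print(exact["balance"] + Decimal("0.2") == Decimal("0.3"))   # True
```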

19
Q

How does XML validation work?

A

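The XML Schema vocabulary lives in the namespace http://www.w3.org/2001/XMLSchema, usually bound with xmlns:xs="http://www.w3.org/2001/XMLSchema". A schema written with these xs: elements is then used to validate our XML documents.<br></br><br></br>We use xs:complexType for elements that have attributes or child elements (elements with only text and no attributes can use simple types).<br></br><br></br>We use xs:sequence to list child elements in a fixed order, and specify for each element in the sequence how often it may repeat, with minOccurs and maxOccurs.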
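Python's standard `xml.etree.ElementTree` parses XML but does not validate it against XML Schema, so the following is only a hand-written sketch of what an xs:sequence with minOccurs/maxOccurs checks. The names `BOOK_SEQUENCE` and `matches_sequence` are hypothetical; matching is greedy, which works when consecutive entries have distinct names, as here.

```python
import xml.etree.ElementTree as ET

# Mirrors this XML Schema fragment:
# <xs:element name="book"><xs:complexType><xs:sequence>
#   <xs:element name="title" type="xs:string"/>
#   <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
#   <xs:element name="year" type="xs:integer" minOccurs="0"/>
# </xs:sequence></xs:complexType></xs:element>
# Each entry is (child name, minOccurs, maxOccurs); None means "unbounded".
BOOK_SEQUENCE = [("title", 1, 1), ("author", 1, None), ("year", 0, 1)]


def matches_sequence(element, sequence):
    """Check that the children of `element` follow `sequence` in order."""
    names = [child.tag for child in element]
    pos = 0
    for name, min_occurs, max_occurs in sequence:
        count = 0
        # Consume as many consecutive children with this name as allowed.
        while pos < len(names) and names[pos] == name and (
            max_occurs is None or count < max_occurs
        ):
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    return pos == len(names)    # no unexpected children left over


good = ET.fromstring("<book><title>T</title><author>A</author><author>B</author></book>")
bad = ET.fromstring("<book><author>A</author><title>T</title></book>")
print(matches_sequence(good, BOOK_SEQUENCE))   # True
print(matches_sequence(bad, BOOK_SEQUENCE))    # False: wrong order
```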

20
Q

What is a schema of schemas?

A

A JSON Schema is itself a JSON document, an XML Schema is itself an XML document, and a JSound schema is a JSON document too. So there is, naturally, also a schema for validating schemas. This is called the <i>schema of schemas</i>!

21
Q

What are DataFrames?

A

Homogeneous collections of (possibly nested) JSON objects: all objects are valid against the same schema.

22
Q

What is a dataset from an abstract perspective?

A

A list of maps!

23
Q

What is a table w.r.t. a dataframe?

A

It is a special kind of dataframe: one without nesting, in which every value is atomic.
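The distinction can be checked mechanically on a list of records. A minimal sketch with a hypothetical helper `is_table`, treating strings, numbers, booleans and null as the atomic values.

```python
# Python types that parsed JSON uses for atomic values.
ATOMIC = (str, int, float, bool, type(None))


def is_table(records):
    """A dataframe is a flat table when every value in every record is atomic."""
    return all(isinstance(v, ATOMIC) for r in records for v in r.values())


print(is_table([{"name": "Ada", "year": 1815}]))      # True
print(is_table([{"name": "Ada", "langs": ["en"]}]))   # False: nested array
```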

24
Q

What is Parquet?

A

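A columnar storage format for dataframes (including nested ones): the schema, with the attribute names and types, is stored once in the file instead of being repeated in every record, and values are stored column by column, which saves space and compresses well.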
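The core idea (names stored once, values grouped by column) can be sketched for flat, homogeneous records. This is only an illustration with a hypothetical helper `to_columns`; the actual Parquet format additionally handles nesting (with repetition and definition levels), encodings and compression, none of which is shown here.

```python
def to_columns(records):
    """Turn flat, homogeneous records into a column-oriented layout.

    The field names (the schema) are kept once; each column holds only values.
    """
    if not records:
        return [], {}
    schema = list(records[0].keys())
    if any(set(r) != set(schema) for r in records):
        raise ValueError("records are not homogeneous")
    columns = {name: [r[name] for r in records] for name in schema}
    return schema, columns


rows = [
    {"name": "Ada", "year": 1815},
    {"name": "Alan", "year": 1912},
    {"name": "Grace", "year": 1906},
]
schema, columns = to_columns(rows)
print(schema)    # ['name', 'year']
print(columns)   # {'name': ['Ada', 'Alan', 'Grace'], 'year': [1815, 1912, 1906]}
```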

25
Q

What is P1DT7H2M3S?

A

The lexical value of a duration (ISO 8601 / xs:duration notation), not a date: 1 day, 7 hours, 2 minutes and 3 seconds.
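A sketch that maps such a lexical value into the value space, here Python's `datetime.timedelta`. The regular expression and the function `parse_duration` are hypothetical and handle only the day and time parts; years and months (e.g. P1Y2M) and negative durations are rejected, since a `timedelta` has no notion of calendar months.

```python
import re
from datetime import timedelta

# Day/time part of an xs:duration lexical form, e.g. P1DT7H2M3S.
DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text):
    """Map a day/time duration lexical value to a timedelta."""
    match = DURATION.fullmatch(text)
    # "P", "PT" or a trailing "T" match the pattern but are not valid durations.
    if match is None or text in ("P", "PT") or text.endswith("T"):
        raise ValueError(f"not a supported duration: {text!r}")
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    return timedelta(**parts)


print(parse_duration("P1DT7H2M3S"))   # 1 day, 7:02:03
```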

26
Q

Can XML without a validation schema be valid?

A

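No: validity is only defined with respect to a schema (or DTD), so without one there is nothing to validate against. The document can still be well-formed.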