Big Data Lecture 05 Syntax Flashcards

1
Q

Why do we store data as text?

A

Because text is readable and editable by humans, which makes the data easy to decipher both now and in the future.

2
Q

How do we encode text into binary?

A

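Using encodings:<br></br><ul><li>ASCII (7 bits per character),</li><li>ISO Latin 1 (8 bits per character),</li><li>UTF-8 (variable length, 8-bit blocks, at least one per character),</li><li>UTF-16 (variable length, 16-bit blocks, at least one per character).</li></ul><div>In UTF-8 and UTF-16, several consecutive blocks can together encode a single character; the leading bits of each block indicate whether it starts a character or continues one.</div>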
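
A minimal Python sketch of the variable-length idea, using only the built-in `str.encode`. The helper name `encoded_lengths` and the sample characters are chosen for illustration and are not from the lecture.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-16-be" = big-endian without a byte-order mark, so we count
        # only the bytes of the character itself.
        "utf-16": len(ch.encode("utf-16-be")),
    }

for ch in ["A", "é", "€", "😀"]:
    # ascii() keeps the output printable on any terminal
    print(ascii(ch), encoded_lengths(ch))

# The leading bits of each UTF-8 byte tell us its role:
#   0xxxxxxx                      -> a complete one-byte character
#   110xxxxx, 1110xxxx, 11110xxx  -> first byte of a multi-byte character
#   10xxxxxx                      -> continuation byte
print([format(b, "08b") for b in "é".encode("utf-8")])
```

Plain ASCII cannot represent "é" at all, while ISO Latin 1 stores it in a single byte.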

3
Q

What is CSV? What is its syntax?

A

Comma Separated Values format: one record per row, with the values in a row separated by commas.<br></br><br></br>To include special characters such as commas (or leading and trailing spaces) in a value, we put quotes around the value.<br></br><br></br>To include a quote inside a quoted value, we double it: we write "" instead of ".
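
The quoting rules can be seen with Python's standard `csv` module. This is only a sketch; the sample rows are made up for illustration.

```python
import csv
import io

rows = [
    ["id", "name", "quote"],
    [1, "Doe, Jane", 'She said "hi"'],  # a comma and quotes inside values
]

# Write the rows to an in-memory buffer instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original values (everything as strings).
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```

The writer adds quotes only where needed and doubles the embedded quotes; the reader undoes both steps.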

4
Q

Why can we allow ourselves to denormalize in Big Data?

A

Normally, e.g. in e-commerce, we have a WRITE-intensive system, where we want to make sure that we do not have anomalies, so we normalize.<br></br><br></br>However, for search or AI training we have a READ-intensive system, where we need to avoid expensive joins, so we can afford to denormalize.

5
Q

What can happen when we have denormalized data?

A

The table suddenly has:<br></br><ul><li>extra values (columns),</li><li>missing values,</li><li>invalid values,</li><li>nested values (a table inside a table),</li><li>nested heterogeneous values.</li></ul>

6
Q

How can we describe semi-structured data?

A

It is a heterogeneous collection of arborescent (tree-like) items.

7
Q

How do JSON and XML compare?

A

<ul><li>JSON is newer, and considered cooler,</li><li>XML is older, and has wider usage and implementation everywhere.</li></ul>

8
Q

What does it mean to be well-formed?

A

A syntax defines a language, i.e. a set of strings. If a string belongs to this language, it is well-formed with respect to it.
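
In practice, we can test membership in the language by running a parser. Below is a minimal sketch with Python's standard `json` and `xml.etree.ElementTree` modules; the function names are illustrative. Note that Python's JSON parser is slightly more lenient than the JSON specification (for example, it accepts `NaN`), so this is an approximation.

```python
import json
import xml.etree.ElementTree as ET

def is_well_formed_json(text: str) -> bool:
    """True if `text` parses as JSON (per Python's json module)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def is_well_formed_xml(text: str) -> bool:
    """True if `text` parses as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed_json('{"a": [1, 2]}'))  # brackets match
print(is_well_formed_json('{"a": [1, 2}'))   # missing closing bracket
print(is_well_formed_xml("<a><b/></a>"))     # tags properly nested
print(is_well_formed_xml("<a><b></a>"))      # <b> is never closed
```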

9
Q

What are JSON building blocks?

A

<ul><li><i>Strings</i>, have to be in double quotes; special characters (quotes, backslashes, control characters) have to be escaped, and any Unicode character can be escaped as \uXXXX,</li><li><i>Numbers</i>, stored lexically (as text) and have to be interpreted; can use scientific notation (with e or E),</li><li><i>Booleans</i>, true or false, not quoted,</li><li><i>Null</i>, written null, not quoted,</li><li><i>Array</i>, enclosed in [ ], items do not have to be homogeneous,</li><li><i>Object</i>, enclosed in { }, a map from quoted strings to any value; keys SHOULD NOT be duplicated.</li></ul>
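
A small sketch showing how each building block is interpreted by Python's standard `json` module. The document itself is invented for illustration.

```python
import json

# A raw string, so the backslash escapes reach the JSON parser unchanged.
doc = r'''
{
  "string": "caf\u00e9 \"quoted\"",
  "number": 6.02E23,
  "boolean": true,
  "nothing": null,
  "array": [1, "two", false, null],
  "object": {"nested": {"key": "value"}}
}
'''

value = json.loads(doc)
for key, item in value.items():
    # Show which Python type each JSON building block was mapped to.
    print(key, type(item).__name__)

# Duplicate keys are not a syntax error for this parser: the last one wins.
duplicated = json.loads('{"a": 1, "a": 2}')
print(duplicated)
```

Other parsers may treat duplicate keys differently, which is one reason to avoid them.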

10
Q

How are rows from a database stored in a JSON file?

A

As JSON Lines: one object per line, each mapping the attribute names (as strings) to their values.
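
A minimal sketch of writing and reading JSON Lines with the standard `json` module. The rows and the helper names `to_json_lines` and `from_json_lines` are hypothetical.

```python
import json

rows = [
    {"id": 1, "name": "Ada", "country": "UK"},
    {"id": 2, "name": "Grace", "country": None},  # missing value becomes null
]

def to_json_lines(records) -> str:
    """Serialize each record as one JSON object on its own line."""
    return "\n".join(json.dumps(record) for record in records)

def from_json_lines(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

text = to_json_lines(rows)
print(text)
print(from_json_lines(text) == rows)
```

Because every line is a complete JSON value, a large file can be split at line boundaries and processed in parallel.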

11
Q

How does XML syntax work?

A

<ul><li>We have opening &lt;tags&gt; and closing &lt;/tags&gt;, like in HTML; an empty tag is denoted like &lt;this/&gt;.</li><li>A tag &lt;can have="value" /&gt; as an attribute, or multiple attributes with different keys; attribute values are plain text and cannot contain nested elements or attributes.</li><li>&lt;a&gt;A tag can also contain text.&lt;/a&gt; This makes XML loved by publishers, who can annotate text.</li><li>&lt;!-- Comments have this notation. --&gt; They are part of the document and can be kept upon parsing, unlike in programming languages.</li><li>At the start, &lt;?xml version="1.0" encoding="UTF-8"?&gt; is stated, but that is optional.</li><li>In each document, there is EXACTLY one top-level element, with no text around it. If a DOCTYPE is specified, its name matches the name of this element.</li><li>Inside the main element, text and tags can be mixed, but tags always have to be well bracketed (properly nested).</li></ul>
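
The rules above can be sketched with Python's standard `xml.etree.ElementTree`. The document below is invented for illustration. Note that this particular parser drops comments by default, even though they are part of the document.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment before the single top-level element -->
<article id="42" lang="en">
  <title>Syntax</title>
  <body>Text can be <em>mixed</em> with tags.</body>
  <empty/>
</article>
"""

# Parse from bytes so the encoding declaration is honored by the parser.
root = ET.fromstring(doc.encode("utf-8"))

print(root.tag)     # name of the top-level element
print(root.attrib)  # attributes as a dict of strings

body = root.find("body")
em = body[0]
# Mixed content: text before the child, the child's text, and its "tail".
print(repr(body.text), repr(em.text), repr(em.tail))

# An empty element has no text at all.
print(root.find("empty").text)
```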

12
Q

What characters have to be escaped in XML?

A

<ul><li>&lt; as &amp;lt; (always, in text and in attributes)</li><li>&amp; as &amp;amp; (always, in text and in attributes)</li><li>&gt; as &amp;gt; (strictly required only in the sequence ]]&gt; in text, but commonly escaped everywhere)</li><li>' as &amp;apos; (required inside single-quoted attributes, optional in text)</li><li>" as &amp;quot; (required inside double-quoted attributes, optional in text)</li></ul>
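
Python's standard `xml.sax.saxutils` module offers helpers for these escapes. A short sketch with made-up sample strings:

```python
from xml.sax.saxutils import escape, quoteattr, unescape

# escape() handles &, < and > for use in text content.
text = escape("a < b & c")
print(text)

# Quotes are only escaped if we ask for it via the extra-entities argument.
quoted = escape('say "hi"', {'"': "&quot;"})
print(quoted)

# quoteattr() returns a ready-to-use attribute value including its quotes;
# it picks single quotes when the value contains double quotes.
plain_attr = quoteattr("plain")
tricky_attr = quoteattr('say "hi"')
print(plain_attr, tricky_attr)

# unescape() reverses the basic escapes.
print(unescape("&lt;tag&gt;"))
```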

13
Q

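Is whitespace relevant in XML and JSON?

A

Mostly no: whitespace between JSON tokens is ignored, and whitespace used to indent XML elements is usually treated as insignificant. Whitespace inside JSON strings and XML text is preserved. Either way, we should format nicely to make the document readable for others!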

14
Q

What are XML namespaces? How are they implemented?

A

To attach an element to a certain domain and avoid name conflicts, we put names into namespaces.<br></br><br></br>We denote them using a qualified name: &lt;prefix:localname xmlns:prefix="NAMESPACE URI"/&gt;.<br></br><br></br>If we do not specify a prefix (xmlns="NAMESPACE URI"), we declare a default namespace, which applies to the element and all its unprefixed subelements. We can also put the prefix on the elements themselves; this is usually done when several namespaces are used in one document (and then it SHOULD be done).<br></br><br></br>You SHOULD declare all of the namespaces in the root element.<br></br><br></br>Because xmlns is used for these declarations, names starting with xml are reserved, which is why we cannot have our own attributes starting with xml.
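
A sketch of how a parser resolves prefixes to namespace URIs, using Python's standard `xml.etree.ElementTree`. The URIs, element names, and the `lib` prefix in the lookup table are made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<library xmlns="http://example.com/library"
         xmlns:dc="http://example.com/metadata">
  <book>
    <dc:title>Big Data</dc:title>
  </book>
</library>
"""

root = ET.fromstring(doc)

# ElementTree expands every name to "{namespace URI}localname".
print(root.tag)
print(root[0].tag)  # <book> inherits the default namespace

# For lookups we supply our own prefix table; the prefixes here are local
# to this code and need not match the ones used in the document.
ns = {"lib": "http://example.com/library", "dc": "http://example.com/metadata"}
title = root.find("lib:book/dc:title", ns)
print(title.tag, title.text)
```

Only the namespace URI matters for identity; the prefix is just a shorthand.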

15
Q

What is YAML?

A

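Originally "Yet Another Markup Language", now officially "YAML Ain't Markup Language". It is another syntax for semi-structured data, and it is very intuitive!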
