Chapter 5&6 Knowledge Testers Flashcards

1
Q
A
2
Q

Why is syntax relevant to data management and Big Data?

A

Syntax is how data is stored and exchanged as text. JSON and XML syntax allows nesting, which makes both suitable for tree-shaped data. Proper, consistent syntax prevents data loss and errors.

3
Q

What are examples of syntax for trees and tables?

A

Trees: JSON, XML
Tables: CSV

4
Q

What is well-formedness with respect to syntax?

A

A well-formed document follows the syntax rules of its format, so a parser can read it without errors. A well-formed document is not necessarily valid against a schema, but a document must be well-formed in order to be valid.

5
Q

What are the JSON basic building blocks?

A

Object - {}, {"hello": 1, "hi": [1, 2, 3.4], "bye": {"goodbye": 2}}
Array - ["hi", "bye", 1, [2.4]]
String - "hello"
Number - 32, 2.4, -5.3E45, 0.23
Boolean - true, false
Null - null
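
As a minimal sketch of these building blocks, the snippet below parses one JSON object with Python's standard json module and records which Python type each value becomes. The sample document and its keys are made up for illustration.

```python
import json

# Hypothetical sample: one key per JSON building block.
text = '{"obj": {"goodbye": 2}, "arr": ["hi", 1, [2.4]], "str": "hello", "num": -5.3E45, "bool": true, "nothing": null}'
doc = json.loads(text)

# Map each key to the name of the Python type its value was parsed into.
kinds = {key: type(value).__name__ for key, value in doc.items()}
print(kinds)
```

Objects become dicts, arrays become lists, and null becomes None; numbers become int or float depending on how they are written.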

6
Q

What to check to see if a JSON document is well-formed?

A

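Strings use double quotes, and a double quote inside a string is escaped as \". No quotes around any other type (numbers, booleans, null). Every opening bracket or brace has a matching closing one. Commas separate the elements of an array and the key-value pairs of an object. All keys are strings.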
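
One way to try these checks is to hand the text to a parser and see whether it is accepted. The sketch below uses Python's standard json module; the helper name is_well_formed_json is made up here. Note that Python's parser is slightly more lenient than strict JSON (it accepts NaN and Infinity by default), so this is an approximation rather than a reference checker.

```python
import json

def is_well_formed_json(text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

samples = {
    '{"a": [1, 2]}': "fine",
    "{'a': 1}": "single quotes around a string",
    '{"a": [1, 2}': "bracket closed by the wrong symbol",
    '{"a": 1 "b": 2}': "missing comma",
    '{1: "a"}': "key is not a string",
    '{"say": "\\"hi\\""}': "escaped double quotes inside a string",
}
for text, note in samples.items():
    print(is_well_formed_json(text), text, "<-", note)
```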

7
Q

What are the XML basic building blocks?

A

Element - used for tagging; inner elements must be closed before outer elements. Well-formed: <hi>welcome!</hi> or <hi></hi>.
Attribute - a key-value pair that appears in the opening tag of an element.
Text - appears inside elements.
Comment - <!-- Comment -->.
Document - can be identified as XML with an optional XML declaration at the top, e.g. <?xml version="1.0" encoding="UTF-8"?>.
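
A small sketch of these building blocks using Python's standard xml.etree.ElementTree. The element and attribute names (greeting, lang) are made up for illustration; the default parser discards comments, so only elements, attributes, and text show up in the result.

```python
import xml.etree.ElementTree as ET

# Declaration, element with an attribute, comment, and two child elements.
text = '<?xml version="1.0"?><greeting lang="en"><!-- a comment --><hi>welcome!</hi><hi></hi></greeting>'
root = ET.fromstring(text)

print(root.tag)                  # element name
print(root.attrib)               # attributes of the opening tag
print([child.text for child in root])  # text inside each child element

def is_well_formed_xml(text: str) -> bool:
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Inner element closed after the outer one: not well-formed.
print(is_well_formed_xml("<a><b></a></b>"))
```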

8
Q

What are the five fundamental pre-defined escapes in XML?

A

&lt; for <
&gt; for >
&quot; for "
&apos; for '
&amp; for &
Used in text and attribute values.

9
Q

When must some characters be escaped in XML?

A

MANDATORY: everywhere in text and attribute values, & and < must be escaped. In double-quoted attribute values, " must also be escaped; in single-quoted attribute values, ' must also be escaped.
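
As an illustration of the predefined escapes, the sketch below uses escape and unescape from Python's standard xml.sax.saxutils. By default escape handles &, < and >; the quote mappings are passed in explicitly. The sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings for the two quote characters (not escaped by default).
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'Tom & "Jerry" <3'
escaped = escape(raw, QUOTES)

# Reverse the mapping to turn the escapes back into characters.
restored = unescape(escaped, {v: k for k, v in QUOTES.items()})

print(escaped)
print(restored == raw)
```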

10
Q

How do XML namespaces work?

A

Namespaces group elements and attributes into packages, each identified by a URI. An xmlns (default namespace) or xmlns:prefix attribute in an opening tag is a namespace declaration.

11
Q

How to determine if two XML documents are equivalent according to namespace semantics?

A

Compare the QNames of the elements and attributes. A QName has three parts: local name, namespace, and prefix. Two names are semantically the same if their local name and namespace match; the prefix is ignored.
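
A sketch of this comparison with Python's standard xml.etree.ElementTree, which expands every name to {namespace}localname and drops the prefix. The namespace URIs and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same namespace, three different spellings: prefix a, prefix b, default namespace.
doc_a = '<a:book xmlns:a="http://example.com/books"><a:title>XML</a:title></a:book>'
doc_b = '<b:book xmlns:b="http://example.com/books"><b:title>XML</b:title></b:book>'
doc_c = '<book xmlns="http://example.com/books"><title>XML</title></book>'
# Same local names, but a different namespace.
doc_d = '<a:book xmlns:a="http://example.com/other"><a:title>XML</a:title></a:book>'

def expanded_names(text: str) -> list:
    """List the {namespace}localname of every element, in document order."""
    return [el.tag for el in ET.fromstring(text).iter()]

print(expanded_names(doc_a))
print(expanded_names(doc_a) == expanded_names(doc_b) == expanded_names(doc_c))
print(expanded_names(doc_a) == expanded_names(doc_d))
```

This only compares element names; a full equivalence check would also compare attributes and text.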

12
Q

How to determine if an XML document is well-formed in terms of namespaces?

A

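Every prefix used in an element or attribute name must be bound to a namespace by a declaration on that element or on one of its ancestors. Putting all namespace-prefix bindings on the root element is a good practice, not a requirement.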
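
A sketch of how a namespace-aware parser treats prefixes, using Python's standard xml.etree.ElementTree: a prefix with no binding in scope is rejected, while a prefix bound on the element itself or on an ancestor is accepted. The URIs and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if the namespace-aware parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

unbound = '<a:book>XML</a:book>'
bound_on_root = '<a:book xmlns:a="http://example.com/books">XML</a:book>'
bound_on_child = '<shelf><a:book xmlns:a="http://example.com/books">XML</a:book></shelf>'
# The binding sits on a sibling, so it is not in scope for a:book.
out_of_scope = '<shelf><x xmlns:a="http://example.com/books"/><a:book>XML</a:book></shelf>'

for name, text in [("unbound", unbound), ("bound_on_root", bound_on_root),
                   ("bound_on_child", bound_on_child), ("out_of_scope", out_of_scope)]:
    print(name, parses(text))
```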

13
Q

What are common patterns for using XML namespaces?

A

xmlns and xmlns:prefix only in the top-level element. Prefix-namespace mapping is bijective. Do not mix default namespace with namespaces associated with prefixes.

14
Q

What limitations of the traditional relational model do wide column stores overcome?

A

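Objects ranging from a few bytes up to about 10 MB fall into a gap: at scale they are a poor fit for CLOB/BLOB columns in a relational database, yet they are too small to be stored efficiently in a distributed file system (DFS).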

15
Q

What are the differences and similarities between a wide column store and the traditional relational model?

A

A wide column store gives more control over performance, in order to achieve high throughput and low latency for objects ranging from a few bytes up to about 10 MB.

16
Q

Why are wide column stores called wide column stores?

A

They avoid joins by denormalizing: the joins are precomputed and stored in the same table, which increases the number of columns.
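
A toy sketch of precomputing a join, using plain Python dictionaries. The customers and orders tables, and the "order:<id>" column naming, are made up for illustration; each order becomes an extra column in the customer's row, so rows get wider instead of requiring a join at query time.

```python
# Two normalized "tables" (hypothetical data).
customers = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
orders = [
    {"order_id": 10, "customer_id": 1, "item": "book"},
    {"order_id": 11, "customer_id": 1, "item": "pen"},
    {"order_id": 12, "customer_id": 2, "item": "lamp"},
]

# Precompute the join: start from a copy of each customer row ...
wide_rows = {cid: dict(info) for cid, info in customers.items()}
# ... and add one column per order to the matching customer.
for order in orders:
    wide_rows[order["customer_id"]][f"order:{order['order_id']}"] = order["item"]

print(wide_rows[1])
```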

17
Q

What are the two ways data can be distributed?

A

Partitioning - splitting a file into blocks that are spread across machines. Replicating - storing copies of the same data on several machines.
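
A toy sketch of both ideas, taking replication to mean keeping copies of each block on several machines. The block size, node names, and round-robin placement are assumptions chosen for illustration, not the placement policy of any real system.

```python
def partition(data: bytes, block_size: int) -> list:
    """Split data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def replicate(num_blocks: int, nodes: list, replicas: int) -> dict:
    """Assign each block index to `replicas` distinct nodes, round-robin."""
    return {
        i: [nodes[(i + r) % len(nodes)] for r in range(replicas)]
        for i in range(num_blocks)
    }

blocks = partition(b"abcdefghij", block_size=4)
placement = replicate(len(blocks), ["n1", "n2", "n3"], replicas=2)

print(blocks)
print(placement)
```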

18
Q

What is the data model behind wide column stores?

A

The key is four-dimensional: row ID, column family, column qualifier, version. Row IDs can be compared and sorted.
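
A minimal sketch of the four-dimensional key as a Python NamedTuple, with a dictionary standing in for the store. The row IDs, the "profile" family, and the helper latest are hypothetical names used only for illustration.

```python
from typing import NamedTuple

class CellKey(NamedTuple):
    row_id: str
    column_family: str
    column_qualifier: str
    version: int

# Each cell value is addressed by the full four-part key.
cells = {
    CellKey("user42", "profile", "name", 1): "Ada",
    CellKey("user42", "profile", "name", 2): "Ada L.",
    CellKey("user07", "profile", "name", 1): "Bob",
}

def latest(row_id: str, family: str, qualifier: str):
    """Return the value with the highest version for the given cell, if any."""
    versions = [k for k in cells if k[:3] == (row_id, family, qualifier)]
    return cells[max(versions, key=lambda k: k.version)] if versions else None

# Row IDs are comparable, so rows can be kept in sorted order.
sorted_rows = sorted({k.row_id for k in cells})

print(latest("user42", "profile", "name"))
print(sorted_rows)
```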

19
Q

What is the motivation behind denormalizing data into several column families?

A

Keep related info together, but avoid expensive joins.

20
Q

What aspects of a table in a wide column store must be known in advance?

21
Q

Who are the big players in wide column stores?

A

Google - Bigtable. Apache - HBase, an open-source store modeled on Bigtable.

22
Q

Why is HBase based on HDFS yet low-latency?

A

It is an enhanced key-value store, with a 4D key. It stores together what is accessed together, reducing latency.

23
Q

What are regions in wide column stores?

A

A group of rows defined by an upper and lower row key.

24
Q

What are the four basic kinds of (low-level) queries in HBase?

A

Get - retrieve a row by specifying a table and row ID. Put - put a new value in a cell. Scan - query a whole table or part of a table. Delete - delete specific value.
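
The four operations can be sketched with a toy in-memory table. This is not the HBase client API; the class ToyTable and its method signatures are made up to show the semantics only, and versions are left out for brevity.

```python
class ToyTable:
    def __init__(self):
        # row_id -> {(column_family, column_qualifier): value}
        self.rows = {}

    def put(self, row_id, family, qualifier, value):
        """Write a value into one cell."""
        self.rows.setdefault(row_id, {})[(family, qualifier)] = value

    def get(self, row_id):
        """Retrieve all cells of one row by its row ID."""
        return dict(self.rows.get(row_id, {}))

    def scan(self, start=None, stop=None):
        """Yield rows in row-ID order, from start (inclusive) to stop (exclusive)."""
        for row_id in sorted(self.rows):
            if (start is None or row_id >= start) and (stop is None or row_id < stop):
                yield row_id, dict(self.rows[row_id])

    def delete(self, row_id, family, qualifier):
        """Delete one cell; drop the row once it has no cells left."""
        cells = self.rows.get(row_id)
        if cells is not None:
            cells.pop((family, qualifier), None)
            if not cells:
                del self.rows[row_id]

table = ToyTable()
table.put("b", "profile", "name", "Bob")
table.put("a", "profile", "name", "Ada")
table.put("c", "profile", "name", "Cy")
scanned = [row_id for row_id, _ in table.scan(start="a", stop="c")]
table.delete("b", "profile", "name")
print(scanned, table.get("a"), table.get("b"))
```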

25
Q

How to identify a region based on the content of the wide column store?

A

By a lower (inclusive) and upper (exclusive) key.
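
Because each region covers a range with an inclusive lower key and an exclusive upper key, finding the region for a row key is a search over the sorted boundaries. The sketch below uses Python's standard bisect module; the split keys are hypothetical.

```python
import bisect

# Hypothetical boundaries between regions. They define four regions:
# [start, "g"), ["g", "n"), ["n", "t"), ["t", end)
split_keys = ["g", "n", "t"]

def region_index(row_key: str) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right places a key equal to a boundary into the region that
    # starts at that boundary, which matches the inclusive lower bound.
    return bisect.bisect_right(split_keys, row_key)

for key in ["apple", "g", "m", "n", "zebra"]:
    print(key, "->", region_index(key))
```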

26
Q

What is the physical architecture of a wide column store like HBase?

A

Same centralized architecture as HDFS: one coordinator (HMaster) and many workers (RegionServers). HMaster and RegionServer are processes running on nodes of the cluster.

27
Q

What are the physical layers of HBase?

A

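Table - rows and columns.
Region - a group of consecutive rows.
Store - the intersection of a region and a column family.
Memstore - an in-memory buffer that holds a store's most recent writes until they are flushed to disk.
HFile - all cells within a store are persisted on HDFS in HFiles.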
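
A toy sketch of the bottom two layers, assuming the usual HBase behavior in which a store buffers writes in memory (the memstore) and flushes them as a sorted, immutable file (an HFile). The class ToyStore, the tiny flush threshold, and the use of tuples as "files" are all made up for illustration; real HFiles live on HDFS.

```python
class ToyStore:
    def __init__(self, flush_threshold: int = 2):
        self.flush_threshold = flush_threshold
        self.memstore = {}   # recent writes, held in memory
        self.hfiles = []     # flushed "files": sorted immutable tuples, oldest first

    def put(self, key, value):
        """Buffer a write; flush once the memstore is full."""
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the memstore as a new sorted HFile and start a fresh buffer."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Check the memstore first, then HFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None

store = ToyStore(flush_threshold=2)
store.put("b", 1)
store.put("a", 2)   # second write triggers a flush
store.put("a", 3)   # newer value stays in the memstore
print(store.hfiles, store.memstore, store.get("a"), store.get("b"))
```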