4. Data Storage and Processing Flashcards

1
Q

Explain linked data.

A

A set of machine-readable, well-defined data published on the web that can be connected to external datasets from different sources.

2
Q

What does RDF stand for?

A

Resource Description Framework.

3
Q

What are the four linked data principles?

A
  1. Use Uniform Resource Identifiers (URIs) as names for things.
  2. Use HyperText Transfer Protocol (HTTP) URIs (that is, Uniform Resource Locators [URLs]) so that people can look up those names.
  3. Use standards such as the Resource Description Framework (RDF) and SPARQL (SPARQL Protocol and RDF Query Language) to provide useful data to people who look up a URI.
  4. Include links to other URIs to help people discover more things.
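
The four principles can be illustrated with a minimal sketch in plain Python. The `http://example.org/...` and `http://example.com/...` URIs and the `locatedIn` and `partOf` properties are hypothetical names chosen for illustration; a real deployment would serve RDF over HTTP rather than hold tuples in a list.

```python
# Hypothetical namespace used only for illustration.
EX = "http://example.org/iot/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOSA_SENSOR = "http://www.w3.org/ns/sosa/Sensor"

# Principles 1 and 2: every thing is named by an HTTP URI.
# Each triple is (subject, predicate, object).
triples = [
    (EX + "sensor1", RDF_TYPE, SOSA_SENSOR),
    (EX + "sensor1", EX + "locatedIn", EX + "room42"),
    (EX + "room42", EX + "partOf", "http://example.com/buildings/hq"),
]

def describe(uri, graph):
    """Principle 3: return the data available about a URI."""
    return [t for t in graph if t[0] == uri]

def outgoing_links(uri, graph):
    """Principle 4: return linked URIs that lead to more things."""
    return [o for s, p, o in graph if s == uri and o.startswith("http")]
```

Calling `describe(EX + "sensor1", triples)` returns the two triples about the sensor, and `outgoing_links` lists the URIs a client could follow next.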
4
Q

Name and explain the two categories of systems used to manage large RDF datasets.

A
  1. Centralized systems: RDF data are stored and processed on a single machine.
  2. Distributed systems: RDF data are stored and processed on multiple machines.
5
Q

Explain what RDFS is.

A

RDF Schema (RDFS) is used to define hierarchical classes, each representing a category of things, and to state which resources belong to those classes.
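
As a rough sketch of how a class hierarchy lets a system infer class membership, the following plain-Python example follows `rdfs:subClassOf` links. The `ex:` class and resource names are hypothetical, and a real RDFS reasoner covers more rules than this.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical class hierarchy and one typed resource.
triples = {
    ("ex:TemperatureSensor", SUBCLASS_OF, "ex:Sensor"),
    ("ex:Sensor", SUBCLASS_OF, "ex:Device"),
    ("ex:t1", TYPE, "ex:TemperatureSensor"),
}

def superclasses(cls, graph):
    """Collect every class reachable from cls via rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found

def types_of(resource, graph):
    """Return the declared classes of a resource plus all inherited ones."""
    direct = {o for s, p, o in graph if s == resource and p == TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred
```

Here `ex:t1` is declared only as a `ex:TemperatureSensor`, yet `types_of("ex:t1", triples)` also reports `ex:Sensor` and `ex:Device`.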

6
Q

In what two ways do linked data facilitate abstraction, interoperability, and integration in IoT?

A

First, it enables integration by using common identifiers, such as Internationalized Resource Identifiers (IRIs).
Second, with linked data, machines can interpret the data descriptions.

7
Q

Read “4.2 Analysis of networked data using a semantic reasoner” in the course book.

A
8
Q

Name a few semantic technologies.

A

Linking data, real-time and linked stream processing, logic, machine learning, and semantic-based distributed reasoning.

9
Q

What does LER stand for?

A

Linked Edit Rules.

10
Q

When is distributed reasoning more advantageous than centralized reasoning?

A
  1. Data are distributed both logically and physically.
  2. The communication costs are negligible compared to the problem solution costs.
  3. There is collaboration between the system’s components to solve problems.
11
Q

What does CDRS stand for?

A

Cross-Domain Reasoning Systems.

12
Q

What is CDRS?

A

It uses data gained from multiple source domains to provide recommendations in a target domain.

13
Q

Name some semantic analytical tools for IoT.

A
  1. prefix.cc. This service simplifies RDF development by letting developers look up URI prefixes.
  2. rdf-vocab. This open-source project lets RDF developers look up and search linked data vocabularies.
  3. W3C RDF Validator. This online service checks and visualizes RDF documents.
14
Q

What is CEP?

A

Complex Event Processing (Stream Processing) is a set of techniques used to aggregate, process, and analyze huge amounts of streaming data to generate real-time insights from those events as they happen.

15
Q

What is the difference between CEP and stream processing?

A

Complex event processing searches for complex patterns and dependencies of different events to identify a particular event.
Stream processing aggregates data in time windows and collects data on a single event.
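
The contrast can be sketched in plain Python. The event tuples, the 10-second window, and the "high temperature followed by smoke" rule are all hypothetical choices for illustration, not taken from any particular CEP engine.

```python
from collections import defaultdict

# Hypothetical events: (timestamp_seconds, event_type, value)
events = [
    (1, "temperature", 21.0),
    (4, "temperature", 23.0),
    (12, "temperature", 58.0),
    (13, "smoke", 1.0),
    (25, "temperature", 22.0),
]

def window_averages(stream, event_type, window_size):
    """Stream processing: average one event type per tumbling time window."""
    buckets = defaultdict(list)
    for ts, kind, value in stream:
        if kind == event_type:
            buckets[ts // window_size].append(value)
    return {w: sum(v) / len(v) for w, v in sorted(buckets.items())}

def detect_fire(stream, max_gap, threshold):
    """CEP: match a high temperature followed by smoke within max_gap seconds."""
    alerts = []
    last_hot = None
    for ts, kind, value in stream:
        if kind == "temperature" and value > threshold:
            last_hot = ts
        elif kind == "smoke" and last_hot is not None and ts - last_hot <= max_gap:
            alerts.append((last_hot, ts))
    return alerts
```

`window_averages` looks at a single event type per window, whereas `detect_fire` only fires when two different event types occur in a specific order and time span.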

16
Q

What are the five characteristics of big data?

A
  1. Volume indicates the size of the data produced by various sources.
  2. Velocity indicates the rate at which data are generated.
  3. Variety indicates whether data are structured, unstructured, or semi-structured.
  4. Veracity indicates the uncertainty, correctness, and trustworthiness of the data.
  5. Value indicates whether the data can yield useful insight.
17
Q

What is MapReduce?

A

It is a programming model and batch-processing method designed for distributed computing.

18
Q

What two major functions does MapReduce perform?

A

Map: Breaks down the individual elements of a dataset into tuples (key-value pairs), converting the input into another set of data. In effect, it turns unstructured data into structured data.
Reduce: Takes the output of the map stage and combines those tuples into a smaller set of tuples.
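
The Map and Reduce stages can be sketched with the classic word-count example. This is a single-process illustration in plain Python; the `shuffle` helper stands in for the grouping step that a distributed framework would perform between the two stages, and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one line of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all values by key, as a framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single tuple."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For the input `["sensor data", "Sensor stream data data"]`, six mapped pairs are reduced to three tuples, one per distinct word.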

19
Q

What does NoSQL stand for?

A

Not only SQL (Structured Query Language); the term is also read as non-SQL or non-relational.

20
Q

How are NoSQL data stored?

A

In a non-tabular model, such as a document, key-value, column-oriented, or graph model.

21
Q

What are the four categories for NoSQL databases?

A

Key-value store, column-oriented, document-oriented, and graph-oriented databases.
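
To make the four categories concrete, the same hypothetical sensor reading is modeled below in each style using plain Python structures. These are simplified illustrations of the data models only, not the storage format or API of any real database.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"sensor:1": '{"room": "42", "temp": 21.5}'}

# Column-oriented: a row key maps to column families, each holding columns.
column_oriented = {
    "sensor:1": {
        "location": {"room": "42"},
        "reading": {"temp": 21.5},
    }
}

# Document-oriented: a self-describing, nestable document (JSON-like).
document = {"_id": 1, "room": "42", "readings": [{"temp": 21.5}]}

# Graph-oriented: nodes plus labeled edges between them.
graph = {
    "nodes": {"sensor1": {"temp": 21.5}, "room42": {}},
    "edges": [("sensor1", "LOCATED_IN", "room42")],
}

def temp_from_key_value(store, key):
    """The store does not interpret the value; the client must parse it."""
    return json.loads(store[key])["temp"]
```

Note that only the key-value model requires the client to parse the value, because the store treats it as opaque.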

22
Q

What does JSON stand for?

A

JavaScript Object Notation.