Data Storage and Processing Flashcards
What challenge do most existing IoT solutions face?
Most IoT solutions are tailored to specific verticals, leading to separate data silos, which makes it difficult to capture the full potential of IoT across multiple domains.
Why is handling IoT data from different domains challenging?
IoT data differ in structure, source, and description, which makes it complex to integrate and process them consistently across domains.
What is required to ensure interoperability of IoT devices?
IoT data must be represented so that they can be stored in different networked databases, shared among multiple nodes, analyzed by various tools, and interpreted by different machines.
What is the Semantic Web, and how does it help with IoT data?
The Semantic Web, also called the web of linked data, extends the web so that data are described in a machine-readable way. It provides reasoning engines and tools to analyze IoT data and link them meaningfully across domains.
What role does complex event processing play in IoT data analysis?
Complex event processing searches for dependencies and patterns in streaming IoT data, creating real-time insights to help businesses identify opportunities and threats early.
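The sketch below illustrates pattern detection over a stream using a hypothetical rule. The rule raises an alert when a sensor's last three temperature readings all exceed a threshold. `Reading`, `detect_overheating`, the 80 °C threshold, and the run length of three are illustrative assumptions. This is not a CEP engine; real engines offer far richer pattern languages and time windows.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: int       # seconds since the stream started (illustrative)
    temperature: float   # degrees Celsius (illustrative)


def detect_overheating(stream: Iterable[Reading],
                       threshold: float = 80.0,
                       run_length: int = 3) -> Iterator[Tuple[str, int]]:
    """Yield (sensor_id, timestamp) each time a sensor's last `run_length`
    readings all exceed `threshold`. Keeps firing while the run continues."""
    # One bounded sliding window of recent temperatures per sensor.
    windows: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=run_length))
    for reading in stream:
        window = windows[reading.sensor_id]
        window.append(reading.temperature)
        # Pattern matched: the window is full and every value is too high.
        if len(window) == run_length and all(t > threshold for t in window):
            yield reading.sensor_id, reading.timestamp


events = [
    Reading("s1", 0, 75.0),
    Reading("s1", 1, 82.0),
    Reading("s2", 1, 90.0),
    Reading("s1", 2, 85.0),
    Reading("s1", 3, 88.0),
]
print(list(detect_overheating(events)))  # [('s1', 3)]
```

The alert fires only once sensor s1 has three high readings in a row. It does not fire for s2's single high reading.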
Why is a single server insufficient for handling IoT data?
IoT data are often too large for a single server or database to handle, requiring distributed processing approaches like MapReduce.
How does the MapReduce programming model help manage IoT data?
MapReduce splits a large dataset across multiple machines and processes each part in parallel (the map step). It then combines the partial results (the reduce step). This makes it possible to handle large volumes of structured and unstructured IoT data.
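MapReduce normally runs on a cluster framework such as Hadoop. The sketch below only simulates its three stages (map, shuffle, reduce) in a single Python process. It counts log lines per device type over hypothetical "device_type,reading" records. The function names are illustrative.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(chunk: list[str]) -> list[tuple[str, int]]:
    """Map: emit a (device_type, 1) pair for every log line in one chunk."""
    return [(line.split(",")[0], 1) for line in chunk]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group the intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine all values for a key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical "device_type,reading" log lines, pre-split into chunks that
# would sit on different machines in a real cluster.
chunks = [
    ["thermostat,21.5", "camera,on", "thermostat,22.0"],
    ["camera,off", "thermostat,21.8"],
]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
print(reduce_phase(shuffle(mapped)))  # {'thermostat': 3, 'camera': 2}
```

In a real deployment, each chunk is mapped on the machine that stores it. Only the grouped intermediate pairs travel over the network.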
How did the web evolve from its initial phase to the Semantic Web (Web 3.0)?
The web started as a collection of documents linked to each other and gradually evolved into the Semantic Web, where documents and pieces of data are meaningfully connected.
What was unclear about the relationships between documents in the early phases of the web?
In the early phases, relationships between documents were unclear because hyperlinks connected whole documents rather than specific pieces of data within them.
What does the Semantic Web enable for users and machines?
The Semantic Web provides meaningful links between data, allowing users (both humans and machines) to explore and understand connections between pieces of information.
What is linked data, and what does it create?
Linked data refers to semantically linking and integrating pieces of information across domains, creating a global web that connects data on topics like books, companies, and social media.
How do machines use linked data in the Semantic Web?
Machines can connect distributed data sources, process new data as they appear on the web, and produce integrated results, enhancing applications like data browsers and search engines.
What does a generic linked data browser allow users to do?
A generic linked data browser lets users browse a data source and travel along links to related sources, enhancing data exploration.
What capability do linked data search engines provide?
Linked data search engines crawl the global web of linked data and provide expressive query capabilities over the aggregated data.
What is linked data?
Linked data refers to machine-readable, well-defined information published on the web that can be connected to external datasets from various sources.
What format is used in linked data technologies to connect information?
Linked data technologies use the Resource Description Framework (RDF) format to create a web of data by linking different things.
What kinds of data sources can linked data technologies connect?
Linked data can connect data sources ranging from geographically distributed databases to heterogeneous systems that cannot interoperate at the data level.
Who specified the rules for publishing data as part of the global web of data?
Tim Berners-Lee, the inventor of the World Wide Web, specified the rules for publishing data as part of the global web of data.
What are the four linked data principles as specified by Tim Berners-Lee?
- Use Uniform Resource Identifiers (URIs) as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
- Include links to other URIs to help people discover more things.
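The second and third principles can be illustrated with Python's standard library. Looking up an HTTP URI is an ordinary GET request. Linked data servers commonly use content negotiation (the Accept header) to return RDF, for example as Turtle. This is a sketch under assumptions. `build_lookup_request` and `look_up` are hypothetical helper names. Whether a particular server honours `text/turtle` is not guaranteed.

```python
import urllib.request

# A DBpedia resource URI, used here only as an illustration.
RESOURCE_URI = "http://dbpedia.org/resource/Berlin"


def build_lookup_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build an HTTP GET request that asks for an RDF description of `uri`
    through content negotiation (the Accept header)."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def look_up(uri: str) -> str:
    """Dereference an HTTP URI and return the document the server sends back.
    Requires network access; the server decides which format it returns."""
    with urllib.request.urlopen(build_lookup_request(uri), timeout=10) as response:
        return response.read().decode("utf-8")


request = build_lookup_request(RESOURCE_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))
# GET http://dbpedia.org/resource/Berlin text/turtle
```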
Which two fundamental web technologies are relied on by the first two linked data principles?
The first two linked data principles rely on Uniform Resource Identifiers (URIs) and Hypertext Transfer Protocol (HTTP).
How does RDF enhance linked data?
RDF supports a generic, graph-based data model that structures and links data describing things in the world, enhancing linked data.
What does the Resource Description Framework (RDF) syntax encode and represent?
RDF encodes and represents web resources and data in a structure known as triples.
What are the three components of an RDF triple?
- Subject: A resource identified by a URI.
- Predicate: A URI specifying the relationship between the subject and object.
- Object: Either a resource identified by a URI or a literal (a basic value such as a string), related to the subject.
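The three roles can be modelled directly in code. In this minimal sketch the subject and predicate are always URIs, while the object may be a URI or a literal. The class names are hypothetical, and blank nodes are ignored. The serialiser imitates N-Triples style (URIs in angle brackets, literals in quotes) without the escaping a real serialiser performs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str  # an identifier, e.g. "http://example.org/Berlin"


@dataclass(frozen=True)
class Literal:
    value: str  # a plain value such as a name; it has no URI of its own


@dataclass(frozen=True)
class Triple:
    subject: URI               # a resource (blank nodes are ignored in this sketch)
    predicate: URI             # always a URI naming the relationship
    obj: Union[URI, Literal]   # either a resource or a literal


def term_to_text(term: Union[URI, Literal]) -> str:
    """Write a term N-Triples style: URIs in angle brackets, literals in quotes.
    Simplified: special characters in literals are not escaped."""
    return f"<{term.value}>" if isinstance(term, URI) else f'"{term.value}"'


def to_ntriples(triple: Triple) -> str:
    parts = (triple.subject, triple.predicate, triple.obj)
    return " ".join(term_to_text(p) for p in parts) + " ."


label = Triple(URI("http://example.org/Berlin"),
               URI("http://www.w3.org/2000/01/rdf-schema#label"),
               Literal("Berlin"))
print(to_ntriples(label))
# <http://example.org/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin" .
```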
What does the predicate represent in an RDF triple?
The predicate specifies the relationship between the subject and the object, represented by a URI.
What can the object in an RDF triple be?
The object can be either a resource identified by a URI or a literal (a basic value such as a string); literals are not identified by URIs.
How are subjects and objects in RDF triples similar to hypertext links?
Like hypertext links that connect documents, subjects and objects in RDF triples link items in various datasets, contributing to the web of data.
Give an example of an RDF triple relationship.
An example is “Berlin” (subject) and “Germany” (object) being related through the predicate “is the capital of,” showing that Berlin is the capital of Germany.
What type of relationship exists in RDF between subject and object resources?
RDF defines a unidirectional relationship from the subject to the object resource.
Can a resource in RDF be used in multiple triples? If yes, in what roles?
Yes, a resource can be used in various triples with different roles: as a subject, predicate, or object.
What do multiple connections between RDF triples create?
Multiple connections between RDF triples create a connected graph of data.
In an RDF graph, how are resources and predicates represented?
- Resources are represented as nodes.
- Predicates (relationships between nodes) are depicted as directed, labeled arcs pointing from the subject node to the object node.
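A small set of triples can be loaded into an adjacency structure and walked link by link. This resembles how a linked data browser lets a user follow links from one resource to related ones. The `http://example.org/` names and predicates are hypothetical, not a published vocabulary. Because edges run only from subject to object, the walk follows the unidirectional relationships RDF defines.

```python
from collections import defaultdict, deque

EX = "http://example.org/"  # hypothetical namespace for the illustration

triples = [
    (EX + "Berlin", EX + "isCapitalOf", EX + "Germany"),
    (EX + "Germany", EX + "memberOf", EX + "EuropeanUnion"),
    (EX + "Berlin", EX + "locatedOn", EX + "Spree"),
]

# Nodes are resources; each triple becomes a directed edge labeled by its predicate.
edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))


def reachable(start: str) -> list[str]:
    """Breadth-first walk along outgoing edges, returning the resources
    reachable from `start` in the order they are discovered."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for _predicate, target in edges.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print([uri.removeprefix(EX) for uri in reachable(EX + "Berlin")])
# ['Germany', 'Spree', 'EuropeanUnion']
```

Starting from Germany reaches only EuropeanUnion, not Berlin, because no triple points from Germany back to Berlin.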
What is the significance of the connected graph in RDF?
The connected graph allows for multiple relationships between data points, enabling more complex and meaningful data linkages across different datasets.
What is a major benefit of using centralized systems for RDF datasets?
Benefit: No communication overhead between different nodes, as all data storage and queries are processed on a single machine.
What limits the capabilities of centralized systems in handling RDF datasets?
Limitation: The system is restricted by the memory and computational capacity of the single node.
How do distributed systems improve over centralized systems for RDF datasets?
Improvement: Distributed systems offer larger memory and computational power by utilizing multiple machines.
What are the potential drawbacks of distributed systems when processing RDF data?
Drawback 1: Expensive communication between machines. Drawback 2: Intermediate data shuffling during complex queries can degrade system performance.
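The cards do not say how a distributed RDF store places its data. A common choice, assumed here, is hash partitioning by subject. This sketch shows why such a layout helps single-subject queries but can force shuffling when a query follows a path from one resource to another. The cluster size, the `ex:` names, and the helper functions are hypothetical.

```python
import zlib
from collections import defaultdict

Triple = tuple[str, str, str]
NUM_MACHINES = 3  # hypothetical cluster size


def machine_for(subject: str) -> int:
    """Choose a machine by hashing the subject. zlib.crc32 gives the same
    value on every run, unlike Python's salted built-in hash() for str."""
    return zlib.crc32(subject.encode("utf-8")) % NUM_MACHINES


def partition(triples: list[Triple]) -> dict[int, list[Triple]]:
    """Subject-hash partitioning: every triple about one subject lands on the
    same machine, so queries on a single subject can be answered locally."""
    shards: dict[int, list[Triple]] = defaultdict(list)
    for triple in triples:
        shards[machine_for(triple[0])].append(triple)
    return dict(shards)


def path_joins(triples: list[Triple]) -> list[tuple[bool, Triple, Triple]]:
    """Find joins where one triple's object is another triple's subject, and
    flag the ones whose two sides live on different machines; those would
    need intermediate data shuffled over the network."""
    by_subject: dict[str, list[Triple]] = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)
    joins = []
    for left in triples:
        for right in by_subject.get(left[2], []):
            crosses = machine_for(left[0]) != machine_for(right[0])
            joins.append((crosses, left, right))
    return joins


data = [
    ("ex:Berlin", "ex:isCapitalOf", "ex:Germany"),
    ("ex:Germany", "ex:memberOf", "ex:EuropeanUnion"),
    ("ex:Paris", "ex:isCapitalOf", "ex:France"),
    ("ex:France", "ex:memberOf", "ex:EuropeanUnion"),
]
for crosses, left, right in path_joins(data):
    print(left[0], "->", right[0], "needs shuffling" if crosses else "local")
```

Which joins cross machines depends on the hash values. As the cluster grows, more path joins tend to span machines, which is the communication cost the drawbacks describe.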
What is the DBpedia project, and what does it aim to do?
DBpedia Project: It extracts structured content from Wikipedia and makes it available in RDF. Users can query the properties and relationships of Wikipedia resources semantically and follow links to related datasets.
How does the DBpedia project improve user experience in applications?
Improvement: Applications can exploit information from other datasets to enhance the user experience by linking related information in RDF triples.
Why is RDF Schema (RDFS) used in conjunction with RDF?
RDF Schema (RDFS) is used to define classes of resources in RDF, enabling the categorization of things into hierarchical classes, which RDF alone does not support.
What is a resource in RDFS, and how is it classified?
A resource in RDFS is an instance of a certain class, and each class can have subclasses with additional descriptions.
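RDFS class hierarchies let a reasoner derive new facts. This minimal sketch applies two RDFS entailment rules until nothing new appears. The first (rdfs11) is that `rdfs:subClassOf` is transitive. The second (rdfs9) is that an instance of a class is also an instance of its superclasses. The `rdf:type` and `rdfs:subClassOf` URIs are the standard ones; the sensor classes under `http://example.org/` are hypothetical. What an application then does with the inferred types remains up to the application.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace for the example classes

triples = {
    (EX + "TemperatureSensor", RDFS_SUBCLASS_OF, EX + "Sensor"),
    (EX + "Sensor", RDFS_SUBCLASS_OF, EX + "Device"),
    (EX + "sensor42", RDF_TYPE, EX + "TemperatureSensor"),
}


def infer_types(graph: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply two RDFS entailment rules until no new triple appears:
    rdfs11 (subClassOf is transitive) and rdfs9 (an instance of a class
    is also an instance of every superclass)."""
    graph = set(graph)
    while True:
        sub = {(s, o) for s, p, o in graph if p == RDFS_SUBCLASS_OF}
        new = {(a, RDFS_SUBCLASS_OF, d) for a, b in sub for c, d in sub if b == c}
        new |= {(x, RDF_TYPE, sup)
                for x, p, cls in graph if p == RDF_TYPE
                for c, sup in sub if c == cls}
        if new <= graph:  # fixpoint reached: nothing new was derived
            return graph
        graph |= new


types = sorted(o.removeprefix(EX) for s, p, o in infer_types(triples)
               if s == EX + "sensor42" and p == RDF_TYPE)
print(types)  # ['Device', 'Sensor', 'TemperatureSensor']
```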
Does RDF Schema (RDFS) specify how applications should use the class descriptions?
No, RDFS does not specify how an application should use the descriptions of resources in the classes.
How does linked data facilitate data abstraction in IoT?
Linked data uses common identifiers such as Internationalized Resource Identifiers (IRIs) to integrate data from various IoT sensors into common data structures, enhancing data abstraction.
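The abstraction step can be sketched as mapping differently formatted payloads onto triples that share identifiers and one predicate. Both vendor formats, the `http://example.org/iot#` vocabulary, and the function names below are hypothetical. A real deployment would more likely reuse a published ontology.

```python
import json

EX = "http://example.org/iot#"  # hypothetical vocabulary, not a published ontology


def from_vendor_a(payload: str) -> list[tuple[str, str, float]]:
    """Hypothetical vendor A: JSON such as {"id": "a-1", "tempC": 21.5}."""
    data = json.loads(payload)
    return [(EX + data["id"], EX + "temperatureCelsius", float(data["tempC"]))]


def from_vendor_b(payload: str) -> list[tuple[str, str, float]]:
    """Hypothetical vendor B: semicolon-separated text in Fahrenheit, e.g. "b-7;68.0"."""
    device_id, fahrenheit = payload.split(";")
    celsius = round((float(fahrenheit) - 32) * 5 / 9, 1)
    return [(EX + device_id, EX + "temperatureCelsius", celsius)]


# Both sources end up as triples with shared identifiers and one predicate,
# so downstream code no longer needs to know each vendor's format.
graph = from_vendor_a('{"id": "a-1", "tempC": 21.5}') + from_vendor_b("b-7;68.0")
for subject, predicate, value in graph:
    print(subject.removeprefix(EX), predicate.removeprefix(EX), value)
# a-1 temperatureCelsius 21.5
# b-7 temperatureCelsius 20.0
```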
What role do machines play in interpreting linked data in IoT?
Machines can interpret data descriptions by extracting the data's origin and attributes and by understanding the relationships between the data and other related information.
What is the main purpose of the Internet of Things (IoT)?
The main purpose of IoT is to interpret the semantic data captured from various sources and sensors and transform it into actionable knowledge.
When are IoT data considered useless?
IoT data are considered useless if they cannot be understood or interpreted, as they must provide meaningful insights to be actionable.
What challenges arise from the heterogeneous nature of IoT data?
The heterogeneous nature of IoT data makes it hard to ensure interoperability among IoT devices, because different devices support different protocols and data formats.
How does the Semantic Web contribute to IoT?
The Semantic Web provides analytical tools and best practices that facilitate data reasoning, help satisfy interoperability requirements, and enable effective integration and analysis of different sources of IoT data.
What is the relationship between the Internet of Things and the Semantic Web?
Combining IoT with the Semantic Web enables global interoperability between devices, which in turn allows new services to be created through effective data integration and analysis.
What are the key open approaches developed by the Semantic Web community for data analytics?
The key open approaches include sharing and reusing open data through linked data, linked vocabularies, and linked services.
How are semantic IoT data stored and managed in the Semantic Web?
Semantic IoT data are stored and managed in RDF databases as RDF graphs.
What language is used for querying and reasoning over RDF graphs?
SPARQL is used for querying and reasoning over the stored RDF graphs.
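SPARQL queries are normally executed by an RDF store or a library. To show roughly what happens underneath, the sketch below evaluates a basic graph pattern, the core of a SPARQL WHERE clause, using nested-loop matching. This is a simplified stand-in rather than a SPARQL engine. It has no OPTIONAL, FILTER, or reasoning, and the `ex:` names are hypothetical prefixed names.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # A variable matches anything, but must stay consistent once bound.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None  # a constant term must match exactly
    return extended


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern with nested-loop joins: each pattern
    filters and extends the bindings produced by the previous ones."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = [
    ("ex:Berlin", "ex:isCapitalOf", "ex:Germany"),
    ("ex:Germany", "ex:memberOf", "ex:EuropeanUnion"),
    ("ex:Paris", "ex:isCapitalOf", "ex:France"),
    ("ex:France", "ex:memberOf", "ex:EuropeanUnion"),
    ("ex:Oslo", "ex:isCapitalOf", "ex:Norway"),
]
# Roughly: SELECT ?city ?country WHERE {
#            ?city ex:isCapitalOf ?country . ?country ex:memberOf ex:EuropeanUnion . }
print(query(graph, [("?city", "ex:isCapitalOf", "?country"),
                    ("?country", "ex:memberOf", "ex:EuropeanUnion")]))
# [{'?city': 'ex:Berlin', '?country': 'ex:Germany'}, {'?city': 'ex:Paris', '?country': 'ex:France'}]
```

The shared variable `?country` is what joins the two patterns. Oslo drops out because no triple states that Norway is a member of the European Union.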
What is the role of semantic technologies in data analytics?
Semantic technologies help derive meaning from collected data, transforming it into actionable information.