Week 11 Flashcards
What is the first star in Linked Open Data?
Available on the web (in any format) with an open license
Example: Government population dataset published on a website with Creative Commons license
What does the second star add to open data?
Data must be machine-readable structured data (e.g., Excel instead of PDF)
What does the third star require?
Use of a non-proprietary format (e.g., CSV instead of Excel)
Example: Population data in CSV:
city,population_2020,population_2021
New York,8336817,8467513
Los Angeles,3898747,3923341
What standards are required for the fourth star?
Use W3C open standards (RDF and SPARQL) and URIs to identify things
Example: Population data in RDF:
@prefix city: <http://example.org/city/> .
@prefix pop: <http://example.org/population/> .
city:NewYork pop:population2020 8336817 ;
    pop:population2021 8467513 .
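To make the step from three stars to four concrete, here is a minimal Python sketch that turns the CSV rows above into Turtle triples. The function name `csv_to_turtle` and the rule for building local names (dropping spaces) are assumptions made for illustration; the prefixes are the example.org ones from the card.

```python
import csv
import io

CSV_TEXT = """city,population_2020,population_2021
New York,8336817,8467513
Los Angeles,3898747,3923341
"""

def csv_to_turtle(csv_text: str) -> str:
    """Render each CSV row as Turtle triples about one city resource."""
    lines = [
        "@prefix city: <http://example.org/city/> .",
        "@prefix pop: <http://example.org/population/> .",
        "",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed naming rule: "New York" becomes the local name "NewYork".
        local_name = row["city"].replace(" ", "")
        lines.append(f"city:{local_name} pop:population2020 {int(row['population_2020'])} ;")
        lines.append(f"    pop:population2021 {int(row['population_2021'])} .")
    return "\n".join(lines)

turtle = csv_to_turtle(CSV_TEXT)
print(turtle)
```

Each city becomes a resource with its own URI, which is what lets other datasets point at it later.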
What makes data achieve five stars?
Link your data to other people’s data to provide context
Example: Extended RDF with links:
city:NewYork
    pop:population2020 8336817 ;
    owl:sameAs <http://dbpedia.org/resource/New_York_City> ;
    geo:location <http://geonames.org/5128581/> .
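The five stars are cumulative: each star assumes the ones before it. A small Python sketch of that rule follows; the function `star_rating` and its parameter names are hypothetical, chosen only to mirror the five cards above.

```python
def star_rating(open_license: bool, structured: bool, open_format: bool,
                uses_rdf_uris: bool, linked: bool) -> int:
    """Count stars in order and stop at the first requirement that is not met."""
    stars = 0
    for met in (open_license, structured, open_format, uses_rdf_uris, linked):
        if not met:
            break
        stars += 1
    return stars

# A PDF report under an open license, a CSV file, and linked RDF data.
pdf_report = star_rating(True, False, False, False, False)
csv_file = star_rating(True, True, True, False, False)
linked_rdf = star_rating(True, True, True, True, True)
print(pdf_report, csv_file, linked_rdf)  # prints: 1 3 5
```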
Explain the Linked Data life cycle.
Answer:
1. GENERATE
What: Creating initial data that will become linked data
Format: JSON, XML, HTML, etc.
Example: Creating a JSON file with book information
{
  "title": "1984",
  "author": "George Orwell",
  "published": "1949"
}
2. VALIDATE
Three levels of checking:
Individual Fields
Spell checking
Data type verification
Format validation (e.g., ISBN format)
Structural
Required fields present
Proper nesting/hierarchy
Data integrity
Semantic
Logical consistency (e.g., publication date within author’s lifetime)
Domain/range checks
Relationship validity
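The three levels of checking can be sketched in Python against the book record from the GENERATE step. This is a rough illustration, not a full validator: the function `validate_book` and the extra `born` and `died` fields are assumptions added so the semantic check has something to compare against.

```python
import re

def validate_book(book: dict) -> list[str]:
    """Return problems found at the field, structural and semantic levels."""
    problems = []

    # Structural: required fields must be present.
    for field in ("title", "author", "published"):
        if field not in book:
            problems.append(f"structural: missing field '{field}'")

    # Individual field: the year should be a four-digit string.
    year = book.get("published")
    if year is not None and not re.fullmatch(r"\d{4}", str(year)):
        problems.append("field: 'published' is not a four-digit year")
        year = None

    # Semantic: publication year should fall within the author's lifetime.
    # 'born' and 'died' are hypothetical extra fields used only for this check.
    born, died = book.get("born"), book.get("died")
    if year is not None and born is not None and died is not None:
        if not born <= int(year) <= died:
            problems.append("semantic: published outside author's lifetime")

    return problems

good = {"title": "1984", "author": "George Orwell", "published": "1949",
        "born": 1903, "died": 1950}
bad = {"title": "1984", "published": "2049", "born": 1903, "died": 1950}
print(validate_book(good))  # prints: []
print(validate_book(bad))
```

The second record fails twice: structurally (no author) and semantically (2049 is after 1950).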
3. PUBLISH
Convert data to RDF triples
Make available as SPARQL endpoint
Key Feature: Live, real-time queryable database
Example: Publishing book data so others can query it instantly
4. QUERY
Ability to search and retrieve data
Uses SPARQL query language
Example Query:
# ex: is a placeholder namespace for this example
PREFIX ex: <http://example.org/>
SELECT ?book WHERE {
  ?book ex:author "George Orwell" .
}
5. ENHANCE
What it IS:
Adding new relationships
Creating external links
Enriching with additional metadata
Example: Linking books to related works
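The PUBLISH, QUERY and ENHANCE steps can be imitated with a toy in-memory triple set in Python. This is only a sketch of the idea, not a real RDF store or SPARQL engine: `to_triples`, `match`, the example.org URIs and the `relatedWork` predicate are all hypothetical names.

```python
import json

BOOK_JSON = '{"title": "1984", "author": "George Orwell", "published": "1949"}'
BOOK_URI = "http://example.org/book/1984"  # hypothetical identifier

def to_triples(subject: str, record: dict) -> set:
    """PUBLISH: turn each key/value pair into a (subject, predicate, object) triple."""
    return {(subject, key, value) for key, value in record.items()}

def match(triples, s=None, p=None, o=None) -> list:
    """QUERY: return triples matching the pattern; None acts like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

triples = to_triples(BOOK_URI, json.loads(BOOK_JSON))

# Similar in spirit to: SELECT ?book WHERE { ?book author "George Orwell" }
books = [s for s, _, _ in match(triples, p="author", o="George Orwell")]

# ENHANCE: add a link to a related work without changing existing triples.
triples.add((BOOK_URI, "relatedWork", "http://example.org/book/AnimalFarm"))
print(books, len(triples))
```

Enhancing only adds triples, so queries written before the enhancement keep working.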
NOTE: Remember that the characteristics of data dumps and SPARQL endpoints are complete opposites.
What is a data dump in linked data publishing and what are its characteristics?
Data Type: “Old” (static) data
Access Method: Download entire dataset at once
Bandwidth Usage: High (must download everything)
Availability: High (simple file download)
Client Cost: High (needs resources to process full dataset)
Server Cost: Low (just hosting files)
What is a SPARQL endpoint and what are its characteristics?
A SPARQL endpoint is a live service that accepts queries and returns specific results.
Key Characteristics:
Data Type: Live (real-time) data
Access Method: Query-based (get only what you need)
Bandwidth Usage: Low (selective data retrieval)
Availability: Lower (service might be down)
Client Cost: Low (minimal processing needed)
Server Cost: High (maintaining query service)
Example:
# ex: is a placeholder namespace for this example
PREFIX ex: <http://example.org/>
SELECT ?book WHERE {
  ?book ex:author "Terry Pratchett" .
  ?book ex:published "2000" .
}
This gets only specific books instead of downloading the entire library catalog.
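The opposite trade-offs of the two publishing styles can be simulated in Python without any network. Both "servers" below are plain functions over a small hypothetical catalog, and the names `download_dump` and `query_endpoint` are assumptions for illustration only.

```python
import json

# Hypothetical catalog held by the server.
CATALOG = [
    {"title": "The Truth", "author": "Terry Pratchett", "published": "2000"},
    {"title": "Mort", "author": "Terry Pratchett", "published": "1987"},
    {"title": "1984", "author": "George Orwell", "published": "1949"},
]

def download_dump() -> bytes:
    """Data dump: the server sends the whole catalog as one static file."""
    return json.dumps(CATALOG).encode("utf-8")

def query_endpoint(author: str, published: str) -> bytes:
    """Endpoint: the server does the filtering and sends only the matches."""
    hits = [b for b in CATALOG
            if b["author"] == author and b["published"] == published]
    return json.dumps(hits).encode("utf-8")

# Dump client: transfers everything, then filters locally (high client cost).
dump = download_dump()
dump_hits = [b for b in json.loads(dump)
             if b["author"] == "Terry Pratchett" and b["published"] == "2000"]

# Endpoint client: transfers only the answer (work shifts to the server).
answer = query_endpoint("Terry Pratchett", "2000")
endpoint_hits = json.loads(answer)

# Bytes transferred by each approach, and whether the results agree.
print(len(dump), len(answer), dump_hits == endpoint_hits)
```

Both clients end up with the same result; what differs is how many bytes cross the wire and which side does the filtering.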