Introduction to Graph Data and Ontologies Flashcards
What is data integration?
The analysis of genomic, imaging or other
types of data allows us to investigate different
facets of human health.
• But in order to gain a comprehensive
understanding of human health, we need to
integrate such data.
What are the challenges to data integration?
- Biomedical and healthcare datasets sit in silos.
- Linking entities between different datasets is not a trivial task.
  In Scotland, we use CHI Numbers to uniquely identify patients.
  But how about sharing data between different countries?
- Ambiguity around the meaning of different terms.
What are graph databases?
Graph Databases use graph structures with
nodes, edges and properties to represent and
store data
What are nodes?
Nodes represent entities (e.g. a patient or a disease).
What are edges?
Edges represent relationships between nodes.
What are properties?
Properties are key-value pairs attached to nodes or edges that describe them.
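The three building blocks above (nodes, edges and properties) can be sketched with plain Python data structures. This is only an illustrative in-memory model, not a real graph database; the identifiers, labels and property values are hypothetical.

```python
# Minimal in-memory property graph (illustrative only, not a real graph database).
# All identifiers, labels and property values below are hypothetical.
nodes = {
    "p1": {"label": "Patient", "name": "Alice"},
    "d1": {"label": "Disease", "name": "Asthma"},
}

edges = [
    # (source node id, relationship type, target node id, edge properties)
    ("p1", "DIAGNOSED_WITH", "d1", {"year": 2020}),
]


def neighbours(node_id, relationship):
    """Return the ids of nodes reached from node_id via the given relationship."""
    return [target for source, rel, target, _props in edges
            if source == node_id and rel == relationship]


# Follow the DIAGNOSED_WITH edge from the patient and read a node property.
diagnoses = [nodes[n]["name"] for n in neighbours("p1", "DIAGNOSED_WITH")]
print(diagnoses)
```

Here the nodes carry properties as dictionaries, and each edge is a directed link that can carry its own properties.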
What is the RDF graph data model?
Data is represented in the form of triples, i.e.
statements consisting of a subject, a predicate
and an object.
-Subject
-Predicate
-Object
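As a rough sketch, a triple can be modelled as a three-field tuple and a dataset as a list of such tuples. The `Triple` type, the `objects_of` helper, the `http://example.com/...` URIs and the age value are assumptions made for illustration; only the DBpedia and FOAF URIs appear elsewhere in these cards.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers under example.com; the age value is made up.
LUCY = "http://example.com/id/Lucy"
LIVES_IN = "http://example.com/vocab/livesIn"
FOAF_AGE = "http://xmlns.com/foaf/0.1/age"
EDINBURGH = "http://dbpedia.org/resource/Edinburgh"

triples = [
    Triple(LUCY, LIVES_IN, EDINBURGH),
    Triple(LUCY, FOAF_AGE, "30"),
]


def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of(LUCY, LIVES_IN))
```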
Describe RDF triple visualisation
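The subject and object are drawn as nodes, connected by a directed edge (arrow) from the subject to the object; the edge is labelled with the predicate.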
What are URIs?
In RDF, we use URIs (Uniform Resource Identifiers) to uniquely identify concepts and entities.
• Examples:
http://dbpedia.org/resource/Edinburgh
http://xmlns.com/foaf/0.1/age
• URIs are used for both resources and properties.
How to use existing URIs
DBpedia (http://dbpedia.org) is a very good source of URIs.
• Every resource that is the subject of a page in Wikipedia has a
corresponding URI in DBpedia.
• URI for Edinburgh:
http://dbpedia.org/resource/Edinburgh
How to create your own URIs
If you don’t own a domain name, you can use
http://example.com/ as the base, e.g.
http://example.com/id/EwanMcGregor
Keep it simple.
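One possible way to mint such URIs consistently is a small helper that strips whitespace from a name and appends it to a base. The `make_uri` function and the choice of `http://example.com/id/` as the base are assumptions for illustration, not a prescribed convention.

```python
from urllib.parse import quote

# Assumed base for locally minted identifiers (example.com is a placeholder domain).
BASE = "http://example.com/id/"


def make_uri(name: str) -> str:
    """Build a simple URI from a name by removing whitespace and percent-encoding the rest."""
    local_name = "".join(name.split())      # "Ewan McGregor" -> "EwanMcGregor"
    return BASE + quote(local_name, safe="")


print(make_uri("Ewan McGregor"))
```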
How to merge RDF data
By uniquely identifying resources with the use
of URIs, we can easily link data about the
same resource.
• Merging different RDF datasets is simply a
matter of bringing the two sets of RDF
statements together:
Dataset3 = Dataset1 + Dataset2
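If each dataset is treated as a set of (subject, predicate, object) tuples, the merge above is just a set union. The sketch below assumes that simplified representation; the `http://example.com/...` URIs are hypothetical, and a statement present in both datasets appears only once in the result.

```python
# Each dataset is modelled as a set of (subject, predicate, object) tuples.
# The example.com URIs are hypothetical placeholders.
EDINBURGH = "http://dbpedia.org/resource/Edinburgh"
SCOTLAND = "http://example.com/id/Scotland"
LOCATED_IN = "http://example.com/vocab/locatedIn"
PART_OF = "http://example.com/vocab/partOf"
UK = "http://example.com/id/UnitedKingdom"
LUCY = "http://example.com/id/Lucy"
LIVES_IN = "http://example.com/vocab/livesIn"

dataset1 = {
    (EDINBURGH, LOCATED_IN, SCOTLAND),
    (SCOTLAND, PART_OF, UK),
}
dataset2 = {
    (EDINBURGH, LOCATED_IN, SCOTLAND),   # same statement as in dataset1
    (LUCY, LIVES_IN, EDINBURGH),
}

# Because both datasets use the same URI for Edinburgh, the statements
# about it line up automatically once the sets are combined.
dataset3 = dataset1 | dataset2

about_edinburgh = {t for t in dataset3 if EDINBURGH in (t[0], t[2])}
print(len(dataset3), len(about_edinburgh))
```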
How to write RDF statements in Turtle
Turtle (Terse RDF Triple Language): One of the
most popular forms of syntax for expressing
RDF.
• General form:
subject predicate object .
What is Turtle?
Turtle (Terse RDF Triple Language): One of the
most popular forms of syntax for expressing
RDF
Whitespace and full stop
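The subject, predicate and object are separated by
whitespace, and each statement ends with a full stop.
When using URIs, these should be enclosed in
angle brackets, e.g.
<http://dbpedia.org/resource/Edinburgh>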
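A minimal sketch of these rules in Python: the helper below wraps URIs in angle brackets, separates the three terms with spaces and ends the statement with a full stop. The `to_turtle` function is hypothetical, handles only URIs and single-line plain string literals, and the `http://example.com/...` URIs and the age value are made up.

```python
def to_turtle(subject: str, predicate: str, obj: str, obj_is_uri: bool = True) -> str:
    """Format one triple as a single Turtle statement.

    Covers only URIs and single-line plain string literals; prefixes,
    datatypes and language tags are out of scope for this sketch.
    """
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes and double quotes inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    # Terms are separated by whitespace; the statement ends with a full stop.
    return f"<{subject}> <{predicate}> {obj_term} ."


uri_statement = to_turtle(
    "http://dbpedia.org/resource/Edinburgh",
    "http://example.com/vocab/locatedIn",      # hypothetical property
    "http://example.com/id/Scotland",          # hypothetical resource
)
literal_statement = to_turtle(
    "http://example.com/id/Lucy",              # hypothetical resource
    "http://xmlns.com/foaf/0.1/age",
    "30",                                      # made-up value
    obj_is_uri=False,
)
print(uri_statement)
print(literal_statement)
```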
What is an ontology?
A formal, explicit specification of a shared
conceptualisation.
• Essentially, a way of encoding domain
knowledge.
• Something like an enhanced dictionary, where
you can look up the meaning of different
concepts and find relations between them.
What are the components of an ontology?
Classes (e.g. Woman)
Individuals (e.g. Lucy)
Attributes (e.g. Age)
Relations (e.g. MotherOf)
• Ontologies often contain a class taxonomy.
• Formal definitions of classes may also be
included.
Why are ontologies useful?
- Allow us to attach meanings to data
- Enable standardisation of terminology
- Allow us to infer new knowledge from existing data
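The last point can be illustrated with a toy class taxonomy: if Lucy is a Woman and Woman is a subclass of Person, then Lucy is also a Person, even though that was never stated directly. The sketch below assumes each class has at most one parent; the `Person` class and the `inferred_classes` helper are hypothetical additions for illustration.

```python
# Toy class taxonomy: child class -> parent class (single parent assumed).
# "Person" is a hypothetical class added for illustration.
subclass_of = {"Woman": "Person"}

# Stated facts: individual -> class it is declared to belong to.
instance_of = {"Lucy": "Woman"}


def inferred_classes(individual):
    """Return the stated class of an individual plus every ancestor class."""
    classes = []
    current = instance_of.get(individual)
    while current is not None and current not in classes:  # guard against cycles
        classes.append(current)
        current = subclass_of.get(current)
    return classes


lucy_classes = inferred_classes("Lucy")
print(lucy_classes)
```

Only "Lucy is a Woman" was stated; "Lucy is a Person" follows from the taxonomy.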
What is Gene Ontology?
It represents information about biological
processes, cellular components and molecular
functions.
What is Disease Ontology?
It provides descriptions of human disease terms,
phenotype characteristics and related
medical vocabulary for disease concepts.
What is SNOMED-CT?
It is a collection of medical terms. It includes
codes, terms, synonyms and definitions used in
clinical documentation and reporting.
It is considered to be the most comprehensive,
multilingual clinical healthcare terminology in the
world.