Lecture 2 - Knowledge Graphs Flashcards

1
Q

Data management

A
  • Becoming essential when organizations aim to be data-driven.
  • Also becoming a huge challenge with Big Data (the 5 V’s)
2
Q

We achieve good data quality through (cf. Verhoef's model):

A

• Governance and leadership - defined roles and responsibilities to ensure accountability for data quality, with policies and procedures in place to support the process
• Systems and processes - in place to secure the quality of data. Cf. auditing (next week)
• People and skills - train staff so they have the appropriate knowledge, competencies and capacity for their roles
• Data use - the purpose of collecting and reporting robust, good-quality data is to inform management, make improvements to service delivery and promote accountability to customers, stakeholders, local residents and Government
• Data security - data collected must be secure and should only be used for authorised purposes

3
Q

Data Quality dimensions

A
• Accuracy
• Completeness
• Consistency
• Timeliness
• Validity (potential to be accurate, e.g., right datatype)
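Some of these dimensions can be measured directly on the data. The sketch below computes completeness, validity and timeliness as simple ratios; the records, field names, the email pattern and the 365-day freshness threshold are all hypothetical choices for illustration. Accuracy and consistency are left out because they need an external reference (the real world, or another source) to compare against.

```python
import re
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"email": "ann@example.org", "birth_date": "1990-04-12", "updated": date(2024, 1, 5)},
    {"email": None, "birth_date": "1985-13-01", "updated": date(2019, 6, 30)},
    {"email": "bob(at)example.org", "birth_date": "1978-02-28", "updated": date(2023, 11, 2)},
]

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple, assumed pattern

def completeness(rows, field):
    """Share of rows in which the field is filled in."""
    return sum(r[field] is not None for r in rows) / len(rows)

def is_valid_email(value):
    return value is not None and EMAIL.match(value) is not None

def is_valid_date(value):
    """Right format and datatype: an ISO date that exists in the calendar."""
    if value is None:
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def validity(rows, field, check):
    """Share of rows whose value passes the check (it *could* be accurate)."""
    return sum(check(r[field]) for r in rows) / len(rows)

def timeliness(rows, today, max_age_days=365):
    """Share of rows updated within the assumed freshness window."""
    return sum((today - r["updated"]).days <= max_age_days for r in rows) / len(rows)
```

For these three records, two of three emails are present but only one is valid, and two of three birth dates are valid (month 13 is rejected).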
4
Q

Steps in

A

Step 1 - Separate and manage master data
Step 2 - Cleanse the data
Step 3 - Standardize
Step 4 - Publish (Open Data)
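Steps 1 to 3 can be sketched for a single master-data entity (customers) as follows. The raw rows, the country lookup table and the choice of (name, country) as matching key are assumptions made for illustration; real master data management uses far richer matching rules.

```python
# Hypothetical raw customer rows coming from two different applications.
raw = [
    {"name": "  Jansen, Piet ", "country": "Netherlands"},
    {"name": "Jansen, Piet", "country": "NL"},
    {"name": "Müller, Anna", "country": "germany"},
]

# Assumed lookup table mapping free-text spellings to ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {"netherlands": "NL", "nl": "NL", "germany": "DE", "de": "DE"}

def cleanse(row):
    """Step 2: fix obvious errors such as stray or repeated whitespace."""
    return {"name": " ".join(row["name"].split()), "country": row["country"].strip()}

def standardize(row):
    """Step 3: map free-text values onto one agreed code list."""
    return {**row, "country": COUNTRY_CODES.get(row["country"].lower(), "UNKNOWN")}

def build_master(rows):
    """Step 1 (result): one golden record per customer, duplicates merged."""
    master = {}
    for row in map(standardize, map(cleanse, rows)):
        key = (row["name"].lower(), row["country"])
        master.setdefault(key, row)  # keep the first record seen for each key
    return list(master.values())
```

The two "Jansen, Piet" rows only collapse into one record after cleansing and standardizing, which is why those steps come before publishing.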

5
Q

Step 4 - Publish (Open Data)

A

• Make (selected) data sets available within the enterprise, business network or to the world
• Open data allows others to build new services, combine data etc.
• Open data is more and more expected from government agencies
• Note similarities and differences with traditional “data integration”

6
Q

The problem (the need to link open data)

A
  • Data everywhere
  • Relevant data is scattered over many files and applications
  • For many tasks, data from multiple sources needs to be used together
  • For many tasks, data needs to be re-used out of context
  • Exchange across systems, departments, organizations
  • No “integrated schema”
  • No centralized data governance possible anymore when you cross organizational borders
7
Q

Solution: Linked Data (now called Knowledge graphs)

A
  • URIs (Uniform Resource Identifiers): identifiers for everything - object identification
  • RDF: HTML (markup language) for Linked Data - data representation
  • SPARQL: SQL for Linked Data - data retrieval
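To make the URI + RDF combination concrete, the sketch below writes one statement in N-Triples, a line-based RDF syntax where each triple is `<subject> <predicate> <object> .`. The `http://example.org/` namespace and the local names are hypothetical; only backslashes and double quotes are escaped here, not the full N-Triples escape set.

```python
# Hypothetical namespace; a real dataset would use its own resolvable URIs.
EX = "http://example.org/"

def iri(local):
    """A thing (or predicate) identified by a full URI in angle brackets."""
    return f"<{EX}{local}>"

def literal(text):
    """A plain string value; escapes backslashes and double quotes only."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def ntriple(s, p, o):
    return f"{s} {p} {o} ."

line = ntriple(iri("course/BAET"), iri("name"), literal("Business Analytics Emerging Trends"))
print(line)
# <http://example.org/course/BAET> <http://example.org/name> "Business Analytics Emerging Trends" .
```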
8
Q

Triples

A
  • All information can be broken down into simple “Subject-Predicate-Object” triples.
  • Thing – Attribute – Value
    • This course has name “Business Analytics Emerging Trends”
    • This lecture has date “2020-12-07”
  • Thing – Relationship – Thing
    • This lecture has location Room WZ 104
    • This lecture has teacher Weigand
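The card's example statements can be written as plain (subject, predicate, object) tuples. In this minimal sketch, short strings stand in for URIs, and the convention that quoted objects are literal values (attributes) while unquoted objects are other things (relationships) is an assumption borrowed loosely from RDF syntax.

```python
# The example statements as (subject, predicate, object) tuples.
# Short names stand in for URIs; quoted objects are literal values.
triples = [
    ("thisCourse", "name", '"Business Analytics Emerging Trends"'),
    ("thisLecture", "date", '"2020-12-07"'),
    ("thisLecture", "location", "RoomWZ104"),
    ("thisLecture", "teacher", "Weigand"),
]

def is_literal(obj):
    """Attribute values are literals; relationships point to another thing."""
    return obj.startswith('"')

def describe(subject):
    """Everything known about one thing, split into attributes and relationships."""
    attrs = {p: o.strip('"') for s, p, o in triples if s == subject and is_literal(o)}
    rels = {p: o for s, p, o in triples if s == subject and not is_literal(o)}
    return attrs, rels

print(describe("thisLecture"))
# ({'date': '2020-12-07'}, {'location': 'RoomWZ104', 'teacher': 'Weigand'})
```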
9
Q

Things are identified by URIs

A
• Benefits of using URIs
  • Globally unique
  • Decentralized – duplicates are not prevented, but can be resolved easily using the “same-as” relationship
  • Resolvable (can be looked up in a browser)
• Costs of using URIs:
  • Can be long and ugly
• Can use international alphabets nowadays
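Resolving duplicates via "same-as" links amounts to grouping URIs into equivalence classes. The sketch below uses a union-find structure for this; the three URIs are hypothetical identifiers that different publishers might mint for the same city, and picking the first URI as the canonical one is an arbitrary choice.

```python
# Hypothetical "same-as" links between URIs minted by different publishers.
same_as = [
    ("http://example.org/cities/Winterthur", "http://example.com/place/winterthur-ch"),
    ("http://example.com/place/winterthur-ch", "http://example.net/id/city-42"),
]

parent = {}

def find(uri):
    """Return the canonical representative of the URI's equivalence class."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps chains short
        uri = parent[uri]
    return uri

def union(a, b):
    """Record that a and b denote the same thing."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra

for a, b in same_as:
    union(a, b)

print(find("http://example.net/id/city-42"))  # http://example.org/cities/Winterthur
```

Because "same-as" is transitive, the third URI ends up in the same class even though it is never linked to the first one directly.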
10
Q

How to query across data sources

A
  1. Make all data sources available as RDF
    (RDF is usually not the primary data representation)
  2. Put them into a single store
    (physical or virtual, ad hoc or persistent)
  3. Execute SPARQL queries!
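The three steps can be imitated in plain Python to show what a SPARQL engine does at its core: matching a basic graph pattern (triple patterns with `?variables`) against one combined store. The two sources, the `ex:` prefix and `ex:DeptA` are hypothetical, and this is a toy evaluator, not a replacement for a real SPARQL store.

```python
# Step 1: two hypothetical sources, each already converted to triples.
timetable = [
    ("ex:Lecture2", "ex:teacher", "ex:Weigand"),
    ("ex:Lecture2", "ex:location", "ex:RoomWZ104"),
]
staff_directory = [("ex:Weigand", "ex:department", "ex:DeptA")]

# Step 2: put them into a single (here in-memory, ad hoc) store.
store = set(timetable) | set(staff_directory)

def match(pattern, triple, binding):
    """Extend the variable binding so the pattern matches the triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new

def query(patterns):
    """Evaluate a basic graph pattern, the core of a SPARQL WHERE clause."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in store:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Step 3: roughly SELECT ?t ?dept WHERE { ex:Lecture2 ex:teacher ?t . ?t ex:department ?dept }
rows = query([("ex:Lecture2", "ex:teacher", "?t"), ("?t", "ex:department", "?dept")])
print(rows)  # [{'?t': 'ex:Weigand', '?dept': 'ex:DeptA'}]
```

The join works only because both sources use the same identifier `ex:Weigand`, which is exactly why shared URIs matter.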
11
Q

Knowledge Graphs in Web Search

A

• KGs are already used heavily by, e.g., Google. Some predict that in 5 years' time the Google interface will have completely changed (voice interface; no 2,300,576 web page results, but only facts and ads).
• The Winterthur example (Denny Vrandecic, 2020)

12
Q

Knowledge Graphs in Web Search: the problem

A

• Some twin towns are included in the Winterthur description.
• The Ontario page mentions Winterthur as Sister City, in a text description.
• There is no easy way to resolve the differences.
• Solution: Wikidata - a publicly curated Knowledge Graph, where the relationship is modeled as being symmetric
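Modeling a relationship as symmetric means a fact stated on one side is automatically valid on the other. The sketch below applies that rule to a set of triples; the predicate name `twinTown` and `CityA` are hypothetical placeholders, not Wikidata's actual property identifiers.

```python
# Hypothetical facts collected from two separate city descriptions.
facts = {
    ("Winterthur", "twinTown", "CityA"),
    ("Ontario", "twinTown", "Winterthur"),
}
SYMMETRIC = {"twinTown"}  # predicates declared symmetric in this sketch

def symmetric_closure(triples, symmetric):
    """Add (o, p, s) for every (s, p, o) whose predicate is declared symmetric."""
    closed = set(triples)
    for s, p, o in triples:
        if p in symmetric:
            closed.add((o, p, s))
    return closed

inferred = symmetric_closure(facts, SYMMETRIC)
twins_of_winterthur = sorted(o for s, p, o in inferred if s == "Winterthur" and p == "twinTown")
print(twins_of_winterthur)  # ['CityA', 'Ontario']
```

Ontario now shows up as a twin town of Winterthur even though only the Ontario side stated the link.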

13
Q

Artificial Intelligence

A

• KG can be the output representation for
  • Natural Language Processing
  • Computer Vision
• KG can be input for several AI tasks
  • Simple reasoning
    • Based on properties of the relationships, e.g., “sister city” is symmetric.
  • Machine Learning
    • Requires conversion from KG to numerical input: word embeddings, graph embeddings
    • The ML results can be embedded again into the KG for link prediction.
      – For example, (?, StarringIn, Terminator) asks for the stars of the film Terminator when the data is incomplete.
  • Chatbots
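Link prediction with graph embeddings can be illustrated with a TransE-style score, where a true triple (h, r, t) should satisfy h + r ≈ t in vector space. The lecture does not name a specific embedding method; TransE is used here as one well-known example, and the 2-D vectors and actor names are hand-picked toy values, not learned embeddings.

```python
import math

# Toy 2-D embeddings chosen by hand for illustration; a real system learns them from the KG.
entity = {
    "Terminator": (1.0, 1.0),
    "ActorA": (0.1, 0.9),
    "ActorB": (0.9, 0.1),
    "ActorC": (-1.0, 0.5),
}
relation = {"StarringIn": (0.9, 0.1)}

def distance(h, r, t):
    """TransE-style score: distance between h + r and t (smaller means more plausible)."""
    return math.dist((h[0] + r[0], h[1] + r[1]), t)

def predict_head(rel, tail, candidates):
    """Rank candidate subjects for the query (?, rel, tail)."""
    r, t = relation[rel], entity[tail]
    return sorted(candidates, key=lambda c: distance(entity[c], r, t))

print(predict_head("StarringIn", "Terminator", ["ActorC", "ActorB", "ActorA"]))
# ['ActorA', 'ActorB', 'ActorC']
```

The top-ranked candidate could then be written back into the KG as a predicted (ActorA, StarringIn, Terminator) link.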

14
Q

Edge Detection

A

Example image (man, glasses, horse and bucket circled) with the detected relationships:
• Man – wearing – glasses
• Man – feeding – horse
• Horse – eating from – bucket

15
Q

Contribution KG to Conversational AI

A
• First of all, KGs contribute more data from heterogeneous sources, including personal data (personalization)
• KG data can also be used to generate queries for training the (ML-based) NL Interpreter
• KG data can be used to improve the Intention finder, by attaching domain-specific intentions to objects.
   • Example: table reservation intention for restaurant objects in a touristic chatbot
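Both uses can be combined in a small sketch: intentions are attached to object types, and the KG's objects fill utterance templates to produce (utterance, intent) training pairs for an NL interpreter. The KG fragment, intent names and templates are all hypothetical; a real chatbot would need many more templates and paraphrases.

```python
# Hypothetical touristic KG fragment.
kg = [
    ("RestaurantA", "type", "Restaurant"),
    ("RestaurantA", "locatedIn", "CityC"),
    ("MuseumB", "type", "Museum"),
]

# Assumed mapping from object types to a domain-specific intention and utterance templates.
INTENT_TEMPLATES = {
    "Restaurant": ("reserve_table", ["Book a table at {name}", "Can I reserve a table at {name} tonight?"]),
    "Museum": ("buy_ticket", ["I want tickets for {name}", "How much is a ticket for {name}?"]),
}

def training_examples(triples):
    """Turn typed KG objects into (utterance, intent) pairs for training an NL interpreter."""
    examples = []
    for s, p, o in triples:
        if p == "type" and o in INTENT_TEMPLATES:
            intent, templates = INTENT_TEMPLATES[o]
            examples.extend((t.format(name=s), intent) for t in templates)
    return examples

for utterance, intent in training_examples(kg):
    print(intent, "|", utterance)
```

Adding a new restaurant to the KG then automatically yields new training sentences for the reservation intention.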
16
Q

Conclusion lecture 2 Knowledge Graphs

A

• Big Data challenges traditional data management solutions
• Need for broadly-standardized data (from company or supply chain standards to global standards) – LOD, RDF
• Standards like RDF implement knowledge graphs, that is, networks of atomic data triples.
• Knowledge graphs are used in AI and use AI (ML, NLP)
• Enterprises should be more aware of the potential of knowledge graphs

The need for data integration is no less than before, but the solution direction shifts from closed and centralized to open and distributed.
Information Management has to live with the decentralized reality, but should still care about data management (data quality, consistency).