Lecture 2 - Knowledge Graphs Flashcards

Question 1

Q

Data management

Answer

A

Becomes essential when organizations aim to be data-driven.
Also becomes a huge challenge with Big Data (the 5 V’s).

Question 2

Q

We achieve good data quality through (Cf. model Verhoef):

Answer

A

• Governance and leadership - defined roles and responsibilities to ensure accountability for data
quality with policies and procedures in place to support the process
• Systems and processes - in place that secure the quality of data. Cf. auditing (next week)
• People and skills - train staff so they have the appropriate knowledge, competencies and capacity
for their roles
• Data use - the purpose of collecting and reporting robust, good quality data is to inform
management, make improvements to service delivery and to promote accountability to customers,
stakeholders, local residents and Government
• Data security - data collected must be secure and should only be used for authorised purposes

Question 3

Q

Data Quality dimensions

Answer

A

• Accuracy
• Completeness
• Consistency
• Timeliness
• Validity (potential to be accurate, e.g., right datatype)
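
As a minimal sketch of how two of these dimensions could be checked mechanically, the Python below tests completeness (no required field is missing) and validity (each value has the right datatype). The schema and field names are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "date": date, "room": str}

def is_complete(record):
    """Completeness: every schema field is present and not None."""
    return all(record.get(field) is not None for field in SCHEMA)

def is_valid(record):
    """Validity: every value that is present has the expected datatype."""
    return all(
        isinstance(record[field], expected)
        for field, expected in SCHEMA.items()
        if record.get(field) is not None
    )

good = {"name": "Lecture 2", "date": date(2020, 12, 7), "room": "WZ 104"}
bad_type = {"name": "Lecture 2", "date": "2020-12-07", "room": "WZ 104"}
missing = {"name": "Lecture 2", "date": date(2020, 12, 7)}
```

A record can pass the validity check and still be inaccurate (a well-formed but wrong date, for example), which is why validity is only the potential to be accurate.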

Question 4

Q

Steps in

Answer

A

Step 1 - Separate and manage master data
Step 2 - Cleanse the data
Step 3 - Standardize
Step 4 - Publish (Open Data)

Question 5

Q

Step 4 - Publish (Open Data)

Answer

A

• Make (selected) data sets available within the enterprise, business
network or to the world
• Open data allows others to build new services, combine data etc.
• Open data is more and more expected from government agencies
• Note similarities and differences with traditional “data integration”

Question 6

Q

The problem (the need to link open data)

Answer

A

Data everywhere
Relevant data is scattered over many files and applications
For many tasks, data from multiple sources needs to be used together
For many tasks, data needs to be re-used out of context
Exchange across systems, departments, organizations
No “integrated schema”
No centralized data governance possible anymore when you cross organizational borders

Question 7

Q

Solution: Linked Data (now called Knowledge graphs)

Answer

A

URIs (Uniform Resource Identifiers): universal identifiers for everything - object identification
RDF: the counterpart of HTML (markup language) for Linked Data - data representation
SPARQL: the counterpart of SQL for Linked Data - data retrieval

Question 8

Q

Triples

Answer

A

All information can be broken down into simple “Subject-Predicate-Object” triples.
Thing – Attribute – Value
- This course has name “Business Analytics Emerging Trends”
- This lecture has date “2020-12-07”
Thing – Relationship – Thing
- This lecture location is Room WZ 104
- This lecture teacher is Weigand
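
The example statements above can be written down as plain (subject, predicate, object) tuples. This is only a sketch: the identifiers such as ":course" and ":lecture2" are made up for illustration, and a real knowledge graph would use full URIs.

```python
# Each fact is one (subject, predicate, object) tuple.
# All identifiers (":course", ":lecture2", ...) are hypothetical.
triples = [
    # Thing - Attribute - Value
    (":course", "name", "Business Analytics Emerging Trends"),
    (":lecture2", "date", "2020-12-07"),
    # Thing - Relationship - Thing
    (":lecture2", "location", ":room_WZ104"),
    (":lecture2", "teacher", ":weigand"),
]

def describe(subject, graph):
    """Collect what the graph states about one subject.

    Assumes at most one value per predicate, which holds for this toy data.
    """
    return {p: o for s, p, o in graph if s == subject}

lecture = describe(":lecture2", triples)
```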

Question 9

Q

Things are identified by URIs

Answer

A

• Benefits of using URIs
  • Globally unique
  • Decentralized – duplicates are not prevented, but can be resolved easily using a “same-as” relationship
  • Resolvable (use browser)
• Costs of using URIs:
  • Can be long and ugly
• Can use international alphabets nowadays
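
One possible way to resolve duplicates through “same-as” links is to group linked URIs and pick one representative per group. The sketch below uses a small union-find for that; the URIs are hypothetical, and choosing the alphabetically smallest URI as representative is an arbitrary assumption.

```python
# Hypothetical "same-as" links between URIs from two independent sources.
same_as = [
    ("http://example.org/a/Winterthur", "http://example.com/b/city/42"),
]

def canonical_map(pairs):
    """Map every URI to one representative per same-as group (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Arbitrary choice: the smaller URI string becomes the representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {uri: find(uri) for uri in list(parent)}

canon = canonical_map(same_as)
```

After this step, triples from both sources can be rewritten to the representative URI so that they describe one and the same thing.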

Question 10

Q

How to query across data sources

Answer

A

Make all data sources available as RDF (RDF is usually not the primary data representation)
Put them into a single store (physical or virtual, ad hoc or persistent)
Execute SPARQL queries!
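
The three steps can be imitated in a few lines of Python. This is a sketch under stated assumptions, not real RDF tooling: the two sources and their identifiers are invented, the "single store" is just a set of tuples, and match() is a hypothetical helper that stands in for a SPARQL triple pattern.

```python
# Step 1: two hypothetical sources, each already converted to triples.
source_a = [(":lecture2", "teacher", ":weigand"),
            (":lecture2", "location", ":room_WZ104")]
source_b = [(":room_WZ104", "building", ":WZ"),
            (":weigand", "name", "Weigand")]

# Step 2: the "single store" is simply the union of all triples.
store = set(source_a) | set(source_b)

def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a variable."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def building_of_lecture(store, lecture):
    """Step 3: join two patterns, roughly
    SELECT ?b WHERE { <lecture> location ?room . ?room building ?b }
    """
    return [b
            for _, _, room in match(store, s=lecture, p="location")
            for _, _, b in match(store, s=room, p="building")]

result = building_of_lecture(store, ":lecture2")
```

The query only succeeds because both sources refer to the room by the same identifier, which is the point of using shared URIs.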

Question 11

Q

Knowledge Graphs in Web Search

Answer

A

• KGs are already used heavily by, e.g., Google. Some predict that in 5
years' time the Google interface will have completely changed (voice
interface, no 2,300,576 web page results but only facts and ads).
• The Winterthur example (Denny Vrandecic, 2020)

Question 12

Q

Knowledge Graphs in Web Searches problem

Answer

A

• Some twin towns are included in the Winterthur description.
• The Ontario page mentions Winterthur as Sister City, in a text description.
• There is no easy way to resolve the differences.
• Solution: Wikidata - publicly curated Knowledge Graph, where the relationship
is modeled as being symmetric
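
A minimal sketch of what modeling a relationship as symmetric buys: once a predicate is declared symmetric, the reverse statement can be derived instead of being maintained by hand on both pages. The predicate name "sisterCity" and the identifiers are hypothetical; this does not reproduce how Wikidata implements it.

```python
# Predicates declared symmetric (hypothetical name).
SYMMETRIC = {"sisterCity"}

def infer_symmetric(triples):
    """Add (o, p, s) for every (s, p, o) whose predicate is symmetric."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in SYMMETRIC:
            inferred.add((o, p, s))
    return inferred

# Only one direction is stated explicitly.
facts = {(":Winterthur", "sisterCity", ":Ontario")}
closed = infer_symmetric(facts)
```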

Question 13

Q

Artificial Intelligence

Answer

A

• KG can be the output representation for
  • Natural Language Processing
  • Computer Vision
• KG can be input for several AI tasks
  • Simple reasoning
    • Based on properties of the relationships. E.g., “sister city” is symmetric.
  • Machine Learning
    • Requires conversion from KG to numerical input: word embeddings, graph embeddings
    • The ML results can be embedded again into the KG for link prediction.
      – For example, (?, StarringIn, Terminator) is to predict the stars of the film Terminator when the data is incomplete.
  • Chatbots
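
To make the link prediction query (?, StarringIn, Terminator) concrete, here is a toy sketch using a TransE-style score, where a triple is considered plausible if head + relation lies close to tail. TransE is a swapped-in example of a graph embedding method, not necessarily the one meant in the lecture, and the 2-d vectors are hand-picked for illustration; real embeddings are learned from the graph.

```python
import math

# Toy 2-d embeddings, hand-picked for illustration (real ones are learned).
entity = {
    "Schwarzenegger": (1.0, 0.0),
    "Hamilton": (0.9, 0.2),
    "Winterthur": (-1.0, 3.0),
    "Terminator": (2.0, 1.0),
}
relation = {"StarringIn": (1.0, 1.0)}

def distance(head, rel, tail):
    """TransE-style distance ||h + r - t||; smaller means more plausible."""
    h, r, t = entity[head], relation[rel], entity[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def predict_heads(rel, tail, k=2):
    """Rank candidate heads for the query (?, rel, tail)."""
    candidates = [e for e in entity if e != tail]
    return sorted(candidates, key=lambda e: distance(e, rel, tail))[:k]

top = predict_heads("StarringIn", "Terminator")
```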

Question 14

Q

Edge Detection

Answer

A

Figure: scene graph extracted from an image (objects are circled nodes, relationships are edges)
man - wearing - glasses
man - feeding - horse
horse - eating from - bucket

Question 15

Q

Contribution KG to Conversational AI

Answer

A

• First of all, the contribution of KG is providing more data from heterogeneous sources, including personal data (personalization)
• KG data can also be used to generate queries that can be used to train the (ML-based) NL Interpreter
• KG data can be used to improve the Intention finder, by attaching domain-specific intentions to objects.
   • Example: table reservation intention for restaurant objects in touristic chatbot

Question 16

Q

Conclusion lecture 2 Knowledge Graphs

Answer

A

• Big Data challenges traditional data management solutions
• Need for broadly-standardized data (from company or supply chain standards
to global standards) – LOD, RDF
• Standards like RDF implement knowledge graphs, that is, networks of atomic
data triples.
• Knowledge graphs are used in AI and use AI (ML, NLP)
• Enterprises should be more aware of the potential of knowledge graphs

The need for data integration is not less than before, but the solution direction
shifts from closed centralized to open distributed.
Information Management should live with the decentralized reality and should still
care about data management (data quality, consistency).

(16 cards)