Annotation Formats Flashcards
Why do we need/use annotation formats?
Documents are often human-readable but not machine-readable; annotations make the relevant information explicit for software
What are annotations?
Information added to a text document as markup; it is not part of the original text
What is important when creating annotations?
A convention must be followed
What are the purposes of annotation? (3)
- presentation
- metadata
- understanding content
What are the types of annotation formats? (3)
- boundary notation
- inline markup elements
- standoff (e.g. delimiter-separated values such as CSV, or JSON)
What is boundary notation?
A label is attached to each individual token, marking where an annotated span begins and ends
Give an example of boundary notation
BIO: begin, inside, outside
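A minimal sketch of how BIO tags could be read back into entity spans. The function name `bio_to_spans`, the labels `PER` and `LOC`, and the sample sentence are illustrative assumptions, not part of any particular tagging scheme or library.

```python
def bio_to_spans(tokens, tags):
    """Collect (label, start_token, end_token_exclusive, text) spans from BIO tags."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an "I-" tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An "I-" tag with no matching open span is ignored in this sketch.
    if start is not None:
        spans.append((label, start, len(tokens), " ".join(tokens[start:])))
    return spans


# Hypothetical example: one tag per token.
tokens = ["Boris", "Johnson", "visited", "Basra", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
spans = bio_to_spans(tokens, tags)
```

Because every token carries exactly one tag, a span nested inside another span has nowhere to go in this representation.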
What are the benefits (1) and disadvantages (1) of BIO?
- benefit: simple
- disadvantage: cannot handle hierarchical or structured annotations such as nesting, relations or events
Give an example of a nested entity
“the British prime minister Boris Johnson”
Give an example of an event
The Iraq-US war
What are the benefits (1) and disadvantages (2) of inline markup?
- benefit: can cope with nested, hierarchical and structured annotations (e.g. events)
- disadvantage: requires substantial processing
- disadvantage: cannot encode overlapping/intersecting annotations
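A small sketch of these points using XML-style inline markup and Python's standard `xml.etree.ElementTree`. The tag names `s`, `PER`, `ROLE`, `A` and `B`, the helper `collect_spans`, and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Nested annotation: a ROLE element inside a PER element (hypothetical tag names).
marked_up = (
    "<s>the <PER><ROLE>British prime minister</ROLE> "
    "Boris Johnson</PER> spoke</s>"
)
root = ET.fromstring(marked_up)

# Getting the plain text back means stripping every tag again.
plain_text = "".join(root.itertext())


def collect_spans(elem, offset, out):
    """Walk the tree and record (tag, start, end) offsets into the plain text."""
    start = offset
    offset += len(elem.text or "")
    for child in elem:
        offset = collect_spans(child, offset, out)
        offset += len(child.tail or "")
    out.append((elem.tag, start, offset))
    return offset


spans = []
collect_spans(root, 0, spans)
entity_spans = [(tag, plain_text[s:e]) for tag, s, e in spans if tag != "s"]

# Crossing (overlapping) elements are not well-formed XML, so parsing fails.
crossing = "<s><A>one <B>two</A> three</B></s>"
try:
    ET.fromstring(crossing)
    crossing_is_valid = True
except ET.ParseError:
    crossing_is_valid = False
```

The tree walk is the "substantial processing": even finding where an entity sits in the plain text takes a recursive pass over the markup.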
Give an example of an overlapping/intersecting annotation
“the Iraqi city of Basra”
What are standoff annotations?
Annotations stored separately from the document, which requires a way to link them back to the text. The link is made by indexing on character offsets
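A minimal sketch of standoff annotation kept as JSON next to an untouched text. The field names (`id`, `label`, `start`, `end`), the labels, and the half-open offset convention (`end` is exclusive) are assumptions chosen for this example, not a specific standard.

```python
import json

# The original text is never modified.
text = "the British prime minister Boris Johnson spoke"

# Annotations live in a separate structure and point into the text by offset.
annotations = [
    {"id": "T1", "label": "ROLE", "start": 4, "end": 26},
    {"id": "T2", "label": "PER", "start": 4, "end": 40},
    {"id": "T3", "label": "NAME", "start": 27, "end": 40},
]

# Serialised separately from the document, e.g. into its own file.
standoff_json = json.dumps(annotations)


def resolve(text, annotation):
    """Link an annotation back to the text via its character offsets."""
    return text[annotation["start"]:annotation["end"]]


loaded = json.loads(standoff_json)
resolved = {a["id"]: resolve(text, a) for a in loaded}
```

Since each annotation is an independent record, several of them can cover the same characters without interfering with each other.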
What are the benefits (2) and disadvantages (1) of standoff annotations?
- benefit: original text is left untouched
- benefit: handles overlapping and structured annotations
- disadvantage: not easily human-readable because the annotations are separated from the text