Chapter 7 - XML Data Modelling Flashcards
What are the goals and main characteristics of XML?
- original intent: document markup language (tags = extra information)
- separate document content from structure: describe documents for interchange
- meta-language: language for defining other languages
- markup = metadata at specific parts (instance-level): self-describing documents
- > schema (optional) = "global" metadata (vocabulary, structure, allowed content …)
Which application scenarios influenced the development of XML?
- Document processing: use document in various, evolving systems
- > structure, content, layout (grammar: markup vocabulary for mixed content)
- Databases and Data Exchange: data independence
- > structured, typed data (schema, integrity constraints)
- Semi-structured data and information integration: integrate autonomous data sources
- > dynamic, partially known schemas (data analysis after processing)
What is the main characteristic and structure of XML documents?
- > Unicode text files: markup: tags, references (&), comments, PIs + data: text (character data) (att. syntax: CDATA)
- > Well-formed: optional XML decl./schema, single root, nesting rules, etc.
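The well-formedness rules above can be tried out with a non-validating parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample documents are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Optional XML declaration, single root, proper nesting, a reference and a comment.
good = '<?xml version="1.0"?><book id="b1"><title>XML &amp; Data</title><!-- note --></book>'

# Tags closed in the wrong order violate the nesting rules.
bad_nesting = "<a><b></a></b>"

# A document must have exactly one root element.
two_roots = "<a/><b/>"

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample))
```

The parser only checks well-formedness here; it does not validate against a DTD or schema.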
(Additional Information on Elements, Attributes and Namespaces)
- Element content is ordered
- Mix of text and elements allowed but not recommended
- attribute values = string (no further structuring)
- attributes are unordered and unique within element
- attributes vs subelements:
- > documents: attr is part of markup (interpretation), elements are basic content
- > data representation: difference is unclear
- namespaces specify how to construct universally unique names
- > collection of names identified by URI, declared via xmlns:prefix attribute (~ import)
- Just xmlns="URI" => default NS
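A small sketch of how these points show up when a document is parsed with Python's standard xml.etree.ElementTree. The namespace URIs and element names are invented for the example; ElementTree writes a qualified name as {URI}local-name.

```python
import xml.etree.ElementTree as ET

doc = """<lib:library xmlns:lib="http://example.org/library"
                      xmlns="http://example.org/default">
  <lib:book isbn="123" lang="en">A <em>short</em> title</lib:book>
  <note>uses the default namespace</note>
</lib:library>"""

root = ET.fromstring(doc)
book, note = root[0], root[1]   # element content is ordered
em = book[0]

# Prefixed names resolve to the URI bound to the prefix.
print(root.tag)       # {http://example.org/library}library
# Unprefixed element names fall into the default namespace.
print(note.tag)       # {http://example.org/default}note

# Attributes are an unordered name -> string mapping.
print(book.attrib)

# Mixed content: text before the child, the child's text, text after the child.
print(repr(book.text), repr(em.text), repr(em.tail))

# Attribute names must be unique within an element.
try:
    ET.fromstring('<a x="1" x="2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Note that the unprefixed attributes isbn and lang are not placed in the default namespace; the default namespace applies to element names only.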
What are the goals of a schema definition in XML?
- > Standardized data exchange (structured data)
- > Restrictions on structure and data types: validation of documents
Overview DTD
- Defines elements, their mandatory/optional attributes and subelements (no. of occur)
- no data types: all values as strings
- <!DOCTYPE …> clause in XML doc
- Syntax: <!ELEMENT elem-name (subelement specification)> name list, #PCDATA, regex
<!ATTLIST elem-name attr-spec> type: CDATA, ID, NMTOKEN, enumeration + #REQUIRED, #IMPLIED, def. value
Untyped references:
- ID type: distinct value (object identifier)
- IDREF type: ID value of elem. in same document
- IDREFS type: set of ID values (IDREF)
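To make the DTD syntax and the ID/IDREF rules concrete, here is a hedged sketch. The university document and its DTD are invented. Because Python's standard ElementTree parser does not validate, the ID/IDREF rules are re-implemented by hand, and the sets naming which attributes are ID, IDREF or IDREFS are copied manually from the DTD (an assumption of this sketch, not something the parser provides).

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE university [
  <!ELEMENT university (professor*, course*)>
  <!ELEMENT professor (#PCDATA)>
  <!ATTLIST professor pid ID #REQUIRED>
  <!ELEMENT course (#PCDATA)>
  <!ATTLIST course cid ID #REQUIRED
                   taughtBy IDREF #REQUIRED
                   requires IDREFS #IMPLIED>
]>
<university>
  <professor pid="p1">Codd</professor>
  <course cid="c1" taughtBy="p1">Databases</course>
  <course cid="c2" taughtBy="p1" requires="c1 p1">XML</course>
</university>"""

# Attribute roles, copied by hand from the ATTLIST declarations above.
ID_ATTRS = {"pid", "cid"}
IDREF_ATTRS = {"taughtBy"}
IDREFS_ATTRS = {"requires"}

def check_id_idref(root):
    """Return a list of violations of the ID / IDREF rules."""
    errors = []
    seen = set()
    # Pass 1: every ID value must be distinct within the whole document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name in ID_ATTRS:
                if value in seen:
                    errors.append(f"duplicate ID {value!r}")
                seen.add(value)
    # Pass 2: every IDREF(S) value must match some ID in the same document.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name in IDREF_ATTRS:
                refs = [value]
            elif name in IDREFS_ATTRS:
                refs = value.split()
            else:
                continue
            for ref in refs:
                if ref not in seen:
                    errors.append(f"dangling reference {ref!r}")
    return errors

root = ET.fromstring(doc)
print(check_id_idref(root))
```

The value requires="c1 p1" passes the check even though p1 identifies a professor rather than a course: the references are untyped, so any ID in the document is an acceptable target.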
Overview XML schema.
(class of documents -> instance documents conforms to it)
- closer to general understanding of a database schema. Add. to DTD, supports:
- > typing/constraints of values, typed references, UDTs, XML syntax, namespaces, list types, inheritance, unique, foreign key… -> more complicated than DTD
Overview: types in XML schema
- > simple: no further structure (child elems/attrs) -> plain values, can be derived by restriction
vs. complex: may carry attributes and have element content (or empty)
- > primitive: built-in simple types which are not derived
vs. derived: defined in terms of other types (restriction, list, union, extension)
- > built-in vs. user-derived
Overview: complex types
- simple content: add attributes to simple content of elements (extension)
- sequence (subelements in specified order), choice (one of subelems), all (unordered)
- > subelements with number of occurrences (minOccurs, maxOccurs)
- derivation: restriction/extension
What are the alternatives for deriving types in XML schema?
- Restriction: same content of base type
- > for simple content: add facets (bounds, lengths, enumeration, pattern)
- > complex content: limiting occurrences (larger min/ smaller max)
- specifying default or fixed attribute value
- remove optional component
- replace types with derived (simple types)
- Extension: add attributes and elements to content inherited from base type
- must be surrounded by <xsd:simpleContent> or <xsd:complexContent>
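The derivation alternatives can be illustrated with a small schema. This is only a sketch: the type names Grade, Price, Person and Student are invented, and because Python's standard library cannot validate against XML Schema, the code merely parses the schema (which is itself XML) and reports how each type is derived.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

schema_text = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- simple type derived by restriction: facets narrow the value space -->
  <xsd:simpleType name="Grade">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="5"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- simple content: extension adds an attribute to a plain value -->
  <xsd:complexType name="Price">
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <xsd:attribute name="currency" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

  <!-- base complex type: sequence with occurrence bounds -->
  <xsd:complexType name="Person">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- complex content: extension appends elements to the inherited content -->
  <xsd:complexType name="Student">
    <xsd:complexContent>
      <xsd:extension base="Person">
        <xsd:sequence>
          <xsd:element name="grade" type="Grade"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>"""

def derivations(schema_root):
    """Map each derived named type to a (method, base) pair."""
    result = {}
    for type_def in schema_root:
        for method in ("restriction", "extension"):
            node = type_def.find(f".//{{{XSD}}}{method}")
            if node is not None:
                result[type_def.get("name")] = (method, node.get("base"))
    return result

schema_root = ET.fromstring(schema_text)
for name, (method, base) in derivations(schema_root).items():
    print(f"{name}: {method} of {base}")
```

Person does not appear in the output because it is defined directly rather than derived from another named type.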
How does XML Schema support substitutability of derived types?
- Document level: element of derived type can occur in the place of base type
- > requires attribute xsi:type to specify the derived type
- > can be forbidden with block attribute (or final)
- Element level: specify which element names may substitute a head element
- > must be of the same type or derived
- > substitution groups!
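Both forms of substitutability can be sketched as follows. The element and type names (person, student, Person, Student) are hypothetical, the type definitions themselves are omitted, and the code only inspects the markup with Python's standard ElementTree rather than performing schema validation.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"
XSD = "{http://www.w3.org/2001/XMLSchema}"

# Document level: xsi:type names the derived type actually used by an element.
people = ET.fromstring("""
<people xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <person><name>Ada</name></person>
  <person xsi:type="Student"><name>Bob</name><grade>2</grade></person>
</people>""")

def effective_type(elem, declared_type):
    """Return the xsi:type override if present, else the declared type."""
    return elem.get(XSI + "type", declared_type)

# "Person" is assumed to be the type declared for <person> in the schema.
types = [effective_type(p, "Person") for p in people]
print(types)

# Element level: members of a substitution group may replace the head element.
schema = ET.fromstring("""
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="person" type="Person"/>
  <xsd:element name="student" type="Student" substitutionGroup="person"/>
</xsd:schema>""")

def substitutes_for(schema_root, head):
    """List global element names declared as substitutes for `head`."""
    return [e.get("name") for e in schema_root.findall(XSD + "element")
            if e.get("substitutionGroup") == head]

print(substitutes_for(schema, "person"))
```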
What are the namespaces involved in XML Schema and their associations?
- Schema vocabulary (schema of XML Schema): fixed URI, usually xsd prefix
- Target Namespace defines NS to which the names defined in the schema belong
- Doc using schema: defines xmlns as target Namespace (schemaLocation to get file)
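A sketch of how the three namespaces relate, using an invented target namespace http://example.org/library and a hypothetical schema file name library.xsd. The code checks with Python's standard ElementTree that the instance's root element lies in the schema's target namespace and reads the schemaLocation hint; it does not load or apply the schema.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Schema document: xsd prefix = schema vocabulary, targetNamespace = defined names.
schema = ET.fromstring("""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.org/library"
    xmlns="http://example.org/library"
    elementFormDefault="qualified">
  <xsd:element name="library" type="xsd:string"/>
</xsd:schema>""")

# Instance document: default namespace = target namespace of the schema.
instance = ET.fromstring("""<library xmlns="http://example.org/library"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/library library.xsd">books</library>""")

def namespace_of(elem):
    """Extract the namespace URI from an ElementTree tag of the form {URI}name."""
    return elem.tag[1:].split("}")[0] if elem.tag.startswith("{") else ""

def schema_locations(elem):
    """Parse xsi:schemaLocation into a {namespace: location} mapping."""
    tokens = elem.get(XSI + "schemaLocation", "").split()
    return dict(zip(tokens[::2], tokens[1::2]))

target_ns = schema.get("targetNamespace")
print(namespace_of(instance) == target_ns)
print(schema_locations(instance))
```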
How does XML Schema support unique constraints?
xsd:unique element: selector + field
- > XPath expressions to define scope and values that must be unique within an element
xsd:key = unique + nillable = false
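The selector/field mechanism can be emulated in a few lines. This is a simplified sketch, not a schema validator: the library document and the constraint name uniqueIsbn are invented, only attribute fields of the form @name are supported, and the selector is restricted to the XPath subset that Python's ElementTree understands.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# The constraint as it would appear inside the declaration of <library>.
constraint = ET.fromstring("""
<xsd:unique name="uniqueIsbn" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:selector xpath="book"/>
  <xsd:field xpath="@isbn"/>
</xsd:unique>""")

library = ET.fromstring("""
<library>
  <book isbn="1"/>
  <book isbn="2"/>
  <book isbn="1"/>
  <book/>
</library>""")

def violations(scope, constraint):
    """Return field-value tuples that occur more than once within `scope`."""
    selector = constraint.find(XSD + "selector").get("xpath")
    fields = [f.get("xpath") for f in constraint.findall(XSD + "field")]
    seen, duplicates = set(), []
    for node in scope.findall(selector):
        # Several fields together would form a composite value.
        value = tuple(node.get(f.lstrip("@")) for f in fields)
        if None in value:
            # xsd:unique skips nodes with a missing field;
            # xsd:key would report them as errors instead.
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates

print(violations(library, constraint))
```

The selector determines which nodes are compared, relative to the element that carries the constraint, and the fields determine which values must differ.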
How can XML schema map ER models with 1:n and n:m relationships?
- > entities = XML elements (xsd:key simulates ER keys)
- > 1:n relationships: nesting N subelements inside (sequence)
- > by local element definition (element name="") or global (element ref="")
- > requires xsd:key/xsd:keyref for uniqueness + referential integrity
- > nesting alone insufficient
- > n:m relationships: key+keyref in helper element (similar to relational table)
- > flat modelling with pointers
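The n:m mapping with a helper element can be sketched like this. The student/course/enrolled vocabulary is invented, and the referential-integrity check that xsd:key and xsd:keyref would express declaratively is written out by hand with Python's standard ElementTree.

```python
import xml.etree.ElementTree as ET

# Flat modelling: entities as sibling elements, the n:m relationship
# as helper elements that point at both sides (like a relational table).
university = ET.fromstring("""
<university>
  <student sid="s1"/>
  <student sid="s2"/>
  <course cid="c1"/>
  <course cid="c2"/>
  <enrolled student="s1" course="c1"/>
  <enrolled student="s1" course="c2"/>
  <enrolled student="s2" course="c9"/>
</university>""")

def key_values(scope, selector, attr):
    """Collect the key values of the selected elements (the role of xsd:key)."""
    return {e.get(attr) for e in scope.findall(selector)}

def dangling(scope, selector, attr, keys):
    """Return referenced values that match no key (the role of xsd:keyref)."""
    return [e.get(attr) for e in scope.findall(selector)
            if e.get(attr) not in keys]

student_keys = key_values(university, "student", "sid")
course_keys = key_values(university, "course", "cid")

print(dangling(university, "enrolled", "student", student_keys))
print(dangling(university, "enrolled", "course", course_keys))
```

Each reference is checked against the key set of one specific element kind, so a course reference cannot accidentally be satisfied by a student identifier.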
What are the advantages of key/keyref in XML Schema over ID/IDREF in DTDs?
- comparison based on equality of typed values (not only strings)
- allows composite keys
- scope restriction (implies type)