XML Introduction Flashcards
Fully Structured Data
Data, as in the relational model, that is structured and must fit a predefined schema, like Module(code, name).
Unstructured Data
Data such as music and image files, where there are no constraints on how the data is organised.
Semi-structured Data
Data with no fixed schema, which makes it flexible, but which is also self-describing: the structure (such as tag names) travels with the data itself.
Semi-structured Data model
The data model for semi-structured data may be thought of as a collection of nodes and edges arranged in a tree-like structure.
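As a rough illustration of the tree view, the sketch below parses a small XML fragment with Python's standard xml.etree.ElementTree and lists every parent-child edge. The fragment and its values are assumptions made up for this sketch, and the helper name edges is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative document (assumed for this sketch, not real data).
DOC = "<lecturers><lecturer><name>Ann</name><phone>123</phone></lecturer></lecturers>"

def edges(node):
    """Return (parent_tag, child_tag) pairs for every edge below node."""
    result = []
    for child in node:
        result.append((node.tag, child.tag))
        result.extend(edges(child))  # recurse into the subtree
    return result

root = ET.fromstring(DOC)
tree_edges = edges(root)
for parent, child in tree_edges:
    print(parent, "->", child)
```

Each element is a node, and each nesting relationship is an edge from parent to child.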
Applications for semi-structured data
- Storing and sharing data on the web
- Storage of documents
- Data analytics and big data
Forms of semi-structured data
- XML
- JSON
- Simple key-value relationships
- Graphs
XML
XML files have a very similar structure to HTML files: items are written as tags in angle brackets, <>, and nested tags represent lower levels of detail in the tree.
XML Elements
Everything from an element's start tag to its end tag, inclusive. An element can contain text, attributes and/or other elements.
<price>1.99</price>
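A minimal sketch of reading a price element in Python, assuming the standard xml.etree.ElementTree parser. It only shows that an element has a name (its tag) and text content.

```python
import xml.etree.ElementTree as ET

# A start tag, text content, and a matching end tag form one element.
price = ET.fromstring("<price>1.99</price>")

print(price.tag)          # the element name
print(price.text)         # the text between the tags (always a string)
print(float(price.text))  # convert explicitly if a number is needed
```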
Element Parents
Since we work in a top-down structure, we can nest elements within each other to declare hierarchy.
<person>
<student>
</student>
</person>
Attributes
Name-value pairs written inside an element's start tag that describe the element further.
<person hair="blonde">
</person>
They are particularly useful for defining references between elements.
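A short sketch of how attributes show up once parsed, again assuming Python's xml.etree.ElementTree. The height attribute is hypothetical and is only there to show what a missing attribute looks like.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring('<person hair="blonde"></person>')

print(person.attrib)         # all attributes as a dict
print(person.get("hair"))    # a single attribute value
print(person.get("height"))  # hypothetical attribute that is absent: None
```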
Two ways of defining XML file format
DTD and XML Schema.
DTD
Document Type Definition.
Provides information about the structure of the document.
DTD includes
- elements
- sub-elements
- attributes
<!DOCTYPE lecturers [
]>
Goes at the beginning of the XML document; the DTD declarations are written between [ and ].
lecturers is the name of the root element.
<!ELEMENT lecturers (lecturer+)>
+ means that there must be one or more lecturer elements in lecturers.
<!ELEMENT lecturer (name, phone?, email?, teaches*)>
name is required and must appear exactly once; unlike with +, multiple occurrences are not allowed.
phone and email are optional (?): each can appear ONE time or not at all.
teaches (*) can appear zero or more times. So the lecturer could teach nothing, one lesson or multiple lessons.
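One way to see what the occurrence indicators mean is to note that a content model reads much like a regular expression over the sequence of child element names. The sketch below hand-translates (name, phone?, email?, teaches*) into a Python regex. It is an illustration only, not how a real validating parser works, and the names LECTURER_MODEL and matches_lecturer_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (name, phone?, email?, teaches*) as a regex over child tag names.
# Each name is followed by a space so that tags cannot run together.
LECTURER_MODEL = re.compile(r"name (phone )?(email )?(teaches )*")

def matches_lecturer_model(xml_text):
    """Return True if the children of <lecturer> fit the content model."""
    lecturer = ET.fromstring(xml_text)
    child_tags = "".join(child.tag + " " for child in lecturer)
    return LECTURER_MODEL.fullmatch(child_tags) is not None

# name alone is enough
print(matches_lecturer_model("<lecturer><name/></lecturer>"))
# optional parts present, teaches repeated
print(matches_lecturer_model(
    "<lecturer><name/><email/><teaches/><teaches/></lecturer>"))
# name may not repeat
print(matches_lecturer_model("<lecturer><name/><name/></lecturer>"))
# name may not be left out
print(matches_lecturer_model("<lecturer><phone/></lecturer>"))
```

The first two calls report a match and the last two do not, which mirrors the "exactly one", "zero or one" and "zero or more" rules.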
DTD Format
When an element is declared with child elements in a DTD, each child element needs its own declaration as well, usually stated later on.
For example:
<!ELEMENT lecturer (name, phone, email)>
Since this element has name, phone and email, they need to be defined later in the document like this:
<!ELEMENT lecturer (name, phone, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
PCDATA
Parsed character data: the special symbol for text data inside an element.
Defining attributes in DTD
Let's say you have, in XML:
<student name="Jake" year="2"/>
In the DTD, this is declared as:
<!ELEMENT student EMPTY>
<!ATTLIST student name CDATA #IMPLIED>
<!ATTLIST student year CDATA #REQUIRED>
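The difference between an optional and a required attribute can be sketched as a small manual check. Python's xml.etree.ElementTree does not validate against a DTD, so the table STUDENT_ATTRS and the helper missing_required below are hypothetical stand-ins that mirror the two ATTLIST declarations.

```python
import xml.etree.ElementTree as ET

# Mirrors the ATTLIST declarations: attribute name -> is it required?
STUDENT_ATTRS = {"name": False, "year": True}

def missing_required(xml_text):
    """Return the required attributes that are absent from <student>."""
    student = ET.fromstring(xml_text)
    return [attr for attr, required in STUDENT_ATTRS.items()
            if required and attr not in student.attrib]

print(missing_required('<student name="Jake" year="2"/>'))  # nothing missing
print(missing_required('<student name="Jake"/>'))           # year is missing
print(missing_required('<student year="2"/>'))              # name is optional
```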
ATTLIST
Simply enough, attribute list.
IMPLIED
Means optional.
REQUIRED
Means required.
FIXED
Means a constant: the attribute always has the value given in the declaration.
CDATA
Character data, containing any text.
ID
Used to identify individual elements in documents. Each ID value must be unique within the document.
IDREF/IDREFS
An IDREF value must correspond to the value of an ID attribute on some element in the document; IDREFS holds a space-separated list of such values.
Essentially allows elements to refer to other elements by their unique key.
ID example
In the DTD:
<!ATTLIST lecturer staffNo ID #REQUIRED>
In the XML document (the value L1 is illustrative):
<lecturer staffNo="L1">
</lecturer>
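A sketch of what the ID and IDREF constraints require, written as a manual check in Python. The document, the module element and its taughtBy attribute (treated here as an IDREF) are assumptions made up for this sketch, as is the helper name check_ids; only staffNo comes from the ID example.

```python
import xml.etree.ElementTree as ET

# Illustrative document: L9 refers to a lecturer that does not exist.
DOC = """<university>
  <lecturer staffNo="L1"/>
  <lecturer staffNo="L2"/>
  <module taughtBy="L1"/>
  <module taughtBy="L9"/>
</university>"""

def check_ids(xml_text):
    """Return (duplicate_ids, dangling_refs) for the document."""
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for lecturer in root.iter("lecturer"):
        staff_no = lecturer.get("staffNo")
        if staff_no in seen:
            duplicates.append(staff_no)  # an ID value used twice
        seen.add(staff_no)
    # An IDREF must match the ID of some element in the document.
    dangling = [m.get("taughtBy") for m in root.iter("module")
                if m.get("taughtBy") not in seen]
    return duplicates, dangling

duplicates, dangling = check_ids(DOC)
print(duplicates)  # ID values that are not unique
print(dangling)    # references with no matching ID
```

A validating processor would perform both checks itself when the attributes are declared as ID and IDREF in the DTD.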
Two levels of document processing
- well-formed
- valid
DTD Document Processing
Validating/non-validating processors
A non-validating processor only ensures that the XML document is well-formed before passing its information on to the application.
A validating processor checks not only that the document is well-formed, but also that it conforms to a DTD; a document that passes both checks is valid.
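Python's xml.etree.ElementTree behaves like a non-validating processor: it rejects documents that are not well-formed, but it does not check them against a DTD. The sketch below shows that behaviour; the helper name is_well_formed is hypothetical, and checking validity would need a validating parser, which is outside this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Well-formed: start and end tags match.
print(is_well_formed("<price>1.99</price>"))
# Not well-formed: the end tag is not a valid name.
print(is_well_formed("<price>1.99</1.99>"))
# Not well-formed: tags are closed in the wrong order.
print(is_well_formed("<person><student></person></student>"))

# Well-formed but NOT valid: the DTD demands at least one lecturer.
# A non-validating parser still accepts it.
INVALID_BUT_WELL_FORMED = (
    "<!DOCTYPE lecturers [<!ELEMENT lecturers (lecturer+)>]>"
    "<lecturers></lecturers>"
)
print(is_well_formed(INVALID_BUT_WELL_FORMED))
```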