XML Flashcards
What is XML?
XML is the Extensible Markup Language. It improves the functionality of the Web by letting you identify your information in a more accurate, flexible, and adaptable way.
It is extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is actually a metalanguage—a language for describing other languages—which lets you design your own markup languages for limitless different types of documents. XML can do this because it’s written in SGML, the international standard metalanguage for text document markup (ISO 8879).
What is a markup language?
A markup language is a set of words and symbols for describing the identity of pieces of a document (for example ‘this is a paragraph’, ‘this is a heading’, ‘this is a list’, ‘this is the caption of this figure’, etc). Programs can use this with a style sheet to create output for screen, print, audio, video, Braille, etc.
Some markup languages (e.g. those used in word processors) only describe appearances (‘this is italics’, ‘this is bold’), but this method can only be used for display, and is not normally re-usable for anything else
Where should I use XML?
Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
Despite early attempts, browsers never allowed other SGML, only HTML (although there were plugins), and they allowed it (even encouraged it) to be corrupted or broken, which held development back for over a decade by making it impossible to program for it reliably. XML fixes that by making it compulsory to stick to the rules, and by making the rules much simpler than SGML.
But XML is not just for Web pages: in fact it’s very rarely used for Web pages on its own because browsers still don’t provide reliable support for formatting and transforming it. Common uses for XML include:
Information identification
because you can define your own markup, you can define meaningful names for all your information items. Information storage
because XML is portable and non-proprietary, it can be used to store textual information across any platform. Because it is backed by an international standard, it will remain accessible and processable as a data format. Information structure
XML can therefore be used to store and identify any kind of (hierarchical) information structure, especially for long, deep, or complex document sets or data sources, making it ideal for an information-management back-end to serving the Web. This is its most common Web application, with a transformation system to serve it as HTML until such time as browsers are able to handle XML consistently. Publishing
The original goal of XML as defined in the quotation at the start of this section. Combining the three previous topics (identity, storage, structure) means it is possible to get all the benefits of robust document management and control (with XML) and publish to the Web (as HTML) as well as to paper (as PDF) and to other formats (e.g. Braille, Audio, etc) from a single source document by using the appropriate style sheets. Messaging and data transfer
XML is also very heavily used for enclosing or encapsulating information in order to pass it between different computing systems which would otherwise be unable to communicate. By providing a lingua franca for data identity and structure, it provides a common envelope for inter-process communication (messaging). Web services
Building on all of these, as well as its use in browsers, machine-processable data can be exchanged between consenting systems, where before it was only comprehensible by humans (HTML). Weather services, e-commerce sites, blog newsfeeds, AJAX sites, and thousands of other data-exchange services use XML for data management and transmission, and the web browser for display and interaction.
Why is XML such an important development?
It removes two constraints which were holding back Web developments:
- dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for;
- the complexity of full SGML, whose syntax allows many powerful but hard-to-program options.
XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for.
Describe the role that XSL can play when dynamically generating HTML pages from a relational database.
Even if candidates have never participated in a project involving this type of architecture, they should recognize it as one of the common uses of XML. Querying a database and then formatting the result set so that it can be validated as an XML document allows developers to translate the data into an HTML table using XSLT rules. Consequently, the format of the resulting HTML table can be modified without changing the database query or application code since the document rendering logic is isolated to the XSLT rules
What is SGML?
SGML is the Standard Generalized Markup Language (ISO 8879:1986), the international standard for defining descriptions of the structure of different types of electronic document. There is an SGML FAQ from David Megginson at http://math.albany.edu:8800/hm/sgml/cts-faq.htmlFAQ; and Robin Cover’s SGML Web pages are at http://www.oasis-open.org/cover/general.html. For a little light relief, try Joe English’s ‘Not the SGML FAQ’ at http://www.flightlab.com/~joe/sgml/faq-not.txtFAQ.
SGML is very large, powerful, and complex. It has been in heavy industrial and commercial use for nearly two decades, and there is a significant body of expertise and software to go with it.
XML is a lightweight cut-down version of SGML which keeps enough of its functionality to make it useful but removes all the optional features which made SGML too complex to program for in a Web environment.
Aren’t XML, SGML, and HTML all the same thing?
Not quite; SGML is the mother tongue, and has been used for describing thousands of different document types in many fields of human activity, from transcriptions of ancient Irish manuscripts to the technical documentation for stealth bombers, and from patients’ clinical records to musical notation. SGML is very large and complex, however, and probably overkill for most common office desktop applications.
XML is an abbreviated version of SGML, to make it easier to use over the Web, easier for you to define your own document types, and easier for programmers to write programs to handle them. It omits all the complex and less-used options of SGML in return for the benefits of being easier to write applications for, easier to understand, and more suited to delivery and interoperability over the Web. But it is still SGML, and XML files may still be processed in the same way as any other SGML file (see the question on XML software).
HTML is just one of many SGML or XML applications—the one most frequently used on the Web.
Technical readers may find it more useful to think of XML as being SGML– rather than HTML++.
Who is responsible for XML?
XML is a project of the World Wide Web Consortium (W3C), and the development of the specification is supervised by an XML Working Group. A Special Interest Group of co-opted contributors and experts from various fields contributed comments and reviews by email.
XML is a public format: it is not a proprietary development of any company, although the membership of the WG and the SIG represented companies as well as research and academic institutions. The v1.0 specification was accepted by the W3C as a Recommendation on Feb 10, 1998.
Why is XML such an important development?
It removes two constraints which were holding back Web developments:
- dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for;
- the complexity of full question A.4, SGML, whose syntax allows many powerful but hard-to-program options.
XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for.
Give a few examples of types of applications that can benefit from using XML.
There are literally thousands of applications that can benefit from XML technologies. The point of this question is not to have the candidate rattle off a laundry list of projects that they have worked on, but, rather, to allow the candidate to explain the rationale for choosing XML by citing a few real world examples. For instance, one appropriate answer is that XML allows content management systems to store documents independently of their format, which thereby reduces data redundancy. Another answer relates to B2B exchanges or supply chain management systems. In these instances, XML provides a mechanism for multiple companies to exchange data according to an agreed upon set of rules. A third common response involves wireless applications that require WML to render data on hand held devices.
What is DOM and how does it relate to XML?
The Document Object Model (DOM) is an interface specification maintained by the W3C DOM Workgroup that defines an application independent mechanism to access, parse, or update XML data. In simple terms it is a hierarchical model that allows developers to manipulate XML documents easily Any developer that has worked extensively with XML should be able to discuss the concept and use of DOM objects freely. Additionally, it is not unreasonable to expect advanced candidates to thoroughly understand its internal workings and be able to explain how DOM differs from an event-based interface like SAX.
What is SOAP and how does it relate to XML?
The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the exchange of information in distributed computing environments. SOAP consists of three components: an envelope, a set of encoding rules, and a convention for representing remote procedure calls. Unless experience with SOAP is a direct requirement for the open position, knowing the specifics of the protocol, or how it can be used in conjunction with HTTP, is not as important as identifying it as a natural application of XML
Why not just carry on extending HTML?
HTML was already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information.
XML allows groups of people or organizations to question C.13, create their own customized markup applications for exchanging information in their domain (music, chemistry, electronics, hill-walking, finance, surfing, petroleum geology, linguistics, cooking, knitting, stellar cartography, history, engineering, rabbit-keeping, question C.19, mathematics, genealogy, etc).
HTML is now well beyond the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure.
Why should I use XML?
Here are a few reasons for using XML (in no particular order). Not all of these will apply to your own requirements, and you may have additional reasons not mentioned here (if so, please let the editor of the FAQ know!).
- XML can be used to describe and identify information accurately and unambiguously, in a way that computers can be programmed to ‘understand’ (well, at least manipulate as if they could understand).
- XML allows documents which are all the same type to be created consistently and without structural errors, because it provides a standardised way of describing, controlling, or allowing/disallowing particular types of document structure. [Note that this has absolutely nothing whatever to do with formatting, appearance, or the actual text content of your documents, only the structure of them.]
- XML provides a robust and durable format for information storage and transmission. Robust because it is based on a proven standard, and can thus be tested and verified; durable because it uses plain-text file formats which will outlast proprietary binary ones.
- XML provides a common syntax for messaging systems for the exchange of information between applications. Previously, each messaging system had its own format and all were different, which made inter-system messaging unnecessarily messy, complex, and expensive. If everyone uses the same syntax it makes writing these systems much faster and more reliable.
- XML is free. Not just free of charge (free as in beer) but free of legal encumbrances (free as in speech). It doesn’t belong to anyone, so it can’t be hijacked or pirated. And you don’t have to pay a fee to use it (you can of course choose to use commercial software to deal with it, for lots of good reasons, but you don’t pay for XML itself).
- XML information can be manipulated programmatically (under machine control), so XML documents can be pieced together from disparate sources, or taken apart and re-used in different ways. They can be converted into almost any other format with no loss of information.
- XML lets you separate form from content. Your XML file contains your document information (text, data) and identifies its structure: your formatting and other processing needs are identified separately in a stylesheet or processing system. The two are combined at output time to apply the required formatting to the text or data identified by its structure (location, position, rank, order, or whatever).
Can you walk us through the steps necessary to parse XML documents?
Superficially, this is a fairly basic question. However, the point is not to determine whether candidates understand the concept of a parser but rather have them walk through the process of parsing XML documents step-by-step. Determining whether a non-validating or validating parser is needed, choosing the appropriate parser, and handling errors are all important aspects to this process that should be included in the candidate’s response.
Give some examples of XML DTDs or schemas that you have worked with.
Although XML does not require data to be validated against a DTD, many of the benefits of using the technology are derived from being able to validate XML documents against business or technical architecture rules. Polling for the list of DTDs that developers have worked with provides insight to their general exposure to the technology. The ideal candidate will have knowledge of several of the commonly used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience designing a custom DTD for a particular project where no standard existed.
Using XSLT, how would you extract a specific attribute from an element in an XML document?
XML Interview Questions and Answers
Give a few examples of types of applications that can benefit from using XML.
There are literally thousands of applications that can benefit from XML technologies. The point of this question is not to have the candidate rattle off a laundry list of projects that they have worked on, but, rather, to allow the candidate to explain the rationale for choosing XML by citing a few real world examples. For instance, one appropriate answer is that XML allows content management systems to store documents independently of their format, which thereby reduces data redundancy. Another answer relates to B2B exchanges or supply chain management systems. In these instances, XML provides a mechanism for multiple companies to exchange data according to an agreed upon set of rules. A third common response involves wireless applications that require WML to render data on hand held devices.
What is DOM and how does it relate to XML?
The Document Object Model (DOM) is an interface specification maintained by the W3C DOM Workgroup that defines an application independent mechanism to access, parse, or update XML data. In simple terms it is a hierarchical model that allows developers to manipulate XML documents easily Any developer that has worked extensively with XML should be able to discuss the concept and use of DOM objects freely. Additionally, it is not unreasonable to expect advanced candidates to thoroughly understand its internal workings and be able to explain how DOM differs from an event-based interface like SAX.
What is SOAP and how does it relate to XML?
The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the exchange of information in distributed computing environments. SOAP consists of three components: an envelope, a set of encoding rules, and a convention for representing remote procedure calls. Unless experience with SOAP is a direct requirement for the open position, knowing the specifics of the protocol, or how it can be used in conjunction with HTTP, is not as important as identifying it as a natural application of XML
Why not just carry on extending HTML?
HTML was already overburdened with dozens of interesting but incompatible inventions from different manufacturers, because it provides only one way of describing your information.
XML allows groups of people or organizations to question C.13, create their own customized markup applications for exchanging information in their domain (music, chemistry, electronics, hill-walking, finance, surfing, petroleum geology, linguistics, cooking, knitting, stellar cartography, history, engineering, rabbit-keeping, question C.19, mathematics, genealogy, etc).
HTML is now well beyond the limit of its usefulness as a way of describing information, and while it will continue to play an important role for the content it currently represents, many new applications require a more robust and flexible infrastructure.
Why should I use XML?
Here are a few reasons for using XML (in no particular order). Not all of these will apply to your own requirements, and you may have additional reasons not mentioned here (if so, please let the editor of the FAQ know!).
* XML can be used to describe and identify information accurately and unambiguously, in a way that computers can be programmed to ‘understand’ (well, at least manipulate as if they could understand).
* XML allows documents which are all the same type to be created consistently and without structural errors, because it provides a standardised way of describing, controlling, or allowing/disallowing particular types of document structure. [Note that this has absolutely nothing whatever to do with formatting, appearance, or the actual text content of your documents, only the structure of them.]
* XML provides a robust and durable format for information storage and transmission. Robust because it is based on a proven standard, and can thus be tested and verified; durable because it uses plain-text file formats which will outlast proprietary binary ones.
* XML provides a common syntax for messaging systems for the exchange of information between applications. Previously, each messaging system had its own format and all were different, which made inter-system messaging unnecessarily messy, complex, and expensive. If everyone uses the same syntax it makes writing these systems much faster and more reliable.
* XML is free. Not just free of charge (free as in beer) but free of legal encumbrances (free as in speech). It doesn’t belong to anyone, so it can’t be hijacked or pirated. And you don’t have to pay a fee to use it (you can of course choose to use commercial software to deal with it, for lots of good reasons, but you don’t pay for XML itself).
* XML information can be manipulated programmatically (under machine control), so XML documents can be pieced together from disparate sources, or taken apart and re-used in different ways. They can be converted into almost any other format with no loss of information.
* XML lets you separate form from content. Your XML file contains your document information (text, data) and identifies its structure: your formatting and other processing needs are identified separately in a stylesheet or processing system. The two are combined at output time to apply the required formatting to the text or data identified by its structure (location, position, rank, order, or whatever).
Can you walk us through the steps necessary to parse XML documents?
Superficially, this is a fairly basic question. However, the point is not to determine whether candidates understand the concept of a parser but rather have them walk through the process of parsing XML documents step-by-step. Determining whether a non-validating or validating parser is needed, choosing the appropriate parser, and handling errors are all important aspects to this process that should be included in the candidate’s response.
Give some examples of XML DTDs or schemas that you have worked with.
Although XML does not require data to be validated against a DTD, many of the benefits of using the technology are derived from being able to validate XML documents against business or technical architecture rules. Polling for the list of DTDs that developers have worked with provides insight to their general exposure to the technology. The ideal candidate will have knowledge of several of the commonly used DTDs such as FpML, DocBook, HRML, and RDF, as well as experience designing a custom DTD for a particular project where no standard existed.
Using XSLT, how would you extract a specific attribute from an element in an XML document?
Successful candidates should recognize this as one of the most basic applications of XSLT. If they are not able to construct a reply similar to the example below, they should at least be able to identify the components necessary for this operation: xsl:template to match the appropriate XML element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates to continue processing the document.
Extract Attributes from XML Data
Example 1.
Attribute Value:
When constructing an XML DTD, how do you create an external entity reference in an attribute value?
Every interview session should have at least one trick question. Although possible when using SGML, XML DTDs don’t support defining external entity references in attribute values. It’s more important for the candidate to respond to this question in a logical way than than the candidate know the somewhat obscure answer.
How would you build a search engine for large volumes of XML data?
The way candidates answer this question may provide insight into their view of XML data. For those who view XML primarily as a way to denote structure for text files, a common answer is to build a full-text search and handle the data similarly to the way Internet portals handle HTML pages. Others consider XML as a standard way of transferring structured data between disparate systems. These candidates often describe some scheme of importing XML into a relational or object database and relying on the database’s engine for searching. Lastly, candidates that have worked with vendors specializing in this area often say that the best way the handle this situation is to use a third party software package optimized for XML data.
What is the difference between XML and C or C++ or Java ?
C and C++ (and other languages like FORTRAN, or Pascal, or Visual Basic, or Java or hundreds more) are programming languages with which you specify calculations, actions, and decisions to be carried out in order:
mod curconfig[if left(date,6) = “01-Apr”,
t.put “April googlel!”,
f.put days(‘31102005’,’DDMMYYYY’) -
days(sdate,’DDMMYYYY’)
“ more shopping days to Samhain”];
XML is a markup specification language with which you can design ways of describing information (text or data), usually for storage, transmission, or processing by a program. It says nothing about what you should do with the data (although your choice of element names may hint at what they are for):
Camshaft end bearing retention circlip
Ringtown Fasteners Ltd
Angle-nosed insertion tool is required for the removal
and replacement of this part.
On its own, an SGML or XML file (including HTML) doesn’t do anything. It’s a data format which just sits there until you run a program which does something with it.
Does XML replace HTML?
No. XML itself does not replace HTML. Instead, it provides an alternative which allows you to define your own set of markup elements. HTML is expected to remain in common use for some time to come, and the current version of HTML is in XML syntax. XML is designed to make the writing of DTDs much simpler than with full SGML. (See the question on DTDs for what one is and why you might want one.)
Do I have to know HTML or SGML before I learn XML?
No, although it’s useful because a lot of XML terminology and practice derives from two decades’ experience of SGML.
Be aware that ‘knowing HTML’ is not the same as ‘understanding SGML’. Although HTML was written as an SGML application, browsers ignore most of it (which is why so many useful things don’t work), so just because something is done a certain way in HTML browsers does not mean it’s correct, least of all in XML.
What does an XML document actually look like (inside)?
The basic structure of XML is similar to other applications of SGML, including HTML. The basic components can be seen in the following examples. An XML document starts with a Prolog:
1. The XML Declaration
which specifies that this is an XML document;
2. Optionally a Document Type Declaration
which identifies the type of document and says where the Document Type Description (DTD) is stored;
The Prolog is followed by the document instance:
- A root element, which is the outermost (top level) element (start-tag plus end-tag) which encloses everything else: in the examples below the root elements are conversation and titlepage;
- A structured mix of descriptive or prescriptive elements enclosing the character data content (text), and optionally any attributes (‘name=value’ pairs) inside some start-tags.
XML documents can be very simple, with straightforward nested markup of your own design:
<br></br>
Hello, world!
Stop the planet, I want to get
off!
Or
they can be more complicated, with a Schema or question C.11, Document Type Description (DTD) or internal subset (local DTD changes in [square brackets]), and an arbitrarily complex nested structure:
<!DOCTYPE titlepage
SYSTEM “http://www.google.bar/dtds/typo.dtd”
[<!ENTITY % active.links “INCLUDE”>]>
Hello, world!
<!-- In some copies the following
decoration is hand-colored, presumably
by the author -->
Vitam capias
Or
they can be anywhere between: a lot will depend on how you want to define your document type (or whose you use) and what it will be used for. Database-generated or program-generated XML documents used in e-commerce is usually unformatted (not for human reading) and may use very long names or values, with multiple redundancy and sometimes no character data content at all, just values in attributes:
How does XML handle white-space in my documents?
All white-space, including linebreaks, TAB characters, and normal spaces, even between ‘structural’ elements where no text can ever appear, is passed by the parser unchanged to the application (browser, formatter, viewer, converter, etc), identifying the context in which the white-space was found (element content, data content, or mixed content, if this information is available to the parser, eg from a DTD or Schema). This means it is the application’s responsibility to decide what to do with such space, not the parser’s:
* insignificant white-space between structural elements (space which occurs where only element content is allowed, ie between other elements, where text data never occurs) will get passed to the application (in SGML this white-space gets suppressed, which is why you can put all that extra space in HTML documents and not worry about it)
* significant white-space (space which occurs within elements which can contain text and markup mixed together, usually mixed content or PCDATA) will still get passed to the application exactly as under SGML. It is the application’s responsibility to handle it correctly.
The parser must inform the application that white-space has occurred in element content, if it can detect it. (Users of SGML will recognize that this information is not in the ESIS, but it is in the Grove.)
My title for
Chapter 1.
text
In the example above, the application will receive all the pretty-printing linebreaks, TABs, and spaces between the elements as well as those embedded in the chapter title. It is the function of the application, not the parser, to decide which type of white-space to discard and which to retain. Many XML applications have configurable options to allow programmers or users to control how such white-space is handled.