XML Flashcards

Question

Which parts of an XML document are case-sensitive?

Answer 1

All of it, both markup and text. This is significantly different from HTML and most other SGML applications. It was done to allow markup in non-Latin-alphabet languages, and to obviate problems with case-folding in writing systems which are caseless. * Element type names are case-sensitive: you must follow whatever combination of upper- or lower-case you use to define them (either by first usage or in a DTD or Schema). So you can't say …: upper- and lower-case must match; thus , , and are three different element types; * For well-formed XML documents with no DTD, the first occurrence of an element type name defines the casing; * Attribute names are also case-sensitive, for example the two width attributes in and (if they occurred in the same file) are separate attributes, because of the different case of width and WIDTH; * Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML") always have been, but NAME types (ID and IDREF attributes, and token list attributes) are now case-sensitive as well; * All general and parameter entity names (eg Á), and your data content (text), are case-sensitive as always.

Answer 2

Either convert them to conform to some new document type (with or without a DTD or Schema) and write a stylesheet to go with them; or edit them to conform to XHTML. It is necessary to convert existing HTML files because XML does not permit end-tag minimisation (missing , etc), unquoted attribute values, and a number of other SGML shortcuts which have been normal in most HTML DTDs. However, many HTML authoring tools already produce almost (but not quite) well-formed XML. You may be able to convert HTML to XHTML using the Dave Raggett's HTML Tidy program, which can clean up some of the formatting mess left behind by inadequate HTML editors, and even separate out some of the formatting to a stylesheet, but there is usually still some hand-editing to do.

Answer 3

Yes, the W3C recommends using XHTML which is ‘a reformulation of HTML 4 in XML 1.0’. This specification defines HTML as an XML application, and provides three DTDs corresponding to the ones defined by HTML 4.* (Strict, Transitional, and Frameset). The semantics of the elements and their attributes are as defined in the W3C Recommendation for HTML 4. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML browsers is possible by following a small set of guidelines (see the W3C site).

Answer 4

Yes, provided you use up-to-date SGML software which knows about the WebSGML Adaptations TC to ISO 8879 (the features needed to support XML, such as the variant form for EMPTY elements; some aspects of the SGML Declaration such as NAMECASE GENERAL NO; multiple attribute token list declarations, etc). An alternative is to use an SGML DTD to let you create a fully-normalised SGML file, but one which does not use empty elements; and then remove the DocType Declaration so it becomes a well-formed DTDless XML file. Most SGML tools now handle XML files well, and provide an option switch between the two standards.

Answer 5

Yes, the XML Specification explicitly says XML uses ISO 10646, the international standard character repertoire which covers most known languages. Unicode is an identical repertoire, and the two standards track each other. The spec says (2.2): ‘All XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646…’. There is a Unicode FAQ at http://www.unicode.org/faq/FAQ. UTF-8 is an encoding of Unicode into 8-bit characters: the first 128 are the same as ASCII, and higher-order characters are used to encode anything else from Unicode into sequences of between 2 and 6 bytes. UTF-8 in its single-octet form is therefore the same as ISO 646 IRV (ASCII), so you can continue to use ASCII for English or other languages using the Latin alphabet without diacritics. Note that UTF-8 is incompatible with ISO 8859-1 (ISO Latin-1) after code point 127 decimal (the end of ASCII). UTF-16 is an encoding of Unicode into 16-bit characters, which lets it represent 16 planes. UTF-16 is incompatible with ASCII because it uses two 8-bit bytes per character (four bytes above U+FFFF).

Answer 6

A DTD is a description in XML Declaration Syntax of a particular type or class of document. It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together. (A question C.16, Schema does the same thing in XML Document Syntax, and allows more extensive data-checking.) For example, if you want a document type to be able to describe Lists which contain Items, the relevant part of your DTD might contain something like this: This defines a list as an element type containing one or more items (that's the plus sign); and it defines items as element types containing just plain text (Parsed Character Data or PCDATA). Validators read the DTD before they read your document so that they can identify where every element type ought to come and how each relates to the other, so that applications which need to know this in advance (most editors, search engines, navigators, and databases) can set themselves up correctly. The example above lets you create lists like: Chocolate Music Surfingv (The indentation in the example is just for legibility while editing: it is not required by XML.) A DTD provides applications with advance notice of what names and structures can be used in a particular document type. Using a DTD and a validating editor means you can be certain that all documents of that particular type will be constructed and named in a consistent and conformant manner. DTDs are not required for processing the tip in question Bwell-formed documents, but they are needed if you want to take advantage of XML's special attribute types like the built-in ID/IDREF cross-reference mechanism; or the use of default attribute values; or references to external non-XML files (‘Notations’); or if you simply want a check on document validity before processing. There are thousands of DTDs already in existence in all kinds of areas (see the SGML/XML Web pages for pointers). Many of them can be downloaded and used freely; or you can write your own (see the question on creating your own DTD. Old SGML DTDs need to be converted to XML for use with XML systems: read the question on converting SGML DTDs to XML, but most popular SGML DTDs are already available in XML form. The alternatives to a DTD are various forms of question C.16, Schema. These provide more extensive validation features than DTDs, including character data content validation.

Answer 7

No, it lets you make up names for your own element types. If you think tags and elements are the same thing you are already in considerable trouble: read the rest of this question carefully.

Answer 8

``` Document types usually need a formal description, either a DTD or a Schema. Whilst it is possible to process well-formed XML documents without any such description, trying to create them without one is asking for trouble. A DTD or Schema is used with an XML editor or API interface to guide and control the construction of the document, making sure the right elements go in the right places. Creating your own document type therefore begins with an analysis of the class of documents you want to describe: reports, invoices, letters, configuration files, credit-card verification requests, or whatever. Once you have the structure correct, you write code to express this formally, using DTD or Schema syntax. ```

Answer 9

You need to use the XML Declaration Syntax (very simple: declaration keywords begin with It says that there shall be an element called Shopping-List and that it shall contain elements called Item: there must be at least one Item (that's the plus sign) but there may be more than one. It also says that the Item element may contain only parsed character data (PCDATA, ie text: no further markup). Because there is no other element which contains Shopping-List, that element is assumed to be the ‘root’ element, which encloses everything else in the document. You can now use it to create an XML file: give your editor the declarations: (assuming you put the DTD in that file). Now your editor will let you create files according to the pattern: Chocolate Sugar Butter It is possible to develop complex and powerful DTDs of great subtlety, but for any significant use you should learn more about document systems analysis and document type design. See for example Developing SGML DTDs: From Text to Model to Markup (Maler and el Andaloussi, 1995): this was written for SGML but perhaps 95% of it applies to XML as well, as XML is much simpler than full SGML—see the list of restrictions which shows what has been cut out. Warning Incidentally, a DTD file never has a DOCTYPE Declaration in it: that only occurs in an XML document instance (it's what references the DTD). And a DTD file also never has an XML Declaration at the top either. Unfortunately there is still software around which inserts one or both of these.

Answer 10

No. This is done in the document's Document Type Declaration, not in the DTD.

Answer 11

The W3C XML Schema recommendation provides a means of specifying formal data typing and validation of element content in terms of data types, so that document type designers can provide criteria for checking the data content of elements as well as the markup itself. Schemas are written in XML Document Syntax, like XML documents are, avoiding the need for processing software to be able to read XML Declaration Syntax (used for DTDs). There is a separate Schema FAQ at http://www.schemavalid.comFAQ. The term ‘vocabulary’ is sometimes used to refer to DTDs and Schemas together. Schemas are aimed at e-commerce, data control, and database-style applications where character data content requires validation and where stricter data control is needed than is possible with DTDs; or where strong data typing is required. They are usually unnecessary for traditional text document publishing applications. Unlike DTDs, Schemas cannot be specified in an XML Document Type Declaration. They can be specified in a Namespace, where Schema-aware software should pick it up, but this is optional: ... More commonly, you specify the Schema in your processing software, which should record separately which Schema is used by which XML document instance. In contrast to the complexity of the W3C Schema model, Relax NG is a lightweight, easy-to-use XML schema language devised by James Clark (see http://relaxng.org/) with development hosted by OASIS. It allows similar richness of expression and the use of XML as its syntax, but it provides an additional, simplified, syntax which is easier to use for those accustomed to DTDs.

Answer 12

Ask your database manufacturer: they all provide XML import and export modules to connect XML applications with databases. In some trivial cases there will be a 1:1 match between field names in the database table and element type names in the XML Schema or DTD, but in most cases some programming will be required to establish the desired match. This can usually be stored as a procedure so that subsequent uses are simply commands or calls with the relevant parameters. In less trivial, but still simple, cases, you could export by writing a report routine that formats the output as an XML document, and you could import by writing an XSLT transformation that formatted the XML data as a load file.

Answer 13

Yes, if the document type you use provides for math, and your users' browsers are capable of rendering it. The mathematics-using community has developed the MathML Recommendation at the W3C, which is a native XML application suitable for embedding in other DTDs and Schemas. It is also possible to make XML fragments from other DTDs, such as ISO 12083 Math, or OpenMath, or one of your own making. Browsers which display math embedded in SGML existed for many years (eg DynaText, Panorama, Multidoc Pro), and mainstream browsers are now rendering MathML. David Carlisle has produced a set of stylesheets for rendering MathML in browsers. It is also possible to use XSLT to convert XML math markup to LATEX for print (PDF) rendering, or to use XSL:FO. Please note that XML is not itself a programming language, so concepts such as arithmetic and if-statements (if-then-else logic) are not meaningful in XML documents.

Answer 14

The linking abilities of XML systems are potentially much more powerful than those of HTML, so you'll be able to do much more with them. Existing href-style links will remain usable, but the new linking technology is based on the lessons learned in the development of other standards involving hypertext, such as TEI and HyTime, which let you manage bidirectional and multi-way links, as well as links to a whole element or span of text (within your own or other documents) rather than to a single point. These features have been available to SGML users for many years, so there is considerable experience and expertise available in using them. Currently only Mozilla Firefox implements XLink. The XML Linking Specification (XLink) and the XML Extended Pointer Specification (XPointer) documents contain the details. An XLink can be either a URI or a TEI-style Extended Pointer (XPointer), or both. A URI on its own is assumed to be a resource; if an XPointer follows it, it is assumed to be a sub-resource of that URI; an XPointer on its own is assumed to apply to the current document (all exactly as with HTML). An XLink may use one of #, ?, or |. The # and ? mean the same as in HTML applications; the | means the sub-resource can be found by applying the link to the resource, but the method of doing this is left to the application. An XPointer can only follow a #. The TEI Extended Pointer Notation (EPN) is much more powerful than the fragment address on the end of some URIs, as it allows you to specify the location of a link end using the structure of the document as well as (or in addition to) known, fixed points like IDs. For example, the linked second occurrence of the word ‘XPointer’ two paragraphs back could be referred to with the URI (shown here with linebreaks and spaces for clarity: in practice it would of course be all one long string): http://xml.silmaril.ie/faq.xml#ID(hypertext) .child(1,#element,'answer') .child(2,#element,'para') .child(1,#element,'link') This means the first link element within the second paragraph within the answer in the element whose ID is hypertext (this question). Count the objects from the start of this question (which has the ID hypertext) in the XML source: 1. the first child object is the element containing the question (); 2. the second child object is the answer (the element); 3. within this element go to the second paragraph; 4. find the first link element. Eve Maler explained the relationship of XLink and XPointer as follows: XLink governs how you insert links into your XML document, where the link might point to anything (eg a GIF file); XPointer governs the fragment identifier that can go on a URL when you're linking to an XML document, from anywhere (eg from an HTML file). [Or indeed from an XML file, a URI in a mail message, etc…Ed.] David Megginson has produced an xpointer function for Emacs/psgml which will deduce an XPointer for any location in an XML document. XML Spy has a similar function.

Answer 15

Graphics have traditionally just been links which happen to have a picture file at the end rather than another piece of text. They can therefore be implemented in any way supported by the XLink and XPointer specifications (see question C.18, ‘How will XML affect my document links?’), including using similar syntax to existing HTML images. They can also be referenced using XML's built-in NOTATION and ENTITY mechanism in a similar way to standard SGML, as external unparsed entities. However, the SVG specification (see the tip below, by Peter Murray-Rust) lets you use XML markup to draw vector graphics objects directly in your XML file. This provides enormous power for the inclusion of portable graphics, especially interactive or animated sequences, and it is now slowly becoming supported in browsers. The XML linking specifications for external images give you much better control over the traversal and activation of links, so an author can specify, for example, whether or not to have an image appear when the page is loaded, or on a click from the user, or in a separate window, without having to resort to scripting. XML itself doesn't predicate or restrict graphic file formats: GIF, JPG, TIFF, PNG, CGM, EPS, and SVG at a minimum would seem to make sense; however, vector formats (EPS, SVG) are normally essential for non-photographic images (diagrams). You cannot embed a raw binary graphics file (or any other binary [non-text] data) directly into an XML file because any bytes happening to resemble markup would get misinterpreted: you must refer to it by linking (see below). It is, however, possible to include a text-encoded transformation of a binary file as a CDATA Marked Section, using something like UUencode with the markup characters ], & and > removed from the map so that they could not occur as an erroneous CDATA termination sequence and be misinterpreted. You could even use simple hexadecimal encoding as used in PostScript. For vector graphics, however, the solution is to use SVG (see the tip below, by Peter Murray-Rust). Sound files are binary objects in the same way that external graphics are, so they can only be referenced externally (using the same techniques as for graphics). Music files written in MusiXML or an XML variant of SMDL could however be embedded in the same way as for SVG. The point about using entities to manage your graphics is that you can keep the list of entity declarations separate from the rest of the document, so you can re-use the names if an image is needed more than once, but only store the physical file specification in a single place. This is available only when using a DTD, not a Schema.

Answer 16

This works exactly the same as for SGML. First you declare the entity you want to include, and then you reference it by name: ]> ...blah blah... ``` &chap1; &chap2; &chap3; &chap4; &chap5; ``` The difference between this method and the one used for including a DTD fragment (see question D.15, ‘How do I include one DTD (or fragment) in another?’) is that this uses an external general (file) entity which is referenced in the same way as for a character entity (with an ampersand). The one thing to make sure of is that the included file must not have an XML or DOCTYPE Declaration on it. If you've been using one for editing the fragment, remove it before using the file in this way. Yes, this is a pain in the butt, but if you have lots of inclusions like this, write a script to strip off the declaration (and paste it back on again for editing).

Answer 17

Parsing is the act of splitting up information into its component parts (schools used to teach this in language classes until the teaching profession collectively caught the anti-grammar disease). ‘Mary feeds Spot’ parses as 1. Subject = Mary, proper noun, nominative case 2. Verb = feeds, transitive, third person singular, present tense 3. Object = Spot, proper noun, accusative case In computing, a parser is a program (or a piece of code or API that you can reference inside your own programs) which analyses files to identify the component parts. All applications that read input have a parser of some kind, otherwise they'd never be able to figure out what the information means. Microsoft Word contains a parser which runs when you open a .doc file and checks that it can identify all the hidden codes. Give it a corrupted file and you'll get an error message. XML applications are just the same: they contain a parser which reads XML and identifies the function of each the pieces of the document, and it then makes that information available in memory to the rest of the program. While reading an XML file, a parser checks the syntax (pointy brackets, matching quotes, etc) for well-formedness, and reports any violations (reportable errors). The XML Specification lists what these are. Validation is another stage beyond parsing. As the component parts of the program are identified, a validating parser can compare them with the pattern laid down by a DTD or a Schema, to check that they conform. In the process, default values and datatypes (if specified) can be added to the in-memory result of the validation that the validating parser gives to the application. Judy O'Grady The example above parses as: 1. Element person identified with Attribute corpid containing abc123 and Attribute birth containing 1960-02-31 and Attribute gender containing female containing ... 2. Element name containing ... 3. Element forename containing text ‘Judy’ followed by ... 4. Element surname containing text ‘O'Grady’ (and lots of other stuff too). As well as built-in parsers, there are also stand-alone parser-validators, which read an XML file and tell you if they find an error (like missing angle-brackets or quotes, or misplaced markup). This is essential for testing files in isolation before doing something else with them, especially if they have been created by hand without an XML editor, or by an API which may be too deeply embedded elsewhere to allow easy testing.

Answer 18

You should almost never need to use CDATA Sections. The CDATA mechanism was designed to let an author quote fragments of text containing markup characters (the open-angle-bracket and the ampersand), for example when documenting XML (this FAQ uses CDATA Sections quite a lot, for obvious reasons). A CDATA Section turns off markup recognition for the duration of the section (it gets turned on again only by the closing sequence of double end-square-brackets and a close-angle-bracket). Consequently, nothing in a CDATA section can ever be recognised as anything to do with markup: it's just a string of opaque characters, and if you use an XML transformation language like XSLT, any markup characters in it will get turned into their character entity equivalent. If you try, for example, to use: some text with in it. in the expectation that the embedded markup would remain untouched, it won't: it will just output some text with markup in it. In other words, CDATA Sections cannot preserve the embedded markup as markup. Normally this is exactly what you want because this technique was designed to let people do things like write documentation about markup. It was not designed to allow the passing of little chunks of (possibly invalid) unparsed HTML embedded inside your own XML through to a subsequent process—because that would risk invalidating the output. As a result you cannot expect to keep markup untouched simply because it looked as if it was safely ‘hidden’ inside a CDATA section: it can't be used as a magic shield to preserve HTML markup for future use as markup, only as characters.

Answer 19

Apart from using CDATA Sections, there are two common occasions when people want to handle embedded HTML inside an XML element: 1. when they have received (possibly poorly-designed) XML from somewhere else which they must find a way to handle; 2. when they have an application which has been explicitly designed to store a string of characters containing < and & character entity references with the objective of turning them back into markup in a later process (eg FreeMind, Atom). Generally, you want to avoid this kind of trick, as it usually indicates that the document structure and design has been insufficiently thought out. However, there are occasions when it becomes unavoidable, so if you really need or want to use embedded HTML markup inside XML, and have it processable later as markup, there are a couple of techniques you may be able to use: * Provide templates for the handling of that markup in your XSLT transformation or whatever software you use which simply replicates what was there, eg * Use XSLT's ‘deep copy’ instruction, which outputs nested well-formed markup verbatim, eg * As a last resort, use the disable-output-escaping attribute on the xsl:text element of XSL[T] which is available in some processors, eg Now!]]> * Some processors (eg JX) are now providing their own equivalents for disabling output escaping. Their proponents claim it is ‘highly desirable’ or ‘what most people want’, but it still needs to be treated with care to prevent unwanted (possibly dangerous) arbitrary code from being passed untouched through your system. It also adds another dependency to your software. For more details of using these techniques in XSL[T], see the relevant question in the XSL FAQ.

Answer 20

For normal text (not markup), there are no special characters: just make sure your document refers to the correct encoding scheme for the language and/or writing system you want to use, and that your computer correctly stores the file using that encoding scheme. See the question on non-Latin characters for a longer explanation. If your keyboard will not allow you to type the characters you want, or if you want to use characters outside the limits of the encoding scheme you have chosen, you can use a symbolic notation called ‘entity referencing’. Entity references can either be numeric, using the decimal or hexadecimal Unicode code point for the character (eg if your keyboard has no Euro symbol (€) you can type €); or they can be character, using an established name which you declare in your DTD (eg ) and then use as € in your document. If you are using a Schema, you must use the numeric form for all except the five below because Schemas have no way to make character entity declarations. If you use XML with no DTD, then these five character entities are assumed to be predeclared, and you can use them without declaring them: < The less-than character () starts entity markup (the first character of a character entity reference). > The greater-than character (>) ends a start-tag or an end-tag. " The double-quote character (") can be symbolised with this character entity reference when you need to embed a double-quote inside a string which is already double-quoted. ' The apostrophe or single-quote character (') can be symbolised with this character entity reference when you need to embed a single-quote or apostrophe inside a string which is already single-quoted. If you are using a DTD then you must declare all the character entities you need to use (if any), including any of the five above that you plan on using (they cease to be predeclared if you use a DTD). If you are using a Schema, you must use the numeric form for all except the five above because Schemas have no way to make character entity declarations.

Answer 21

The only changes needed are to make sure your server serves up .xml, .css, .dtd, .xsl, and whatever other file types you will use as the correct MIME content (media) types. The details of the settings are specified in RFC 3023. Most new versions of Web server software come preset. If not, all that is needed is to edit the mime-types file (or its equivalent: as a server operator you already know where to do this, right?) and add or edit the relevant lines for the right media types. In some servers (eg Apache), individual content providers or directory owners may also be able to change the MIME types for specific file types from within their own directories by using directives in a .htaccess file. The media types required are: * text/xml for XML documents which are ‘readable by casual users’; * application/xml for XML documents which are ‘unreadable by casual users’; * text/xml-external-parsed-entity for external parsed entities such as document fragments (eg separate chapters which make up a book) subject to the readability distinction of text/xml; * application/xml-external-parsed-entity for external parsed entities subject to the readability distinction of application/xml; * application/xml-dtd for DTD files and modules, including character entity sets. The RFC has further suggestions for the use of the +xml media type suffix for identifying ancillary files such as XSLT (application/xslt+xml). If you run scripts generating XHTML which you wish to be treated as XML rather than HTML, they may need to be modified to produce the relevant Document Type Declaration as well as the right media type if your application requires them to be validated.

Answer 22

For implementation to succeed, the terminology needs to be precise. Design goal eight of the specification tells us that ‘the design of XML shall be formal and concise’. To describe XML, the specification therefore uses formal language drawn from several fields, specifically those of text engineering, international standards and computer science. This is often confusing to people who are unused to these disciplines because they use well-known English words in a specialised sense which can be very different from their common meanings—for example: grammar, production, token, or terminal. The specification does not explain these terms because of the other part of the design goal: the specification should be concise. It doesn't repeat explanations that are available elsewhere: it is assumed you know this and either know the definitions or are capable of finding them. In essence this means that to grok the fullness of the spec, you do need a knowledge of some SGML and computer science, and have some exposure to the language of formal standards. Sloppy terminology in specifications causes misunderstandings and makes it hard to implement consistently, so formal standards have to be phrased in formal terminology. This FAQ is not a formal document, and the astute reader will already have noticed it refers to ‘element names’ where ‘element type names’ is more correct; but the former is more widely understood.

Answer 23

Yes, so long as what they generate ends up as part of an XML-conformant file (ie either valid or just well-formed). Server-side tag-replacers like shtml, PHP, JSP, ASP, Zope, etc store almost-valid files using comments, Processing Instructions, or non-XML markup, which gets replaced at the point of service by text or XML markup (it is unclear why some of these systems use non-HTML/XML markup). There are also some XML-based preprocessors for formats like XVRL (eXtensible Value Resolution Language) which resolve specialised references to external data and output a normalised XML file.

Answer 24

The same rule applies as for server-side inclusions, so you need to ensure that any embedded code which gets passed to a third-party engine (eg calls to SQL, VB, Java, etc) does not contain any characters which might be misinterpreted as XML markup (ie no angle brackets or ampersands). Either use a CDATA marked section to avoid your XML application parsing the embedded code, or use the standard <, and & character entity references instead.

Answer 25

You can't: XML isn't a programming language, so you can't say things like bar If you need to make an element optional, based on some internal or external criteria, you can do so in a Schema. DTDs have no internal referential mechanism, so it isn't possible to express this kind of conditionality in a DTD at the individual element level. It is possible to express presence-or-absence conditionality in a DTD for the whole document, by using parameter entities as switches to include or ignore certain sections of the DTD based on settings either hardwired in the DTD or supplied in the internal subset. Both the TEI and Docbook DTDs use this mechanism to implement modularity. Alternatively you can make the element entirely optional in the DTD or Schema, and provide code in your processing software that checks for its presence or absence. This defers the checking until the processing stage: one of the reasons for Schemas is to provide this kind of checking at the time of document creation or editing.

Answer 26

You can't: XML isn't a programming language, so you can't say things like bar If you need to make an element optional, based on some internal or external criteria, you can do so in a Schema. DTDs have no internal referential mechanism, so it isn't possible to express this kind of conditionality in a DTD at the individual element level. It is possible to express presence-or-absence conditionality in a DTD for the whole document, by using parameter entities as switches to include or ignore certain sections of the DTD based on settings either hardwired in the DTD or supplied in the internal subset. Both the TEI and Docbook DTDs use this mechanism to implement modularity. Alternatively you can make the element entirely optional in the DTD or Schema, and provide code in your processing software that checks for its presence or absence. This defers the checking until the processing stage: one of the reasons for Schemas is to provide this kind of checking at the time of document creation or editing. I have to do an overview of XML for my manager/client/investor/advisor. What should I mention? * XML is not a markup language. XML is a ‘metalanguage’, that is, it's a language that lets you define your own markup languages (see definition). * XML is a markup language [two (seemingly) contradictory statements one after another is an attention-getting device that I'm fond of], not a programming language. XML is data: is does not ‘do’ anything, it has things done to it. * XML is non-proprietary: your data cannot be held hostage by someone else. * XML allows multi-purposing of your data. * Well-designed XML applications most often separate ‘content’ from ‘presentation’. You should describe what something is rather what something looks like (the exception being data content which never gets presented to humans). Saying ‘the data is in XML’ is a relatively useless statement, similar to saying ‘the book is in a natural language’. To be useful, the former needs to specify ‘we have used XML to define our own markup language’ (and say what it is), similar to specifying ‘the book is in French’. A classic example of multipurposing and separation that I often use is a pharmaceutical company. They have a large base of data on a particular drug that they need to publish as: * reports to the FDA; * drug information for publishers of drug directories/catalogs; * ‘prescribe me!’ brochures to send to doctors; * little pieces of paper to tuck into the boxes; * labels on the bottles; * two pages of fine print to follow their ad in Reader's Digest; * instructions to the patient that the local pharmacist prints out; * etc. Without separation of content and presentation, they need to maintain essentially identical information in 20 places. If they miss a place, people die, lawyers get rich, and the drug company gets poor. With XML (or SGML), they maintain one set of carefully validated information, and write 20 programs to extract and format it for each application. The same 20 programs can now be applied to all the hundreds of drugs that they sell. In the Web development area, the biggest thing that XML offers is fixing what is wrong with HTML: * browsers allow non-compliant HTML to be presented; * HTML is restricted to a single set of markup (‘tagset’). If you let broken HTML work (be presented), then there is no motivation to fix it. Web pages are therefore tag soup that are useless for further processing. XML specifies that processing must not continue if the XML is non-compliant, so you keep working at it until it complies. This is more work up front, but the result is not a dead-end. If you wanted to mark up the names of things: people, places, companies, etc in HTML, you don't have many choices that allow you to distinguish among them. XML allows you to name things as what they are: Charles Goldfarb worked at IBM gives you a flexibility that you don't have with HTML: Charles Goldfarb worked atIBM< With XML you don't have to shoe-horn your data into markup that restricts your options.

Answer 27

XML namespaces are designed to provide universally unique names for elements and attributes. This allows people to do a number of things, such as: * Combine fragments from different documents without any naming conflicts. (See example below.) * Write reusable code modules that can be invoked for specific elements and attributes. Universally unique names guarantee that such modules are invoked only for the correct elements and attributes. * Define elements and attributes that can be reused in other schemas or instance documents without fear of name collisions. For example, you might use XHTML elements in a parts catalog to provide part descriptions. Or you might use the nil attribute defined in XML Schemas to indicate a missing value. As an example of how XML namespaces are used to resolve naming conflicts in XML documents that contain element types and attributes from multiple XML languages, consider the following two XML documents: ```

Apple 7 Color State Country H98d69

and: ``` OurWebServer

888.90.67.8

Each document uses a different XML language and each language defines an Address element type. Each of these Address element types is different -- that is, each has a different content model, a different meaning, and is interpreted by an application in a different way. This is not a problem as long as these element types exist only in separate documents. But what if they are combined in the same document, such as a list of departments, their addresses, and their Web servers? How does an application know which Address element type it is processing? One solution is to simply rename one of the Address element types -- for example, we could rename the second element type IPAddress. However, this is not a useful long term solution. One of the hopes of XML is that people will standardize XML languages for various subject areas and write modular code to process those languages. By reusing existing languages and code, people can quickly define new languages and write applications that process them. If we rename the second Address element type to IPAddress, we will break any code that expects the old name. A better answer is to assign each language (including its Address element type) to a different namespace. This allows us to continue using the Address name in each language, but to distinguish between the two different element types. The mechanism by which we do this is XML namespaces. (Note that by assigning each Address name to an XML namespace, we actually change the name to a two-part name consisting of the name of the XML namespace plus the name Address. This means that any code that recognizes just the name Address will need to be changed to recognize the new two-part name. However, this only needs to be done once, as the two-part name is universally unique.

Answer 28

An XML namespace is a collection of element type and attribute names. The collection itself is unimportant -- in fact, a reasonable argument can be made that XML namespaces don't actually exist as physical or conceptual entities . What is important is the name of the XML namespace, which is a URI. This allows XML namespaces to provide a two-part naming system for element types and attributes. The first part of the name is the URI used to identify the XML namespace -- the namespace name. The second part is the element type or attribute name itself -- the local part, also known as the local name. Together, they form the universal name. This two-part naming system is the only thing defined by the XML namespaces recommendation.

Answer 29

No. This is a very important point and a source of much confusion, so we will repeat it: THE XML NAMESPACES RECOMMENDATION DOES NOT DEFINE ANYTHING EXCEPT A TWO-PART NAMING SYSTEM FOR ELEMENT TYPES AND ATTRIBUTES. In particular, they do not provide or define any of the following: * A way to merge two documents that use different DTDs. * A way to associate XML namespaces and schema information. * A way to validate documents that use XML namespaces. * A way to associate element type or attribute declarations in a DTD with an XML namespace.

Answer 30

XML namespaces are collections of names, nothing more. That is, they contain the names of element types and attributes, not the elements or attributes themselves. For example, consider the following document. The element type name A and the attribute name C are in the http://www.google.org/ namespace because they are mapped there by the google prefix. The element type name B and the attribute name D are not in any XML namespace because no prefix maps them there. On the other hand, the elements A and B and the attributes C and D are not in any XML namespace, even though they are physically within the scope of the http://www.google.org/ namespace declaration. This is because XML namespaces contain names, not elements or attributes. XML namespaces also do not contain the definitions of the element types or attributes. This is an important difference, as many people are tempted to think of an XML namespace as a schema, which it is not.

Answer 31

No. If an element type or attribute name is not specifically declared to be in an XML namespace -- that is, it is unprefixed and (in the case of element type names) there is no default XML namespace -- then that name is not in any XML namespace. If you want, you can think of it as having a null URI as its name, although no "null" XML namespace actually exists. For example, in the following, the element type name B and the attribute names C and E are not in any XML namespace:

Answer 32

No. XML namespaces apply only to element type and attribute names. Furthermore, in an XML document that conforms to the XML namespaces recommendation, entity names, notation names, and processing instruction targets must not contain colons.

Answer 33

Anybody can create an XML namespace -- all you need to do is assign a URI as its name and decide what element type and attribute names are in it. The URI must be under your control and should not be being used to identify a different XML namespace, such as by a coworker. (In practice, most people that create XML namespaces also describe the element types and attributes whose names are in it -- their content models and types, their semantics, and so on. However, this is not part of the process of creating an XML namespace, nor does the XML namespace include or provide a way to discover such information.)

Answer 34

Maybe, maybe not. If you don't have any naming conflicts in the XML documents you are using today, as is often the case with documents used inside a single organization, then you probably don't need to use XML namespaces. However, if you do have conflicts today, or if you expect conflicts in the future due to distributing your documents outside your organization or bringing outside documents into your organization, then you should probably use XML namespaces. Regardless of whether you use XML namespaces in your own documents, it is likely that you will use them in conjunction with some other XML technology, such as XSL, XHTML, or XML Schemas. For example, the following XSLT (XSL Transformations) stylesheet uses XML namespaces to distinguish between element types defined in XSLT and those defined elsewhere:

Answer 35

Although the XML 1.0 recommendation anticipated the need for XML namespaces by noting that element type and attribute names should not include colons, it did not actually support XML namespaces. Thus, XML namespaces are layered on top of XML 1.0. In particular, any XML document that uses XML namespaces is a legal XML 1.0 document and can be interpreted as such in the absence of XML namespaces. For example, consider the following document: If this document is processed by a namespace-unaware processor, that processor will see two elements whose names are google:A and google:B. The google:A element has an attribute named xmlns:google and the google:B element has an attribute named google:C. On the other hand, a namespace-aware processor will see two elements with universal names {http://www.google.org}A and {http://www.google.org}B. The {http://www.google.org}A does not have any attributes; instead, it has a namespace declaration that maps the google prefix to the URI http://www.google.org. The {http://www.google.org}B element has an attribute named {http://www.google.org}C. Needless to say, this has led to a certain amount of confusion. One area of confusion is the relationship between XML namespaces and validating XML documents against DTDs. This occurs because the XML namespaces recommendation did not describe how to use XML namespaces with DTDs. Fortunately, a similar situation does not occur with XML schema languages, as all of these support XML namespaces. The other main area of confusion is in recommendations and specifications such as DOM and SAX whose first version predates the XML namespaces recommendation. Although these have since been updated to include XML namespace support, the solutions have not always been pretty due to backwards compatibility requirements. All recommendations in the XML family now support XML namespaces.

Answer 36

There are only two differences between XML namespaces 1.0 and XML namespaces 1.1: * Version 1.1 adds a way to undeclare prefixes. For more information, see question 4.7. * Version 1.1 uses IRIs (Internationalized Resource Identifiers) instead of URIs. Basically, URIs are restricted to a subset of ASCII characters, while IRIs allow much broader use of Unicode characters. For complete details, see section 9 of Namespaces in XML 1.1. NOTE: As of this writing (February, 2003), Namespaces in XML 1.1 is still a candidate recommendation and not widely used. PART II: DECLARING AND USING XML NAMESPACES

Answer 37

To declare an XML namespace, you use an attribute whose name has the form: xmlns:prefix --OR-- xmlns These attributes are often called xmlns attributes and their value is the name of the XML namespace being declared; this is a URI. The first form of the attribute (xmlns:prefix) declares a prefix to be associated with the XML namespace. The second form (xmlns) declares that the specified namespace is the default XML namespace. For example, the following declares two XML namespaces, named http://www.google.com/ito/addresses and http://www.google.com/ito/servers. The first declaration associates the addr prefix with the http://www.google.com/ito/addresses namespace and the second declaration states that the http://www.google.com/ito/servers namespace is the default XML namespace. NOTE: Technically, xmlns attributes are not attributes at all -- they are XML namespace declarations that just happen to look like attributes. Unfortunately, they are not treated consistently by the various XML recommendations, which means that you must be careful when writing an XML application. For example, in the XML Information Set (http://www.w3.org/TR/xml-infoset), xmlns "attributes" do not appear as attribute information items. Instead, they appear as namespace declaration information items. On the other hand, both DOM level 2 and SAX 2.0 treat namespace attributes somewhat ambiguously. In SAX 2.0, an application can instruct the parser to return xmlns "attributes" along with other attributes, or omit them from the list of attributes. Similarly, while DOM level 2 sets namespace information based on xmlns "attributes", it also forces applications to manually add namespace declarations using the same mechanism the application would use to set any other attributes.

XML Flashcards

(61 cards)