XML

LAST UPDATED: 22 May 2006 19:19:28 +0200

XSQL SERVLET

Oracle has a nifty servlet for converting the results of SQL queries into XML. It's part of the Oracle XDK (XML Developer's Kit). It combines SQL, XML, XSLT, and HTTP into a streamlined method for delivering SQL result sets as XML over the Internet.

Check it out:

Oracle Technology Network

XSL EDITOR

Let's face it--XSL is not the cleanest of scripting languages. Editing XSL documents can get hairy, and quick! Fortunately, one of our readers sent me a link to a neat XSL tool at the VBXML site. It's called XSL Tester and it's pretty nice. Check it out:

XSL Tester

XMLINT

There are never too many ways to validate an XML document. Microsoft has a nice command-line tool--called XMLINT--for parsing and validating XML documents. It is available at the Microsoft Web site (look under XML Downloads, XML Validation Tool). XMLINT is an XML validation tool that you can execute from a DOS command prompt by passing the name of the XML file as a command-line parameter. You can also specify -w if you want to check for well-formedness and not perform DTD validation. If XMLINT finds any problems, it generates an informative message and tells you the offending line number. Otherwise, it just echoes the name of the file.

Microsoft XML

SIMPLIFY XML WITH VBSCRIPT CLASS

Level: Advanced
Categories:
* VBScript
* XML
* ASP
Browsers targeted:
* Internet Explorer 5
* Internet Explorer 4

VBScript has long been considered a fairly rudimentary language, lacking the robustness that the full-featured Visual Basic product demonstrates and lacking any real tools for handling even basic object-oriented capabilities (something that JavaScript does handle). However, with the release of VBScript 5.0 (part of the general Internet Explorer 5 upgrade, although the scripting engine can be loaded separately for IE4), at least some of these basic objections are being answered.

One of the most common problems with ASP scripts (which is where most VBScript code is written) is that the combination of ASP and HTML tends to make for difficult-to-manage, highly coupled code, which is why ASP files tend to be hard to write and maintain. VBScript 5.0, however, introduced the ability to build classes. Specifically, the Class and End Class keywords let you define encapsulated entities that can expose properties and methods (and events in a limited fashion). You can use such classes to simplify your Web page generation radically.

For example, the following VBScript class, CWebPage, exposes two methods: GetPageData, which reads the name of an XML file, the name of an XSL transform file, and a MIME type from the request (all optional) and performs the transform; and ShowOutput, which writes the result to the response stream.

<%@LANGUAGE="VBScript" %>
<%
Class CWebPage
    Dim xmlSource
    Dim xslTransform
    Dim mimeType
    Dim resultText

    ' Read the source, transform, and mimetype request parameters,
    ' load the documents, and prepare the output text.
    Public Sub GetPageData()
        Dim sourceFileName
        Dim transformFileName
        sourceFileName = Request("source")
        transformFileName = Request("transform")
        mimeType = Request("mimetype")
        If mimeType = "" Then
            mimeType = "text/html"
        End If
        resultText = ""
        If sourceFileName <> "" Then
            Set xmlSource = Server.CreateObject("Microsoft.XMLDOM")
            xmlSource.async = False    ' finish loading before continuing
            xmlSource.load Server.MapPath(sourceFileName & ".xml")
            If transformFileName <> "" Then
                Set xslTransform = Server.CreateObject("Microsoft.XMLDOM")
                xslTransform.async = False
                xslTransform.load Server.MapPath(transformFileName & ".xsl")
                resultText = xmlSource.transformNode(xslTransform)
            Else
                ' No transform requested: send the raw XML instead.
                resultText = xmlSource.xml
                mimeType = "text/xml"
            End If
        End If
        Response.ContentType = mimeType
    End Sub

    ' Write the prepared text to the response stream.
    Public Sub ShowOutput()
        Response.Write resultText
    End Sub
End Class
%>
<%
Sub Main()
    Dim WebPage
    Set WebPage = New CWebPage
    WebPage.GetPageData
    WebPage.ShowOutput
End Sub

Main
%>

The advantage of using classes in this page can be seen in the Main routine. Class definitions can be pulled in through include directives, so that the entire body of the calling page consists of the includes and the call to Main (which is defined in the calling page here simply for testing purposes).

You can then make calls to your server to download XML formatted with the given XSL stylesheet, and the output is sent to your browser as formatted HTML. This makes it ideal for use with browsers that don't support XML natively (everything but IE5) or those that support it but can't handle printing XML documents properly (IE5).

XML: SITES TO SEE

I've had many requests for references to XML Web sites and tutorials. Here are a few that I always start with when I'm looking for XML information on the Web:

W3C: Official XML Specification http://www.w3c.org/xml

The annotated (and much less confusing) specs http://www.xml.com/axml/axml.html

Tools, news, articles, etc. http://www.xml.com, http://www.xml-zone.com/

Tutorial http://www.projectcool.com/developer/xmlz/

Finally, I have to highly recommend you do NOT visit these sites--you may no longer have a reason to read my tips!

RETRIEVING RESPONSES AS XML

Level: Advanced
Categories:
* VBScript
* ASP
* XML
Browsers targeted:
* Internet Explorer 5
* Internet Explorer 4
* Internet Explorer 3
* Netscape Navigator 3
* Netscape Navigator 4

The ASP Request object can be confusing to beginning Web programmers. One common problem is how to get all of the terms sent up in a form or query string without knowing precisely what to expect. The problem is that the Request object appears to be a collection of name/value pairs, but in fact it's much more complex. It's actually an interface that exposes several distinct collections. The four that matter here (a fifth, ClientCertificate, is rarely needed) are:

* QueryString--This retrieves all of the name/value pairs that were sent on the query string of the URL or were sent as part of a form with its method set to GET.
* Form--This retrieves all of the name/value pairs that were sent through a form via a POST method.
* Cookies--This retrieves all of the cookies that have been defined for this page.
* ServerVariables--This retrieves the server variables that were sent as part of the HTTP header or are maintained by the server.

One problem that I frequently encounter is a need to load the contents of the Request object into an XML document that I can then pass into an XSL filter. Knowing that the Request object exposes these four collections, I wrote a small routine that queries each collection in turn and turns the results into an XML document:

<%@LANGUAGE="VBScript"%>
<%
'GetRequestKeys.asp

' Build a <keys> document with one <collection> element per Request collection.
Function getRequestXML()
    Dim xmlDoc
    Set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")
    xmlDoc.loadXML "<keys/>"
    setKeysCollection xmlDoc, "querystring"
    setKeysCollection xmlDoc, "form"
    setKeysCollection xmlDoc, "cookies"
    setKeysCollection xmlDoc, "servervariables"
    Set getRequestXML = xmlDoc
End Function

' Append a <collection id="..."> element holding one <key> per name/value pair.
Function setKeysCollection(xmlDoc, collectionName)
    Dim collection, collectionNode, keyNode, key
    Set collection = Eval("Request." & collectionName)
    Set collectionNode = xmlDoc.createElement("collection")
    collectionNode.setAttribute "id", collectionName
    For Each key In collection
        Set keyNode = xmlDoc.createElement("key")
        keyNode.setAttribute "id", key
        ' Read the value from this collection rather than from Request(key),
        ' which searches the collections in a fixed order.
        keyNode.setAttribute "value", CStr(collection(key))
        collectionNode.appendChild keyNode
    Next
    xmlDoc.documentElement.appendChild collectionNode
    Set setKeysCollection = collectionNode
End Function

Sub Main()
    Dim requestDoc
    Set requestDoc = getRequestXML()
    Response.ContentType = "text/xml"
    Response.Write requestDoc.xml
End Sub

Main
%>
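
The idea of flattening request data into a <keys> document is not tied to ASP. Here is a minimal Python sketch of the same structure, assuming the four collections have already been read into plain dictionaries. The request_to_xml name and the sample values are hypothetical and are not part of the ASP page above.

```python
import xml.etree.ElementTree as ET


def request_to_xml(collections):
    """Build a <keys> document from a mapping of collection name -> dict."""
    root = ET.Element("keys")
    for collection_name, pairs in collections.items():
        collection_node = ET.SubElement(root, "collection", id=collection_name)
        for key, value in pairs.items():
            # Each name/value pair becomes <key id="..." value="..."/>.
            ET.SubElement(collection_node, "key", id=key, value=str(value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data standing in for a real request.
sample = {
    "querystring": {"source": "report"},
    "form": {},
    "cookies": {"session": "abc123"},
    "servervariables": {"REQUEST_METHOD": "GET"},
}
print(request_to_xml(sample))
```

Because every collection gets its own element, an XSL stylesheet can later pick out, say, only the cookies without guessing where a value came from.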

XML ZONE

The XML Zone is an excellent Web site for XML resources. It displays news that is updated daily, contains links to some of the best XML stuff on the Internet, and posts articles from XML Magazine. It also has a feature called Ask The XML Pro that lets you post a question and browse answers to previously asked questions. (Topics are even categorized for easy browsing.)

XML Zone http://www.xml-zone.com/

--------------------------------------------------------------------------------

XML VOCABULARY

You may have heard the term XML Vocabulary and wondered what it means. An XML Vocabulary is a PUBLIC XML DTD that is used within an industry. One of the fundamental design goals of XML has always been to facilitate the creation of a common, non-proprietary medium for the exchange of data. In the XML world, this means the creation of industry-standard DTDs, also known as Vocabularies. Some examples are the Chemical Mark-up Language (CML) and Electronic Business XML (ebXML).

--------------------------------------------------------------------------------

XML VIA HTTP

HTTP is a great protocol for exchanging XML files over the Internet. You typically associate HTTP with transferring HTML files from a Web server to a browser. However, the HTTP protocol is not limited to HTML. HTTP works just as well for any text-based data, including XML.

Using HTTP to transfer XML data allows you to make use of existing Web servers and network infrastructure, avoid firewall issues (HTTP typically uses the well-known port 80), and reuse server-side technologies like CGI and Java Servlets with which you may already be familiar. Exchanging your XML data can be as simple as an HTTP PUT or GET!
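
To make the point concrete, here is a small Python sketch that serves an XML document over HTTP on the local machine and fetches it back with an ordinary GET. It uses only the standard library. The XmlHandler and fetch_xml_once names, the payload, and the choice of a loopback address are all assumptions made for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

XML_BODY = b'<?xml version="1.0"?><name>Tarzan</name>'


class XmlHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small XML document."""

    def do_GET(self):
        self.send_response(200)
        # Label the payload as XML so the client knows how to treat it.
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(XML_BODY)))
        self.end_headers()
        self.wfile.write(XML_BODY)

    def log_message(self, format, *args):
        pass  # suppress the default request logging


def fetch_xml_once():
    """Serve one request on a free local port and fetch it over HTTP."""
    server = HTTPServer(("127.0.0.1", 0), XmlHandler)
    server.timeout = 5  # do not wait forever if no request arrives
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        # An empty ProxyHandler keeps the loopback request away from any proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        url = "http://127.0.0.1:%d/" % server.server_port
        with opener.open(url, timeout=5) as response:
            return response.headers.get_content_type(), response.read()
    finally:
        worker.join()
        server.server_close()


if __name__ == "__main__":
    print(fetch_xml_once())
```

Nothing in the exchange is specific to XML except the Content-Type header, which is exactly why existing Web infrastructure can carry it unchanged.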

--------------------------------------------------------------------------------

XML TERMINOLOGY--WELL FORMED VS. VALID

As with any technology, XML has its share of terminology and buzzwords with which to deal. Two common terms you'll often hear in relation to XML documents are well-formed and valid. A well-formed document is one that adheres to all the language structure specified by the XML spec. Among other things, this means that the element names are all valid and have matching start and end tags. In addition to being well formed, a valid XML document must adhere to the semantic constraints defined by a DTD.
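
A quick way to test well-formedness is to hand the text to any XML parser and see whether it complains. The Python sketch below does that with the standard library. The check_well_formed name is made up for this example, and note that this parser does not validate against a DTD, so it checks only the first of the two properties.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text is well-formed XML, else the (line, column) of the error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return error.position
    return None


print(check_well_formed("<note><to>Jane</to></note>"))  # well-formed
print(check_well_formed("<note><to>Jane</note>"))       # <to> is never closed
```

Checking validity needs a validating parser and a DTD (or schema) to check against.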

More on DTDs later.

--------------------------------------------------------------------------------

XML TAG NAMES

It's always a good idea to develop standards for naming entities in a document structure. XML is no exception.

Large, complex documents full of XML mark-up can get ugly in a hurry. Although it's not specified by the XML spec, a de facto standard for naming XML tags is to lowercase the first letter and uppercase the start of additional words. For example:

<thisIsATag></thisIsATag>

Make your XML files easier to maintain by picking a tag-naming standard and sticking to it!

--------------------------------------------------------------------------------

XML TAG NAME GOTCHAS

The XML spec states that the string 'xml', or any combination of upper- and lowercase letters x,m,l at the beginning of a tag name, is "reserved for standardization." This means that tag names like <xmlFileName> and <XMLtag> should not be used. Although most XML parsers will allow tag names that start with 'xml', you should avoid it.

It's also legal to use a colon in a tag name, but it's not recommended. The XML spec reserves the use of the colon for namespaces. For instance, a tag using a namespace indicator would be of the form

<namespace:tagName>

Typically, an XML parser will not throw up a red flag if you use colons in your tag names, but using them is nevertheless a bad habit and should be avoided.

If you aren't sure what a namespace is, we'll have more on that later. For now, avoid using colons and the word 'xml' in your tag names!
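
If you generate tag names in code, both rules are easy to check up front. The following Python sketch is one way to do it. The tag_name_warnings function is hypothetical, and it covers only the two gotchas described in this tip, not the full XML name rules.

```python
def tag_name_warnings(name):
    """Return a list of warnings for a proposed XML tag name."""
    warnings = []
    # Any mix of upper- and lowercase x, m, l at the start is reserved.
    if name.lower().startswith("xml"):
        warnings.append("starts with 'xml' (reserved for standardization)")
    # Colons are legal but set aside for namespace prefixes.
    if ":" in name:
        warnings.append("contains a colon (reserved for namespaces)")
    return warnings


for candidate in ("thisIsATag", "XMLtag", "my:tag"):
    print(candidate, tag_name_warnings(candidate))
```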

--------------------------------------------------------------------------------

XML SCHEMAS

When I hear the word schema, I typically think of relational databases, table columns, and field types. In the XML world, schema refers to using XML instead of DTDs to define the structure of XML documents. XML Schema is a standard from the World Wide Web Consortium (W3C); it became a W3C Recommendation in May 2001. It has its roots in four other standards:

* XML-Data
* Document Definition Markup Language (DDML)
* Document Content Description (DCD), based on the Resource Description Framework (RDF)
* Schema for Object-oriented XML (SOX)

All of these standards are similar in that they define a way to replace DTDs with XML. You can read more about the standards and XML Schema at the W3C's Web site:

http://www.w3.org/

--------------------------------------------------------------------------------

XML SCHEMA SAMPLE

Today's tip gives you a taste of what an XML schema looks like:

For the DTD:

<!DOCTYPE Name [
<!ELEMENT Name (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
]>

the XML schema would be:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Name" type="NameType"/>
  <xs:complexType name="NameType">
    <xs:sequence>
      <xs:element name="First" type="xs:string"/>
      <xs:element name="Last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

This is a pretty simple example--we'll get into more detail later!

--------------------------------------------------------------------------------

XML POCKET REFERENCE

For those of us with major-memory-impairment (MMI), "The XML Pocket Reference" is a great book to keep handy. It contains a tutorial for learning the basics, and it's a great reference for the XML and XSL specifications. Best of all, it costs only about eight bucks!

XML Pocket Reference by Robert Eckstein O'Reilly & Associates, October 1999 (107 pages) http://www.amazon.com/exec/obidos/ASIN/1565927095/tipworld

--------------------------------------------------------------------------------

XML NOTEPAD

After you've been using XML for a while, Notepad starts to lose its luster as an editor. Fortunately, there are several nice XML editors available to make our life a little easier. Among them, Microsoft XML Notepad has a key feature that always gets my attention: It's free! XML Notepad provides a clean, simple-to-use interface for creating XML documents. It has a pane on the left side that displays the XML in a tree structure and a pane on the right side that lets you edit content.

Microsoft XML Notepad http://msdn.microsoft.com/xml/notepad/intro.asp

--------------------------------------------------------------------------------

XML MAGAZINE ONLINE

You've probably seen a copy or two of XML Magazine--it's one of the leading periodicals for XML info. But have you been to its Web site recently? The company publishes most of its articles there, creating a great source of XML information that's available even after your coworker walks off with your copy of the magazine.

XML Magazine http://www.xmlmag.com/

--------------------------------------------------------------------------------

XML MACROS

Internal XML entities are basically macros for XML. You can define an internal entity in a DTD, like this:

<!ENTITY byline "by John Doe, (c) 1996">

and then reference it in your XML document:

<Author> &byline; </Author>

The XML parser is required to substitute every occurrence of the entity reference (&byline;) with the text defined by the ENTITY declaration in the DTD. Therefore, the parser would return the markup like so:

<Author> by John Doe, (c) 1996 </Author>

This is useful if you have a piece of text that will be repeated throughout your XML document.
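
You can watch the substitution happen with any parser that reads the internal DTD subset. The Python sketch below uses the standard library parser. The DOCUMENT string simply wraps the declaration and reference from this tip in a complete document, and the variable names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and referenced in the body.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Author [
  <!ENTITY byline "by John Doe, (c) 1996">
]>
<Author> &byline; </Author>"""

author = ET.fromstring(DOCUMENT)
# By the time the application sees the text, the reference has been replaced.
print(repr(author.text))
```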

--------------------------------------------------------------------------------

XML IS CASE SENSITIVE

You probably know by now that XML (unlike HTML) is case sensitive. What this means is that <starttag> and </STARTTAG> do not match as a pair of start-end tags. One of the consequences of XML's case sensitivity is that keywords have to be capitalized. Whenever you use keywords like DOCTYPE, ELEMENT, and ATTLIST, they must be in upper case.

--------------------------------------------------------------------------------

XML IN YOUR PALM

If I ever lose my head, I will know where to start looking for it... in my Palm Pilot. I wouldn't know my own phone number if I didn't have a trusty little PDA keeping track of it. Needless to say, when I ran across an article about using XML with a Palm Pilot, I was a happy puppy. Norman Welsh, a staff engineer with Sun Microsystems, has written a great article about synching your Palm database with other desktop applications using XML. Check it out:

XML from Your Palm http://www.sun.com/software/xml/developers/palm/

--------------------------------------------------------------------------------

XML IN THE INDUSTRY

A clear indicator of XML's presence in the market is the widespread support from industry leaders. The "big boys" in the field have provided some great resources and tools for XML developers. Here are just a few worth checking out:

Microsoft http://msdn.microsoft.com/xml/default.asp

Oracle http://www.oracle.com/xml

IBM http://www.ibm.com/developer/xml

Sun http://www.javasoft.com/xml

--------------------------------------------------------------------------------

XML ESCAPE

The less-than (<) and ampersand (&) symbols can appear in a document only as markup delimiters. If you need to use one of these symbols as content in an XML document, you have to "escape" it by using &lt; for < and &amp; for &. (In the next tip, I'll show you an exception to this rule.)
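
Most XML toolkits will do the escaping for you, which is safer than search-and-replace by hand. As a sketch, Python's standard library offers escape and unescape in xml.sax.saxutils. The sample string is made up for illustration, and note that escape also converts > to &gt; even though that is not strictly required.

```python
from xml.sax.saxutils import escape, unescape

raw = "if (a < b && b < c) { ... }"

# escape() replaces &, <, and > with their entity references.
escaped = escape(raw)
print(escaped)

# unescape() reverses the substitution.
print(unescape(escaped))
```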

--------------------------------------------------------------------------------

XML ENCODING

Recently, you received tips on using the encoding attribute to tell an XML processor the character format your XML document uses. For example, the following XML prologue indicates the document uses 8-bit character encoding:

<?xml version="1.0" encoding="UTF-8"?>

The funny thing is, the XML processor has to know what the encoding standard is before it can read any of the document, including the first line, which specifies the encoding being used. Sounds like the proverbial chicken-and-egg problem. XML processors get around this by reading the first few characters (which should always be <?xml) and matching them against their UTF-8 and UTF-16 values. Using this little trick, XML processors can determine whether the characters are 8-bit or 16-bit and read the rest of the <?xml declaration. Of course, I wouldn't rely on the processor auto-detecting the character encoding. It's always a good idea to remove any ambiguity and include the encoding attribute.
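
The detection trick can be sketched in a few lines of Python. This is a simplified version that tells 8-bit from 16-bit input by looking at a byte order mark or at how the leading "<?" is laid out. The sniff_xml_encoding name and its return labels are assumptions for the example, and real processors also handle cases (such as 32-bit encodings and EBCDIC) that are left out here.

```python
def sniff_xml_encoding(data):
    """Guess the encoding family from the first bytes of an XML document."""
    if data.startswith(b"\xfe\xff") or data.startswith(b"\x00<\x00?"):
        return "UTF-16BE"  # big-endian BOM, or "<?" stored as 16-bit units
    if data.startswith(b"\xff\xfe") or data.startswith(b"<\x00?\x00"):
        return "UTF-16LE"  # little-endian BOM, or "<?" stored as 16-bit units
    # A UTF-8 BOM or a plain "<?xm" means one byte per ASCII character;
    # the encoding declaration then names the exact 8-bit encoding.
    return "8-bit"


declaration = '<?xml version="1.0"?><name>john</name>'
print(sniff_xml_encoding(declaration.encode("utf-8")))
print(sniff_xml_encoding(declaration.encode("utf-16-le")))
```

Once the family is known, the processor can decode the declaration itself and read the encoding it names.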

--------------------------------------------------------------------------------

XML DECLARATIONS

XML documents should always begin with an XML declaration. But is it required? Nope.

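The following is a well-formed XML document (note the single root element, which every well-formed document must have):

<?xml version="1.0"?>
<person>
<name>john</name>
<age>10</age>
</person>

and so is this:

<person>
<name>john</name>
<age>10</age>
</person>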

--------------------------------------------------------------------------------

XML COMMENTS

XML borrows its commenting style from HTML. The comment starts with <!-- and ends with -->.

Here's an example:

<!-- this is a comment -->

You can put comments in DTDs and XML documents. An XML parser does not treat anything between the start and end comment delimiters as markup or content. There are, however, a couple of gotchas with comments: You cannot put a double-dash (--) within a comment block, and you cannot place a comment inside a tag.
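
Both gotchas are easy to confirm with a parser. In the Python sketch below, the parses helper is a made-up name, and the sample strings are illustrations only.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A well-placed comment is skipped; only the content reaches the application.
root = ET.fromstring("<note><!-- this is a comment -->Buy milk</note>")
print(root.text)

print(parses("<note><!-- a -- b --></note>"))    # double-dash inside a comment
print(parses("<note <!-- not here --> ></note>"))  # comment inside a tag
```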

--------------------------------------------------------------------------------

A BIZTALK BOOK WITH AN EARLY LOOK AT BTS

Consider reading "Understanding BizTalk," by John Matranga, Stephen Tranchida, and Bart Preecs. Extensively researched and reviewed for accuracy (at the moment, anyway) by several Microsoft insiders, the book offers the best public look to date at Microsoft's ideas for how XML fits into its selection of server-side solutions for business-to-business and business-to-consumer sales. There's also an advance look at BizTalk Server (BTS), and though it will probably change before final release, the early peek is tantalizing.

"Understanding BizTalk" by John Matranga, Stephen Tranchida, and Bart Preecs Sams Press ISBN 0672317877 http://www.amazon.com/exec/obidos/ASIN/0672317877/tipworld

--------------------------------------------------------------------------------

XML BOOKS

One of our readers dropped me a line to recommend the following two books. I've not checked them out yet, but he tells me these are great as introductory material for the newbie and as a reference material for the more experienced XML developer.

XML: A Primer (2nd Edition) By Simon St.Laurent IDG Books Worldwide, 9/1999 ISBN: 076453310X http://www.amazon.com/exec/obidos/ASIN/076453310X/tipworld

XML Unleashed By Michael Morrison, David Brownell, and Frank Boumphrey Sams, 12/1999 ISBN: 0672315149 http://www.amazon.com/exec/obidos/ASIN/0672315149/tipworld

--------------------------------------------------------------------------------

XML AND JAVA SERVLETS

Servlets are the Java answer to CGI. You can specify a servlet as the target of an HTTP request and dynamically generate HTML that is returned to the Web browser. The great thing about servlets is that they are not limited to generating HTML; you can use them to return XML documents as well. The following is a snippet of code that shows how to return XML from a servlet:

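import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class XmlServlet extends HttpServlet {
    public void service(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        // Tell the client that the body is XML rather than HTML.
        response.setContentType("text/xml");

        PrintWriter out = response.getWriter();

        out.println("<?xml version=\"1.0\"?>");
        out.println("<name>Tarzan</name>");
    }
}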

A key aspect of this code is the call to response.setContentType(), which sets the content type to "text/xml".

--------------------------------------------------------------------------------

XML AND HTML--DISTANT COUSINS

For those of you with a background in HTML just now starting to look at XML, there are a few gotchas that always pop up: First, XML tag names are case-sensitive; <myTag> and </mytag> will not match. In HTML, you could liberally use (or abuse) any combination of lower and upper case in tag names. With XML, however:

<aTag> some data </aTag> --good
<aTag> some data </atag> --bad

Also, you must always be sure to include the starting and ending tag or the XML parser will throw up a red flag. In HTML, you can sometimes leave off ending tags, but in XML a start tag without a matching end tag will cause a parser to return an error.

--------------------------------------------------------------------------------

XML AND BUSINESS APPLICATION SERVICE PROVIDERS

 

Lots of people write to me asking what XML is good for. Almost without exception, these people are independent operators or members of small organizations. XML hasn't really become accessible to these subsets of the computer world yet. Consider for a minute how small businesses work. They employ very few people, hiring only those who relate directly to the company's product or service. Most other functions are contracted out to other companies--legal services, payroll, distribution, whatever. Those functions that aren't farmed out are kept in-house mainly because the available contractors aren't set up for small jobs. Advertising falls into this latter category.

Think about the possibility that XML has here. If a standardized system of tags enabled you to exchange data with other companies easily, the cost of contracting out even more work would fall. You could use XML to share data with an accountant (sort of like you might export QuickBooks files for the accountant's review now). You could give a marketing agency direct access to some of your sales figures, enabling them to have immediate feedback on their work. The same goes for payroll, benefits, and lots more. Easy data exchange enables small businesses to be more agile. For a look at how this might work, look at a couple of pages:

News about XML-facilitated Application Service Providers: http://www.aspnews.com/forum.htm

A semi-working Web app that hints at how multiple services might integrate in the future: http://www.gldialtone.com/webledger.htm

A list of industries in which someone's working on the ASP model: http://www.aspnews.com/dirwhodef.htm

--------------------------------------------------------------------------------

 

XHTML--PART 1 OF 4

XHTML (Extensible HTML) is the next-generation HTML. I'm sure you're aware that HTML is the presentation language of the Internet. HTML has done an excellent job of providing a platform-neutral, easy-to-understand language for creating Web pages. Unfortunately, HTML comes in many different flavors, and most browsers hide the differences. XHTML 1.0 is the current W3C recommendation for the latest version of HTML. In a nutshell, XHTML takes HTML 4.01 and applies the syntax rules of XML. These basic guidelines will get you started converting HTML to XHTML:

* All elements with content must have a start and end tag.
* Empty elements must have an end tag, or be closed with a / at the end of the start tag.
* Names, including elements and attributes, should be lowercase.
* Elements must be properly nested.
* Script elements should be placed in a CDATA section to avoid improper parsing.
* Attribute values must be within single or double quotes.

We'll look at examples of these in our next tip.

--------------------------------------------------------------------------------

XHTML--PART 2 OF 4

Following up on our previous tip, here are some samples of HTML gotchas in XHTML:

1. All elements with content must have a start and end tag:

<p> -- no good
<p> </p> -- very good

2. Empty elements must have an end tag, or be closed by adding a / at the end of the start tag:

<br> -- no good
<br /> -- much better

3. Names should be in lowercase:

<FORM ACTION=...> -- oh no!
<form action=""> -- whew!

4. Elements must be properly nested:

<b><i> .. </b></i> -- out of order
<b><i> .. </i></b> -- ooh, I like this

5. Script elements should be placed in a CDATA section to avoid improper parsing:

<script> .. </script>

should be replaced with...

<script> <![CDATA[ ... ]]> </script>

6. Attribute values must be within single or double quotes:

<img src=/hi/img.gif> -- this scares me
<img src="/hi/img.gif" /> -- I feel better

--------------------------------------------------------------------------------

XHTML--PART 3 OF 4

Keeping with our HTML theme, let's take a look at a typical (simple) HTML page and see what its XHTML counterpart would look like.

Here is a simple HTML page:

<HTML>
<HEAD><TITLE>Hello World!</TITLE></HEAD>
<BODY>
Hello world!
<HR>
<p>
Hello again!
</BODY>
</HTML>

When converted to XHTML, it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Hello World!</title></head>
<body>
<p>Hello world!</p>
<hr/>
<p>Hello again!</p>
</body>
</html>

Here are a few things to note:

* An XML declaration has been added.
* A DOCTYPE declaration has been added.
* A namespace (xmlns) attribute appears on the <html> tag.
* All elements and attribute names are lowercase.
* All tags are properly closed, either with a / or an ending tag.
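
Because XHTML is XML, an ordinary XML parser can check a converted page for you. The Python sketch below parses a page and lists any element or attribute names that are not lowercase. The uppercase_names and local_name helpers and the sample pages are assumptions made for illustration, and passing this check says nothing about validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Strip a leading {namespace} from a parsed element or attribute name."""
    return name.rsplit("}", 1)[-1]


def uppercase_names(xhtml_text):
    """Parse XHTML as XML and return names that are not all lowercase."""
    root = ET.fromstring(xhtml_text)  # raises ParseError if not well-formed
    offenders = []
    for element in root.iter():
        names = [element.tag] + list(element.attrib)
        for name in names:
            plain = local_name(name)
            if plain != plain.lower():
                offenders.append(plain)
    return offenders


good = ('<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">'
        '<head><title>Hello World!</title></head>'
        '<body><p>Hello world!</p><hr/></body></html>')
bad = ('<html xmlns="http://www.w3.org/1999/xhtml">'
       '<BODY><p Class="x">hi</p></BODY></html>')

print(uppercase_names(good))
print(uppercase_names(bad))
```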

--------------------------------------------------------------------------------

XHTML--PART 4 OF 4

I hope I've piqued your interest in taking the next step with HTML and you're ready to start converting all of your HTML documents to XHTML. I know what you're thinking: "Is he nuts?" (Never mind the answer to that.) If you have a stockpile of HTML documents you just can't wait to dig into, or if you're just curious, there is a great tool called HTML Tidy that will help you convert HTML to XHTML.

HTML Tidy http://www.w3.org/People/Raggett/tidy/

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: WHITESPACE

In HTML, it's possible to use whitespace liberally. In fact, it's good practice to do so, to make your code more readable and more easily editable. XHTML isn't nearly so forgiving, and different parsers will react to large quantities of whitespace in different ways. In particular, you want to avoid whitespace within attribute sequences, so something like this is a bad idea:

<input type="button" value="Press Me" id="button1" />

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: TAG MATCHING

One of the differences between HTML and XHTML is that the XHTML specification is absolutely strict about matching tags. In HTML, something like this is acceptable:

<LI> Cod liver oil

In XHTML, that's not okay, and you'd have to take care to match opening tags with closing tags, like this:

<li> Wart remover </li>

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: SCRIPT REFERENCES

XHTML gets confused if your scripts contain certain (fairly common) character sequences, so it's best to always keep your scripts in remote files and refer to them from within XHTML documents. For example, if you use the decrement operator (--) in a script, an XHTML parser will get confused and misinterpret things. If you refer to a separate file, the problem goes away. You can conceal your script code with HTML/XHTML comments for now, but that approach may not work under future versions of XHTML. The same rule goes for stylesheets.

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: LOWERCASE LETTERS

By and large, XHTML is identical to the HTML you've probably gained some familiarity with during recent years. Most of the tags are the same, and in fact, the main surface difference is that you (as the page creator) are bound to obey stricter rules about how you attach tags and attributes to the elements of your documents.

One of the biggest differences between HTML and XHTML is that XHTML requires all tag and attribute names to be in all lowercase letters. In HTML, you can use <Img>, <IMG>, and <img> interchangeably. Not so in XHTML. The tag that defines image elements is <img>, and there can be no debate about that. The same rule goes for element attributes. All lowercase is the standard.

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: IDENTIFIERS

In HTML, we've traditionally used the NAME attribute to assign a unique identifier to document elements, mainly so we can refer to those elements easily in scripts and other instruction sets. XHTML prefers the ID attribute in place of the NAME attribute. (Form controls still need a name attribute if their values are to be submitted with the form.) Therefore, where HTML would use something like this...

<INPUT TYPE="BUTTON" VALUE="Press Me" NAME="button1">

...XHTML would use something like this (note the new structure for empty tags):

<input type="button" value="Press Me" id="button1"/>

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: EMPTY TAGS

Last time, we learned that XHTML is picky about the way you tag elements in your documents. XHTML won't let you get away with implied closing tags (as in <P> tags without </P> tags). But what, you may ask, do you do with HTML tags that are empty--that is, that don't have closing tags? To embed an image, we've traditionally used this tag:

<IMG SRC="filename">

No dice in XHTML. In that environment, you have to use XML-style empty elements. Therefore, the tag above would be restructured like this:

<img src="filename"/>

The forward-slash character must appear before the closing angle bracket.

--------------------------------------------------------------------------------

XHTML FLAVORS

The XHTML specification defines three DTDs to be used in XHTML documents. All XHTML documents must include a DOCTYPE declaration that points to one of these variations:

XHTML Transitional

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/transitional.dtd">

This is the low-impact XHTML DTD and simply requires that you clean up your HTML to follow the XML language constraints. It also allows the use of tags like <font> to control visual aspects of the page. I recommend using Transitional when you first start converting HTML over XHTML:

XHTML Strict

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

As the name implies, this DTD has some additional constraints for hardcore XML compatibility. It also assumes the layout of the page is specified using a technology like Cascading Style Sheets (CSS), so presentational tags like <font> and <center> are not allowed.

XHTML Frameset

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

This is to be used if your HTML uses framesets.

As usual, these DTDs are freely available for your perusal at the W3 Web site:

http://www.w3.org/TR/2000/REC-xhtml1-20000126/#dtds

--------------------------------------------------------------------------------

WORKING WITH EMPTY ELEMENTS

In HTML, the <IMG> element is known as an empty element. There is no </IMG> tag. The <IMG> tag is used to insert a piece of information--specifically, a reference to an image file and some information about how it is to be displayed--into a document. Nonempty elements, such as those defined by <H1> and </H1>, are passages of text.

You can have empty elements in XML documents, too. Let's say we want an XML document that lists the drawings associated with a particular architectural project. We won't write any sort of parser that displays actual image files, but we will explore the XML rules that allow us to create and use empty elements.

For the purposes of this project, we'll say we want to be able to list three kinds of drawings--floor plans, elevations, and cross-sections. So, we need to define three elements in our DTD. That's for next time.

--------------------------------------------------------------------------------

WML VARIABLES

WML variables provide a facility for storing data while the user navigates cards in a deck. Variables are held in the browser context, so a value set on one card is visible from every other card in the deck. There are two main ways to create and set a WML variable: via <setvar> (which sits inside a task element such as <go>, <prev>, or <refresh>) and through any input element. Here is a sample using the <setvar> tag, which assigns the value "admin" to the variable named "login":

<setvar name="login" value="admin" />

Here is a sample of an input field that assigns to the "login" variable the string entered by the user:

<input name="login" type="password" />

--------------------------------------------------------------------------------

WML HYPERLINKS

Hyperlinks in WML are created using the <go> tag. Building on the example from a few tips back, you can see how to use the <go> tag to navigate between cards in a deck.

<wml>

<card id="card1"> <p> page 1 </p>

<do type="accept" label="card 2"> <go href="#card2"/> </do>

</card>

<card id="card2"> <p> page 2 </p>

<do type="accept" label="card 1"> <go href="#card1"/> </do>

</card>

</wml>

This sample introduces a few new concepts. The <do> tag with type="accept" simply displays a submit-like option to the user. When it is selected, the <go> tag within the <do> tag will be executed. In addition, the href attribute of the <go> tag uses the familiar pound-notation (#name) for linking to a card within the same deck. Lastly, the <go> tag uses the shorthand notation defined by XML for specifying an empty tag: <tag/>.
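Because a #name link only works if a card with that id exists in the deck, it can be handy to check a deck before sending it to a device. The following is a rough sketch in Python using the standard library; the function name broken_links is made up for illustration, and the deck is the two-card example from this tip.

```python
import xml.etree.ElementTree as ET

DECK = """<wml>
  <card id="card1"> <p> page 1 </p>
    <do type="accept" label="card 2"> <go href="#card2"/> </do>
  </card>
  <card id="card2"> <p> page 2 </p>
    <do type="accept" label="card 1"> <go href="#card1"/> </do>
  </card>
</wml>"""

def broken_links(deck_xml):
    """Return the in-deck #name links that do not match any card id."""
    root = ET.fromstring(deck_xml)
    card_ids = {card.get("id") for card in root.iter("card")}
    targets = [go.get("href", "") for go in root.iter("go")]
    return [t for t in targets if t.startswith("#") and t[1:] not in card_ids]

print(broken_links(DECK))
```

An empty list means every <go> target inside the deck resolves to a card. Links to other decks (anything not starting with #) are deliberately ignored here.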

--------------------------------------------------------------------------------

WML DECK

WML files are commonly called decks. A deck is the smallest unit of WML that is transferred to a WAP device. It can be thought of as a single page of interaction, much like a single HTML page. Since WAP devices are typically scarce on screen real estate, a deck is broken up into cards. A WAP browser displays only one card at a time. A deck is like bundling several closely related Web pages into a single Web page and transmitting it as a single unit--the deck is the WML file, and each card would be a page. Here is a simple example of a WML application with two cards:

<wml>

<card id="card1"> <p> page 1 </p> </card>

<card id="card2"> <p> page 2 </p> </card>

</wml>

--------------------------------------------------------------------------------

WIRELESS APPLICATION PROTOCOL

The Wireless Application Protocol (WAP) is a standard proposed by the WAP Forum for Internet-enabling wireless devices. The WAP Forum--an organization formed by Nokia, Phone.com, Ericsson, and Motorola--has a membership that reads like a who's who of heavy-hitters in the wireless and Internet technology industry. Its goal is to bring Internet content and services to digital cell phones and other wireless devices.

The WAP specification defines an application development environment and transport protocols for creating and delivering content to WAP-enabled devices. You might be wondering, "What does this have to do with XML?" Well, the client portion of a WAP application is written using an XML mark-up language called Wireless Markup Language (WML). In the next few tips, we'll talk about WAP and how to use WML.

WAP forum http://www.wapforum.org/

--------------------------------------------------------------------------------

WINDOWS DNA EXPLAINED

Microsoft excels at attaching marketing words to whole collections of products and technologies with individual identities. Such is the case with Windows DNA and Windows DNA 2000. Windows DNA comprises Windows NT Server 4, Visual Studio, Site Server 3, SNA Server 4, SQL Server 7, Transaction Server (MTS), Message Queue (MSMQ), IIS 4, and the Component Object Model (COM) architecture. Visual Studio is itself an assemblage of Visual Basic, Visual C++, and half a dozen other development environments. Basically, then, Windows DNA is an umbrella term for Microsoft's products that facilitate the storage, retrieval, and sharing of data. The problem is, Windows DNA isn't particularly open. You can write adapters that ease interorganizational data sharing, but that's a lot of trouble. An XML-based solution is needed, and Windows DNA 2000 serves that need.

--------------------------------------------------------------------------------

WINDOWS DNA 2000 EXPLAINED

Windows DNA 2000 incorporates generally the same list of software (though in updated form--Windows 2000 Advanced Server and SQL Server 2000 fall under the Windows DNA 2000 umbrella). Interestingly, Windows DNA 2000 also incorporates a new product called Microsoft BizTalk Server (BTS). That piece of software is responsible for facilitating the exchange of information among applications and businesses. No one has really seen it to date--a beta is due out in the middle of 2000--but some details have come out. We'll get into them for a few days.

--------------------------------------------------------------------------------

WEB SITE: XML DOM

By now, you are probably familiar with the acronym DOM. It stands for Document Object Model, which is the standard for accessing the contents of an XML document using a nested tree structure. That sounds easy, but when you start digging into DOM you will find a multitude of interfaces, properties, and methods. The DevX Web site listed below will help you wade through the DOM interface:

XML Document Object Model http://www.devx.com/upload/free/features/xml/objectmodel/xmldom1.asp

--------------------------------------------------------------------------------

WAP TOOLS

I'm sure all the recent talk about WAP has got you itching to start developing applications in WML. Fortunately, there are several development kits and tools available that enable you to create and test WAP applications using your desktop PC. Here are a few:

Phone.com offers an SDK for developing HDML and WML applications: http://www.phone.com/

Nokia has a popular WAP development SDK: http://www.nokia.com/wap

Here is a cool WAP browser that is great for viewing WML content: http://www.slobtrot.com/winwap/

--------------------------------------------------------------------------------

VB-XML BOOK

If Visual Basic is a tried and true friend, and you're wondering how to use it with XML, look no further. I recommend you get a copy of "Professional Visual Basic 6 XML." It covers XML history and background, provides detail on how to parse and validate XML, and includes coverage of a myriad of XML-related technologies--all this as it relates to Visual Basic!

Professional Visual Basic 6 XML by James Britt, Teun Duynstee Wrox Press, April 2000 (500 pages) http://www.amazon.com/exec/obidos/ASIN/1861003323/tipworld

--------------------------------------------------------------------------------

VB AND XML--PART 1 OF 7

I'm going to spend the next few tips showing you the ABCs of parsing an XML document using Visual Basic. As usual, if you want to process XML documents from application software, you will need an XML parser. Microsoft distributes a free XML parser that works with Visual Basic. It's commonly referred to as MSXML and is available here:

http://msdn.microsoft.com/xml/general/msxmlprev.asp

MSXML supports both the SAX and DOM methods of processing a document. Recall that SAX is an event-based method and that DOM reads the XML document into an in-memory tree structure that you can access via an API. We will use the DOM method, and in particular the following classes and interfaces:

DOMDocument: Top-level class for a DOM document; loads and validates the document
IXMLDOMNode: Single node in the DOM tree
IXMLDOMNodeList: List of nodes from a DOM tree

The following DevX site is a great reference tool for learning more about the DOM interface:

DevX: XML Document Object Model http://www.devx.com/upload/free/features/xml/objectmodel/xmldom1.asp

--------------------------------------------------------------------------------

VB AND XML--PART 2 OF 7

Today's tip gets us started on parsing XML documents from Visual Basic. The first step is to make sure you've set up the Visual Basic environment to use the XML parser. If you're using MSXML, this is done by selecting Project, References. Make sure the Microsoft XML component is selected. The following code shows how to use the DOMDocument class to load an XML document (test.xml) located in the same directory as the application:

Dim xml As DOMDocument
Set xml = New DOMDocument
xml.Load App.Path & "\test.xml"

Pretty simple, huh? In our next tip, we'll expand this to include handling errors and validating the document.

--------------------------------------------------------------------------------

VB AND XML--PART 3 OF 7

In our previous tip, we saw the bare-bones, simplest way to load an XML document into a DOMDocument object. Today's tip will create a more generic VB function, which will validate and load an XML document using the following code:

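Private Function LoadXMLDocument(f As String) As DOMDocument
    On Error GoTo LoadXMLError

    Dim doc As DOMDocument
    Set doc = New DOMDocument
    doc.async = False            ' load synchronously so parseError is ready after Load
    doc.validateOnParse = True   ' validate against the DTD while parsing
    doc.Load f

    ' parseError reports both well-formedness and validation problems
    If doc.parseError.errorCode <> 0 Then
        Debug.Print "xml parse error " & doc.parseError.reason
        Set LoadXMLDocument = Nothing
    Else
        Set LoadXMLDocument = doc
    End If
    Exit Function

LoadXMLError:
    ' Runtime errors (as opposed to parse errors) land here
    Debug.Print "xml load error " & Err.Description
    Set LoadXMLDocument = Nothing
End Function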

Here is how you would use this code:

Dim xml As DOMDocument
Set xml = LoadXMLDocument(App.Path & "\test.xml")
If xml Is Nothing Then
    Debug.Print "XML is not valid"
Else
    Debug.Print "XML is valid"
End If

Note that the function uses the validateOnParse property of the DOMDocument class to instruct the parser to validate the document. It then uses the parseError property to determine whether an error occurred.
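For readers who want to try the same load-and-check pattern outside Visual Basic, here is a loose parallel in Python. It is only a sketch: the function name load_xml_document is made up for illustration, it takes the XML text rather than a file name, and the standard library parser used here checks well-formedness only, so it does not perform DTD validation the way validateOnParse does.

```python
import xml.etree.ElementTree as ET

def load_xml_document(text):
    """Parse XML text; return the root element, or None if it is not well formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # The exception message includes the line and column of the problem.
        print("xml parse error:", err)
        return None

good = load_xml_document("<Customer><Name>Joe</Name></Customer>")
bad = load_xml_document("<Customer><Name>Joe</Customer>")
print(good is not None, bad is None)
```

As in the VB version, the caller only has to test for "nothing" to know whether the load succeeded.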

--------------------------------------------------------------------------------

VB AND XML--PART 4 OF 7

Now that you've validated your XML and loaded it into a DOMDocument, you're ready to start processing elements. One class with which you'll deal when grabbing elements out of a DOM tree is IXMLDOMNode. Here are a few of the properties commonly used with IXMLDOMNode:

dataType: Access the node's data type
childNodes: Access child nodes
attributes: Access the attributes for the node
xml: Retrieve the xml for the node and all its children
text: Contains the text content of a node and all its children

This sample code walks through the child nodes of the document root element and prints their xml content:

Dim root As IXMLDOMNode
Set root = xml.documentElement

Dim node As IXMLDOMNode
For Each node In root.childNodes
    Debug.Print node.xml
Next

--------------------------------------------------------------------------------

VB AND XML--PART 5 OF 7

When processing XML documents, you'll often want to pull out the content of specific nodes. For instance, suppose we have the following XML:

<MyBusinessData>
  <Address> ... </Address>

  .. other tags here

  <Customers>
    <Customer>
      <Name> Joe Green </Name>
      <Phone> 888 888 8888 </Phone>
    </Customer>
    <Customer>
      <Name> Becky Smith </Name>
      <Phone> 999 999 9999 </Phone>
    </Customer>
  </Customers>
</MyBusinessData>

Now, let's suppose you want to print a list of customer names. Fortunately, the IXMLDOMNode class has a method that allows you to extract a list of nodes based on the element name. Here is the sample code to retrieve all <Customer> tags from the root document and then print the contents of the <Name> tag:

Dim root As IXMLDOMNode
Set root = xml.documentElement

Dim node As IXMLDOMNode
Dim listNodes As IXMLDOMNodeList
' <Customer> elements sit inside <Customers>, so the pattern names both levels
Set listNodes = root.selectNodes("Customers/Customer")
For Each node In listNodes
    Debug.Print node.selectSingleNode("Name").Text
Next

This example introduces one new class, IXMLDOMNodeList. This class provides a simple container for a collection of IXMLDOMNodes. The example also introduces the use of the selectNodes method of IXMLDOMNode, which takes a single pattern string (here, the path from the root element down to each <Customer>) and returns a list of the matching nodes.
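The same lookup can be sketched in Python for comparison. This is not MSXML; it uses the standard library's ElementTree, whose findall method accepts a similar path expression. The sample data is a trimmed copy of the document above.

```python
import xml.etree.ElementTree as ET

DATA = """<MyBusinessData>
  <Customers>
    <Customer><Name> Joe Green </Name><Phone> 888 888 8888 </Phone></Customer>
    <Customer><Name> Becky Smith </Name><Phone> 999 999 9999 </Phone></Customer>
  </Customers>
</MyBusinessData>"""

root = ET.fromstring(DATA)

# The path is relative to the root element, so it has to name <Customers> too.
names = [
    customer.findtext("Name", default="").strip()
    for customer in root.findall("Customers/Customer")
]
print(names)
```

Note that a bare "Customer" path would return nothing here, because <Customer> is a grandchild of the root element rather than a direct child.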

--------------------------------------------------------------------------------

VB AND XML--PART 6 OF 7

So far, all the tips in this series have demonstrated ways to load and read an XML document. However, you'll often want to update your document as well. Here is a code sample that updates the text of a node and saves the new XML document to a file:

Set node = root.selectSingleNode("Name")
If Not node Is Nothing Then
    node.Text = "Bubba"
End If

' App.Path normally has no trailing backslash, so add one before the file name
xml.save App.Path & "\customer.xml"

This code works because the Text property of the IXMLDOMNode class can be used to change the value of an element. In addition, the DOMDocument class has a save method that can be used to write an XML document to a file.
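Here is a rough Python counterpart to the update-and-save step, again using the standard library rather than MSXML. To keep the sketch self-contained it assumes a tiny one-customer document and writes to an in-memory buffer instead of a file on disk.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<Customer><Name>Joe Green</Name></Customer>")

# Change the text of <Name> only if the element was actually found.
node = root.find("Name")
if node is not None:
    node.text = "Bubba"

# Serialize the updated tree; a file name could be passed in place of the buffer.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
saved = buffer.getvalue().decode("utf-8")
print(saved)
```

The "is not None" guard plays the same role as the "Is Nothing" test in the VB code.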

--------------------------------------------------------------------------------

VB AND XML--PART 7 OF 7

Today's tip wraps up the VB and XML series. We've already seen how to load, read, and write an XML document using Visual Basic and the Microsoft XML parser. Obviously, I have only scratched the surface--and those of you who would like to dig into the details of XML and Visual Basic should check out the VBXML site. It's a great site for VB/XML-related resources and sample code.

One last thing about VB and XML: Note that some of the attributes and methods supported by MSXML are extensions to the W3C DOM specification.

VBXML http://www.vbxml.com/

--------------------------------------------------------------------------------

VALIDATE XHTML

So you've spent hours converting all of your HTML documents to XHTML and you're wondering, "Now what?" If you're like me, you would like a little instant gratification for a job well done. In lieu of taking a vacation, you might want to try W3's XHTML validator. You simply enter a URL and hit the Submit button, and it will download and validate your document. If your document is valid, it prints a nice summary page congratulating you. Otherwise, it displays a message showing what went wrong.

W3's XHTML validator http://validator.w3.org/

--------------------------------------------------------------------------------

UTF-UH-OH-8

It seems I was way off the mark in some previous tips concerning the use of UTF-8 character set encoding. Recall that the XML declaration has an encoding attribute that can be used to specify what character encoding is being used:

<?xml version="1.0" encoding="UTF-8"?>

In a previous tip, I had stated that UTF-8 uses one byte to represent characters, making it unusable for large character sets. This is incorrect--UTF-8 uses one byte only for ASCII characters. Every other character takes a multi-byte sequence: two bytes for most accented Latin letters, three for most Asian characters, and up to four for characters outside the Basic Multilingual Plane.
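If you want to see the variable-length encoding for yourself, the following short Python sketch encodes a few sample characters and counts the bytes. The sample characters are arbitrary picks for illustration.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

samples = [
    "A",           # ASCII letter
    "\u00e9",      # e with acute accent
    "\u4e2d",      # a common CJK ideograph
    "\U0001F600",  # a character outside the Basic Multilingual Plane
]

lengths = [utf8_length(ch) for ch in samples]
print(lengths)
```

The byte count grows with the code point, which is why UTF-8 stays compact for ASCII-heavy documents while still covering every Unicode character.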

Thanks to our readers for keeping me honest!

--------------------------------------------------------------------------------

USING INTERNAL AND EXTERNAL SUBSETS

It is quite common to use both internal and external subsets at the same time. This can be useful if you want to include application-wide elements in all your XML documents. Here is an example:

In the file common.dtd, put the following:

<!ELEMENT application_common (name,version)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT version (#PCDATA)>

In your XML file, put

<?xml version="1.0"?>
<!DOCTYPE Customer SYSTEM "common.dtd" [
  <!ELEMENT Customer (First,Last,application_common?)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>

<Customer>
  <First>Mean</First>
  <Last>Greene</Last>
  <application_common>
    <name>My Application</name>
    <version>1.0</version>
  </application_common>
</Customer>

Note the use of a SYSTEM specifier in the DOCTYPE declaration and the addition of an internal subset as well. The internal subset includes the elements defined in the external DTD by adding an optional child element (application_common) to the document root element (Customer). This is a good way to include common elements across several different DTDs.

--------------------------------------------------------------------------------

USING BIZTALK EDITOR

Microsoft BizTalk Server (BTS) includes what appears, from early reports, to be a strong XML document type definition (DTD) editor. It's based on a tree analogy, which means you can expand and collapse elements to see and edit their subsidiary elements. It's easy to adjust elements' attributes, too, so you'll have no problem making elements required, optional, or dependent on other fields. There's also a facility for importing existing DTDs and editing them to fit your specific needs.

--------------------------------------------------------------------------------

UNDERSTANDING THE ROLE OF BIZTALK SERVER

The idea behind Microsoft BizTalk Server (BTS), which remains deep in development, is that it's an engine for establishing and managing the rules that govern the exchange of data between two business processes--two businesses, particularly. Say, for example, that Telstra (an Australian telecommunications company) buys switches for its networks from Nortel (a maker of such equipment). For the purchase to go ahead, several things need to happen:

Telstra needs to announce its specifications, including product, quantity, delivery date, warranty requirements, and so on.
Nortel needs to state its ability to meet the specifications, and at what price it can do so.
Telstra needs to accept Nortel's quote.
The order needs to be filled and delivery confirmed.

Easy enough, but there's potentially rather a lot of paperwork involved. A lot of human effort could be expended on reading values out of one company's forms and entering them in the databases of the other. BTS establishes relationships between fields in the companies' databases, allowing direct connectivity between Telstra's "orders-placed" database and Nortel's "work-in-progress" database. BTS might also handle translation issues, such as the conversion of Canadian dollars into Australian dollars in this case. A big part of BTS's job is to promote secure data exchange--allowing Nortel to see what it needs to see in Telstra's database, and no more.

--------------------------------------------------------------------------------

UNDERSTANDING BTS PIPELINES

Central to the operation of Microsoft BizTalk Server (BTS) is what's called a pipeline. A pipeline is a pathway by which information may travel, complete with rules about how the information is formatted and used. A pipeline would specify a data source (a company or person) and a data destination (another company or person) for an established business relationship. Alternately, there could be no specified source (an option that comes in handy, for example, when a buyer is shopping for a vendor) or no specified destination (useful when a company sells its products to all comers). The pipeline also would specify how data transiting it should be formatted, and how fields on either end of the pipeline map to the transmission format. For example, a vendor's invoice fields could be made to map directly into a buyer's purchase order fields.

--------------------------------------------------------------------------------

TIP OF THE YEAR

Yep, this is it. Put this one on your refrigerator, make copies, and send it to the folks. I once spent hours trying to figure out why the XML parser I was using could not read a SYSTEM DTD that I was specifying like this:

<!DOCTYPE Customer SYSTEM "common.dtd">

where common.dtd was in the same directory as the XML file I was processing. It turns out that some XML parsers are smart enough to convert the filename to a valid URI; some are not. To get around the problem, I had to change the DOCTYPE declaration to this:

<!DOCTYPE Customer SYSTEM "file:/D:/JOHN/xmltips/test/common.dtd">
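If you need to produce such a URI programmatically, the conversion is simple enough to sketch. The helper below is a made-up illustration in Python, not part of any parser: it swaps backslashes for forward slashes, percent-encodes anything unsafe, and uses the three-slash file:/// spelling, which is the form most tools emit.

```python
from urllib.parse import quote

def to_file_uri(windows_path):
    """Turn an absolute Windows path (drive letter included) into a file: URI."""
    normalized = windows_path.replace("\\", "/")
    # Keep "/" and the drive-letter ":" as they are; escape spaces and the like.
    return "file:///" + quote(normalized, safe="/:")

uri = to_file_uri(r"D:\JOHN\xmltips\test\common.dtd")
print(uri)
```

The result can be dropped straight into the SYSTEM identifier of a DOCTYPE declaration.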

--------------------------------------------------------------------------------

THE XHTML NAMESPACE

In writing an XHTML document, you must include an announcement of the XHTML namespace immediately after the XHTML DTD declaration. The namespace definition basically imports a complete set of HTML-like tags for you to use in applying formatting and other design rules to your documents. In XHTML, the namespace declaration appears as an attribute of the <html> tag that opens the document, like this:

<html xmlns="http://www.w3.org/1999/xhtml">

That brings in the XHTML variations of the familiar HTML elements. However, one of the top attractions of XHTML is that it allows you to import and use your own (or at least, other) namespaces. That's the topic for next time.

--------------------------------------------------------------------------------

THE XHTML DTD

When writing an XHTML document, you're required to declare which Document Type Definition (DTD) the document follows, the same as with any XML document. For XHTML, the DTD announcement looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

That must be the first set of lines in any XHTML document.

--------------------------------------------------------------------------------

THE REAL WORLD

Have you ever wanted to talk to your PDA and have it take notes for you? Or maybe tell your Web browser to return to the previous page by verbal command? VoiceXML is an initiative to provide just those capabilities. It is an XML-based markup language targeted at providing voice access and interactive voice response to Web-based content and applications. VoiceXML is designed for supporting two-way, interactive dialogs using various forms of audio input and output, such as synthesized speech, digitized audio, and speech recognition.

VoiceXML Forum http://www.voicexml.org/

alphaWorks http://www.alphaworks.ibm.com/tech/voicexml/

--------------------------------------------------------------------------------

THE REAL WORLD

Every once in a while I like to throw in an example of how XML is being used in the industry. Microsoft's Channel Definition Format (CDF) is one example. CDF is an XML-based format that describes the content for an active channel. An active channel is basically a group of Web pages. The main distinction is that while a user is viewing your channel, the page is updated automatically.

Here is a pseudo-sample to give you a flavor of what CDF looks like:

<?xml version="1.0"?>
<CHANNEL HREF="http://blah.blah/x.html">
  <ABSTRACT>This is a sample channel</ABSTRACT>
  <TITLE>Sample Channel</TITLE>
  <LOGO HREF="http://blah.blah/icon.ico" STYLE="icon" />
  <LOGO HREF="http://blah.blah/image.gif" STYLE="image" />
  <LOGO HREF="http://blah.blah/wide.gif" STYLE="image-wide" />
  <ITEM HREF="http://blah.blah/item1/">
    <ABSTRACT>Item 1</ABSTRACT>
    <TITLE>Item 1</TITLE>
  </ITEM>
  <ITEM HREF="http://blah.blah/item2/">
    <ABSTRACT>Item 2</ABSTRACT>
    <TITLE>Item 2</TITLE>
  </ITEM>
</CHANNEL>
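Since a channel file is just XML, any XML parser can pull the item list out of it. Here is a small sketch in Python using the standard library; the channel text is a cut-down, well-formed version of the pseudo-sample, and the placeholder URLs are the same made-up ones.

```python
import xml.etree.ElementTree as ET

CDF = """<?xml version="1.0"?>
<CHANNEL HREF="http://blah.blah/x.html">
  <TITLE>Sample Channel</TITLE>
  <ITEM HREF="http://blah.blah/item1/"><TITLE>Item 1</TITLE></ITEM>
  <ITEM HREF="http://blah.blah/item2/"><TITLE>Item 2</TITLE></ITEM>
</CHANNEL>"""

channel = ET.fromstring(CDF)

# Collect (address, title) pairs for each page listed in the channel.
items = [(item.get("HREF"), item.findtext("TITLE")) for item in channel.findall("ITEM")]
for href, title in items:
    print(title, "->", href)
```

Element and attribute names are case-sensitive in XML, so the upper-case names have to be matched exactly.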

Learn more here:

http://www.pcworld.com/r/tw/1%2C2061%2Ctw-xm5-43%2C00.html

http://www.pcworld.com/r/tw/1%2C2061%2Ctw-xm5-42%2C00.html

--------------------------------------------------------------------------------

THE NAME GAME

A valid XML tag name starts with a letter or one of a couple of punctuation characters, followed by a combination of letters, digits, and punctuation marks. Specifically, a tag name must start with a letter (a..z, A..Z), underscore, or colon.

The rest of the name can be any combination of letters, digits, colons, hyphens, periods, and underscores. Some examples of valid tag names are

<tagName></tagName> <TAG-NAME></TAG-NAME> <_tagName1></_tagName1>

Some invalid tag names are

<,name></,name> <3name></3name> <a tag></a tag>
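The naming rule is easy to express as a regular expression. The Python sketch below is a simplification: it covers only the ASCII letters, digits, and punctuation described in this tip, whereas the XML specification also admits many non-ASCII letters. The function name is_valid_tag_name is made up for illustration.

```python
import re

# First character: letter, underscore, or colon.
# Remaining characters: letters, digits, colons, hyphens, periods, underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def is_valid_tag_name(name):
    """Check a tag name against the simplified ASCII-only rule."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["tagName", "TAG-NAME", "_tagName1", ",name", "3name", "a tag"]:
    print(candidate, is_valid_tag_name(candidate))
```

The first three candidates pass and the last three fail, matching the valid and invalid examples above.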

--------------------------------------------------------------------------------

TESTING THE UNIQUENESS REQUIREMENT

Let's modify our XML document to look like this:

<?xml version="1.0"?>

<!DOCTYPE drawingList SYSTEM "drawings4.dtd">

<drawingList>
<floorPlan name="First Floor" URL="http://www.yahoo.com" />
<floorPlan name="First Floor" URL="http://www.yahoo.com" />
<elevation name="East Elevation" URL="http://www.yahoo.com" />
<elevation name="South Elevation" URL="http://www.yahoo.com" />
<crossSection name="CS A-A" URL="http://www.yahoo.com" />
<crossSection name="CS B-B" URL="http://www.yahoo.com" />
</drawingList>

Save that as drawingList4x.xml and load it with Microsoft Internet Explorer.

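We expect an error, because the two floorPlan elements have the same name--something that's forbidden by the ID in the attribute definition. (Strictly speaking, an ID value must also be a legal XML name, so values containing spaces, such as "First Floor", should be rejected by a validating parser as well.) But IE doesn't return an error. This goes to prove that you can't always trust a browser as a tester of XML validity.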
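When the tool at hand doesn't validate, a few lines of script can at least catch the duplicates. The following Python sketch is not a DTD validator; it is a hand-rolled check, with a made-up function name, that looks for repeated name attributes anywhere in the document, which mirrors the document-wide uniqueness that ID demands.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DOC = """<drawingList>
  <floorPlan name="First Floor" URL="http://www.yahoo.com" />
  <floorPlan name="First Floor" URL="http://www.yahoo.com" />
  <elevation name="East Elevation" URL="http://www.yahoo.com" />
  <elevation name="South Elevation" URL="http://www.yahoo.com" />
  <crossSection name="CS A-A" URL="http://www.yahoo.com" />
  <crossSection name="CS B-B" URL="http://www.yahoo.com" />
</drawingList>"""

def duplicate_names(xml_text):
    """Return the name attribute values that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("name") for el in root.iter() if el.get("name") is not None
    )
    return sorted(name for name, count in counts.items() if count > 1)

print(duplicate_names(DOC))
```

For real validation you would still want a validating parser; this only covers the one rule being tested here.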

--------------------------------------------------------------------------------

SYSTEM EXTERNAL SUBSETS

You have two options when specifying the location of a DTD in a DOCTYPE declaration: SYSTEM or PUBLIC. If SYSTEM is used, the DTD specifier should contain a URI (Uniform Resource Identifier) that points to a DTD file. The DTD file, plus any internally defined elements, make up all declarations needed to validate the document. Here is an example:

<!DOCTYPE AddressBook SYSTEM "http://www.myserver.com/xml/address.dtd">

--------------------------------------------------------------------------------

STYLE SHEETS

There are two main technologies used for displaying XML documents: Extensible Stylesheet Language (XSL) and Cascading Style Sheets (CSS). Today's tip will show you how to use CSS with XML documents. You can associate a style sheet with an XML document using the <?xml-stylesheet?> processing instruction, as follows:

<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="foo.css" ?>
<FOO> Hello XML! </FOO>

The CSS file could contain

FOO {display: block; font-size: 24pt; font-weight: bold;}

The attributes for the processing instruction are

href: The style sheet location
type: The style sheet language (possibilities include text/css and text/xsl)
media: The target media (screen, print, etc.)
title: Title for the style sheet (not of much use)
alternate: yes or no; tells the style sheet engine if there are alternative style sheets

You can use the style sheet processing instruction for XSL style sheets, too--more on that later!

--------------------------------------------------------------------------------

STANDALONE DOCUMENTS

By now, everyone has seen the standard XML processing instruction at the beginning of an XML document:

<?xml version="1.0" ?>

One of the attributes of the XML processing instruction is standalone. Possible values are yes and no. For example:

<?xml version="1.0" standalone="yes"?>

If standalone is set to yes, the XML document is declaring that it does not depend on any external markup declarations (like an external DTD). Some XML processors have optimized algorithms for handling standalone documents, so if your document doesn't use any external stuff, it's a good idea to specify it.

--------------------------------------------------------------------------------

SCHEMA

In reading books and articles about XML--which at this stage of its life remains mired in large quantities of hype--you encounter the word "schema" a lot. You can be led to believe that a "schema" is something you create and/or use in the process of building XML documents and parsing them.

Strictly speaking, no. A schema, in the XML sense, is a set of rules for marking up documents with XML tags. A DTD of the sort we've been creating in the previous series of tips is one kind of schema. There are others, most of them still ideas and proposals that haven't yet been standardized and may never be. Schemas other than DTDs can solve such problems as DTD's poor extensibility and lack of capacity for inheritance.

Grammar hint: The word "schema" is singular, not plural. The valid plural forms of the word are "schemas" and "schemata," with the former seeming more popular.

--------------------------------------------------------------------------------

REUSING DTDS

It's always a good idea to reuse work you've done--particularly work that has been tried and tested. Fortunately, XML provides a simple way to help you reuse DTDs. By using parameter entities (note the % in the declaration), you can include external DTDs as follows:

<!ENTITY % mycompanydtd SYSTEM "dtds/company.ent">

.. other stuff here ..

%mycompanydtd; <!-- include the DTD here -->

This can be extremely useful, especially if you need to share common DTDs across multiple projects or organizations and you want to avoid the nightmare of maintaining multiple versions in different files.

--------------------------------------------------------------------------------

REQUIRING UNIQUENESS WITH ID

One of the requirements we originally laid down in the specification for the list of drawings was that each instance of a given element would be required to have a unique name attribute. We can accomplish this with an XML keyword: ID. (An ID value must in fact be unique across the whole document, not just among elements of the same type.) ID goes into the DTD in the same place we put CDATA before.

Here's a modified DTD:

<!ELEMENT drawingList (floorPlan?, elevation?, crossSection?)>

<!ELEMENT floorPlan EMPTY>

<!ATTLIST floorPlan name ID #REQUIRED URL CDATA #REQUIRED >

<!ELEMENT elevation EMPTY>

<!ATTLIST elevation name ID #REQUIRED URL CDATA #REQUIRED >

<!ELEMENT crossSection EMPTY>

<!ATTLIST crossSection name ID #REQUIRED URL CDATA #REQUIRED >

Save that as drawings4.dtd. You'll find that if you save projectDrawings3.xml as projectDrawings4.xml and change the DTD reference to drawings4.dtd, projectDrawings4.xml is just as valid as the unmodified projectDrawings3.xml.

Next time, we'll test the uniqueness requirement.

--------------------------------------------------------------------------------

REQUIEM FOR SOME ATTRIBUTES

We've defined empty elements, but they're truly empty. They're valid XML, but they contain no information. What we need is a way for each instance of each of our empty tags to carry some useful information. Since we want to create lists of drawings, each element should carry a couple of things:

The name of the drawing
A URL where the drawing may be found

For the purposes of this exercise, the URL won't refer to anything real; it will just be a placeholder.

Put into XML terms, we want to require each instance of each of our elements to have exactly one name--which should be unique--and exactly one URL, which also should be unique.

Next time: Modifying the DTD to include attributes.

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 1 OF 5

If you've ever written code to create a graphical user interface (GUI), you know things can get hairy--and fast. For example, every time you want to change a label or add a button, you have to re-code, re-compile, and re-distribute the whole ball of wax. Bean Markup Language (BML) is a free toolkit from IBM for creating, configuring, and connecting Java classes using XML. It comes with two applications: a compiler and an interpreter. The compiler generates Java code based on what you've defined in a BML file. The interpreter reads the BML and creates the GUI at runtime.

The next few tips will discuss BML, so if you want to follow along, download it here:

http://www.alphaworks.ibm.com/tech/bml

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 2 OF 5

There is no better way to get started with any new programming paradigm than with the standard "Hello World" application. So let's jump right in with a sample "Hello World" in BML:

<?xml version="1.0"?>
<bean class="java.awt.Panel">
  <add>
    <bean class="java.awt.Label">
      <property name="text" value="XML Tips Rule!!"/>
    </bean>
  </add>
</bean>

If you have installed the BML toolkit, you can run the application by opening a command window and typing

java demos.drivers.PlayerDriver helloxml.bml

where helloxml.bml is the file containing your BML markup. You will also have to add the BML root directory ({bml-root}), {bml-root}/lib/xml4j_2_0_11.jar, and {bml-root}/lib/bmlall.jar to the Java classpath. This sample demonstrates a few basic BML concepts: adding beans to the application (java.awt.Panel, java.awt.Label) and setting bean properties (text).

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 3 OF 5

I hope I've piqued your interest with our look at BML. Today's tip describes each of the ten tags defined in the BML Document Type Definition (DTD).

These tags are used for creating and connecting beans:

<bean> Create a new bean or look one up.
<args> Pass arguments to the constructor of a bean.
<add> Add a bean to another bean (like adding a label inside a panel).
<string> Create an instance of the java.lang.String class.

These tags allow you to set properties on a bean:

<property> Set or get a bean property.
<field> Set or get a bean field.

Here are the rest:

<event-binding> Create an event connection from one bean to another.
<cast> Convert a bean reference to another type.
<script> Embed a script.
<call-method> Call a method on a bean.

If you like reading DTDs in your spare time (and really, who doesn't?), take a look at the bml.dtd file in the doc directory where you installed the BML toolkit.

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 4 OF 5

All applications need a way to handle events--like when a button is pushed or text is entered in an edit field. BML handles events using the <event-binding> tag. It supports several kinds of event handling, including the standard Java event listener pattern. The <event-binding> tag takes the form

<event-binding name="event-set-name"> <bean .. /> </event-binding>

where the name attribute specifies the event to bind and the nested bean is an instance of a class that implements the matching listener interface. As usual, this is easier to see with a sample. Below is a BML file that creates two beans: an ActionListener (myEventHandler) and a button. The button will generate action events when it is pressed. Notice that the first <add> tag creates an instance of the class myEventHandler and binds it to the id myHandler. The <bean> tag for the button contains an <event-binding> tag that references the handler bean by its id.

<?xml version="1.0"?>
<bean class="java.awt.Panel">
  <add>
    <bean class="myEventHandler" id="myHandler"/>
  </add>
  <add>
    <bean class="java.awt.Button">
      <event-binding name="action">
        <bean source="myHandler"/>
      </event-binding>
    </bean>
  </add>
</bean>

Here, then, is myEventHandler.java:

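import java.awt.Component;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

// Extends Component because the BML file above adds the handler to a Panel.
public class myEventHandler extends Component implements ActionListener {
    public void actionPerformed(ActionEvent a) {
        System.out.println("howdy");
    }
}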

BML also supports a couple other flavors of <event-binding>, which include calling a method of a class or calling a JavaScript function within the BML file.

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 5 OF 5

I'm not convinced BML is ready for primetime, mission-critical deployment of applications. However, I'm not convinced it isn't, either. It certainly presents an alternative, dynamic approach to creating applications. It also hints at some of the possible uses for XML outside the usual data definition and Web-page worlds. BML starts to get more useful as you embed it in your own applications. PlayerDriver, which comes with the toolkit, is a good starting point for learning how to do that. Overall, I think the BML toolkit is easy to use, has great documentation, and is just plain neat.

--------------------------------------------------------------------------------

PUBLIC EXTERNAL SUBSETS

If the PUBLIC specifier is used in a DOCTYPE declaration, things can get a bit nebulous. Using PUBLIC means the identifier is a "well-known" name for the DTD rather than the location of a particular file. It gives the XML processor the opportunity to locate the DTD using its own algorithms: it might have a local copy it can use, or it might retrieve the DTD from a database. The means of finding the DTD is left to the XML processor. Note that XML 1.0, unlike SGML, requires a system identifier to follow the public identifier as a fallback. Here is an example:

<!DOCTYPE Chemical PUBLIC "global/Chemical" "chemical.dtd"> -- this is not real, it is only a sample; chemical.dtd is a placeholder system identifier

--------------------------------------------------------------------------------

PUBLIC AND SYSTEM SUBSETS

A DOCTYPE declaration that uses PUBLIC carries both a public identifier and a system identifier. This lets the XML processor try to look up the public identifier (if it can), while still having the option of loading the DTD from the location given by the system identifier. Here is an example:

<!DOCTYPE AddressBook PUBLIC "mycompany/global/Address" "http://www.myserver.com/xml/address.dtd">

Note that the SYSTEM keyword is left out.
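
How a processor resolves a public identifier is left open, so the following Python sketch shows only one possible strategy: consult a local catalog keyed by public identifier first, and fall back to the system identifier. The catalog contents, the local path, and the function name resolve_dtd are assumptions made for illustration.

```python
# Hypothetical catalog: maps "well-known" public identifiers to local copies.
CATALOG = {
    "mycompany/global/Address": "/usr/share/dtds/address.dtd",  # assumed path
}


def resolve_dtd(public_id, system_id, catalog=CATALOG):
    """Return the location to load a DTD from.

    Prefer a catalog entry for the public identifier; otherwise fall back
    to the system identifier given in the DOCTYPE declaration.
    """
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id


SYSTEM_ID = "http://www.myserver.com/xml/address.dtd"
print(resolve_dtd("mycompany/global/Address", SYSTEM_ID))  # local copy
print(resolve_dtd("unknown/public/id", SYSTEM_ID))         # falls back to the URL
```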

--------------------------------------------------------------------------------

PROCESSING INSTRUCTIONS

The creators of the XML specifications went to great lengths to create an open standard that is not dependent on any particular application, operating system, or platform. (This, obviously, is a good thing!) However, in the real world there are times when a little hint to the application that processes your XML document can pay huge dividends. The XML specification defines processing instructions (PIs) to accommodate this need. PIs take the following form:

<? target ...instruction... ?>

where target is the name of the target application, and ...instruction... is the directive to the targeted application. Naturally, <? and ?> are the PI delimiters.

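Here is an example of a construct with PI syntax that you've likely been using without a second thought:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

This is the XML declaration. It tells any XML processor the version, encoding, and standalone status of the XML document. (Strictly speaking, the XML specification treats the XML declaration as separate from processing instructions and reserves the target name xml, but the syntax is the same.)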
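
Applications read PIs through the parser's API. As a minimal sketch in Python (chosen here only for brevity), minidom exposes each PI as a node with a target and a data string. The xml-stylesheet instruction is a widely used PI; the href value style.xsl is a placeholder.

```python
from xml.dom import minidom

# The href value is a placeholder, not a real stylesheet.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<note>hello</note>"""


def processing_instructions(xml_text):
    """Return (target, data) pairs for PIs that are direct children of the document."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


# The XML declaration on the first line is not reported as a PI node.
print(processing_instructions(DOC))
```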

--------------------------------------------------------------------------------

PREDEFINED ENTITIES

XML predefines entities for character references that correspond to markup. This is useful if you need to include a character in your content that the XML processor would otherwise look at as markup. The predefined entities are:

&lt; for (<)
&gt; for (>)
&amp; for (&)
&apos; for (')
&quot; for (")

For example, the following XML is not well formed because the processor would try to interpret the less-than (<) character as markup:

<formula>x < y = 8</formula>

Instead, you should use the predefined entity for the less-than character (&lt;):

<formula>x &lt; y = 8</formula>
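
When content is generated by a program, the escaping is usually left to a library. A small Python sketch using the standard library's escape function, which replaces &, <, and > with their predefined entities; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "x < y = 8"

# escape() replaces &, <, and > with the predefined entities.
escaped = escape(raw)
print(escaped)

# The escaped form parses, and the parser hands back the original text.
element = ET.fromstring("<formula>" + escaped + "</formula>")
print(element.text)


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The unescaped form is rejected by the parser.
print(is_well_formed("<formula>" + raw + "</formula>"))
```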

--------------------------------------------------------------------------------

PCDATA

PCDATA stands for Parsed Character Data, which is used as a content model for defining an element in a DTD. For example, the following defines an element named Description, which can contain PCDATA:

<!ELEMENT Description (#PCDATA)>

An occurrence in an XML document would look like this:

<Description> this is a description of something </Description>

The Parsed in PCDATA means that an XML parser will read the content, looking for markup characters like < and &. The counterpart to PCDATA is a CDATA section, whose content is not parsed for markup.
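
The difference shows up in how the same text has to be written. In this Python sketch, the first element escapes its markup characters, while the second wraps them in a CDATA section; the parser returns identical text for both.

```python
import xml.etree.ElementTree as ET

# Parsed character data: markup characters must be escaped.
parsed = ET.fromstring("<Description>a &lt; b &amp; c</Description>")

# CDATA section: the same characters are taken literally.
literal = ET.fromstring("<Description><![CDATA[a < b & c]]></Description>")

print(parsed.text)
print(literal.text)
```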

--------------------------------------------------------------------------------

PARSING XML

XML files are great, but by themselves they don't do a whole lot. To put XML files to use, developers typically write software programs using an XML parser. The parser provides an API to process the elements of an XML document.

Here are some pointers to XML parsers, most of them written in Java:

http://xml.apache.org/xerces-j/
http://www.alphaworks.ibm.com/tech/xml4j/
http://java.sun.com/xml/download.html
http://www.jclark.com/xml/xp/index.html
http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp
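
To give a feel for what a parser API looks like, here is a minimal sketch using the parser that ships with Python rather than one of the Java parsers listed above. The Order document is just sample data.

```python
import xml.etree.ElementTree as ET

ORDER = """<Order>
  <Customer>John Doe</Customer>
  <Item>Palm V</Item>
</Order>"""

# Parse the text into a tree and get the root element.
root = ET.fromstring(ORDER)

# Walk the children of the root element and collect name/text pairs.
fields = {child.tag: child.text for child in root}

print(root.tag)
print(fields)
```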

--------------------------------------------------------------------------------

PARSERS, PARDNER

XML is pretty neat, but you're not going to get very far without a parser. Here are a few of the most popular ones:

XML4J - IBM's Java parser http://www.alphaworks.ibm.com/tech/xml4j

XML4C - IBM's C++ parser http://www.alphaWorks.ibm.com/tech/xml4c

MSXML - Microsoft Parser, works with Visual Basic and Visual C++ http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp

JAXP - Sun's Java parser http://java.sun.com/xml/download.html

--------------------------------------------------------------------------------

NAMESPACES

Namespaces are used to ensure uniqueness among element names. While it's not a requirement to use namespaces, it is highly recommended. For instance, if you were creating a DTD for your business data that contained an element named <Customer> and you wanted to exchange information with a partner company that also uses the element name <Customer>, you could potentially run into some serious tag-name clashing. Namespaces were created to solve this problem. We'll dive more into namespaces soon.
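
As a preview, this Python sketch parses a document in which two companies each use an element named Customer. Both namespace URIs and the Exchange wrapper element are made-up placeholders. The parser expands each prefix to its URI, so the two elements end up with distinct names.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are placeholders invented for this example.
DOC = """<Exchange xmlns:mine="http://example.com/mycompany"
          xmlns:theirs="http://example.com/partner">
  <mine:Customer>John Doe</mine:Customer>
  <theirs:Customer>Acme Corp</theirs:Customer>
</Exchange>"""

root = ET.fromstring(DOC)

# ElementTree writes qualified names as {namespace-uri}local-name.
tags = [child.tag for child in root]
print(tags)

# Looking up one company's Customer no longer matches the other's.
ours = root.find("{http://example.com/mycompany}Customer")
print(ours.text)
```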

--------------------------------------------------------------------------------

MSXML PREVIEW RELEASE

Microsoft has made available a preview release of its XML parser, MSXML. It's an update to the product release of March 2000. This release now has support for the SAX2 API and XSLT/XPath.

Here are the links:

http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp
http://msdn.microsoft.com/workshop/xml/articles/sax2jumpstart.asp

--------------------------------------------------------------------------------

MS AND XML

You may not know it, but you most likely already have an XML parser on your PC. Microsoft includes a parser with Internet Explorer 5--in the form of a standard COM object implemented in a file named msxml.dll. If you search your system and find that file, you've got yourself a parser! You can use it with any ActiveX/COM-capable development environment, such as Visual Basic or ASP.

--------------------------------------------------------------------------------

MORE WML

Wireless Markup Language (WML) is an XML-based markup language used to create applications for use with WAP-enabled devices. The WAP specification borrows heavily from the existing request-and-reply model used by the Internet today. A WAP device makes a request to a WAP gateway for a WML file, much like a Web browser makes a request to a Web server for an HTML file. A browser on the WAP device (called a micro-browser) renders the page according to the contents of the WML file. WML performs the same function as HTML; it contains XML tags that define what the page looks like. Here is Hello in WML:

<wml>
  <card>
    <p> Hello </p>
  </card>
</wml>

--------------------------------------------------------------------------------

MORE ON ENCODING

XML by default supports Unicode characters. Unicode is the evolution of ASCII to include support for all written languages. ASCII only uses 7 bits--which is fine for English--but doesn't come close to providing enough range for languages like Chinese. For that reason, Unicode assigns every character a number (a code point) from a much larger range, and encodings such as UTF-8 and UTF-16 define how those numbers are stored as bytes. An XML processor is required to support the UTF-8 and UTF-16 character encodings. (UTF-8 uses one to four 8-bit bytes per character, and UTF-16 uses one or two 16-bit units.) If you're using UTF-8 or UTF-16, you can leave off the encoding attribute. If you're using another character encoding, you must specify it using the encoding attribute.

Here are some examples:

<?xml version="1.0"?>, must use either UTF-8 or UTF-16.

<?xml version="1.0" encoding="ISO-8859-1"?>, uses Latin-1 encoding, which is close to the default Windows character set for Western European languages.

<?xml version="1.0" encoding="UTF-8"?>, uses one to four bytes per character.
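
The size of a character depends on both the encoding and the character itself. A quick Python check, using four sample characters picked for this example (an ASCII letter, an accented Latin letter, a Chinese character, and a musical symbol from outside the 16-bit range):

```python
# Sample characters: ASCII, Latin-1 range, CJK, and beyond the 16-bit range.
samples = ["a", "\u00e9", "\u4e2d", "\U0001d11e"]


def byte_widths(char):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-le"))


for char in samples:
    print(repr(char), byte_widths(char))
```

UTF-8 grows from one to four bytes across the samples, while UTF-16 uses two bytes for the first three and four bytes for the last.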

--------------------------------------------------------------------------------

MEET THE FUNCTOIDS

In Microsoft BizTalk Server (BTS), it's possible to take multiple fields from a source document and manipulate them before putting the results into single fields in the destination document. Such manipulations might include arithmetic operations, string manipulations, or anything else you care to write in ECMA-262 script (which is basically a standardized version of JavaScript). The transformation rules are essentially Extensible Stylesheet Language (XSL) transformations, but BTS calls them--gloriously--functoids. Let's hope this term makes it beyond the beta stage and becomes part of the lingo.

--------------------------------------------------------------------------------

LINKING TO EXTERNAL FILES

In XHTML, you're generally bound to refer to scripts and stylesheets as files that are independent of your XHTML documents. The syntax for doing this is exactly the same as that in standard HTML (except in standard HTML, imports usually are a matter of convenience more than anything else). The syntax looks like this:

<script type="text/javascript" src="/libraries/math.js"> </script>

That line allows the file in which it appears to refer to functions in the file math.js, which resides in the libraries folder. The file contains only JavaScript code--no HTML or XHTML at all.

David Wall, based near Washington, D.C., works as a writer, lecturer, and consultant. You'll find example code at http://www.davidwall.com/xml. You can contact David at xml_dave@davidwall.com.

--------------------------------------------------------------------------------

LET XML SCHEMAS DO THE WORK

XML schemas attempt to overcome two basic shortcomings of DTDs: the lack of data typing and a complex syntax. DTDs use a unique, often confusing syntax for describing the content constraints of an XML document type. XML schemas are written using XML, so you don't have to learn a new syntax. In addition, since schemas are written with XML, you can parse them using standard XML tools to obtain the metadata for the schema.

XML schemas also provide a way to apply data typing to element content. With a DTD there is no way to specify a datatype for elements. For example, neither of the tags

<salary>1000</salary> <salary>hi</salary>

would be vetoed by a validating parser using a DTD. With an XML schema, you could specify that <salary> can contain only integer values and let the parser do the data validation for you.

--------------------------------------------------------------------------------

JARGON

Sometimes the jargon associated with a technology can be overwhelming. XML is definitely no exception. Here are a few commonly used XML terms dealing with document structure:

Root element: Every XML document must have exactly one root element. It is always the outermost element, and all other elements are nested inside it.

Document entity: The term document entity describes the entire XML document. We typically associate a document with an XML file, but the term is most useful when you're dealing with XML that is not stored in a file, as is often the case when an XML document is transmitted over the Internet between cooperating software programs. When dealing with byte streams, the document entity is simply the chunk that makes up each XML document.

Child element: A child element is an element nested directly inside another element. If an element does not have any child elements, it is called a leaf element.

Parent element: A parent element is an element that contains other elements (child elements).

--------------------------------------------------------------------------------

IS BIZTALK REALLY OPEN: THE BIG QUESTION

Because it's so closely integrated with specific applications (and not a little bit because BTS is a Microsoft product), it's fair to ask whether the BizTalk concept is really reliant on open XML standards or merely paying lip service to them as Microsoft marches off to create yet another market-dominating, standards-stifling product. At this early stage, it seems as though the company has approached XML quite well, by making its products read and write standards-compliant XML code. The company simply has built strong XML tools into its commercial server software and seems to be hoping to attract people to Windows (and Windows network information services) on the strength of that software. Let's beware of any effort to extend XML, though--that happened with HTML and JavaScript, and the results were some products that, while nifty in their own right, damaged the community. Here's hoping that Microsoft sticks with standards-compliant XML tools.

--------------------------------------------------------------------------------

INTRODUCING XHTML

In an XML universe that's nothing if not loaded with specifications and sequences of letters to describe them, Extensible Hypertext Markup Language (XHTML) sounds promising. This emerging standard (they're all emerging standards, aren't they?) promises to take HTML, the Web markup language that's already well established, and add to it some of the benefits of XML, which isn't nearly as widely understood or supported.

You can read the XHTML 1.0 specification at the World Wide Web Consortium's Web site:

http://www.w3.org/TR/xhtml1/

--------------------------------------------------------------------------------

INTRODUCING DTD--PART 1 OF 3

DTD stands for Document Type Definition. It refers to a set of declarations, usually kept in a plain text file, that defines what tags, attributes, and tag relationships are allowable for a class of XML documents. DTDs are used in conjunction with a validating parser to ensure that XML documents are valid.

Remember that a valid XML document, in addition to being well formed, adheres to the language semantics specified by a DTD. Over the next few tips, we'll explain how to create and use a DTD. Note that DTDs will most likely be supplanted by XML Schemas (we'll get into these later), but for now they are still in widespread use and supported by many tools.

--------------------------------------------------------------------------------

INTRODUCING DTD--PART 2 OF 3

One way to specify a DTD is to include it right in the text of the XML file. The following is a sample XML file that contains a DTD:

<?xml version="1.0"?>

<!DOCTYPE Customer [
  <!ELEMENT Customer (First,Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>

<Customer>
  <First>John</First>
  <Last>Doe</Last>
</Customer>

If you're not up on DTDs, we'll cover more on the syntax later. The point here is that the DTD can be contained in the text along with the XML.

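Given the above XML file, a validating parser will report an error if the XML does not adhere to the constraints of the DTD. A non-validating parser will not check those constraints, although it may still read the internal subset for entity declarations and attribute defaults.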
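
The behavior of a non-validating parser is easy to see with Python's ElementTree, which does not validate against a DTD. In this sketch the <First> element is deliberately left out, so the document is invalid under its own DTD but still well formed.

```python
import xml.etree.ElementTree as ET

# <First> is missing, so this document is invalid under its own DTD.
INVALID = """<?xml version="1.0"?>
<!DOCTYPE Customer [
  <!ELEMENT Customer (First,Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>
<Customer>
  <Last>Doe</Last>
</Customer>"""

# ElementTree does not validate: the document is well formed, so it parses.
root = ET.fromstring(INVALID)
print([child.tag for child in root])
```

A validating parser given the same input would report that Customer does not match its content model.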

--------------------------------------------------------------------------------

INTRODUCING DTD--PART 3 OF 3

The most common way to specify a DTD for an XML document is to include a DOCTYPE declaration with an external reference. For example, using the following XML file:

<?xml version="1.0"?>

<!DOCTYPE Customer SYSTEM "customer.dtd">

<Customer>
  <First>John</First>
  <Last>Doe</Last>
</Customer>

the file customer.dtd would contain:

<!ELEMENT Customer (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>

In XML-speak, the DTD file is called the external subset of the full DTD for the XML file.

--------------------------------------------------------------------------------

INTERNAL SUBSETS TAKE PRECEDENCE

Internal subsets always take precedence over an external subset. This is particularly handy if you want to make use of an existing DTD but need to tweak it a little to suit your needs. Here is a sample XML file using an internal and external DTD where the internal subset overrides the value of an entity defined in the external subset:

<?xml version="1.0"?>
<!DOCTYPE Sample SYSTEM "company.dtd" [
  <!ELEMENT Sample (copyNotice)>
  <!ELEMENT copyNotice (#PCDATA)>
  <!ENTITY copy "my copyright notice">
]>

<Sample>
  <copyNotice> &copy; </copyNotice>
</Sample>

where company.dtd is a separate file and could contain the following:

<!ENTITY copy "Company Wide Copyright Notice">

In this example, the internal subset defines an entity copy that replaces the externally defined entity of the same name. So the parsed XML would come back as

<Sample>
  <copyNotice> my copyright notice </copyNotice>
</Sample>
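
The override works because an XML processor keeps the first declaration it sees for an entity name, and it reads the internal subset before the external one. Python's ElementTree does not load external DTD files, so this sketch puts both declarations in the internal subset to show the first-one-wins rule on its own; it is an illustration of the rule, not a reproduction of the two-file setup above.

```python
import xml.etree.ElementTree as ET

# Two declarations of the same entity; the first one is binding.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Sample [
  <!ENTITY copy "my copyright notice">
  <!ENTITY copy "Company Wide Copyright Notice">
]>
<Sample><copyNotice>&copy;</copyNotice></Sample>"""

root = ET.fromstring(DOC)
notice = root.find("copyNotice").text
print(notice)
```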

--------------------------------------------------------------------------------

INTERNAL AND EXTERNAL SUBSETS

When talking about document type definitions (DTDs) and the associated Document Type Declaration (<!DOCTYPE>), you will often hear the terms internal and external subset. Internal subset refers to a document type definition that is declared inside the XML file. Here is an example:

<?xml version="1.0"?>
<!DOCTYPE Customer [
  <!ELEMENT Customer (First,Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>

<Customer>
  <First>Mean Joe</First>
  <Last>Greene</Last>
</Customer>

External subsets are used when the DTD is contained in an external file, and the DOCTYPE declaration refers to it. Here is the previous example using an external subset:

<?xml version="1.0"?>
<!DOCTYPE Customer SYSTEM "customer.dtd">

<Customer>
  <First>Mean Joe</First>
  <Last>Greene</Last>
</Customer>

where customer.dtd is a separate file and would contain the following:

<!ELEMENT Customer (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>

--------------------------------------------------------------------------------

GRASPING THE BIZTALK SPECIFICATION

Professional computer programmers spend much of their time making systems interoperate. That is, they write parsers, filters, converters, and adapters that take the output of one program and render it readable by another. The idea is that the business that uses the two pieces of software is made more efficient by having an automatic translator that sits between them.

The problem is, custom programming work is very expensive. Hiring a team of programmers to integrate your applications can do much to offset the financial gain that comes from integration. A more generic solution is needed, but the generic solution can be no less robust or reliable than custom work.

This is the purpose of BizTalk, a data-interchange scheme developed by Microsoft and others. Based on XML standards, BizTalk is meant to facilitate data interchange among applications--including applications running on separate companies' computers. BizTalk, therefore, is a tool for (among other things) buying and selling across networks. It's been in the works for more than a year now, and it's getting to be mature enough to reward experimentation.

--------------------------------------------------------------------------------

GO AGAIN!

The previous tip showed you how to navigate between cards in a deck using the <go> tag. Why stop there? You can also use the <go> tag to navigate to other WML decks. Just as in HTML, you can use the href attribute to specify a WML deck:

<go href="http://someserver.com/wml/deck1.wml" />

You can take this one step further and generate dynamic WML pages by specifying a server-side script (CGI, servlets, ASP) instead of a WML file. Your script is responsible for generating valid WML in response to the request. Here is a sample:

<go href="books.cgi" method="get">
  <postfield name="author" value="Jordan"/>
</go>

Note the use of the <postfield> tag to pass values to the script.
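
What the server-side script does with the author value is up to you. The following Python sketch shows one way a script like books.cgi might build its response from the query string; the BOOKS table, the placeholder titles, and the function name books_deck are all invented for this example, and the XML declaration and DOCTYPE that a real deck would carry are left out for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

# Hypothetical data; a real script would query a database.
BOOKS = {"Jordan": ["Book One", "Book Two"]}


def books_deck(query_string):
    """Build a one-card WML deck listing the books for the requested author."""
    author = parse_qs(query_string).get("author", [""])[0]
    titles = BOOKS.get(author, [])
    # Escape each title so that characters like & cannot break the markup.
    lines = "".join("<p>" + escape(title) + "</p>" for title in titles)
    if not lines:
        lines = "<p>No books found</p>"
    return '<wml><card id="books">' + lines + "</card></wml>"


deck = books_deck("author=Jordan")
ET.fromstring(deck)  # raises ParseError if the output is not well formed
print(deck)
```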

--------------------------------------------------------------------------------

FIXED ATTRIBUTES

When I first saw FIXED attributes, I was a little confused. If an attribute always has a fixed value, why in the world would you bother to specify it in a DTD? The answer is quite simple: Fixed attributes allow you to have a default attribute value that you don't have to specify all the time. This makes your XML leaner and cleaner. For example:

<!ELEMENT House EMPTY> <!ATTLIST House type CDATA #FIXED "ranch">

When the parser encounters <House/>, it's the same as <House type="ranch"/>. If the attribute is written out anyway, its value must be "ranch".
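
You can watch the parser fill in the value. This Python sketch relies on ElementTree's underlying Expat parser, which applies attribute defaults declared in the internal subset even though it does not validate; the declarations are placed in an internal subset for that reason.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE House [
  <!ELEMENT House EMPTY>
  <!ATTLIST House type CDATA #FIXED "ranch">
]>
<House/>"""

house = ET.fromstring(DOC)

# The parser supplies the fixed value even though the tag omits the attribute.
print(house.attrib)
```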

--------------------------------------------------------------------------------

FINDING THE XML IN BIZTALK SERVER

We've been talking about pipelines under Microsoft BizTalk Server (BTS). Pipelines establish paths and rules by which data can go from one company's server to the server of another company, completing a business transaction in the process. So, where's the XML? XML markup comes into play as the data fields are transmitted from the source to the destination. While in transit, the data are in a sort of message format, not unlike an email message. Individual pieces of data (quantity, description, delivery date, and so on) are tagged in XML. XML document type definitions (DTDs) define the markup, and it's possible to translate (via mappings you specify) the source XML format into the destination XML format. It's also possible to draw DTDs and other markup specifications from those stored at biztalk.org and other library sites.

--------------------------------------------------------------------------------

ENTITIES

There are two types of entities: general and parameter. General entities are what you normally think of as an entity in XML, and are typically used as replacement text in the content of a document. Parameter entities are used strictly within a DTD, also as replacement text.

General entities are declared like this:

<!ENTITY copyright "MyCompany.com, Inc., 1999">

and are referenced using an ampersand (&) and semicolon (;) as delimiters:

&copyright;

Parameter entities are declared like this:

<!ENTITY % peopleAttrib "name CDATA #IMPLIED age CDATA #IMPLIED weight CDATA #IMPLIED">

and are referenced using a percent (%) and semicolon (;) as delimiters:

%peopleAttrib;

Next time we'll see a common use for parameter entities.

--------------------------------------------------------------------------------

ENCODINGS

Have you ever seen the following and wondered what the heck encoding is and what that funny looking value is?

<?xml version="1.0" encoding="ISO-8859-1"?>

If you're like me, you probably pay little attention to the encoding attribute... and most of the time that's fine. The encoding attribute is used to tell an XML processor what standard the characters in the XML document are encoded with. An encoding standard is simply a specification of how a character is represented in bits--typically how many bits and what character each possible value represents. For instance, ASCII defines 7-bit characters, where the value 97 is the letter a, the value 98 is the letter b, and so on.

In general, most people ignore the encoding attribute. For those of you who have to deal with alternative character sets, I'll provide a few more tips over the next weeks.
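
To make the idea concrete, this Python sketch checks the ASCII values mentioned above and then parses a document stored as ISO-8859-1 bytes. The sample element name and its text are invented for the example.

```python
import xml.etree.ElementTree as ET

# ASCII assigns 97 to the letter a and 98 to the letter b.
print(ord("a"), ord("b"))

# Encode a small document as ISO-8859-1; the accented e becomes the single byte 0xE9.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
data = text.encode("iso-8859-1")
print(b"\xe9" in data)

# The parser reads the encoding declaration and decodes the bytes accordingly.
name = ET.fromstring(data)
print(name.text)
```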

--------------------------------------------------------------------------------

EMPTY ELEMENTS IN XML DOCUMENTS

In our previous tip, we used this DTD to define three empty elements for use in a list of architectural drawings:

<!ELEMENT floorPlan EMPTY>

<!ELEMENT elevation EMPTY>

<!ELEMENT crossSection EMPTY>

Now let's look at the syntax for using these elements in an XML document. Pay close attention--this is one of those situations in which HTML knowledge will cause trouble for you.

In XML, empty elements open with the < character as usual but close with a /> sequence. Therefore, an empty element called isNATOMember, which we might have used in the list of countries we worked with in a previous tip, would appear in an XML document like this:

<isNATOMember/>

That's the general syntax. Here's a document that uses our empty elements:

<?xml version="1.0"?>

<!DOCTYPE drawingList SYSTEM "drawings.dtd">

<floorPlan/>

<elevation/>

<crossSection/>

Save that as projectDrawings.xml. Though you might have predicted that your browser wouldn't display any sort of formatted text (after all, there's no stylesheet defined here), you might not have anticipated the error that appears. Next time, we'll see why we get that error.
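
The empty-element syntax itself can be checked in isolation. This Python sketch shows that <isNATOMember/> is shorthand for a start tag immediately followed by an end tag, and that an HTML-style unclosed tag is rejected. The wrapper element named list and the helper name parses are made up for the example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The empty-element tag and an explicit start/end pair are equivalent.
short_form = ET.fromstring("<isNATOMember/>")
long_form = ET.fromstring("<isNATOMember></isNATOMember>")
print(ET.tostring(short_form) == ET.tostring(long_form))

# An HTML-style unclosed tag is not well formed in XML.
print(parses("<list><floorPlan></list>"))
print(parses("<list><floorPlan/></list>"))
```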

--------------------------------------------------------------------------------

EMPTY ELEMENTS IN A DTD

In our previous tip, we decided to create a DTD that includes definitions of three empty elements: one for floor plans, one for elevations, and one for cross-sections. This DTD would be used to generate lists of drawings associated with a building project.

According to XML syntax, the key to defining an empty element, logically enough, is the keyword EMPTY. Use EMPTY in an element definition, and you've defined an element that requires no closing tag.

The DTD looks like this:

<!ELEMENT floorPlan EMPTY>

<!ELEMENT elevation EMPTY>

<!ELEMENT crossSection EMPTY>

Save that as drawings.dtd.

This DTD allows us to use empty elements called floorPlan, elevation, and crossSection in an XML document. In our next tip, you'll see the syntax to use in the XML document itself.

--------------------------------------------------------------------------------

ELEMENTS OR ATTRIBUTES

Here are a few factors to consider when trying to decide whether to use elements or attributes:

Elements can contain nested elements and content; attributes can contain only content. Obviously, if there's any chance you may need to add nested structure to a data container, elements are the way to go. On the other hand, attributes provide more options for constraining the type of data in the container, and they can contain default values. The ability to limit the possible values and provide default data lets the parser do some of the work for you.

--------------------------------------------------------------------------------

ELEMENT CONTENT

Today's tip describes your options for content type within an element. An element's content type is defined in a DTD using the ELEMENT specifier:

<!ELEMENT ElementName ( .. content type .. )>

The options for content type are element-content, mixed-content, character-content, and empty-content. Element-content consists of nested elements only. Mixed-content can contain elements and character data. Character-content can contain character data only. Empty-content is self-explanatory. Here are samples of each:

<!ELEMENT Name (a,b,c)> - element content
<!ELEMENT Name (#PCDATA | a)*> - mixed content (#PCDATA must come first)
<!ELEMENT Name (#PCDATA)> - character content
<!ELEMENT Name EMPTY> - empty content

--------------------------------------------------------------------------------

ELEMENT ATTRIBUTES

Attributes are used to associate name-value pairs with an element. You declare them in a DTD; in a document, they can appear only in a start tag or an empty-element tag. Here is how they're declared:

<!ATTLIST 'name' 'attribute definitions' >

where 'name' is the name of the element and 'attribute definitions' is a list of attribute definitions for that element. Attribute definitions have a name, type, and default specifier.

Here's an example:

<!ELEMENT Lights (#PCDATA)>
<!ATTLIST Lights state (on|off) "on">

The XML would look like

<Lights state="off"> light sample </Lights>

More on attributes later!

--------------------------------------------------------------------------------

ELECTRONIC BUSINESS XML (EBXML)

Electronic Business XML (ebXML) is an initiative aimed at developing a technical framework that will enable XML to be used for the electronic exchange of business data. The United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT) and the Organization for the Advancement of Structured Information Standards (OASIS) have joined forces to form ebXML. A major aspect of ebXML is the participation of industry leaders in working groups with the intent of generating DTDs and schemas for e-commerce using XML.

If you're interested in e-commerce, ebXML is something to keep an eye on.

Electronic Business XML (ebXML) http://www.ebxml.org/

--------------------------------------------------------------------------------

DTD ELEMENTS--STANDARD AND NESTED

You declare an element in a DTD using the following format:

<!ELEMENT 'name' ('content-spec')>

where 'name' is the name of the tag and 'content-spec' defines what the tag can contain.

There are two basic element types that make up most XML DTDs (and which will be discussed in the next few tips). The first looks like this:

<!ELEMENT 'name' (#PCDATA)>

and is the standard XML tag that contains text data. For example, this:

<!ELEMENT Address (#PCDATA)>

would define a tag that looks like this:

<Address>any text here</Address>

The other type of element consists of a series of nested elements and looks like this:

<!ELEMENT 'name' ('name','name',etc.)>

In this case, the DTD might look like this:

<!ELEMENT Name (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>

and the resulting XML would look like this:

<Name>
  <First>John</First>
  <Last>Doe</Last>
</Name>

You can accomplish a lot with these two simple types of elements. PCDATA allows you to define nodes that contain data, and nested elements enable you to provide structure. Of course, XML offers many more options for declaring elements, and we'll dive into more of those in future tips.

--------------------------------------------------------------------------------

DTD ELEMENTS: THE ROOT ELEMENT

Every XML document has exactly one root element, in which all other elements are contained. When a document uses the DOCTYPE specifier to include a DTD, the DOCTYPE name has to match the root element name for the document to be valid.

For example:

<?xml version='1.0'?>
<!DOCTYPE Order [
<!ELEMENT Order (Customer,Item)>
<!ELEMENT Customer (#PCDATA)>
<!ELEMENT Item (#PCDATA)>
]>

<Order>
  <Customer>John Doe</Customer>
  <Item>Palm V</Item>
</Order>

Note that the name following DOCTYPE and the name of the root element are both Order.
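
As a rough illustration of the rule above, the following Python sketch compares the DOCTYPE name with the root element name. It assumes Python's standard xml.dom.minidom module, which does not validate against the DTD, so this checks only the name match and nothing else. The function name doctype_matches_root is made up for this example.

```python
from xml.dom import minidom

# The Order document from the tip above, with its internal DTD subset.
ORDER_XML = """<?xml version='1.0'?>
<!DOCTYPE Order [
  <!ELEMENT Order (Customer,Item)>
  <!ELEMENT Customer (#PCDATA)>
  <!ELEMENT Item (#PCDATA)>
]>
<Order>
  <Customer>John Doe</Customer>
  <Item>Palm V</Item>
</Order>"""


def doctype_matches_root(xml_text):
    """Return True when the DOCTYPE name equals the root element name."""
    doc = minidom.parseString(xml_text)
    if doc.doctype is None:  # the document has no DOCTYPE declaration
        return False
    return doc.doctype.name == doc.documentElement.tagName


print(doctype_matches_root(ORDER_XML))
```

A validating parser would reject a mismatch outright; this sketch only reports it.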

--------------------------------------------------------------------------------

DTD ELEMENTS: MORE MODIFIERS

XML provides two modifiers you can use to dictate ordering of nested elements.

The comma (,) specifies sequential ordering. For example:

<!ELEMENT Name (First,Middle,Last)>

Name must contain a First, Middle, and Last element, in that order.

<Name>
  <First>John</First>
  <Middle>C</Middle>
  <Last>Doe</Last>
</Name>

The pipe (|) is called a choice modifier. It allows you to specify a list of choices. For example:

<!ELEMENT Sport (Football | Baseball | Basketball)>

Sport must contain one (and only one) of the tags in the choice list.

<Sport> <Basketball>Hey</Basketball> </Sport>

--------------------------------------------------------------------------------

DTD ELEMENTS: MODIFIERS

Recall that you can declare an XML element that contains nested elements. The following example creates a Name element that must contain a First and Last element:

<!ELEMENT Name (First,Last)>

XML provides several modifiers for specifying the number of occurrences of nested tags. The plus sign (+) indicates one or more, the asterisk (*) indicates zero or more, and the question mark (?) indicates zero or one.

Here is an example:

<!ELEMENT Name (First+)>

Name must contain one or more First elements, for example:

<Name> <First>Joe</First> <First>Joseph</First> </Name>

If no modifier is added, the nested element must appear exactly one time.

<!ELEMENT Name (First)>

In this case, Name must contain exactly one First element.

--------------------------------------------------------------------------------

DTD ELEMENTS: MIXED-CONTENT ELEMENTS

Mixed-content elements can contain a mixture of PCDATA and one or more other elements. The content specification of a mixed-content element must start with #PCDATA, followed by | and a list of other element types. In addition, it must end with the zero or more modifier (*).

Here is an example:

<!ELEMENT item (#PCDATA)>
<!ELEMENT items (#PCDATA | item)*>

<items>
  Here is some text
  <item>item 1</item>
  and some more text
  <item>item 2</item>
</items>

--------------------------------------------------------------------------------

DTD ELEMENTS: EMPTY ELEMENTS

As you may know, XML has a shortcut syntax for specifying an empty element:

<Name></Name> can also be expressed as <Name/>

Likewise, you can specify that an element must be empty in the DTD element declaration:

<!ELEMENT myFlag EMPTY>

With this declaration, <myFlag>On</myFlag> is invalid!
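
To see that the two spellings of an empty element really are equivalent, here is a small Python sketch using the standard xml.etree.ElementTree module. The helper is_empty is a name invented for this example. Note that ElementTree does not validate against a DTD, so this only inspects the parsed content; it does not enforce an EMPTY declaration.

```python
import xml.etree.ElementTree as ET


def is_empty(element):
    """True when the element has no character data and no child elements."""
    return element.text is None and len(element) == 0


long_form = ET.fromstring("<Name></Name>")
short_form = ET.fromstring("<Name/>")
with_text = ET.fromstring("<myFlag>On</myFlag>")

print(is_empty(long_form), is_empty(short_form))  # both forms parse as empty
print(is_empty(with_text))                        # this one carries text
```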

--------------------------------------------------------------------------------

DTD ELEMENTS: ANY ELEMENTS

The ANY element specifier indicates an element that can contain any other defined element or PCDATA. It's declared by using the ANY keyword as the content specifier:

<!ELEMENT anything ANY>

The following is an example of an XML file using an ANY element:

<?xml version="1.0"?>

<!DOCTYPE Customer [
<!ELEMENT Customer (Name)>
<!ELEMENT Name ANY>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT Middle (#PCDATA)>
]>

<Customer>
  <Name>
    <First>John</First>
    <Last>Doe</Last>
  </Name>
</Customer>

The Name element can contain any of the other defined elements (in any order), as well as PCDATA.

--------------------------------------------------------------------------------

DTD ELEMENTS: AN EXAMPLE

The past few tips have focused on declaring elements in a DTD. Now, we want to provide some examples using these techniques.

<!ELEMENT Order (Customer, Item*)+>

The Order element must contain a Customer element, followed by zero or more Item elements (the asterisk allows none at all). The plus (+) sign indicates that this sequence can occur one or more times.

<Order>
  <Customer> .. </Customer>
  <Item> .. </Item>
  <Customer> .. </Customer>
  <Item> .. </Item>
  <Item> .. </Item>
</Order>

<!ELEMENT Food (Egg | Apple | Steak)+>

Food can contain one or more occurrences of Egg, Apple, or Steak, in any order.

<Food> <Steak> .. </Steak> <Egg> .. </Egg> <Egg> .. </Egg> </Food>

<!ELEMENT Document (Title, Author+, (Para | Img)+, Summary?)>

Document must contain a Title, followed by one or more Author tags, followed by one or more (Para or Img) tags, optionally followed by a Summary tag.

<Document>
  <Title> .. </Title>
  <Author> .. </Author>
  <Img> .. </Img>
  <Para> .. </Para>
  <Para> .. </Para>
</Document>

I'm sure you've guessed by now that the options are endless, but we hope these examples have given you a taste of how you can mix and match element modifiers to define the semantics of your XML documents.
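
One way to get a feel for these modifiers is to notice that a content model behaves much like a regular expression over the names of an element's children. The Python sketch below hand-translates the Document declaration above into a regex and tests two child sequences against it. This is only an illustration, not how a validating parser works internally, and the names DOCUMENT_MODEL and children_match are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from:
#   <!ELEMENT Document (Title, Author+, (Para | Img)+, Summary?)>
# Each child name is followed by one space so that names cannot run together.
DOCUMENT_MODEL = re.compile(r"Title (Author )+((Para|Img) )+(Summary )?")


def children_match(xml_text, model):
    """Check the root element's direct children against a regex content model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return model.fullmatch(child_names) is not None


in_order = "<Document><Title/><Author/><Img/><Para/><Para/></Document>"
out_of_order = "<Document><Author/><Title/><Img/><Para/></Document>"

print(children_match(in_order, DOCUMENT_MODEL))      # True
print(children_match(out_of_order, DOCUMENT_MODEL))  # False
```

The comma becomes simple concatenation, the pipe stays a pipe, and +, *, and ? keep the same meaning they have in the DTD.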

--------------------------------------------------------------------------------

DOM AND SAX

XML parsers process XML documents, making the elements and data available to an application via an API. There are two parsing methodologies used for XML: DOM and SAX. An XML parser supports one or both of these APIs. DOM allows you to read the XML data using API calls to walk a nested tree structure. It's useful if your application is concerned with the structure of the document. SAX is based on callbacks; your application is called as each element is encountered while parsing the document. SAX is great for parsing large documents, since all of the data isn't pulled into memory at one time.
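
To make the contrast concrete, here is a minimal Python sketch that reads the same small document both ways, using the standard library's xml.dom.minidom (DOM) and xml.sax (SAX). Other languages and parsers expose the same two styles with different names; the function and class names here are invented for the example.

```python
import xml.sax
from xml.dom import minidom

NAME_XML = "<Name><First>John</First><Last>Doe</Last></Name>"


def dom_child_names(xml_text):
    """DOM: load the whole document into a tree, then walk it."""
    doc = minidom.parseString(xml_text)
    return [node.tagName
            for node in doc.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]


class ElementCollector(xml.sax.ContentHandler):
    """SAX: the parser calls startElement as each start tag is encountered."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def sax_element_names(xml_text):
    handler = ElementCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


print(dom_child_names(NAME_XML))    # ['First', 'Last']
print(sax_element_names(NAME_XML))  # ['Name', 'First', 'Last']
```

The DOM version can move around the tree freely after parsing. The SAX version sees each element once, in document order, and keeps only what the handler chooses to store.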

--------------------------------------------------------------------------------

DISPLAYING XML

For a large class of applications, XML is a tool for the electronic exchange of data. In these cases, your XML may never be seen by human eyes. However, XML is also commonly used as an intermediary format for data that may be viewed through many types of interfaces, from Web browser to smart-phone. Typically, the content is stored as XML. When it is requested by a client, it is transformed into the client-specific format (HTML for a browser, for example).

You could, of course, use a homegrown solution to transform the XML--but fortunately, there are two widely accepted technologies to fill the void: Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL). CSS is a relatively simple technology that was originally created for HTML and has been extended for XML. XSL is more like a scripting language and has become quite popular. It is typically mentioned in conjunction with XSL Transformations (XSLT), the part of XSL that converts an XML document into another format. An XSL stylesheet describes how an XML file should be presented, and an XSLT processor applies it to produce the client format.
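
For a sense of what the transformation step involves, here is a deliberately homegrown sketch in Python that turns a small Order document into an HTML fragment. It is not XSLT or CSS, and the element names and the function order_to_html are assumptions made for this example. A real stylesheet would express the same mapping declaratively.

```python
import xml.etree.ElementTree as ET
from html import escape

ORDER_XML = "<Order><Customer>John Doe</Customer><Item>Palm V</Item></Order>"


def order_to_html(xml_text):
    """Render an Order document as a small HTML fragment for a browser client."""
    order = ET.fromstring(xml_text)
    customer = escape(order.findtext("Customer", default=""))
    items = "".join(
        "<li>" + escape(item.text or "") + "</li>"
        for item in order.findall("Item")
    )
    return "<h1>Order for " + customer + "</h1><ul>" + items + "</ul>"


print(order_to_html(ORDER_XML))
```

A second function producing a different format from the same XML (for a phone, say) would leave the stored content untouched, which is the point of keeping content and presentation separate.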

--------------------------------------------------------------------------------

CONDITIONAL SECTIONS--PART 1 OF 2

Most programming languages support the notion of conditionally compiling a section of code into the final executable, based on the presence of a keyword. XML supports a similar concept, known as conditional sections. A conditional section can appear only in the external subset of a DTD.

Here's an example of how you might declare a conditional section:

<![INCLUDE [
<!ELEMENT DebugRecord (timestamp,description)>
<!ELEMENT timestamp (#PCDATA)>
<!ELEMENT description (#PCDATA)>
]]>

The INCLUDE keyword tells the processor to include this section in the DTD. If IGNORE were used instead, the section would not be included. In the next tip, I'll show you a good use for conditional sections.

--------------------------------------------------------------------------------

CONDITIONAL SECTIONS--PART 2 OF 2

In our previous tip, we showed you how to optionally include a section in a DTD. Today's tip will give you a concrete example of how this might be useful. It's pretty typical in the real world to have a system in production and still have ongoing application development. The sample code below shows how you can optionally include a DebugRecord in your XML files, which can be very useful during development or for debugging production problems.

Here is the XML file. Note the entity debug and the inclusion of debug.dtd:

<?xml version="1.0"?>
<!DOCTYPE Customer SYSTEM "debug.dtd" [
<!ELEMENT Customer (First,Last,DebugRecord?)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
...

<!ENTITY % debug "INCLUDE">
]>

<Customer>
  <First>john</First>
  <Last>allen</Last>
  <DebugRecord>
    <timestamp>88</timestamp>
    <description>howdy</description>
  </DebugRecord>
</Customer>

and here is debug.dtd:

<![%debug;[
<!ELEMENT DebugRecord (timestamp,description)>
<!ELEMENT timestamp (#PCDATA)>
<!ELEMENT description (#PCDATA)>
]]>

By setting the debug entity to INCLUDE, each XML file can optionally include a DebugRecord. Set this entity to IGNORE to leave it out.

--------------------------------------------------------------------------------

COMMON WML ELEMENTS

Today's tip lists some common WML elements and what they do:

<wml> </wml> Root element for all WML decks

<head> </head> As in HTML, holds optional information about the deck as a whole.

<card> </card> Used to define a card in the deck. Decks can have multiple cards.

<table> </table> Used to define a table, just like HTML.

<setvar> </setvar> Used to set the value of a deck-wide variable. (More on this later.)

<go> </go> Similar to the HTML <a>; specifies a URL to go to based on a user action.

<prev> </prev> Navigate to the previous card.

<input> </input> Creates an edit field for data entry.

These are just a few frequently used elements to give you a flavor of how WML works. I hope I've piqued your curiosity!

--------------------------------------------------------------------------------

COMING TO A THEATER NEAR YOU

That's right--XML is on video. "Introduction to XML" is an executive-level introduction to XML available on VHS tape. If you're looking for nuts and bolts, you won't find them here--but if you're looking for an excellent introduction to XML and ways to use it to implement systems, this tape is for you. I have to admit, I haven't purchased the video for myself yet--I'm holding out for the DVD version with Dolby Digital Surround Sound.

Introduction to XML (VHS) Director: Bryan L. Bell http://www.amazon.com/exec/obidos/ASIN/0967848806/tipworld

--------------------------------------------------------------------------------

CHARACTER REFERENCE

What do you do if you want to embed in your XML a character that you cannot type on your keyboard? To handle this, XML supports character references. A character reference allows you to specify a number that, when parsed, will be replaced by the equivalent Unicode character. A character reference starts with &#x followed by the hexadecimal character code, or &# followed by the decimal character code, and ends with a semicolon. For example, to display a copyright symbol (decimal 169, hexadecimal A9) you would use the following:

<copyright>&#169; 2000 My Company, all rights reserved.</copyright>
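
The short Python sketch below (using the standard xml.etree.ElementTree module, chosen here only for illustration) shows that the decimal and hexadecimal forms of the reference produce the same character once parsed.

```python
import xml.etree.ElementTree as ET

# Decimal 169 and hexadecimal A9 both name the copyright sign, U+00A9.
decimal = ET.fromstring("<copyright>&#169; 2000 My Company</copyright>")
hexadecimal = ET.fromstring("<copyright>&#xA9; 2000 My Company</copyright>")

print(decimal.text == hexadecimal.text)  # True
print(decimal.text[0] == "\u00a9")       # True
```

By the time the application sees the text, the reference is gone and only the character itself remains.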

--------------------------------------------------------------------------------

CDATA

If you need to put into an XML document a chunk of text that should be treated as literal character data rather than interpreted as markup, the CDATA section is for you. A CDATA section takes the following form:

<![CDATA[ 'your stuff here' ]]>

CDATA sections are basically a convenience for document authors. A common use would be to embed an example of XML that you don't want to be mistaken for markup. For example, using this:

<![CDATA[ <name>Jane</name> ]]>

the <name> tag would not be interpreted by a parser as markup.
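
Here is a quick way to confirm this behavior, sketched in Python with the standard xml.etree.ElementTree module. The wrapper element name example is made up for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<example><![CDATA[ <name>Jane</name> ]]></example>")

# The parser creates no child element for <name>; the whole section,
# including its surrounding spaces, arrives as ordinary text.
print(len(root))        # 0
print(repr(root.text))  # ' <name>Jane</name> '
```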

--------------------------------------------------------------------------------

BUT IT WON'T RENDER...

You may wonder why we've been ignoring the fact that the documents we've created don't render. When you load them into your browser, all you see is a listing of XML code. We haven't even tried to create a stylesheet.

The point of XML is not to generate pretty documents for publication on a network. XML documents CAN be made to render prettily, but it's almost as if the facilities for doing that are afterthoughts. XML is designed to serve as a container for information, which is to be extracted and used by a computer program.

In the case of the list of drawings we've been working on, a program might look at the list documents, determine which drawings relate to which projects, and present the user with an attractive interface for accessing his or her image files. The real interface-rendering work would be done by the program that read the XML files and extracted information from them.

--------------------------------------------------------------------------------

BOOKS

A few readers have asked me to suggest good XML books, particularly those related to Java. Here are three I recommend:

Java and XML