              December 02, 2004

              Dr. Michael Kay explains the benefits of Schema-Aware processors

              As many of you already know, the XSLT 2.0 spec describes two types of processors: Basic and Schema-Aware. Earlier today Jay Bryant posted a question to XSL-List asking:


              I imagine that this question has come up in the past, but I thought I’d ask anyway: What are the benefits of schema-aware XSL processors?

              I’d like to learn more about the issue in both general terms and as it applies to the specific application I’m working on. I’m consulting with a firm that specializes in data warehousing for the insurance industry. We are using XML for a variety of document-production purposes (we started with a dynamically generated data dictionary and are now working on dynamically generating online help, etc.), but we are also exploring the xml->database and database->xml potential of XSL. What benefits might we derive from using a schema-aware processor in such an environment?

              Thanks.

              Jay Bryant
              Bryant Communication Services


              This question has been asked several times on the list, and there is in fact still an open dialogue between Dr. Kay and me in which I have tried to drill down a bit further into the possibilities that Schema-Aware processing opens up. I hope to find time to finish that dialogue at some point, as I feel it is an important area to focus on. In the meantime, I am posting the answer Dr. Kay gave to Jay’s question. As the editor of the XSLT 2.0 specification, Dr. Kay has an obvious edge in understanding what the Schema-Aware portion of the spec means for XSLT developers. His answer is quite lengthy, so I have placed it in the extended part of this entry.

              Here is the full text of Dr. Michael Kay’s answer to Jay Bryant’s question about Schema-Aware processing:

              The benefits fall into two categories: robustness and optimization. Optimization is still a theoretical, speculative benefit, so I’ll concentrate on robustness.

              You can argue the case in high-level abstract terms, or with low-level coding examples. Let’s try a bit of each.

              Firstly, stylesheets are written with knowledge of the input and output schema, but at the moment this knowledge is in the programmer’s head and isn’t shared with the compiler. This means that when the programmer makes mistakes, due to incorrect reading of the schema, or perhaps because the schema has changed, the compiler cannot detect them. It’s good software engineering discipline to describe the inputs and outputs of a component (the preconditions and postconditions) and this applies to XSLT as much as anything else. The more complex the schema becomes (and some industrial schemas are very complex indeed), the harder it is for the programmer to keep everything in their head. In addition, it’s very hard to achieve 100% test coverage. Many schemas contain parts that are only rarely used, but if you want to produce a production-quality stylesheet you need confidence that it can handle everything that will be thrown at it.

              At a practical level there’s no doubt that debugging and testing XSLT stylesheets is currently rather difficult, and most of us don’t do it very rigorously. We tend to test a stylesheet on a rather small sample of input documents, and we check the output visually to see if it looks OK, perhaps running a few sample outputs through a schema checker if we’re being conscientious. When we get things wrong it can be very hard to spot where the trouble is, especially if it’s in code written by someone else a while back when the schema was rather different from the way it is now.

              I like to demonstrate this by taking a correct stylesheet and introducing random errors, and showing how without a schema they produce bad output that can be very difficult to spot (in one example I can cite, it meant that out-of-range numbers were not being highlighted as they should have been, and no-one noticed), while if you make the same error with a schema-aware processor, you get an explicit error message telling you exactly what’s wrong.

              If you can define the schema for your input and output documents and make this known to the XSLT processor, this can make a big difference to the development cycle. In practical terms, the biggest benefit I’ve seen is from integrated validation of the result document: if your stylesheet tries to write invalid output, you get an error message pinpointing exactly where the error in your stylesheet is, rather than 300 identical errors from the schema processor telling you that the output is wrong, which you probably knew anyway. At present with Saxon this validation is mainly done at run-time, but more and more of it will be done at compile time, which means that you even get to know about errors in code that hasn’t been executed because you haven’t written the test data for it yet.
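In XSLT 2.0 a stylesheet requests this by importing the result schema with `xsl:import-schema` and asking for validation, for example `validation="strict"` on `xsl:result-document` or `xsl:validation="strict"` on a literal result element. The Python below is not taken from Dr. Kay's reply and is only an analogy of the principle of failing at the offending write rather than after the whole document is built. `SCHEMA` is a hypothetical, drastically simplified stand-in for a real XML Schema (it only lists which children each element may contain), and `ValidatingBuilder` is a hypothetical helper, not part of Saxon or any XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical, drastically simplified "output schema": for each element
# name, the set of child element names it may contain. A real XML Schema
# also constrains order, cardinality, attributes and simple-type values.
SCHEMA = {
    "report": {"policy"},
    "policy": {"id", "premium"},
    "id": set(),
    "premium": set(),
}


class ValidatingBuilder:
    """Builds an ElementTree and checks each element at the moment it is written."""

    def __init__(self, root_tag):
        if root_tag not in SCHEMA:
            raise ValueError(f"<{root_tag}> is not a declared element")
        self.root = ET.Element(root_tag)

    def add(self, parent, tag, text=None):
        # Fail at the exact write that would make the output invalid,
        # naming the offending element and its parent; the tree is left unchanged.
        if tag not in SCHEMA[parent.tag]:
            raise ValueError(f"<{tag}> is not allowed inside <{parent.tag}>")
        child = ET.SubElement(parent, tag)
        if text is not None:
            child.text = text
        return child


if __name__ == "__main__":
    builder = ValidatingBuilder("report")
    policy = builder.add(builder.root, "policy")
    builder.add(policy, "id", "P-1001")
    try:
        builder.add(policy, "premuim", "250.00")  # misspelled element name
    except ValueError as err:
        print(err)  # <premuim> is not allowed inside <policy>
```

The single error names the faulty write directly, instead of a separate validator reporting after the fact that the finished document is wrong.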

              I’ve yet to see such a big impact from using a schema for the input document, but I think the potential is there too. The biggest potential advantage is better reuse of stylesheet code, and better resilience to changes in the input schema, by driving your template rules from schema types rather than lexical patterns. This mainly applies to the kind of complex schemas with hundreds of element types. In such schemas there is usually some kind of type hierarchy, and if it is well-designed then you should be able to get the kind of benefits you see from object-oriented programming, by writing code that’s generic or specific as the need arises.
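In XSLT 2.0 a template rule can match a schema type instead of an element name, for example `match="element(*, address-type)"` (where `address-type` is a hypothetical type name), and such a rule also matches elements whose type is derived from it. The Python sketch below is not taken from Dr. Kay's reply; it is an analogy using `functools.singledispatch`, whose dispatch follows the class hierarchy in a similar way. The classes `Address`, `UKAddress` and `USAddress` are hypothetical stand-ins for schema types.

```python
from dataclasses import dataclass
from functools import singledispatch


# Hypothetical type hierarchy standing in for schema types:
# UKAddress and USAddress both derive from Address.
@dataclass
class Address:
    street: str


@dataclass
class UKAddress(Address):
    postcode: str


@dataclass
class USAddress(Address):
    zip_code: str


@singledispatch
def render(node):
    # Fallback rule for values with no more specific rule.
    return "<unknown/>"


@render.register
def _(node: Address):
    # Generic rule: applies to Address and to any subtype without its own rule.
    return f"<address>{node.street}</address>"


@render.register
def _(node: UKAddress):
    # Specific rule: chosen over the generic one for UKAddress.
    return f"<address>{node.street}, {node.postcode}</address>"


if __name__ == "__main__":
    print(render(UKAddress("10 High St", "AB1 2CD")))  # specific rule
    print(render(USAddress("1 Main St", "12345")))     # falls back to the generic rule
```

`USAddress` has no rule of its own, yet it is still handled by the generic `Address` rule; that is the kind of resilience to new or changed types the paragraph above describes.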

              I hope that gives you something to think about!

              Michael Kay
              http://www.saxonica.com/

            Posted by m.david: December 2, 2004 04:55 PM GMT


          © 2005 :: <XSLT:Blog/> (xsltblog.com) is a product of M. David Peterson and FunctionalX Consulting. See licensing info below.
          Except where otherwise noted, this site’s content and source code is licensed under the Attribution License from Creative Commons.