  • XSLT:Blog[@author = 'M. David Peterson']/Main: April 2005 Archives
              • April 30, 2005

                Not exactly a screen shot... but at least it's something more than nothing.

                The time I have had to spend on getting ChannelXML to a halfway usable state has been minimal... Can't blame anyone other than myself, as it's my own ridiculous habit of having my finger in 27 pots at any given moment that is the cause of such a vicious clock-cycle... I wouldn't do it if I didn't love it, so it makes the chance for any sympathy points even less than it was (which had to be pretty close to zero)...

                I finished this logo last night for the left side panel of the main page and thought I would quickly post it so that at least something that represented progress could be seen. I assure you that I have A LOT more done than this, but it's in a state of incompleteness -- too much so to warrant any further sneak peeks... but hopefully today will be a lot less hectic than yesterday and as such there will be more to show a bit later. One can always hope anyway... :D

                ChannelXML - Left-Panel Logo

                How this fits into the site design will hopefully make a lot more sense when you see the actual site, but hopefully it doesn't seem too out of whack/place.

                Posted by m.david at 01:21 AM | Comments (0) | TrackBack

                April 29, 2005

                Wait! Before we get ahead of ourselves, is there possibly another way to look at pipelining?

                Understanding XML: Moving through the Pipeline

                [UPDATE: Consider the idea that this could quite easily become an "extension specification" to the XSLT 2.0 spec: in essence, take this exact proposal, re-edit it to keep only the proposed p-namespaced elements that do not exist in the xsl namespace, and explain the general idea behind how this would/could work. I would bet that you could then see this proposal reach recommendation in record time, quite possibly at or around the same time as XSLT 2.0, XPath 2.0, and XQuery 1.0 get the golden stamp of approval, given that it's about this time next year that most people have mentally placed the final rec hittin' the bitwaves for these three beasts of burden.]


                [ORIGINAL POST]
                I've seen both the original announcement and several posts as follow-up, including this one from Kurt. Just to play Devil's Advocate I thought I would propose the following question:

                With as ridiculously close as this specification is to XSLT, why not just... wait, before you fill in the blanks here, I wasn't planning on continuing with "use XSLT" -- instead, what I've been thinking about (and am using as we speak as I continue work on the ChannelXML/WWWebTop project -- second iteration of possible additions to the yet-to-be-released LispML project) is actually allowing dynamically created pipelining that is directly embedded in the transformation, enclosing the end-user output in a "sequenced pipeline shell" of sorts that is namespaced (of course! :) and is processed by the specified process for each pipeline sequence contained in the output of the transformation.

                For example...

                Let's say I want to dynamically create a "pre-cache" of the next five posts to my blog such that I can then tell the client, upon receiving the HTTP stream, to create 5 new DOM objects and store within these the content of the five pre-cached entries (note: in thinking through this particular thought I realized about 10 better ways to invoke this particular example, but I'll stick with it with the disclaimer that this isn't necessarily the perfect use case). These could very easily be represented by a reference to an Atom feed, so in reality there's no actual content being sent with the stream, just a namespaced sequence that tells the browser, once the document is loaded, to use XMLHTTP [given that in this particular case we would be accessing an XML data feed, I guess it would be more accurate and appropriate to use the document function via XSLT, loading the result of the transform into the DOM object, now wouldn't it :D] to access a particular Atom feed and cache the contents to give the impression of an even faster load time. The actual stream received may look like:


                [Please Note: If reading this from the front page you will probably not see any code in the box below. Please access the actual content page for this post to view the sample]

                ---
                HTTP HEADER STUFF

                <html>
                <head>
                <title>foo</title>
                <script src="pipeline.js" type="text/javascript"></script>
                <p:preCacheDOM src="nextFive.xml">
                <script type="text/javascript">
                <p:for-each select="document(parent::*/@src)/atom:feed/atom:entry">

                <!-- Wow! I really need to pay attention when writing sample code so that it is at least somewhat accurate! Before this update this read "select="parent::*/@src/atom:feed/atom:entry"", which would obviously be worthless, returning an empty node-set instead of the document we desire. Actually, the way it's written at the moment wouldn't really work either unless the embedded stylesheet was run over the top of itself, which is possible but not in the format it currently exists in. I'll assume you get the point and not worry about the obvious fact that this wouldn't work in its current state... -->

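                <!-- atom:id is usually a URI, so a real version would first have to map it to a valid JavaScript identifier -->
                <p:variable name="id" select="atom:id"/>
                var <p:value-of select="$id"/> = new DOMDocument();
                <p:value-of select="$id"/>.Load("<p:value-of select="atom:link/@href"/>");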
                </p:for-each>
                </script>
                </p:preCacheDOM>
                <script type="text/javascript">
                processPipeline();
                </script>
                </head>
                <body>
                ...
                </body>
                </html>

                ---
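To make the intent of the sample above a bit more concrete, here is a minimal sketch in Python (not the p-namespaced markup itself) of the lookup the p:for-each is meant to perform: walk the entries of an Atom feed and collect each entry's id and link href, which is what a client would need in order to pre-cache them. The Atom 1.0 namespace URI, the sample feed, and the function name `entries_to_precache` are all assumptions made for illustration; the post does not say which Atom version nextFive.xml would use.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed uses the Atom 1.0 namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical stand-in for the contents of nextFive.xml.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.org,2005:post-1</id>
    <link href="http://example.org/posts/1.xml"/>
  </entry>
  <entry>
    <id>tag:example.org,2005:post-2</id>
    <link href="http://example.org/posts/2.xml"/>
  </entry>
</feed>"""


def entries_to_precache(feed_xml, limit=5):
    """Return (entry id, link href) pairs for up to `limit` feed entries."""
    feed = ET.fromstring(feed_xml)
    pairs = []
    for entry in feed.findall("atom:entry", ATOM_NS)[:limit]:
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href") if link is not None else None
        pairs.append((entry_id, href))
    return pairs


for entry_id, href in entries_to_precache(SAMPLE_FEED):
    print(entry_id, "->", href)
```

A real client would then fetch each href and hold the result in memory; that step is left out here because it depends entirely on the browser or platform doing the caching.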
                From a client side browser app this would become even more compelling for dynamically generated data that is to be used as part of an XForms application, for example. In reality the "p" namespace could just as easily have been "xsl" and could have been invoked either by client-side XSLT or a call back to the server via XMLHTTP, whichever is appropriate based on the client browser.

                But obviously the client is only one piece of where this concept could be applied. Whether the sequence is invoked via a client-side process or on the server, which then continues the pipeline until there are no more "p" namespaced sequences in the output to process, is of no great concern. As long as the ability exists to process the embedded pipeline sequence, I can't foresee anything that would keep this concept from simply working.
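The "keep processing until no p-namespaced sequences remain" idea can be sketched roughly as below. This is only an illustration under stated assumptions: the namespace URI `urn:example:pipeline`, the handler names `outer` and `inner`, and the `max_passes` safety limit are all hypothetical, and the sketch assumes the root element itself is not in the pipeline namespace. Each handler takes a pipeline element and returns the list of elements that should replace it; the output of one pass may contain new pipeline elements, which the next pass picks up.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the post never defines one for the "p" prefix.
P_NS = "urn:example:pipeline"


def is_pipeline(el):
    """True if the element lives in the (assumed) pipeline namespace."""
    return isinstance(el.tag, str) and el.tag.startswith("{" + P_NS + "}")


def expand(parent, handlers):
    """Replace pipeline children found under `parent`; return how many were replaced.

    Tail text following a replaced element is dropped in this sketch.
    """
    count = 0
    # Walk children back to front so earlier indexes stay valid after edits.
    for index, child in reversed(list(enumerate(parent))):
        if is_pipeline(child):
            name = child.tag.split("}", 1)[1]
            replacement = handlers[name](child)
            parent.remove(child)
            for offset, new_el in enumerate(replacement):
                parent.insert(index + offset, new_el)
            count += 1
        else:
            count += expand(child, handlers)
    return count


def process_pipeline(root, handlers, max_passes=10):
    """Run passes until one finds nothing to expand; return the number of working passes."""
    for passes in range(max_passes):
        if expand(root, handlers) == 0:
            return passes
    raise RuntimeError("pipeline did not settle within max_passes")


def outer(el):
    # Emits ordinary output that itself embeds another pipeline step.
    wrapper = ET.Element("div")
    ET.SubElement(wrapper, "{" + P_NS + "}inner")
    return [wrapper]


def inner(el):
    span = ET.Element("span")
    span.text = "done"
    return [span]


doc = ET.fromstring(
    '<html xmlns:p="urn:example:pipeline"><body><p:outer/></body></html>'
)
passes = process_pipeline(doc, {"outer": outer, "inner": inner})
print(passes, ET.tostring(doc, encoding="unicode"))
```

Whether the handlers run on the client or on the server makes no difference to the loop, which is the point of the paragraph above: all that matters is that something keeps expanding the embedded sequences until none are left.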

                The greatest benefit that I can see (beyond the obvious) is that this would allow for a completely dynamic specification that would give room for growth as new processing sequences or pipeline processing applications are developed. It would be easy enough to take an AspectXML approach to this that, in essence, would correctly embed a processing sequence based on the particular platform or transformation engine or ... As long as there exists a definition file that properly maps the correct aspects to the processing sequence in question (or, if left out, the correct logic put into place to allow for a default sequence, or no sequence at all, etc...) then this type of system could become really quite powerful and useful...

                Anyway, just a random thought that I thought I would throw out there (with personal backing to the idea that this methodology of XSLT processing (2.0) works and works quite well) just to see what the rest of you might think or might have to add to this. It just seems to me that if the effort is going to be made to get a pipelining specification in place, then maybe all possible avenues should be considered before putting in all that work only to find that, in 2-3 years when final recommendation is reached, nobody really cares anymore....

                Posted by m.david at 08:06 PM | Comments (0) | TrackBack

                April 28, 2005

                The Day in Pictures

                saxon.net_watertower.jpg

                It's sad when vandalous acts such as this are given the air time their creators so lust after... CRIMINALS... ALL OF 'EM!!! ;)

                DISCLAIMER: The author of this entry claims no affiliation with whomever it was that could be responsible for such a horrific act as to deface the property of one of our public utility companies such as this... To his credit/defense however he states "I was just trying to speak for the common hacker... he/she who is never given a chance to speak for him/her self and instead has his/her words spoken for him/her. Power to the Hack[him|her]s! :D"

                Posted by m.david at 12:13 PM | Comments (0) | TrackBack

                Kurt Cagle speaks to GenerationXML

                Understanding XML: The XML Generation

                I believe that we are looking at another cohort in the process of forming (which I also believe readily marks some good opportunities down the road for investors). I call these people Generation XML (with a nod to my good friend M. David Peterson, who first suggested the term).

                Hey cool! Thanks for the nod Kurt :D

                This is a nice overview that Kurt has brought together that's definitely worth the read. I actually have planned for quite some time to take GenerationXML.com and create an XML development community group blog/e-zine as a place where members of the XML generation can publish their work, communicate with others, etc... But with as many projects as I have going at one time, there never seems to be the time to build it. I think that the framework that is coming together for ChannelXML (using my UI/Desktop extension project, WWWebTop, as the core UI and framework that it extends from) would fit quite nicely with what I envisioned the GenerationXML project to be, so maybe I can kill 10 birds (binarily (sp?) speaking) with 1 stone (unfortunately 1 is just 1 in binary so no "oh look how clever you are :)" points can be gained... actually they probably can't be gained with "10" either... hmmmm, Damn!)

                I will add GenerationXML.com to my potential Channel for ChannelXML list as I think it would fit quite nicely into this arena. More on this "Real Soon Now." :)

                Posted by m.david at 11:55 AM | Comments (0) | TrackBack

                April 26, 2005

                For Rent | The top 150px of XSLTBlog

                NOTE: No, I haven't given in to the obvious temptation that exists from the potential of adding an extra $20 US Dollars to my bank account (I wouldn't even know where to start to spend that kind of money! ;) by adding Google ads to the top of my site... Tempting yes, but NEVER gonna happen...

                No, this is something much more exciting than allowing Google to rape-and-pillage the content of my site, selling the rights to place ads that I have no control over right alongside the content that I do have control over. In essence this is a chance for you and/or your company to be given access to the top 150px of this site for a period of 30 days, enough time to allow this content, whatever you may choose for it to be, to be seen well over 100,000 times by a base of around 5,000 regular site visitors and 15-18,000 who will stop by every once in a while to gander at what I might be bitching and moaning about on any given day... Interested? Then read on...

                Last week G. Ken Holman made an announcement to XSL-List in which he made a call for community contribution to help close out a majority of the remaining issues with Saxon 6.5.3, the current recommended version of Saxon for XSLT 1.0 production environments. It seems that Ken has been able to negotiate with Dr. Kay an amount that it would take for him to temporarily set things aside and finish out this list of issues, bringing Saxon 6.5.4 into existence which would contain a majority of the fixes necessary to bring this release even closer to near perfect conformance with the 1.0 specification.

                I contacted Ken and asked, beyond financial contribution, what it was he felt I could do to help make this a reality. I quickly received a reply...

                In discovering the actual dollar figure that would allow Dr. Kay to justify temporarily setting aside work on the next Saxon 8.x release, which is now funded via a base of customers who have either licensed Saxon SA or have asked for customization work, I was shocked by how little he was asking compared to what I would have expected for the amount of effort required to make this happen. As such I suggested to Ken that I send out a call for a corporate sponsor to simply pay the fee and in return receive all the praise that would come from a loving community who would forever idolize them, looking to them as a source of inspiration for many, many years to come. I then realized that it just might be possible that a potential corporate sponsor might not buy in to my... hmmm, how should I say... obvious attempt to sweet talk them out of their hard earned money? So instead I decided to pimp out my blog... well, at least the top 150px of it, to the first person who contacts Ken and, upon discovering the amount of the requested fee, agrees to pay it, working directly with Ken to make this happen. Upon receiving word back from Ken that things have been taken care of, I will then await your contact as to just what it is you want to use this space for. I do reserve the right to reject content that I deem inappropriate (e.g. slander towards a competitor's product, "M. David Peterson is a big phreaking geek", you know, that kind of stuff ;) or that doesn't seem to serve any real purpose to help promote whatever it is your company is in the business of doing or making. In other words, please only do this if you plan to use the space to help promote a product and not as a billboard for a bunch of worthless crap. There's enough worthless crap on this blog; nobody is gonna wanna have to see any more than they are already forced into reading just to gain access to that occasional nugget of wisdom and insight I pay someone else to black label for me ;)

                So, there you have it. This is literally on a first come, first served basis, so even if it's just an "I'm interested, tell me more" email you may want to hurry up and send it out so that you can be as close as possible to first on Ken's list and as such the potential suitor to a 30 day romp with the top 150px of XSLTBlog.

                Ready, set, go!

                Posted by m.david at 03:08 PM | Comments (2) | TrackBack

                April 25, 2005

                Praise for SubversionSharp, Denis Gervalle, and Softec

                I have been meaning to make a post to sing praises of joy for the development and availability of the SubversionSharp project via Denis Gervalle and his company Softec. I had decided to do this last Tuesday, as Tuesdays generally tend to be the most active as far as visitors to this blog. But because of the preceding week's "vacation" this didn't happen as planned. I wanted to ensure the maximum reach possible, as I feel this to be an EXTREMELY important project in regards to the development of several of my own projects as well as a multitude of possibilities that can easily be developed using Subversion as the underlying "content/data control engine". As such I decided to wait until today to send this out over the wire. I believe this project becomes all the more important with the current push towards a decentralized web, which has been given what I consider to be a GIGANTIC boost via Adam Bosworth's keynote (Thursday morning, last week, MySQL users conference) in which he specified the fact that Google is working towards a decentralized system in which they are no longer housing and serving up the content requested via search queries and instead providing services that (NOTE: From this point forward in this sentence I am assuming the rest, as I don't think he specified exactly what services would be provided) will enable this decentralized system to stay connected and up to date with the rest of the decentralized web.

                In fact it was the announcement of the availability of a summary of this keynote to XML-DEV via Jonathan Robie that prompted my rush to announce my ChannelXML project which, in essence, embraces the ideals of a decentralized system, allowing access to both system and user created/hosted "content channels" which can contain any combination of XML feed references that (generally speaking, although there is no requirement as such) all rally around a specific theme, such as what Edd Dumbill and Company have created with planet.xmlhack.com, syndicating the XML feeds from various XML hackers and commentators from around the web and allowing the ability to ping one source to gain updates from many related sources. In a world where there are over 4 million blogs and growing rapidly, the benefit that comes from someone like Edd who is able to locate quality bloggers on a given topic, in this case XML development, is enormous, as it gives the rest of us a head start in locating the content we have interest in from commentators who are in one form or another qualified to develop this content.

                If my day continues as it has thus far (not one interruption that has caused a loss in focus and subsequently a loss in potential progress) I should have some screen shots of the project to send out for general viewing, and I should be able to make the source repository available so that the first group of developers who have shown direct interest in involvement can access and check out a copy to begin playing with. That way we can begin to develop a community wide effort to create something that in various ways and formats will be useful to our own specific needs, while at the same time developing potential modules that might be useful to a much broader audience. I will make another post as soon as one, the other, or both of the above become available.

                In the meantime I want to, once again, throw out a HUGE amount of thanks and praise to Denis and Softec for the development of the SubversionSharp project. I am among a very large base of C# developers who have had interest in a project like this but for various reasons have held off from starting one up due to the horror stories that have been passed around as to the complexity of such a development effort. Apparently Denis either didn't care or didn't know of the potential issues and therefore ignored the naysayers and simply developed the code and released the source for the rest of us to benefit from. I know that I am only one of many who are extremely grateful for his efforts, especially the way in which he tackled what seems to be the possible source of the naysayers' warnings, that of breaking open the CIL and making handcoded hacks to allow usage of this code base on top of both .NET and Mono CLI implementations. It's that type of effort, to bring a project into fruition regardless of the technical hurdles, that demands a HUGE amount of respect and applause for going the extra 150 miles to force the code base into submission. My hat goes off to you Denis! I think I can speak for many of us when I say thank you for all of your efforts from a community who has been eagerly awaiting this project's arrival. YOU ROCK!!! :D

                I would encourage all of you, even if you don't have an interest in this particular project, to visit Softec and learn more about the projects they have developed and the services they provide. It seems obvious to me that if you are looking for a company who does whatever it takes to complete a project, no matter what might be in the way, then Denis Gervalle and Softec might just be that company.

                Cheers Denis :)

                Posted by m.david at 10:03 PM | Comments (0) | TrackBack

                April 22, 2005

                Announcing the ChannelXML project AND yours truly will be hittin' the airwaves :D

                From my recent post to XML-DEV in response to a post from Jonathan Robie in regards to a revelatory announcement made by Adam Bosworth at the MySQL conference earlier today (I believe. I'll make sure). While I am not surprised that others have realized the same thing I have, I am surprised to find out that it is Google who is giving in to decentralization and is preparing their business for just that day.

                Well, here I was thinking I would be able to slip my ChannelXML project under the radar, but I don't think that is the case anymore... So, while it's not quite ready to be sent out to the early beta testers (I will be looking for more soon and will ask for volunteers via this blog in the not too distant future) I think it's probably a good idea to at least get the announcement out there and begin preparations for a day in the not too distant future when this project will be going live on the WorldWideWeb and there will be no looking back from there. :) My post and its continued follow-up is below...

                [UPDATE: I seemed to somehow miss 6 or so of the key technology explanation paragraphs. I've added them back][EXTENDED-UPDATE: Uh, yeah, there were definitely paragraphs missing from this article. Where they were in the original order of things I have to admit I'm having a tough time attempting to discover. Tell you what, just pretend you're reading "Yet Another Random Thought Post That Doesn't Make Any Damn Sense" and nothing should seem out of the ordinary. Cool :) Problem solved ;)]

                Ummmmm... this is really quite [strange||funny||eerie||whotold'em] ... Well, I think I needed a reason to simply get this project up and out the door anyway, so no worries... it won't be today but it will be soon -- probably by Monday or Tuesday in early alpha format for access by a subset of developers, many of whom are already fully aware that their name is on "the list".

                While I can't say I was ever all that worried, fortunately there are enough people who are aware of my ChannelXML.com project and know the details that I can feel safe from being termed some sort of copycat. You see, beyond the actual content contained within the developed system and the functionality provided to view and search this system, the primary focus of this project is to decentralize the servers by never centralizing them in the first place and implementing a grid-like network in which the line between client and server is not just blurred, it flat out doesn't exist. Each node on the system (I will give you an overview of all the technologies going into this project below -- the following are not the only ones, I can assure you :) is driven by two primary applications and one application interface. Anyone want to take a guess as to what these might be? :) At the heart of it all is Subversion. The C# interface into Subversion is driven by SubversionSharp, and the #1, all knowing, all powerful -- or if nothing else, he sure does get his fingers dirtied by anyone or anything that comes anywhere near him -- is Saxon.NET.

                [UPDATE: I missed several paragraphs somehow when I moved this from the email I posted to xml-dev -- yeah, probably a good thing I chopped it back down to two paragraphs and moved the rest -- well, temporarily missing some of the more key paragraphs containing explanations of what the project is about]

                [The previously missing paragraphs start here]

                If not obvious, each and every node on the system will be able to create, delete, and merge together the channel content that exists on its own server and that it created and defined. Each and every node on the system will also be able to access any other "Channel" on the system, as long as it has credentials to do so. While no final determination has been made, it is more than likely that SSH will become the primary identification and access technology used on the project. All Channels can be labeled either public, private, personal, or 'membership required'. There may be more security keywords put into place as needed, but I hope to try and keep things as simple as possible. It should be fairly obvious what these keywords mean from a security standpoint.
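For anyone who wants to see the four labels as something more concrete, here is one possible reading of them as a small Python sketch. To be clear, the post never spells out the exact rules, so every rule in `can_access` is an assumption: "public" is open to everyone, "personal" is owner only, "private" is owner plus explicitly invited users, and "membership required" is owner plus members. The `Channel` fields and the user names are hypothetical as well.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    PERSONAL = "personal"
    MEMBERSHIP_REQUIRED = "membership required"


@dataclass
class Channel:
    name: str
    owner: str
    visibility: Visibility
    invited: set = field(default_factory=set)  # assumed to apply to private channels
    members: set = field(default_factory=set)  # assumed to apply to membership channels


def can_access(channel, user):
    """Assumed access rules; the real ones are not defined in the post."""
    if user == channel.owner or channel.visibility is Visibility.PUBLIC:
        return True
    if channel.visibility is Visibility.PERSONAL:
        return False
    if channel.visibility is Visibility.PRIVATE:
        return user in channel.invited
    return user in channel.members  # MEMBERSHIP_REQUIRED


notes = Channel("notes", "mdavid", Visibility.PERSONAL)
club = Channel("xslt-club", "mdavid", Visibility.MEMBERSHIP_REQUIRED, members={"kurt"})
print(can_access(notes, "kurt"), can_access(club, "kurt"))
```

How a node proves who it is (SSH keys or otherwise) is a separate question from this check and is not modeled here.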

                While initially the system will lack an extensive amount of application capability, this will quickly change as the system continues to develop and push forward into a larger community base when the time feels right to do so. There are no set goals for when things will be ready, as I believe that for this to be done properly and completely the necessary time for development and testing must be allowed. There will be a core set of applets and applications which will be core members of the ChannelXML domain. Beyond this, the only limitations that will be placed upon extended application development are the capabilities built into the Java and .NET platforms as well as any security restrictions placed by any given domain that is registered as part of this system.

                While the primary content for each of the channels will originally be delivered/accessed using the RSS and ATOM feeds provided by their source, I plan to develop a simple "permission slip" system in which the owner of a normal non-XHTML based web page can give explicit permission for it to be "tidied" and as such made completely queryable using XPath 1.0/2.0 and XQuery. Explicit permission must be obtained from the site's owner before accessing and then indexing any content that is not XHTML or XML compliant. While I can't say I know of or have even heard of any numbers, I'm confident that a huge majority of the web is delivering non-XML or non-XHTML content to its users. While XSLT 1.0/2.0 could be used for this, unless there's a recognized advantage in using XSLT, XPath and XQuery will be the primary technologies used to either update or "change" the channel. While support for a basic URI system will be put into place, it will be XPath and XQuery that will be carrying the referencing and access workload as with an

                The ChannelXML project (see below for a legend as to what each licensed top tier domain will be used for) utilizes 2 primary platforms, 3 primary applications (not including the specialized applications built from the ground up specific to this project), 5 primary development languages, 2 experimental development languages, 1 primary (yet still VERY experimental) cross-domain-common-language-interface and processing engine, and one WS-* wrapper technology whose primary (better said, only) purpose is to provide a common interface to the land of WS-*, to ensure responsibility for support and maintenance is left in our own hands as well as to ensure that the specific needs of the blogging communities are handled through an easy to use and understand interface.

                Ok, so what does ChannelXML do? It's pretty simple. And are you ready for this? (pay attention Microsoft, as this may bring back some memories) The entire underlying Channel definition, subscription, content update notification, etc. interface -- or in other words the way you create a content channel, explain to the system what content you want it to contain, where to find that content, what images you want to associate with this channel, and how often you want to synchronize the content for each member of this channel -- is a technology entitled "CDF" or Channel Definition Format, which is historically the very first XML-based technology to make its way to the streets [this tidbit came up in a conversation I had with Tim Bray last October so I feel safe stating it as fact]

                (.com will be the primary interface; .net the communication channel in which all server members of all domains will, well, communicate; .org will be the underlying structure of individuals and corporations who will help keep things in order and on target as well as allow representation from the various committees and groups already in existence (or potentially developed specifically for ChannelXML representation of a specific interest) that ensure that everyone is keeping to the guidelines set forth by the various disability acts that guarantee that the rights of its members are met, no one is left behind, nor is anyone forgotten about.)
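As a rough illustration of what such a channel definition could look like, the Python sketch below builds a small CDF-style document with the standard library. The element and attribute names (CHANNEL, TITLE, LOGO, SCHEDULE, INTERVALTIME, ITEM, HREF) come from the Channel Definition Format itself; everything else, including the `build_cdf` helper, the example URLs, and the idea that ChannelXML would lay its channels out exactly this way, is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET


def build_cdf(channel_url, title, logo_url, item_urls, interval_days=1):
    """Build a minimal CDF-style channel: what it is, its image, its items, its sync interval."""
    channel = ET.Element("CHANNEL", HREF=channel_url)
    ET.SubElement(channel, "TITLE").text = title
    ET.SubElement(channel, "LOGO", HREF=logo_url, STYLE="IMAGE")
    # How often subscribers should synchronize the channel content.
    schedule = ET.SubElement(channel, "SCHEDULE")
    ET.SubElement(schedule, "INTERVALTIME", DAY=str(interval_days))
    # Where to find the content: here, one ITEM per feed URL.
    for url in item_urls:
        ET.SubElement(channel, "ITEM", HREF=url)
    return channel


# Hypothetical channel made up of two feeds.
cdf = build_cdf(
    "http://example.org/channels/xml-dev/",
    "XML Development",
    "http://example.org/channels/xml-dev/logo.gif",
    ["http://example.org/feeds/a.xml", "http://example.org/feeds/b.xml"],
)
print(ET.tostring(cdf, encoding="unicode"))
```

The four things the paragraph above lists (what the channel contains, where to find it, which images go with it, and how often to synchronize) map onto ITEM, HREF, LOGO, and SCHEDULE respectively in this sketch.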

                [and end here]

                Along with this half announcement/half "I'd better say something now before I lose any and all credibility for developing this project." :) So, while there is no official start date (still a lot of line and equipment installation and testing to finish), I will be beginning production of an XML and related technology talk show that will eventually be phasing into a daily format that is delivered for distribution through the main ChannelXML.com repository. If you have any sort of impression that I seem to type a lot -- my friend, you have no idea what's about to hit you when you experience how much more I can say when I'm speaking. But don't worry, I won't be the only one. One of the primary objectives of the ChannelXML project will be to open the "airwaves" in both sight and sound and allow for quick, easy, and freely accessible delivery to anyone and everyone who may want to hear, watch, or read what you have to say. ChannelXML.tv will be completely devoted to both the existing cable and broadcast style television but even more so towards bringing the world of colorful sights and sounds from any given source on the system to all of those who, based on permissions, are allowed to view it. Obviously this will open the door to a lot of junk, but there's a lot of junk on regular television as well and somehow it still survives. Nonetheless, every technology and specification that is making an attempt to bring some sort of sanity to the quality of content out there will be evaluated and, if it seems effective, implemented.

                I plan to continue this thread on XSLTBlog but a few more important points:

                - The main source code for this project will be delivered, for the most part, as open source. That doesnt necessarily mean that there are no or will not be licensing fees involved. With that said there will never be a charge for any use of this system for individuals and non-profit organizations. In fact even the individual employees of a corporation will never find themselves having to pay any fees to gain access to the system. However, I am in no way going to make any requirements that any particular channel that is created on the system must absolutely give access to this content to anyone and everyone who claims they want it. If you are a private organization and want to charge a membership fee for access to content or if your a software development firm and want to charge fees for access to your web services there will be no limitations placed on you to keep you from doing this. However, where there is profit being made from utilizing the resources on the system there will, at least eventually, be fees associated with this. They will be minimal and designed as an easily justifiable way to help keep a system designed for free and easy access to free and easy speach, free to be used by anyone at anytime -- yep, just like the WorldWideWeb.

                With this said please keep in mind that ChannelXML will be set-up (I say set-up in the sense that as time moves forward I expect there will be a continued enrollment of members in whom have interests staked in this project and want to ensure that these interests are being served, etc...) as a non-profit organization in which any fees collected by the main organization will be used for the ongoing development of infrastructure and content as well as the day to day maintenance of a project with potential for such a grand scale. Third-tier ChannelXML domains will always have a fee associated with them unless the requestor is a non-profit and it can be shown that the desired domain does not infer a profiteering type business, get in the way of a company in whom would be using the domain for a profiteering business, nor does it encroach upon the trademarks and copyrights of existing corporations in which through various laws passed over the years could quite easily claim that third tier domains on heavily trafficked sites can easily be construed to belong to another entity all together and therefore are in violation of copyrights, etc... While there has been no official licensing put into place for any of this it is in the works and will fall under the domain OpenUnderstanding.com which will be a site devoted to the creation and protection of a new type of systems and domains licensing program which can entail any number of open and closed source components, royalties for content be redistributed with permission, but for a fee, Talk Shows that play a song that requires a royalty fee to be paid, etc..)

                One other thing to note and then I am back to hacking more code -- ANY channel that utilizes the directory structure under the root of the site will never, ever, EVER be charged any licensing fees. The primary objective of the non-profit org overseeing this project is to create a system in which the vast majority of users can gain the benefits of those who are willing to pay fees to gain greater notice or to license particular rights to distribute goods and services related to this project, etc... etc... etc... As far as advertisements go, I am currently reserving the right to implement an advertising system on the top portion of the interface that will always interface back into the main system if and only if the need for additional funding arises. I hope that this day never comes, and I believe this is achievable if a proper licensing system is put into place such that paying the licensing fees is considered a no-brainer and is sufficient to keep everything else free of clutter. Keep in mind though that I am placing no restrictions upon the advertisements one might place within the subchannel that they make available on the system (e.g. an existing webzine might create a channel in which you can easily access quick overviews of the content or, if you're a subscriber, access to all of the content). While this doesn't make sense until you see what I am referring to, everything below the bottom of the ChannelXML separation bar belongs to whoever creates the channel. There will be requirements that ALL adult-related channels must accurately represent their content via a ratings system, with the added restriction that private servers must be set up at their own expense to actually host any content. No adult material will be allowed to be "buddy-synchronized" as is allowable for any other type of content on the system. The reason for this is simple. Protection of our children.

                This will be an ongoing post over the next few days...

                Posted by m.david at 06:12 AM | Comments (0) | TrackBack

                April 21, 2005

                Quick Survey: Is anybody else as surprised as I am that my Microsoft/Mozilla investment strategy has not, at least as far as I know, been announced as a go?

                You know what I bet the hold up is? I bet you they hired the same publicists who did such a masterful job of keeping the Adobe/Macromedia merger covered-up airlock tight until the moment it was officially announced. That's gotta be it, because it's such an OBVIOUS winner of an idea I'm surprised I haven't won some sort of award already for coming up with such a brilliant and unflawed plan, laid out in practical and perfect detail. I think the only thing I could have done better is to have provided a link to Mozilla's PayPal donation account. Crap! If that's the reason this all falls through I am going to be SOOOOO mad at myself!

                Hmmm... Well, we live and learn don't we? ... live and learn ...

                So, anybody wanna hire me as their "idea" man? Yeah, I didn't think so ;)

                Posted by m.david at 11:07 AM | Comments (0) | TrackBack

                April 19, 2005

                Dear Microsoft and Mozilla -- I have an idea that as crazy as it may sound, just might work really well for everyone involved: You, Us, Everybody, and maybe even a few of their pets

                Hmmm... not sure where the pets comment came from but it's too late to care. Here's the thing: back in the '96ish time frame you made this realization that much good would come if you pinched a few pennies from your change jar to help a then-struggling Apple, your long time nemesis, stay in business such that you could avoid -- you know what, while I have my ideas I'm not going to pretend I have even the slightest clue what the decisive factor was that caused a $150 million investment that sent a shockwave of horror through the land of Mac yet for all intents and purposes was exactly what was needed to turn Apple around just enough to be able to rebuild itself -- and it's done quite an amazing job of that... So let me ask you this... Could it be possible that another simple 150 could find its way into the funding account of Mozilla? Wait! Don't click that delete button from your news reader just yet...

                ...hear me out and in ten years from now, like Apple has done with iTunes for Windows, maybe we will find that this one simple gesture has found the good graces and will of this Rising Phoenix left from the Netscape Nemesis Act that so nicely replaced the now repealed Apple Armageddon Act of the late eighties, early nineties fame, building versions of software that actually use the CLI/FCL as the foundation for the platforms with -- wait, by then the last few percent of platforms not able to run native CLI-based applications should all but be consumed by Mono/Novell and Apache, or, do I dare suggest, even by a MS version of a cross-platform CLI? Maybe I shouldn't press my luck and should just stick with something that I feel has a bit more staying power.

                So here's the thing... The Apple Abolition Act of '96 was a nice gesture but no one really bought into the idea that this had anything to do with good will and everything to do with saving a billion-a-year Macintosh applications business, most of which came in the form of Office for the Mac. Now, nearly 10 years later, a very happy and healthy Apple continues to drive these types of numbers, if not more. So if this was the only benefit that came of this, then it's safe to say that a 150 million dollar investment in '96 has brought in over 10 billion in sales that may have otherwise gone the way of the DoDo, a NeXT computer more likely to fill the void left by the missing Mac than would a PC, no matter who made the hardware... MacAddicts are not just fans nor even loyalists -- FANATICS would be a much more appropriate term to accurately describe a member of the Mac community -- Fanatics, Plain and Simple. But enough about those phreaks (here's to you Russ -- it's all about the love :D), let's get on with this idea so y'all can find yet one more reason to hate the very damn day you stumbled onto (actually, I tripped you -- sorry ;) my blog, wishing that you had taken that left turn instead of that right... then none of this would have ever happened (now click your heels two times Dorothy and this will all just go away -- with the help of some benzedrine of course, but away is away, right? :D)

                Ok! Here's the deal... by funding Mozilla you would find that you could ease back on IE7 a bit, focusing 100% on fixing the security issues, and let the rest of the talent push forward with Longhorn development. While this wouldn't exactly save you any money it's not going to cost you any either... but saving money isn't the concern... security is the concern and it's the most important piece of the IE upgrade, and in many ways would probably be plenty enough to keep your customer base safe and happy until such time as Longhorn can present a more complete and robust solution that can take full advantage of all that XAML has to offer. You had no plans until just a month or two ago to work on another IE version so it's obvious that the browser market is not where you feel your energies are best spent. That's perfect! Because Firefox fills in this void quite nicely. Think of how much better it would be if this dedicated team of Mozilla engineers were finally given a chance to properly fund the areas of the project that, without being funded, may never have a chance to be properly addressed as there is simply not enough time to do it all... that is unless that time came pre-paid from a caring and devoted Uncle Bill who is only looking out for the best interests of his Windows users... and I'm not kidding... by doing this you WOULD be looking out for the best interests of your users, and by doing so would probably save yourself enough paying Windows customers who are considering moving to Linux to justify the 150 expenditure alone and all by itself.

                But that's just the start. By giving way to Firefox to take over the role of the (un)official Windows web browser you dispose of a business you haven't had much interest in anyway and give way for a flood of happy and eager FF developers to begin developing extensions that tie directly into the CLI and as such give them an instant and much needed boost in the development of robust and useful web-based applications. Please keep in mind that this would mean that this same luxury would happen for other platforms that have CLI support such as Linux and OSX via Mono and Apache. But how is this a bad thing? You obviously took the standardization route for a reason, and I can honestly say that I would call you a bald-faced liar if you hadn't considered the fact that by doing this one simple act you were opening the doors WIDE OPEN for the OSS community to walk on in and begin the obviously HUGE effort to bring a Windows Class Library to a non-Windows kernel. I believe it was one of the most brilliant moves ever made, quite possibly the most brilliant act that Microsoft has ever conceived and followed through with. At some point the day will exist that an application written in C# or COmega or how 'bout even Java.NET (c'mon, we're smarter than that not to know that the lovey-dovey crap going on in front of the cameras means more than just trying to work out your differences -- I can't imagine that in this day and time there could be a more lethal combination than to be able to take all that's great about Java and all that's great about C# and combine them together to make quite the little powerhouse tool box) -- by year's end there will be at least a beta of a Java-1.5/6-to-CLI compiler and in the mean time we can begin the migration using IKVM -- pretty simple, huh? :)

                Okay, quick review...

                - Give the Mozilla folks 150mill
                - Ensure that, like the Apple deal, this investment does not allow you the ability to define the exact direction of the Mozilla organization, nor Firefox directly.
                - Focus on IE security only
                - Recommend that for web browsing the Firefox browser is your smartest choice for a standards-based web browser and that IE will continue to be your best choice if you want to extend your applications from the web browser. We already know of AOL's plan to deliver a product that supports both browsers' rendering engines in the same UI, so if the idea of a hybrid browser product doesn't worry you then this should be a no-brainer.
                - As the release of Longhorn comes closer, open the IE source to a newly formed browser technology committee who would oversee the development of an integrated product that would keep existing IE-based applications running happily (and secure -- not suggesting a global opening of the source... just those who need to see it to bring this project to fruition -- I still believe that when it comes to platforms closed source is a lot harder to "pick" than is open) while at the same time allowing the Mozilla organization to expand the horizons of this hybrid browser to tie in XUL and all of the Mozilla components and make them a part of all the OS has to offer -- existing Moz apps continue to run and can be expanded into new areas via new development using any combination of supported .NET technologies.
                - And the list goes on....

                Come on MS, think about giving it, and Mozilla, think about accepting it. Nobody loses if something like this were to happen, and the development world can stop focusing on what they can't do because it's platform specific and instead on what can't they do, now that we were all strong enough to recognize how much good could become of this and as such mended the fences and began working for the same cause for once... it's almost scary how much good would become of this...

                Posted by m.david at 06:06 PM | Comments (0) | TrackBack

                Ummm... that wasn't Kurt Cagle that made those comments AND some excellent comments from Steve Loughran from my Firefox/IE post earlier

                Yeah, so apparently my little networking issues this morning caused all of the sites I have on this server, which includes Kurt's UnderstandingXML, to serve up the XML feeds for XSLTBlog, given that Apache is set to default to the directory I have this blog stored in. So what would have seemed like a really strange post from Kurt -- that of praising the power of IE and shunning the religious cult of Firefox -- was in fact a REALLY strange post as it came from me, not him. Hopefully enough people subscribe to both our feeds that this would have been easy to spot, but I know there are a lot of people who read what Kurt has to say in the much larger XML universe and have no real interest in XSLT, so I fear that there may be some people out there scratching their heads wondering what on earth caused such a shift in Kurt's opinions -- generally not quite so pro MS as I tend to be. Sorry 'bout that!!!!
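
                For anyone wondering how one blog's feed ends up being served under another site's name, here is a rough sketch of the fallback idea. It is a simplified model and not Apache's actual matching logic; the host names, paths, and the `resolve_root` helper are all made up for illustration.

```python
# Hypothetical host-to-directory table; names and paths are illustrative only.
VHOSTS = {
    "xsltblog.example": "/www/xsltblog",
    "understandingxml.example": "/www/understandingxml",
}

# Assumption: the server falls back to this directory when no host matches.
DEFAULT_ROOT = "/www/xsltblog"


def resolve_root(host_header, vhosts=VHOSTS, default_root=DEFAULT_ROOT):
    """Return the directory to serve for a request's Host header."""
    if host_header is None:
        return default_root
    # Drop any port suffix and normalise case before the lookup.
    host = host_header.split(":")[0].strip().lower()
    # An unrecognised host (e.g. a raw IP after a network reset) gets the default.
    return vhosts.get(host, default_root)
```

                If requests stop arriving with a host name the server recognises, every one of them lands in the default directory, which here is the XSLTBlog one.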

                On a lighter note... Steve Loughran made some excellent comments on that same post and I wanted to quickly bring them to your attention as they bring some interesting points to light that I hadn't really considered... I will post his comments in the extended portion of this post.

                [via Steve Loughran]

                I like a bit of flame bait :)

                First, what use is power, if it is abused? The two primary routes for drive-by spyware (that is, not the stuff that sneaks in with apps), is ActiveX and security hole exploit. both browsers are weak on the latter -even firefox seems to like a WinXP reboot (well, on that vmware image anyway) after an update. And both apps are prone to security holes, because they are written in buffer overflow languages (how’s that for flame bait). As Mozilla becomes more popular, it will become more of a target for malware and driveby spyware attacks.

                But here is why IE is less secure, today

                -Prompted AX download is still enabled in the internet zone. Unless you know how to adjust zone members and security, you cannot disable that without breaking windows update. Which you need, after all.

                -IE is embedded everywhere. That isnt usually a bad thing, but it means that the attack surface is so broad. I think the mailers (outlook express especially) are trouble here, as they permit direct exploits of security holes. Mind you, thunderbird has the same problem. hmmm.

                -Browser Helper Objects. Somebody thought it was a good idea to let COM components have access to stuff that gets POSTed, even over HTTPS links. Mistake :( . By providing the toeholds for spyware and malware, they provide a source of trouble for end users.

                Regarding ‘power’, how relevant is it? Who cares about “more powerful”. None of the friends and family whose boxes I have had to antispyware; they are grateful to be given a copy of mozilla and told it is more secure.

                There are some things that’d be nice in mozilla, a good HTML editor component, better XML/XSLT handling another. But then IE could benefit from CSS2 -a bit of power for site designers that IE lacks.

                Anyway, its good to have competition again. Would we have popup blocking in IE without Mozilla? I doubt it. Not given that MSN must have made many €€ from popup adds -a bit of conflict of interest there. Would we get IE7? Not a chance. But will IE7 move windows update into its own zone and turn off AX download in Internet Zone? I hope so, but doubt it.

                Posted by m.david at 08:51 AM | Comments (0) | TrackBack

                This 'Firefox is more secure' religious war has got to come to an end

                It's flat out bullshit! My mind is numb at the moment as I have been frantically trying to get caught up with project deliverables, and to top it all off I just had to spend the last 3 hours rebuilding my internal network because of a stupid little mistake I had made that caused my router to return to its default settings (which, in essence, makes it think it should be controlling the show instead of my network server.) I finally figured out the problem, fixed it, and then decided my brain needed a break. So I took 10 minutes to catch up on the latest updates in my news reader. Of course the "IE and Firefox are about equally secure" report has had tons of responses, all of which are religious based and generally try to use the fact that because Firefox doesn't natively allow you to run ActiveX controls it is more secure and therefore better. BETTER!!!! What the fuck does less power and capability have to do with better!!!???

                It is true that there was a time when IE did a poor job of managing the access and control it allowed to the underlying system. Unfortunately there are too many little fucks running around who can't keep themselves under control and had to build silly little applets that did silly little things like fuck around with your personal information or this, that, and whatever else. That had to stop. And the way to do this was to close off access to anything unless given direct permission from the user to do so. Ok, no problem... a few extra clicks from the user is not all that much to ask in the name of ensuring that these criminals could no longer have their way with your OS without your explicit permission. Spyware is obviously a HUGE piece of all of this "access" that was given and because of this we will continue to see lingering effects until such time as EVERYONE has taken the steps necessary to get their system cleaned up via MS's new clean-up and protect tools. That's not to say that there will never be a problem again, but the fact of the matter is that now that MS has done this it can be easily stated that IE and Firefox are about equals when it comes to potential security problems. Why? Because they are both requiring the user to provide explicit access before anything is allowed to happen that could potentially endanger the user's system. That's it, end of story. This is the plain and simple fact and there's no justifiable argument that can be made anymore.

                There are about 500 of you heading straight for the comment area to tell me that because FF is OSS it can therefore be more easily and quickly patched when security holes are found. Why is that? Because every 14 year old script kiddie can look at the source code and go "look, there's the problem, I'll fix it and submit it and become a world renowned hero." Ummm.... sorry, but if you honestly believe that a band of volunteer software developers who lack both experience and in many cases pubic hair are going to be able to develop a bullet proof browser you really need to rethink your strategies. Now please don't take that as a slam against the Mozilla.org folks... that's not who I am referring to at all. I LOVE Mozilla. These guys are INCREDIBLE and should be given every amount of respect they have rightfully earned. But the problem is that Mozilla.org is a finite group of elite developers and as such has limited resources to tap into when it comes to trying to combat the increased attacks they have been and will continue to receive. So at what point have we reached critical mass -- the point at which those who are fighting against FF have succeeded in finding enough flaws that it is near impossible for even this elite group of developers to keep up? It will happen and when it does it's going to suck! Mozilla.org doesn't deserve that kind of treatment but do you think the people fighting against them give a shit? If you do you are a flat out fool and there's no getting around that. Wake up to reality my friends... this is the real world of software development where there are people fighting against any and every successful venture: OSS, CSS, half Open, half Closed, and whatever other type of development genre that can be conceived.

                Keep this in mind: The fact that anybody can look at the source code does little more than make it A LOT easier to figure out where the flaws are in the first place... It's one of those cyclical paradox situations in which you have a greater chance of finding and fixing flaws while at the same time giving others a chance to find them before you do and exploit them before they are fixed. Has anyone done the metrics on this? I don't know the answer but they sure would be interesting to look at, as I believe we would discover that it has nothing to do with whether something is open or closed and everything to do with the number of installs it has. It only takes a few milliseconds once you're inside a system to get what you want and get out. And you simply can't find and fix a security hole in the same amount of time. The Credit Card has long since been maxed before a problem is even realized (in fact it's BECAUSE the credit card has been maxed that will probably be the source of the "enlightenment" that there is a flaw in the first place.)

                So this brings me to what is going to have to be my final point for now as I have to get back to work... The point?

                IE is more powerful than Firefox.

                Give me access to the OS API and I will build you something that is as powerful as the API will allow. Confine me to a restricted list of components that, while lengthy in how many are available, still limits me to what the components are capable of doing, and all I can build you is what these components will allow me to build you. There are 10s of 1000s of preexisting components that I can use on top of a Win32-based system. And .NET-enabled components are catching up quick. The components are built by extremely talented developers -- developers just like the ones they have at MS and Mozilla. There is literally nothing that cannot be built by scripting together these components to build kick a$$ applications without borders or limits. Does this mean there's a security risk? With power comes risk but also comes capability. It's a balancing act without a doubt but it doesn't change the fact that the more access I have to the API the more powerful an application I can build. IE has it. Firefox does not. End of story.

                My favorite quote in regards to this area of controversy comes from a presentation that was given at TechEd '97 in Nice, France. I had 3 presentations to give during the 5 day event and so had plenty of time to take in a lot of really good presentations. I can't for the life of me remember who it was that was giving this presentation nor can I remember the exact topic. I had only just walked in when an audience member raised their hand and asked a question regarding the security risks that came from using Win32-based applications and suggested/asked why not instead migrate towards Java. The answer was quick, simple, and straight to the point:

                "We give you the 'Format' command and expect that you know when and when not to use it."

                Firefox is a nice browser. But let's not let religion blind us to why it's a nice browser. It's usable. It's not more powerful, more stable, or in reality more reliable. But they got several usability features spot-on and that's what makes it so cool. Mozilla kicked a$$ on this web browser but that's all it is.... a browser. IE can do anything and everything a Win32 or .NET-based application can be programmed to do. It's nowhere near as usable. But it's a lot more powerful. Does that make it better? It depends... Do you have a need to do more than just browse the web? If no, then probably not. If yes, then yes it does. Oh, and by the way... Do you think that IE7 might take a cue from the Mozilla folks and borrow a few of their neat UI ideas in return for Mozilla's use of IE's XMLHTTP ActiveX control (XmlHttpRequest) or the keyboard shortcuts like Alt + D and CTRL + Enter, etc...? I'm not suggesting Mozilla has done anything wrong... not at all! The smartest thing you can do is to take the good pieces (and the things that the user base you are going after are used to) of the competing product and copy them into your own product. I have a feeling MS will probably follow suit and may even add a few things that none of us have even thought of. Money buys talent and MS has plenty of both. Be ready. Something's gonna come from Redmond that's going to make all of us go "GooGoo" and we have the good folks at Mozilla to thank for this. Maybe AOL can ask for another 750 million for all their troubles ;) Isn't that how the system works... If you can't beat 'em, sue 'em?

                Anyway...

                IE and Firefox both have their place in this world. Can we please move on with our lives now?

                Posted by m.david at 02:21 AM | Comments (4) | TrackBack

                April 18, 2005

                No Title - Just a gaping, stunned, and stone-faced stare at what I am reading on my screen

                Adobe to buy Macromedia for $3.4 billion | CNET News.com

                update Desktop publishing specialist Adobe Systems is buying multimedia applications maker Macromedia in a $3.4 billion deal geared toward building a software powerhouse.

                Wow! I have been heads down trying to catch up from missing all of last week but when I took a break to take a peek at the outside WorldWideWeb this caused a realization that a blog posting was in order.

                Too much going through my head on this one to even make the smallest of comments so I will leave it to you to fill in the blanks...

                Posted by m.david at 08:21 AM | Comments (0) | TrackBack

                April 16, 2005

                A Quick Graphic Teaser for my up and coming WWWebTop Project

                Just wanted to showcase the talents of my nephew, Courtland Gustafson, who I have hired to work on all of the graphics, icons, etc... for my up and coming release of the WWWebTop project. Please visit http://betelgeux.deviantart.com for more of his work...

                guise_of_skies_by_Betelgeux.jpg

                Posted by m.david at 01:47 PM | Comments (0) | TrackBack

                April 15, 2005

                I'm back... Lots to catch up on and Dave Pawson's blogging!!!

                My apologies for my sudden and immediate departure last Friday! I'm back now and playing catch up. But one thing I wanted to quickly point out is that via a post from Norm Walsh it seems that Dave Pawson is blogging!!! Finally! :) It will be nice for all of us to have one of the true experts in the field of XSLT and XSL-FO to turn to for a greater understanding of the language we hate to love... or is it love to hate... Or do we love it because it's... you know what, I'm just going to stop right there. No matter what it is, Dave's expert voice will be a pleasant change from my mixture of the Pleasure and Pain that is the world of XSL-related software development.

                Lots more to catch up on which I will do piece by piece over the next couple of days... But in the mean time, Welcome Dave!!! :D

                Posted by m.david at 08:03 PM | Comments (0) | TrackBack

                April 09, 2005

                I will be away until Wednesday night 13th

                I have had a last minute family emergency and will be away until Wednesday evening. I will not have access to email, but on my return I will respond to your emails immediately. I apologize for any inconvenience this may cause any of you.

                Posted by m.david at 02:45 AM | Comments (0) | TrackBack

                April 07, 2005

                That's it! I'm moving to New Mexico...

                El Defensor Chieftain: Community Calendar

                Introduction to XML, 5:30 p.m. — Speare 116. "XSLT Extensions: An Extended Case Study." Free class.

                In regards to my decision to head southwest if this is the last post I ever make to this blog I'm guessing my Socorro County/Bill Gates/Microsoft/Area 51/Roswell Alien conspiracy theory was right on target... but with classes like this being offered for free I simply have to take the risk. Wish me luck :)

                Adding a bit of interest...

                ...in the weekly listing of community events in Socorro County there have been plenty of XML, XSLT, and XPath (both 1.0 and 2.0 of XPath and XSLT) events taking place as well as a near 1-to-1 ratio of XML-related to Python-related classes being offered each week. An interesting note can be taken when you notice that as advanced an Alien race as the Aliens of Socorro County must be, they seem to neglect any coverage of XQuery as part of their weekly free programming class line-up. Hmmmm... well, maybe that's because 55% of programmers are already using XQuery and the rest... ok, sorry... it was tough to resist the temptation and I like Larry and I have a fondness for Stylus Studio so I'll just shut the hell up and go back to my hacking...

                Still, it's interesting to n...

                ok, I'm going!...

                geez, relax there XQuery phreakazoid... I really do like your little macro add-on pack (kind of like the "Plus!" pack for Windows... all the things you didn't realize you even needed and you still remain unsure why you shelled out an extra 50 bucks for it in the first place... Ahhh, the power of the "gotta have it"s -- besides, you get that extra special "Plus!" logo which you have to admit is pretty cool (no you don't, that was most definitely meant to sound sarcastic, if the rest of this isn't ;)) to XPath, and when you see what's wrapped up inside of my next open source project slated for release sometime Real Soon Now you will further understand that I actually think XQuery is a significant piece of our development and even non-development related future. Just keep your distance from any claims of superiority in Transformations land and you won't find yourself labeled "Sir XQu'mmunicated" come next Real Soon Now when the next set of X-specs make it to the big-time :)

                Have a fantastic X-related day!

                (raise of hands for those who read "X-rated" instead of "X-related" as it should have been read -- Wow! That many?! Geez, maybe you shouldn't spend so much time on the Internet... find a hobby like gardening or something... but then you would probably relate gardening to dirty and then we're right back where we started.... :) Maybe you better just stay where you are. At least the rest of us are somewhat safe when you're on the 'Net instead of out prowling the neighborhood, right? Good, I'm glad we had this talk... I feel a lot better about things now... :)

                Posted by m.david at 01:57 AM | Comments (0) | TrackBack

                April 05, 2005

                Saxon 8.4 Now Available - Saxon.NET RC1 based on these bits will follow in the next few days

                In a recent post to Saxon-Help Dr. Michael Kay announced the availability of Saxon 8.4 from the http://saxon.sourceforge.net project page. The announcement in its entirety is in the extended portion of this blog entry. Obviously there is a need to integrate the necessary Saxon.NET changes into this release and run through the testing process to determine conformance. Hopefully this will mean a release no later than Friday of this week. I will update both here and on http://weblog.saxondotnet.org when it becomes available.

                I have also updated the SVN repository and changed the structure to conform more to the standard SVN recommended directory structure. You can view this at http://source.x2x2x.org/svn/x2x2x/saxon.net. Please note that the Saxon.NET-B-8.4-source folder does not contain any of the necessary changes to the source or additional files to build correctly with ikvmc. I plan that as an activity for this evening and tomorrow as at the moment I have some other deliverables that I need to get out ASAP! :D

                Saxon-B 8.4 is available at
                http://sourceforge.net/project/showfiles.php?group_id=29872

                Saxon-SA 8.4 is available at http://www.saxonica.com/ (existing license keys
                should continue to work).

                The main benefit of this release is bug clearance. There are also various
                updates to align with changes in the W3C specs (which are now on last-call,
                and haven't changed much).

                The more adventurous among you might like to experiment with the new "lazy
                construction" mode. This delays construction of temporary trees until the
                contents are actually needed, and means that under some circumstances they
                don't need to be constructed at all. This is switched off by default as (a)
                I think it needs more exposure before it becomes fully reliable, and (b) the
                performance results have yet to be fully assessed.

                I've started to try and distinguish in the JavaDoc which interfaces I regard
                as part of the "public stable Saxon API", which interfaces are purely
                internal, and which have some kind of intermediate status, e.g. "for
                experimental use only". Of course nothing can be guaranteed 100% stable, but
                I want to do better than in the past.

                The front page of the documentation included in the download, unfortunately,
                describes it as version 8.3 - but the rest is up to date.

                Michael Kay
                http://www.saxonica.com/
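
For anyone wondering what "lazy construction" means in general terms, here is a minimal Python sketch of the idea as described above: hold on to the recipe for a value and only run it when somebody actually asks for the contents. This is NOT how Saxon implements it; the `Lazy` class, `build_tree` function, and sample data are all hypothetical names made up for illustration.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Holds a recipe for a value and runs it only on first access."""

    def __init__(self, build: Callable[[], T]) -> None:
        self._build = build
        self._value: Optional[T] = None
        self._built = False

    @property
    def built(self) -> bool:
        """True once the underlying value has actually been constructed."""
        return self._built

    def get(self) -> T:
        # Construct on first use, then reuse the cached result.
        if not self._built:
            self._value = self._build()
            self._built = True
        return self._value


def build_tree() -> dict:
    # Stand-in for an expensive temporary tree (hypothetical data).
    return {"words": ["affordeth", "bethink", "betimes"]}


used = Lazy(build_tree)
unused = Lazy(build_tree)

# Only `used` is ever read, so only `used` is ever constructed.
first_word = used.get()["words"][0]
```

The trade-off is the one the announcement hints at: when a temporary tree is never read you save the whole construction cost, but the bookkeeping adds some overhead when it is read anyway.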

                Posted by m.david at 01:56 PM | Comments (0) | TrackBack

                April 04, 2005

                My reaction to a recent response from Dimitre Novatchev to my post regarding experience with text processing in XSLT 2.0

                stunned-by-dimitre-fxsl.jpg

                [NOTE: See extended portion of entry for my post and his response]
                [UPDATE: Added additional follow-up by Dimitre to the end of this post]

                My post:

                == Text processing on XSLT 2.0 ==

                Working on projects such as XBiblio/Citeproc led by Bruce D'Arcus
                I have realized that even as far as the XSLT 2.0 working draft goes in
                regards to bringing Perl'esque type text processing to the XML
                developer it is still up to the developer to fine-tune these
                capabilities to cover their specific needs. For example, a spell
                checker.

                Can anyone who may have extended experience in regards to the
                development of such capabilities using XSLT share with the rest of us
                your experience?

                == Re: [xsl] Text processing on XSLT 2.0 ==

                Hi Mark,

                These days I had fun with an f:binSearch() function and then,
                logically, with f:spell().

                I have a dictionary of about 47000 English wordforms, on which I
                search with f:binSearch()

                I had to produce a faster fn than the current quadratic
                str-split-to-words template -- this is the f:getWords() function.

                All these functions can be downloaded from the FXSL CVS (just let me
                know if you'd want me to send you the zip archive).

                The combination of these functions works quite well.

                This transformation (test-FuncSpell.xsl):


                <xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="http://fxsl.sf.net/"
                exclude-result-prefixes="f xs"
                >
                <xsl:import href="../f/func-getWords.xsl"/>
                <xsl:import href="../f/func-spell.xsl"/>

                <xsl:output omit-xml-declaration="yes"/>

                <xsl:variable name="vDelim" as="xs:string">
                ,—:.-&#9;&#10;&#13;'!?;</xsl:variable>

                <!-- To be applied on ../data/othello.xml -->
                <xsl:template match="/">
                <xsl:variable name="vwordNodes" as="element()*">
                <xsl:for-each select="//text()/lower-case(.)">
                <xsl:sequence select="f:getWords(., $vDelim, 1)"/>
                </xsl:for-each>
                </xsl:variable>

                <xsl:variable name="vUnique" as="xs:string+">
                <xsl:perform-sort select="distinct-values($vwordNodes)">
                <xsl:sort select="."/>
                </xsl:perform-sort>
                </xsl:variable>

                <xsl:variable name="vnotFound" as="xs:string*"
                select="$vUnique[not(f:spell(.))]"/>

                <xsl:value-of separator="&#xA;"
                select="$vnotFound"/>

                A total of <xsl:value-of select="count($vwordNodes)"/> words
                were spelt, (<xsl:value-of select="count($vUnique)"/>) distinct.

                <xsl:value-of select="count($vnotFound)"/> not found.
                </xsl:template>
                </xsl:stylesheet>

                when applied on othello.xml (around 29000 words)

                produces this result:

                Saxon 8.3 from Saxonica
                Java version 1.5.0_01
                Stylesheet compilation time: 1140 milliseconds
                Processing file:/C:\xml\Parsers\Saxon\Ver.8.3\samples\data\othello.xml
                Building tree for
                file:/C:\xml\Parsers\Saxon\Ver.8.3\samples\data\othello.xml using
                class net.sf.saxon.tinytree.TinyBuilder
                Tree built in 94 milliseconds
                Tree size: 18539 nodes, 154557 characters, 0 attributes
                Building tree for file:/C:/CVS-DDN/fxsl-xslt2/f/func-getWords.xsl
                using class net.sf.saxon.tinytree.TinyBuilder
                Tree built in 0 milliseconds
                Tree size: 43 nodes, 143 characters, 22 attributes
                Building tree for file:/C:/CVS-DDN/fxsl-xslt2/data/dictEnglish.xml
                using class net.sf.saxon.tinytree.TinyBuilder
                Tree built in 188 milliseconds
                Tree size: 139140 nodes, 528397 characters, 0 attributes
                Execution time: 7015 milliseconds


                A total of 28622 words
                were spelt, (3669) distinct.

                567 not found.


                So, checking 3669 distinct words in 7015 milliseconds makes

                523.02 words/sec.

                The actual speed is faster, as the total time includes splitting up
                the words and finding the distinct words.

                Among the unknown words are such nice words as:

                affordeth
                affrighted
                ariseth
                arithmetician
                arrivance
                bethink
                betimes
                bewhored

                :o)

                Cheers,

                Dimitre
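
For readers who don't have FXSL handy, the general approach Dimitre describes (split the text into words, reduce them to a sorted distinct list, then binary-search each one against a sorted dictionary) can be sketched in a few lines of Python. To be clear, this is an illustration under my own assumptions and not the FXSL implementation: `get_words`, `bin_search`, and `misspelled` are hypothetical names that only loosely mirror f:getWords(), f:binSearch(), and f:spell(), and the tiny dictionary and sample sentence are made up.

```python
from bisect import bisect_left

# Delimiter characters taken from the $vDelim variable in the stylesheet;
# "\u2014" is the long dash. Whitespace is handled by str.split() below.
DELIMS = ",\u2014:.-\t\n\r'!?;"


def get_words(text, delims=DELIMS):
    """Lower-case the text and split it on delimiters and whitespace."""
    table = str.maketrans({c: " " for c in delims})
    return text.lower().translate(table).split()


def bin_search(sorted_words, word):
    """Return True if word occurs in the sorted list (binary search)."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def misspelled(text, dictionary):
    """Return the sorted distinct words of text that are not in dictionary."""
    sorted_dict = sorted(set(dictionary))
    unique = sorted(set(get_words(text)))
    return [w for w in unique if not bin_search(sorted_dict, w)]


# Hypothetical sample data, not taken from othello.xml or dictEnglish.xml.
sample = "Bethink thee, Othello: the Moor ariseth betimes!"
tiny_dictionary = ["moor", "othello", "the", "thee"]
not_found = misspelled(sample, tiny_dictionary)
```

Each lookup costs on the order of log2(n) comparisons, so a dictionary of about 47000 wordforms needs roughly 16 comparisons per word, which is why checking only the distinct words keeps the total work small.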


                == Follow-up from Dimitre ==

                === Update One ===

                I didn't mention that the text I was spelling was the play:

                "Othello"

                by William Shakespeare


                === Update Two ===

                On Apr 5, 2005 7:10 AM, M. David Peterson wrote:
                > Well, I think that about covers it... FXSL it is then :)
                >
                > Please see http://www.xsltblog.com/archives/2005/04/my_reaction_a_r.html
                > for a slightly extended reaction...
                >
                > Thank you Dimitre!!! As always the capabilities of FXSL have proven to
                > be flat out amazing.

                Actually, the *great praise* here goes to Saxon 8.3 and Mike Kay.

                On this transform Saxon 8.3 is about 20 times faster than Saxon 8.2,
                with which I get:

                Saxon 8.2 from Saxonica
                Java version 1.5.0_01
                Stylesheet compilation time: 969 milliseconds
                Processing file:/C:\xml\Parsers\Saxon\Ver.8.3\samples\data\othello.xml
                Building tree for
                file:/C:\xml\Parsers\Saxon\Ver.8.3\samples\data\othello.xml using
                class net.sf.saxon.tinytree.TinyBuilder
                Tree built in 156 milliseconds
                Tree size: 18539 nodes, 154557 characters, 0 attributes
                Building tree for file:/C:/CVS-DDN/fxsl-xslt2/f/func-getWords.xsl
                using class net.sf.saxon.tinytree.TinyBuilder
                Tree built in 0 milliseconds
                Tree size: 43 nodes, 143 characters, 22 attributes
                Building tree for file:/C:/CVS-DDN/fxsl-xslt2/data/dictEnglish.xml
                using class net.sf.saxon.tinytree.TinyBuilder
                Tree built in 187 milliseconds
                Tree size: 139140 nodes, 528397 characters, 0 attributes
                Execution time: 135921 milliseconds

                So,

                Saxon 8.3 7 sec.

                Saxon 8.2 136 sec.

                I strongly hope that this great achievement is not reverted in future
                versions of Saxon.

                There are some other extremely nice features of Saxon, which I've been using.

                For example, can someone guess what would be the time if I didn't look
                up in the dictionary just the distinct words, but all words as they
                come in the text?

                Cheers,
                Dimitre Novatchev.
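
A quick check of the arithmetic behind the two figures quoted above, using only the numbers from the Saxon timing output (3669 distinct words, 7015 ms on Saxon 8.3, 135921 ms on Saxon 8.2). The variable names are mine; nothing here is measured, it is just the division spelled out.

```python
# Figures copied from the Saxon 8.3 and 8.2 runs quoted above.
distinct_words = 3669
time_83_ms = 7015
time_82_ms = 135921

# Throughput on Saxon 8.3: distinct words checked per second.
words_per_sec = distinct_words / (time_83_ms / 1000)

# How many times faster 8.3 ran this transform than 8.2.
speedup = time_82_ms / time_83_ms

print(f"{words_per_sec:.2f} words/sec")  # 523.02 words/sec
print(f"{speedup:.1f}x faster")          # 19.4x faster
```

So "about 20 times faster" is a fair rounding of roughly 19.4x.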

                Posted by m.david at 03:58 PM | Comments (0) | TrackBack

          • © 2005 :: <XSLT:Blog/> (xsltblog.com) is a product of M. David Peterson and FunctionalX Consulting. See Licensing Info Below.
          • Except where otherwise noted, this site's content and source code are licensed under the Attribution License from Creative Commons.