
XML Data Bindings



Hi All,

I have been working on the dtd -> XML Schema conversions and the data bindings. I do have some progress to report.

I have created an XML Schema (.xsd file) which looks pretty good in that:
1. It validates through XSV
2. The Xlinks work okay
3. It matches the dtd, as far as my own sanity check against it goes.
Caveats:
1. It does not account for the Extension points placed in the dtd, so I am not sure what will happen if we encounter a file using gxl-extension or the like.
2. I have not checked all the low level data types.
3. I had to hand edit a few lines into the generated XML Schema because of a bug in the converter: it was dropping a couple of attribute definitions. Also, Xlinks are problematic, so I have adopted an approach that has been successful on another project.


So the dtd->XML Schema process I am using is:

1. Take the gxl dtd and run it through Neko 0.1.11 (by Apache's Andy Clark, though it is a rather old tool) to create a gxl XML Schema.
2. Check the xml schema for validity according to w3c's xsv checker.
3. Fix Xlink and other problems by hand.
4. Validate the xml schema before progressing to data binding generators.
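
For anyone who wants a quick local check in addition to XSV, here is a minimal sketch that asks the JDK's own schema compiler (javax.xml.validation) whether an .xsd compiles. This is not XSV and not part of the process above, just an assumed stand-in. The class name SchemaCheck and the tiny inline schema are made up for illustration; they are not the real gxl schema.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaCheck {
    // Tiny made-up schema used only for illustration (NOT the gxl schema).
    static final String TINY_SCHEMA =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"node\">"
        + "<xs:complexType>"
        + "<xs:attribute name=\"id\" type=\"xs:ID\" use=\"required\"/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    // Returns true if the JDK schema compiler accepts the schema text.
    public static boolean compiles(String xsdText) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // newSchema parses and compiles the schema; problems surface as SAXException.
            factory.newSchema(new StreamSource(new StringReader(xsdText)));
            return true;
        } catch (SAXException e) {
            System.err.println("Schema problem: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tiny schema compiles: " + compiles(TINY_SCHEMA));
    }
}
```

To try it on a real file, the StreamSource could be built from a java.io.File instead of a string. A schema that compiles here may still be judged differently by XSV, so this is only a first filter.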


Alternatives:
1. Using the gxl schema off the gxl website. Unfortunately, it does not validate in its present form. Also, there are a few areas where it IMHO does not conform to the dtd. Specifically, note that "node" and "edge" elements in the XML Schema do not have ID attributes, which appear to me to be required in the gxl dtd.
2. Using other tools to make XML Schemas. I used Syntext's dtd2xsd tool on Windows, but it created a non-validating XML Schema. I have hand worked it a bit, but I like the output of Neko better because it is easier for me to understand.


So all this work must raise the question: why go to the trouble? Well, it's because the (free) data binding tools all seem to require an XML Schema.

Data bindings grab the output of the XML parser you use and form the tokens into objects more akin to the things you've got defined in your XML Schema. For instance, suppose you run a parser on something like:
<node id="25">
<type xlink:href="http://sesweb.ics.uci.edu/#simple"/>
<attr name="file">
<string>Helloworld.c</string>
</attr>
</node>


If you were grabbing this straight from a parser, you might get the tag name for node, then a sequence of bytes for the id, then the next tag name, etc. You'd have to put them in data structures yourself. And the big headache here is going to be dealing with namespaces.
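
To make the raw-parser picture concrete, here is a minimal sketch using the SAX parser that ships with the JDK. It just records the events the parser hands back for the fragment above. The class name RawNodeEvents is made up, and I have added an xmlns:xlink declaration on node so the fragment parses on its own (in a full gxl file that declaration would normally live higher up), so treat both as assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RawNodeEvents {
    // The fragment from above; the xmlns:xlink declaration is an added assumption.
    static final String SAMPLE =
          "<node id=\"25\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">"
        + "<type xlink:href=\"http://sesweb.ics.uci.edu/#simple\"/>"
        + "<attr name=\"file\"><string>Helloworld.c</string></attr>"
        + "</node>";

    // Parses the XML and returns one line of text per SAX event.
    public static List<String> collectEvents(String xml) {
        final List<String> events = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to see the xlink namespace URI
            SAXParser parser = factory.newSAXParser();
            parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        StringBuilder sb = new StringBuilder("start " + localName);
                        for (int i = 0; i < atts.getLength(); i++) {
                            // Namespaced attributes are shown as {uri}name.
                            String ns = atts.getURI(i);
                            String name = ns.isEmpty()
                                ? atts.getLocalName(i)
                                : "{" + ns + "}" + atts.getLocalName(i);
                            sb.append(' ').append(name).append('=').append(atts.getValue(i));
                        }
                        events.add(sb.toString());
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        String text = new String(ch, start, length).trim();
                        if (!text.isEmpty()) {
                            events.add("text " + text);
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        events.add("end " + localName);
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : collectEvents(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```

All you get is a flat stream of start, text and end events. Knowing that the string Helloworld.c belongs to the attr named file of node 25 is bookkeeping you would have to write yourself.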

Using a data binding for this gets you a class like:
	Node
which has methods such as the following (along with others like the setters):
	String getID()
	String getAttrName()
	String getAttrValue()
	
Classes like this are a lot easier to deal with.
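
As a rough illustration of the shape of such a class, here is a hand-written sketch. It is not actual Jaxb output; the real generated classes will look different and have more in them. The getter names are the ones listed above, while the setters, the fields, and the wrapper class NodeBindingSketch are assumptions for the example.

```java
public class NodeBindingSketch {

    // Hand-written stand-in for a generated binding class (not real Jaxb output).
    public static class Node {
        private String id;
        private String attrName;
        private String attrValue;

        public String getID() { return id; }
        public void setID(String id) { this.id = id; }

        public String getAttrName() { return attrName; }
        public void setAttrName(String attrName) { this.attrName = attrName; }

        public String getAttrValue() { return attrValue; }
        public void setAttrValue(String attrValue) { this.attrValue = attrValue; }
    }

    // Builds the object corresponding to the <node id="25"> fragment above.
    public static Node sample() {
        Node node = new Node();
        node.setID("25");
        node.setAttrName("file");
        node.setAttrValue("Helloworld.c");
        return node;
    }

    public static void main(String[] args) {
        Node node = sample();
        System.out.println(node.getID() + ": "
            + node.getAttrName() + " = " + node.getAttrValue());
    }
}
```

With a real binding, the generated code would fill in an object like this from the XML for you, instead of you calling the setters by hand.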

So once you have the XML Schema, you have to choose and use a data binding generator. I have been creating a set of bindings using Jaxb, which comes from Sun and is part of their webservices/xml stuff. It generates .java files when you run the xjc.sh program that comes with it. It's a bit tricky to set up but has run all right for me.

Rob is looking into Castor, which is an open source data binding generator. There are others out there, and some even use dtds, but those are commercial, and I have not found one that is both widely used and has a nice license agreement like Jaxb and Castor have.

I have tried to use the old dtd hooks in Jaxb to generate bindings from the gxl dtd but have not been successful. They are still there, but they don't seem to work on anything other than "toy" examples.

I have been testing the data binding builds and looking for errors in the builds caused by errors in the XML Schemas. The builds do have some nasty bits in them, and I have been working on helper classes to clean up some of that. They do work pretty well, so I am pleased overall. Hopefully I'll have something for Rob and the rest of you soon. It's a bit complicated in that I have to package up some of Sun's JWSDP and possibly some other libraries that the data bindings import.

Now the data bindings are a big step closer to the IMR. They become in effect the backbone for the IMR, where the data or model resides.

Okay now to other things:

1. I reset a few of the directories to group executable. chmod is the command you can use to reset permissions on directories and files. The man page here is weak, but just look through google and you can see the usage parameters (ugoa+-=rwxst).

Okay that's about it from me :-)

Yuzo