XML Data Bindings
Hi All,
I have been working on the dtd -> XML Schema conversions and the data
bindings. I do have some progress to report.
I have created an XML Schema (.xsd file) which looks to be pretty good in
that:
1. It validates through XSV
2. The Xlinks work okay
3. It matches the dtd, going by my personal sanity check against it.
Caveats:
1. It does not account for the Extension points placed in the dtd, so I am
not sure what will happen if we encounter a file using gxl-extension or the
like.
2. I have not checked all the low level data types.
3. I had to hand edit a few lines into the generated XML Schema because
of a bug in the converter: it was dropping a couple of attribute
definitions. Xlinks are also problematic, so I have adopted an approach
that has been successful on another project.
So the dtd->XML Schema process I am using is:
1. Take the gxl dtd and run it through Neko 0.1.11 (by Apache's Andy Clark;
it is a rather old tool) to create a gxl xml schema.
2. Check the xml schema for validity according to w3c's xsv checker.
3. Fix Xlink and other problems by hand.
4. Validate the xml schema before progressing to data binding generators.
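As a rough illustration of the validity checks in steps 2 and 4, here is a
minimal sketch that asks the JDK's own javax.xml.validation API to compile a
schema. This is a stand-in, not XSV itself, and it assumes a JDK that ships
that API. The class name SchemaCheck, the method isValidSchema and the tiny
inline schema are made up for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: checks that a piece of XSD text compiles as a schema.
class SchemaCheck {

    /** Returns true if the given XSD text compiles as a W3C XML Schema. */
    static boolean isValidSchema(String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // newSchema throws SAXException when the schema itself is broken.
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            System.err.println("Schema problem: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up fragment: a "node" element with a required ID attribute.
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='node'><xs:complexType>"
          + "<xs:attribute name='id' type='xs:ID' use='required'/>"
          + "</xs:complexType></xs:element></xs:schema>";
        System.out.println(isValidSchema(xsd));
    }
}
```

A check like this only says that the schema compiles; it says nothing about
whether the schema still matches the dtd, which remains a manual comparison.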
Alternatives:
1. Using the gxl schema off the gxl website. Unfortunately, it does not
validate in its present form. Also, there are a few areas where IMHO it
does not conform to the dtd. Specifically, the "node" and "edge" elements
in the XML Schema do not have ID attributes, which appear to me to be
required in the gxl dtd.
2. Using other tools to make XML Schemas. I used Syntext's dtd2xsd tool on
Windows, but it created a non-validating XML Schema. I have reworked it by
hand a bit, but I like the output of Neko better because it is easier for
me to understand.
So all this work must raise the question: why go to the trouble? Well, it's
because the (free) data binding tools all seem to require an XML Schema.
Data bindings take the output of the XML parser you use and form the tokens
into objects more akin to the things you've got defined in your XML Schema.
For instance, suppose you run a parser on something like:
<node id="25">
  <type xlink:href="http://sesweb.ics.uci.edu/#simple"/>
  <attr name="file">
    <string>Helloworld.c</string>
  </attr>
</node>
Grabbing this straight from a parser, you might get the tag name for
node, then a sequence of bytes for the id, then the next tag name, etc.
You'd have to put them in data structures yourself. And the big headache
here is going to be dealing with namespaces.
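To make the raw-parser picture concrete, here is a minimal sketch using the
JDK's SAX API. SAX is my choice for illustration; nothing above says which
parser is meant. The names RawSaxDemo and parseEvents are hypothetical, and
the xmlns:xlink declaration is added so the snippet parses with namespaces
turned on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: records the flat stream of events a SAX parser emits.
class RawSaxDemo {

    static List<String> parseEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to resolve the xlink: prefix

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                // All we get is a name plus loose attribute strings.
                StringBuilder sb = new StringBuilder("start " + local);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getLocalName(i))
                      .append('=').append(atts.getValue(i));
                }
                events.add(sb.toString());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }
        };

        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<node id='25' xmlns:xlink='http://www.w3.org/1999/xlink'>"
          + "<type xlink:href='http://sesweb.ics.uci.edu/#simple'/>"
          + "<attr name='file'><string>Helloworld.c</string></attr>"
          + "</node>";
        for (String event : parseEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

Everything arrives as a flat list of events; knowing that the string belongs
to the attr named "file", which in turn belongs to node 25, is bookkeeping
the caller has to do.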
Using a data binding for this gets you a class like:
Node
which has methods like these (along with others, such as setters):
String getID()
String getAttrName()
String getAttrValue()
Classes like this are a lot easier to deal with.
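For a feel of what such a class looks like from the caller's side, here is a
hand-written sketch. It is not actual generator output; real generated
bindings will differ in naming and structure. Only the three getter names
come from the list above; the fields and setters are assumptions.

```java
// Hand-written stand-in for a generated binding class (illustration only).
class Node {
    private String id;
    private String attrName;
    private String attrValue;

    public String getID() { return id; }
    public void setID(String id) { this.id = id; }

    public String getAttrName() { return attrName; }
    public void setAttrName(String attrName) { this.attrName = attrName; }

    public String getAttrValue() { return attrValue; }
    public void setAttrValue(String attrValue) { this.attrValue = attrValue; }

    public static void main(String[] args) {
        // Populate by hand here; a real binding would fill this from the XML.
        Node node = new Node();
        node.setID("25");
        node.setAttrName("file");
        node.setAttrValue("Helloworld.c");
        System.out.println(node.getID() + ": " + node.getAttrName()
            + " = " + node.getAttrValue());
    }
}
```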
So once you have the XML Schema, you have to choose and use a data
binding generator. I have been creating a set of bindings using Jaxb, which
comes from Sun and is part of their webservices/xml stuff. It generates
.java files when you run the xjc.sh program that comes with it. It's a bit
tricky to set up but has run all right for me.
Rob is looking into Castor, which is an open source data binding generator.
There are others out there, and some even use dtd's, but those are
commercial and I have not found one that is both widely used and has a nice
license agreement like Jaxb and Castor have.
I have tried to use the old dtd hooks in Jaxb to generate bindings from
the gxl dtd but have not been successful. They are still there, but they
don't seem to work on anything other than "toy" examples.
I have been testing the data binding builds and looking for errors in the
builds caused by errors in the XML Schemas. The builds do have some nasty
bits in them and I have been working on helper classes to clean up some of
that. They do work pretty well, so I am pleased overall. Hopefully I'll
have something for Rob and the rest of you soon. It's a bit complicated in
that I have to package up some of Sun's JWSDP and possibly some other
libraries that the data bindings import.
Now the data bindings are a big step closer to the IMR. They become in
effect the backbone for the IMR, where the data or model resides.
Okay now to other things:
1. I reset a few of the directories to group executable. chmod is the
command you can use to reset permissions on directories and files. The man
page here is weak, but just look through google and you can see the usage
parameters (ugoa, +-=, rwxst). For example, "chmod g+x somedir" makes a
directory group executable.
Okay that's about it from me :-)
Yuzo