Reducing redundancy in XML documents

An issue that always seems to arise when adding an element (whether core or extension) to a feed format is whether the element, when it appears at the feed level, applies to the entries. The reason to want it to apply is obvious: it avoids repeating identical data over and over in each entry. But allowing inheritance can create problems. If an entry needs to override the inherited value, that's easy enough, but what if you simply want the value to be undefined for a particular entry? Generally, no can do.

An idea I proposed while working on Atom 1.0 was to create a mechanism for specifying data in one place in a document and then simply referring to it from other places. The idea was voted down (it is admittedly more work for implementors, though it shouldn't be THAT much work, and compression does a good job of minimizing the impact of redundant data on the amount of data that has to be transmitted), but over and over again, I see cases where I'd really like to be able to do it.

So now I'm thinking of proposing a general XML extension for reducing redundancy in XML documents. Here's my current thinking:

The extension would define three attributes (I'll use "xref" as the prefix here): xref:id, xref:ref and xref:here.
xref:id would be a document-unique ID for the element it appears on (and that element's children: they come as a package).
It would be recommended that xref:id values be very short; since they only need to be unique within the document, they needn't be anything fancy.
xref:ref would point to the element whose xref:id had the same value (example below).
xref:here would only appear on elements with an xref:id. Its boolean value would indicate whether the element applies to the document at the position where it appears, or whether it is only there to be referred to from elsewhere in the document. The default would be "true" (i.e., the element applies where it appears).
Elements referred to by xref:ref MUST appear earlier in the document than the elements referring to them. It might be good to specify that the referenced element has to be "in scope" (i.e., a child of an ancestor of the referring element), but maybe that should just be a best practice.
The extension could only be used with elements whose specifications explicitly state that they are compatible with it. For example, since the Atom specification doesn't know about this extension, these attributes could not be used on core Atom elements. (I'd be tempted to mint a namespace for "Atom plus xref": documents would look exactly like Atom documents, with xref support, but all the elements would be in a different namespace.)
The extension would not be recommended for digitally signed data unless that data is never meant to be excerpted or combined with data from other documents that might use xref. Combining data could produce duplicate xref:ids in one document, and substituting in the full referenced data to avoid that problem would break the digital signature.
(I'm not sure about this one.) Elements carrying xref:ref must be empty and have no other attributes; they can't override values from the referenced element. Allowing overrides would be nice for compactness, but would make processing more difficult. Which is more important?
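The rules above are concrete enough to sketch a resolver that turns an xref document back into a plain one. The Python below is a minimal illustration under several assumptions, not a reference implementation: it uses a hypothetical namespace URI, urn:example:xref (the proposal doesn't define one), hypothetical names (resolve, XrefError), the strict option from the last rule (xref:ref elements must be empty and carry no other attributes, with whitespace-only text tolerated), and accepts "true"/"1" and "false"/"0" for xref:here. It walks the document in order, records each xref:id, rejects duplicate IDs and references to IDs that haven't appeared yet, drops definitions marked xref:here="false", and replaces each referring element with a copy of the element it names.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the proposal does not define one.
XREF_NS = "urn:example:xref"
ID, REF, HERE = (f"{{{XREF_NS}}}{name}" for name in ("id", "ref", "here"))


class XrefError(ValueError):
    """Raised when a document breaks one of the proposed xref rules."""


def _parse_here(value):
    # xref:here is boolean; accept the XML Schema lexical forms.
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise XrefError(f"invalid xref:here value {value!r}")


def _strip_xref(elem):
    # Remove the xref bookkeeping attributes from elem and its descendants.
    for e in elem.iter():
        for name in (ID, REF, HERE):
            e.attrib.pop(name, None)


def _keep_text(parent, kept, text):
    # Text that followed a dropped element moves to the previous kept
    # sibling's tail, or to the parent's leading text if there is none.
    if not text:
        return
    if kept:
        kept[-1].tail = (kept[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def resolve(root: ET.Element) -> ET.Element:
    """Return a resolved copy of root: every xref:ref is replaced by a copy
    of the element it names, and xref:here="false" definitions are dropped."""
    root = copy.deepcopy(root)
    table = {}  # xref:id -> resolved, attribute-stripped copy of the definition

    def walk(parent):
        kept = []
        for child in list(parent):
            if REF in child.attrib:
                # Strict option: a referring element is an empty placeholder.
                if len(child) or (child.text or "").strip() or len(child.attrib) != 1:
                    raise XrefError("xref:ref elements must be empty with no other attributes")
                key = child.get(REF)
                if key not in table:
                    raise XrefError(f"xref:ref={key!r} has no earlier xref:id")
                replacement = copy.deepcopy(table[key])
                replacement.tail = child.tail  # keep the referrer's trailing text
                kept.append(replacement)
                continue
            if HERE in child.attrib and ID not in child.attrib:
                raise XrefError("xref:here may only appear alongside xref:id")
            walk(child)  # resolve nested references before registering this element
            if ID in child.attrib:
                key = child.get(ID)
                if key in table:
                    raise XrefError(f"duplicate xref:id {key!r}")
                definition = copy.deepcopy(child)
                _strip_xref(definition)
                definition.tail = None
                table[key] = definition
            here = _parse_here(child.get(HERE, "true"))
            _strip_xref(child)
            if here:
                kept.append(child)
            else:
                _keep_text(parent, kept, child.tail)
        parent[:] = kept

    walk(root)
    return root
```

Because an ID is registered only after its element's content has been processed, an element can't refer to itself or to anything defined later, so expansion can't loop. Applied to a document shaped like the example below (with xmlns:xref declared), resolve() would keep the document-level title and author, drop the "No title" definition, and expand the item's empty author into a full copy of the referenced one.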

A quick example:

<foo>
  <title>The title of the document</title>
  <title xref:id="1" xref:here="0">No title</title>
  <author xref:id="2">Antone Roundy</author>

  <item>
    <title>The title of one item</title>
    <author xref:ref="2" />
    <content>foo</content>
  </item>

</foo>

Here the second title exists only to be referred to (xref:here="0"), while the author applies at the document level and is also reused by the item through xref:ref="2".

Antone Roundy says (August 15th, 2006 at 2:28 pm):

Issue: Would inherited attributes like xml:base, xml:lang, etc. be taken from the context of the original element or from the context of the referrer? The original element makes more sense to me: I think it would be simpler to implement, and it would...
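The comment's question about inherited attributes can be made concrete. One way to take them from the original element's context, the option the comment prefers, is to pin the xml:lang and xml:base in effect at the definition onto each copy before it is placed at the referring position, so the copy doesn't pick up the referrer's values. The Python below is a minimal sketch of that idea with ElementTree; the function names (in_scope_context, detach_with_context) are hypothetical, and xml:base resolution is approximated with urllib.parse.urljoin.

```python
import copy
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang / xml:base under the fixed XML namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG = f"{{{XML_NS}}}lang"
BASE = f"{{{XML_NS}}}base"


def in_scope_context(root: ET.Element, elem: ET.Element) -> tuple:
    """Return the (xml:lang, xml:base) pair in effect at elem inside root."""
    parents = {child: parent for parent in root.iter() for child in parent}
    chain = [elem]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    lang, base = None, None
    for e in reversed(chain):  # walk from the root down to elem
        if LANG in e.attrib:
            lang = e.get(LANG)  # the nearest xml:lang wins
        if BASE in e.attrib:
            # Each xml:base is resolved against the base already in effect.
            base = urljoin(base or "", e.get(BASE))
    return lang, base


def detach_with_context(root: ET.Element, definition: ET.Element) -> ET.Element:
    """Copy a referenced element, pinning the xml:lang and xml:base of its
    original position so the copy does not inherit the referrer's context."""
    lang, base = in_scope_context(root, definition)
    clone = copy.deepcopy(definition)
    clone.tail = None  # trailing text belongs to the original position
    if lang is not None:
        clone.set(LANG, lang)
    if base is not None:
        clone.set(BASE, base)
    return clone
```

A resolver would call something like detach_with_context() instead of a plain copy when expanding an xref:ref; taking the referrer's context instead would simply mean skipping the pinning step.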
