When someone publishes an RSS or Atom feed, what are they expecting people to do with it? View it in their feed reader? Display it on their website? Aggregate it with other feeds and republish the results in a new feed? Display an aggregated feed on their website? Some publishers include comments in their feeds indicating what uses of their feeds they allow, though the precision and comprehensiveness of licensing statements vary widely. And many publishers probably never even consider the question. Is there an XML format for describing licensing terms? If not, why not? How might such a format look?

Bob Wyman posted some comments on the topic on his blog today. He brings up the question of whether it would be possible to create a license-describing XML format that wasn't encumbered by patents, and references a few patents that might be relevant. I took a look at one of them, which Bob suggested might "claim all uses of XML to describe licenses for software products if those licenses are in 'files'," and reached a different conclusion. Neither Bob nor I is a lawyer, so don't take our word for it, but it looks to me like this particular patent is focused much more narrowly than that. It seems tightly bound to licenses for executable code, not data, and even more narrowly to executable code that runs in some sort of runtime environment--the Java VM being the one specifically discussed. Of course, even if I'm right about this one, there may be other patents out there that would get in the way of a free XML-based license description format.

A little while back, the discussion in the IETF AtomPub working group got me thinking about this question, and I hashed out a few ideas for a format. We're not going to include anything like this in the Atom core, partly because relegating it to an extension helps ensure that it won't be considered usable as the basis for binding DRM: software claiming to support Atom is not required to handle any extensions. Anyway, for what it's worth, here's the current state of my idea:

• The extension would explicitly state that it is purely for informational purposes--not for access control--but that feed processing tools may optionally use the information to help their users be sure they are complying with the publisher's requirements.
• The extension would define a link type for pointing to the official legalese licensing information for the feed or entry ("item" in RSS).
• The licensing information would refer only to data actually contained within the document in which it appears--not to data that the feed links to.

Here's a sample:

<license:license default="deny" id="asdf">
<link rel="http://...license#documentation" href="http://www.mydomain.com/my-feed-license.pdf" type="application/pdf" />
<license:allow who="individual,non-profit/* government/education for-profit/news+education,aggregator">*/free</license:allow>
<license:allow who="for-profit/*">internal/* mirror/* aggregate/free</license:allow>
</license:license>

<license:license ref="asdf" />

What does the above XML mean?

• default="deny": if a particular use of the feed is not mentioned below, then it is not allowed.
• id="asdf": this ID can be referred to from elsewhere in the document to indicate that that part of the document has the same licensing terms as are defined here.
• link rel=...: the official license can be found in a PDF document at the URL shown.
• license:allow...: this element defines one or more allowed uses of the feed or entry in which this license:license section appears.
• who=...: for whom does this element describe allowed uses?
• individual,non-profit/* government/education for-profit/news+education,aggregator: all "individuals" (definition below), all non-profits, government educational organizations, for-profit educational news organizations, for-profit aggregators. More on this below.
• */free: those described by the "who" attribute may use the feed or entry for any use that they provide without restriction at no cost.
• for-profit/*: the next license:allow element applies to all for-profit entities.
• internal/* mirror/* aggregate/free: this feed or entry may be used (by for-profit entities) internally in any way, may be mirrored (i.e., republished without any modification), and may be aggregated with other feeds if the aggregated feed is freely available at no cost.
• ref="asdf": the feed or entry containing this element has the same licensing terms as the one above whose license:license element has the 'id="asdf"' attribute.
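To make the walkthrough above a bit more concrete, here's a rough sketch in Python of how a feed tool might pull those pieces out of the sample. The post never declares a namespace for the "license" prefix, so the LICENSE_NS URI below is made up, and treating the link as an Atom link inside an entry is also my assumption. The elided "rel" value is kept exactly as it appears in the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the format has no official namespace yet.
LICENSE_NS = "http://example.org/ns/feed-license"
# Assumption: the <link> element is an Atom link inside an Atom entry.
ATOM_NS = "http://www.w3.org/2005/Atom"

SAMPLE = f"""
<entry xmlns="{ATOM_NS}" xmlns:license="{LICENSE_NS}">
  <license:license default="deny" id="asdf">
    <link rel="http://...license#documentation"
          href="http://www.mydomain.com/my-feed-license.pdf"
          type="application/pdf" />
    <license:allow who="individual,non-profit/* government/education for-profit/news+education,aggregator">*/free</license:allow>
    <license:allow who="for-profit/*">internal/* mirror/* aggregate/free</license:allow>
  </license:license>
</entry>
""".strip()


def read_license(entry):
    """Collect the default, id, legalese link and allow rules of an entry's license."""
    ns = {"license": LICENSE_NS, "atom": ATOM_NS}
    lic = entry.find("license:license", ns)
    link = lic.find("atom:link", ns)
    return {
        "default": lic.get("default"),
        "id": lic.get("id"),
        "document": link.get("href") if link is not None else None,
        # Each rule is a (who, allowed uses) pair, still in its raw string form.
        "allow": [
            (rule.get("who"), (rule.text or "").strip())
            for rule in lic.findall("license:allow", ns)
        ],
    }


info = read_license(ET.fromstring(SAMPLE))
print(info["default"], info["id"], info["document"])
for who, uses in info["allow"]:
    print(who, "->", uses)
```

This only extracts the raw strings; interpreting the "who" and usage expressions is a separate step.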

More information:
• The XML referred to by the ref attribute must be in the same file as the element containing the ref attribute.
• The "default" attribute may have the value "allow", "deny" or "unspecified".
• The "who" attribute contains one or more types (or combinations of types) and subtypes (or combinations of subtypes), separated by a slash ("/").
• The "who" types are "for-profit" (any for-profit organization or individual in a for-profit role), "non-profit" (a legal non-profit organization), "government", "individual" and "informal" (families, informal groups, etc., that don't fall into the other categories).
• The "who" subtypes are (...disclaimer--this is a difficult list to come up with, and may change drastically...) "education", "news", "aggregator" and "advertising". "advertising" is used to indicate that a "for-profit" would be not for profit except that it is advertising supported.
• The types for the element content are "website" (display the feed on a website--usually as HTML, but it could be Flash, etc.), "intranet" (distribute via an intranet or VPN, in whatever format), "aggregate" (combine with data from other feeds in an aggregate feed), "single" (view in a desktop aggregator--this value may be allowed for all users by default in all cases unless explicitly forbidden), "mirror" (republish without any modification), and "email" (republish via email).
• The subtypes for the element content are "free" (accessible without restriction at no charge to anyone who has access to the network on which the data is being published), "non-free" (access-limited, whether requiring payment, requiring membership, or any other restriction), and "advertising" (the same as "free" except that the user inserts advertising into the feed, email, area of a webpage, etc., in which the feed data is being published; note that this does NOT apply to webpages where advertising appears outside of the rectangle encompassing the feed data).
• For both types and subtypes for both the "who" attribute and the element content, "*" is the wildcard character meaning "all".
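As a small illustration of the "ref" and "default" rules above, here's a hedged sketch in Python of how a tool might resolve a ref to the license it points at within the same document and check the "default" value. The namespace URI and the function name resolve_license are invented for this example; nothing here is part of any spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, as the format has no official one.
LICENSE_NS = "http://example.org/ns/feed-license"
LICENSE_TAG = f"{{{LICENSE_NS}}}license"
ALLOWED_DEFAULTS = {"allow", "deny", "unspecified"}


def resolve_license(element, root):
    """Follow a ref attribute to the license with the matching id in the same document."""
    ref = element.get("ref")
    target = element
    if ref is not None:
        # The referenced XML must live in the same file, so only `root` is searched.
        target = next((e for e in root.iter(LICENSE_TAG) if e.get("id") == ref), None)
        if target is None:
            raise ValueError(f"no license with id={ref!r} in this document")
    default = target.get("default")
    if default is not None and default not in ALLOWED_DEFAULTS:
        raise ValueError(f"unexpected default value: {default!r}")
    return target


DOC = f"""
<feed xmlns:license="{LICENSE_NS}">
  <entry><license:license default="deny" id="asdf"/></entry>
  <entry><license:license ref="asdf"/></entry>
</feed>
""".strip()

root = ET.fromstring(DOC)
licenses = list(root.iter(LICENSE_TAG))
resolved = resolve_license(licenses[1], root)
print(resolved.get("id"), resolved.get("default"))
```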

Both in the "who" attribute and in the element content, multiple types and subtypes may be combined to enable more concise XML. The following symbols are used to combine types and subtypes.

• "+": means "and" within a type or subtype section. This symbol binds more tightly than any other (I'll show what that means below).
• ",": means "or" within a type or subtype section. Binds next most tightly after "+".
• "/": connects a type (or types) and subtype (or subtypes). Binds next most tightly.
• " " (space): means "or" between multiple type/subtype combinations. Binds least tightly.

For non-programmers, here's what it means that one symbol binds more tightly than others. Given the "who" subtype "news+education,aggregator", "education" is more tightly bound to "news" than it is to "aggregator". So this means "either 'news and education' or 'aggregator'." It does not mean "Both 'news' and 'education or aggregator'." Make sense?
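For programmers, the binding order can be sketched as a tiny parser: split on the loosest-binding symbol first (space), then "/", then ",", and finally "+". The Python below is only an illustration. The function names are made up, and the matching rule (describing an entity as a set of types plus a set of subtypes) is my assumption about how a tool might apply the expression.

```python
def parse_section(section):
    """Parse one type or subtype section: ',' means 'or', '+' means 'and' (binds tighter)."""
    return [frozenset(alternative.split("+")) for alternative in section.split(",")]


def parse_expression(text):
    """Parse a full expression into a list of (types, subtypes) clauses, any of which may match."""
    clauses = []
    for combo in text.split():  # space: 'or' between type/subtype combinations
        types, slash, subtypes = combo.partition("/")
        if not slash:
            raise ValueError(f"missing '/' in {combo!r}")
        clauses.append((parse_section(types), parse_section(subtypes)))
    return clauses


def section_matches(alternatives, tags):
    """True if some 'or' alternative is the wildcard or has all of its '+' terms in tags."""
    return any(alt == {"*"} or alt <= tags for alt in alternatives)


def matches(clauses, types, subtypes):
    """Assumed semantics: an entity is described by a set of types and a set of subtypes."""
    return any(
        section_matches(t, types) and section_matches(s, subtypes)
        for t, s in clauses
    )


who = "individual,non-profit/* government/education for-profit/news+education,aggregator"
clauses = parse_expression(who)

print(matches(clauses, {"for-profit"}, {"aggregator"}))         # aggregator alone is enough
print(matches(clauses, {"for-profit"}, {"news"}))               # news without education is not
print(matches(clauses, {"for-profit"}, {"news", "education"}))  # news and education together
```

Because "+" is split last, "news+education,aggregator" comes out as two alternatives, {news, education} and {aggregator}, which is the "either 'news and education' or 'aggregator'" reading described above.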

Well, there's my idea. Someday I may get around to writing up a formal document for it. I'll have to look into the patent question a little more first.