December 13th, 2008

amused, happy
  • mart

Why do sites still publish RSS?

While travelling all over the web looking for examples to use as the basis for AtomActivity, I found it interesting to note the number of sites that still publish both Atom and RSS feeds in parallel.

Given that Atom was designed to be the "cure to all ills" of RSS, it seems like you ought to be able to publish anything you can publish in RSS as an Atom feed, even if only as a mechanical transformation. Perhaps the motivation is simply that it's easy? "Neither is hard to generate, so let's just do both."
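To give a feel for what such a mechanical transformation might involve, here is a minimal Python sketch that maps the most common RSS 2.0 item children onto an Atom entry using the standard library's ElementTree. The function names and the particular field mapping are my own assumptions for illustration; a real converter would need to handle far more (authors, categories, enclosures, feed-level metadata).

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag: str) -> str:
    """Return an ElementTree-style qualified name in the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def rss_item_to_atom_entry(item: ET.Element) -> ET.Element:
    """Map common RSS 2.0 <item> children onto an Atom <entry> (simplified sketch)."""
    entry = ET.Element(atom("entry"))

    title = item.findtext("title")
    if title is not None:
        ET.SubElement(entry, atom("title")).text = title

    link = item.findtext("link")
    if link is not None:
        ET.SubElement(entry, atom("link"), rel="alternate", href=link)

    # Atom requires atom:id; fall back to the item's link when there is no guid.
    guid = item.findtext("guid") or link
    if guid is not None:
        ET.SubElement(entry, atom("id")).text = guid

    description = item.findtext("description")
    if description is not None:
        # RSS descriptions are usually escaped HTML, hence type="html".
        ET.SubElement(entry, atom("summary"), type="html").text = description

    # RSS pubDate uses RFC 822 dates; Atom wants RFC 3339 timestamps.
    # Mapping pubDate to atom:updated is itself a simplifying assumption.
    pub_date = item.findtext("pubDate")
    if pub_date is not None:
        updated = parsedate_to_datetime(pub_date).isoformat()
        ET.SubElement(entry, atom("updated")).text = updated

    return entry
```

Serialising the result with ET.tostring(entry, encoding="unicode") gives the Atom XML; the interesting part is how little of the mapping is more than renaming.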

Where this becomes troublesome is in the definition of extensions. AtomActivity, by virtue of being an Atom extension, can't be used directly in RSS. While it's true that you can plug the namespaced XML elements it specifies into an RSS item element, there are still several incongruities. The first and most obvious is that activity:object is defined to contain (more or less) the same elements you find in the atom:entry element. Beyond that, some of the object types we'll be adding will also come with a description of how to extract their properties (such as a photo's image URL) from an Atom entry, and that description won't work on an RSS item without modification.
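As a rough illustration of the incongruity, here is a Python sketch of a hypothetical extraction rule, "the photo's image URL is the first enclosure with an image/* type", written once for Atom and once for RSS. The rule and the function names are invented for this example and are not part of any AtomActivity draft; the point is only that the same idea needs two different lookups, because Atom expresses it as a link element with rel="enclosure" and an href attribute, while RSS uses an un-namespaced enclosure element with a url attribute.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"


def image_url_from_atom_entry(entry: ET.Element) -> Optional[str]:
    """Hypothetical rule, Atom flavour: first <link rel="enclosure" type="image/...">."""
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "enclosure" and link.get("type", "").startswith("image/"):
            return link.get("href")
    return None


def image_url_from_rss_item(item: ET.Element) -> Optional[str]:
    """The same hypothetical rule, RSS 2.0 flavour: <enclosure url="..." type="image/...">."""
    enclosure = item.find("enclosure")  # RSS 2.0 core elements have no namespace
    if enclosure is not None and enclosure.get("type", "").startswith("image/"):
        return enclosure.get("url")
    return None
```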

So what's an extension author to do? Do I need to write a parallel "RSSActivity" spec that's fundamentally the same but uses RSS elements in place of Atom ones? Do we need to define for every object type a mapping for both Atom and RSS?

Another place this problem manifests is in libraries that act as an abstraction layer over RSS and Atom. It's true that for the basic case of weblog-entry feeds the two formats look basically the same through such an interface, but Atom is in fact a functional superset of RSS, so any such library is necessarily restricted to supporting only what RSS can do. The use-case for these libraries is "Here's the URL for a feed. Parse it at all costs. I don't care what format it's in." That sounds useful on the surface, but are there really any significant sites left that publish RSS but not Atom? Can't we just leave RSS to die and use Atom-specific libraries?
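A stripped-down sketch of such an abstraction layer shows where the restriction comes from. This hypothetical Python function sniffs the root element to decide whether it has an Atom feed or an RSS 2.0 document, then reduces every entry to the fields both formats share (here just title and link); anything Atom-specific is simply discarded. It's an illustration under those assumptions, not how any particular library is implemented.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def parse_any_feed(xml_text: str) -> list:
    """Return entries as {"title", "link"} dicts, whatever the feed format."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == f"{{{ATOM_NS}}}feed":
        for entry in root.findall(f"{{{ATOM_NS}}}entry"):
            href = None
            for link in entry.findall(f"{{{ATOM_NS}}}link"):
                # In Atom, a link with no rel attribute means rel="alternate".
                if link.get("rel", "alternate") == "alternate":
                    href = link.get("href")
                    break
            # Other link relations, authors, etc. are dropped: RSS has no equivalent.
            entries.append({"title": entry.findtext(f"{{{ATOM_NS}}}title"), "link": href})
    elif root.tag == "rss":
        for item in root.iter("item"):
            entries.append({"title": item.findtext("title"), "link": item.findtext("link")})
    else:
        raise ValueError(f"unrecognised feed root element: {root.tag}")
    return entries
```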

Browsers suffer in this department too. On just about every site I visited, clicking the "Feed" icon in my browser brought up a pop-up menu with two options: "Feed (Atom)" and "Feed (RSS)". Do we really want to force users to choose between two options that, as far as their browser is concerned, behave in exactly the same way? Firefox and Opera (and, I assume, every other major browser) support Atom, so can't we just remove the RSS autodiscovery links, even if the underlying feeds remain? Consign RSS to the bucket of "we maintain this for backwards compatibility" rather than "this is functionality we actively promote". In an ideal world, browsers would ignore the RSS feed if an Atom feed is present, but since a browser can't reliably determine that the RSS and Atom versions really carry the same content, it's left to the page author to make that decision.
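For what it's worth, the naive "prefer Atom" policy is easy to sketch. The Python below (standard library only; the class and function names are hypothetical) collects the rel="alternate" feed links from a page's HTML and returns only the Atom ones when any exist. Note that it simply assumes the Atom and RSS feeds are equivalent, which is exactly what a browser can't verify.

```python
from html.parser import HTMLParser

# Autodiscovery MIME types mapped to a short format label.
FEED_TYPES = {"application/atom+xml": "atom", "application/rss+xml": "rss"}


class FeedLinkCollector(HTMLParser):
    """Collect (format, href) pairs from <link rel="alternate" type="..."> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []  # in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        fmt = FEED_TYPES.get((a.get("type") or "").lower())
        if "alternate" in rels and fmt and a.get("href"):
            self.feeds.append((fmt, a["href"]))


def preferred_feeds(html_text):
    """Naive policy: if any Atom feed is advertised, hide the RSS ones.

    This ASSUMES the Atom and RSS feeds carry the same content.
    """
    collector = FeedLinkCollector()
    collector.feed(html_text)
    collector.close()
    atom_feeds = [f for f in collector.feeds if f[0] == "atom"]
    return atom_feeds or collector.feeds
```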

If you publish both RSS and Atom feeds on your site I'd love to hear why. If you're publishing exclusively RSS I'd love to hear why as well.


The sorry state of media in Atom and RSS

Part of the AtomActivity work is defining a single standard way to publish the metadata about the core object types in an Atom entry. For the object type "weblog entry", our work is basically done: that's what Atom was built for. Things get interesting when you consider representing photos and videos in Atom.

Photos and videos have lots of interesting properties beyond those of weblog entries, the most important being the URLs of different-sized representations in different formats. Many moons ago, while Atom was in its infancy, Yahoo! invented Media RSS for this purpose. Media RSS is an extension to RSS that adds a multitude of interesting new elements, including media:content and media:thumbnail, which together handle locating the different-sized representations of a media object.
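As a sketch of how a consumer might read those two elements, the Python function below lists every media:content and media:thumbnail child of an RSS item, along with any width and height attributes. It assumes the commonly used Media RSS namespace URI http://search.yahoo.com/mrss/ and looks only at direct children of the item; the function name and tuple layout are my own.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"


def media_representations(item: ET.Element) -> list:
    """List (kind, url, width, height) for each media:content / media:thumbnail child."""
    reps = []
    for kind in ("content", "thumbnail"):
        for el in item.findall(f"{{{MEDIA_NS}}}{kind}"):
            width = el.get("width")
            height = el.get("height")
            # Dimensions are optional in Media RSS, so keep None when absent.
            reps.append((kind, el.get("url"),
                         int(width) if width else None,
                         int(height) if height else None))
    return reps
```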

MediaRSS seems to have been adopted in the RSS feeds of most of the popular media hosting sites, including Flickr, YouTube and Picasa Web Albums. There are also a handful of "media aggregators" (mostly focused on audio) that support MediaRSS. However, as seems to be the case for many things RSS, the specification offers loads of options and no clear guidance on which to actually use, and consequently everyone implements it slightly differently.

What of Atom? The situation in Atom land is considerably grimmer. Atom itself has nothing more than its link element with rel="enclosure", a workalike of RSS's original enclosure element, and while a few publishers (Flickr, for example) make use of it, this isn't enough to provide the various representations you generally want to publish for a media resource. It seems that around the time MediaRSS was being developed there was a thread on the IETF Atom mailing list about developing something similar for Atom, though as far as I can tell the outcome was "wait until MediaRSS is finished and use it as a basis". MediaRSS was of course eventually finished, but I guess by then the IETF Atom working group had published its two RFCs and wound up.

My research suggests that today most publishers simply omit media information (beyond the basic "enclosure" link) from their Atom feeds entirely, while publishing it via MediaRSS in their RSS feeds. Google's YouTube and Picasa Web Albums are the only examples I could find of MediaRSS elements being published in both RSS and Atom feeds, though in both cases they do it differently from everyone else: everything is wrapped in a single media:group element rather than included as direct children of item. FeedValidator also reports that Picasa's feeds are invalid because they include only one media:content element in the group, though of course the MediaRSS spec itself says little about this.
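Coping with that variation means a consumer has to look in more than one place. Here is a minimal Python sketch, assuming the Media RSS namespace http://search.yahoo.com/mrss/ and a hypothetical function name, that gathers media:content URLs whether they appear as direct children of an item (or entry) or inside a media:group:

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
CONTENT = f"{{{MEDIA_NS}}}content"
GROUP = f"{{{MEDIA_NS}}}group"


def media_content_urls(item_or_entry: ET.Element) -> list:
    """Collect media:content URLs from direct children and from any media:group."""
    # Style used by most publishers: media:content directly under item/entry.
    urls = [el.get("url") for el in item_or_entry.findall(CONTENT)]
    # Style used by YouTube/Picasa: media:content wrapped in a media:group.
    for group in item_or_entry.findall(GROUP):
        urls.extend(el.get("url") for el in group.findall(CONTENT))
    return urls
```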

MediaRSS also, on a more subjective level, feels like a bit of a foreign citizen in Atom. Many of its elements overlap with elements already defined in Atom, and of course it doesn't use Atom's link element because that concept does not exist in RSS.

So the question now is how to specify media element handling in AtomActivity. MediaRSS has far more functionality than is required for the Activity Stream use-cases. As I see it, the options here are:

  • Write AtomActivity to specify that, for photos and videos, you are to "retrieve the media object metadata as defined by MediaRSS" and leave it at that. However, I feel that MediaRSS is too big and underspecified, with too many possible variations.
  • Define a subset of MediaRSS that only includes the minimum necessary for the activity streams use-cases, and is far more rigid about how things are to be published.
  • Use MediaRSS as the basis for a separate specification that has a narrower scope and feels more at home in Atom.

If MediaRSS in Atom were already widely and consistently deployed I wouldn't hesitate to go for the second of these options. But since everyone except Google would have to add it to their Atom feeds anyway, and since existing Atom parsers are unlikely to support MediaRSS right now, I'm leaning towards the last option: defining an extension that builds upon (and is backwards-compatible with) how "enclosures" are already represented in Atom. The MediaRSS folks have already done the hard work of figuring out the feature set, so the work would largely be a matter of mapping MediaRSS concepts onto Atom structures.
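To make the idea a little more concrete, here is a Python sketch of how a consumer might read such an extension. Everything about the extension itself is hypothetical: the namespace URI, and the idea of carrying width and height as namespaced attributes on Atom's link rel="enclosure" element, are placeholders of my own and not a real specification. The sketch does show the backwards-compatibility property, though: an enclosure without the extra attributes still parses, just without size information.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# HYPOTHETICAL namespace for an "Atom Media Extensions" draft; not a real spec.
AME_NS = "http://example.org/hypothetical-atom-media"


def enclosures(entry: ET.Element) -> list:
    """Return (href, type, width, height) for every rel="enclosure" link.

    width/height come from the hypothetical AME attributes and are None when
    absent, so a plain Atom enclosure is still accepted.
    """
    result = []
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") != "enclosure":
            continue
        w = link.get(f"{{{AME_NS}}}width")
        h = link.get(f"{{{AME_NS}}}height")
        result.append((link.get("href"), link.get("type"),
                       int(w) if w else None, int(h) if h else None))
    return result


def smallest_at_least(entry: ET.Element, min_width: int):
    """Pick the narrowest sized enclosure that is at least min_width wide, if any."""
    sized = [e for e in enclosures(entry) if e[2] is not None and e[2] >= min_width]
    return min(sized, key=lambda e: e[2]) if sized else None
```

A reader that knows nothing about the extension just sees three ordinary enclosures; one that does can choose a suitably sized representation.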

I'm fully expecting to hear loads of cries of "don't reinvent the wheel!", which is fair enough, but my review of current practice suggests that Atom enclosures are far more widely deployed than MediaRSS-in-Atom, so extending Atom's enclosure mechanism seems a better way to go than switching to something entirely different. I'm going to take a whack at an "Atom Media Extensions" spec and see how it turns out.