This blog can't be viewed on LiveJournal. Instead see http://www.apparently.me.uk/22262.html.

The sorry state of media in Atom and RSS

13th Dec 2008

Part of the AtomActivity work is defining a single standard way to publish the metadata about the core object types in an Atom entry. For the object type "weblog entry", our work is basically done: that's what Atom was built for. Things get interesting when you consider representing photos and videos in Atom.

Photos and videos have lots of interesting properties beyond those of weblog entries, the most important of which are the URLs for different-sized representations in different formats. Many moons ago, while Atom was in its infancy, Yahoo! invented MediaRSS for this purpose. MediaRSS is an extension to RSS that adds a multitude of interesting new elements, including media:content and media:thumbnail, which together handle locating the different-sized representations of a media object.
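To make that concrete, here is a minimal sketch of reading those two elements out of an RSS item with Python's standard library. The sample feed, its example.com URLs and the `representations` helper are all made up for illustration; only the MediaRSS namespace and the element and attribute names come from the spec.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"

# Hypothetical RSS item: the example.com URLs and dimensions are invented.
SAMPLE_RSS = """\
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example photos</title>
    <item>
      <title>Sunset</title>
      <link>http://example.com/photos/1</link>
      <media:content url="http://example.com/photos/1/large.jpg"
                     type="image/jpeg" medium="image"
                     width="1024" height="768"/>
      <media:thumbnail url="http://example.com/photos/1/thumb.jpg"
                       width="75" height="75"/>
    </item>
  </channel>
</rss>
"""


def representations(item):
    """Return (kind, url, width, height) for each media:content and
    media:thumbnail element that is a direct child of the item."""
    found = []
    for kind in ("content", "thumbnail"):
        for el in item.findall(f"{{{MEDIA_NS}}}{kind}"):
            width = el.get("width")
            height = el.get("height")
            found.append((
                kind,
                el.get("url"),
                int(width) if width else None,
                int(height) if height else None,
            ))
    return found


root = ET.fromstring(SAMPLE_RSS)
item = root.find("channel/item")
reps = representations(item)
for rep in reps:
    print(rep)
```

Note that this only looks at direct children of the item, which is one of several layouts that publishers use in practice.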

MediaRSS seems to have been adopted in the RSS feeds exposed by most of the popular media hosting sites, including Flickr, YouTube and Picasa Web Albums. There's also a handful of "media aggregators" (mostly focused on audio) that support MediaRSS. However, as seems to be the case for many things RSS, the specification gives loads of options and no clear guidance on what to actually do, and consequently everyone implements it slightly differently.

What of Atom? The situation in Atom land is considerably more grim. Atom itself has nothing more than a workalike of RSS's original enclosure element, and while a few publishers are making use of it (Flickr, for example), this isn't enough to provide the various representations you generally want to publish for a media resource. It seems that around the time MediaRSS was being developed there was a thread on the IETF Atom mailing list about developing something similar for Atom, though as far as I can tell the outcome was "wait until MediaRSS is finished and use it as a basis". MediaRSS was of course eventually finished, but I guess by that time the Atom working group at IETF had published its two RFCs and wound up.
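For comparison, this is roughly what the Atom side looks like: a link element with rel="enclosure" that carries a URL, a media type and a byte length, and nothing about dimensions or alternative sizes. The sketch below uses a made-up entry (the example.com URLs, the length value and the `enclosures` helper are assumptions for illustration) and pulls out whatever enclosure links it finds.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical Atom feed: URLs, dates and the length value are invented.
SAMPLE_ATOM = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example photos</title>
  <id>http://example.com/photos/</id>
  <updated>2008-12-13T00:00:00Z</updated>
  <entry>
    <title>Sunset</title>
    <id>http://example.com/photos/1</id>
    <updated>2008-12-13T00:00:00Z</updated>
    <link rel="alternate" type="text/html"
          href="http://example.com/photos/1"/>
    <link rel="enclosure" type="image/jpeg" length="204800"
          href="http://example.com/photos/1/large.jpg"/>
  </entry>
</feed>
"""


def enclosures(entry):
    """Return the href, type and length of each rel="enclosure" link."""
    out = []
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "enclosure":
            out.append({
                "href": link.get("href"),
                "type": link.get("type"),
                "length": link.get("length"),
            })
    return out


feed = ET.fromstring(SAMPLE_ATOM)
entry = feed.find(f"{{{ATOM_NS}}}entry")
atom_enclosures = enclosures(entry)
print(atom_enclosures)
```

A consumer gets one file per link and has no standard way to tell a thumbnail from the full-size original, which is the gap described above.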

My research suggests that today most publishers omit media information (beyond the basic "enclosure" link) entirely from their Atom feeds, while publishing it via MediaRSS in their RSS feeds. Google's YouTube and Picasa Web Albums are the only examples I could find where MediaRSS elements are published in both RSS and Atom feeds, though in both cases they do it differently from everyone else: everything is wrapped in a single media:group element rather than included as direct children of item. FeedValidator also says that Picasa's feeds are in fact invalid because they include only one media:content element in the group, though of course the MediaRSS spec itself says little about this.
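The practical upshot for anyone consuming these feeds is that a parser has to look in more than one place. Here is a rough sketch of that, assuming just the two layouts mentioned above (direct children, or wrapped in a single media:group). Both sample fragments and the `media_content_urls` helper are invented for illustration and are not copied from any real feed.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"

# Hypothetical RSS item with media:content as a direct child.
DIRECT_ITEM = """\
<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Direct layout</title>
  <media:content url="http://example.com/a.jpg" type="image/jpeg"/>
</item>
"""

# Hypothetical Atom entry with everything wrapped in media:group.
GROUPED_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/">
  <title>Grouped layout</title>
  <media:group>
    <media:content url="http://example.com/b.jpg" type="image/jpeg"/>
    <media:thumbnail url="http://example.com/b-thumb.jpg"/>
  </media:group>
</entry>
"""


def media_content_urls(parent):
    """Collect media:content URLs from either layout: direct children
    of the item/entry, or children of a media:group inside it."""
    direct = parent.findall(f"{{{MEDIA_NS}}}content")
    grouped = parent.findall(f"{{{MEDIA_NS}}}group/{{{MEDIA_NS}}}content")
    return [el.get("url") for el in direct + grouped]


direct_urls = media_content_urls(ET.fromstring(DIRECT_ITEM))
grouped_urls = media_content_urls(ET.fromstring(GROUPED_ENTRY))
print(direct_urls)
print(grouped_urls)
```

This covers only the two variations I ran into; the spec allows others, so a real consumer would likely need more cases than this.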

MediaRSS also, on a more subjective level, feels like a bit of a foreign citizen in Atom. Many of its elements overlap with elements already defined in Atom, and of course it doesn't use Atom's link element because that concept does not exist in RSS.

So the question now is how to specify media element handling in AtomActivity. MediaRSS has far more functionality than is required for the Activity Stream use-cases. As I see it, the options here are:

  • Write AtomActivity to specify that, for photos and videos, you are to "retrieve the media object metadata as defined by MediaRSS" and leave it at that. However, I feel that MediaRSS is too big and underspecified, with too many possible variations.
  • Define a subset of MediaRSS that only includes the minimum necessary for the activity streams use-cases, and is far more rigid about how things are to be published.
  • Use MediaRSS as the basis for a separate specification that has a narrower scope and feels more at home in Atom.

If MediaRSS in Atom were already widely and consistently deployed I wouldn't hesitate to go for the second of these options. But since everyone except Google would have to add it to their Atom feeds anyway, and since existing Atom parser implementations are unlikely to support MediaRSS right now, I'm leaning towards the last of these: defining an extension that builds upon (and is backwards-compatible with) how "enclosures" are already represented in Atom. The MediaRSS folks have already done the hard work of figuring out the feature set, so the work would largely be a matter of mapping MediaRSS concepts onto Atom structures.
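To give a feel for what that mapping involves, here is a small sketch that turns a media:content element into an Atom enclosure link using only the attributes Atom's link element already has. This is not the extension itself, which doesn't exist yet. The `ATTRIBUTE_MAP` table, the `to_enclosure_link` helper and the sample element are my own illustrative assumptions.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ATOM_NS = "http://www.w3.org/2005/Atom"

# media:content attribute -> atom:link attribute with the same meaning.
# Illustrative only: this table is an assumption, not a published mapping.
ATTRIBUTE_MAP = {"url": "href", "type": "type", "fileSize": "length"}


def to_enclosure_link(content):
    """Build an atom:link rel="enclosure" from a media:content element.

    Returns the new link plus a dict of the attributes that plain Atom
    has no place for."""
    link = ET.Element(f"{{{ATOM_NS}}}link", {"rel": "enclosure"})
    leftover = {}
    for name, value in content.attrib.items():
        if name in ATTRIBUTE_MAP:
            link.set(ATTRIBUTE_MAP[name], value)
        else:
            leftover[name] = value
    return link, leftover


# Hypothetical media:content element; the URL and numbers are invented.
content = ET.fromstring(
    '<media:content xmlns:media="http://search.yahoo.com/mrss/" '
    'url="http://example.com/photos/1/large.jpg" type="image/jpeg" '
    'fileSize="204800" width="1024" height="768"/>'
)
link, leftover = to_enclosure_link(content)
print(link.attrib)
print(leftover)
```

The leftover attributes (width and height here) are exactly the kind of thing an extension to Atom's enclosure mechanism would need to find a home for.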

I'm fully expecting to hear loads of cries of "don't reinvent the wheel!", which is fair enough, but my review of current practice suggests that Atom enclosures are currently far more widely deployed than MediaRSS-in-Atom, so defining something that extends Atom's enclosure mechanism seems like a better way to go than switching to something entirely different. I'm going to take a whack at an "Atom Media Extensions" and see how it turns out.
