Prototyping Weeknotes #96
This week's featured project is ABC-IP, a two-year, part-funded collaborative research project examining ways to automatically link together different sources of metadata around large video and audio collections.
One particularly interesting dataset we have access to is the entire World Service audio archive. This spans several decades, and consists of about 26,000 hours of audio content, which is roughly three years of continuous listening.
This World Service archive has traditionally been very different from other programme datasets at the BBC. The only overlap is with BBC Programmes, for programmes broadcast since May 2008.
The data available is quite patchy: a number of programmes claim to have been first broadcast in the future (e.g. 2099), or before the BBC existed (e.g. 1900). However, the audio itself is high quality, and the content is the usual excellent World Service mix of education, information and entertainment.
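A minimal sketch of how implausible first-broadcast dates like these could be flagged. The field names, the sample records and the lower bound (1922, the year the BBC was founded) are assumptions made for illustration, not the archive's actual schema or the check we ran.

```python
from datetime import date

# Assumption: the BBC was founded in 1922, so nothing can predate that year.
EARLIEST_PLAUSIBLE = date(1922, 1, 1)


def is_plausible(first_broadcast, today=None):
    """Return True if the date lies between the BBC's founding and today."""
    today = today or date.today()
    return EARLIEST_PLAUSIBLE <= first_broadcast <= today


def split_by_plausibility(programmes, today=None):
    """Partition programme records into (plausible, suspect) lists."""
    plausible, suspect = [], []
    for programme in programmes:
        if is_plausible(programme["first_broadcast"], today):
            plausible.append(programme)
        else:
            suspect.append(programme)
    return plausible, suspect


# Hypothetical records; the identifiers and field names are made up.
sample = [
    {"id": "prog-1", "first_broadcast": date(1987, 6, 14)},
    {"id": "prog-2", "first_broadcast": date(2099, 1, 1)},
    {"id": "prog-3", "first_broadcast": date(1900, 1, 1)},
]

plausible, suspect = split_by_plausibility(sample, today=date(2012, 3, 16))
print([p["id"] for p in suspect])
```

Suspect records are set aside rather than dropped, since the audio behind them may still be perfectly usable.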
We worked on making this archive searchable, and on linking it up with other datasets at the BBC and outside, by analysing the content of the programmes and automatically classifying them with URIs.
First, we developed an algorithm allowing us to do that with reasonable accuracy. We're working on releasing a Python implementation of it, which will be described in further detail on this blog.
The next stage was to apply this algorithm to the whole World Service archive. We developed an API to manage and distribute the processing across a large number of instances, and successfully used it to automatically tag around 27,000 programmes in about a week, at a predictable cost.
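To give a flavour of what managing and distributing the processing involves, here is a rough, in-memory sketch of a job manager that hands programmes to worker instances and collects their tags. It is not our actual API: the class, the method names, the instance names and the `fake_tagger` placeholder are all hypothetical, and a real deployment would sit behind a network service rather than a single Python process.

```python
from collections import deque


class TaggingJobs:
    """In-memory stand-in for a service that hands programmes to instances."""

    def __init__(self, programme_ids):
        self.pending = deque(programme_ids)
        self.in_progress = {}  # programme id -> name of the instance working on it
        self.results = {}      # programme id -> list of tag URIs

    def claim(self, worker):
        """Give the next unprocessed programme to a worker, or None if none are left."""
        if not self.pending:
            return None
        programme_id = self.pending.popleft()
        self.in_progress[programme_id] = worker
        return programme_id

    def complete(self, programme_id, tags):
        """Record the tags a worker produced for a programme."""
        del self.in_progress[programme_id]
        self.results[programme_id] = tags

    def fail(self, programme_id):
        """Put a programme back on the queue, e.g. if an instance disappeared."""
        del self.in_progress[programme_id]
        self.pending.append(programme_id)

    def done(self):
        """True once nothing is waiting and nothing is being processed."""
        return not self.pending and not self.in_progress


def fake_tagger(programme_id):
    # Placeholder for the real classification step, which is not shown here.
    return ["http://example.org/tag/" + programme_id]


jobs = TaggingJobs(["p1", "p2", "p3", "p4", "p5"])
workers = ["instance-a", "instance-b"]
while not jobs.done():
    for worker in workers:
        programme_id = jobs.claim(worker)
        if programme_id is not None:
            jobs.complete(programme_id, fake_tagger(programme_id))

print(len(jobs.results))
```

Keeping track of which programmes are in progress is what makes it possible to re-queue work when an instance fails, and to know when the whole archive has been covered.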
We'll present this work, and some applications of the resulting tags (like the prototype our partner Metabroadcast blogged about), in Lyon next April.
Meanwhile, we've started to investigate how to present very large, partially described archives, looking at the design challenges for tag-based navigation and retrieval. We're very interested in how we can engage the audience to improve this data.
In other project news:
FI-Content
The diary study has started and, whilst we're collecting data, Joanne and Penny are planning a questionnaire and materials for the lab study. Andrew's been building a dashboard showing the users' programme data.
Chris N. has been working with Barbara to discuss and define technical enablers for the project and to start thinking about the scope of the prototype we'll need to demo at the end of the first phase. This week has been filled with project deadlines, including a deliverable about large-scale testing of use cases. Preparations are underway to welcome all the project partners to London for the next pan-European project meeting, which we're hosting. Akua has been sorting out the important logistical work of getting 30 people here and working out where to put them.
EBU Radio Week
Chris L., Dan, George, Libby and Sean were in Geneva for EBU Radio Week. George gave a presentation to the assembled Radio Summit delegates (think business attire) and Chris L. gave a talk about RadioTAG aimed at developers, while everyone saw a lot of interesting presentations. Of particular interest were the developments in the RadioEPG specification, which brings programme information to radios, allowing people to listen to their favourite stations even if they have to switch between IP and broadcast streams.
Dan and Chris also enjoyed seeing how open-source software was being used to make community radio affordable in Denmark and France. One such system allows Kanal Plus in Copenhagen to broadcast on behalf of 43 local stations for a hardware outlay of about €500 each.
Libby enjoyed seeing a device whose cheap DAB radio is enabling educational projects. More photos from the event are available online.
W3C Audio Working Group
Olivier spent a chunk of the week on W3C Audio Working Group business, especially on editing one of the group's documents and then matching up use cases and requirements.
Meanwhile, Matt joined the working group, so he spent some time finding his way around the group's material whilst thinking about use cases and demonstrators.
Interesting links
Finally, here's a round-up of interesting links from the team.