IDCC 09 Keynote: Timo Hannay

Chris Rusbridge | 08 December 2009

Timo Hannay presented a talk entitled “From Web 2.0 to the Global Database”, providing a publishing perspective on the need for cultural change in scientific communication.

Hannay took a step back to take a bigger-picture view. He began by giving an overview of his work at Nature, noting that the majority of their business is through the web – although not everyone reads the work electronically, they do access the content through the web. He then explained how journals are becoming more structured, with links providing supplementary information. He admitted that this information is not yet structured enough, but it is there – making journals more like databases.

Hannay moved on to explain that Nature is getting involved in database publishing. They help to curate and peer-review database content and commission additional articles to give context to the data. This is a very different way of being a science publisher – so the change is not just for those doing the science!

After taking us through Jim Gray's four scientific paradigms, Hannay asked us to think back to a talk by Clay Shirky in 2001, which led to the idea that the defining characteristic of the computer age is not the devices, but the connections. If a device is not connected to the network, it hardly seems like a computer at all. This led Tim O'Reilly to develop the idea of the Internet Operating System, which morphed into the name “Web 2.0”. O'Reilly looked at the companies that survived and thrived after the dot-com bubble and created a list of features which defined Web 2.0 companies, including the Long Tail, software as a service, peer-to-peer technologies, trust systems and emergent data, tagging and folksonomies, and “Data as the new 'Intel Inside'” ... the idea that you can derive business benefit from powering data behind the scenes.

Whilst we have seen Web 2.0 affect science, science blogging hasn't really taken off as much as it could have done – particularly in the natural sciences – and is still not a mainstream activity. However, Hannay did note some of the long-term changes we are seeing as a result of the web and the tools it brings: increasing specialisation, more information sharing, a smaller 'minimum publishable unit', better attribution, the merging of journals and databases – with journals providing more structure to databases – and new roles for librarians, publishers and others. Hannay asserted that these changes are leading, gradually, to a speeding up of discovery.

Hannay took us through some of the resources that are available on the web, from Wikipedia to PubChem and ChemSpider, where the data is structured and annotated through crowdsourcing to make the databases searchable and usable. He asserted that we are moving away from the cottage-industry model of science, with one person doing all the work in the process from designing the experiment to writing the paper. We are now seeing whole teams with specialisms collaborating across time and space in a more industrial-scale science. Different areas of science are at different stages with this.

Hannay referred to Chris Anderson's claim in Wired magazine that we no longer need theory. He rejected this, but did agree that more is different, so we will be seeing changes. He gave the example of Google, which didn't develop earlier in the history of the web simply because it was not needed until the web reached a scale at which it became useful.

Hannay believes that publishers have a role to play in helping to capture, structure and preserve data.
Journals are there to make information more readable for human beings, but they need to think about how they present information so that both humans and computers can search and access it, as both are now equally important.

All human knowledge is interconnected, and the associations between facts are just as important as the facts themselves. Just as we are reaching the point where a computer not connected to the network is not really a computer, Hannay hopes we will reach a point where a fact not connected to other facts in a meaningful way will hardly be considered a fact. One link, one tag, one ID at a time, we are building a global database. This may be vast and messy and confusing, but it will be hugely valuable – like the internet itself as the global computer operating system.