Thursday, 9 April 2015

Open Access and content mining

We've previously blogged about the British Library Electronic Theses Online Service (EThOS), which stores thesis metadata and, where possible, the full text of digitised theses. BL Labs now wants to explore EThOS metadata for content mining or analysis of trends, and is currently inviting research questions that could be answered using this approach. The move follows on from its project providing data to Virginia Tech to develop algorithms for automated subject tagging of theses.

This is another example of an overlay project underpinned by large-scale data harvesting, such as the successful Mechanical Curator project, which released one million out-of-copyright images into the public domain for researchers to use and re-use.

ChemSpider is an earlier project that brings together chemical structures from a variety of sources, including data from St Andrews theses, into a free database. This publishing platform provides opportunities to make good-quality data public, and to re-use and preserve known compound data and related information in order to advance research, develop services and surface the data on the wider internet.

It's an exciting area of Open Access and Open Data, and there are likely to be further developments as efforts are made to digitise older theses and other sources.


Image captured by the Mechanical Curator project. No known copyright restrictions.

