Output – IncReASe: Final Report – web pages

Title: IncReASe: Final Report

Page: 13

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
“Analysis of individual researcher publication pages revealed a good deal of inconsistency of formatting, including within individual publication lists. The idea of “scraping” publication metadata from researcher pages is attractive, but the reality is quite challenging.”
“The Perl code written for one author could not be reused with another and would need tweaking every time.”

More details of the issues encountered are available at http://eprints.whiterose.ac.uk/increase/scraper.html

The project notes that the AIR project is investigating more sophisticated approaches to this problem, using machine-learning algorithms: http://clg.wlv.ac.uk/projects/AIR/

