Title: HILT phase IV demonstrators

Date Released: June 2008
URI for Output: http://hilt.cdlr.strath.ac.uk/hilt4/demonstrators.html

HILT IV has produced a number of service demonstrators examining “cross-searching multi-subject scheme information environments”. They are demonstrators in the sense that the mappings used cover only selected sections of the vocabulary schemes.

They have also embedded the demonstrator service within BUBL (a catalogue of resources relating to Library and Information Sciences) http://bubl.ac.uk/hilt4.htm
They have also created some form of client for OCLC http://linuxserv.cdlr.strath.ac.uk/~anuj/cgi-bin/hilt4/oclc_client.cgi


Project – SHERPA Romeo

Short Project Name: SHERPA Romeo

Programme Name: Digital Repositories Programme (2005-2007)


JISC Project URI: http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005/sherparomeo.aspx

Project URI: http://www.sherpa.ac.uk/romeo/

Start Date: March 2006

End Date: June 2007


Contact Name and Role: Bill Hubbard

The Sherpa RoMEO site offers information about publishers’ policies with respect to self-archiving pre-print and post-print research papers.  The current funding stream supports the following activities:

· scoping and setting up a strand of work, within the proposed interim repository, to maintain and develop the Sherpa/RoMEO database;
· maintaining the Sherpa/RoMEO database journal-level information;
· developing the Sherpa/RoMEO database to indicate which journals have policies that allow authors to deposit their papers in accordance with the Wellcome Trust Grant conditions.

Output – IncReASe: Final Report – self archiving rates

Title: IncReASe Final Report

Pages: 10-11

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

“It is often stated that, worldwide, the spontaneous level of self-archiving is around 10-15% (i.e. about 15% of published articles are made openly available by their authors).[Harnad (2006), Björk, B-C., Roos, A. & Lauri, M. (2008)] We found similar levels of archiving: 16% of questionnaire respondents link to local, open copies of their work; 19% link to external copies – though often these are not openly accessible. Having said this, much of the self-archived content on web sites is working papers, reports and conference papers; the % of published journal papers spontaneously self-archived (on personal web sites or in any repository) by White Rose authors is likely to be lower than 15%. Of course, there is considerable variation between subject disciplines. This highlights the immediate potential value of open access repositories but also, perhaps, underlines the scale of the cultural change required – even after several years of institutional repository development – to engage researchers in active dissemination of their outputs.”

This provides further evidence for the commonly cited percentage of spontaneous self-archiving. One consequence of this figure (even within a now-established repository) is the challenge faced by institutions seeking to comply with funders’ deposit mandates.

Output – IncReASe: Final Report – proxy deposit

Title: IncReASe Final Report

Page: 12

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

“Our experience to date, though, suggests authors will make the most of administrative support and that a helpful administrative framework results in higher levels of self-archiving overall. In particular, authors are responsive to well-known individuals in their departments: for example, local administrators have good success rates in persuading authors to re-send appropriate versions of their work where a non-archivable version (generally the published PDF) has been sent initially. Local administrators are well placed to “champion” and support the repository in ways that more “remote” central repository staff are not; this advantage needs to be balanced against the need to provide training and support for departmentally based administrators.”

The project also notes that encouraging this practice (deposit by administrators on authors’ behalf) may hinder the promotion of self-archiving as such.

This raises an interesting question of priority: is the goal author self-archiving or increased repository content?
From the point of view of funding bodies, the promotion of Open Access, and institutional statistics (including the REF), the latter is what matters;
however, there are strong historical ties to author self-archiving, the author is (in some senses) the one doing the sharing, and the less authors self-archive, the greater the organisational and financial overhead of the repository.

Either way, the project’s findings support the view that the involvement of local administrators increases deposit rates (motivation).

Output – IncReASe: Final Report – web pages

Title: IncReASe: Final Report

Page: 13

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

“Analysis of individual researcher publication pages revealed a good deal of inconsistency of formatting, including within individual publication lists. The idea of “scraping” publication metadata from researcher pages is attractive, but the reality is quite challenging.”
“The Perl code written for one author could not be reused with another and would need tweaking every time.”
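The reuse problem described in the quotes can be illustrated with a small sketch. It is written in Python rather than the project's Perl, and the pattern and both citation strings are invented for illustration; they are not taken from the project's scraper. A pattern tuned to one author's layout simply fails on a second author's layout for the same paper.

```python
import re

# Hypothetical pattern tuned to one author's page layout:
#   "Surname, I. (Year) Title. Journal."
AUTHOR_A_PATTERN = re.compile(
    r"^(?P<authors>.+?) \((?P<year>\d{4})\) (?P<title>.+?)\. (?P<journal>.+)\.$"
)


def parse_citation(line):
    """Return a dict of fields if the line matches author A's layout, else None."""
    match = AUTHOR_A_PATTERN.match(line)
    return match.groupdict() if match else None


# Invented examples: the same paper as two researchers might list it.
author_a_line = "Smith, J. (2007) Open access in practice. Journal of Examples."
author_b_line = "J. Smith, 'Open access in practice', Journal of Examples, 2007"

print(parse_citation(author_a_line))  # fields extracted
print(parse_citation(author_b_line))  # None: the layout differs
```

Each new layout needs its own pattern, which is the "tweaking every time" the report describes.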

More details of the issues encountered are available at http://eprints.whiterose.ac.uk/increase/scraper.html

The project notes that the AIR project is investigating more sophisticated approaches to this problem (using machine-learning algorithms): http://clg.wlv.ac.uk/projects/AIR/

Output – IncReASe: Final Report – repository growth

Title: IncReASe: Final Report

Page: 20

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

The project had various intended outcomes. One of these was for the repository to double in size over the course of the project.

“At the original start date for the project (April 07), the repository held somewhere over 1,600 items. Taking this as the baseline, we have exceeded our target. However, as we delayed the official project start date to allow for staff recruitment, if we take our figure from July 07, we have fallen slightly short but will meet the target approximately 1 month post-project. As can be seen from the graph, the growth rate has been much stronger in the latter half of the project.”

The project has not met its related goal of capturing 20% of the consortium’s research outputs, but “progress has been made”:
“Across the partnership, we estimate nine-ten thousand items falling within repository scope are produced per annum. Eventually, we need to be ingesting / be capable of ingesting over 200 new items each week; this excludes the “mountain” of legacy metadata and publications which could potentially be added to WRRO.”
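As a rough check on the scale involved, the annual estimate can be converted into a weekly ingest rate. This is a minimal sketch that assumes a plain 52-week year; the report does not say how its "over 200" figure was derived.

```python
WEEKS_PER_YEAR = 52  # assumption: a plain calendar year, no allowance for quiet periods


def weekly_rate(items_per_year, weeks=WEEKS_PER_YEAR):
    """Average number of items that must be ingested per week."""
    return items_per_year / weeks


for annual in (9000, 10000):
    print(f"{annual} items/year is about {weekly_rate(annual):.0f} items/week")
```

On this assumption the estimate works out at roughly 173 to 192 items a week, so the quoted "over 200" appears to include some headroom, or to assume fewer than 52 working weeks (10,000 items at 200 a week is 50 weeks).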

A further target was for at least 80% of records to have full text:
“This target has been met. For the majority of its life, WRRO has had a high proportion of full text records (90-95%). At the close of the project, approximately 82% of items have a local full text openly accessible copy of the research outputs; an additional 5% or so link to full text open access works outside the repository. The proportion of metadata only records is increasing because of the addition of the University of York’s RAE data and other bulk imports. It is anticipated that the proportion of full text items will fall to 60% for a short time but that the proportion of full text will then start to recover.”


Output – IncReASe: Final Report – bulk import

Title: IncReASe: Final Report

Pages: 12-13, 21

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

The project looked at importing from departmental bibliographic databases and from other departmental bibliographic collections (some of which were created explicitly for this purpose).

“It is interesting to note that the department preferred on balance to create their own local database and upload material en masse at the end of the summer. Similar suggestions have been made from time to time by other departments even though creating an additional collection system involves more work at the local level. For example, we have been asked to provide an Excel template to allow data to be collected ready for periodic bulk import into the repository. Though this approach may seem counterintuitive, local academics and administrators have suggested that, for some departments, this [local collection] may be a more sustainable method of data collection. Such solutions may be worth considering, perhaps as an interim measure, where sustained self-archiving activity is proving particularly elusive – though could prove counterproductive overall.”

It is also worth noting that a number of departments already had their own bibliographic management tools, some of which could export in formats that are directly importable into EPrints via plugins (DOI, EndNote, BibTeX, Multiline Excel and PubMed ID). More details on the use of the plugins are available at http://eprints.whiterose.ac.uk/increase/plugins.html [page 18 notes that one difficulty with using DOI material from CrossRef is the lack of author data; as a result, “We have used CrossRef as a base source of metadata but not to enhance metadata in records already created within the repository.”]

The project notes that some of the desire to use other tools may be sidestepped by future developments that better integrate repository deposit into researchers’ workflows and by the introduction of research information / management systems.

From the conclusions:
“There are likely to be personal and departmental sources of metadata suitable for bulk import at most / all HEIs. The metadata within such systems may well be inconsistent and incomplete. We found import to be more time-consuming than we hoped. A high degree of manual intervention was required: mainly to supplement incomplete metadata or add full publication details to imported “in press” items. Unless effective ways can be found to automatically check and improve bulk metadata this type of import may be a false economy and may not be the best way to grow the repository sustainably nor to embed into researchers’ workflow. An alternative approach would be to identify sources of pre-quality checked metadata – possibly from commercial sources – to create a back-catalogue of publication metadata.”
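One simple form of the automatic checking mentioned in the conclusions is to flag incomplete records before import. The following is a minimal sketch only: the column names, the list of required fields and the sample rows are all hypothetical, and it is not a description of the project's workflow or of any EPrints plugin.

```python
import csv
import io

# Hypothetical required columns for a departmental spreadsheet export.
REQUIRED_FIELDS = ("title", "authors", "year", "journal")


def field_problem(name, value):
    """True if a field is empty, or if the year is not a four-digit number."""
    value = (value or "").strip()
    if not value:
        return True
    if name == "year" and not (value.isdigit() and len(value) == 4):
        return True  # catches placeholders such as "in press"
    return False


def find_incomplete(rows, required=REQUIRED_FIELDS):
    """Return (row_number, problem_fields) for each row needing manual attention."""
    problems = []
    for number, row in enumerate(rows, start=1):
        bad = [name for name in required if field_problem(name, row.get(name))]
        if bad:
            problems.append((number, bad))
    return problems


# Invented sample data standing in for a spreadsheet saved as CSV.
SAMPLE = """title,authors,year,journal
Open access in practice,"Smith, J.",2007,Journal of Examples
Repository growth,,in press,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = find_incomplete(rows)
print(report)
```

A check like this only identifies the records needing manual intervention; it does not supply the missing metadata, which is where the report found most of the effort went.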

The project again highlights a concern that alternative solutions may hold back the adoption of self-archiving.

[I think] The project’s experience that departments may opt to run their own bibliographic systems is an important reminder that there is no single solution to archiving Open Access copies, and that information in one place does not equate to information in one system.

The project also demonstrates the effective use of a number of plugins for the EPrints software to import data.