“It is often stated that, worldwide, the spontaneous level of self-archiving is around 10-15% (i.e. about 15% of published articles are made openly available by their authors).[Harnard (2006), Björk, B-C., Roosr, A. & Lauri, M. (2008)] We found similar levels of archiving: 16% of questionnaire respondents link to local, open copies of their work; 19% link to external copies – though often these are not openly accessible. Having said this, much of the self-archived content on web sites is working papers, reports and conference papers; the % of published journal papers spontaneously self-archived (on personal web sites or in any repository) by White Rose authors is likely to be lower than 15%. Of course, there is considerable variation between subject disciplines. This highlights the immediate potential value of open access repositories but also, perhaps, underlines the scale of the cultural change required – even after several years of institutional repository development – to engage researchers in active dissemination of their outputs.”

This provides further evidence for the commonly cited percentage of self-archiving. One consequence of this figure (even within a now established repository) is the challenge faced by institutions seeking to comply with funders’ deposit mandates.

“Analysis of individual researcher publication pages revealed a good deal of inconsistency of formatting, including within individual publication lists. The idea of “scraping” publication metadata from researcher pages is attractive, but the reality is quite challenging.”
“The Perl code written for one author could not be reused with another and would need tweaking every time.”
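To illustrate why code written for one author's page does not carry over to another, here is a minimal Python sketch (not the project's Perl code). The pattern and both citation strings are made up for illustration; they assume one author formats entries as "Surname, I. (Year) Title. Journal details." and another uses a different layout.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern tuned to one author's style:
# "Surname, I. (2007) Title. Journal, 12(3), 45-67."
PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\s+(?P<title>.+?)\.\s+(?P<rest>.+)$"
)


def parse_citation(text: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields, or None if the entry does not fit the pattern."""
    match = PATTERN.match(text.strip())
    return match.groupdict() if match else None


# Made-up examples of the same paper as listed on two different pages.
STYLE_A = "Smith, J. (2007) Open access in practice. Journal of Examples, 12(3), 45-67."
STYLE_B = "J. Smith, 'Open access in practice', Journal of Examples 12 (2007) 45-67"

for entry in (STYLE_A, STYLE_B):
    print(parse_citation(entry))
```

The first entry parses into authors, year, title and the remainder; the second returns None because the year sits in a different position, which is the kind of per-author tweaking the report describes.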

More details of the issues encountered are available at http://eprints.whiterose.ac.uk/increase/scraper.html

The project notes that the AIR project is investigating more sophisticated approaches to this problem (using machine-learning algorithms); see http://clg.wlv.ac.uk/projects/AIR/

The project had various intended outcomes, one of which was for the repository to double in size over the course of the project.

“At the original start date for the project (April 07), the repository held somewhere over 1,600 items. Taking this as the baseline, we have exceeded our target. However, as we delayed the official project start date to allow for staff recruitment, if we take our figure from July 07, we have fallen slightly short but will meet the target approximately 1 month post-project. As can be seen from the graph, the growth rate has been much stronger in the latter half of the project.”

The project has not met its related goal of capturing 20% of the consortium’s research outputs, but “progress has been made”.
“Across the partnership, we estimate nine-ten thousand items falling within repository scope are produced per annum. Eventually, we need to be ingesting / be capable of ingesting over 200 new items each week; this excludes the “mountain” of legacy metadata and publications which could potentially be added to WRRO.”
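As a rough check on that weekly figure, here is a minimal Python sketch. The 52-week year and the alternative of 46 working weeks are my assumptions, not figures from the report.

```python
def weekly_ingest(items_per_year: int, weeks: int = 52) -> float:
    """Average number of items to ingest per week to keep pace with output."""
    return items_per_year / weeks


# The report's estimate of nine to ten thousand items per annum.
for items in (9_000, 10_000):
    for weeks in (52, 46):  # 46 is an assumed number of working weeks
        rate = weekly_ingest(items, weeks)
        print(f"{items} items over {weeks} weeks: {rate:.0f} per week")
```

Under these assumptions the rate is roughly 173 to 192 items a week over a full year, or roughly 196 to 217 over 46 working weeks, which is in the same region as the report's "over 200".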

Target of at least 80% full text:
“This target has been met. For the majority of its life, WRRO has had a high proportion of full text records (90-95%). At the close of the project, approximately 82% of items have a local full text openly accessible copy of the research outputs; an additional 5% or so link to a full text open access works outside the repository. The proportion of metadata only records is increasing because of the addition of the University of York’s RAE data and other bulk imports. It is anticipated that the proportion of full text items will fall to 60% for a short time but that the proportion of full text will then start to recover.”


The project looked at importing from departmental bibliographic databases and from other departmental bibliographic collections (some of which were created explicitly for this purpose).

“It is interesting to note that the department preferred on balance to create their own local database and upload material en masse at the end of the summer. Similar suggestions have been made from time to time by other departments even though creating an additional collection system involves more work at the local level. For example, we have been asked to provide an Excel template to allow data to be collected ready for periodic bulk import into the repository. Though this approach may seem counterintuitive, local academics and administrators have suggested that, for some departments, this [local collection] may be a more sustainable method of data collection. Such solutions may be worth considering, perhaps as an interim measure, where sustained self-archiving activity is proving particularly elusive – though could prove counterproductive overall.”

It is also of note that a number of departments already had their own bibliographic management tools, some of which could export in formats that are directly importable into EPrints via plugins (DOI, EndNote, BibTeX, Multiline Excel and PubMed ID). More details on the use of the plugins are available: http://eprints.whiterose.ac.uk/increase/plugins.html [page 18 notes that one difficulty with using DOI material from CrossRef is the lack of author data; as a result, “We have used CrossRef as a base source of metadata but not to enhance metadata in records already created within the repository.”]

The project notes that some of the desire to use other tools may be sidestepped by future developments that better integrate repository deposit into researchers’ workflows, and by the introduction of research information/management systems.

From the conclusions:
“There are likely to be personal and departmental sources of metadata suitable for bulk import at most /all HEIs. The metadata within such systems may well be inconsistent and incomplete. We found import to be more time-consuming than we hoped. A high degree of manual intervention was required: mainly to supplement incomplete metadata or add full publication details to imported “in press” items. Unless effective ways can be found to automatically check and improve bulk metadata this type of import may be a false economy and may not be the best way to grow the repository sustainably nor to embed into researchers’ workflow. An alternative approach would be to identify sources of pre-quality checked metadata – possibly from commercial sources – to create a back-catalogue of publication metadata.”
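The kind of automatic check the report calls for could start with something like the following Python sketch. It is only an illustration: the field names, the required-field list and the "in press" status value are hypothetical, not taken from WRRO or EPrints.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum set of fields a record needs before import.
REQUIRED_FIELDS = ("title", "authors", "year", "publication")

Record = Dict[str, str]


def find_problems(record: Record) -> List[str]:
    """List the reasons a record would need manual intervention."""
    problems = [
        f"missing {field}"
        for field in REQUIRED_FIELDS
        if not str(record.get(field, "")).strip()
    ]
    if str(record.get("status", "")).strip().lower() == "in press":
        problems.append("in press: needs full publication details")
    return problems


def triage(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into those ready for import and those needing review."""
    ready, needs_review = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            needs_review.append((record, problems))
        else:
            ready.append(record)
    return ready, needs_review


# Made-up sample records for illustration.
SAMPLE = [
    {"title": "Paper A", "authors": "Smith, J.", "year": "2007", "publication": "Journal X"},
    {"title": "Paper B", "authors": "", "year": "2008", "publication": "Journal Y"},
    {"title": "Paper C", "authors": "Jones, K.", "year": "2008",
     "publication": "Journal Z", "status": "In press"},
]

ready, needs_review = triage(SAMPLE)
print(len(ready), "ready;", len(needs_review), "need review")
```

A check like this only counts the manual work rather than removing it, but it would show before import how much intervention a given departmental source is likely to need.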

This again highlights a concern that alternative solutions may hinder the adoption of self-archiving.

[I think] The project’s experience that departments may opt to run their own bibliographic systems is an important reminder that there is no single solution to archiving Open Access copies, and that information in one place does not equate to information in one system.

It demonstrates the effective use of a number of plugins for the EPrints software to import data successfully.

“Our observations suggest that conditions likely to improve self-deposit are:
(i) keeping things as simple as possible from the author’s perspective
(ii) always asking for the author’s final version of a work (… “Accepted Version” suggested by The VERSIONS project …)
(iii) facilitating capture of the work at the point of acceptance for publication. …
(iv) providing central support to monitor uploaded files and seek copyright clearance where required
(v) reminding authors to deposit: this could be a periodic reminder, or could be linked to a publication “event” such as a publication being indexed in a bibliographic database
(vi) highlighting the impact of deposit through the regular provision of usage data”

From the conclusions:
“There is probably no simple “optimum” deposit point for research outputs; however, in the short term, capturing papers at the point of acceptance for publication is probably the most realistic option. The emergence of desktop capture/deposit tools may facilitate earlier capture and assist with version control. Capturing the most appropriate version of a work continues to be an issue; all efforts should be made to inform researchers about the “accepted version” and its importance in the open access landscape. It is likely to be helpful to instil this awareness in early career researchers and PhD students by including open access / scholarly communication elements in training.”

Based on their survey work and interviews, these are the project’s suggestions for supporting the self-archiving process, which remains an ongoing challenge even with mandates. The list doubles as workflow advice and indicates what software tools are needed.

All three universities participating in WRRO have begun to examine research management systems, with differing results.
University of Sheffield has put its working group’s findings on hold pending more information about the REF, but is investigating systems.
University of York is currently scoping a Research Information System (WRRO is likely to have a significant role).
University of Leeds has selected a system. Their RIS system “will [probably] become the primary ingest route for both metadata and full text”. As yet, workflows and staffing (including any involvement of repository or library staff) for the metadata creation in this new system are unclear. As the source of metadata and primary point of contact with academic/research staff, this has the potential to greatly benefit the integration of WRRO into the publication process.

From the conclusions:
“Our discussion with researchers suggests that a comprehensive service – essentially, a publication database – is probably an easier sell than a pure “open access” repository (echoing the conclusions previously drawn by, for example, the TARIS project); its raison d’être is clearer and the possibility for providing services back to researchers in the form of full listings of research and detailed information on traffic to individual works, is increased. Currently, this is not the direction being taken by WRRO; rather, because other central services are likely to fulfil the publication database function, the emphasis remains on external dissemination of open access outputs.”
“Capturing grant and project data relevant to research outputs is likely to increase in importance; this data can help maximise the value of repository content for both research and administrative purposes.”

The RIS system at Leeds does of course also raise potential difficulties for metadata quality.

All three institutions involved in WRRO are actively moving towards some form of CRIS system; in all institutions this will impact significantly on the role and prominence of the repository. It is not clear how positive or negative this impact will be. What is clear is that, for institutional repositories covering the area of scholarly communications, CRIS systems are very likely to change what they do and how they operate.

  • Single management/search interface across Forced Migration Online (FMO)
  • Interoperability between the FMO repository and other institutional repositories and search services
  • Potential to make FMO’s grey literature collection available via the University’s online Library Catalogue
  • Open source management/search software built on Fedora