Output – IncReASe: Final Report – self archiving rates

Title: IncReASe Final Report

Pages: 10-11

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
“It is often stated that, worldwide, the spontaneous level of self-archiving is around 10-15% (i.e. about 15% of published articles are made openly available by their authors).[Harnad (2006), Björk, B-C., Roos, A. & Lauri, M. (2008)] We found similar levels of archiving: 16% of questionnaire respondents link to local, open copies of their work; 19% link to external copies – though often these are not openly accessible. Having said this, much of the self-archived content on web sites is working papers, reports and conference papers; the % of published journal papers spontaneously self-archived (on personal web sites or in any repository) by White Rose authors is likely to be lower than 15%. Of course, there is considerable variation between subject disciplines. This highlights the immediate potential value of open access repositories but also, perhaps, underlines the scale of the cultural change required – even after several years of institutional repository development – to engage researchers in active dissemination of their outputs.”

This provides further evidence for the commonly quoted percentage figures for self-archiving. One consequence of this figure (even within a now established repository) is the challenge faced by institutions seeking to comply with funders’ deposit mandates.

Output – IncReASe: Final Report – proxy deposit

Title: IncReASe Final Report

Page: 12

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:

“Our experience to date, though, suggests authors will make the most of administrative support and that a helpful administrative framework results in higher levels of self-archiving overall. In particular, authors are responsive to well-known individuals in their departments: for example, local administrators have good success rates in persuading authors to re-send appropriate versions of their work where a non-archivable version (generally the published PDF) has been sent initially. Local administrators are well placed to “champion” and support the repository in ways that more “remote” central repository staff are not; this advantage needs to be balanced against the need to provide training and support for departmentally based administrators.”

The project also notes that encouraging this practice may hinder the promotion of self-archiving as such.

This raises an interesting question of priority – is the goal author self-archiving or increased repository content?
From the point of view of a funding body, the promotion of Open Access, or institutional statistics (and REF) concerns, the latter is important;
however, there are strong historical ties to author self-archiving, the author is (in some senses) the one doing the sharing, and the less self-archiving there is, the greater the organisational and financial overhead of the repository.

Either way, the project’s findings support the view that the involvement of local administrators increases deposit rates (motivation).

Output – IncReASe: Final Report – web pages

Title: IncReASe: Final Report

Page: 13

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
“Analysis of individual researcher publication pages revealed a good deal of inconsistency of formatting, including within individual publication lists. The idea of “scraping” publication metadata from researcher pages is attractive, but the reality is quite challenging.”
“The Perl code written for one author could not be reused with another and would need tweaking every time.”

More details of the issues encountered are available at http://eprints.whiterose.ac.uk/increase/scraper.html

The project notes that the AIR project is investigating more sophisticated approaches to this problem (using machine-learning algorithms): http://clg.wlv.ac.uk/projects/AIR/
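
The brittleness described in the quotes above can be illustrated with a minimal sketch. This is not the project's Perl code; the pattern, the function name `parse_citation` and both sample citations are hypothetical, chosen only to show how a pattern tuned to one author's list fails on another author's formatting.

```python
import re

# Hypothetical pattern tuned to one author's list style:
#   "Authors (Year) Title. Journal."
PATTERN = re.compile(
    r"^(?P<authors>.+?) \((?P<year>\d{4})\) (?P<title>.+?)\. (?P<journal>.+?)\.$"
)

def parse_citation(line):
    """Return a dict of citation fields, or None if the line does not fit the pattern."""
    match = PATTERN.match(line.strip())
    return match.groupdict() if match else None

# Invented examples of the same paper as two researchers might list it.
author_a = "Smith, J. and Jones, A. (2008) Open access in practice. Journal of Examples."
author_b = "Jones A, Smith J. Open access in practice. J Examples 2008; 12(3): 45-67."

print(parse_citation(author_a))  # fields extracted for the style the pattern was written for
print(parse_citation(author_b))  # None: the second style needs its own pattern
```

Each new layout needs another hand-written pattern, which matches the report's experience that code written for one author "would need tweaking every time".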

Output – IncReASe: Final Report – repository growth

Title: IncReASe: Final Report

Page: 20

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
The project had various intended outcomes, one of which was for the repository to double in size over the course of the project.

“At the original start date for the project (April 07), the repository held somewhere over 1,600 items. Taking this as the baseline, we have exceeded our target. However, as we delayed the official project start date to allow for staff recruitment, if we take our figure from July 07, we have fallen slightly short but will meet the target approximately 1 month post-project. As can be seen from the graph, the growth rate has been much stronger in the latter half of the project.”

The project has not met its related goal of capturing 20% of the consortium’s research outputs, but “progress has been made”:
“Across the partnership, we estimate nine-ten thousand items falling within repository scope are produced per annum. Eventually, we need to be ingesting / be capable of ingesting over 200 new items each week; this excludes the “mountain” of legacy metadata and publications which could potentially be added to WRRO.”
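
As a rough check on the scale quoted above, the annual estimate can be converted into a weekly rate. This is a sketch only: the function name `weekly_rate` and the choice of 52 or 48 weeks per year are assumptions, not figures from the report.

```python
def weekly_rate(items_per_year: int, weeks_per_year: int = 52) -> float:
    """Average number of items to ingest per week for a given annual output."""
    return items_per_year / weeks_per_year

# The report estimates nine to ten thousand in-scope items per annum.
for annual in (9_000, 10_000):
    for weeks in (52, 48):  # 48 is an assumed figure for working weeks
        print(f"{annual} items over {weeks} weeks: {weekly_rate(annual, weeks):.0f} per week")
```

Spread evenly over 52 weeks, the estimate comes to somewhat under 200 items a week. The report's figure of over 200 may allow for headroom or for fewer working weeks, though it does not say which.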

Target of at least 80% full text records:
“This target has been met. For the majority of its life, WRRO has had a high proportion of full text records (90-95%). At the close of the project, approximately 82% of items have a local full text openly accessible copy of the research outputs; an additional 5% or so link to a full text open access works outside the repository. The proportion of metadata only records is increasing because of the addition of the University of York’s RAE data and other bulk imports. It is anticipated that the proportion of full text items will fall to 60% for a short time but that the proportion of full text will then start to recover.”


Output – IncReASe: Final Report – bulk import

Title: IncReASe: Final Report

Page: 12-13, 21

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
The project looked at importing from departmental bibliographic databases and from other departmental bibliographic collections (some of which were created explicitly for this purpose).

“It is interesting to note that the department preferred on balance to create their own local database and upload material en masse at the end of the summer. Similar suggestions have been made from time to time by other departments even though creating an additional collection system involves more work at the local level. For example, we have been asked to provide an Excel template to allow data to be collected ready for periodic bulk import into the repository. Though this approach may seem counterintuitive, local academics and administrators have suggested that, for some departments, this [local collection] may be a more sustainable method of data collection. Such solutions may be worth considering, perhaps as an interim measure, where sustained self-archiving activity is proving particularly elusive – though could prove counterproductive overall.”

It is also worth noting that a number of departments already had their own bibliographic management tools, some of which could export in formats that are directly importable into EPrints via plugins (DOI, EndNote, BibTeX, Multiline Excel and PubMed ID). More details on the use of the plugins are available at http://eprints.whiterose.ac.uk/increase/plugins.html [Page 18 notes that one difficulty with using DOI material from CrossRef is the lack of author data; as a result, “We have used CrossRef as a base source of metadata but not to enhance metadata in records already created within the repository.”]

The project notes that some of the desire to use other tools may be sidestepped by future developments that better integrate repository deposit into researchers’ workflows and by the introduction of research information / management systems.

From the conclusions:
“There are likely to be personal and departmental sources of metadata suitable for bulk import at most /all HEIs. The metadata within such systems may well be inconsistent and incomplete. We found import to be more time-consuming than we hoped. A high degree of manual intervention was required: mainly to supplement incomplete metadata or add full publication details to imported “in press” items. Unless effective ways can be found to automatically check and improve bulk metadata this type of import may be a false economy and may not be the best way to grow the repository sustainably nor to embed into researchers’ workflow. An alternative approach would be to identify sources of pre-quality checked metadata – possibly from commercial sources – to create a back-catalogue of publication metadata.”

There is again a highlighted concern about alternative solutions impacting on the adoption of self-archiving.

[I think] The project’s experience that departments may opt to run their own bibliographic systems is an important reminder that there is no single solution to archiving Open Access copies, and that information in one place does not equate to information in one system.

The project also demonstrates the effective use of a number of plugins for the EPrints software to import data successfully.

Output – IncReASe: Final Report – self-archiving

Title: IncReASe Final Report

Page: 11, 21-22

Date Released: 30 April 2009

URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
“Our observations suggest that conditions likely to improve self-deposit are:
(i) keeping things as simple as possible from the author’s perspective
(ii) always asking for the author’s final version of a work (… “Accepted Version” suggested by The VERSIONS project …)
(iii) facilitating capture of the work at the point of acceptance for publication. …
(iv) providing central support to monitor uploaded files and seek copyright clearance where required
(v) reminding authors to deposit: this could be a periodic reminder, or could be linked to a publication “event” such as a publication being indexed in a bibliographic database
(vi) highlighting the impact of deposit through the regular provision of usage data”

From the conclusions:
“There is probably no simple “optimum” deposit point for research outputs; however, in the short term, capturing papers at the point of acceptance for publication is probably the most realistic option. The emergence of desktop capture/deposit tools may facilitate earlier capture and assist with version control. Capturing the most appropriate version of a work continues to be an issue; all efforts should be made to inform researchers about the “accepted version” and its importance in the open access landscape. It is likely to be helpful to instil this awareness in early career researchers and PhD students by including open access / scholarly communication elements in training.”

Based on their survey work and interviews, these are the project’s suggestions for supporting the self-archiving process, which remains an ongoing challenge even with mandates. The list in itself provides workflow advice and suggests what software tools are needed.

Output – IncReASe: Final Report – research management

Title: IncReASe: Final Report

Page: 14, 21-22

Date Released: 30 April 2009
URI for Output: http://eprints.whiterose.ac.uk/increase/increase_finalreportv1.pdf

Summary of contents:
All three universities participating in WRRO have begun to examine research management systems, with differing results.
University of Sheffield have put their working group’s findings on hold pending more information about the REF but are investigating systems.
University of York is currently scoping a Research Information System (WRRO is likely to have a significant role).
University of Leeds has selected a system. Their RIS “will [probably] become the primary ingest route for both metadata and full text”. As yet, workflows and staffing (including any involvement of repository or library staff) for the metadata creation in this new system are unclear. As the source of metadata and primary point of contact with academic / research staff, this has the potential to greatly benefit the integration of WRRO into the publication process.

From the conclusions:
“Our discussion with researchers suggests that a comprehensive service – essentially, a publication database – is probably an easier sell than a pure “open access” repository (echoing the conclusions previously drawn by, for example, the TARIS project); its raison d’être is clearer and the possibility for providing services back to researchers in the form of full listings of research and detailed information on traffic to individual works, is increased. Currently, this is not the direction being taken by WRRO; rather, because other central services are likely to fulfil the publication database function, the emphasis remains on external dissemination of open access outputs.”
“Capturing grant and project data relevant to research outputs is likely to increase in importance; this data can help maximise the value of repository content for both research and administrative purposes.”

The RIS at Leeds does, of course, also raise potential difficulties for metadata quality.

All three institutions involved in WRRO are actively moving towards some form of CRIS; in all of them this will impact significantly on the role and prominence of the repository. It is not clear how positive or negative this impact will be. What is clear is that, for institutional repositories covering the area of scholarly communications, these systems are very likely to change what they do and how they operate.

Output – SAFIR: Policy – overview

Title: York Digital Library: Digital Library Policy

Pages: all
Date Released: 12 Feb 2009

URI for Output: https://vle.york.ac.uk/bbcswebdav/xid-324531_3

Summary of contents:
This document provides the policies of York Digital Library; the digital library has stated policies on:

  • Metadata
  • Code and documentation
  • Resources
  • Content [scope]
  • Submission
  • Rights
  • Preservation

York Digital Library has chosen to use Creative Commons (BY-NC-ND) licences for its metadata, documentation, and resources (where possible).


Although the sample policies are in themselves useful – covering things like takedown policy – the explicit use of Creative Commons for the work of the digital library team is worth noting both for its use and for the institutional view it indicates.

Project – From Entry to ETHOS

Project Name: From Entry to ETHOS

Short Project Name: From Entry to ETHOS

Programme Name:  Repositories and Preservation

Strand: SUE

JISC Project URI: http://www.jisc.ac.uk/whatwedo/programmes/reppres/sue/entry.aspx

Project URI: https://www.kcl.ac.uk/blogs/etheses/

Start Date: 1 September 2007

End Date:  31 December 2008

Governance: JISC IIE

Contact Name and Role:   Russell Burke and Catriona Cannon (Project Managers)

Brief project description:

The project will develop an e-theses repository, entirely compliant with EThOS and designed to support the easy deposit and ready exchange of data. The project will expand and enhance the EThOS Toolkit by addressing key institutional administrative questions.  It will explore in detail the entire process of submission and handling from examination entry form through to the upload of the final thesis. The project will chart the varied workflows of students, of examiners and of the Examinations Office, school offices and the Bibliographic Services team.

Name of Trawler: Mahendra Mahey

Outputs:

  • Workflow models and workflow tools to support students, examiners and administrators through the tasks of creating, submitting, processing, describing and publishing e-theses
  • A model institutional policy for e-theses along with supporting procedures and forms
  • Guidance notes to support the implementation of an e-theses policy
  • An EThOS compliant repository that meets required standards in terms of format, structure, metadata and so on

Project – PRESERV2

Project Name: PRESERV 2

Short Project Name: PRESERV 2

Programme Name:  Repositories and Preservation

Strand:  Preservation

JISC Project URI: http://www.jisc.ac.uk/whatwedo/programmes/preservation/preserv2.aspx

Project URI:   http://preserv.eprints.org/

Start Date:  July 2007

End Date:  Feb 2009

Governance: JISC IIE

Contact Name and Role: Steve Hitchcock (Project Manager)

Brief project description:

Preserv 2 is a JISC project investigating and developing infrastructural digital preservation services for institutional repositories. Project partners are Southampton University, The National Archives, The British Library and Oxford University.

Name of Trawler: Mahendra Mahey

Outputs: