Repositories Support Project

Repositories and Preservation


October 2006 to March 2009

Bill Hubbard (Project Manager)

The Repositories Support Project (RSP) is a 2.5-year project to co-ordinate and deliver good practice and practical advice to English and Welsh higher education institutions (HEIs), enabling the implementation, management and development of digital institutional repositories.

CLAReT – Personas


The output contains personas and corresponding scenarios for six end users of a system managing digital learning materials for language learning. The personas were used in role-play workshops to develop and test the concept/domain map of language learning in higher education (HE).

The personas provide a useful reference for system designers in this domain.

CLAReT – Concept map

CLAReT Language Learning and Teaching Concept Map

This is a nested concept map for the domain of language learning that supports structured resource browsing. There are 10 core concepts, each typically expanding two or three levels deep, and each level typically has four nodes. Each concept can retrieve all the resources from its sub-nodes.

This offers an alternative browsing structure and an innovative user interface to support the discovery of previously unknown content. It is not known how transferable or widely available this interface is.
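The concept map's implementation is not described here. As a minimal sketch, assuming each concept is a tree node that holds its own resources and a list of narrower sub-concepts, "retrieving all the resources from its sub-nodes" amounts to a recursive depth-first walk. The `Concept` class, its fields, and the example labels and file names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical node in a nested concept map (illustrative, not CLAReT code)."""
    label: str
    resources: list[str] = field(default_factory=list)     # resources attached directly here
    children: list[Concept] = field(default_factory=list)  # narrower sub-concepts

    def all_resources(self) -> list[str]:
        """Return this node's resources followed by those of every descendant (depth-first)."""
        collected = list(self.resources)
        for child in self.children:
            collected.extend(child.all_resources())
        return collected


# Made-up core concept expanding two levels deep.
grammar = Concept(
    "Grammar",
    children=[
        Concept("Verbs", ["verb-tenses.pdf"],
                children=[Concept("Irregular verbs", ["irregular-drill.html"])]),
        Concept("Nouns", ["noun-gender-quiz.html"]),
    ],
)
print(grammar.all_resources())
# → ['verb-tenses.pdf', 'irregular-drill.html', 'noun-gender-quiz.html']
```

Browsing a higher-level concept therefore surfaces everything filed beneath it, while selecting a leaf narrows the results to that node alone.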

CLAReT

CLAReT (Contextualised Learning Repository Tools)

Repositories and Preservation Programme

1 Oct 2006 to 31 Oct 2007

Dr. Yvonne Howard, Project Manager

“CLARET will develop a prototype web service that can enable visual exploration and the use of social bookmarking of contextual metadata in a repository context.
The CLARET project will also explore the relationship between Learning Object Contextual metadata and the teaching and learning context as defined by working with practitioners.”

Please note that this project led to the subsequent FAROES project.

MetaTools – Final Report

98 pages

Automatic metadata generation has sometimes been proposed as a solution to the ‘metadata bottleneck’ that repositories and portals face as they struggle to provide resource discovery metadata for a rapidly growing number of new digital resources. Unfortunately, there is no registry or trusted body of documentation that rates the quality of metadata generation tools or identifies the most effective tool(s) for a given task. The aim of the first stage of the project was to remedy this by developing a framework for evaluating tools that generate Dublin Core metadata. A range of intrinsic and extrinsic metrics (standard tests or measurements) that capture the attributes of good metadata from various perspectives were identified from the research literature and evaluated in a report.

A test program was then implemented using metrics from the framework. It evaluated the quality of metadata generated from 1) Web pages (HTML) and 2) scholarly works (PDF) by four of the more widely known metadata generation tools: Data Fountains, DC-dot, SamgI and the Yahoo! Term Extractor. The project also intended to test PaperBase, a prototype for generating metadata for scholarly works, but its developers ultimately preferred to conduct tests in-house. Some interesting comparisons with their results were nonetheless possible and were included in the stage 2 report. The output from Data Fountains was generally superior to that of the other tools tested. However, the output from all of the tools was considered disappointing and markedly inferior to the metadata that, according to Tonkin and Muller, PaperBase has extracted from scholarly works. Overall, the prospects for generating high-quality metadata appear brighter for scholarly works because of their more predictable layout. It is suggested that JISC should particularly encourage research into auto-generation methods that exploit the structural and syntactic features of scholarly works in PDF format, as exemplified by PaperBase, and strongly consider funding the development of tools in this direction.

In the third stage of the project, SOAP and RESTful Web Service interfaces were developed for three metadata generation tools: Data Fountains, SamgI and Kea. This had a dual purpose. Firstly, creating an optimal metadata record usually requires merging the output of several tools, each of which, until now, had to be invoked separately because of the ad hoc nature of their interfaces. As Web services, the tools will be available over a network such as the Web through well-defined, implementation-independent interfaces, so clients can use them without needing to know how a service executes their requests. Repositories should be able to plug them into their own cataloguing environments and experiment with automatic metadata generation under more ‘real-life’ circumstances than hitherto. Secondly, and more importantly (given the relatively poor quality of current tools), the services enabled the project to experiment with a high-level ontology for describing metadata generation tools. The value of using an ontology in this way should become apparent as higher-quality tools (perhaps PaperBase) emerge.

The high-level ontology is part of a MetaTools system architecture consisting of components to describe, register and discover services. Low-level definitions within a service ontology are mapped to higher-level, human-understandable semantic descriptions contained within a MetaTools ontology. A user interface enables service providers to register their services in a public registry, which consumers use to find services that match certain criteria. If the registry has such a service, it provides the consumer with a contract and an endpoint address for that service. The terms in the MetaTools ontology can, in turn, be part of a higher-level ontology that describes the preservation domain as a whole. The team believes that an ontology-aided approach to service discovery, as employed by the MetaTools project, is a practical solution. A stage 3 technical report was also written.
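The register-and-discover flow can be illustrated with a minimal sketch, assuming an in-memory registry and a flat dictionary standing in for the ontology mapping. Providers register a service with its contract, endpoint address and low-level output terms; the registry maps those terms to higher-level concepts; a consumer searching by concept gets back a contract and endpoint. `CONCEPT_MAP`, `ServiceEntry`, `Registry`, the concept labels and the example.org URLs are all hypothetical and do not reproduce the actual MetaTools ontology or registry.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical mapping from low-level service-ontology terms (here, Dublin Core
# element names) to higher-level, human-readable concepts.
CONCEPT_MAP: dict[str, str] = {
    "dc:title": "Title of the resource",
    "dc:creator": "Author",
    "dc:subject": "Subject keywords",
}


@dataclass
class ServiceEntry:
    """A registered metadata generation service (all fields illustrative)."""
    name: str
    endpoint: str                                   # endpoint address handed to consumers
    contract: str                                   # location of the service contract
    outputs: set[str] = field(default_factory=set)  # low-level terms the service produces

    def concepts(self) -> set[str]:
        """Map low-level outputs to high-level concepts, skipping unmapped terms."""
        return {CONCEPT_MAP[t] for t in self.outputs if t in CONCEPT_MAP}


class Registry:
    """In-memory stand-in for a public service registry."""

    def __init__(self) -> None:
        self._services: list[ServiceEntry] = []

    def register(self, entry: ServiceEntry) -> None:
        """Provider side: add a service to the registry."""
        self._services.append(entry)

    def find(self, concept: str) -> list[tuple[str, str]]:
        """Consumer side: return (contract, endpoint) for each service offering `concept`."""
        return [(s.contract, s.endpoint) for s in self._services if concept in s.concepts()]


registry = Registry()
registry.register(ServiceEntry(
    name="keyphrase-service",
    endpoint="http://example.org/keyphrases",
    contract="http://example.org/keyphrases?wsdl",
    outputs={"dc:subject"},
))
print(registry.find("Subject keywords"))
# → [('http://example.org/keyphrases?wsdl', 'http://example.org/keyphrases')]
print(registry.find("Author"))  # → []
```

A real deployment would express the mapping in an ontology language and query it with a reasoner; the dictionary here only stands in for the step that lifts low-level terms to human-understandable descriptions, so that consumers can search by meaning rather than by a tool's own vocabulary.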

An ontology methodology and CISP (Core Information about Scientific Papers)

26 pages

This report contains details about CISP and the results from the online survey, as well as the benefits of adopting an ontology methodology when producing metadata.

The report has two aims:
  • To introduce a new formalism for the description of scientific papers, CISP (the Core Information about Scientific Papers);
  • To attract more attention to ontologies as a valuable methodology for developing metadata.

The report demonstrates the advantages of an ontology methodology for developing metadata by applying it to the analysis of Dublin Core metadata (DC). An ontology approach makes it possible to detect potential weaknesses in the representation of the DC terms. Such weaknesses include semantic overlap between terms, logically incoherent representation of temporal and spatial relations, and incoherence in the representation of content. An ontology can also suggest improvements to DC.
The report describes an ontology methodology for constructing CISP metadata about the content of papers. It uses EXPO, an ontology of experiments proposed at the University of Wales, Aberystwyth, as a core ontology, and DOLCE (a Descriptive Ontology for Linguistic and Cognitive Engineering), developed at the Laboratory for Applied Ontology, Institute of Cognitive Science and Technology, Italy, as an upper-level ontology.
CISP is a defined set of leaf classes from these ontologies. It includes key classes such as <Goal of investigation>, <Object of investigation>, <Research method>, <Result> and <Conclusion>.

CISP can be used to generate abstracts and summaries of papers and to facilitate the storage and retrieval of information. CISP will form the basis of the ART tool, an authoring tool for the semantic annotation of papers stored in digital repositories. ART is intended for the semi-automatic annotation of data and metadata describing the scientific investigation reported in a research paper. ART will also aid the expression of research results directly in a form that is both human- and machine-readable, by composing text from ontology-based templates and stored typical key phrases.
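As a rough illustration of template-based summary generation, here is a minimal sketch assuming a paper's annotation is held as a record with one field per key CISP class. The `CISPRecord` class, the `SUMMARY_TEMPLATE` wording, the `summarise` function and the example values are hypothetical; ART's actual templates and key phrases are not reproduced here.

```python
from dataclasses import dataclass, fields


@dataclass
class CISPRecord:
    """Hypothetical record with one field per key CISP class named in the report."""
    goal_of_investigation: str
    object_of_investigation: str
    research_method: str
    result: str
    conclusion: str


# Hypothetical summary template; placeholder names match the record's field names.
SUMMARY_TEMPLATE = (
    "Goal: {goal_of_investigation}. "
    "Object: {object_of_investigation}. "
    "Method: {research_method}. "
    "Result: {result}. "
    "Conclusion: {conclusion}."
)


def summarise(record: CISPRecord) -> str:
    """Fill the template from the record's fields to produce a short structured summary."""
    values = {f.name: getattr(record, f.name) for f in fields(record)}
    return SUMMARY_TEMPLATE.format(**values)


# Invented example annotation, for illustration only.
paper = CISPRecord(
    goal_of_investigation="compare two annotation schemes",
    object_of_investigation="a sample of research abstracts",
    research_method="manual annotation by two annotators",
    result="the schemes agreed on most sentences",
    conclusion="either scheme is suitable for the task",
)
print(summarise(paper))
```

Because the same record can be rendered as prose or stored field by field, one annotation serves both the human-readable summary and machine retrieval by class.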
To find out more about the ontology methodology, refer to chapters 2 and 3.
To learn about the proposed CISP metadata, start reading from chapter 4.

Semantic Annotation of Papers: Interface & ENrichment Tool (SAPIENT)

The first release of SAPIENT, the ART tool for the annotation of general scientific papers, has been circulated to annotators.