Google Summer of Code 2008 Fedora Integration
From DSpace Wiki
Title: DSpace & Fedora integration
Student: Andrius Blažinskas
Mentor: Richard Rodgers
Abstract
DSpace and Fedora, two popular digital content repositories, differ considerably in nature and in their data models, and each has its own advantages. Integrating them would widen the options for disseminating and managing digital content. When the repositories are used separately, content must be prepared and replicated for each of them. Avoiding this replication requires a driver that lets one repository access data through the other. Achieving full integration will clearly take a great deal of work, so I propose to create a working storage driver prototype for DSpace that can store, access and manage at least basic DSpace data in a Fedora repository while respecting its relationships and associated policies.
Proposal
Implementation details
I propose to create a driver prototype that lets DSpace use a Fedora repository as its primary storage for bitstreams and metadata. The driver classes will expose the same method interfaces as the classes in the current DSpace “org.dspace.storage” package and will be accessed in the same way. The driver will communicate directly with the Fedora repository through its SOAP APIs (API-A and API-M). To guard against defects, all code will be tested with JUnit. I will also provide code documentation.
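The driver idea can be illustrated with a minimal sketch. The interface and class names below (BitstreamStore, InMemoryBitstreamStore) and the identifier format are hypothetical; this is not the actual “org.dspace.storage” API, and the checksum method is an assumption about what a bitstream store needs to expose. An in-memory map stands in for Fedora so that the example runs without a repository; a real driver would presumably forward each call to API-A or API-M instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage contract (an assumption, not the real DSpace API).
interface BitstreamStore {
    String put(InputStream content) throws IOException;  // returns a new id
    InputStream get(String id) throws IOException;
    String checksum(String id) throws IOException;        // MD5 as lowercase hex
    void remove(String id) throws IOException;
}

// In-memory stand-in. A Fedora-backed driver would implement the same
// interface but delegate to the repository instead of a local map.
class InMemoryBitstreamStore implements BitstreamStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    private int nextId = 1;

    public String put(InputStream content) throws IOException {
        // Copy the whole stream into memory.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = content.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        String id = "bitstream:" + nextId++;  // illustrative id format
        objects.put(id, buffer.toByteArray());
        return id;
    }

    public InputStream get(String id) throws IOException {
        return new ByteArrayInputStream(require(id));
    }

    public String checksum(String id) throws IOException {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(require(id))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest unavailable", e);
        }
    }

    public void remove(String id) {
        objects.remove(id);
    }

    // Looks up stored bytes or fails for an unknown id.
    private byte[] require(String id) throws IOException {
        byte[] data = objects.get(id);
        if (data == null) {
            throw new IOException("Unknown bitstream: " + id);
        }
        return data;
    }
}
```

Because callers depend only on the interface, a Fedora-backed implementation could be swapped in without changing the calling code, which is the property the driver is meant to provide.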
Comments (RLR): the programmatic ways in which DSpace accesses bitstreams and metadata are very different. Bitstreams are treated as opaque simple objects (although a few additional properties, such as a checksum, are required). There is already some preliminary work on creating a clean abstraction to the underlying storage system (see http://wiki.dspace.org/index.php/PluggableStorage). I would recommend starting with this 'Bitstore' interface, since it will be incorporated into DSpace 1.6, and already supports several storage back-ends: filesystem, Storage Resource Broker, Amazon S3, and Sun's HoneyComb. The last two are essentially HTTP client calls, so they already resemble using the Fedora SOAP API.
But the metadata is another story: DSpace does very little to abstract away from direct JDBC/SQL calls into an RDBMS. I think here the question of a 'driver' is less obvious, and you might want to explore a few designs before committing a lot of work. For example: could the metadata be placed in a bitstream and stored through the other driver? This is not a functional mapping, but would satisfy e.g. a replication scenario. Should you attempt a high-level metadata abstraction that bypasses current DSpace (but could be retrofitted into it)? Etc. I am just throwing out thoughts to elicit additional discussion here.
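One of the designs raised above, placing the metadata in a bitstream, can be sketched as follows. This is only an illustration under stated assumptions: the class name MetadataBitstream is hypothetical, the XML layout is simply what java.util.Properties produces rather than any DSpace or Fedora format, and Properties holds a single value per key, whereas real DSpace metadata fields can repeat.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical helper: turns item metadata into opaque bytes that could be
// stored through a bitstream driver, and reads them back again.
class MetadataBitstream {

    // Serialize metadata to an XML byte array.
    static byte[] toBytes(Properties metadata) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        metadata.storeToXML(out, "DSpace item metadata (illustrative)");
        return out.toByteArray();
    }

    // Restore metadata from bytes previously produced by toBytes.
    static Properties fromBytes(byte[] bytes) throws IOException {
        Properties metadata = new Properties();
        metadata.loadFromXML(new ByteArrayInputStream(bytes));
        return metadata;
    }
}
```

As the comment notes, this would serve a replication scenario rather than a functional mapping: the bytes can be copied and restored, but Fedora would not be able to interpret the individual fields.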
Development Process
My work will involve five basic phases:
1. In-depth analysis of both repositories' data models and of the possible mappings between them.
2. Analysis of the existing DSpace low-level storage principles.
3. Creation of a storage driver prototype for DSpace.
4. Creation of tests.
5. Creation of documentation.
Deliverables
1. Working storage driver prototype for DSpace.
2. JUnit tests.
3. Code documentation.
Model mapping ideas
The first and most essential task is to map the DSpace model onto the Fedora model. The defined mapping and the resulting driver should allow not only storing basic data but also retaining the main structure; that is, they should preserve the relationships and policy rules defined in DSpace wherever possible.
To understand the possible mapping, the Fedora Object must first be described. In Fedora, the Fedora Object is the general entity on which most other Fedora entities are based. A Fedora Object can have hierarchical relationships with other Fedora Objects, so in some cases it acts as both a parent and a child. A Fedora Object can contain datastreams, which may hold metadata or other simple files. Its relationships are recorded in a special RELS-EXT datastream. As we will see, several DSpace entities can be mapped to a Fedora Object.
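To make the RELS-EXT datastream more concrete, the sketch below builds the RDF/XML that would record one parent-child relationship. The class name RelsExt and the PIDs are hypothetical, and the choice of the isMemberOf predicate from Fedora's relations-external ontology is an assumption about how a DSpace hierarchy might be expressed, not a decided mapping. PIDs are assumed to contain no characters that need XML escaping.

```java
// Hypothetical helper that renders a RELS-EXT datastream body as a string.
class RelsExt {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String REL_NS = "info:fedora/fedora-system:def/relations-external#";

    // States that the object with PID childPid is a member of parentPid.
    static String memberOf(String childPid, String parentPid) {
        return "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:rel=\"" + REL_NS + "\">\n"
             + "  <rdf:Description rdf:about=\"info:fedora/" + childPid + "\">\n"
             + "    <rel:isMemberOf rdf:resource=\"info:fedora/" + parentPid + "\"/>\n"
             + "  </rdf:Description>\n"
             + "</rdf:RDF>\n";
    }
}
```

For example, `RelsExt.memberOf("dspace:item-1", "dspace:collection-1")` would describe an item object that points to its collection object; the same pattern could link a collection to a community.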
Possible mappings of several entities:
“Community” to “Fedora Object”: Fedora does not seem to have an entity that maps directly to the DSpace Community, but a plain Fedora Object is reasonably suitable. A top-level (parent) Fedora Object can be interpreted as a Community to which other entities (Collections, Sub-communities) link.
“Collection” to “Fedora Object”: In Fedora, a Fedora Object is also used to represent a collection, so this entity maps directly.
“Item”, “Bundle” to “Fedora Object”: A Fedora Object can readily represent either entity.
“Bitstream” to “Datastream”: Likewise, a DSpace Bitstream can readily be represented by a Fedora datastream.
“E-Person” to “User”: The Fedora equivalent of a DSpace E-Person is a plain user.
“Group” to “Role”: The Fedora equivalent of a DSpace Group is a Role.
“Policy” to “Policy datastream (XACML)”: Fedora has two main types of policy, global and local. Global policies are defined for the repository as a whole. A local policy is a special datastream called POLICY that controls access to one particular Fedora Object only. This POLICY datastream can be used to reflect DSpace policies in Fedora.
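The mappings listed above could be captured in code as a simple lookup table, for instance as a starting point for the driver's translation layer. The enum and class names here are hypothetical and exist only for illustration; the table merely restates the proposed mappings and does not settle open questions such as how a Community differs from a Collection once both are Fedora Objects.

```java
import java.util.EnumMap;
import java.util.Map;

// DSpace entities named in the mapping list.
enum DSpaceEntity { COMMUNITY, COLLECTION, ITEM, BUNDLE, BITSTREAM, EPERSON, GROUP, POLICY }

// Fedora-side targets named in the mapping list.
enum FedoraEntity { FEDORA_OBJECT, DATASTREAM, USER, ROLE, POLICY_DATASTREAM }

// Hypothetical lookup table restating the proposed mappings.
class ModelMapping {
    static final Map<DSpaceEntity, FedoraEntity> MAPPING = new EnumMap<>(DSpaceEntity.class);

    static {
        MAPPING.put(DSpaceEntity.COMMUNITY, FedoraEntity.FEDORA_OBJECT);
        MAPPING.put(DSpaceEntity.COLLECTION, FedoraEntity.FEDORA_OBJECT);
        MAPPING.put(DSpaceEntity.ITEM, FedoraEntity.FEDORA_OBJECT);
        MAPPING.put(DSpaceEntity.BUNDLE, FedoraEntity.FEDORA_OBJECT);
        MAPPING.put(DSpaceEntity.BITSTREAM, FedoraEntity.DATASTREAM);
        MAPPING.put(DSpaceEntity.EPERSON, FedoraEntity.USER);
        MAPPING.put(DSpaceEntity.GROUP, FedoraEntity.ROLE);
        MAPPING.put(DSpaceEntity.POLICY, FedoraEntity.POLICY_DATASTREAM);
    }

    // Returns the Fedora entity proposed for the given DSpace entity.
    static FedoraEntity targetOf(DSpaceEntity entity) {
        return MAPPING.get(entity);
    }
}
```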
My experience
I have very good knowledge of Java, the language in which both DSpace and Fedora are written. I have worked with bibliographic and other subject experts at Kaunas University of Technology on the eLABa project (http://elaba.library.lt), where Fedora is used to store digital documents (books, ETDs, journals etc.). I have 3 years of experience working with the Fedora repository and know Fedora concepts very well. I have built various Fedora content management tools for the eLABa project's internal needs, most of which use the Fedora APIs. My Bachelor of Science work was directly related to Fedora.
Link to Further Information: http://andriusb.library.lt/gsoc/proposal.htm
