Talk:ItemBatchUpdate
From DSpace Wiki
Mark Diggory
Christophe,
I'm very wary of extending DIM in any way. Ultimately, we should be working to get rid of it and replace it with something like an RDF ontology for DSpace and/or Dublin Core/MODS/etc. expressed appropriately in its native XML namespace.
I would like to suggest that for the insert/replace/delete functionality you instead look to something like XUpdate to capture the logic. For instance, see:
http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html
http://www.xmldatabases.org/projects/XUpdate-UseCases/
There's some basic functionality to model off of there, so for instance rather than:
<xsl:element name="dim:dim">
  <xsl:element name="dim:field">
    <xsl:attribute name="mdschema">dc</xsl:attribute>
    <xsl:attribute name="element">identifier</xsl:attribute>
    <!-- or any other field used to store an external identifier of the record -->
    <xsl:attribute name="qualifier">pmid</xsl:attribute>
    <xsl:attribute name="type">key</xsl:attribute>
    <!-- or any other element of the input file -->
    <xsl:value-of select="normalize-space(PMID)"/>
  </xsl:element>
  <xsl:element name="dim:remove">
    <xsl:attribute name="mdschema">dc</xsl:attribute>
    <xsl:attribute name="element">contributor</xsl:attribute>
    <xsl:attribute name="qualifier">author</xsl:attribute>
  </xsl:element>
</xsl:element>
instead:
<xupdate:modifications version="1.0"
    xmlns:xupdate="http://www.xmldb.org/xupdate">
  <xupdate:append select="dim:dim">
    <xupdate:element name="dim:field">
      <xupdate:attribute name="mdschema">dc</xupdate:attribute>
      <xupdate:attribute name="element">identifier</xupdate:attribute>
      <xupdate:attribute name="qualifier">pmid</xupdate:attribute>
      <xupdate:attribute name="type">key</xupdate:attribute>
      <!-- or any other element of the input file; yields a value such as 90200 -->
      <xsl:value-of select="normalize-space(PMID)"/>
    </xupdate:element>
  </xupdate:append>
  <xupdate:remove select="dim:dim/dim:field[@mdschema='dc' and @element='contributor' and @qualifier='author']"/>
</xupdate:modifications>
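To make the effect concrete, here is a rough sketch of a DIM record before and after those modifications are applied. The record itself is hypothetical (the title, author, and the PMID value 90200 are only for illustration), the DIM namespace URI is assumed to be the one DSpace uses, and the result assumes a processor that removes every node matched by xupdate:remove and appends new content as the last child. The two documents are shown together for comparison; they are not meant to be read as a single file.

```xml
<!-- Before: hypothetical DIM record -->
<dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim">
  <dim:field mdschema="dc" element="title">Example 1</dim:field>
  <dim:field mdschema="dc" element="contributor" qualifier="author">Haldeman, Joe</dim:field>
</dim:dim>

<!-- After: the author field is removed and the identifier field is appended -->
<dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim">
  <dim:field mdschema="dc" element="title">Example 1</dim:field>
  <dim:field mdschema="dc" element="identifier" qualifier="pmid" type="key">90200</dim:field>
</dim:dim>
```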
For lists, rather than extending DIM, I would instead use either METS or RDF to capture the listings. METS would be consistent with our usage elsewhere and more than likely reusable in the packager framework's "replace" method. But RDF would more than likely be simpler to encode quickly:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="oai:dspace.mit.edu:1721.1/32719">
    <dc:contributor>Joe Haldeman.</dc:contributor>
    <dc:title> Example 1 </dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="oai:dspace.mit.edu:1721.1/32720">
    <dc:contributor>Someone Else.</dc:contributor>
    <dc:title> Example 2 </dc:title>
  </rdf:Description>
</rdf:RDF>
I might also just recommend using the XUpdate logic, because it can express a series of updates either as separate modifications documents or as individual update|insert|append|delete instructions within one.
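As a rough sketch of that second form, a single modifications document could carry several instructions, each targeting a different field. The instruction names below (xupdate:update, xupdate:insert-after, xupdate:remove, xupdate:text) come from the XUpdate working draft; the fields, values, and the DIM namespace URI are assumptions chosen only for illustration, and how namespace prefixes inside select expressions are resolved may vary between XUpdate processors.

```xml
<xupdate:modifications version="1.0"
    xmlns:xupdate="http://www.xmldb.org/xupdate"
    xmlns:dim="http://www.dspace.org/xmlns/dspace/dim">
  <!-- Replace the text content of an existing title field (hypothetical value) -->
  <xupdate:update select="/dim:dim/dim:field[@mdschema='dc' and @element='title']">Corrected title</xupdate:update>
  <!-- Insert a new field immediately after the title field -->
  <xupdate:insert-after select="/dim:dim/dim:field[@mdschema='dc' and @element='title']">
    <xupdate:element name="dim:field">
      <xupdate:attribute name="mdschema">dc</xupdate:attribute>
      <xupdate:attribute name="element">date</xupdate:attribute>
      <xupdate:attribute name="qualifier">issued</xupdate:attribute>
      <xupdate:text>2008</xupdate:text>
    </xupdate:element>
  </xupdate:insert-after>
  <!-- Remove every author field -->
  <xupdate:remove select="/dim:dim/dim:field[@mdschema='dc' and @element='contributor' and @qualifier='author']"/>
</xupdate:modifications>
```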
Cheers, Mark
Larry Stone
Re. the ultimate demise of DIM: DIM was created to encode the table-oriented descriptive metadata for Items for XSL-driven translations in crosswalk plugins. It's been legitimately adopted by the XML UI too, so I think if you want to get rid of it, there are a lot of cold, dead hands you'll have to pry it out of.
I agree that it makes great sense to employ a wrapper like XUpdate to encode the update instructions. You could also look at the encodings used in the WebDAV protocol.
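For comparison, WebDAV's PROPPATCH request body groups changes into set and remove instructions inside a propertyupdate element. The sketch below only illustrates that encoding; treating Dublin Core fields as WebDAV properties, and the value 90200, are assumptions for the sake of the example, not a description of how any existing DSpace interface behaves.

```xml
<?xml version="1.0" encoding="utf-8"?>
<D:propertyupdate xmlns:D="DAV:"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Set (create or replace) a property value -->
  <D:set>
    <D:prop>
      <dc:identifier>90200</dc:identifier>
    </D:prop>
  </D:set>
  <!-- Remove a property; only its name is given, so the element is empty -->
  <D:remove>
    <D:prop>
      <dc:contributor/>
    </D:prop>
  </D:remove>
</D:propertyupdate>
```

Instructions in a PROPPATCH body are processed in document order, which is close to the sequential model XUpdate uses.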
Adding/modifying bitstreams or rights metadata is clearly out of the realm of DIM, and I agree that it is best expressed in METS. We already have some tools to interpret METS, and the more we use it the more native expertise we'll build.
Could you just create an LNI client to do the batch updating? That would have the added advantage of letting you run the client on any machine, not just the server.
Re. use of RDF, on the surface it's a good idea but XML encodings are problematic since they don't necessarily follow a schema so transformations may not work. Also, DSpace's out-of-the-box use of Dublin Core does not match the "official" Dublin Core namespaces already, and then some sites extend it beyond that; this is what drove me to invent DIM.
Christophe Dupriez
I do not see DSpace as a pile of protocols but as a functional structure for which "lean and mean" is best. People need to import and to update. DIM is simple and clear. Cleaning it up a bit made it just right for the task.
When one controls the source, (s)he directly produces DIM (or worse, SQL, but this is another story): this is what I observe from the mails I receive and from what I do in new situations. I think we have to refrain from going after every new buzzword and try to keep DSpace efficient, coherent, simple to learn and to extend when necessary. I do hope this function will be added to DSpace as a simple extension of the existing ItemImport and not as a supplementary layer...
For the problem of maintenance functions (DSRUN) conflicting with Tomcat threads, I do agree that it will have to be tackled nicely one day or another. If somebody has a nice proposal "mean and lean" for this, I would work on it with pleasure!
This being said, once simple import/update is there, if somebody wants to reimplement Solr or similar structures over DSpace, and if it is asked for by real users, why not!
