The other major addition that we propose for SWORD is the content which lies under the manifest link. The following information may be of use to rich deposit clients to enhance the user experience:
Most repositories, at a high level, have 3 states that items are likely to be in (based on analysis of DSpace, EPrints and Fedora software):
- In preparation: this typically lasts a short while, and is the state that new repository items begin life in. During this stage, files and metadata are being added to the item, and it is generally only worked on by a single user. Repository administrators do not usually have access directly to the item.
- Under review: once a repository item is fully prepared it is usually injected into some form of review process. Repository managers carry out general tasks like metadata verification and copyright compliance, or tasks more specific to their environment or organisation such as appropriateness for the archive. Within the review stage, there is no clear common workflow, as usages for repositories and for SWORD are sufficiently broad as to encompass many different workflows.
- Archived: once review completes the repository item is archived. In some cases this may mean that the item has been made public under a stable identifier, while in others it may mean that the item has gone into the dark archive for preservation. Purposes for and consequences of archiving are not covered here.
- Deleted: the repository item was at one point in the Archive, but has since been removed.
- Rejected: the repository item was rejected from the repository before reaching the Archive.
It is suggested that we introduce a short ontology to cover the above states, and for the serialisation of this in the manifest document to be extensible such that additional states can easily be added, either by the SWORD standard at a later date or by specific implementation for local needs.
It is also desirable to offer the repository the opportunity to describe what each of those states means to them. This would be similar in ethos to the sword:treatment element, which allows the repository to describe what processes have actually happened to the package during deposit [SWORD spec section B.9.8], except that it gives details on what is currently happening to the deposit. With each of the states represented by a URI, this would also allow for the addition of non-standard states which are still comprehensible to the client.
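As a rough illustration of the idea, the sketch below models each state as a URI paired with a repository-supplied description, and lets a repository register extra local states alongside the standard ones. The base URI, the state names used in the URIs, the `embargoed` local state and the `describe` helper are all assumptions made for this example; the paper does not define any of them.

```python
from dataclasses import dataclass

# Hypothetical base URI: the paper does not define real identifiers for states.
BASE = "http://example.org/sword/state/"


@dataclass(frozen=True)
class State:
    uri: str          # identifier asserted in the manifest document
    description: str  # the repository's own account of what the state means


# The five states listed above, keyed by an illustrative short name.
STANDARD_STATES = {
    name: State(BASE + name, text)
    for name, text in [
        ("in-preparation", "Files and metadata are still being added."),
        ("under-review", "The item is in the repository's review process."),
        ("archived", "Review has completed and the item is archived."),
        ("deleted", "The item was archived but has been removed."),
        ("rejected", "The item was rejected before reaching the archive."),
    ]
}


def describe(uri, local_states=()):
    """Return the description for a state URI, or None if it is unknown.

    Local (non-standard) states are looked up alongside the standard ones,
    so a client can still show something meaningful for them.
    """
    known = {s.uri: s for s in STANDARD_STATES.values()}
    for s in local_states:
        known[s.uri] = s
    state = known.get(uri)
    return state.description if state else None


# A repository-specific state (hypothetical), added without changing the core set.
embargoed = State("http://repo.example.org/states/embargoed",
                  "Archived, but not public until the embargo ends.")
print(describe(BASE + "archived"))
print(describe(embargoed.uri, local_states=[embargoed]))
```

Keeping the description next to the URI is what would let a client display a state it has never seen before.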
With the ability of the repository to describe these states, it would be easy for SWORD clients to allow depositors to keep track of their submissions, giving them feedback on their progress, rather than being subject to the current fire-and-forget approach. This would increase the potential for creating rich clients and for making users feel invested in the process of deposit.
- An assertion to the state of the item (In preparation, under review, archived, etc.);
- An identifier which resolves to a description of the meaning of that state in the repository.
Furthermore, since this paper is intended to generate discourse around the direction of the standard, we are also looking for common and generic states which are useful to systems which are not necessarily repositories.
Many repositories are likely to unpackage the incoming content and import it into their native format. This typically includes:
- A top-level object: this is effectively the equivalent of the package – it is the container within which all the content is held;
- Associated files: the actual content of the item, as extracted as individual files from the package. This may also include additional files created by the repository, such as format conversions, thumbnails, etc;
- Metadata: any structural, administrative or bibliographic data about the item, provided in the package as metadata files or within the manifest itself, depending on package format. As with the files, this may contain metadata that was not in the original package.
It may be of interest to clients to be able to display this information to their users, for the purposes of creating a rich deposit environment. It is certainly possible to include identifiers for the individual files which would allow further operations to be performed on them; for example HTTP DELETE could be provided to allow the client to subsequently delete individual files, or HTTP PUT could be used to replace one file with another.
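A minimal sketch of what such per-file operations might look like from the client side, using Python's standard `urllib.request`. The file URI is hypothetical, and the sketch assumes the repository exposes each file at its own URI and accepts DELETE and PUT on it, which is a proposal here rather than existing SWORD behaviour. The requests are only constructed, not sent.

```python
import urllib.request


def delete_file_request(file_uri):
    """Build (but do not send) an HTTP DELETE for one file in an item."""
    return urllib.request.Request(file_uri, method="DELETE")


def replace_file_request(file_uri, content, content_type):
    """Build (but do not send) an HTTP PUT that replaces one file's content."""
    return urllib.request.Request(
        file_uri,
        data=content,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Hypothetical file URI, as it might be listed in the manifest document.
file_uri = "http://repo.example.org/item/123/files/thesis.pdf"

delete_req = delete_file_request(file_uri)
put_req = replace_file_request(file_uri, b"%PDF-1.4 ...", "application/pdf")

print(delete_req.get_method(), delete_req.full_url)
print(put_req.get_method(), put_req.full_url)
# To actually send one of them: urllib.request.urlopen(put_req)
```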
Including the metadata in the description of the item would be a significant challenge, and is therefore not addressed here. Future work in this area could take operations similar to WebDAV’s PROPFIND and PROPPATCH as a starting point [17], or could consider performing an HTTP PUT to the Entry document for the repository item. It is unclear how useful or successful these approaches would be.
It is the recommendation of this paper, therefore, that the basic structural information concerning the unpackaged item is made available in some common serialisation, and that this explicitly excludes any treatment of the actual metadata content. Instead we would focus simply on the content files.
This fine-grained CRUD access to individual things in an item is of interest to me. However, again I would recommend using content negotiation on the URI returned in the receipt, this time to request an ORE-REM. This can then describe each item in the package, including name, URI, MIME type, MD5, format, etc. The client could then interact directly according to the structure and URIs in the REM.
Again, not much of this is SWORD related (simple deposit, remember), and each repository is likely to have a REM which looks different and exposes different amounts of data.
So I’m not sure here if SWORD should even be involved in this as it sounds to me like it would be something to bloat SWORD.
I would remove state from the SWORD receipt as it is going to change, thus the item is going to change! I would however make it part of the metadata you can get back when doing a GET (with an Accept header) on the URI returned in the receipt, thus the client knows how to poll for status from the word go. This avoids badly implemented clients and inconsistency in known data.
The receipt should just be for static info, like the URI, as you can’t interact with a receipt.
From paragraph 7: “It is suggested that we introduce a short ontology to cover the above states”. Errr no
All the status should be returned in one of the repository serialisations of the object (through a GET on the receipted URI with an ACCEPT header) and the status itself should be a URI either inside or outside the repository. This way the repository can define as many states as it likes
e.g.
and then the client can go to this URI with an Accept header and get back something about that status, e.g. what stage number it is, how long things usually stay there etc etc.
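To make the two-step lookup concrete, here is a minimal sketch of the requests a client might build: one GET on the receipted URI to fetch a serialisation of the item, and a second GET on the status URI found inside it. Both URIs and the chosen media types are assumptions for illustration only; a real repository would define its own. The requests are constructed but not sent.

```python
import urllib.request


def get_request(uri, accept):
    """Build (but do not send) a GET asking for a particular serialisation."""
    return urllib.request.Request(uri, method="GET", headers={"Accept": accept})


# Hypothetical URIs: the first would come from the deposit receipt, the second
# from the status value inside the item's serialisation.
item_uri = "http://repo.example.org/item/123"
status_uri = "http://repo.example.org/status/under-review"

item_req = get_request(item_uri, "application/atom+xml;type=entry")
status_req = get_request(status_uri, "text/html")

for req in (item_req, status_req):
    print(req.get_method(), req.full_url, "Accept:", req.get_header("Accept"))
# To actually send one: urllib.request.urlopen(item_req)
```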
If an ontology is defined, it should just be published openly on the web such that repositories “could” use it. Doesn’t mean they have to. So I see this as separate from the SWORD spec. SWORD should make the recommendation that a client can GET something which tells them the current status (in some form) of the item. It could recommend this form.
Glen raises a good point
yes, absolutely. I think that the way this can be achieved is by requesting the atom entry document again at a later date: the edit-media url in the deposit response links to that document, and it should be retrievable at a later date.
I see this idea of a manifest as “key” to what has been missing from SWORD (although I dislike the name “manifest” as well — not sure of a better term).
To go back to the Word / Filesystem interaction analogy that Tim Brody gives. With SWORD 1.3 all you are guaranteed is a basic confirmation that your object has been saved. This is akin to Word saying: “Yep, it’s saved somewhere. Here’s an ID for you, and you can get it back if you keep this receipt. But I won’t necessarily tell you where I saved it, or if anyone else has changed it since then.” I see this idea of a “manifest” as providing that extra level of interaction. Now you can tell if the item is truly archived (and not just in a processing/approval state), and you have a way to query the repository similar to how Word queries a filesystem for basic info (like filesystem metadata, last modified date, etc).
Referencing back to the “splash page” link concept in the ‘Identifiers’ section. There should be a way to ask the SWORD Server / Repository for the “splash page” information at a later time in this Manifest. So, if an object is received by the repository, but is still in the “under review” state it may not yet have a “splash page”. However, once it is “archived”, hopefully there would be a “splash page” location where a SWORD client could send users (so that they can see the archived representation of their item in the repository).
I don’t see how this is system design for a new repository, sorry. This is just giving the repository a way to tell the client what happened to the item after you put it in. You can do that routinely through the native web UIs of repositories, right? “Items I submitted”, and so on.
So, this is derived from observing how repositories actually work, and also from a perceived need for the client to create a sufficiently rich remote deposit environment.
Definitely not attempting to create a front end for a repository.
To borrow your analogy: you might not use Word to manage your file system, but you do sometimes “save as…”.
What you’re talking about is creating a system design for a new repository – something akin to a front-end to Fedora.
To give you an analogy, I don’t manage my file system through Microsoft Word. When I want to backup my files or copy them somewhere else I use a file manager. When I want to manage my publications online I use the Repository Web interface.
The only bearing on SWORD is how the user accesses the item post-deposit. To my mind that will either be a clickable link or via email feedback. If you’re using a repository broker the URL will point to its control page, otherwise to the repository’s item control page.
I think that’s an implementation consideration, not one for the standard.
I think that the choice of “manifest” for this document was a mistake. This is not intended to specify that a repository should be able to generically provide for supplying resource maps for its holdings (which is, as Tim says, not part of deposit).
It is, instead, a document which the deposit client can retrieve and which will give it sufficient information to feed back to the user, and to support update and delete operations. Without closing this loop, it won’t be possible to give client systems and end users the facilities to actively engage in deposit, and we’ll remain in the fire-and-forget model that SWORD currently uses.
Suggestions for an alternative name for the “manifest” which would get rid of this confusion would be welcome …
No, this is something different and should not be part of a deposit spec.
This is the big achievement of this project.
How does the repository know when a deposit has moved from preparation to under review? For example, if a user was depositing a collection of images, how would the repository know when the deposit was ready for review? Shouldn’t the change in state from ‘in preparation’ to ‘under review’ be triggered by the client?
An interesting point – In our application of SWORD, the client is not depositing directly into the repository, but offering a deposit package for our ingest workflow to pick up and process. In the end, a (possibly enhanced and transformed) deposit package only enters the repository as the last step if all subsequent steps are successful. The client is generally not involved in this process other than monitoring the status of a deposit, and perhaps allowing the opportunity for fixing & re-submitting a package if it fails.
I am a little concerned about the convergence of structural and temporal components in the manifest. It does make sense to make available a description of the “stuff” in the repository that resulted from a deposit. It also makes sense to represent deposit state, but I am not convinced the two should be mixed.
Indeed, I don’t necessarily believe that “state” as a single value is sufficient. Imagine, for example, “state” being represented as an Atom feed of “events”, each event representing either a state change relevant to the SWORD ontology or some other (application-specific) piece of information. That would allow the client to determine (1) the current, latest state, (2) when this occurred, (3) the history of state changes (did a step take particularly long? was a step, such as “in review”, revisited multiple times?), and (4) additional workflow information that is not represented as a SWORD state change and can be safely skipped, but conveys additional information that may be application-specific.
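A small sketch of how a client might read such a feed, using only the Python standard library. It assumes, purely for illustration, that state-change events are marked with an Atom `category` element whose `scheme` is a hypothetical SWORD state URI, and that entries without that category are application-specific events the client may skip. The sample feed, its timestamps and the state terms are invented for the example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Hypothetical scheme URI marking an entry as a SWORD state change.
STATE_SCHEME = "http://example.org/sword/state/"

# Invented sample feed: two state changes and one application-specific event.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Deposit events</title>
  <entry>
    <title>Deposit received</title>
    <updated>2010-03-01T10:00:00Z</updated>
    <category scheme="http://example.org/sword/state/" term="in-preparation"/>
  </entry>
  <entry>
    <title>Virus scan completed</title>
    <updated>2010-03-01T10:05:00Z</updated>
  </entry>
  <entry>
    <title>Submitted for review</title>
    <updated>2010-03-02T09:00:00Z</updated>
    <category scheme="http://example.org/sword/state/" term="under-review"/>
  </entry>
</feed>"""


def state_history(feed_xml):
    """Return (timestamp, state) pairs for state-change events, oldest first.

    Entries without a state category are skipped. Sorting relies on all
    timestamps sharing the same UTC format, so string order is time order.
    """
    root = ET.fromstring(feed_xml)
    history = []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated")
        for category in entry.findall(ATOM + "category"):
            if category.get("scheme") == STATE_SCHEME:
                history.append((updated, category.get("term")))
    return sorted(history)


history = state_history(SAMPLE_FEED)
current = history[-1] if history else None
print("history:", history)
print("current:", current)
```

The same history list answers the other questions raised above, such as how long a step took or whether a state was revisited.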
In this view, perhaps “state” document/feed could be represented as a linked resource in and of itself?
Manifest was just the name we quickly came up with for this document, so perhaps it would be better to call it “description”?
In the ‘under review’ state, is the user still allowed to touch the submission, or is it assumed that it is locked at this point?
This seems sensible to me.
For ADMIRAL, we are taking a simple approach of including additional metadata as additional data files within the package, which can be referenced as such by the manifest. As well as simplicity, this has the advantage of sidestepping angels-on-pinheads style discussions about whether certain data is metadata or something else.
In the ADMIRAL project, which is implementing something similar to this, we have had some discussion about whether the packaged files (zip file, or whatever) should appear *within* the top-level repository object, or as a separate entity. I favour the former, but would be interested to hear if there are alternate views.
How well does this fit with a web “follow-your-nose” approach? It sounds as if it should, but in the web in general the following would be performed simply by HTTP GETs. Does that apply here?
Is the _state_ of an item meaningfully part of its _manifest_? I interpret manifest to mean a list of the item’s contents.
Again, reference the Atom link type registry?