DynamicBrowsePrototype

From DSpace Wiki

An experiment by: Richard Jones.

This is an experiment in live documentation of the design of a new system during the development process. The design for the system is already on paper, but will inevitably change throughout the course of the work, which should be limited to just a few days.

The objective is to develop the browse system to be more flexible and to scale to a much greater degree. To this end, the starting point is my initial Browse patch that was placed on the SF patch tracker some months back. The specific scalability problems that are to be addressed are:

  • pagination for second level browse pages (e.g. all items by a specific author)
  • faster load times for browse pages
  • improved SQL and SQL generation
  • improved indexing

Configuration

The current browse configuration in the dynamic browse looks like this:

   webui.browse.index.1 = dateissued:dc.date.issued:date:full
   webui.browse.index.2 = author:dc.contributor.*:text:single
   webui.browse.index.3 = title:dc.title:title:full
   webui.browse.index.4 = subject:dc.subject.*:text:single
   webui.browse.index.5 = dateaccessioned:dc.date.accessioned:date:full
   
   # and some extra example ones
   # webui.browse.index.6 = type:dc.type:text:single
   # webui.browse.index.7 = itemstatus:icadmin.status:text:single

It is therefore of the form:

   webui.browse.index.<n> = <index name>:<metadata field>:<data type>:<browse type>
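
As a rough illustration of how one of these lines might be split into its four fields, here is a minimal Java sketch. The IndexDefinition class and its field names are hypothetical and are not taken from the BrowseIndex code; only the colon-separated format comes from the configuration above.

```java
// Hypothetical holder for one parsed webui.browse.index.<n> value.
// Names are illustrative only, not DSpace's BrowseIndex API.
class IndexDefinition {
    final int number;           // the <n> in webui.browse.index.<n>
    final String name;          // e.g. "author"
    final String metadataField; // e.g. "dc.contributor.*"
    final String dataType;      // e.g. "text", "date", "title"
    final String browseType;    // "single" or "full"

    IndexDefinition(int number, String name, String metadataField,
                    String dataType, String browseType) {
        this.number = number;
        this.name = name;
        this.metadataField = metadataField;
        this.dataType = dataType;
        this.browseType = browseType;
    }

    // Split "<index name>:<metadata field>:<data type>:<browse type>" into its parts.
    static IndexDefinition parse(int number, String value) {
        String[] parts = value.trim().split(":");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 colon-separated parts but got " + parts.length + ": " + value);
        }
        return new IndexDefinition(number, parts[0].trim(), parts[1].trim(),
                                   parts[2].trim(), parts[3].trim());
    }

    public static void main(String[] args) {
        IndexDefinition d = parse(2, "author:dc.contributor.*:text:single");
        // prints: author -> dc.contributor.* (single)
        System.out.println(d.name + " -> " + d.metadataField + " (" + d.browseType + ")");
    }
}
```

Rejecting lines that do not have exactly four parts is a choice made for this sketch, so that a mistyped configuration line fails early rather than producing a half-built index.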

The proposed addition is the following:

   webui.browse.sort-option.<x> = <sort name>:<metadata field>:<data type>

This will allow the browse index mechanism to generate tables which contain enough information to sort results without having to instantiate any Item objects at all. This will make browsing faster and enable browses such as "all items by the author XXX" to be paginated successfully.

Therefore, the new configuration will look like this (which will have the same effect as the current default):

   # Set the options for what can be sorted by
   webui.browse.sort-option.1 = title:dc.title:text
   webui.browse.sort-option.2 = date:dc.date.issued:date

Data Model

The next stage is to rewrite the BrowseIndex class to be able to pick up this configuration for each of the indexes mentioned above. It will then create a number of Browse tables (when necessary):

  • index_<n>_seq: a SEQUENCE for use by the index_<n> table
  • index_<n>: indexed on the "value" column
  • collection_index_<n>: a VIEW on the index_<n> table with collection2item.item_id = index_<n>.item_id
  • community_index_<n>: a VIEW on the index_<n> table with community2item.item_id = index_<n>.item_id
  • index_<n>_value_index: the INDEX on index_<n> value

QUESTION: do we need to create INDEXes on all the other columns in every table? We may need to be open to the possibility.

This is similar to the existing browse and the dynamic browse patch, but the critical difference is in the table structure. index_<n> will be structured thus:

   item_id: int not null FK     // the item id
   value: text                  // the text value of the core browse value
   sort_value: text             // the normalised text value of the core browse value
   sort_<x>                     // x number of columns corresponding to the normalised value of the sort-options defined above
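
To make the table layout concrete, the following Java sketch generates the DDL for one index_<n> table and its value index. It is only an illustration: the BrowseTableDdl class is hypothetical, the sort_<x> columns are assumed to be of type text, and the foreign key is assumed to point at item(item_id).

```java
// Hypothetical DDL generator for the index_<n> layout described above.
class BrowseTableDdl {

    // Build CREATE TABLE for index_<n> with one sort_<x> column per sort option.
    static String createTable(int indexNumber, int[] sortOptionNumbers) {
        StringBuilder sql = new StringBuilder();
        sql.append("CREATE TABLE index_").append(indexNumber).append(" (");
        // Assumption: the FK targets the item table's item_id column.
        sql.append("item_id INTEGER NOT NULL REFERENCES item(item_id), ");
        sql.append("value TEXT, ");
        sql.append("sort_value TEXT");
        for (int x : sortOptionNumbers) {
            // Assumption: normalised sort values are stored as text.
            sql.append(", sort_").append(x).append(" TEXT");
        }
        sql.append(")");
        return sql.toString();
    }

    // Build the index_<n>_value_index INDEX on the "value" column.
    static String createValueIndex(int indexNumber) {
        return "CREATE INDEX index_" + indexNumber + "_value_index ON index_"
                + indexNumber + " (value)";
    }

    public static void main(String[] args) {
        System.out.println(createTable(2, new int[] {1, 2}));
        System.out.println(createValueIndex(2));
    }
}
```

The sequence and the collection and community views are left out here; they would follow the same naming pattern.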

Progress Update: 21-11-2006, 13:40 GMT

The initial code to build the index tables from the configuration has been written (or, more accurately, adapted from the previous browse patch). I have successfully generated 7 indices which make allowances for sorting by title and by date issued. This has touched the following existing files: dspace.cfg, BrowseIndex.java and resulted in the creation of the following files: IndexBrowse.java, SortOption.java. Note that the IndexBrowse.java file is one I had previously written locally to alleviate some performance issues with the default indexer, and therefore contains a lot of the scalability improvements necessary for the indexing side of things. More on that later.

The Indexing Process

The next challenge is to get the data in the database actually indexed into the new tables. To do this we are moving the indexing process from Browse to IndexBrowse, and will be replacing calls to Item with calls to an ultra-lightweight Item-like object called BrowseItem, whose only task will be to obtain metadata from the item tables.

The process then is as follows:

  • obtain all the BrowseItem objects
  • obtain all the BrowseIndex objects (which will, in turn, contain the SortOption objects)
  • for each BrowseItem object
    • for each BrowseIndex object
      • delete the existing index data for the BrowseItem
      • get the "value" metadata from the BrowseItem
      • for each SortOption object
        • get the "sort" metadata from the BrowseItem
      • for each metadata value (primary index)
        • write a line into the database with the index value and sort option values (normalised)
    • commit the transaction
  • for each BrowseIndex object
    • delete all item ids that are in the index table but not the item table

NOTE: a potential problem arises. The "sort" fields need to be singular, while the "value" field can be multiple. That is, an item may have more than one author as the value to browse on, but may not have more than one title as the value to sort on. In cases where an item has more than one value in the sort metadata, the code will select only the first value that is returned. This is a caveat that people configuring their system will need to be aware of.
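
The per-item part of the process, including the first-value rule for sort fields, can be sketched in memory as follows. This is not the IndexBrowse code: the IndexRowBuilder class is hypothetical, metadata is modelled as a plain map keyed by exact field name (wildcards such as dc.contributor.* are not expanded), and normalisation is assumed to be trimming plus lower-casing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of turning one item's metadata into index rows.
class IndexRowBuilder {

    // One row destined for an index_<n> table.
    static final class Row {
        final int itemId;
        final String value;             // the raw browse value
        final String sortValue;         // the normalised browse value
        final List<String> sortColumns; // sort_<x> values, in sort-option order

        Row(int itemId, String value, String sortValue, List<String> sortColumns) {
            this.itemId = itemId;
            this.value = value;
            this.sortValue = sortValue;
            this.sortColumns = sortColumns;
        }
    }

    // Assumption: normalisation is simply trim + lower-case.
    static String normalise(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    static List<Row> rowsFor(int itemId, Map<String, List<String>> metadata,
                             String valueField, List<String> sortFields) {
        // Sort fields are singular: only the first value of each is used.
        List<String> sortColumns = new ArrayList<String>();
        for (String field : sortFields) {
            List<String> values = metadata.get(field);
            sortColumns.add(values == null || values.isEmpty()
                    ? null : normalise(values.get(0)));
        }
        // The browse value may be multiple: one row per value.
        List<Row> rows = new ArrayList<Row>();
        List<String> browseValues = metadata.get(valueField);
        if (browseValues != null) {
            for (String value : browseValues) {
                rows.add(new Row(itemId, value, normalise(value), sortColumns));
            }
        }
        return rows;
    }
}
```

An item with two authors and two titles would therefore yield two rows for the author index, both carrying the first title as their title sort value.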

Progress Update: 21-11-2006, 14:50 GMT

Against all belief, the indexing code appears to already be working. It is as much as 0.1 seconds slower per item in a small database, which may need to be addressed (see TODO below). This code touches the following files: IndexBrowse.java, SortOption.java, BrowseException.java. No new files were necessary. I have included below two screen dumps of my test database, as an example of what I am currently seeing in the indices:

   # select * from index_2;
    id | item_id |     value      |   sort_value   |        sort_2        |  sort_1
   ----+---------+----------------+----------------+----------------------+-----------
     1 |       1 | Jones, Richard | jones, richard | 2006-11-16t17:08:11z | submit 1
     2 |       2 | Jones, Richard | jones, richard | 2006-11-16t17:08:42z | submit 2
     3 |       3 | Jones, Richard | jones, richard | 2006-11-16t17:09:05z | submit 3
     4 |       4 | Jones, Richard | jones, richard | 2006-11-16t17:09:26z | submit 4
     5 |       5 | Jones, Richard | jones, richard | 2006-11-16t17:09:52z | submit 5
     6 |       6 | Jones, Richard | jones, richard | 2006-11-16t17:10:18z | submit 6
     7 |       7 | Jones, Richard | jones, richard | 2006-11-16t17:10:43z | submit 7
     8 |       8 | Jones, Richard | jones, richard | 2006-11-16t17:11:08z | submit 8
     9 |       9 | Jones, Richard | jones, richard | 2006-11-16t17:11:30z | submit 9
    10 |      10 | Jones, Richard | jones, richard | 2006-11-16t17:11:56z | submit 10
    11 |      11 | Jones, Richard | jones, richard | 2006-11-16t17:12:19z | submit 11
    12 |      12 | Jones, Richard | jones, richard | 2006-11-16t17:12:45z | submit 12
    13 |      13 | Jones, Richard | jones, richard | 2006-11-16t17:13:09z | submit 13
    14 |      14 | Jones, Richard | jones, richard | 2006-11-16t17:13:31z | submit 14
    15 |      15 | Jones, Richard | jones, richard | 2006-11-16t17:13:52z | submit 15
    16 |      16 | Jones, Richard | jones, richard | 2006-11-16t17:14:16z | submit 16
    17 |      17 | Jones, Richard | jones, richard | 2006-11-16t17:14:37z | submit 17
    18 |      18 | Jones, Richard | jones, richard | 2006-11-16t17:14:58z | submit 18
    19 |      19 | Jones, Richard | jones, richard | 2006-11-16t17:15:19z | submit 19
    20 |      20 | Jones, Richard | jones, richard | 2006-11-16t17:15:42z | submit 20
    21 |      21 | Jones, Richard | jones, richard | 2006-11-16t17:16:02z | submit 21
    22 |      22 | Jones, Richard | jones, richard | 2006-11-16t17:17:24z | submit 22
   (22 rows)
   # select * from index_1;
    id | item_id |        value         |      sort_value      |        sort_2        |  sort_1
   ----+---------+----------------------+----------------------+----------------------+-----------
     1 |       1 | 2006-11-16T17:08:11Z | 2006-11-16t17:08:11z | 2006-11-16t17:08:11z | submit 1
     2 |       2 | 2006-11-16T17:08:42Z | 2006-11-16t17:08:42z | 2006-11-16t17:08:42z | submit 2
     3 |       3 | 2006-11-16T17:09:05Z | 2006-11-16t17:09:05z | 2006-11-16t17:09:05z | submit 3
     4 |       4 | 2006-11-16T17:09:26Z | 2006-11-16t17:09:26z | 2006-11-16t17:09:26z | submit 4
     5 |       5 | 2006-11-16T17:09:52Z | 2006-11-16t17:09:52z | 2006-11-16t17:09:52z | submit 5
     6 |       6 | 2006-11-16T17:10:18Z | 2006-11-16t17:10:18z | 2006-11-16t17:10:18z | submit 6
     7 |       7 | 2006-11-16T17:10:43Z | 2006-11-16t17:10:43z | 2006-11-16t17:10:43z | submit 7
     8 |       8 | 2006-11-16T17:11:08Z | 2006-11-16t17:11:08z | 2006-11-16t17:11:08z | submit 8
     9 |       9 | 2006-11-16T17:11:30Z | 2006-11-16t17:11:30z | 2006-11-16t17:11:30z | submit 9
    10 |      10 | 2006-11-16T17:11:56Z | 2006-11-16t17:11:56z | 2006-11-16t17:11:56z | submit 10
    11 |      11 | 2006-11-16T17:12:19Z | 2006-11-16t17:12:19z | 2006-11-16t17:12:19z | submit 11
    12 |      12 | 2006-11-16T17:12:45Z | 2006-11-16t17:12:45z | 2006-11-16t17:12:45z | submit 12
    13 |      13 | 2006-11-16T17:13:09Z | 2006-11-16t17:13:09z | 2006-11-16t17:13:09z | submit 13
    14 |      14 | 2006-11-16T17:13:31Z | 2006-11-16t17:13:31z | 2006-11-16t17:13:31z | submit 14
    15 |      15 | 2006-11-16T17:13:52Z | 2006-11-16t17:13:52z | 2006-11-16t17:13:52z | submit 15
    16 |      16 | 2006-11-16T17:14:16Z | 2006-11-16t17:14:16z | 2006-11-16t17:14:16z | submit 16
    17 |      17 | 2006-11-16T17:14:37Z | 2006-11-16t17:14:37z | 2006-11-16t17:14:37z | submit 17
    18 |      18 | 2006-11-16T17:14:58Z | 2006-11-16t17:14:58z | 2006-11-16t17:14:58z | submit 18
    19 |      19 | 2006-11-16T17:15:19Z | 2006-11-16t17:15:19z | 2006-11-16t17:15:19z | submit 19
    20 |      20 | 2006-11-16T17:15:42Z | 2006-11-16t17:15:42z | 2006-11-16t17:15:42z | submit 20
    21 |      21 | 2006-11-16T17:16:02Z | 2006-11-16t17:16:02z | 2006-11-16t17:16:02z | submit 21
    22 |      22 | 2006-11-16T17:17:24Z | 2006-11-16t17:17:24z | 2006-11-16t17:17:24z | submit 22
   (22 rows)

TODO: the scale problem is at least in part because the sort values are obtained for each item for each browse index, which is an unnecessary amount of work. Refactoring should sort this out, but it remains as-is for the moment because it slipped easily into existing code.

Browse Servlet and User Interface

Although we're not quite ready to start putting the UI together, it is now time to specify exactly what we want out of the UI interaction with the Servlet, because this gives us our window into the Browse engine itself. Therefore, I will write a primitive implementation of the BrowseServlet to deal directly with the Browse engine, and initially just to dump useful debug output to the screen.

The following are variables that will need to be passed into the Browse engine in order for appropriate results to be returned:

  • type: the type of browse being undertaken. This will be used to identify the Browse Index from the config
  • sortBy: which of the available sort options in config is to be sorted by
  • order: which way to interpret the sortBy. ASC or DESC
  • value: a specific value to browse upon. For example "Jones, Richard" to view all items where I am the author (in conjunction with type=author, of course)
  • resultsperpage: number of results to display on the page at any one time
  • community: the community we are browsing in
  • collection: the collection we are browsing in
  • next: the id of the item to be at the top of the "next" page
  • prev: the id of the item to be at the top of the "previous" page
  • focus: the target point in the listing to point the browse. This will be utilised by the paging system
  • year: the year to use as a focus in date browse
  • month: the month to use as a focus in date browse
  • startsWith: the characters to use for a stem search. Will be used with the focus
  • vfocus: the string to form the focus for single browse contexts [added 29-11-2006]

NOTE: "next" and "prev" are not clearly defined as to what the best way to obtain them is, and exactly what their relationship to "focus" is. It may be that "next" and "prev" are only used in the Servlet/UI layer to represent the "focus" for the next and previous functionality.

SQL Queries

This section is a summary of my first (untested) stabs at the SQL required by the Browse engine. Some of them MAY NOT WORK, or may not be complete yet.

Obtain the results for a given value or focus:

   SELECT * FROM <index>
   WHERE sort_value [<|>|=] [<value> | <focus>]
       [AND collection_id = <collection>] 
       [AND community_id = <community>]
   ORDER BY <sortBy> <order>
   LIMIT <resultsperpage + 1>

So if "focus" is used to tell us which "next" or "prev" we should be looking at, then we may be able to dispense with them altogether. <index> is updated to refer to the relevant table name (whether it is index_<n>, collection_index_<n> or community_index_<n>), and if the sort terms are correctly prepared then the simple comparators should be enough to ensure that we get everything in the desired order.

In order to output the string "Results A - B of C" the following are required (B = A + <resultsperpage>):

A:

   SELECT COUNT(*) FROM <index> 
   WHERE value [<|>] <focus>

Here the "focus" must be implicitly defined for every request so that this query always returns a result, although it will sometimes return 0. That means that after the SELECT above we must always at least assign the first result to be the focus, if it has not already been defined.

C:

   SELECT COUNT(*) FROM <index>
   [WHERE value = <value>]
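
The arithmetic behind the summary string can be sketched as below, taking A and C as the results of the two COUNT queries. The ResultsRange class is hypothetical, and capping B at C is an assumption of this sketch; the text above only gives B = A + <resultsperpage>.

```java
// Hypothetical helper that assembles "Results A - B of C" from the two counts.
class ResultsRange {

    // a: rows before the focus (query A); c: total rows (query C).
    static String describe(long a, int resultsPerPage, long c) {
        // Assumption: B never exceeds the total, so the last page reads sensibly.
        long b = Math.min(a + resultsPerPage, c);
        return "Results " + a + " - " + b + " of " + c;
    }

    public static void main(String[] args) {
        System.out.println(describe(2, 13, 22)); // prints: Results 2 - 15 of 22
        System.out.println(describe(0, 25, 22)); // prints: Results 0 - 22 of 22
    }
}
```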

Ongoing programmer notes

  • sortBy is an int parameter, indicating which sort field to use. sortBy = 0 will therefore be sort by the index value
  • focus is either a value pulled from the UI top navigation or a specific item id to browse to. It strikes me that this might need to be divided into two parts
    • I have made an executive decision that focus will refer only to item ids. Everything else must go through "value" or "starts_with"
  • if order = ASC the value comparator is >; if order = DESC the value comparator is <. What about =? That is used when a value is supplied, in which case the > or < comparator applies to the sortBy field instead
  • QUESTION: do we do our comparisons for browse on the "value" or the "sort_value" field? Since "sort_value" has some sort of normalisation applied to it, we must either normalise the request and compare it to that, or not normalise the request and compare it to the "value" column

SQL Revisited

It has become necessary, on implementation, to modify the first SQL query given above, and to add an additional query to support paging with item focuses.

The first query becomes:

   SELECT * FROM <index>
   WHERE sort_value [<|>|=] [<value> | <focus-value>]
       [AND collection_id = <collection>] 
       [AND community_id = <community>]
   ORDER BY <sortBy> <order>
   LIMIT <resultsperpage + 1>

This is just a minor change: sort_value is compared not with the <focus> id itself, as previously indicated, but with the sort value belonging to the <focus> item in the relevant context. This means we must add the following SQL:

   SELECT sort_value FROM <index>
   WHERE item_id = <focus>

This seems like a slightly dodgy solution, but this is almost exactly how the current browse mechanism does it. I will implement it this way, and revisit later if there are problems.
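
The two-step lookup can be illustrated in memory, without a database, as follows. This is only a sketch: the FocusPaging class is hypothetical, it covers the ascending case only, and Java's String.compareTo stands in for whatever collation the database would apply.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of paging by focus item id.
class FocusPaging {

    static final class Row {
        final int itemId;
        final String sortValue;

        Row(int itemId, String sortValue) {
            this.itemId = itemId;
            this.sortValue = sortValue;
        }
    }

    // Step 1: SELECT sort_value FROM <index> WHERE item_id = <focus>
    static String focusValue(List<Row> index, int focusItemId) {
        for (Row row : index) {
            if (row.itemId == focusItemId) {
                return row.sortValue;
            }
        }
        return null; // no such item: treated below as "no focus"
    }

    // Step 2: WHERE sort_value > <focus-value> ORDER BY sort_value ASC
    //         LIMIT <resultsperpage + 1>
    static List<Row> pageAfter(List<Row> index, int focusItemId, int resultsPerPage) {
        String focus = focusValue(index, focusItemId);
        List<Row> matches = new ArrayList<Row>();
        for (Row row : index) {
            if (focus == null || row.sortValue.compareTo(focus) > 0) {
                matches.add(row);
            }
        }
        Collections.sort(matches, new Comparator<Row>() {
            @Override
            public int compare(Row a, Row b) {
                return a.sortValue.compareTo(b.sortValue);
            }
        });
        int limit = Math.min(resultsPerPage + 1, matches.size());
        return new ArrayList<Row>(matches.subList(0, limit));
    }
}
```

The sketch follows the query as written, with a strict comparator. The test output later on this page lists the focus item itself at the top of the page, which suggests the implemented comparison may be inclusive.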

Further to this, here are 3 SQL case studies:

1) Browse all by title:

type = title, order = ASC, focus = -1, value = null, rpp = 21, startsWith = null, sortBy = 0

   SELECT * FROM index_3
   ORDER BY sort_value ASC
   LIMIT 22

2) Browse page 3 of the author list

type = author, order = ASC, focus = 32 (random item id), value = null, rpp = 21, startsWith = null, sortBy = 0

   SELECT * FROM index_2
   WHERE sort_value > [focus value]
   ORDER BY sort_value ASC
   LIMIT 22

3) Browse page 2 of author Jones, Richard, ordering by title

type = author, order = ASC, focus = 11 (random item id), value = "Jones, Richard", rpp = 21, startsWith = null, sortBy = 1 (title field)

   SELECT * FROM index_2
   WHERE sort_1 > [focus value]
       AND sort_value = 'jones, richard'
   ORDER BY sort_1 ASC
   LIMIT 22
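
A query builder along these lines might look like the Java sketch below. It is an illustration rather than the engine's code: the BrowseQueryBuilder class and its parameters are hypothetical, it always requests <resultsperpage + 1> rows, and it emits ? placeholders (for use with a prepared statement) instead of inlining the focus value and browse value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for the browse SELECT used in the case studies.
class BrowseQueryBuilder {

    static String build(int indexNumber, int sortBy, String order,
                        boolean hasFocus, boolean hasValue, int resultsPerPage) {
        if (!"ASC".equals(order) && !"DESC".equals(order)) {
            throw new IllegalArgumentException("order must be ASC or DESC");
        }
        // sortBy = 0 means "sort by the index value itself".
        String sortColumn = sortBy == 0 ? "sort_value" : "sort_" + sortBy;
        // ASC pages forward with >, DESC with <.
        String comparator = "ASC".equals(order) ? ">" : "<";

        List<String> where = new ArrayList<String>();
        if (hasFocus) {
            where.add(sortColumn + " " + comparator + " ?"); // bound to the focus value
        }
        if (hasValue) {
            where.add("sort_value = ?"); // bound to the normalised browse value
        }

        StringBuilder sql = new StringBuilder("SELECT * FROM index_").append(indexNumber);
        if (!where.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", where));
        }
        sql.append(" ORDER BY ").append(sortColumn).append(' ').append(order);
        // One extra row is fetched so the next page has a focus.
        sql.append(" LIMIT ").append(resultsPerPage + 1);
        return sql.toString();
    }

    public static void main(String[] args) {
        // Author "Jones, Richard", ordered by title, paging from a focus item.
        System.out.println(build(2, 1, "ASC", true, true, 21));
    }
}
```

For the third case study this produces SELECT * FROM index_2 WHERE sort_1 > ? AND sort_value = ? ORDER BY sort_1 ASC LIMIT 22, with the focus value and 'jones, richard' supplied as parameters.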

Ongoing programmers notes

  • As yet I have not attempted a treatment of the startsWith parameter. It strikes me that this needs to be dealt with in just the same way as the focus, and therefore may need to be merged with the focus parameter right at the start
  • I'm also proposing that, instead of passing Item objects into the BrowseInfo, we pass BrowseItem objects, which are extra lightweight and have already been written for the index process
  • in the parlance of the BrowseInfo (I think):
    • position = B
    • total = C
    • offset = A
  • position relies on us having the focus for the top item. If a focus is not supplied to the engine, then we must get the value from the top result from the query

Progress Update: 22-11-2006, 12:10 GMT

This lunch time we reach that first goal of writing a chunk of code: it compiles. That is, I have written and compiled the initial version of the code which I think can take inputs through a URL according to the variables defined above, and perform the appropriate queries on the database defined also above. The next goal, then, is that great achievement - the elimination of any runtime problems. After that we'll know if it actually does what I think it does.

Ongoing programmers notes

  • Here's an oddity which rings a bell from the last time I looked at the Browse code. It seems that the result of a SELECT COUNT(*) query, despite being a number can't be retrieved using the TableRow.getIntColumn() method. I'm trying TableRow.getStringColumn and Integer.parseInt, but I have a horrible feeling that won't work either.
    • Nope, as anticipated:
   Exception:
   java.lang.IllegalArgumentException: Value for number is not an integer
       at org.dspace.storage.rdbms.TableRow.getIntColumn(TableRow.java:162)
   Exception:
   java.lang.IllegalArgumentException: Value is not an string
       at org.dspace.storage.rdbms.TableRow.getStringColumn(TableRow.java:244)
    • a spot of experimentation suggests that it can be got hold of as a "long"
  • Initial experiments with the browse code are looking positive. At least there are no errors coming out of the SQL and result sets containing actual data are being returned. Of course, we don't know for certain that it is the correct data yet. One problem that we need to address is some sort of entity object to cover the display end of things. As you may know, the current configuration for browse listings looks like this:
   webui.itemlist.columns = dc.date.issued(date), dc.title, dc.contributor.*

In order to make our debug better, it would be handy to have this wrapped up in an object that could do the configuration, rather than buried in ItemListTag.java, which is where it is just now.
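
As an indication of what such a configuration object might do, here is a minimal Java sketch that parses the webui.itemlist.columns value. The ItemListColumns class is hypothetical (it is not the ItemListConfig.java mentioned later on this page), and treating columns without a bracketed type as "text" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the webui.itemlist.columns value.
class ItemListColumns {

    static final class Column {
        final String field; // e.g. "dc.date.issued"
        final String type;  // e.g. "date"; "text" when none is given (assumed)

        Column(String field, String type) {
            this.field = field;
            this.type = type;
        }
    }

    static List<Column> parse(String config) {
        List<Column> columns = new ArrayList<Column>();
        for (String token : config.split(",")) {
            String field = token.trim();
            if (field.isEmpty()) {
                continue; // tolerate stray commas
            }
            String type = "text"; // assumption: default display type
            int open = field.indexOf('(');
            if (open >= 0 && field.endsWith(")")) {
                // "dc.date.issued(date)" -> field "dc.date.issued", type "date"
                type = field.substring(open + 1, field.length() - 1).trim();
                field = field.substring(0, open).trim();
            }
            columns.add(new Column(field, type));
        }
        return columns;
    }

    public static void main(String[] args) {
        for (Column c : parse("dc.date.issued(date), dc.title, dc.contributor.*")) {
            System.out.println(c.field + " [" + c.type + "]");
        }
    }
}
```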

IMPORTANT NOTE: the previously optional configuration line "webui.itemlist.columns" has now become compulsory.

  • OK, some good progress. I can now see that the list config is being picked up and dealt with properly, and therefore have worked out the prototype to the display logic (although not actually written any display code beyond BrowseInfo.toString()). There appears to be a problem getting hold of the right metadata through the BrowseItem object, which I am moving on to now.
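
On the COUNT(*) oddity noted in this section: the exceptions are consistent with the count arriving as a 64-bit value (in PostgreSQL, count returns a bigint), which would explain why it can be read as a long but not as an int or a string. The sketch below, which is independent of the TableRow API, shows one way to narrow such a value safely; the CountColumn class is hypothetical.

```java
// Hypothetical helper for narrowing a COUNT(*) result to an int.
class CountColumn {

    // Accept whatever numeric object the database layer hands back and
    // fail loudly rather than overflow silently.
    static int toInt(Object columnValue) {
        if (!(columnValue instanceof Number)) {
            throw new IllegalArgumentException("Not a numeric value: " + columnValue);
        }
        // Math.toIntExact throws ArithmeticException if the long does not fit.
        return Math.toIntExact(((Number) columnValue).longValue());
    }

    public static void main(String[] args) {
        Object fromDatabase = Long.valueOf(22L); // how a bigint typically arrives
        System.out.println(toInt(fromDatabase)); // prints 22
    }
}
```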

Progress Update: 22-11-2006, 15:10 GMT

I have got code that "works" in the minimal sense of the word, which appears to give us useful results (still to confirm that they are correct in each context, see below). Adding a toString() method to the BrowseInfo, I have dumped browse results to the screen for debugging. Below I have pasted the browse results corresponding to index_2 and index_1 in my dev box, which represent the tables shown further up this page. The work has touched the following files: BrowseInfo.java, DatabaseManager.java (for extra debug), dspace-web.xml, and created the following new files: BrowserServlet.java, BrowserScope.java, ItemListConfig.java.

   BrowseInfo String Representation: Browsing 0 to 22 of 22||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}||
   BrowseInfo String Representation: Browsing 0 to 22 of 22||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}||

TODO: This is currently returning one more row than it's supposed to. This is because I've asked the query to get the row after the current page so that we have a focus for the next page browse. I need to strip this row from the result set before giving it to the browse info object.
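
Stripping the extra row could be as simple as the Java sketch below, which splits the fetched rows into the page proper and the focus for the next page. The PageTrimmer class and its Page holder are hypothetical and are not part of BrowseInfo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that separates the look-ahead row from the page.
class PageTrimmer {

    static final class Page<T> {
        final List<T> results; // at most resultsPerPage rows
        final T nextFocus;     // the look-ahead row, or null on the last page

        Page(List<T> results, T nextFocus) {
            this.results = results;
            this.nextFocus = nextFocus;
        }
    }

    // fetched holds up to resultsPerPage + 1 rows, as requested by the query.
    static <T> Page<T> trim(List<T> fetched, int resultsPerPage) {
        if (fetched.size() > resultsPerPage) {
            return new Page<T>(new ArrayList<T>(fetched.subList(0, resultsPerPage)),
                               fetched.get(resultsPerPage));
        }
        // Fewer rows than requested: this is the last page.
        return new Page<T>(new ArrayList<T>(fetched), null);
    }
}
```

A null nextFocus then doubles as the signal that there is no further page to link to.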

Testing the Browse URL

To get an idea as to whether the Browse is working correctly, we need to just run through some tests with the URL API. At this stage, "month", "year", "starts_with", "community" and "collection" cannot be tested as they have not been built in to the engine yet. The following can be tested, though:

  • type = dateissued | author | title | subject
  • order = ASC | DESC
  • value = [free text]
  • focus = [item id]
  • rpp = [integer: 1 - X]
  • sort_by = [integer: 1 - N]

The full browse URL is of the form:

   browse?type=<type>&order=<order>&value=<value>&focus=<focus>&rpp=<rpp>&sort_by=<sort_by>

So for example:

   browse?type=author&order=ASC&value=Jones%2C+Richard&focus=34&rpp=10&sort_by=1
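
For building such URLs in test code, a small Java sketch using the standard java.net.URLEncoder is given below. The BrowseUrl class is hypothetical; parameters with a null value are simply left out, which is a choice made for this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that assembles a browse URL from its parameters.
class BrowseUrl {

    static String build(Map<String, String> params) {
        StringBuilder url = new StringBuilder("browse");
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (entry.getValue() == null) {
                continue; // omit parameters that were not supplied
            }
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String raw) {
        try {
            // Form encoding: a comma becomes %2C and a space becomes +.
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("type", "author");
        params.put("order", "ASC");
        params.put("value", "Jones, Richard");
        params.put("focus", "34");
        params.put("rpp", "10");
        params.put("sort_by", "1");
        System.out.println(build(params));
    }
}
```

A LinkedHashMap is used so that the parameters come out in the order they were added.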

The following are URLs that have been tested, and the results:

   browse?type=dateissued&order=ASC&focus=3&rpp=12&sort_by=0
   BrowseInfo String Representation: Browsing 2 to 15 of 22||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}||

(A brief pause then ensued while I added some more information to my debug output so that I can quickly test that things are as they should be; below is the same debug, but with the extra data)

   BrowseInfo String Representation: Browsing 2 to 15 of 22 in index: dateissued(data type: date, display type: full||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.date.issued ASC||
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}||

Success Metric: type = dateissued, order = ASC (by date issued), focus = 3, results per page = 12 (+1 as documented above), sort by = dateissued (the browse value)

   browse?type=dateissued&order=DESC&focus=5&rpp=10&sort_by=1
   BrowseInfo String Representation: Browsing 22 to 22 of 22 in index: dateissued(data type: date, display type: full||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.title DESC||||

This one evidently has some problems. Going in to find out what's wrong ...

Failure Analysis: This problem appears to be because the value obtained for the focus item (in this case item id 5) is the actual value in the desired index (dateissued), which in this case is "2006-11-16t17:09:52z". The focus value should instead be the value for the focus item id in the relevant sort field of the desired index (in this case it should have been "submit 5"). With that corrected, the same request returns:

   BrowseInfo String Representation: Browsing 4 to 15 of 22 in index: dateissued(data type: date, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.title DESC(option 1)||
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}||

Success Metric: type = dateissued, order = DESC (by title), focus = 5, results per page = 10 (+1 as documented above), sort by = title (sort_1)

NOTE: it should only be possible to set the "value" variable from a browse page which is of type "single"

NOTE: it should only be possible to specify the "sort_by" variable from a browse page which is of type "full"

   browse?type=author&order=ASC&rpp=25
   BrowseInfo String Representation: Browsing 0 to 22 of 22 in index: author(data type: text, display type: single)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.contributor.* ASC(option 0)||
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}||

Failure Analysis: This is wrong in a fairly significant way: since there is only one author name, "Jones, Richard", there should be only one result. The SELECT therefore needs a DISTINCT clause when the display type is "single" and no value is specified for the browse. To make testing this index possible I will also add a couple of items by other authors to the test set.

Single Browse vs Full Browse

The above testing has exposed a logical flaw in my reasoning. Inserting DISTINCT into the query is not straightforward, as it changes the very nature of what is being queried for. The code so far has assumed that it could obtain an item_id as a focus, but that assumption falls down here. The old browse code dealt with this in a complex way; it was impossible to generalise (which necessitated this new code) because the semantics of these two essentially different ways of browsing were blended together. Fortunately, the approach I have adopted is sufficiently clear that it will be possible to build in a new mechanism for browsing in this second way that will not be a horrible confusion!

I propose, inside the BrowseEngine, to have an initial catch thus:

   if (browseIndex.isSingle() && !scope.hasValue())
   {
       browseByValue(scope);
   }
   else
   {
       browseByItem(scope);
   }

This will allow us to keep the two sets of logic separate. Many of the supporting methods will work well in both contexts and can be reused.

The BrowseInfo object already supports two methods: getItemResults and getStringResults, which gives me a place in the existing code to hook the results of this functionality without too much work at that end.

The SQL that we need to achieve is quite simple, and looks like this

   SELECT DISTINCT(value) FROM <index>
   [WHERE sort_value [<|>|=] <vfocus>]
   ORDER BY sort_value <order>
   LIMIT <rpp> + 1

Here we have introduced one new variable (which has been fed back into the earlier list of UI variables) called "vfocus", which is the text value of the target focus. For pagination, the focus for the next page is obtained from the extra (+1) row requested in the LIMIT portion of the query. More on this later.
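The lookahead can be illustrated without a database. The following is a minimal in-memory sketch, not the BrowseEngine code: the class and method names are hypothetical, each value doubles as its own sort_value, and only ascending order is handled. It returns up to rpp + 1 distinct values, so that the final extra entry, if present, can serve as the "vfocus" of the next page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the value browse query (ascending only). */
final class ValuePageSketch {
    private ValuePageSketch() {
    }

    /**
     * Returns at most rpp + 1 distinct values, starting at vfocus (inclusive),
     * or at the beginning of the index when vfocus is null.
     */
    static List<String> page(List<String> rows, String vfocus, int rpp) {
        // A TreeSet gives us DISTINCT and ORDER BY ... ASC in one step.
        TreeSet<String> distinct = new TreeSet<String>(rows);
        NavigableSet<String> from = (vfocus == null) ? distinct : distinct.tailSet(vfocus, true);

        List<String> result = new ArrayList<String>();
        for (String value : from) {
            if (result.size() == rpp + 1) {
                break; // LIMIT rpp + 1
            }
            result.add(value);
        }
        return result;
    }
}
```

With six distinct author names and rpp = 3, the first call returns four values; the fourth would be passed back as the vfocus for the following page.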

Ongoing Programmer's Notes

  • The SQL appears to be coming together quite quickly
  • We have a method called getFocusValue which is supposed to return a string value from the item id integer. It would be good if we could generalise this so that it deals with both "focus" and "vfocus". I've added a vFocus member variable to the BrowserScope object. The next thing to do (probably not until Friday now) is to track this back through to the Servlet to ensure that it gets properly populated by the UI
  • The UI URL parameter "vfocus" has been added (and retrofitted to the list above)
  • the vfocus parameter has been propagated from the URL to the browse engine, and the engine has been modified so that it should now be able to deal with both browsing by item or by value. Once this compiles, testing info to follow...
  • Having forgotten how to write SELECT DISTINCT statements properly, we modify our SQL for value browsing to be:
   SELECT DISTINCT(value), sort_value
   FROM index_<n>
   [WHERE sort_value [<|>|=] <vfocus>]
   ORDER BY sort_value <order>
   LIMIT <rpp> + 1
  • in order to make the BrowseInfo.toString method work with the new value browse, I need to go in and make some changes
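The revised value-browse query from the notes above could be assembled along the following lines. This is a sketch under stated assumptions rather than the actual BrowseEngine code: the class name, method signature and the table name used in the usage note are hypothetical, the focus is bound as a JDBC-style "?" parameter, and the choice of ">=" for ascending and "<=" for descending is inferred from the idea that a page starts at the focus value and proceeds in the sort direction.

```java
/** Hypothetical helper that assembles the value-browse query as a parameterised SQL string. */
final class ValueBrowseSql {
    private ValueBrowseSql() {
    }

    /**
     * @param indexTable name of the index table (assumed to come from configuration, not user input)
     * @param ascending  true for ASC, false for DESC
     * @param hasVFocus  whether a "vfocus" value will be bound to the query
     * @param rpp        results per page
     */
    static String build(String indexTable, boolean ascending, boolean hasVFocus, int rpp) {
        StringBuilder sql = new StringBuilder("SELECT DISTINCT(value), sort_value FROM ");
        sql.append(indexTable);
        if (hasVFocus) {
            // Assumption: include the focus itself and continue in the sort direction.
            sql.append(" WHERE sort_value ").append(ascending ? ">=" : "<=").append(" ?");
        }
        sql.append(" ORDER BY sort_value ").append(ascending ? "ASC" : "DESC");
        // One extra row is requested, matching the LIMIT <rpp> + 1 in the query above.
        sql.append(" LIMIT ").append(rpp + 1);
        return sql.toString();
    }
}
```

For example, a descending browse with a focus and rpp = 3 against a table called index_2 (a made-up name) would yield a query ending in "WHERE sort_value <= ? ORDER BY sort_value DESC LIMIT 4".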

Testing the Browse URL (part 2)

Having got through end-to-end on the first single value browse (author), we are ready to throw some more stuff at the browse engine and see how it copes. Let's start simple:

   browse?type=author
   BrowseInfo String Representation: Browsing 0 to 1 of 22 in index: author(data type: text, display type: single)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.contributor.* ASC(option 0)||
   { { Value: Jones, Richard}}||

Failure Analysis: while this looks correct, the count "0 to 1 of 22" is wrong. There are two causes: BrowseEngine.getTotalResults does not implement DISTINCT, and BrowseInfo.toString makes a minor error in calculating the range (although the actual range is correct). The output also reports which columns it lists over, even though a single value browse does not list them. These are to be fixed before the next test.

   BrowseInfo String Representation: Browsing 1 to 1 of 1 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.contributor.* ASC(option 0)||
   { { Value: Jones, Richard}}||

Success Metric: type = author, 1 result

To be sure that this is working properly I will add more data to the system and run the same URL again

Progress Update: 29-11-2006, 15:05 GMT

While the single browse pages now look within reach, a new problem has emerged which means we cannot continue the above testing immediately. It appears that the submission process which requests the indexing of the item is broken. This is not surprising, since I've not put much work into that area. Nonetheless, I have written an indexer which should be re-usable, and I will therefore divert my attention to this for a short while so that I can then add more items to continue the above testing.

Ongoing Programmer's Notes

  • The individual item indexing is currently achieved with Browse.itemAdded, Browse.itemChanged and Browse.itemRemoved. We will need to provide reasonable alternatives to these for the new indexer. I expect that IndexBrowse.itemAdded, IndexBrowse.itemChanged, IndexBrowse.itemRemoved will be the way to go. This will mean that we must modify the Item object to call that class instead.
  • Browse is referenced from Item.update, Item.withdraw and Item.delete

Progress Update: 30-11-2006, 12:55 GMT

A new index process for individual items has been added and tested in the most basic way: an item has been added, and it has appeared in the index tables. This means that we can resume our primary path which is to add some more data to the browse tables and continue testing the URL

Testing the Browse URL (part 3)

   browse?type=author
   BrowseInfo String Representation: Browsing 1 to 6 of 6 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.contributor.* ASC(option 0)||
   { {Value: Ardman, Alfred}}
   { {Value: Boothroyd, Betty}}
   { {Value: Chaplin, Charlie}}
   { {Value: Decimal, Dewey}}
   { {Value: Eagle, Eddie}}
   { {Value: Jones, Richard}}||

Success Metric: single value browse, type=author, results=6, sort by name ASC

Now we can go on and push the browse URL a little further.

Note: I have gone back and re-run all the tests done above and they appear, without significant analysis, to be correct. I have also re-run the last URL which caused the problems before, and it produces exactly the same results as above, which is what we would expect.

   browse?type=author&order=DESC&rpp=3&vfocus=Boothroyd%2C+Betty
   BrowseInfo String Representation: Browsing 26 to 27 of 6 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.contributor.* DESC(option 0)||
   { {Value: Boothroyd, Betty}}
   { {Value: Ardman, Alfred}}||

Failure Analysis: *sigh*. Well, this has behaved pretty much correctly, inasmuch as it starts with "Boothroyd, Betty" and works its way in descending order to the end of the list. Unfortunately, the "26 to 27" bit is a little out! This appears to be because the query that counts the entries before the current value does not use DISTINCT(value). The size of the range is therefore correct, and only the starting point is wrong.

   BrowseInfo String Representation: Browsing 5 to 6 of 6 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.contributor.* DESC(option 0)||
   { {Value: Boothroyd, Betty}}
   { {Value: Ardman, Alfred}}||

Success Metric: type=author, value focus = Boothroyd, Betty, sorted by value descending.
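The jump from 26 to 5 in the failure analysis above is consistent with a simple model, sketched here in memory. This is an illustration only: it assumes the start position is one more than the number of entries that sort before the focus in descending order, it uses the value as its own sort value, and the class and method names are hypothetical. The test data mirrors the set in use at this point: 22 items by "Jones, Richard" and one item each by five other authors.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the start-position calculation for a descending value browse. */
final class StartPositionSketch {
    private StartPositionSketch() {
    }

    /** Counts every index row that precedes the focus (the faulty behaviour). */
    static int startCountingRows(List<String> rows, String vfocus) {
        int before = 0;
        for (String value : rows) {
            if (value.compareTo(vfocus) > 0) {
                before++; // sorts before the focus when descending
            }
        }
        return before + 1;
    }

    /** Counts each preceding value once, as a DISTINCT(value) count would. */
    static int startCountingDistinct(List<String> rows, String vfocus) {
        Set<String> before = new HashSet<String>();
        for (String value : rows) {
            if (value.compareTo(vfocus) > 0) {
                before.add(value);
            }
        }
        return before.size() + 1;
    }

    /** One row per item: 22 by the same author plus five single-item authors. */
    static List<String> testRows() {
        List<String> rows = new ArrayList<String>();
        for (int i = 0; i < 22; i++) {
            rows.add("Jones, Richard");
        }
        rows.add("Ardman, Alfred");
        rows.add("Boothroyd, Betty");
        rows.add("Chaplin, Charlie");
        rows.add("Decimal, Dewey");
        rows.add("Eagle, Eddie");
        return rows;
    }
}
```

With the focus "Boothroyd, Betty", counting rows gives a start of 26 (25 rows precede it), while counting distinct values gives 5 (four authors precede it), matching the two outputs shown above.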

   browse?type=title&order=ASC&focus=13&rpp=5&sort_by=2
   BrowseInfo String Representation: Browsing 13 to 18 of 27 in index: title(data type: title, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.date.issued ASC(option 2)||
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}||

Success Metric: type=title, order = undeterminable for this set, as all the authors are the same, focus = 13, range is correct (+1 as above), rpp = 5 (+1). Sorting by author needs to be borne out by other tests.

   browse?type=title&order=DESC&rpp=30&sort_by=2
   BrowseInfo String Representation: Browsing 1 to 27 of 27 in index: title(data type: title, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.*||
   Sorting by: dc.date.issued DESC(option 2)||
   {{Item ID: 27 :: [dc.date.issued:2006-11-30T12:56:42Z][dc.title.null:Submit E][dc.contributor.*:Eagle, Eddie]}}
   {{Item ID: 26 :: [dc.date.issued:2006-11-30T12:56:19Z][dc.title.null:Submit D][dc.contributor.*:Decimal, Dewey]}}
   {{Item ID: 25 :: [dc.date.issued:2006-11-30T12:55:58Z][dc.title.null:Submit C][dc.contributor.*:Chaplin, Charlie]}}
   {{Item ID: 24 :: [dc.date.issued:2006-11-30T12:55:33Z][dc.title.null:Submit B][dc.contributor.*:Boothroyd, Betty]}}
   {{Item ID: 23 :: [dc.date.issued:2006-11-30T12:49:29Z][dc.title.null:Submit A][dc.contributor.*:Ardman, Alfred]}}
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}||

Success Metric: type=title, order = date issued descending (option 2 is date issued, not author, as I have not set, and cannot set, an author sort option; see notes above). Fewer than 30 results are on the page, and this is all of them.

   browse?type=subject&order=DESC&rpp=10
   BrowseInfo String Representation: Browsing 1 to 2 of 2 in index: subject(data type: text, display type: single)||
   Listing single column: dc.subject.*||
   Sorting by: dc.subject.* DESC(option 0)||
   { {Value: fdsa}}
   { {Value: asdf}}||

Success Metric: type=subject, order = subject descending, all values in results

   browse?type=subject&order=ASC&rpp=10&vfocus=fdsa
   BrowseInfo String Representation: Browsing 2 to 2 of 2 in index: subject(data type: text, display type: single)||
   Listing single column: dc.subject.*||
   Sorting by: dc.subject.* ASC(option 0)||
   { {Value: fdsa}}||

Success Metric: type=subject, order = subject ascending (even though there is only one result, because we focus on "fdsa" we know that it is sorting the right way), vfocus = fdsa (asdf would be number 1, which is not displayed)

Now we can continue to test the "second level browse" pages, which are Single/Value browses which have value parameters specified. This could be the source of some more bugs

   browse?type=author&order=ASC&value=Jones%2C+Richard&rpp=10&sort_by=1

Failure Analysis: this URL returns a blank page. This appears to be caused by a problem with turning the BrowseInfo object into a String

   java.lang.ClassCastException: org.dspace.browse.BrowseItem
       at org.dspace.browse.BrowseInfo.valueListingString(BrowseInfo.java:462)
       at org.dspace.browse.BrowseInfo.toString(BrowseInfo.java:336)

This is because the test BrowseIndex.isSingle only reports on the browse type; it does not consider whether we are at the top or second level of the browse. The BrowseInfo object needs to know whether it is doing top level or second level browsing, as does the BrowserScope object for other uses. I propose the addition of isTopLevel and isSecondLevel to both of these objects, and to have them populated by the BrowseServlet.

Ongoing Programmer's Notes

  • This means that things which were done previously in the BrowseEngine thus:
   if (browseIndex.isSingle() && scope.hasValue())

can now be done thus:

   if (scope.isSecondLevel())
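As a rough illustration of the proposal, the two level flags could be derived from the browse type and the presence of a value. The class below is a hypothetical stand-in, not the real scope object: it assumes that "second level" means a single-type index with a value supplied, and that every other browse counts as top level. In the proposal itself the flags are populated by the BrowseServlet; deriving them like this is just one way that population might be done.

```java
/** Hypothetical stand-in for a browse scope that knows its browse level. */
final class ScopeLevelSketch {
    private final boolean singleIndex;
    private final String value;

    ScopeLevelSketch(boolean singleIndex, String value) {
        this.singleIndex = singleIndex;
        this.value = value;
    }

    boolean hasValue() {
        return value != null && !value.isEmpty();
    }

    /** Assumption: second level means a single-type index browsed with a value. */
    boolean isSecondLevel() {
        return singleIndex && hasValue();
    }

    /** Assumption: anything that is not second level is top level. */
    boolean isTopLevel() {
        return !isSecondLevel();
    }
}
```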

Testing the Browse URL (part 4)

   browse?type=author&order=ASC&value=Jones%2C+Richard&rpp=10&sort_by=1
   BrowseInfo String Representation: Browsing 1 to 11 of 22 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.title ASC(option 1)||
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}||

Success Metric: *somewhat miraculous that there were no significant problems* type=author, level=2, sort by = title, ASC, value="Jones, Richard"

   browse?type=author&order=DESC&value=Jones%2C+Richard&rpp=5&sort_by=1&vfocus=submit+13
   BrowseInfo String Representation: Browsing 1 to 6 of 22 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.title DESC(option 1)||
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}||

Failure Analysis: it appears that this browse has not used the "vfocus" variable in the SQL. This means that the SQL has not been written quite correctly. All other features of the browse have functioned correctly as far as I can tell.

Ongoing Programmer's Notes

  • It seems the fix might be as simple as to ensure that the browseByItem method checks both for item id and string value focusses, as at the moment it doesn't
    • yup, looks like it's done the job ...
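The check described in the note above might look something like the following. It is a sketch only: the class, the method signature, the use of -1 to mean "no item focus", the in-memory map standing in for the sort column lookup, and the precedence given to the string focus when both are present are all assumptions.

```java
import java.util.Map;

/** Hypothetical resolver for the value a browse should start from. */
final class FocusResolverSketch {
    private FocusResolverSketch() {
    }

    /**
     * @param focusItemId      item id from the "focus" parameter, or -1 if absent (assumed convention)
     * @param vfocus           string from the "vfocus" parameter, or null if absent
     * @param sortValuesByItem stand-in for looking up an item's value in the chosen sort column
     * @return the sort value to start from, or null to start at the beginning
     */
    static String resolve(int focusItemId, String vfocus, Map<Integer, String> sortValuesByItem) {
        if (vfocus != null) {
            return vfocus; // a string focus is already a sort value
        }
        if (focusItemId >= 0) {
            // Use the item's value in the sort field, not its value in the browse index.
            return sortValuesByItem.get(focusItemId);
        }
        return null;
    }
}
```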

Testing the Browse URL (part 5)

   browse?type=author&order=DESC&value=Jones%2C+Richard&rpp=5&sort_by=1&vfocus=submit+13
   BrowseInfo String Representation: Browsing 23 to 27 of 22 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.title DESC(option 1)||
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}||

Failure Analysis: well, it appears to have selected the correct values, and the reason there aren't 6 (rpp + 1) is because it reached the end of the index for those parameters. They are in the correct order for the correct value, but now we are in the range 23 - 27 of 22, which is interesting. The range is correct, so only the start value is at fault. This looks like an application (or misapplication) of the DISTINCT SQL construct.

   BrowseInfo String Representation: Browsing 7 to 11 of 22 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.title DESC(option 1)||
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}||

Success Metric: *good news* type=author, order = title descending (sorting by title), value = "Jones, Richard" (and it is known that at this point there are 22 items by that author), vfocus = "submit 13"

   browse?type=author&order=ASC&value=Eagle%2C+Eddie&rpp=10&sort_by=2
   BrowseInfo String Representation: Browsing 1 to 1 of 1 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.*||
   Sorting by: dc.date.issued ASC(option 2)||
   {{Item ID: 27 :: [dc.date.issued:2006-11-30T12:56:42Z][dc.title.null:Submit E][dc.contributor.*:Eagle, Eddie]}}||

Success Metric: type=author, order = unknowable, but should be by date, value ="Eagle, Eddie" (and it is known that there is only 1 item by this author)

   browse?type=subject&order=DESC&value=asdf&rpp=10&sort_by=1
   BrowseInfo String Representation: Browsing 1 to 11 of 22 in index: subject(data type: text, display type: single)||
   Listing single column: dc.subject.*||
   Sorting by: dc.title DESC(option 1)||
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}||

TODO: the results don't yet tell us which value we are browsing on - this will need to be fixed for the real UI

Success Metric: type=subject, order = title descending, by value "asdf" (although you can't see this - we know there are 22 items with that subject)

Browse URL Testing Summary

A summary of the URL parameters tested above:

   test  mode                     type        order  value           focus  vfocus            rpp  sort_by
   1     Full/Item                dateissued  ASC    N/A             3      N/A               12   0
   2     Full/Item                dateissued  DESC   N/A             5      N/A               10   1
   3     Single/Value             author      ASC    -               N/A    -                 25   N/A
   4     Single/Value             author      DESC   -               N/A    Boothroyd, Betty  3    N/A
   5     Full/Item                title       ASC    N/A             13     N/A               5    2
   6     Full/Item                title       DESC   N/A             -      N/A               30   2
   7     Single/Value             subject     DESC   -               N/A    -                 10   N/A
   8     Single/Value             subject     ASC    -               N/A    fdsa              10   N/A
   9     Single/Value with Value  author      ASC    Jones, Richard  N/A    -                 10   1
   10    Single/Value with Value  author      DESC   Jones, Richard  N/A    submit 13         5    1
   11    Single/Value with Value  author      ASC    Eagle, Eddie    N/A    -                 10   2
   12    Single/Value with Value  subject     DESC   asdf            N/A    -                 10   1

Progress Update: 30-11-2006, 16:50 GMT

We now have a Browse Engine which is capable of taking the core set of parameters that we might want to browse by, and turning them into meaningful results. We also have a BrowseInfo object which is capable of carrying all the information that will be required by the UI to render these into pages for the user. There are a few major things outstanding:

  • The implementation of "starts with"
  • The restriction to communities and collections
  • Next and Previous page items or values (next can be got with the current code, but previous will require more work)
  • the UI

These 4 items will be attacked in approximately that order now ...

Introducing "starts with"

The "starts_with" parameter is used by the user interface to supply a prefix: the browse should display values that start with the given string. For example:

   starts_with=Jon

should match "Jones, Richard". This also works with dates, where

   year=2006&month=01

should be equivalent to:

   starts_with=2006-01

and will therefore match everything published in January 2006.
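The equivalence between "year"/"month" and "starts_with" can be sketched as a small conversion. The class and method names are hypothetical, and the handling of a missing month (falling back to the year alone) and the zero-padding of single-digit months are assumptions rather than documented behaviour.

```java
/** Hypothetical conversion of "year" and "month" parameters into a "starts_with" prefix. */
final class StartsWithSketch {
    private StartsWithSketch() {
    }

    /** Returns e.g. "2006-01" for ("2006", "01"), "2006" when no month is given, or null without a year. */
    static String fromYearMonth(String year, String month) {
        if (year == null || year.isEmpty()) {
            return null; // nothing to convert
        }
        if (month == null || month.isEmpty()) {
            return year; // assumption: a year on its own is used as the prefix
        }
        int m = Integer.parseInt(month);
        // Zero-pad so the prefix lines up with ISO 8601 dates such as 2006-01-15.
        return year + "-" + (m < 10 ? "0" : "") + m;
    }
}
```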

Converting "year" and "month" into "starts_with" will be done in the BrowseServlet. The impact that it will have on the SQL is to require us to stop using "= 'some value'" and start using "LIKE 'some value%'" when a "starts_with" parameter is available. So the SQL will look like this:

   SELECT * FROM <index>
   WHERE sort_value [[<|>|=] [<value> | <focus-value>] | LIKE <starts_with>% ]
      [AND collection_id = <collection>] 
      [AND community_id = <community>]
   ORDER BY <sortBy> <order>
   LIMIT <resultsperpage + 1>

Note: this utilises the database's regular expression engine, which will have a performance impact. Therefore, we cannot always use LIKE for convenience; we must use it only when there is a "starts_with" parameter.

Important Note: the logic here has been refuted below - it is no longer necessary to worry about the regular expression features

Ongoing Programmer's Notes

  • "starts_with" is mutually exclusive of "focus" and "vfocus"
  • "starts_with" is intrinsically linked to the "sort_by" field, in the same way that "focus" and "vfocus" are
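The mutual exclusivity noted above could be enforced with a check like this one. It is a minimal sketch: the class and method names are hypothetical, absent parameters are represented as null, and what the servlet should do with an invalid combination is left open.

```java
/** Hypothetical validation of the starting-point parameters of a browse request. */
final class StartParamsSketch {
    private StartParamsSketch() {
    }

    /** Returns false when "starts_with" is combined with "focus" or "vfocus". */
    static boolean isValid(Integer focus, String vfocus, String startsWith) {
        boolean hasFocus = focus != null || vfocus != null;
        return !(startsWith != null && hasFocus);
    }
}
```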

Testing the Browse URL (part 6)

   browse?type=dateissued&order=ASC&rpp=10&sort_by=0&starts_with=2006-11
   BrowseInfo String Representation: Browsing 1 to 11 of 27 in index: dateissued(data type: date, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* starting with value: 2006-11||
   Sorting by: dc.date.issued ASC(option 0)||
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}||

Success Metric: type=dateissued, order = date issued ascending, rpp= 10 (+1), starts with "2006-11"; we know that there are 27 items in the database, and they were all entered in November 2006

   browse?type=dateissued&order=DESC&rpp=10&sort_by=0&starts_with=2006-11-30
   BrowseInfo String Representation: Browsing 6 to 10 of 27 in index: dateissued(data type: date, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* starting with value: 2006-11-30||
   Sorting by: dc.date.issued DESC(option 0)||
   {{Item ID: 27 :: [dc.date.issued:2006-11-30T12:56:42Z][dc.title.null:Submit E][dc.contributor.*:Eagle, Eddie]}}
   {{Item ID: 26 :: [dc.date.issued:2006-11-30T12:56:19Z][dc.title.null:Submit D][dc.contributor.*:Decimal, Dewey]}}
   {{Item ID: 25 :: [dc.date.issued:2006-11-30T12:55:58Z][dc.title.null:Submit C][dc.contributor.*:Chaplin, Charlie]}}
   {{Item ID: 24 :: [dc.date.issued:2006-11-30T12:55:33Z][dc.title.null:Submit B][dc.contributor.*:Boothroyd, Betty]}}
   {{Item ID: 23 :: [dc.date.issued:2006-11-30T12:49:29Z][dc.title.null:Submit A][dc.contributor.*:Ardman, Alfred]}}||

Failure Analysis: the perplexing thing about this is that only the results that actually start with the value are supplied. A moment examining the SQL we wrote shows us that we were over-zealous in our logic. There is no need to invoke the regular expression engine; it is enough to supply the "starts_with" parameter in place of the "focus" or "vfocus" in the query, thus:

   SELECT * FROM <index>
   WHERE sort_value [<|>|=] [<value> | <focus-value> | <starts_with>]
     [AND collection_id = <collection>] 
     [AND community_id = <community>]
   ORDER BY <sortBy> <order>
   LIMIT <resultsperpage + 1>

The code update to achieve this now gives us this result:

   BrowseInfo String Representation: Browsing 6 to 16 of 27 in index: dateissued(data type: date, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* starting with value: 2006-11-30||
   Sorting by: dc.date.issued DESC(option 0)||
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}  
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}||

Failure Analysis: this is not as wrong as it looks. It is a product of the way that string comparisons are handled in the database. The query which generated this looks like this:

   SELECT * FROM index_1  WHERE  sort_value <= '2006-11-30'  ORDER BY sort_value  DESC  LIMIT 11

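which will actually only match everything *after* 2006-11-30 in the descending listing, not *including* it: a full timestamp such as '2006-11-30T12:56:42Z' is a longer string than '2006-11-30' and therefore compares as greater, so it fails the <= test. The original Browse code notes this problem as follows: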

               /*
                * When the user is browsing with the most recent items first,
                * the browse code algorithm doesn't quite do what some people
                * might expect. For example, if in the index there are entries:
                * 
                * Mar-2000 15-Feb-2000 6-Feb-2000 15-Jan-2000
                * 
                * and the user has selected "Feb 2000" as the start point for
                * the browse, the browse algorithm will start at the first
                * point in that index *after* "Feb 2000". "Feb 2000" would
                * appear in the index above between 6-Feb-2000 and 15-Jan-2000.
                * So, the browse code in this case will start the browse at
                * "15-Jan-2000". This isn't really what users are likely to
                * want: They're more likely to want the browse to start at the
                * first Feb 2000 date, i.e. 15-Feb-2000. A similar scenario
                * occurs when the user enters just a year. Our quick hack to
                * produce this behaviour is to add "-32" to the startsWith
                * variable, when sorting with most recent items first. This
                * means the browse code starts at the topmost item in the index
                * that matches the user's input, rather than the point in the
                * index where the user's input would appear.
                */

We will adopt the same approach for the new browse code. This means that we must abandon this particular test (see note below).

Note: there is an implied limit to the functionality of the Browse Engine here. All string comparison problems for dates are overcome by appending "-32" to the end of the string. This works perfectly well for years without months and months without days (because there are fewer than 32 of both), but does not work at the day level: if starts_with=2006-11-30, then the query will be on a date before "2006-11-30-32", which does not compare successfully against a date of the form "2006-11-30T17:17:24Z".
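The "-32" adjustment and its day-level limit can be sketched as follows. This is only an illustration, not the browse code itself: the class and method names are hypothetical, applying the adjustment only to date indexes is an assumption, and Java's String.compareTo (code unit order) stands in for whatever collation the database actually uses.

```java
class StartsWithSketch {
    /**
     * Hypothetical helper: returns the bound to compare sort_value against.
     * Assumes the "-32" suffix is only wanted for date indexes sorted
     * with the most recent items first (order=DESC).
     */
    static String adjustStartsWith(String startsWith, boolean isDateIndex, boolean ascending) {
        if (startsWith == null || !isDateIndex || ascending) {
            return startsWith;
        }
        // "2006-11" becomes "2006-11-32", which sorts after every day in that month
        return startsWith + "-32";
    }

    public static void main(String[] args) {
        String stamp = "2006-11-30T12:56:42Z";
        String monthBound = adjustStartsWith("2006-11", true, false);
        String dayBound = adjustStartsWith("2006-11-30", true, false);
        // Month level: the timestamp falls at or below the bound, so it is matched
        System.out.println(stamp.compareTo(monthBound) <= 0);
        // Day level: 'T' compares greater than '-', so the timestamp is still excluded
        System.out.println(stamp.compareTo(dayBound) <= 0);
    }
}
```

The first line printed is true and the second is false, which is the day-level limitation in miniature.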

   browse?type=author&order=ASC&rpp=10&sort_by=0&starts_with=Jon
   BrowseInfo String Representation: Browsing 6 to 6 of 6 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.* starting with value: Jon||
   Sorting by: dc.contributor.* ASC(option 0)||
   { {Value: Jones, Richard}}||

Success Metric: type=author, sort by = author ascending (otherwise, Jones, Richard wouldn't be last). Correct range and everything.

   browse?type=title&order=DESC&rpp=10&sort_by=0&starts_with=submit+2
   BrowseInfo String Representation: Browsing 16 to 26 of 27 in index: title(data type: title, display type: full)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* starting with value: submit 2||
   Sorting by: dc.title DESC(option 0)||
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 15 :: [dc.date.issued:2006-11-16T17:13:52Z][dc.title.null:submit 15][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 14 :: [dc.date.issued:2006-11-16T17:13:31Z][dc.title.null:submit 14][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 13 :: [dc.date.issued:2006-11-16T17:13:09Z][dc.title.null:submit 13][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 12 :: [dc.date.issued:2006-11-16T17:12:45Z][dc.title.null:submit 12][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}||

Success Metric: type=title, sort by = title descending, starting with the last submit+2 value when descending

Note: using "starts_with" with an order=DESC is an odd thing to do, and produces the above technically correct, but slightly misleading result. Worth mentioning.

   browse?type=subject&order=ASC&rpp=10&sort_by=0&starts_with=a
   BrowseInfo String Representation: Browsing 1 to 2 of 2 in index: subject(data type: text, display type: single)||
   Listing single column: dc.subject.* starting with value: a||
   Sorting by: dc.subject.* ASC(option 0)||
   { {Value: asdf}}
   { {Value: fdsa}}||

Success Metric: type=subject, sort by = subject ascending, starting with a

   browse?type=author&order=ASC&rpp=10&sort_by=1&starts_with=submit+2&value=Jones%2C+Richard
   BrowseInfo String Representation: Browsing 2 to 12 of 22 in index: author(data type: text, display type: single)||
   Listing single column: dc.contributor.* starting with value: submit 2||
   Sorting by: dc.title ASC(option 1)||
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 21 :: [dc.date.issued:2006-11-16T17:16:02Z][dc.title.null:submit 21][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 22 :: [dc.date.issued:2006-11-16T17:17:24Z][dc.title.null:submit 22][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}||

Success Metric: type=author value = "jones, richard", sort by = title, starting with submit 2, and going up

TODO: the output string says that it is starting with value "submit 2". This is true but misleading; it would be better if the value was reported as "Jones, Richard" and the starts_with reported as "starting with focus"

[edit] Testing Browse URL Summary (part 2)

A summary of the URL parameters tested above:

   test | mode         | type       | order | value | focus | vfocus | rpp | sort_by | starts_with
   13   | Full/Item    | dateissued | ASC   | -     | -     | -      | 10  | 0       | 2006-11
   14   | Single/Value | author     | ASC   | -     | -     | -      | 10  | 0       | Jon
   15   | Full/Item    | title      | DESC  | -     | -     | -      | 10  | 0       | submit 2
   16   | Single/Value | subject    | ASC   | -     | -     | -      | 10  | 0       | a
   17   | Value        | author     | ASC   | -     | -     | -      | 10  | 1       | submit 2

[edit] Introducing restriction to Community or Collection

Community and Collection data can be obtained from the URL, when the browse URL is of the form:

   handle/123456789/4321/browse?....

Where 123456789/4321 is the handle of the community or collection to be browsed in.

When we are inside a community or collection the browse must be done on one of the views created on the data which lists browse results by collection id. This means that we can construct SQL queries of the form:

   SELECT * FROM [community|collection]_<index>
   WHERE [collection_id|community_id] = [<collection>|<community>]
       AND sort_value [<|>|=] [<value> | <focus-value> | <starts_with>]
   ORDER BY <sortBy> <order>
   LIMIT <rpp> + 1

To achieve this we need to place the community or collection object into the BrowseScope to go into the BrowseEngine. The engine can then construct the query in the same way for both item and value browses, for example:

   if (scope.isCollection())
   {
       table = browseIndex.getTableName(false, true);
   }
   else if (scope.isCommunity())
   {
        ....

This obtains the table name; the relevant segment of the WHERE clause is constructed similarly. The community or collection then needs to be passed back into the BrowseInfo object so that it can report on the scope of the browse.
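As a rough sketch of how the scope might drive both the table name and the WHERE clause, consider the following. None of this is the real BrowseEngine code: the class, enum and method names are hypothetical, and the table naming simply follows the [community|collection]_<index> pattern given above. Values are left as ? placeholders for a prepared statement.

```java
class ScopedQuerySketch {
    /** Hypothetical stand-in for the container held in the BrowseScope. */
    enum Scope { NONE, COLLECTION, COMMUNITY }

    /**
     * Assembles the scoped browse query. The operator and order are assumed
     * to have been validated by the caller (e.g. ">=" and "ASC").
     */
    static String buildQuery(String index, Scope scope, String operator, String order, int rpp) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ");
        switch (scope) {
            case COLLECTION:
                sql.append("collection_").append(index).append(" WHERE collection_id = ? AND ");
                break;
            case COMMUNITY:
                sql.append("community_").append(index).append(" WHERE community_id = ? AND ");
                break;
            default:
                // No container: browse the plain index table
                sql.append(index).append(" WHERE ");
        }
        sql.append("sort_value ").append(operator).append(" ?");
        sql.append(" ORDER BY sort_value ").append(order);
        // One extra row is fetched so that the engine can tell whether a next page exists
        sql.append(" LIMIT ").append(rpp + 1);
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("index_3", Scope.COMMUNITY, ">=", "ASC", 10));
    }
}
```

Because the scope is a single value here, a browse inside a collection never picks up a community constraint as well.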

[edit] Ongoing Programmer's Notes

  • At home time, you had implemented the Servlet end of things for taking the collection or community, and added them to the BrowseScope object, with the relevant accessors. Next, implement in the BrowseEngine.
  • Looks like I've been excessive with my application of the constraints to community and collection. Probably this is just a missing check in the BrowseEngine, no biggie:
   The container must be a community or a collection
   =============================================
   org.dspace.browse.BrowseException: The container must be a community or a collection
     at org.dspace.browse.BrowseInfo.setBrowseContainer(BrowseInfo.java:167)
     at org.dspace.browse.BrowseEngine.browseByItem(BrowseEngine.java:242)
     at org.dspace.browse.BrowseEngine.browse(BrowseEngine.java:448)
  • Aside from the fact that the BrowseInfo reports browsing NOT in a community or collection as being in an Invalid Container, the very first primitive tests suggest that the application of the constraint code has not immediately broken anything else. Always a good start!
  • The very first tentative test of the constraint code indicates that while the code is working, it appears to have missed the 5 items submitted later on. Since they should be in the same collection, there must be some sort of problem ... investigating.
    • Actually, there appears to be a problem with the constraining process, so that constraining to collection does not take effect
      • The reason for this is that the bit of code which tells which container you are in is clever enough to lift out the community /and/ the collection if you are in a collection. I'm not sure what happens when you are in a stack of communities. Perhaps I need to modify the code slightly to eliminate this danger.
        • This all appears to happen somewhere high up the stack, possibly in the DSpaceServlet (I've decided not to track it any further). Anyway, the logic just needs to figure out that if we are in a collection then it doesn't need to bother with the community.
    • With the above logic implemented, the collection browse functions apparently correctly (tests still to be fully carried out). Unfortunately, it still doesn't appear to work correctly for the community
      • The table community_index_3 is a view on index_3 with additional community information. This table really doesn't contain the values that we want to see in the browse, so the problem is most likely in the indexing process itself (or, more likely still, in the community2item table)
      • The problem is that there is one table called "communities2item" which contains only 22 of the 27 current records, and another called "community2item" which contains all 27 records and is a view on the data. The question is: why are there two so similarly named tables, which one should we really be using, and why do they contain (slightly) different data?
        • The "communities2item" table was used by the previous browse code, and was not being updated by the new indexer. I have decided to stick with the existing view "community2item" instead. With that change to the community_index_<n> views, everything appears to be selected. Next, on to test the new functionality ...
  • It looks as though the Indexer might not be dropping old tables and views. Not quite sure why yet - one problem at a time.

[edit] Testing the Browse URL (part 7)

In order to successfully test the constraining code it is necessary to do the following things first:

  • Create more top level communities
  • Create some second level communities
  • Create more collections at the second and third levels
  • Add more items to the system ensuring there are results for all collections
  • Map items into more than one collection.
   handle/123456789/2/browse?type=dateissued&order=ASC&rpp=10&sort_by=0
   BrowseInfo String Representation: Browsing 1 to 11 of 40 in index: dateissued (data type: date, display type: full) ||
   Browsing in collection: 1 (123456789/2)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* on value: null||
   Sorting by: dc.date.issued ASC(option 0)||
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}||

Failure Analysis: Although the result set appears to be correct, there are not 40 elements in this collection (which is the correct collection). This is evidently a missing statement in the WHERE clause of the count mechanism.

   BrowseInfo String Representation: Browsing 1 to 11 of 27 in index: dateissued (data type: date, display type: full) ||
   Browsing in collection: 1 (123456789/2)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* on value: null||
   Sorting by: dc.date.issued ASC(option 0)||
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 3 :: [dc.date.issued:2006-11-16T17:09:05Z][dc.title.null:submit 3][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 5 :: [dc.date.issued:2006-11-16T17:09:52Z][dc.title.null:submit 5][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 6 :: [dc.date.issued:2006-11-16T17:10:18Z][dc.title.null:submit 6][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 7 :: [dc.date.issued:2006-11-16T17:10:43Z][dc.title.null:submit 7][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 8 :: [dc.date.issued:2006-11-16T17:11:08Z][dc.title.null:submit 8][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 10 :: [dc.date.issued:2006-11-16T17:11:56Z][dc.title.null:submit 10][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 11 :: [dc.date.issued:2006-11-16T17:12:19Z][dc.title.null:submit 11][dc.contributor.*:Jones, Richard]}}||

Success Metric: right number of results, range, total, right collection, ordered by date ascending

   handle/123456789/32/browse?type=title&order=DESC&rpp=5&sort_by=0
   BrowseInfo String Representation: Browsing 1 to 3 of 3 in index: title (data type: title, display type: full) ||
   Browsing in collection: 2 (123456789/32)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* on value: null||
   Sorting by: dc.title DESC(option 0)||
   {{Item ID: 29 :: [dc.date.issued:2006-12-04T11:55:38Z][dc.title.null:Submit G][dc.contributor.*:Garrison, Gertrude]}}
   {{Item ID: 28 :: [dc.date.issued:2006-12-04T11:55:08Z][dc.title.null:Submit F][dc.contributor.*:Frankfurt, Freddie]}}
   {{Item ID: 1 :: [dc.date.issued:2006-11-16T17:08:11Z][dc.title.null:Submit 1][dc.contributor.*:Jones, Richard]}}||

Success Metric: correct range and result count, sorted by title descending, limited to correct collection

   handle/123456789/34/browse?type=author&order=ASC&rpp=5&sort_by=0
   BrowseInfo String Representation: Browsing 1 to 3 of 3 in index: author (data type: text, display type: single) ||
   Browsing in collection: 3 (123456789/34)||
   Listing single column: dc.contributor.* on value: null||
   Sorting by: dc.contributor.* ASC(option 0)||
   { {Value: Harrison, Harry}}
   { {Value: Ianson, Irene}}
   { {Value: Jones, Richard}}||

Success Metric: correct range and result count and results, sorted by contributor ascending, limited to correct collection

   handle/123456789/38/browse?type=subject&order=DESC&rpp=5&sort_by=0

Success Metric: no results, as expected (no items in this collection have subjects)

   handle/123456789/1/browse?type=dateissued&order=DESC&rpp=5&sort_by=1&focus=20
   BrowseInfo String Representation: Browsing 15 to 20 of 27 in index: dateissued (data type: date, display type: full) ||
   Browsing in community: 1 (123456789/1)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* on value: submit 20||
   Sorting by: dc.title DESC(option 1)||
   {{Item ID: 20 :: [dc.date.issued:2006-11-16T17:15:42Z][dc.title.null:submit 20][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 2 :: [dc.date.issued:2006-11-16T17:08:42Z][dc.title.null:submit 2][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 19 :: [dc.date.issued:2006-11-16T17:15:19Z][dc.title.null:submit 19][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 18 :: [dc.date.issued:2006-11-16T17:14:58Z][dc.title.null:submit 18][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 17 :: [dc.date.issued:2006-11-16T17:14:37Z][dc.title.null:submit 17][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 16 :: [dc.date.issued:2006-11-16T17:14:16Z][dc.title.null:submit 16][dc.contributor.*:Jones, Richard]}}||

Success Metric: correct range and result count, starting from item id 20, sorted by title descending, limited to correct community

NOTE: during testing it became clear that the browse URL needs to be validated not only for structure but also for parameter values: sort_by, for example, is sensitive to the value passed. We therefore need to build in value validation as well.
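A minimal sketch of what value validation for sort_by could look like is given below. The class and method names are hypothetical, and it assumes that option 0 means "sort by the index's own field" and that options 1 to N correspond to the configured webui.browse.sort-option entries, which is how the test output above appears to behave.

```java
class SortByValidationSketch {
    /**
     * Hypothetical validator: parses the raw sort_by URL parameter and
     * checks it against the number of configured sort options.
     * A missing parameter falls back to option 0 (an assumption).
     */
    static int validateSortBy(String raw, int configuredOptions) {
        if (raw == null || raw.trim().isEmpty()) {
            return 0;
        }
        int option;
        try {
            option = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("sort_by must be an integer: " + raw, e);
        }
        if (option < 0 || option > configuredOptions) {
            throw new IllegalArgumentException(
                "sort_by must be between 0 and " + configuredOptions + ": " + raw);
        }
        return option;
    }

    public static void main(String[] args) {
        System.out.println(validateSortBy("2", 2));
        try {
            validateSortBy("7", 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```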

   handle/123456789/33/browse?type=title&order=ASC&rpp=5&sort_by=2&focus=30
   BrowseInfo String Representation: Browsing 3 to 6 of 6 in index: title (data type: title, display type: full) ||
   Browsing in community: 3 (123456789/33)||
   Listing over 3 columns: dc.date.issued,dc.title.null,dc.contributor.* on value: 2006-12-04t11:56:30z||
   Sorting by: dc.date.issued ASC(option 2)||
   {{Item ID: 30 :: [dc.date.issued:2006-12-04T11:56:30Z][dc.title.null:Submit H][dc.contributor.*:Harrison, Harry]}}
   {{Item ID: 31 :: [dc.date.issued:2006-12-04T11:57:01Z][dc.title.null:Submit I][dc.contributor.*:Ianson, Irene]}}
   {{Item ID: 32 :: [dc.date.issued:2006-12-04T11:57:30Z][dc.title.null:Submit J][dc.contributor.*:Johnson, James]}}
   {{Item ID: 33 :: [dc.date.issued:2006-12-04T11:57:50Z][dc.title.null:Submit K][dc.contributor.*:Karlson, Karl]}}||

Success Metric: correct range and result count, starting from item id 30, sorted by date issued ascending, limited to correct community

   handle/123456789/33/browse?type=author&order=DESC&value=Jones%2C+Richard&vfocus=submit+2&rpp=5&sort_by=2
   BrowseInfo String Representation: Browsing 1 to 2 of 2 in index: author (data type: text, display type: single) ||
   Browsing in community: 3 (123456789/33)||
   Listing single column: dc.contributor.* on value: submit 2||
   Sorting by: dc.date.issued DESC(option 2)||
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}||

Success Metric: correct range and result count, displaying only author "Jones, Richard", focussing on value submit 2 (this doesn't exist in the list, so the result set is correct), sorted by date issued descending, limited to correct community. It is interesting to note that these are both "mapped" items.

   handle/123456789/33/browse?type=subject&order=ASC&rpp=5&value=asdf&sort_by=0
   BrowseInfo String Representation: Browsing 1 to 2 of 2 in index: subject (data type: text, display type: single) ||
   Browsing in community: 3 (123456789/33)||
   Listing single column: dc.subject.* on value: null||
   Sorting by: dc.subject.* ASC(option 0)||
   {{Item ID: 4 :: [dc.date.issued:2006-11-16T17:09:26Z][dc.title.null:submit 4][dc.contributor.*:Jones, Richard]}}
   {{Item ID: 9 :: [dc.date.issued:2006-11-16T17:11:30Z][dc.title.null:submit 9][dc.contributor.*:Jones, Richard]}}||

Success Metric: correct range and result set, displaying only items (mapped in) with the subject asdf (although for some reason the BrowseInfo is not reporting on the value), sorted by subject ascending (doesn't really make much sense)

[edit] Testing Browse URL Summary (part 3)

A summary of the URL parameters tested above:

   test | mode         | type       | order | value          | focus | vfocus   | rpp | sort_by | starts_with | collection | community
   18   | Full/Item    | dateissued | ASC   | N/A            | -     | -        | 10  | 0       | -           | /2         | N/A
   19   | Full/Item    | title      | DESC  | N/A            | -     | -        | 5   | 0       | -           | /32        | N/A
   20   | Single/Value | author     | ASC   | -              | -     | -        | 5   | 0       | -           | /34        | N/A
   21   | Single/Value | subject    | DESC  | -              | -     | -        | 5   | 0       | -           | /38        | N/A
   22   | Full/Item    | dateissued | DESC  | N/A            | 20    | -        | 5   | 1       | -           | N/A        | /1
   23   | Full/Item    | title      | ASC   | N/A            | 30    | -        | 5   | 2       | -           | N/A        | /33
   24   | Value        | author     | DESC  | Jones, Richard | -     | submit 2 | 5   | 2       | -           | N/A        | /37
   25   | Value        | subject    | ASC   | asdf           | -     | -        | 5   | 0       | -           | N/A        | /31

[edit] Progress Update: 04-12-2006, 13:15 GMT

With the above coding and testing complete, our Browse URL API is complete and correct as far as we can tell and test. We must now turn our attention to the next and previous links that the UI will need to render. It also means that we are tantalisingly close to a workable browse system.

There are some other things that we might want to consider after this:

  • Browse Caching
  • More indexes on the browse tables, depending on performance

[edit] Introducing a Next Button

Introducing the next button ought to be easy, as we have already made provisions for it. We will simply strip the last result off the result set and set that as the target of the next button.
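The idea can be sketched like this, assuming the query has fetched up to rpp + 1 rows. The class and method names are hypothetical, and plain strings stand in for the real result objects. Note that the extra row is only stripped when it is actually present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NextPageSketch {
    /** Hypothetical holder for one page of results plus the next button's target. */
    static final class Page {
        final List<String> results;
        final String nextFocus; // null when there is no next page

        Page(List<String> results, String nextFocus) {
            this.results = results;
            this.nextFocus = nextFocus;
        }
    }

    /** Splits the rows fetched with LIMIT rpp + 1 into the visible page and the next target. */
    static Page trim(List<String> fetched, int rpp) {
        if (fetched.size() > rpp) {
            // The extra row is not displayed; it becomes the focus of the next page
            return new Page(new ArrayList<>(fetched.subList(0, rpp)), fetched.get(rpp));
        }
        // Last page: nothing to strip and no next button
        return new Page(new ArrayList<>(fetched), null);
    }

    public static void main(String[] args) {
        Page page = trim(Arrays.asList("submit 1", "submit 2", "submit 3"), 2);
        System.out.println(page.results + " next=" + page.nextFocus);
    }
}
```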

[edit] Introducing a Previous Button

The previous button is going to be slightly more complicated, but basically we have to do the same query as the main SELECT, but *in reverse*, so that we can get the value which will be top of the previous page. It ought to be possible to simply flip the comparison operators, and perhaps attach an OFFSET clause to the query, so as to only refer to a single value result.
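One way this reversed query might be generated is sketched below. It is an assumption-laden illustration rather than the final code: the names are hypothetical, "flipping" is interpreted as negating the comparison (so >= becomes <), and the OFFSET of rpp - 1 with LIMIT 1 is one guess at how to land on the single value that would sit at the top of the previous page.

```java
class PreviousPageSketch {
    /** Negates a comparison operator so that the reversed query excludes the current page. */
    static String flipOperator(String operator) {
        switch (operator) {
            case ">=": return "<";
            case ">":  return "<=";
            case "<=": return ">";
            case "<":  return ">=";
            default:
                throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
    }

    /** Reverses the sort direction. */
    static String flipOrder(String order) {
        return "ASC".equalsIgnoreCase(order) ? "DESC" : "ASC";
    }

    /**
     * Builds a query returning the sort value at the top of the previous page.
     * The operator and order passed in are those of the main SELECT.
     */
    static String previousQuery(String table, String operator, String order, int rpp) {
        return "SELECT sort_value FROM " + table
            + " WHERE sort_value " + flipOperator(operator) + " ?"
            + " ORDER BY sort_value " + flipOrder(order)
            // Walk back rpp rows from the current focus and keep only that one row
            + " LIMIT 1 OFFSET " + (rpp - 1);
    }

    public static void main(String[] args) {
        System.out.println(previousQuery("index_1", ">=", "ASC", 10));
    }
}
```

If fewer than rpp rows precede the current focus this query returns nothing, in which case the caller would presumably need to fall back to the start of the index.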

[edit] Ongoing Programmer's Notes

  • The first pass on the next page stuff looks promising. The only obvious thing it is doing wrong is *always* stripping the last value off the results, even if it isn't supposed to.
    • OK, that minor problem is fixed.
  • It looks like it will be quite straightforward in concept to get the "previous" value. Nonetheless, the code might benefit from some refactoring before we try it, as there is a lot of reuse which is not taken advantage of yet.
  • The refactoring has taken the shape of creating a new class to represent the browse SQL query. This class is populated by the relevant values, and then assembles a query from those values on request (as opposed to the un-refactored version, which assembled the query as it went along). This will be useful for obtaining the "previous" value, because we can simply flip the ordering of the query via the API, and regenerate it. Otherwise it means reassembling the inline built query with only one section changed, which would be the wrong thing to do. About to test the new code, to be sure that we haven't broken anything
  • During testing of the refactored code, a problem has turned up with the BrowseEngine.getPosition method. It looks as though this problem has always been there, and has simply gone unnoticed until this point. The problem is that the SQL which determines the current position of the start of the browse doesn't appear to return the true results for specific value browses. It is performing a SELECT DISTINCT where it shouldn't be. Investigating ...

[edit] Obtaining the current position

A flaw has been found in the code that determines the current position of the first item to be displayed. This arises because the code which generates the query does not take into account the value being browsed on, and does not correctly negate the SQL query to obtain the relevant position. For example, the following SQL generates a valid result set:

   SELECT * FROM index_2  WHERE  sort_1 <= 'submit 13'  AND  sort_value = 'jones, richard'  ORDER BY  sort_1 DESC  LIMIT 6

The query to obtain the position of the start pointer generated is as follows:

   SELECT COUNT(DISTINCT(value)) AS number FROM index_2  WHERE sort_1 > 'submit 13'

This is incorrect. Instead the query ought to read:

   SELECT COUNT(*) FROM index_2  WHERE  sort_1 > 'submit 13'  AND  sort_value = 'jones, richard';

Note that although the direction of the comparator in the original query is correct, no sort_value was specified, and the query also performs a SELECT DISTINCT without sufficient cause. Some modification to the browse engine will fix this reasonably quickly.

  • getPosition only needs to select distinct when in a value based top-level browse
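The rule above might translate into query generation along these lines. This is a sketch under assumptions, not the actual getPosition implementation: the class, method and parameter names are hypothetical, and values are shown as ? placeholders.

```java
class PositionQuerySketch {
    /**
     * Builds the query that counts how many rows precede the start of the page.
     *
     * @param distinctValues true only for a value-based top-level browse
     * @param hasValue       true when browsing the items of one specific value
     */
    static String positionQuery(String table, String sortColumn, String operator,
                                boolean distinctValues, boolean hasValue) {
        StringBuilder sql = new StringBuilder("SELECT COUNT(");
        sql.append(distinctValues ? "DISTINCT(value)" : "*");
        sql.append(") AS number FROM ").append(table);
        sql.append(" WHERE ").append(sortColumn).append(' ').append(operator).append(" ?");
        if (hasValue) {
            // Restrict the count to the value being browsed, as the main query does
            sql.append(" AND sort_value = ?");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Items by one author, sorted by title: plain COUNT(*) with the value constraint
        System.out.println(positionQuery("index_2", "sort_1", ">", false, true));
        // Top-level list of authors: distinct values, no value constraint
        System.out.println(positionQuery("index_2", "sort_value", "<", true, false));
    }
}
```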

NOTE: There is a problem (around line 198) where value and focus are being conflated for the UI - this is why there is a problem displaying the value of a browse

  • This problem now appears to be fixed. It would certainly benefit from being refactored into the BrowseQuery class, which will make the whole engine a lot easier to look at. Meanwhile, back to testing the refactoring ...


[edit] Ongoing programmer notes

  • arg. Now there is a problem with "starts_with" not being included in the query. Hopefully this is just an oversight in the refactoring.
    • Yup, this was just a typo that occurred during the refactoring
  • There's still a misunderstanding between value and focus in the BrowseInfo object. This needs to be looked at now in case it becomes problematic later.

NOTE: we really need to do something at some point about what happens when there are no results. It's not causing any actual problems, so not yet.

[edit] Progress Update: 04-12-2006, 17:30 GMT

The code has been refactored to include a class to manage just the SQL. It is my hope that this class can handle construction of all the SQL required by the browse engine, and therefore will be made into a pluggable class which will allow a similar class for Oracle support to be created at a later date. This brief round of refactoring was to support the facility to turn the SQL query around easily so that we can obtain the top value of the previous page quickly and easily.
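To illustrate the shape of such a class, here is a much reduced sketch. It is not the real query class: the name BrowseQuerySketch, its fields and its methods are all hypothetical. The point is only that the object holds the query's parameters and assembles the SQL on request, so turning the query around is a single state change followed by regeneration.

```java
class BrowseQuerySketch {
    private final String table;
    private final String sortColumn;
    private final int resultsPerPage;
    private boolean ascending = true;
    private boolean inclusive = true;

    BrowseQuerySketch(String table, String sortColumn, int resultsPerPage) {
        this.table = table;
        this.sortColumn = sortColumn;
        this.resultsPerPage = resultsPerPage;
    }

    /** Turns the query around: opposite direction, and the focus row itself is excluded. */
    void reverse() {
        ascending = !ascending;
        inclusive = !inclusive;
    }

    /** Assembles the SQL from the current state; nothing is built until this is called. */
    String toSql() {
        String operator = (ascending ? ">" : "<") + (inclusive ? "=" : "");
        return "SELECT * FROM " + table
            + " WHERE " + sortColumn + " " + operator + " ?"
            + " ORDER BY " + sortColumn + (ascending ? " ASC" : " DESC")
            + " LIMIT " + (resultsPerPage + 1);
    }

    public static void main(String[] args) {
        BrowseQuerySketch query = new BrowseQuerySketch("index_1", "sort_value", 10);
        System.out.println(query.toSql());
        query.reverse();
        System.out.println(query.toSql());
    }
}
```

A database-specific variant (for Oracle, say) would then only need to supply a different toSql.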

Meanwhile, another thought occurs for the todo list:

  • Proper internationalisation. The browse code doesn't support sorting by non-Latin characters, but provided this functionality can be pushed to the database layer (*fingers crossed*), a plugin class which is loaded and applied as the normaliser (in place of the current NormalizeTitle class) would enable a stack of plugins that know how to normalise sorting for multiple languages.

[