Collection Notes

Introduction

In order to give myself a sporting chance of finding things, and to satisfy my latent tendency to categorize and analyze data to excess, I offer the following suggestions and rules for cataloging collections of things.

Collation and Alphabetization

The advent of computer technology (offering simple means to sort bazillions of items) and the collective human tendency towards laziness and sloppiness are in direct conflict. Collation is the implementation of a sorting algorithm for organizational purposes.

For collation of a collection (by alphabetization or any other sorting mechanism) to be effective, the sort key must be properly standardized and that standard must be adhered to with rigor!
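A minimal Python sketch of why a standardized sort key matters, assuming a hypothetical sort_key function that does nothing more than fold case and collapse whitespace. Sorting raw strings by code point splits entries that differ only in capitalization, while sorting on the normalized key keeps them together.

```python
def sort_key(funder: str) -> str:
    """Normalize a 'File Under' (funder) entry into a comparison key.

    Hypothetical rules for illustration: fold case and collapse runs of
    whitespace, so trivial typing differences do not change the order.
    """
    return " ".join(funder.split()).casefold()


def collate(funders: list[str]) -> list[str]:
    # sorted() is stable: entries with identical keys keep their input order.
    return sorted(funders, key=sort_key)


if __name__ == "__main__":
    entries = ["beatles, the", "Coltrane, John", "Beatles,  the"]
    # Raw code-point order puts every uppercase initial before any lowercase one.
    print(sorted(entries))   # ['Beatles,  the', 'Coltrane, John', 'beatles, the']
    print(collate(entries))  # ['beatles, the', 'Beatles,  the', 'Coltrane, John']
```

The normalization rules here are placeholders; the point is that whatever the standard is, one function defines it and every sort goes through that function.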

Unfortunately, any attempt to adhere to this lofty goal of standardization will quickly stumble in the face of the blizzard of exceptions offered by the marketplace.

Sample problems

To illustrate the problem, consider a few examples from the audio kingdom, where I sort my media on a 'File Under' field (aka 'funder'). Here are some problems:

Sample solutions

Groups may be listed either as distinct entities, e.g., King Crimson, or under a primary performer's name. The presumption is that group names take precedence over performer names, but groups named after their performers will be filed by performer name in order to collect works by the same performers together. This approach causes problems with some group names (the Billy Tipton Memorial Saxophone Quartet → Billy Tipton Memorial Saxophone Quartet, the; NOT Tipton, Billy [Billy Tipton Memorial Saxophone Quartet]), but that's just how it goes. As shown in the preceding example, the group name should be given as a bracketed suffix to the alphabetized performer name when the performer name is used as the funder key. For example,
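The two filing forms described in the paragraph above could be sketched as a pair of key builders. The function names, the English-only list of leading articles, and the Dave Brubeck Quartet sample are illustrative assumptions. Deciding which form applies to a given group (the judgment call behind the Billy Tipton example) is deliberately left to the cataloger rather than automated.

```python
from typing import Optional

# Leading articles moved to the end of a group name (an assumed, English-only list).
ARTICLES = ("the ", "a ", "an ")


def group_funder(group: str) -> str:
    """File a group under its own name, moving a leading article to the end.

    'the Billy Tipton Memorial Saxophone Quartet'
        -> 'Billy Tipton Memorial Saxophone Quartet, the'
    """
    lowered = group.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            # Keep the article's original capitalization, minus the trailing space.
            return f"{group[len(article):]}, {group[:len(article) - 1]}"
    return group


def performer_funder(given: str, surname: str, group: Optional[str] = None) -> str:
    """File under the inverted performer name, with the group as a bracketed suffix."""
    key = f"{surname}, {given}"
    return f"{key} [{group}]" if group else key


if __name__ == "__main__":
    print(group_funder("the Billy Tipton Memorial Saxophone Quartet"))
    print(group_funder("King Crimson"))
    # Illustrative group named after its leader, filed under the performer.
    print(performer_funder("Dave", "Brubeck", "Dave Brubeck Quartet"))
```

Keeping the two builders separate mirrors the rule in the text: the group name is the default key, and the performer form is an explicit, deliberate override.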

Note that alphabetization should be independent of accents and other incidental marks such as punctuation, but enforcing that is difficult and requires careful attention when creating the sortname entries! In other words, despite the accented i in the second name below, we should expect our algorithm to give the following order:
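One common way to make the comparison accent-insensitive is to derive the sortname automatically from the display name: decompose it with Unicode NFKD normalization, drop the combining marks, strip punctuation, and fold case. This is a sketch under those assumptions, using hypothetical names, and not a full implementation of the Unicode Collation Algorithm. For instance, letters with no decomposition, such as ø, pass through unchanged.

```python
import string
import unicodedata

# Translation table that deletes ASCII punctuation (apostrophes, periods, commas...).
_DROP_PUNCT = str.maketrans("", "", string.punctuation)


def sortname(name: str) -> str:
    """Derive an accent- and punctuation-insensitive sort key from a display name."""
    # NFKD splits a character such as 'í' into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop nonspacing combining marks (category 'Mn'), so 'í' compares as 'i'.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Remove incidental punctuation, collapse whitespace, and fold case.
    return " ".join(base.translate(_DROP_PUNCT).split()).casefold()


if __name__ == "__main__":
    names = ["Rivera, Ana", "Ríos, Carlos", "Rizo, Marco"]  # hypothetical names
    print(sorted(names))                # raw order puts 'Ríos' last
    print(sorted(names, key=sortname))  # ['Ríos, Carlos', 'Rivera, Ana', 'Rizo, Marco']
```

Generating the key rather than typing it by hand removes one source of inconsistency, though hand-entered sortname overrides may still be needed for the exceptions.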

The last instance in each of the two preceding examples also points out that nicknames are generally NOT sortnames, and ought to be delimited by single quotes rather than by any other mark. Unfortunately, given how commonly performers identify themselves by nickname rather than by formal birth name, any attempt to hold too rigidly to this model is doomed to fail. Best practice is to provide enough detail to turn up a given artist no matter how the name is specified. John Birks 'Dizzy' Gillespie ought to be easy to find, right? One way to solve this problem is to implement the following, somewhat cumbersome, multi-field naming scheme:
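Separately from any particular multi-field scheme, the goal of finding an artist however the name is given can be sketched in Python. This hypothetical record keeps formal names and the nickname in separate fields, alphabetizes on the formal name only, and matches a search against several generated name variants. The class, field names, and choice of variants are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArtistName:
    """Hypothetical multi-field artist record, for illustration only."""

    surname: str
    given: str                  # formal given names, e.g. "John Birks"
    nickname: str = ""          # stored without the surrounding single quotes
    aliases: list[str] = field(default_factory=list)  # other known forms

    def sortname(self) -> str:
        # Alphabetize on the formal name; the nickname never drives the sort.
        return f"{self.surname}, {self.given}"

    def display(self) -> str:
        # Nickname delimited by single quotes, as the text recommends.
        nick = f" '{self.nickname}'" if self.nickname else ""
        return f"{self.given}{nick} {self.surname}"

    def variants(self) -> set[str]:
        # Every form a searcher might plausibly type, case-folded for matching.
        forms = {f"{self.given} {self.surname}", self.sortname(), self.display(), *self.aliases}
        if self.nickname:
            forms |= {f"{self.nickname} {self.surname}", f"{self.surname}, {self.nickname}"}
        return {form.casefold() for form in forms}


def find(catalog: list[ArtistName], query: str) -> list[ArtistName]:
    """Return every artist whose name variants include the (normalized) query."""
    q = " ".join(query.split()).casefold()
    return [artist for artist in catalog if q in artist.variants()]


if __name__ == "__main__":
    dizzy = ArtistName(surname="Gillespie", given="John Birks", nickname="Dizzy")
    print(dizzy.display())   # John Birks 'Dizzy' Gillespie
    print(dizzy.sortname())  # Gillespie, John Birks
    for query in ("Dizzy Gillespie", "gillespie,  DIZZY", "John Birks Gillespie"):
        print(query, "->", [a.display() for a in find([dizzy], query)])
```

The aliases field is the escape hatch for performers known mainly by a stage name that none of the generated forms would produce.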