Developing a useful book index is a tedious process. With some exceptions, it is not sufficient to merely list the key words and phrases in a book; rather, for an index to be useful those words and phrases must be organized within their context and their relationship to each other.

For example, consider the index we created for a 700-page philosophy text that drew on a particular relationship between “facts, objects, and processes.” It clearly will not work to simply list all the pages on which the words “facts,” “objects,” and “processes” appear; the list for each would be extremely long and essentially useless to a reader wanting to explore those particular aspects of the book. It was necessary to list where these three words were defined, as well as to structure the index to show how they related to each other.

Once the index taxonomy is established, it must be formatted into a word processing document that can be imported into text layout programs, such as InDesign or QuarkXPress. That requires a different set of skills than the process of building and organizing the index.

To address these requirements, we have created a process in which an editor scans the text and builds a multi-column spreadsheet; how the data are organized in the columns captures the logical structure of the index. A program is then run against that spreadsheet to create a fully formatted index file that can be directly imported into a text layout program.
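As a rough illustration of the spreadsheet-to-index step, here is a minimal Python sketch. It is not the actual program described above: the three-column layout (`heading`, `subheading`, `pages`), the semicolon-separated page list, the sample rows, and the function names `load_rows`, `build_index`, and `format_index` are all assumptions made for the example. It reads the rows, groups subheadings under their main heading, and emits an indented plain-text index.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per index reference.
# An empty subheading means the pages belong to the main heading itself.
SAMPLE = """heading,subheading,pages
facts,,12;15
facts,defined,3
facts,relation to objects,47;52
objects,defined,5
processes,defined,8
processes,relation to facts,60
"""


def load_rows(text):
    """Read (heading, subheading, pages) tuples from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return [(r["heading"], r["subheading"], r["pages"]) for r in reader]


def build_index(rows):
    """Group rows into heading -> subheading -> set of page numbers."""
    index = defaultdict(lambda: defaultdict(set))
    for heading, subheading, pages in rows:
        for page in pages.split(";"):
            page = page.strip()
            if page:
                index[heading.strip()][subheading.strip()].add(int(page))
    return index


def format_index(index):
    """Render the nested index as indented, alphabetized text."""
    lines = []
    for heading in sorted(index, key=str.casefold):
        subs = index[heading]
        # Pages attached directly to the heading are stored under "".
        top_pages = ", ".join(str(p) for p in sorted(subs.get("", ())))
        lines.append(f"{heading}, {top_pages}" if top_pages else heading)
        for sub in sorted((s for s in subs if s), key=str.casefold):
            pages = ", ".join(str(p) for p in sorted(subs[sub]))
            lines.append(f"    {sub}, {pages}")
    return "\n".join(lines)


index_text = format_index(build_index(load_rows(SAMPLE)))
print(index_text)
```

Because the structure lives entirely in the spreadsheet columns, changing a row and rerunning the script regenerates the whole formatted index. A production version would also need to handle page ranges, cross-references, and the styling expected by the layout program, none of which is attempted here.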

There are several central benefits to this approach. One is that a person defines the index; it is not simply a programmatic scan of keyword occurrences. Another is that the formatted index is created programmatically, saving someone hours of tedious formatting work. Additionally, revisions to a text during successive copyediting phases may well change the page references for particular words or phrases; because the formatted index is created programmatically, the program can be rerun at any time to create a new formatted index, sparing someone the time-consuming task of checking the page number for every reference in the index.

The results of this approach are evident in the index for the above-mentioned philosophy book. From a spreadsheet with just under 2,200 lines, the program created the index in just over one minute. This rapid creation process easily allowed multiple proof runs for approval by the publisher and the author. Furthermore, when a final copyediting pass resulted in a slightly revised text, running the program to generate a new final index required less than ten minutes.