The trouble with people as subjects is that they have names, and that personal names are hard.
- Names in the text may be illegible or incomplete, so that Reese _____ and Mr ____ Edmonds require special handling.
Names need be remembered by the scribe during their transcription. I discovered this the hard way.
After doing some research in secondary documents, I was able to "improve" the entries for Julia's children. Thus Kate Harvey became Julia Katherine Brumfield Harvey and Mollie Reynolds became Mary Susan Brumfield Reynolds.
The problem is that while I'm transcribing the diaries, I can't remember that "Mary Susan" == Mollie. The diaries consistently refer to her as Mollie Reynolds, and the family refers to to her as Mollie Reynolds. No other person working on the diaries is likely to have better luck than I've had remembering this. After fighting with the improved names for a while, I gave up and changed all the full names back to the common names, leaving the full names in the articles for each subject.
- Names are odd ducks, when it comes to strings. "Mr. Zach Abel" should be sorted before "Dr. Anne Zweig", which requires either human intervention to break the string into component parts or some serious parsing effort. At this point my subject list has become unwieldy enough to require sorting, and the index generation code for PDFs is completely dependent on this kind of separation.
I'm afraid I'll have to solve all of these problems at the same time, as they're all interdependent. My initial inclination is to have subject articles for people allow the user to specify a full name in all its component parts. If none is chosen, I'll populate the parts via a series of regular expressions. This will probably also require a hard look at how both TEI and DocBook represent names.