I've moved into what is probably the least glamorous phase of development: security, permissions, and user management.
There are four (or five) different roles in FromThePage, with some areas of ambiguity regarding what those users are allowed to do.
- Admins are the rulers of a software installation. There are only a few of them per site, and in a hosted environment, they hold the keys to the site. Admins may manage anything in the system, period.
- Owners are the people who upload the manuscript images and pay the bills. They have entered into some sort of contractual relationship with the hosting provider, and have the power to create new works, modify manuscript page images, and authorize users to help transcribe works. In theory, they'll be responsible for supporting the scribes working on their works.
- Scribes may modify transcriptions of works they're authorized to transcribe. They may create articles and any other content for those works. They are the core users of FromThePage, and will spend the most time using the software. If the scribes aren't happy, ain't nobody happy.
- Viewers are registered users of the site. They can see any transcription, navigate any index, and print any work.
- Non-users are people viewing the site who are not signed in to an account. They probably have the same permissions as viewers, but they will under no circumstances be allowed to create any content. I've had enough experience dealing with blog comments to know that the minute you allow non-CAPTCHA-authorized user-created content, you become prey to comment spammers who will festoon your website with ads for snake oil, pornography, and fraudulent mortgage refinance offers. [June 8 Update: Within thirty-six hours of publication, this very post was hit by a comment spammer peddling shady loans, who apparently was able to get through Blogger's CAPTCHA system.]
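To make the role list concrete, here is a minimal sketch in plain Ruby of how these roles might map to permissions. It is not FromThePage's actual code: the `User` and `Work` structs, the action names, and the `allowed?` helper are all hypothetical, and it assumes that owners may transcribe their own works and that registered viewers may comment.

```ruby
# Hypothetical sketch of the roles described above; none of these names
# come from the real FromThePage codebase.
User = Struct.new(:name, :admin, :owned_work_ids, :scribe_work_ids)
Work = Struct.new(:id, :title)

# Actions each role may perform on a given work.
# Assumptions: owners may also transcribe their own works, and registered
# viewers may comment. Non-users can read but never create content.
PERMISSIONS = {
  admin:    %i[read comment transcribe manage_work manage_site],
  owner:    %i[read comment transcribe manage_work],
  scribe:   %i[read comment transcribe],
  viewer:   %i[read comment],
  non_user: %i[read]
}.freeze

# Work out which role a user holds for one work. A nil user is someone
# who is not signed in.
def role_for(user, work)
  return :non_user if user.nil?
  return :admin if user.admin
  return :owner if user.owned_work_ids.include?(work.id)
  return :scribe if user.scribe_work_ids.include?(work.id)

  :viewer
end

# True when the user's role for this work includes the requested action.
def allowed?(user, action, work)
  PERMISSIONS.fetch(role_for(user, work)).include?(action)
end
```

Note that the scribe and owner roles are per-work in this sketch: a user who is a scribe on one work is only a viewer on every other work.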
There are two open questions regarding the permissions granted to these different classes of user:
- Should viewers see manuscript images? Serving images will probably consume more bandwidth than all other uses combined. For manuscripts containing sensitive information, image service is an obvious security breach. The only people who really need images (aside from those who find unclear tags with links to cropped images insufficient) are scribes.
- Should viewers add comments? For the reasons outlined above, I think the answer is yes, at least until it's abused enough for me to turn off the capability.
For those who have never programmed enterprise software before, the reason security gets such short shrift is that fundamentally it's about turning off functionality. Before you get to the security phase of development, you have to have already developed the functionality you're disabling. By definition, it's an afterthought.
Sara says
Two comments:
1) I wouldn’t worry about bandwidth costs. We can use Amazon S3 if or when it seems appropriate, and I don’t think costs will go up much over time.
2) Why not let work owners turn commenting on or off for a particular work? I wouldn’t worry too much about granular levels of commenting (only by scribes, by any viewer, etc.) for the first rev, though.
Gavin Robinson says
I’d be in favour of allowing all registered users to view images and post comments if at all possible. No transcription is ever likely to be perfect even when the source text is perfectly clear. One of the most exciting possibilities of putting material on the web is the ability of users to improve it after publication. If someone spots a mistake in the transcription there should be a mechanism for them to report it.
Ben W. Brumfield says
Sara,
We’ll see about the bandwidth issues — I’ve never done any sort of hosting before, so I have no idea where the costs come from. A good argument for getting a variety of early (free) users is to gather data on what my costs actually will be.
I like the notion of per-work controls for commenting and things. In general, I’d like to default to making works as accessible, interactive, and transparent as possible.
Ben W. Brumfield says
Gavin,
I completely agree about post-publication improvement. I got hooked on Pepys Diary Online, and enjoyed it so much that Sara gave me a print copy of Pepys for Christmas. To my surprise, reading it just isn’t the same. I miss the annotations, explanations, and speculations from the commenters.
Regarding reporting mechanisms, I’m pretty sure I’m going to need a way of flagging pages for review by a scribe. Probably this will just be a boolean on individual comments, which will allow me to display a “pages needing attention” list by searching for flagged comments.
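As a rough illustration of the flagging idea, here is a sketch in plain Ruby. The `Comment` struct, its fields, and `pages_needing_attention` are hypothetical names; in a Rails app the same list would presumably come from a query on a boolean column instead of an in-memory array.

```ruby
# Hypothetical sketch: a comment tied to a page, with a boolean flag
# that requests review.
Comment = Struct.new(:page_id, :author, :body, :flagged)

# Build the "pages needing attention" list: the distinct ids of pages
# that have at least one flagged comment, in ascending order.
def pages_needing_attention(comments)
  comments.select(&:flagged).map(&:page_id).uniq.sort
end
```

Because the flag lives on the comment and not the page, the list can also show the text of each flagged comment, so a scribe sees what is wrong and not just where.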
Do you know of any examples of this reporting feature you can recommend? Surely this is a common problem, but Wikipedia just overloads their category system to do it, which doesn’t give you any more information than the name of the “article needing attention”.
Gavin Robinson says
So far I haven’t seen the kind of feedback mechanism that I want to see. Even the excellent Old Bailey Proceedings is quite disappointing in that respect: if you spot a transcription error all you can do is e-mail them.
Ideally every page should have a link saying “report error on this page” or whatever, and when you click it you get a form which knows which page it relates to where you can suggest corrections.
Even better would be a PGDP style interface where you can edit the text directly and then submit it for review. Site admins could then compare the old and new version side by side and decide whether to approve the correction. I definitely wouldn’t want a Wikipedia style thing where anyone can change the text and changes go live without having to be approved.
Ben W. Brumfield says
Gavin,
What you describe doesn’t seem that different from page-linked comments with flagging.
Fundamentally, you’re providing a mechanism to flag a page with a comment about what’s wrong, requesting the attention of either the owner (for problems with the image itself), the scribes (for transcription/interpretation issues), or the public at large (for questions about the transcription content).
I came up with a design for this (page-linked, flaggable comments with associated inboxes/viewers) about a year and a half ago, but lost it in an email crash. As I remember, the design difficulty wasn’t in the page linking, but in the permutations of who the commenter is posing their question to. Is my question to the owner? The scribes? The person who edited this revision? To Joe Smith? What constitutes the question being answered, or the issue being resolved? Who decides?
I’ll mull it over a bit more and then do a full post on it. It’s dependent on the user management/ownership code, so if I can just slog through that it may move to the front of my queue.
Currently I’m looking at using/extending the Rails acts_as_commentable plugin.
Could you elaborate on “PGDP”? I’m not familiar with the term. (Though I’m very familiar with the authoring/publishing workflows in CMS software, which may be the direction you’re heading.)
Gavin Robinson says
PGDP = Project Gutenberg/Distributed Proofreaders – I was just too lazy to type it out. Basically when you’re proofreading on their site you get a screen with the scanned page image at the top and the text box below it. So far I’ve only been thinking about this specifically in terms of correcting transcription errors – the user submits a new version of the page, then the server compares them and flags the differences so that the editor can decide whether to accept or reject the changes – but you’re right that there could be a need for feedback about other things which would make everything more complicated.
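The submit, compare, and approve-or-reject flow described here could be sketched roughly as follows in plain Ruby. This is not how PGDP or FromThePage actually implements it: `diff_lines` and `ProposedCorrection` are hypothetical names, and the comparison is a simple line-level diff built on a longest-common-subsequence table.

```ruby
# Hypothetical sketch of a review workflow for proposed corrections.

# Compare two texts line by line and return pairs of [kind, line], where
# kind is :same, :removed (only in the old text) or :added (only in the new).
def diff_lines(old_text, new_text)
  a = old_text.lines.map(&:chomp)
  b = new_text.lines.map(&:chomp)

  # lcs[i][j] is the length of the longest common subsequence of the
  # lines from a[i] onward and the lines from b[j] onward.
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = a[i] == b[j] ? lcs[i + 1][j + 1] + 1 : [lcs[i + 1][j], lcs[i][j + 1]].max
    end
  end

  # Walk the table from the start, emitting matched and unmatched lines.
  result = []
  i = 0
  j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      result << [:same, a[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      result << [:removed, a[i]]
      i += 1
    else
      result << [:added, b[j]]
      j += 1
    end
  end
  a[i..-1].each { |line| result << [:removed, line] }
  b[j..-1].each { |line| result << [:added, line] }
  result
end

# A submitted correction that stays pending until an editor decides.
class ProposedCorrection
  attr_reader :status, :proposed_text

  def initialize(current_text, proposed_text)
    @current_text = current_text
    @proposed_text = proposed_text
    @status = :pending
  end

  # Only the differing lines, for the editor to review.
  def changes
    diff_lines(@current_text, @proposed_text).reject { |kind, _line| kind == :same }
  end

  def approve!
    @status = :approved
  end

  def reject!
    @status = :rejected
  end
end
```

Nothing goes live until `approve!` is called, which is the difference from the Wikipedia-style model where edits are published immediately.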