Militieregisters.nl is a new transcription project organized by the City Archive Amsterdam that plans to use crowdsourcing to index militia registers from several Dutch archives. It’s quite ambitious, and there are a number of innovative features about the project I’d like to address. However, I haven’t seen any English-language coverage of the project so I’ll try to translate and summarize it as best as my limited Dutch and Google’s imperfect algorithms allow before offering my own commentary.
To research how archives and online users can work together to improve access to the archives, the Stadsarchief Amsterdam has set up the “Many Hands” project. With this project, we want to create a platform where all Dutch archives can offer their scans to be indexed and where all archival users can contribute in exchange for fun, contacts, information, honor, scanned goods, and whatever else we can think of.
To ask the whole Netherlands to index, we must start with archives that are important to the whole Netherlands. As the first pilot, we have chosen the Militia Registers, but there will soon be more archival files to be indexed so that everyone can choose something within his interest and skill-level.
Militia registers contain the records of all boys who were entered for conscription into military service during almost the entire 19th and part of the 20th centuries. These records were created in the entire Netherlands and are kept in many national and municipal archives.
The militia records are eminently suitable for large-scale digitization. The records consist of printed sheets. This uniformity makes scanning easy and thus inexpensive. More importantly, this resource is interesting for anyone with Dutch ancestry. Therefore we expect many volunteers to lend a hand to help unlock this wonderful resource, and that the online indexes will eventually attract many visitors.
But the first step is to scan the records. Scanning will soon begin with approximately one million pages. The more records we have digitized, the cheaper the scanning becomes, and the more attractive the indexing project becomes to volunteers. The Stadsarchief therefore calls upon all Dutch archival institutions to join!
- At our institution, online scans are provided for free. Why should people pay for scans?
Revenues from sales of scans over the two-year duration of the project are part of the project's financing. The budget is based on the rates used at the City Archives: € 0.50 to € 0.25 per scan, depending on the number of scans that someone buys. We ask that participating institutions not sell their own scans or make them available for free while the project is running. After the completion of the project, each institution may follow its own policy for providing the scans.
- If we participate, who is the owner of the scans and index data?
After production and payment, the scans will be delivered immediately to the institution which provided the militia records. The index information will also be supplied to the institutions after completion of the project. The institution remains the owner, but during the project period of approximately two years the material may not be used outside of the project.
- What are the financial risks for participating archives?
Participants pay only for their scans: the actual costs and preparation of the scanning process. The development and deployment of the index tool, volunteer recruitment, and two years of maintenance of the project website have been funded by grants and contributions by Stadsarchief Amsterdam. There are no financial surprises.
- What does the schedule for the project look like?
On July 12 and September 13 we are organizing meetings with potential participants to answer your questions. Until October 1, 2010, participants will sign up to participate in the project, in order for the scanning to start on that day. The tender process runs about 2 months, so a supplier can be contracted in 2010. In January 2011 we will start scanning, and volunteers can begin indexing in the spring. The sister site www.velehanden.nl–where the indexing will take place–will continue online for at least one year.
- Will the indexing tool be developed as Open Source software?
It is currently impossible to say whether the indexing tool will be developed via/as open source software. What matters most is finding the most cost-effective solution and ensuring that the software performs well and is user-friendly. The only hard requirement is the use of open standards for the import and export of metadata, so that vendor independence is guaranteed.
Below are some ideas SAA has formulated regarding the functionality and sustainability of VeleHanden.nl:
- Facilities for importing and managing scans, and for exporting data in XML format.
- Scan viewer with advanced features.
- Functionality to simultaneously run multiple projects for indexing, transcription, and translation of scans.
- Features for organizing and managing data from volunteer groups and for selectively enabling features for participants and volunteer coordinators.
- Features for communication between archival staff and volunteers, as well as for volunteers to provide support to each other.
- Automated features for control of the data produced.
- Rewards system (material and immaterial) for volunteers.
- Many volunteers may work in parallel to process scans quickly and effectively.
- Facilities to search, view and share scans online.
Other Dutch bloggers have covered the unique approach that Stadsarchief Amsterdam envisions for volunteer motivation and project support: Militieregisters.nl users who want to download scans may pay for them either in cash or in labor, by indexing N scanned pages. Christian van der Ven’s blog post Crowdsourcen rond militieregisters and the associated comment thread discuss this intensely and are worth reading in full. Here’s a loosely translated excerpt:
The project assumes that it can not allow the volunteer to indicate whether he wants to index Zeeland or Groningen. It is–in the words of the project leader–the Orange feeling, to see if the rural people can volunteer and not just concentrate on their own location. Indexing people from their own village? Please, not that!
Well since the last World Cup I’m feeling Orange again, but overall experience and research in archives teaches that all country people are more interested in the history of themselves, their own ancestors, their homes and the surrounding area. The closer [the data], the more motivation to do something.
And if the purpose of this project is to build an indexing tool, to scan registers, and then to obtain indexes through crowdsourcing as quickly as possible, it seems to me that the public should be given what it wants: local resources if desired. What I suggest is a choice menu: do you want records from your source environment? Do you want them maybe only from a certain period? Or do you want them filtered by time and place? That kind of choice will trigger as many people as possible to participate, I think.
- The pay-or-transcribe approach for acquiring scans is a really innovative approach. Offering people alternatives for supporting the project is a great way of serving the varied constituencies that compose genealogical researchers, allowing cash-poor, time-rich users (like retirees) an easy way to access the project.
- Although I have no experience in the subject, I suspect that this federated approach to digitization–taking structurally-similar material from regional archives and scanning/hosting it centrally–has a lot of possibilities.
- Christian’s criticism is quite valid, and drives right to the paradox of motivation in crowdsourcing: do you strive for breadth using external incentives like scoreboards and free recognition, or do you strive for depth and cultivate passionate users through internal incentives like deep engagement with the source material? Volunteer motivation and the trade-offs involved are a fascinating topic, and I hope to do a whole post on it soon.
- One potential flaw is that it will be very hard to charge to view the scans when transcribers must be able to see the scans to do their indexing. I gather that the randomization in VeleHanden will address this.
- The budget described in the RFP is a maximum of 150,000 euros. As a real-life software developer, it’s hard for me to see how this would pay for building a transcription tool, index database, scan import tool, scan CMS, search database and (since they expect to sell the searched scans) eCommerce. And that includes running servers too!
- This is yet another project that’s transcribing structured data from tabular sources, which would benefit from the FamilySearch Indexer, if only it were open-source (or even for sale).