Archival Data Digitization -- Work In Progress

The AAVSO's volunteer-driven project to digitize paper archives of variable star observations is moving forward, and there have been some exciting developments over the past few months that deserve to be mentioned.  Here, I'll summarize some of the happenings in this project, and talk about some of the issues that have arisen at AAVSO and among the volunteers. 


What we're doing

In September of 2010, we announced a project to make data published in the Annals of the Harvard College Observatory available in digital form for the use of the variable star community.  Many thousands of variable star observations were published in the Harvard Annals both before and after the founding of the AAVSO, and many of these records were never entered into our online database.  This makes the data difficult to use unless researchers do the digitizing work themselves.  By digitizing these observations ourselves, we can make them live again as part of the AAVSO International Database (AID).  There are hundreds of variables for which we can extend light curves back in time by a decade or more; for a few stars, recorded observations go back even further, to the early 19th century or earlier!  Making these data publicly available enables more and better science both for these "new" data sets and for the existing AAVSO archives.

Several individuals have volunteered their time and expertise to digitizing these datasets in whatever way they can.   We've made wonderful progress so far, but there's much more work to be done!

Highlights

Kevin Paxson digitized the observations of Friedrich Argelander and Edward Schoenfeld collected by Pickering in Harvard Annals 33, amounting to nearly 5700 observations of 40 different variable stars.  Most of these stars already have long-term data sets in the AID, and these data extend the light curves backwards in time by several decades.  Kevin also did some work on Harvard Annals 63, digitizing several hundred more observations from that volume.  He is now working on his own project to collect and digitize all historical literature for the bright Mira R Leonis.

Andrew Rupp was a young man who volunteered for the AAVSO for a brief time prior to his untimely passing from cancer in early January 2011.  Andrew helped to digitize over 2700 observations from Harvard Annals 37, extending the light curves for eleven Mira variables back in time by over a decade.  We're very grateful to Andrew for his work with the AAVSO, and to his family for making sure his work was safely transmitted to us after his passing.

Christian Froschlin contacted us in 2010 offering to apply the optical software and machine learning tools developed by his company to this project.  His software does a remarkable job of digitizing large collections of tables, although it is still time-intensive -- one has to "teach" the software how to read each table, and it's clearly best applied to large collections of data where the tables and fonts are all similar.  We're really excited about this software, and we'll be working with Christian to maximize its potential both for the Harvard Annals project and for other published papers from the literature.

AAVSO Councillor Bob Stine volunteered to digitize the collected observations of the outburst of Nova Persei 1901 No.2, now known as GK Persei.  The existence of these data in printed form but not in the AID has been bothering me for years, and I was very happy to have this work done.  Bob finished this in mid-March 2011, providing us with a great new data set for this star.  The data set consists of over 3300 observations of this nova, from its discovery on February 19, 1901, to August 8, 1902, when it finally fell below magnitude 9.  We have a little more work to do to generate observer codes for a number of "new" observers, and then we'll enter the data into the AID.  For now, please enjoy the beautiful light curve at the top of this page, and we look forward to making the data available very soon!

Where we're going

The enthusiasm of the volunteers for this project has been wonderful to see, but challenging to keep up with.  Aside from the scarcity of free time at AAVSO headquarters, we've been bottlenecked to some extent by our ability to handle so much data from non-AAVSO observers in the published literature.  We're about to overcome that with an automated program that can quickly assign initials to a large number of observers, which will in turn allow us to process digitized data from non-AAVSO observers much faster.  The AAVSO Science Team also had a brainstorming session earlier this week to make some changes to the AAVSO International Database.  One result is that we'll be adding some new fields to our data tables, including a column for a reference (to indicate if the observation was taken from a published paper) and a column for a digitizer's initials.  Tagging each observation with its digitizer (in the same way that we tag it with an observer code) will allow us to keep track of how the data are entered and (importantly) to give credit for the valuable work that our digitizers are doing.  Most of these changes will be transparent to AAVSO observers and to people using our data, although those who download data will now be able to see where digitized data came from and who digitized them.
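
For readers who like to see ideas in code, here is a rough Python sketch of the two things described above: a record with extra fields for a reference and a digitizer, and a routine that hands out initials to observers who don't have them yet.  This is only an illustration and not the AAVSO's actual program or database layout.  The names Observation and assign_codes, the field names, and the rule of adding a number when two observers share the same initials are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class Observation:
    """One digitized observation (hypothetical layout, not the real AID schema)."""
    star: str
    julian_date: float
    magnitude: float
    observer_code: str
    reference: Optional[str] = None   # assumed field: paper the value was taken from
    digitizer: Optional[str] = None   # assumed field: initials of the person who typed it in


def assign_codes(names: Iterable[str], taken: Set[str]) -> Dict[str, str]:
    """Give each new observer a code built from their initials.

    If the initials are already in use, a number (2, 3, ...) is appended
    until the code is unique.  This collision rule is an assumption.
    """
    used = set(taken)          # copy, so the caller's set is left untouched
    codes: Dict[str, str] = {}
    for name in names:
        if not name.strip() or name in codes:
            continue           # skip blanks and observers already handled
        base = "".join(word[0] for word in name.split()).upper()
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}{n}"
        used.add(candidate)
        codes[name] = candidate
    return codes


# "FA" is pretended to be already in use, so Argelander gets "FA2".
new_codes = assign_codes(["F. Argelander", "E. Schoenfeld"], taken={"FA"})
print(new_codes)  # {'F. Argelander': 'FA2', 'E. Schoenfeld': 'ES'}
```

Leaving the reference and digitizer fields empty by default means ordinary observations submitted directly by observers would not need to change; only digitized records would fill them in.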

In late 2010 we submitted a grant proposal to the National Science Foundation requesting modest funds to cover staff expenses for this project and related activities, and also to provide some additional financial support for the AAVSO Archives and infrastructure.  We'll keep our fingers crossed that our proposal is accepted, but we're also pursuing other grant opportunities for more focused projects.  There are many datasets within the AAVSO's physical archives whose addition to the online AID would be of value to the community, and we'll be writing small proposals to assist us in that work.  In the meantime, we'll continue the good progress our volunteers are making with the Harvard Annals and do our best to keep up with them!

A number of people beyond those mentioned above have inquired about volunteering.  We've given those volunteers suggestions on how to contribute, and we're hoping to coordinate with them and get them digitizing data soon.  Although we've been concentrating on larger papers from the Harvard Annals, a great number of smaller papers were published in other early astronomical journals such as The Astrophysical Journal, Monthly Notices of the Royal Astronomical Society, Astronomische Nachrichten, and others, and data from those papers would also be valuable additions to the AID.  A great deal of that work is simply finding those papers and seeing what data they have to offer.  Another possible contribution might be to re-reduce visual photometry made using different magnitude systems (an especially good project for those who enjoy both mathematics and computer programming).  And the best project of all is to use the data!  The addition of decades of observations to the AID for many stars might reveal some interesting new things about these variables, and we're looking forward to the science that will come from the work of our volunteers.  We hope that data miners and other researchers will put these data to good use very soon!