The AAVSO Data Validation Project was a two-year project, begun in 2002, in which AAVSO staff examined every visual observation in the AAVSO International Database made between 1911 and 2001. This project has been completed, and all AAVSO Light Curves are now available directly via the AAVSO Website. A summary of the Validation Project was published by Malatesta et al. in the Journal of the AAVSO, volume 34 (2006).
Data Validation: What is it?
Simply put, data validation is the process by which AAVSO Headquarters staff scrutinizes observations in the AAVSO International Database for any possible errors.
The purpose of data validation is not to produce a pretty light curve devoid of scatter, but rather to ensure that no unintentional errors creep their way into the database. Such errors include: designation/name discrepancies; comment field code and observer initial oversights; and JD or magnitude problems. Fortunately, many of these types of errors are no longer encountered since computer software has become much more sophisticated than it was in the early days of computerization.
The AAVSO International Database is the largest and highest-quality digital database of variable star observations available, and thousands of researchers, educators, and students have used the wealth of information it contains for both personal and professional projects. As an impressive testament to the work and dedication of its observers, the AAVSO fulfills thousands of requests for data annually. While the AAVSO maintains a strict quality-control policy to ensure error-free data for such requests, there had never been a systematic survey of all the data to look for potential and obvious errors, and to investigate and rectify any problems. That is, until the Data Validation Project.
The Data Validation Project
In 2002, NASA awarded the AAVSO a 2-year grant to error check or validate observations for 4,922 stars in the AAVSO International Database from the founding of the organization in 1911 through 2001. The stars chosen for validation were of many types, with the heaviest concentrations coming from the pulsating (66%) and the eruptive-type (23%) variables. Eclipsing binary, RR Lyrae, and comparison star data, as well as select stars with complex histories or light curves, were not included at this time. Nonetheless, the project encompassed the validation of over 10 million observations!
A very important part of the Data Validation Project was to find, investigate, and resolve discrepant observations, mostly by comparison with the observer's submitted report. In fact, out of the observations validated, 633,126 were considered to be discrepant and 441,879 (70%) of those were repaired by changing data fields to match the observer's original reports.
If a suspicious data point could not be resolved, it then became subject to a strict set of rules agreed upon by the Director and the Technical Staff. During this phase, the Technical Staff scrutinized the data by looking at each individual light curve included in the project. Points that would negatively affect the analysis of data to a statistically significant level were flagged discordant, as represented by an editorial letter code inserted into the digital record. None of the data deemed discrepant were ever deleted from the permanent archives. The data remain in the AAVSO International Database and are available upon request.
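The repair-or-flag workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the AAVSO's actual software: the `Observation` fields, the `validate` function, the letter code `"X"`, and the sample records are all hypothetical, and the staff's rule set is reduced to a single yes/no input because the rules themselves are not spelled out here.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical editorial letter code; the real AAVSO code is not given in the text.
DISCORDANT_CODE = "X"


@dataclass(frozen=True)
class Observation:
    star: str
    jd: float          # Julian Date of the observation
    magnitude: float
    observer: str      # observer initials
    flag: str = ""     # editorial code; empty string means unflagged


def validate(obs: Observation,
             report: Optional[Observation],
             breaks_rules: bool) -> Observation:
    """Return the validated form of one database record.

    report       -- the observer's original submitted report, if it could be found
    breaks_rules -- stand-in for the staff's rule-based judgement of the point
    """
    if report is not None and report != obs:
        # Repair: change the data fields to match the observer's original report.
        return replace(obs, star=report.star, jd=report.jd,
                       magnitude=report.magnitude, observer=report.observer)
    if breaks_rules:
        # Unresolved and harmful to analysis: flag it, but keep the record.
        return replace(obs, flag=DISCORDANT_CODE)
    return obs


# Made-up sample records, for illustration only.
archive = [
    Observation("SS CYG", 2440000.5, 8.4, "ABC"),
    Observation("SS CYG", 2440001.5, 18.4, "ABC"),  # keying error; report says 8.4
    Observation("SS CYG", 2440002.5, 3.0, "XYZ"),   # no report available
]
reports = [None, Observation("SS CYG", 2440001.5, 8.4, "ABC"), None]
judgements = [False, False, True]

validated = [validate(o, r, j) for o, r, j in zip(archive, reports, judgements)]
```

In this sketch every input record produces exactly one output record, so nothing is ever deleted: a point is either left alone, repaired from the original report, or kept with a flag that a later analysis can choose to filter on.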
The Data Validation Project was completed in September of 2004, meeting the 2-year deadline. By the close of the project and with 9,324 staff hours logged, over 10 million observations contributed by over 6,000 observers worldwide were made available via the AAVSO web site. Since that time, well over 4,000 downloads of validated data have been made. Not only has this project significantly cut down on the number of data requests received at HQ, but it has also given users immediate access to the treasures contained in the AAVSO International Database!
The validated data for each star may be viewed graphically online through the AAVSO Light Curve Generator or may be downloaded as a data file. Interested observers may wish to make use of the data usage report program, which e-mails a monthly log stating the number of times any of their data were accessed via the online data download tool.