
Data validation / DB cleanup (HL CMa)

lmk's picture

Just noticed a couple of positive observations of HL CMa at 9th magnitude. This does not seem reasonable; historically it has never been brighter than 10th. Then, checking back a long way, there is a 2nd magnitude "visual validated" one from around 1986 too.

I think we need to look at a way to automatically verify/validate grossly erroneous data. A "polluted" DB does no one any good. Maybe a simple check in WebObs against a variable's known range: if a positive obs is entered outside of this range, it sends an alert notice to the observer as well as HQ so they can check on it.
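A minimal sketch of what such a range check could look like. Everything here is an assumption made for illustration: the `Observation` record, the `KNOWN_RANGE` table, the `margin` tolerance and the recipient labels are hypothetical and are not part of WebObs. The bright limit for HL CMa follows the "never brighter than 10th" remark above; the faint limit is a placeholder, not catalog data.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    star: str
    magnitude: float
    fainter_than: bool = False  # True for a "fainter than" report, i.e. not a positive obs


# Hypothetical table of known ranges as (brightest, faintest) magnitudes.
# The 15.0 faint limit is a placeholder for illustration only.
KNOWN_RANGE = {"HL CMa": (10.0, 15.0)}


def out_of_range(obs, margin=0.5):
    """Return True if a positive observation lies outside the known range.

    Smaller magnitudes are brighter, so "too bright" means a value below
    the bright limit minus the tolerance.
    """
    if obs.fainter_than or obs.star not in KNOWN_RANGE:
        return False  # nothing to check against
    brightest, faintest = KNOWN_RANGE[obs.star]
    return obs.magnitude < brightest - margin or obs.magnitude > faintest + margin


def alert_recipients(obs):
    """Return who should be notified about this observation (possibly nobody)."""
    return ["observer", "HQ"] if out_of_range(obs) else []
```

With these assumed limits, a 9th or 2nd magnitude positive observation of HL CMa would trigger an alert to both parties, while an 11.2 estimate or a "fainter than" report would pass through untouched. The tolerance keeps the check from firing on estimates that sit only slightly outside the historical range.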

Mike LMK

wel's picture

"Are you interested in helping HQ make sure that our variable star data remains the best in the world? You can help us to achieve that goal by using Zapper. Zapper is a new Java, multi-platform program that will enable our members *and observers* to help us in the massive task of making sure our data is of the highest quality."

lmk's picture

Ok, I downloaded it and marked the 3 discrepancies. What happens now?

BSJ's picture

What happens after you "zap" an observation using Zapper is that the name of the star appears in a drop-down list in my super-duper version of Zapper (called Javazap). It thereby becomes part of my queue of stars to investigate. When I then look at the lightcurve of HL CMa using Javazap, I will see your initials next to the observations you flagged along with any comments you have added.

What happens next depends on what I see when I look at the observation in detail. If I think that there is some probability that the erroneous magnitude was a typographical error and we have the email address of the observer, I will write to them asking them to check their original records. If they write back with a correction, either that person or I will fix the observation. If we have no means to contact an observer or they confirm that they made no typo, I will make a value judgement as to whether the observation is truly discrepant or not, and if it is, I will mark it as such so that it won't show up on the LCG or in a data download unless discrepant observations are specifically requested. The guidelines we use for determining if an observation is discrepant or not are spelled out in the attached document (zapperhelp.pdf). Very occasionally, I disagree with the person who did the zapping and leave the observation alone.

The other possibility is that if HQ has a paper record of the observation (mostly pre-WebObs) or if it was an observation entered by one of our volunteer digitizers, I can find the original record and check it myself for typos.   
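The review steps described above can be sketched roughly as a small decision function. This is only an illustration, not how Javazap works: the `Action` names, the function and its parameters are hypothetical, and the order in which the original record and the observer's email are tried is an assumption, since the posts do not say which comes first.

```python
from enum import Enum


class Action(Enum):
    CHECK_ORIGINAL_RECORD = "look up the paper or digitized record"
    ASK_OBSERVER = "email the observer to check their records"
    MARK_DISCREPANT = "mark the observation as discrepant"
    LEAVE_ALONE = "leave the observation as it is"


def next_step(possible_typo, has_original_record, has_email, judged_discrepant):
    """Pick the next step for one flagged observation.

    Assumption: an original record held at HQ is checked before
    writing to the observer.
    """
    if possible_typo and has_original_record:
        return Action.CHECK_ORIGINAL_RECORD
    if possible_typo and has_email:
        return Action.ASK_OBSERVER
    # No way to verify a typo (or none suspected): fall back to a value judgement.
    return Action.MARK_DISCREPANT if judged_discrepant else Action.LEAVE_ALONE
```

If the observer writes back confirming there was no typo, the same function can be called again with `possible_typo=False`, which leads straight to the value judgement.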

Right now, my queue is pretty long (about 75 stars) because a few people have been fairly prolific zappers lately and I have a lot of other projects on my plate, but I am gradually catching up and hope to finish doing so by the end of next week.

I feel that this is a fairly good system and it does help to find and fix (when possible) discrepant observations. The only thing I regret is that I have not yet put in place an easy way to give feedback to the people who use zapper. I don't have time to write to them individually to thank them, but I might be able to figure out a way to post statistics or send them a summary email once a month or something. I am open to suggestions/ideas. Something automated would be good.
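One possible shape for an automated monthly summary, offered as a sketch only. It assumes the flags can be exported as (initials, date) pairs; that export format and the function name are hypothetical, as are the sample dates and the "ABC" initials in the usage below.

```python
from collections import Counter
from datetime import date


def monthly_summary(zaps, year, month):
    """Count flagged observations per person for one calendar month.

    `zaps` is an iterable of (initials, flagged_on) pairs, where
    flagged_on is a datetime.date.
    """
    return Counter(
        initials
        for initials, flagged_on in zaps
        if flagged_on.year == year and flagged_on.month == month
    )


# Made-up sample data for illustration.
sample = [
    ("LMK", date(2013, 1, 5)),
    ("LMK", date(2013, 1, 9)),
    ("ABC", date(2013, 1, 20)),
    ("LMK", date(2013, 2, 1)),
]
for initials, count in monthly_summary(sample, 2013, 1).most_common():
    print(f"{initials}: {count} observations flagged")
```

The resulting counts could be posted as statistics or dropped into a once-a-month email to each person who flagged something.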

In any case, you should know that your zapping efforts are never in vain. I see the observations you flag and will do something about them one way or another. If you wish to draw my attention to something particularly egregious or requiring a timely response, you can always send me an email.

Thanks for asking!
Sara Beck, AAVSO Technical Staff

AAVSO 49 Bay State Rd. Cambridge, MA 02138 617-354-0484