
Data Cleaning Algorithm(s)?

davidjayjackson
Data Cleaning Algorithm(s)?

Are we using any algorithm(s) to do initial cleaning of observations before making them available to zapper? For instance, do we "flag" magnitudes that are more than ±3 magnitudes away from the daily mean?


Thanks in advance,


Python :-) Probably the

Python :-) Probably the simplest way (actually a must) would be to look at a graph. But you may also look into, e.g., sigma-clipping algorithms.
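For what it's worth, here is a minimal sketch of iterative sigma clipping in plain Python. The function name `sigma_clip`, the 3-sigma threshold, the iteration cap and the sample magnitudes are all my own illustrative choices, not anything AAVSO uses. It also assumes the star is roughly constant over the span being checked; a real variable would need its trend removed first.

```python
import statistics


def sigma_clip(mags, n_sigma=3.0, max_iter=5):
    """Return a list of booleans, True where a magnitude is kept.

    Hypothetical helper: clips around the median using the sample
    standard deviation, repeating until nothing new is rejected.
    """
    keep = [True] * len(mags)
    for _ in range(max_iter):
        kept = [m for m, k in zip(mags, keep) if k]
        if len(kept) < 3:
            break  # too few points left to estimate a scatter
        center = statistics.median(kept)
        scatter = statistics.stdev(kept)
        if scatter == 0:
            break  # all remaining points are identical
        new_keep = [
            k and abs(m - center) <= n_sigma * scatter
            for m, k in zip(mags, keep)
        ]
        if new_keep == keep:
            break  # converged: no further rejections
        keep = new_keep
    return keep


# Made-up magnitudes with one obviously discrepant value.
mags = [8.1, 8.2, 8.0, 8.3, 8.1, 8.2, 8.1, 8.0, 8.2, 8.1, 8.3, 8.2, 12.5]
mask = sigma_clip(mags)
flagged = [m for m, k in zip(mags, mask) if not k]
print(flagged)
```

Clipping around the median rather than the mean keeps a single bad point from dragging the center toward itself, although the standard deviation is still inflated by it on the first pass.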

You can use Bouguer curve(s) (magnitude = f(airmass)) as well; they work well for detecting passing clouds (i.e. measurements that are intrinsically somewhat, or even a lot, less reliable).
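A rough sketch of the Bouguer idea, assuming a single night of instrumental magnitudes for one constant star: fit a straight line of magnitude against airmass and flag points that sit well above it (fainter than expected). The robust median-slope fit (Theil-Sen), the MAD-based threshold, the function names and the numbers are my own assumptions for illustration.

```python
import statistics
from itertools import combinations


def fit_bouguer_line(airmass, mag):
    """Robust fit of mag = m0 + k * airmass (median of pairwise slopes)."""
    points = list(zip(airmass, mag))
    slopes = [
        (m2 - m1) / (x2 - x1)
        for (x1, m1), (x2, m2) in combinations(points, 2)
        if x2 != x1
    ]
    k = statistics.median(slopes)
    m0 = statistics.median([m - k * x for x, m in points])
    return m0, k


def flag_cloudy(airmass, mag, n_sigma=3.0):
    """Return indices of points much fainter than the fitted line."""
    m0, k = fit_bouguer_line(airmass, mag)
    resid = [m - (m0 + k * x) for x, m in zip(airmass, mag)]
    center = statistics.median(resid)
    mad = statistics.median([abs(r - center) for r in resid])
    sigma = 1.4826 * mad  # MAD scaled to a Gaussian-equivalent sigma
    # One-sided test: clouds only make the star fainter (larger magnitude).
    return [i for i, r in enumerate(resid) if r - center > n_sigma * sigma]


# Made-up night: extinction of about 0.25 mag per airmass, one cloudy point.
airmass = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
mag = [10.26, 10.29, 10.35, 10.90, 10.46, 10.49, 10.55, 10.61]
m0, k = fit_bouguer_line(airmass, mag)
cloudy = flag_cloudy(airmass, mag)
print(round(k, 3), cloudy)
```

An ordinary least-squares line would also work on clean nights, but the cloudy points themselves pull that fit, which is why a median-based slope is used here.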

Best wishes,

No cleaning yet


To answer your question: we do not have any automatic cleaning procedures yet. There are two main reasons for this.

1.)  What is a good threshold? A star could change drastically from one day to the next, and in the case of nova-like events a fixed threshold would probably mark genuine changes as discrepant. If you set the threshold to handle those correctly, it would likely miss obvious outliers on stars that vary much more slowly. You would have to tailor it to star type.

2.) Data gaps. Not all stars are well observed, so any automatic flagging would have to be implemented on a star-by-star basis. If you tried to do it on a star where there wasn't enough data overall, or not enough data in a given time span, you would have lots of issues.
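To illustrate both points, here is a minimal sketch (not an existing AAVSO tool) of a check that only judges an observation when it has enough neighbors in time, and otherwise leaves it unchecked. The function name and all parameters (`window_days`, `min_neighbors`, `n_sigma`, `min_scatter`) are hypothetical and would need to be tuned per star type.

```python
import statistics


def flag_local_outliers(jd, mag, window_days=10.0, min_neighbors=5,
                        n_sigma=3.0, min_scatter=0.05):
    """Return 'ok', 'flagged' or 'unchecked' for each observation.

    Each point is compared with the other observations taken within
    +/- window_days. If there are too few of them, no verdict is given.
    """
    status = []
    for i, (t, m) in enumerate(zip(jd, mag)):
        nearby = [
            mag[j] for j in range(len(jd))
            if j != i and abs(jd[j] - t) <= window_days
        ]
        if len(nearby) < min_neighbors:
            status.append("unchecked")  # data gap: not enough context
            continue
        center = statistics.median(nearby)
        mad = statistics.median([abs(x - center) for x in nearby])
        # Floor the scatter so identical neighbors do not give a zero threshold.
        scatter = max(1.4826 * mad, min_scatter)
        if abs(m - center) > n_sigma * scatter:
            status.append("flagged")
        else:
            status.append("ok")
    return status


# Made-up data: a well-observed stretch with one bad point, then two
# isolated observations 2-3 magnitudes brighter after a long gap.
jd = [0, 1, 2, 3, 4, 5, 6, 7, 60, 61]
mag = [9.0, 9.1, 9.0, 9.2, 11.5, 9.1, 9.0, 9.1, 6.0, 6.2]
status = flag_local_outliers(jd, mag)
print(status)
```

In this example the two late observations are left unchecked rather than flagged, since with so little surrounding data there is no way to tell a real outburst from a bad measurement.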

We have looked into, and are working on, tools to do this for a subset of periodic variables, but even then they have to be checked to make sure there is enough data and that there aren't any period changes, etc. The bottom line is that it's a lot more difficult than it may seem at first glance. If you would like to help us develop these algorithms, though, I'm more than willing to listen to suggestions.


Bert Pablo
Staff Astronomer, AAVSO

AAVSO 49 Bay State Rd. Cambridge, MA 02138 617-354-0484