
Data Mining section

NMR's picture
Joined: 2010-08-04

I appreciate that the former Data Mining Section is currently inactive but would it be possible to have a forum specifically for questions and answers on data mining? That, more than anything else, might revive this area of the AAVSO's work.

data mining section
HQA's picture
Joined: 2010-05-10

Hi Martin,

We will take your suggestion under advisement.  We are currently limiting the number of forums until we see how active they are; we're trying to keep proliferation down as much as possible.  If interest is shown for a specific topic, then we may split it off.  Thanks for reading the forums!


Data Mining project...
Andrey Prokopovich
Joined: 2011-09-06


I would like to share some words about the VS-COMPAS Project.

We organized our project in October 2011 and have now discovered more than 1000 new variable stars. About 13 members have participated, but now 6-7 members are active in the team. We submit our discoveries to VSX, and of course our team once again wants to say thanks to Sebastian Otero; he's doing a great job, and sometimes we believe that he is our team member too. :) We use custom software to perform data analysis and are still working on an English manual.

If someone would like to join us, you are welcome! You can find more info about us at our website:

Project coordinator,
Andrey Prokopovich

Data Mining
ZPA's picture
Joined: 2010-07-26

Data Mining is fascinating to me, but also incredibly confusing! For what it's worth, I'd also like to see a Q&A type forum on the site, especially if experienced observers could help total introductory level ones like me! How does one get started? What kinds of projects are being worked on that a beginner could help with? I joined the actual Data Mining Section last year hoping to learn more about it, before realizing that the section I'd joined was already inactive.

Some data-mining opportunities using the public surveys
Sebastian Otero
Joined: 2010-09-19

Hi, Paul,

There are a lot of examples of data-mining activities. Nowadays one of the most rewarding activities is combining data from the different available sky surveys to find new variables or to solve already known ones without elements or a proper classification.
The VS-COMPAS team led by Andrey (thanks, Andrey, for your kind words) is doing this with a lot of success.
There are several databases with publicly available data that you can data-mine. The ones I find more interesting are ASAS-3, NSVS, CRTS, OGLE, MACHO, LINEAR, Kepler and SuperWASP. Some are easier to use than others (the first 4), but nowadays there is software capable of opening data files from all those surveys. I am mostly using the easier ones and I have years of data ahead to analyze! The possibilities are endless.
The hardest part is probably getting used to each survey's vagaries, their limitations, their pros and cons. You need to know what to expect of the data you find. Some show meaningful error figures or flags, some do not. For each survey you need to know that there are instrumental issues (exposure times, focusing, resolution) that affect the results. Each one has a different spatial resolution, so they will perform differently in crowded fields (e.g. you can expect blended magnitudes if there are stars less than 50" apart in NSVS. That shrinks to 25" in ASAS-3 for faint stars, to 15" in APASS and 10" or less in CRTS).
With ASAS-3 you have the results in 5 different apertures and sometimes you need to choose which one is best (the smallest aperture for faint stars and the largest one for bright stars).
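As a rough illustration of the crowding limits quoted above, here is a minimal Python sketch that computes the angular separation between a target and a neighbour and compares it with each survey's approximate blending distance. The function names and the dictionary are hypothetical, the distances are just the figures from this post (the ASAS-3 value applies to faint stars), and this is only a first check, not a substitute for inspecting the field.

```python
import math

# Approximate blending distances in arcseconds, taken from the post above.
# The ASAS-3 figure is the one quoted for faint stars.
BLEND_RADIUS_ARCSEC = {"NSVS": 50.0, "ASAS-3": 25.0, "APASS": 15.0, "CRTS": 10.0}

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (haversine formula); inputs in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(h))) * 3600.0

def may_be_blended(survey, sep_arcsec):
    """True if a neighbour at this separation is likely blended in the survey."""
    return sep_arcsec < BLEND_RADIUS_ARCSEC[survey]

# Hypothetical neighbour 30 arcseconds north of the target.
sep = separation_arcsec(100.0, 20.0, 100.0, 20.0 + 30.0 / 3600.0)
blend_report = {name: may_be_blended(name, sep) for name in BLEND_RADIUS_ARCSEC}
```

With a neighbour at 30", this flags NSVS as likely blended while the other three surveys should resolve the pair.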

Some surveys show HJD dates, others show HJD-2450000, others MJD or modified versions of that MJD. You need to be very careful with that and convert everything to HJD to be able to combine data.
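The constant-offset part of that conversion can be sketched in a few lines of Python. The convention labels and the function name are hypothetical; the only hard fact used is the standard definition MJD = JD - 2400000.5. Note that this sketch only restores the offset: MJD values are normally not heliocentric, so a proper conversion to HJD also needs the position-dependent heliocentric correction (a few minutes at most), which is not done here. Surveys with their own modified versions of MJD need their own offset, taken from the survey's documentation.

```python
# Constant offsets that restore the full Julian Date scale.
# "HJD-2450000" is a label chosen here for the truncated convention.
OFFSETS = {
    "HJD": 0.0,                # already a full date
    "HJD-2450000": 2450000.0,  # truncated HJD
    "MJD": 2400000.5,          # standard definition: JD = MJD + 2400000.5
}

def to_full_date(value, convention):
    """Add back the constant offset for the given date convention."""
    return value + OFFSETS[convention]

# The same instant expressed in two conventions.
from_truncated = to_full_date(3000.5, "HJD-2450000")
from_mjd = to_full_date(53000.0, "MJD")
```

Both calls give 2453000.5, which is an easy sanity check to run before merging two datasets.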

And each survey has a different filter/set of filters. Or it may be an unfiltered survey with different calibrations. E.g. ASAS-3 and APASS are V. CRTS, SuperWASP and NSVS are unfiltered surveys that are calibrated using V magnitudes, but since they are more red sensitive, their results are always brighter for the red stars. OGLE has Ic magnitudes. MACHO has Rc. HIPPARCOS has Hp magnitudes, easily converted to V unless the star is a red giant or a peculiar object. Kepler has Kp magnitudes. For most stars you can combine data to derive elements by simply shifting the observations to the same zero point (I mean when you combine a standard dataset like V with an unfiltered one).
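One simple way to do that zero-point shift is sketched below in Python. Matching the medians of the two datasets is an assumption made here for illustration (the post does not say how the offset is chosen), and it only works reasonably when both datasets sample the star's variation in a similar way. The function name and sample magnitudes are hypothetical.

```python
from statistics import median

def shift_to_reference(ref_mags, other_mags):
    """Shift other_mags so that its median matches the median of ref_mags.

    Returns the shifted magnitudes and the offset that was applied.
    """
    offset = median(ref_mags) - median(other_mags)
    return [m + offset for m in other_mags], offset

# Hypothetical example: V data and brighter unfiltered data of a red star.
v_mags = [12.0, 12.2, 12.4]
unfiltered_mags = [11.5, 11.7, 11.9]
shifted, offset = shift_to_reference(v_mags, unfiltered_mags)
```

The shift is only good for deriving elements (period and epoch); the shifted magnitudes should not be reported as true V values.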

E.g.: combining data from ASAS-3 with CRTS extends the time baseline from 9 years (2000-2009 for ASAS) to 13 years (CRTS is being updated in real time). Combining CRTS with NSVS (1999-2000) increases that baseline by a year or two more. Combining ASAS-3 with HIPPARCOS (1989-1993) you'll get a time baseline of 20 years! This greatly improves the period determinations since you have many more cycles.
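The baseline arithmetic above can be checked with a small Python sketch. The year ranges are the ones quoted in this post; the function names and the example period are hypothetical.

```python
def combined_baseline(*ranges):
    """Total time span in years covered by several (start_year, end_year) ranges."""
    start = min(r[0] for r in ranges)
    end = max(r[1] for r in ranges)
    return end - start

def cycles_covered(baseline_years, period_days):
    """Approximate number of cycles of a given period within the baseline."""
    return baseline_years * 365.25 / period_days

ASAS3 = (2000, 2009)
HIPPARCOS = (1989, 1993)

asas_only = combined_baseline(ASAS3)
asas_plus_hip = combined_baseline(ASAS3, HIPPARCOS)

# Hypothetical variable with a 300-day period.
cycles_short = cycles_covered(asas_only, 300.0)
cycles_long = cycles_covered(asas_plus_hip, 300.0)
```

For the 300-day example, the ASAS-3 data alone cover about 11 cycles, while adding HIPPARCOS spans about 24, although the gap between the two surveys means the cycle count across it still has to be checked.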

Each survey has a dynamic range where its observations are useful. HIPPARCOS is good from 1 to 10 or so, ASAS-3 from 6-7 to 13-14, NSVS from 9-10 to 14, and CRTS from 12-13 to 20 or so. You need to take that into account too. Using saturated (too bright) or too faint (large scatter) data won't be useful and may ruin your analysis.

And don't forget the AAVSO International Database. There are a lot of observations in our database that can be used for a combined analysis.

So it depends on what you like or want to do: whether you have a preferred variable type, or whether you want to find new stuff or correct old stuff.
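A magnitude cut based on those ranges might look like the following Python sketch. The exact limits are an assumption: where the post gives a fuzzy boundary such as "6-7" or "13-14", the more conservative value is used here. The names are hypothetical, and real limits also depend on the field and the survey's own flags.

```python
# (bright limit, faint limit) in magnitudes, conservative reading of the
# approximate ranges quoted in the post above.
USEFUL_RANGE = {
    "HIPPARCOS": (1.0, 10.0),
    "ASAS-3": (7.0, 13.0),
    "NSVS": (10.0, 14.0),
    "CRTS": (13.0, 20.0),
}

def keep_useful(survey, observations):
    """Keep (date, magnitude) pairs inside the survey's useful range."""
    bright, faint = USEFUL_RANGE[survey]
    return [(t, m) for (t, m) in observations if bright <= m <= faint]

# Hypothetical ASAS-3 points: one saturated, one good, one too faint.
raw = [(2453000.5, 6.2), (2453010.5, 10.5), (2453020.5, 13.8)]
good = keep_useful("ASAS-3", raw)
```

Only the 10.5 magnitude point survives the cut in this example.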

VSX gives you a lot of data-mining opportunities too.
You can help us keep VSX up to date and with improved information.
VSX offers you the chance to make different types of searches, e.g. searches by variability type, by magnitude, by name.
For instance, you can find Mira types with no period or eclipsing binaries with no period (any search you make offers you the chance to order the results by a number of parameters, including period) and then you can data-mine the public surveys to find the star's elements and submit a revision to VSX.
(First, do a literature check of the star in SIMBAD or the ADS to see whether a paper with that information has already been published. If you find one, you can still submit a revision to add the information, citing the relevant paper in the reference field of the VSX revision form. That is also a great help in keeping the database updated.)

Another example is revising stars with ambiguous classifications in surveys. This would help improve the statistics on the different variability types. For instance, most surveys automatically assign multiple types to the same star. A star may be classified as EC|ESD|RRC|DSCT when it is an EW, so defining the correct type is another interesting contribution to help clean up the database. In this same example, the period may be half or twice the actual value, because if the binary solution is chosen the period will be longer (two maxima and minima) and if the pulsating solution is preferred, the period will be shorter (just one minimum and one maximum).
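To compare the binary and pulsating solutions, it helps to fold the data on the catalogued period and on half and twice that value. The Python sketch below shows only the folding step; the function names and sample values are hypothetical, and deciding which folded curve is correct (e.g. by looking for unequal minima) is left to visual inspection.

```python
def phase(hjd, epoch, period):
    """Phase in [0, 1) of an observation for a given epoch and period."""
    return ((hjd - epoch) / period) % 1.0

def candidate_periods(period):
    """Half, same and double period, to test pulsating vs binary solutions."""
    return [period / 2.0, period, period * 2.0]

def fold(observations, epoch, period):
    """Return (phase, magnitude) pairs sorted by phase."""
    return sorted((phase(t, epoch, period), m) for (t, m) in observations)

# Hypothetical data folded on each candidate period.
obs = [(2453000.0, 11.0), (2453000.25, 11.4), (2453001.0, 11.0)]
folded = {p: fold(obs, 2453000.0, p) for p in candidate_periods(0.5)}
```

Plotting each entry of `folded` side by side usually makes it clear whether the longer period separates two different minima.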

Also correcting wrong classifications, updating ranges, etc. using survey data is welcome.

There may be more data-mining projects in the future about this (e.g. checking GCVS stars with poor information) so keep your eyes open.
But you can start now with the many possibilities that data-mining offers you.

Ah! What I had forgotten to mention was that you can get access to most of the surveys I mentioned just by doing a VSX search on position and then clicking on each survey's name in the VSX external links menu.


data mining
daveh's picture
Joined: 2012-10-08

I've been doing some data mining for about the last 4 months with the Lowell Observatory LARI (Lowell Amateur Research Initiative) program.  The project I'm working on is concerned with extrasolar transits (planet hunting) rather than variable stars, but utilizes similar techniques.

I've been planning to become involved in the AAVSO data mining activity for some time now and hopefully will get started in the near future.  So, I'd be very interested in participating if the forum is active.  I've actually been using some of the AAVSO analysis tools for my LARI work.

Data Mining
ZPA's picture
Joined: 2010-07-26

  I just wanted to thank Sebastian for providing so much detailed information in his post. It seems to me a good start (correct me if I'm wrong) would be to get more familiar with the ASAS-3 and other databases that you mentioned, and also with the information that VSX can provide. I'll read through your entry more thoroughly when time permits. Thanks again!

AAVSO 49 Bay State Rd. Cambridge, MA 02138 617-354-0484