
Data Mining section

NMR
Data Mining section

I appreciate that the former Data Mining Section is currently inactive but would it be possible to have a forum specifically for questions and answers on data mining? That, more than anything else, might revive this area of the AAVSO's work.

HQA
data mining section

Hi Martin,

We will take your suggestion under advisement.  We are currently limiting the number of forums until we see how active they are; we're trying to keep proliferation down as much as possible.  If interest is shown for a specific topic, then we may split it off.  Thanks for reading the forums!


Andrey Prokopovich
Data Mining project...


I would like to share some words about the VS-COMPAS Project.

We organized our project in October 2011 and have now discovered more than 1000 new variable stars. About 13 members have participated, but 6-7 members are currently active in the team. We submit our discoveries to VSX, and of course our team once again wants to say thanks to Sebastian Otero. He's doing a great job, and sometimes we believe that he is our team member too. :) We use custom software to perform the data analysis and are still working on an English manual.

If someone would like to join us, you are welcome! You can find more info about us at our website:

Project coordinator,
Andrey Prokopovich

ZPA
Data Mining

   Data Mining is fascinating to me, but also incredibly confusing! For what it's worth, I'd also like to see a Q&A type forum on the site, especially if experienced observers could help total introductory level ones like me! How does one get started? What kinds of projects are being worked that a beginner could help with? I joined the actual Data Mining Section last year hoping to learn more about it, before realizing that the section I'd joined was already inactive.

Sebastian Otero
Some data-mining opportunities using the public surveys

Hi, Paul,

There are a lot of examples of data-mining activities. Nowadays one of the most rewarding activities is combining data from the different available sky surveys to find new variables or to solve already known ones without elements or a proper classification.
The VS-COMPAS team led by Andrey (thanks, Andrey, for your kind words) is doing this with a lot of success.
There are several databases with publicly available data that you can data-mine. The ones I find most interesting are ASAS-3, NSVS, CRTS, OGLE, MACHO, LINEAR, Kepler and SuperWASP. Some are easier to use than others (the first 4), but nowadays there is software capable of opening data files from all those surveys. I am mostly using the easier ones and I have years of data ahead to analyze! The possibilities are endless.
The hardest part is probably getting used to each survey's vagaries, their limitations, their pros and cons. You need to know what to expect of the data you find. Some show meaningful error figures or flags, some do not. For each survey you need to know that there are instrumental issues (exposure times, focusing, resolution) that affect the results. Each one has a different spatial resolution, so they will perform differently in crowded fields (e.g. you can expect blended magnitudes if there are stars less than 50" apart in NSVS; that shrinks to 25" in ASAS-3 for faint stars, to 15" in APASS and to 10" or less in CRTS).
With ASAS-3 you have the results in 5 different apertures and sometimes you need to choose which one is best (the smallest aperture for faint stars and the largest one for bright stars).
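
As a rough illustration of the crowded-field point, here is a minimal Python sketch that computes the separation between a target and a neighbouring star and lists the surveys where blending could be expected. The radii are the approximate figures quoted in the post (the ASAS-3 value applies to faint stars); the function names and the example coordinates are made up for this sketch.

```python
import math

# Approximate blending radii in arcseconds, as quoted in the post.
BLEND_RADIUS = {"NSVS": 50.0, "ASAS-3": 25.0, "APASS": 15.0, "CRTS": 10.0}


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation of two positions given in decimal degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600


def surveys_likely_blended(sep):
    """Surveys whose quoted blending radius exceeds the separation."""
    return [name for name, radius in BLEND_RADIUS.items() if sep < radius]


# Hypothetical target and neighbour, 30" apart in declination.
sep = separation_arcsec(150.0, 20.0, 150.0, 20.0 + 30.0 / 3600)
print(surveys_likely_blended(sep))  # ['NSVS']
```

This only flags a possible problem. Whether a blend actually matters also depends on how bright the neighbour is.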

Some surveys show HJD dates, others show HJD-2450000, others MJD or modified versions of that MJD. You need to be very careful with that and convert everything to HJD to be able to combine data.
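
A minimal Python sketch of the offset part of that conversion follows. The labels in the table are hypothetical names chosen for this example. It only restores the full Julian Date: survey-specific "modified" MJD variants use their own offsets, which must be looked up in each survey's documentation, and a time that is not heliocentric still needs the heliocentric correction (which depends on the star's coordinates), which is not done here.

```python
# Offsets that turn each survey's time stamp into a full Julian Date.
# The labels are hypothetical names for this sketch, not survey keywords.
OFFSETS = {
    "HJD": 0.0,                # already a full Julian Date
    "HJD-2450000": 2450000.0,  # truncated HJD
    "MJD": 2400000.5,          # standard Modified Julian Date
}


def to_full_jd(value, system):
    """Return the full Julian Date for a time stamp in the given system."""
    return value + OFFSETS[system]


print(to_full_jd(3000.5, "HJD-2450000"))  # 2453000.5
print(to_full_jd(53000.0, "MJD"))         # 2453000.5
```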

And each survey has a different filter or set of filters. Or it may be an unfiltered survey with a different calibration. E.g. ASAS-3 and APASS are V. CRTS, SuperWASP and NSVS are unfiltered surveys that are calibrated using V magnitudes, but since they are more red-sensitive, their results are always brighter for the red stars. OGLE has Ic magnitudes. MACHO has Rc. HIPPARCOS has Hp magnitudes, easily converted to V unless the star is a red giant or a peculiar object. Kepler has Kp magnitudes. For most stars you can combine data to derive elements by simply shifting the observations to the same zero point (I mean when you combine a standard dataset like V with an unfiltered one).
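
One simple way to do such a zero-point shift is sketched below in Python. It assumes that matching the median magnitudes of the two datasets is good enough, which only holds when both datasets sample the light curve in a similar way; this is an illustrative choice, not necessarily the procedure used by the poster. The magnitudes are invented.

```python
from statistics import median


def shift_to_reference(reference_mags, other_mags):
    """Shift other_mags so that its median matches the reference median."""
    offset = median(reference_mags) - median(other_mags)
    return [m + offset for m in other_mags], offset


# Made-up magnitudes for illustration only.
v_mags = [12.10, 12.45, 12.30, 12.60, 12.20]
unfiltered_mags = [11.90, 12.25, 12.10, 12.40, 12.00]

shifted, offset = shift_to_reference(v_mags, unfiltered_mags)
print(round(offset, 2))  # 0.2
```

The shifted data are fine for deriving a period or epoch, but they should not be reported as V magnitudes.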

E.g.: combining data from ASAS-3 with CRTS extends the time baseline from 9 years (2000-2009 for ASAS) to 13 years (CRTS is being updated in real time). Combining CRTS with NSVS (1999-2000) increases that baseline by a year or two more. Combining ASAS-3 with HIPPARCOS (1989-1993), you'll get a time baseline of 20 years! This greatly improves the period determinations since you have many more cycles.
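
The baseline arithmetic can be sketched in a few lines of Python. The coverage years are the approximate ones given in the post; the 0.5-day period in the usage lines is a made-up value used only to show how many more cycles a longer baseline covers.

```python
# Approximate coverage in calendar years, taken from the post above.
COVERAGE = {
    "HIPPARCOS": (1989, 1993),
    "NSVS": (1999, 2000),
    "ASAS-3": (2000, 2009),
}


def combined_baseline(surveys):
    """Years between the earliest start and the latest end of the surveys."""
    start = min(COVERAGE[s][0] for s in surveys)
    end = max(COVERAGE[s][1] for s in surveys)
    return end - start


def cycles_covered(baseline_years, period_days):
    """Approximate number of cycles of a given period in the baseline."""
    return baseline_years * 365.25 / period_days


print(combined_baseline(["ASAS-3"]))               # 9
print(combined_baseline(["ASAS-3", "HIPPARCOS"]))  # 20
print(cycles_covered(20, 0.5))                     # 14610.0
```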

Each survey has a dynamic range where its observations are useful. HIPPARCOS is good from magnitude 1 to 10 or so, ASAS-3 from 6-7 to 13-14, NSVS from 9-10 to 14, and CRTS from 12-13 to 20 or so. You need to take that into account too. Using saturated (too bright) or too faint (large scatter) data won't be useful and may ruin your analysis.

And don't forget the AAVSO International Database. There are a lot of observations in our database that can be used for a combined analysis.

So it depends on what you like or want to do: whether you have a preferred variable type, and whether you want to find new stuff or correct old stuff.
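
A minimal Python sketch of filtering by dynamic range is shown below. The post gives soft limits such as "6-7 to 13-14"; the single numbers in the table are an assumption made for this example (the more conservative end of each quoted range), and the data points are invented.

```python
# Usable magnitude ranges (bright limit, faint limit). The post quotes soft
# limits such as "6-7 to 13-14"; the single numbers here are an assumption.
USABLE_RANGE = {
    "HIPPARCOS": (1.0, 10.0),
    "ASAS-3": (7.0, 13.0),
    "NSVS": (10.0, 14.0),
    "CRTS": (13.0, 20.0),
}


def keep_usable(survey, observations):
    """Keep (time, magnitude) pairs inside the survey's usable range."""
    bright, faint = USABLE_RANGE[survey]
    return [(t, m) for t, m in observations if bright <= m <= faint]


# Made-up ASAS-3 points: the first is saturated, the last is too faint.
points = [(2453000.5, 6.2), (2453001.5, 9.8), (2453002.5, 13.9)]
print(keep_usable("ASAS-3", points))  # [(2453001.5, 9.8)]
```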

VSX gives you a lot of data-mining opportunities too.
You can help us keep VSX up to date and with improved information.
VSX offers you the chance to make different types of searches, e.g. searches by variability type, by magnitude, by name.
For instance, you can find Mira types with no period or eclipsing binaries with no period (any search you make offers you the chance to order the results by a number of parameters, including period) and then you can data-mine the public surveys to find the star's elements and submit a revision to VSX.
(First, do a literature check of the star in SIMBAD or the ADS to see whether a paper with that information has already been published. If you find one, you can still submit a revision to add the information, citing the relevant paper in the reference field of the VSX revision form. That is also a great help in keeping the database updated.)

Another example is revising stars with ambiguous classifications in surveys. This would help improve the statistics on the different variability types. For instance, most surveys automatically assign multiple types to the same star. A star may be classified as EC|ESD|RRC|DSCT when it is an EW, so defining the correct type is another interesting contribution to help clean up the database. In this same example, the period may be half or twice the actual value, because if the binary solution is chosen the period will be longer (two maxima and minima) and if the pulsating solution is preferred, the period will be shorter (just one minimum and one maximum).
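
To check the half-or-double ambiguity, one option is to fold the data on the catalogued period and on half and twice that value, then inspect each phase plot. The Python sketch below shows only the folding step; the 0.25-day period and the times are made up, and deciding which fold shows the real light curve (for example, two unequal minima for a binary) is still up to you.

```python
def fold(times, period, epoch=0.0):
    """Return the phase (0 to 1) of each time for the given period."""
    return [((t - epoch) / period) % 1.0 for t in times]


def candidate_periods(survey_period):
    """Half, same and double the catalogued period, to be checked by eye."""
    return [survey_period / 2, survey_period, survey_period * 2]


# Hypothetical survey period of 0.25 d for a star typed EC|ESD|RRC|DSCT.
times = [0.0, 0.125, 0.25, 0.375]
for p in candidate_periods(0.25):
    print(p, fold(times, p))
```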

Correcting wrong classifications, updating ranges, etc. using survey data is also welcome.

There may be more data-mining projects in the future about this (e.g. checking GCVS stars with poor information) so keep your eyes open.
But you can start now with the many possibilities that data mining offers.

Ah! What I had forgotten to mention was that you can get access to most of the surveys I mentioned just by doing a VSX search on position and then clicking on each survey's name in the VSX external links menu.


daveh
data mining

I've been doing some data mining for about the last 4 months with the Lowell Observatory LARI (Lowell Amateur Research Initiative) program.  The project I'm working on is concerned with extrasolar transits (planet hunting) rather than variable stars, but utilizes similar techniques.

I've been planning to become involved in the AAVSO data mining activity for some time now and hopefully will get started in the near future.  So, I'd be very interested in participating if the forum is active.  I've actually been using some of the AAVSO analysis tools for my LARI work.

ZPA
Data Mining

  I just wanted to thank Sebastian for providing so much detailed information in his post. It seems to me a good start (correct me if I'm wrong) would be to get more familiar with the ASAS-3 and other databases that you mentioned, and also with the information that VSX can provide. I'll read through your entry more thoroughly when time permits. Thanks again!

drob

Good Morning All,

I want to generate a list of EB stars by data mining VSX.  This is part of a project that I hope to be able to present at the fall meeting of the AAVSO.  Apparently I can only search for information about EB stars one at a time.  Is there a trick to do this?  Is there third-party software that will let me gather this data, and will VSX let me do this data mining?  Finally, another hurdle is that my computer's operating system is Ubuntu, which does not support Java, so the solution cannot be VStar, though I could use one of my local library's computers.

Is it obvious that I am a novice at this?  I would appreciate any and all help.

Cheers and finally clear skies in New England!




BGW
a useful EB data compilation

Not to dissuade you from mining VSX for EBs, Bob (and I too am interested in knowing the answer to your question about searching VSX), but just in case you don't know about it, I thought I'd let you know about a dataset that I find helpful.  It's a compilation of basic data about nearly 7200 EBs by Avvakumova et al. 2013, which supersedes a similar one by Malkov et al. 2006.

The Avvakumova+ catalog can be downloaded from VizieR (just search on that author name).  There are lots of knobs to twiddle to get what you want at VizieR.  Most importantly, while the default format is an HTML table, you can get ASCII text delimited in various ways, which is easy to import into Excel, for example.

VSX will include more EBs than Avvakumova, but the additional ones will usually have less information available, less accurate elements, etc.

I hope this is useful,

Gary Billings

drob
EB data compilation



Thanks for the information.  I will look at the data tomorrow.




Bob Dudley

Sebastian Otero
Searching by type in VSX

Yes, you can search by variability type in VSX.

Click on the More button in the VSX search page.
You'll first get an additional coordinate-based search in addition to the identifier search.
Click on the More button once again and you'll end up with a large number of search options. You'll be able to search by variability type, magnitude, period, spectral type and some other options.
If you are looking for EB-type systems (I recommend not using "EB" as an acronym for "Eclipsing Binary" because EB stands for beta Lyrae-type systems) write EB in the Variability type field and you'll get a list of the EB type stars in VSX. We have now limited searches to 1,000 results, but if there turn out to be more, you will be able to download them all as a CSV file.
If you also want suspected EB stars listed, as well as EB stars with additional variability types or subtypes, write EB% and you'll have them all (e.g.: EB/RS, EB+DSCT, etc).

If you are looking for all eclipsing systems in VSX, you can write E% instead of EB, but I don't recommend that search since there will be more than 91,000 results (and some other types starting with the letter E will be included, like ELL, EXOR, EP). Use the subtypes. E.g.: EA%.
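
To see why E% pulls in unrelated types, here is a small Python sketch that mimics the wildcard on a local list of type strings. It assumes that % stands for "any run of characters, including none", as the examples in the post suggest; it does not query VSX, and the helper names are made up.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern using % as 'any run of characters' to a regex."""
    parts = (re.escape(piece) for piece in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$")


def match_types(pattern, types):
    """Return the type strings that the pattern matches."""
    regex = like_to_regex(pattern)
    return [t for t in types if regex.match(t)]


# A small sample of type strings mentioned in this thread.
sample = ["EB", "EB/RS", "EB+DSCT", "EB/KE", "EA", "EW", "ELL", "EXOR", "EP"]

print(match_types("EB", sample))   # ['EB']
print(match_types("EB%", sample))  # ['EB', 'EB/RS', 'EB+DSCT', 'EB/KE']
print(match_types("EA%", sample))  # ['EA']
print(len(match_types("E%", sample)))  # 9
```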


Variability type

Next to the variability type search box, there is also a listbox showing some groups of variables to search for.  Eclipsing binaries are one of the possible groups.


David Benn
VSX web service and VStar under Ubuntu

Hi Bob, all

Regarding VSX and VStar...


To answer your question about VSX, it has a web service you can use to query particular types, e.g. clicking this link:

returns results for EB/KE eclipsing binaries recorded in VSX. This thread on the Software Development forum provides more detail:


VStar does indeed run under Ubuntu, just not with the default version of Java that normally comes with the distribution. VStar runs under Windows, Mac OS X, Linux and OpenSolaris. To obtain a suitable version of Java to run VStar for Linux, see:

The second or third link above is what you need. I would suggest that the simplest way to install it for Ubuntu is via this method: for Linux Platforms

I have not done this for a while under Ubuntu, but if you have any problems, I can certainly help.

Of course, VStar will only be of use to you if you wish to analyse particular datasets from various sources as opposed to doing VSX queries.

Don't hesitate to ask questions.


drob
David,   Thanks for the



Thanks for the information.  Obviously I posted my inquiry on the wrong forum.




Bob Dudley

AAVSO 49 Bay State Rd. Cambridge, MA 02138 617-354-0484