
Jupyter Notebooks for data analysis

mau
Jupyter Notebooks for data analysis

I am the Education Specialist for the Large Synoptic Survey Telescope (LSST), which is currently under construction in Chile. My job is to create educational activities that will enable users to access LSST datasets through Python-based online Jupyter notebooks. We are interested in determining whether there is significant interest in the amateur community to study/discover variable stars in LSST data through the use of LSST data search and analysis tools that we can develop using Jupyter notebooks. These notebooks would be made freely available to all.

If any of you already have experience with this technology, I would love to talk with you. You can reply here or directly to my work e-mail: aherrold@lsst.org

If you don't know about the potential of Jupyter notebooks, here is a link: http://ls.st/pfl

 

 

Ardis Herrold

Bikeman
Great idea

I think this is a great idea. The LSST will be a wonderful discovery machine and many of us here will be extremely interested in learning how to do meaningful research with the data.

I work in the LIGO collaboration and, as you probably already know, it used Jupyter notebooks as part of its education and public outreach (EPO) material after the first gravitational-wave discovery was announced last year. I think the feedback on that was quite favorable.

CS

HBE

mau
Thank you!

We were wondering whether there are in fact non-students out there who might be interested in data access. Would it be okay with you if I passed your name along to another person in our group to follow up on your thoughts? I will not do so without your permission. If yes, could you please provide a contact e-mail address?

Ardis

Bikeman
Sure,

feel free to contact me: heinz-bernd.eggenstein AT aei.mpg.de

CS

Heinz-Bernd Eggenstein

 

dokeeffe
Ardis,

I think this is a great idea. I use Jupyter quite a bit for photometry and for day-to-day work (I work as a software engineer).

I recently posted an example of one of my notebooks here

Are you thinking of some sort of library to make it easier to access the data through a notebook? I wonder whether integration with astropy/astroquery (https://github.com/astropy/astroquery) would be a good or bad idea?

Derek

derek0207 AT gmail.com

mau
Jupyter for photometry

Derek,  

I went to look at your notebook example and it is great! We certainly will do mash-ups with other data sets and tools, driven by user needs and interests. Thanks for your reply.

David Benn
Great

Hi Ardis

I'm using Jupyter notebooks more and more these days (for work and astronomy) so I think this is great and want to echo the approval of others.

I also appreciate those contributing notebooks within AAVSO, such as Derek.

As an aside, if you or others think there's benefit in observation source and/or analysis plugins that make use of LSST APIs within VStar, I'd be happy to talk about that too.

David

mau
Thanks

David,

Thanks for the feedback. I will keep this API option in mind. We do intend to offer APIs to data users, and the Jupyter notebook option is just a more guided way of doing all sorts of analyses with the widgets that we (and potentially other data users) develop.

I'm interested to know how you are currently using Jupyter for astronomy applications. Do you have any sample notebooks to show me, as Derek did? We're trying to get a sense of how others in the amateur community are using them.

Ardis

David Benn
Sample notebooks

Hi Ardis

Most of what I do is along the lines of this kind of sequence:

  • load a file (e.g. CSV) into a pandas dataframe +/- extracting particular columns
  • apply constraints to the dataframe (i.e. filtering)
  • generate some plots
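As a rough illustration of that load / filter / plot sequence, here is a minimal sketch using pandas. The column names (jd, magnitude, uncertainty, band), the sample values, and the helper function names are all made up for the example; a real notebook would read an actual file rather than the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical observation data; a real run would pass a file path
# to pd.read_csv instead of this in-memory sample.
SAMPLE_CSV = """jd,magnitude,uncertainty,band
2457700.5,8.42,0.02,V
2457701.5,8.51,0.03,V
2457702.5,8.47,0.15,V
2457703.5,9.10,0.02,B
2457704.5,8.39,0.02,V
"""


def load_observations(source):
    """Load a CSV into a DataFrame, keeping only the columns of interest."""
    return pd.read_csv(source, usecols=["jd", "magnitude", "uncertainty", "band"])


def filter_observations(df, band="V", max_uncertainty=0.05):
    """Keep rows in one band whose uncertainty is at or below the limit."""
    mask = (df["band"] == band) & (df["uncertainty"] <= max_uncertainty)
    return df[mask].reset_index(drop=True)


def plot_light_curve(df):
    """Plot magnitude against Julian Date (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.errorbar(df["jd"], df["magnitude"], yerr=df["uncertainty"], fmt="o")
    ax.invert_yaxis()  # brighter stars have smaller magnitudes
    ax.set_xlabel("JD")
    ax.set_ylabel("Magnitude")
    return ax


observations = load_observations(io.StringIO(SAMPLE_CSV))
good_v = filter_observations(observations)
print(good_v)
```

Calling plot_light_curve(good_v) in a notebook cell would then draw the filtered points with error bars.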

If I have a particular dataframe that is more useful, I'll post it.

Most of the DSLR photometry Python scripts (e.g. FITS header changes) I still run via the command-line. I'm a bit old school at times. Having said that, I've had two thoughts about this lately:

  • I should combine the notes I make in a text editor for DSLR photometry processing with the Python scripts and other commands plus markdown via a notebook.
  • Converting the excellent DSLR photometry spreadsheet into a notebook would be nice. pandas would be great for this. I just don't have a lot of spare time. Happy to work with others on this.

It'd also be interesting to explore TA and/or TG possibly having notebook incarnations.

David

cmorsoc
Jupyter Notebooks for data analysis

Hi Ardis, I'm interested in this both as a high school maths teacher [1] and as an amateur astronomer and AAVSO member, so please add my email to your teachers or amateurs list! :-)

[1] http://www.iac.es/divulgacion.php?op1=16&id=1173&lang=en

email: cmorsoc @ gmail.com

Canary Islands, SPAIN

Thanks!

mau
impressive accomplishments

Hi Carlos,

Thanks for sharing your thoughts about this. I read the article about the students you worked with. These are very impressive accomplishments for them, and also for you, for creating this excellent project-based application. I was an astronomy teacher for almost 4 decades, and that is the sort of work I most enjoyed with my students; I found it to be the most effective in creating lifelong learning experiences. I am now working for LSST, designing research tools for future students and amateurs. Please send any other ideas you may have my way. I was delighted to learn that you are a math teacher using science data sets. Your students are indeed fortunate.

Ardis

 

cmorsoc
impressive accomplishments

Thanks Mau. We really enjoyed the project. I'm very proud of them because they come from low-income families, but they really fight to learn every day.

See you 

hgeagle
Re: Jupyter Notebooks for data analysis

Hello Ardis,

Your effort is a wonderful idea. I believe that, because of their interactivity, Jupyter notebooks will become very important for analysis, teaching, and learning in general, and will be the way to go for many research efforts, including those of the AAVSO.

I like Derek's photometry efforts with Jupyter notebooks and David's LSST plug-in suggestion for VStar a lot, too.

Here is my "spin" on future Pro-Am collaborations (and I really believe there will be huge new opportunities for both parties): Getting access to the data from the upcoming large survey efforts (such as LSST) is not a trivial task, and amateur astronomers in particular will need support from the professional research community here. I also believe that, because of the vastness of the data available in the near future, amateur astronomers should be and will be more and more interested in data mining activities. I would not be surprised if the Pros wanted to offload some of the analysis/data mining activities to amateurs.

"Data sciences" are an exploding field right now, and while there might currently be a lot of hype involved, there is no doubt that the tools of the trade (e.g. "machine learning") could be usefully applied to the new large astronomy datasets. If there is a path to involve amateur astronomers in these activities, it might even be possible to attract a larger number of young people to work on these problems as new members of the AAVSO (I think the AAVSO is struggling a little in this area...). So much for "pure" data mining...

Another aspect I think is important to mention is the complementary activities of Pros and Ams with regard to instrumentation and, e.g., the time that can be spent on a target. The large surveys appear to have detectors that will saturate relatively early (compared to "typical" amateur instrumentation). Isn't mag = 16 a value for the LSST (not so easy to reach for amateurs)? If LSST finds variable stars that vary or go into outburst to brighter values, the LSST cannot follow them, but amateurs could. Thus, a path for alerts or requests for follow-up investigations (performed by amateurs) should be created, maybe based on the data-mining efforts mentioned before (at least for the slowly changing targets).

Jupyter notebooks could be a key to provide better transparency and understanding for all these efforts.

So, please, yes, Ardis, you have my vote for the Jupyter notebook approach, in particular for future activities of/with the AAVSO (but I am not a spokesperson of the AAVSO, just a very interested member...)!

Best Wishes,

Helmar   

     

 

mau
opportunities for research

Helmar,

Thanks for your thoughts on this. I, too, see data mining as a vehicle to draw more young people into doing scientific observations. I taught astronomy on the outskirts of Detroit, and the overwhelming light pollution plus persistent cloudiness made it a challenge for students to go outside and do routine observations of the night sky. At times we could use remote telescopes, but I also liked the option of data mining as a route to exploring the sky. Data mining also provides advanced students a chance to be in the driver's seat and design their own investigations.

AAVSO has a long history of pro-am collaboration, and that is one reason I am so proud to support it. I am also very excited to work for the LSST, since we will offer open access to data. I believe it will be a game-changer, offering a time-domain record of observations for an entire decade and detecting over a million transient sources per night. It's hard to even wrap my mind around this. Additionally, the deep data (to magnitude 25) will turn up many interesting discoveries.

Since LSST is a survey telescope, it will not be possible to request specific observations; it will repeat its observing cadence approximately once every three nights. Over ten years, it will generate petabytes of data, and the co-added images will yield very deep fields.

Our intent in developing Jupyter notebooks is to provide a set of easy-to-use tools to explore and analyze the data, even for those with no background in programming. Additionally, these notebooks will be online, so there will be no software to install and no bandwidth or memory issues.

 

 

SFS
Downloadable, please!

I like everything being said here except the very last paragraph, where it is stated that the Jupyter notebooks will be online. I don't use Jupyter much because I write code for my own use and don't like the slowness of a browser on top of my Python code. (It's a generational thing.) I think it would be very advantageous to those of us who want to look at data in different ways to have direct access to the database(s) and a mechanism for downloading the Python code. Cutting and pasting from the notebook is less than ideal because of various differences between a Jupyter session and an IDLE session.

CS,

Stephen

mau
possibly downloadable

Stephen,

Our plan for the formal education activities is to use online Jupyter notebooks because they eliminate the problems of bandwidth, installing specialized software into school computers, and firewall issues. But for API users, a few options are possible. First, we could design Jupyter notebooks specifically for variable star observers that would contain the functionality important to this group. We expect that most users will like the option of Jupyter notebooks because of their ease of use.

Downloading the actual data is a possibility, but few people have capacity for 700 terabytes. Plus it costs us money for each download, at the rate of 10 cents per GB, so you can see how this could quickly get pricey. Another option is that you could write code for your own broker and then query only the data you need. That is still likely to be a vast amount, unless you have a very specialized interest. We are still in the process of thinking through the API and data quotas, so thanks for your input. If you have any other thoughts on this, please send them along.
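To put rough numbers on the figures quoted above (700 terabytes at 10 cents per gigabyte), here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and a flat rate with no quotas or discounts; the 300 MB "targeted query" size and the function name are hypothetical, chosen only for comparison.

```python
def download_cost_usd(size_mb, rate_cents_per_gb=10):
    """Cost in US dollars, assuming decimal units (1 GB = 1000 MB)."""
    cost_cents = size_mb * rate_cents_per_gb / 1000
    return cost_cents / 100


FULL_ARCHIVE_MB = 700 * 1000 * 1000  # 700 TB expressed in MB
TARGETED_QUERY_MB = 300              # hypothetical targeted query size

full_cost = download_cost_usd(FULL_ARCHIVE_MB)
query_cost = download_cost_usd(TARGETED_QUERY_MB)

print(f"Full archive:   ${full_cost:,.2f}")
print(f"Targeted query: ${query_cost:,.2f}")
```

Under these assumptions a single full download comes to about $70,000, while a query of a few hundred megabytes costs a few cents, which is why per-user quotas matter more than the price of any one small query.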

dokeeffe
downloadable

Yes, the download would be expensive, but bear in mind that you will also pay for the compute and memory resources if a lot of data reduction is done on your side. It depends on your infrastructure, I guess.

David Benn
Data and processing costs

That makes sense. A plain old naive VStar observation source plugin to download data would make less sense then. I see now that it's preferable to keep your data and compute close together given the quantity. I'm used to compute clusters at work so I get that notebooks would be useful for exposing an interface to this.

However, if your API allows this kind of thing to be expressed: carry out operation sequence S on data with constraints C returning result R in format F, that would be useful for remote use via an application plugin.
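To make the "operation sequence S, constraints C, result R, format F" idea concrete, here is one possible shape for such a request, sketched as a Python dataclass serialized to JSON. Everything here is hypothetical: the class name, the field names, the operation names, and the constraint keys are invented for illustration and do not describe any actual LSST API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AnalysisRequest:
    """Hypothetical request: run operations S on data matching constraints C."""

    operations: list             # S: ordered operation names
    constraints: dict            # C: filters restricting the input data
    result: str                  # R: which result to return
    result_format: str = "csv"   # F: serialization of the result

    def to_json(self):
        """Serialize the request so a plugin could send it to a remote service."""
        return json.dumps(asdict(self), sort_keys=True)


request = AnalysisRequest(
    operations=["select_band", "bin_by_night", "fit_period"],
    constraints={"ra_deg": 83.8, "dec_deg": -5.4, "radius_arcsec": 5.0, "band": "r"},
    result="light_curve",
    result_format="csv",
)

payload = request.to_json()
print(payload)
```

The point of the design is that only the small request and the summarized result cross the network, while the heavy processing stays next to the data.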

David

SFS
Not terabytes

I see your point, and certainly did not wish to imply a desire to download the entire database; that would be exceedingly stupid. But what I envision is a targeted query in which one would download no more than perhaps a few hundred megabytes and do the crunching offline. The cost of computer time is a rather abstract concept to someone who has a couple of workstation-quality computers lying around the house otherwise gathering dust, but connect time, like Gbytes downloaded, is something I have to pay for.

I use VizieR all the time and have never needed to download even .000001% of their database.

Another question: will it be possible for users to modify the online Jupyter notebooks, or to upload one? That would eliminate the need to download anything but a record of the query and the summarized results.

CS,

Stephen

mau
Modifying Jupyter

It certainly will be possible to modify, share or customize any Jupyter notebook, although I am not sure about uploading one.

Ardis

 

YPFA
A Good Idea ...

Hi Ardis

You asked "whether there is significant interest in the amateur community to study/discover variable stars in LSST data through the use of LSST data search and analysis tools".

I'd say that the short answer is "yes".

Amateur astronomy is changing. Although there will always be those who wish to grind mirrors and build their own telescopes, it is getting harder to do useful science with amateur-sized instruments, and so it makes real sense for amateurs to leave serious data collection to the professionals (e.g. SDSS and LSST). Also, lots of would-be amateur astronomers live in large, light-polluted cities where observational astronomy with their own telescopes is scarcely possible.

What to do then, if you wish to do meaningful astronomical research? The LSST project would seem to offer a terrific opportunity to gain access to large quantities of high-quality data (I'm thinking of variable stars now). Using this data, and having some knowledge of statistical techniques, data-mining know-how and/or data visualization techniques, a diligent amateur could analyze the LSST-supplied data and potentially make discoveries. All they would need, apart from the knowledge mentioned above, is a decent laptop or desktop computer and the software which implements the analysis techniques, the latter of which, it seems, LSST is willing to supply. We are now entering "an era in which the algorithm [not the telescope] is the instrument" (Thomas Loredo of Cornell Uni - Australian Sky and Telescope, p.31, October 2016).

So, yes, a great idea. Even though the impending flood of data is at least 5 years away, it's important to be ready.

Finally, I think you are interested in targeting non-students (based on some of your other posts in this thread)? If so, that too is a good idea. There is always a lot of emphasis on high-school kids (and rightly so) to get them interested in astronomy, but don't forget the large numbers of older people, particularly retirees - some of them AAVSO members - who are no longer so keen to stand out all night observing in the cold but who still have lots of time on their hands. Analysis of already-collected data could really appeal to this group.

And (really finally, this time), have you given any thought as to what the appropriate skill set might be for a serious amateur to be able to do useful analysis of LSST data? Armed with this sort of information, an amateur might choose to spend the next 5 years learning, reading and generally coming up to speed, in order to be ready to contribute.

All the best: Paul

mau
Skill set + Jupyter

Paul, 

Thanks for your thoughtful reply. I too am getting to the point where I don't enjoy freezing outdoors all night. I solved it for the time being by partially automating my telescope. :-)

But seriously, I think there are a considerable number of folks out there who may find "data observing" appealing. Since the magnitudes LSST will deliver will be from about +17 to +26, it will be like exploring an entirely new sky for most people. In addition, one should be able to identify variables in the LMC and SMC and confirm their membership by metallicity.

My goal is to design Jupyter notebooks for a novice learner: one who has little prior astronomy training and no coding experience. We will be creating a suite of introductory variable star Jupyter notebooks for use in formal education that will introduce the concepts of light curves, standard candles, evolutionary stages (think AGB) and different types of variable stars. Certainly supernovae will figure prominently in this as well.

Since all of our Jupyter notebooks will be customizable, it will be possible for someone with a little background to modify the code on these. In addition, we may be able to design some Jupyter notebooks just for people like you who are variable star observers. Certainly, my post has shown that there is interest in such, and with the input of the AAVSO community, we can solicit what types of widgets and data query tools might be most useful.

Since our Jupyter notebooks will be online, the only requirement on the user's side is a computer with internet access. The data processing takes place on our server, so no special software is required. The Jupyter notebooks will have widgets that can do photometric analysis across all six of LSST's filters, construct light curves, etc. With repeat observations approximately every three nights for ten years, we will create an outstanding time-domain record of variables. I can hardly wait!
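For a feel of what a three-night cadence over ten years means for a light curve, here is an idealized sketch. It assumes exactly one visit every three nights with no gaps for weather, seasonal visibility, or the split across filters, so the epoch count is an upper bound rather than a survey prediction. The star itself (a sinusoid with a 5.37-day period, mean magnitude 20.0 and amplitude 0.4) and all function names are invented for the example; folding on a known period is a standard technique swapped in here, not something the notebooks are confirmed to do.

```python
import math


def observation_epochs(years=10, cadence_nights=3, nights_per_year=365.25):
    """Idealized number of visits to one field: one per cadence interval."""
    return int(years * nights_per_year // cadence_nights)


def synthetic_light_curve(n_epochs, period_days, mean_mag, amplitude_mag,
                          cadence_nights=3):
    """Return (time_days, magnitude) pairs for a made-up sinusoidal variable."""
    curve = []
    for i in range(n_epochs):
        t = i * cadence_nights
        mag = mean_mag + amplitude_mag * math.sin(2 * math.pi * t / period_days)
        curve.append((t, mag))
    return curve


def fold(curve, period_days):
    """Convert times to phases in [0, 1) and sort the points by phase."""
    return sorted(((t % period_days) / period_days, mag) for t, mag in curve)


n_epochs = observation_epochs()
curve = synthetic_light_curve(n_epochs, period_days=5.37,
                              mean_mag=20.0, amplitude_mag=0.4)
folded = fold(curve, period_days=5.37)

print(n_epochs, "idealized epochs over ten years")
```

Even though each cycle is sampled only once or twice, folding more than a thousand sparse points on the period fills in the whole phase curve, which is the sense in which a long, regular cadence builds a time-domain record.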

YPFA
Ardis

You mentioned "all six of LSST's filters". Presumably these are u, g, r, i, z and Y.

Do these bands exactly match the SDSS bands in terms of central wavelength and bandwidth?

Regards: Paul

AAVSO 49 Bay State Rd. Cambridge, MA 02138 aavso@aavso.org 617-354-0484