Uncertainty estimates in ensemble photometry

Affiliation
American Association of Variable Star Observers (AAVSO)
Sun, 03/24/2019 - 22:43

I have been under the impression that the final ensemble uncertainty, Err, was based on Std (the s.d. of the measurement) and Err(SNR) (the uncertainty from the SNR), combined in quadrature. Have I got this wrong?

I ask because when I look in the VPhot documentation under Error Estimation, I don't see anything about combining Std with Err(SNR).

Phil

Affiliation
American Association of Variable Star Observers (AAVSO)
I include comp star errors as well

This is as good a place as any to mention something on my mind for a while.

For my own part, I try to include all sources of uncertainty. Specifically, in addition to the purely statistical uncertainty reported by aperture photometry packages, with every reported magnitude I include two additional sources of uncertainty (sketched in code below):

  1. uncertainty arising from the sky model (mostly the nightly zero point), which is extracted from dozens of standards' magnitudes (typically controlled to less than 0.010 mag); and
  2. the greater of these two uncertainties:
  • uncertainty due to the comp stars' mag errors reported in the AAVSO star chart table (weighted quadrature of the comp stars' stated uncertainties), and
  • the minimum possible degree of disagreement between the comp stars' reported magnitudes (weighted standard deviation of the mean effective result).
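For concreteness, here is a minimal Python sketch of that budget. It is my own illustration, not Eric's actual pipeline: the equal-weight mean, the function name, and all inputs are assumptions.

```python
import numpy as np

def target_uncertainty(sigma_stat, sigma_zp, comp_sigmas, per_comp_mags):
    """Combine the error sources described above (all inputs in magnitudes).

    sigma_stat    -- statistical (aperture photometry) uncertainty
    sigma_zp      -- sky-model / nightly zero-point uncertainty (item 1)
    comp_sigmas   -- the comp stars' stated catalog uncertainties
    per_comp_mags -- target magnitude as derived from each comp star
    """
    n = len(comp_sigmas)
    # item 2a: quadrature of the comps' stated uncertainties, propagated
    # through an equal-weight mean (a weighted mean would refine this)
    sigma_catalog = np.sqrt(np.sum(np.asarray(comp_sigmas) ** 2)) / n
    # item 2b: disagreement between the per-comp results, as std of the mean
    sigma_scatter = np.std(per_comp_mags, ddof=1) / np.sqrt(n)
    sigma_comp = max(sigma_catalog, sigma_scatter)  # the greater of 2a / 2b
    # total: all sources combined in quadrature
    return np.sqrt(sigma_stat**2 + sigma_zp**2 + sigma_comp**2)
```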

My own reported target magnitude uncertainties are generally dominated by item 2 above, arising from the comp stars. That is, in my observations the mag uncertainty in log(flux) arising from aperture photometry alone, as reported by MaxIm etc., is almost always negligible compared to the uncertainty arising from the comp stars used. In the worst cases, with only one or two comp stars of extremely high mag uncertainties (say, >0.050 mag), this does serve to explode my reported mag uncertainty--but that high reported uncertainty simply reflects the truth. (If AAVSO is unhappy with my occasionally high reported uncertainties...well, rest assured I'd be quick to adopt better comp stars were they available.)

As my own usual targets (LPVs) require observations from multiple observers, I believe it imperative for me to report uncertainties that best represent reproducibilities between nights and observers, not merely precisions over an hour or two in one night with one scope. The latter may well suit rapid time series--eclipsers, exoplanets and the like--where absolute magnitudes (and their uncertainties on an absolute scale) matter less than the time domain.

But for LPVs, Z Cams, and other long-term observations, to bury dominant sources of uncertainty and report wildly optimistic, merely statistical uncertainties would strike me as very unhelpful to database users. So, it may turn out that the definition of "Err" in the database should depend on target type (really, observation cadence). Settled guidance from AAVSO would be very welcome.


Affiliation
American Association of Variable Star Observers (AAVSO)
them errors

Eric, I see the same thing regarding the large errors provided for comp stars. I tend to delete large-error comps from ensemble observations; they tend to be the brightest, lowest-noise comps. For transformed data I try to pick a low-error comp. I have been wanting to do a closer study of exactly how comp error sizes set the errors coming out of TA and VPhot. I have studied the effects of stacking/averaging and found that the gain in S/N levels off at about seven images averaged. I think things change a bit depending on star and comp brightness.

This looks like a good research paper, at least to figure out what the software is doing. I have asked those who know what is in the code, but I don't think there was any definitive answer. So it has to be learned by experiment.

Affiliation
None
Amen to that

" to bury dominant sources of uncertainty and report wildly optimistic, merely statistical uncertainties would strike me as very unhelpful to database users "

And yet that is the very prescription embedded in the AAVSO Intermediate Spreadsheet that DSLR observers are encouraged to use. I realize that the powers that be need to keep things simple-minded for us simple folks, but calling the standard deviation of the set of target-star instrumental magnitudes an "error" simply goes beyond the pale. [I realize that it is common practice for astronomers to interchange the words error and standard deviation; it's part of the terroir of the field. See https://physicstoday.scitation.org/doi/10.1063/PT.3.4152 for more.]

Affiliation
American Association of Variable Star Observers (AAVSO)
Standard Deviation

Phil

I'm not familiar with VPhot, but in general, if you calculate the standard deviation of a set of observations, you should capture the "randomness" of those observations from all sources, including the Poisson noise (1/SNR noise) of the variable and of all of your comps.

Jim

Affiliation
American Association of Variable Star Observers (AAVSO)
Standard Deviation & SNR

Thanks for your comment.  I now have confirmation from a couple of reliable sources that this is the way VPhot does it. 

Phil

Affiliation
American Association of Variable Star Observers (AAVSO)
Does that answer the question

Does that answer the question about large errors on comps?

If we rely only on 1/SNR from the image, can we ignore the stated comp errors?

Is it possible for your reliable sources to publish this on the AAVSO web pages?

It seems like something everyone should know.

Ray

Affiliation
American Association of Variable Star Observers (AAVSO)
VPhot Ensemble Error of Target

Hi Ray:

As a VPhot user, I'm sure you have seen the Single Image Photometry page. I have attached an example below.

Each ensemble comp is used to calculate a magnitude of the target variable. The mean magnitude of the target is then calculated from all of those per-comp target mags, and the standard deviation (Std) of the target is calculated from the same set of per-comp target magnitudes. When one of the comp-derived target mags is far from the mean, it is highlighted in a darker pink/red color. If any of the comps give very divergent target mags, they can be unchecked and ignored in the target mean. There may be several reasons for such a deviation, including bias in the comp mag, faintness of the comp, aperture overlap with a close companion, optical issues at the comp's location on the image, poor flats, sky conditions, or something else. Typically fainter comps will give larger errors, BUT this is not always the case: brighter comps may also yield divergent target mags for the reasons noted above.

Subsequently, the total Err of the target is calculated as Err = sqrt(Std^2 + ErrSNR^2). IMO, this is a reasonably conservative but not overly conservative measure of error. Others may have different opinions and choose to enter different errors in the report notes field?
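In code, the combination Ken describes amounts to the following (a sketch with invented numbers; I write ErrSNR as the standard 2.5/ln(10)/SNR ≈ 1.0857/SNR magnitude error, which the thread loosely calls 1/SNR):

```python
import numpy as np

# Target magnitude as derived from each ensemble comp (invented values)
per_comp_mags = np.array([12.31, 12.28, 12.35, 12.30])

target_mag = per_comp_mags.mean()        # ensemble mean magnitude
std = per_comp_mags.std(ddof=1)          # scatter among the per-comp results

# Per Ken's later post, only the target's own SNR enters here,
# not the comps' SNRs.
snr_target = 150.0                       # target's SNR in this image (invented)
errsnr = 2.5 / np.log(10) / snr_target   # Poisson term in magnitudes

err = np.hypot(std, errsnr)              # Err = sqrt(Std^2 + ErrSNR^2)
print(f"{target_mag:.3f} +/- {err:.3f}")
```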

The main problem I see with some error measurements from other software is that one image and only one comp may be used to calculate the target mag, AND only 1/SNR is used to represent the magnitude error. IMO, this is completely bogus!

Is this the answer you were looking for or is there some other issue that I have missed?

Ken

Affiliation
American Association of Variable Star Observers (AAVSO)
Yep, I uncheck the red ones,

Yep, I uncheck the red ones, just like you taught me Ken.

Can you explain Total Err = sqrt(Std^2 + ErrSNR^2)? You seem to have gotten there without showing any steps.

The Std seems to come from the deviation from the stated magnitudes. How is the SNR error obtained? Is it simply the geometric sum of the 1/SNRs, i.e. sqrt((1/SNR1)^2 + (1/SNR2)^2 + ...)?

The errors given for the comp magnitudes seem unmentioned and unused.

I suppose one way to get a measurement error is to take a time series of the comps, then use the std of the image-to-image scatter. A random error could be had that way, while systematic error might show as an offset of one's readings from the stated comp values. These sorts of things make me agree with you that 1/SNR is not the whole error.

Ray

Affiliation
American Association of Variable Star Observers (AAVSO)
VPhot and 1/SNR

Hi Ray:

With respect to the target 1/SNR value, it is based only on the SNR of the target star itself. That has to be so, because if the comps were all brighter and one did a sum of squares of their ErrSNR values, the target's ErrSNR would come out smaller than it should be. So VPhot just uses the SNR of the target in the image to calculate its 1/SNR. This makes sense to me. How about you?

The Std of the target comes from the normal std calculation over all the individual estimated target magnitudes, the same ones used to calculate the target's mean magnitude.

HTH, Ken

Affiliation
American Association of Variable Star Observers (AAVSO)
Properly determining the

Properly determining the error in CCD observations is truly the "mother" of all challenges in photometry, as it has been discussed at length for years! At the most fundamental level, you need to differentiate the SYSTEMATIC from the RANDOM error components. Most of the discussions revolve around calculating 1/SNR and combining in quadrature, which are mathematical derivations based on the assumption that the underlying mechanisms are purely RANDOM. Except for the rare cases of measuring signals down near the shot noise (SNR near 1, where random errors become large), most measurements have a large enough SNR that the calculated error yields very small numbers, which are quite unrealistic, especially when comparing actual CCD observations to visual ones, which show fairly similar errors!
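To put numbers on this, the standard propagation from SNR to magnitude error is sigma_m = 2.5/ln(10)/SNR ≈ 1.0857/SNR, and a quick loop shows how fast the purely random term shrinks (my own illustration):

```python
import numpy as np

# Poisson-limited magnitude error: sigma_m = 2.5/ln(10)/SNR ~ 1.0857/SNR
for snr in (1, 5, 10, 50, 100, 500):
    print(f"SNR = {snr:4d}  ->  sigma_m = {2.5 / np.log(10) / snr:.4f} mag")
# SNR = 100 already gives ~0.011 mag, and SNR = 500 gives ~0.002 mag,
# well below the 0.01-0.05 mag systematic floors discussed here.
```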

So, as Eric mentioned, it's the systematic errors, of the comp stars and of the instrument, that usually dominate the error budget. Only in the case of truly large numbers of measurements and ensembles ("the law of large numbers") can these systematic errors become effectively "random", by encompassing all the factors that cause systematics and thus "cancelling out bias". In reality, the ensemble size most observers use is nowhere near this huge, and therefore SYSTEMATIC error remains the primary factor. It's very likely that comp stars in a small field in the vicinity of an object will tend to have similar systematic errors, especially since most comp stars are measured by a single instrument in a single run, or by very few such measurements using similar equipment (e.g. APASS).

I think, but cannot offer hard proof, that if a "new" APASS were designed, consisting of a totally different telescope, a different sensor, different software and reduction algorithms, and the whole sky were done over again, significantly different "standard comp star" magnitudes would be the result!

So, what needs to be seriously addressed is the systematic error, in the comp stars and in the observer's instrumentation, not these theoretical error budget calculations based only on random noise!

Mike

Affiliation
American Association of Variable Star Observers (AAVSO)
Weighting of compstars

Mike, I agree completely with what you wrote here.

The method VPhot uses is definitely a simple and straightforward one: easy to calculate, easy to figure out what's going on. It works well with one or very few comp stars. However, IMHO it doesn't take into account that some observed comp stars just aren't that well measured (from the SNR point of view).

I haven't investigated this myself, but maybe you happen to know whether there are any good methods for weighting comparison stars in an ensemble.

E.g. when there are good (whatever that means) comp stars from mag 10 to mag 14, several to many of them. Then, in principle, the more stars you have, the less a single, occasionally deviating comp star (maybe because of a small cosmic ray hit or a flat imperfection) would affect the result. But then it is unrealistic to have stars spanning such a large magnitude difference all exposed equally well.

And now my question: do you happen to know whether there exists a good recipe for taking into account that some of those good comp stars have higher measurement uncertainties, so that their contribution to the final result _could_ be weighted lower? Somewhat similar to fitting, e.g., a weighted linear regression to data points that have "error bars" along all the possible axes, where the errors are used in one form or another as weights.
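One textbook recipe, offered as my own sketch rather than anything VPhot implements, is inverse-variance weighting: each comp's estimate of the target gets weight 1/sigma^2, where sigma can itself combine, in quadrature, that comp's catalog error and its 1/SNR measurement error:

```python
import numpy as np

def weighted_ensemble(per_comp_mags, per_comp_sigmas):
    """Inverse-variance weighted mean of per-comp target estimates.

    per_comp_sigmas: one uncertainty per comp, e.g. the quadrature sum of
    its catalog error and its measurement (1/SNR) error, in magnitudes.
    """
    m = np.asarray(per_comp_mags, dtype=float)
    w = 1.0 / np.asarray(per_comp_sigmas, dtype=float) ** 2
    mean = np.sum(w * m) / np.sum(w)     # weighted mean magnitude
    sigma = 1.0 / np.sqrt(np.sum(w))     # formal error of that mean
    return mean, sigma

# Example: a poorly measured comp contributes little to the result
print(weighted_ensemble([12.31, 12.28, 12.95], [0.01, 0.02, 0.20]))
```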

Best wishes,
Tõnis

Affiliation
American Association of Variable Star Observers (AAVSO)
Weighting of Compstars

Hi Tonis/Mike:

Yes, VPhot does not provide a way of weighting individual comp stars. I'm not sure that any common photometry software does? Can anyone identify one? I suspect some more professional software does?

Alternatively, it does allow one to easily visualize the impact of poor comps, that is, those with higher relative error due to comp bias/inaccuracy, low SNR, saturation/non-linearity, or some image flaw. In the VPhot photometry table, one simply unchecks the comps that yield a poorer estimated target magnitude. In effect, one sets their weighting to zero. Yes, extreme, but very simple and effective!?

Even though it is difficult to use comps spanning a very large magnitude range, I submit that the "best practice" we tell observers to follow (see the CCD Manual) is to use comps with magnitude (as well as color and location) near that of the target.

With current APASS comp sequences becoming more available, I feel that it is not that difficult to get at least 5-10 "good" comps. I use that routinely for my targets in VPhot. While that is only a modest number of comps, IMHO it is adequate, and a lot better than the "commonly used" single comp IF accuracy is of concern/importance? I find that most of the issues we see with outlier reported magnitudes result from failure to follow "best practices".

Thoughts? If your question is more general, I hope someone can help. BTW, I am a fan of these discussions as a way to educate our observers to understand the precision and accuracy of their measurements!

Ken (#1 VPhot Fan/User)  ;-)

Affiliation
American Association of Variable Star Observers (AAVSO)
Ken, I also find VPHOT

Ken, I also find the VPhot interface very informative and easy to use. I typically select some comp stars that agree well with each other and then stick with them almost no matter what, even if it seems that some of them have (small) issues from time to time. I have found that if I select/deselect comp stars based only on the indicated deviation from the average, the colours start to do strange things (when I have comp stars with very different colours, and it often tends to turn out so). Such a stable set of comp stars increases the final uncertainty budget for sure, but IMHO it keeps the colours much more stable.

But I'd like to do even better :-D So that's why I'm asking whether any of us have noticed some clever techniques for improving further.

Tõnis

Affiliation
American Association of Variable Star Observers (AAVSO)
Software that allows comp weighting

Mira Pro from Mirametrics (Michael Newberry's company) allows weighting of comps. It also has extensive documentation and tutorials. I used it exclusively for many years before I started using AIJ. It is really good, but it isn't inexpensive. I still own it and use it occasionally. It also allows you to include the zero-point error in ensemble photometry; that is the Std^2 term in the sqrt(Std^2 + ErrSNR^2) computation in VPhot.

Let me throw a belated question into this discussion. Don't you get a reasonable error estimate by taking several images (say 5) and simply using the stdev, across the images, of a check-star magnitude, with the check star selected to be the same brightness as or slightly fainter than the target, and, if doing ensemble photometry, adding that in quadrature to the stdev of the magnitudes calculated using the individual comps in single images, to capture the zero-point error for that image? Most of the work I do involves time series with many images in a single run on a target. I suppose that taking 5 or more images for one observation of a slowly changing variable could significantly limit the number of objects one can observe in one session, but it seems to me that such a technique would give realistic error estimates.
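If I read Brad's recipe correctly, the bookkeeping would look like this (a sketch; every number below is invented):

```python
import numpy as np

# Check-star magnitude measured on each of 5 images (invented values)
check_mags = np.array([11.52, 11.55, 11.50, 11.54, 11.53])
sigma_check = np.std(check_mags, ddof=1)   # image-to-image repeatability

# Per-image scatter of target mags from individual comps (zero-point term)
sigma_zp = 0.012                           # invented, mag

err = np.hypot(sigma_check, sigma_zp)      # combined in quadrature
print(f"error estimate: {err:.3f} mag")
```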


Brad Walter

Affiliation
None
" I think, but cannot offer

" I think, but cannot offer hard proof, if a "new" APASS were designed, consisting of a totally different telescope, different sensor, different software and reduction algorithms, and the whole sky done over again, significantly different "standard comp star" magnitudes would be the result! "

One wouldn't even need to do that! As an exercise for a paper I had hoped to present, I queried SeqPlot for a field centered on TU Cas, down to magnitude 8.5. Among other things, I then went to APASS DR10 for the APASS stars SeqPlot returned, and found that virtually all of them had a single measurement, yet quoted standard deviations of 0.001 magnitude. This list, by the way, included both the target and four other known variables.

I think, but cannot prove, that the keepers of SeqPlot formerly assigned a standard deviation of 0.100 magnitude to single APASS measurements, and I surmise they've since decided to move the decimal point. Yet when they quote, for example, a GCPD star with a single measurement, they put NA in the error column. So it is not that they don't know better.

The point of this is that no single measurement should ever be considered to be worthy of use as a standard, possibly excepting situations of extreme desperation.  Those who claim to be developing surveys for use as standards need to revisit each survey star many times, and report the measured standard deviation and sample size along with every mean.  That's how Landolt did it, how ESA did it (Hipparcos; Tycho; Gaia).  So what I am saying is certainly not unprecedented.

Affiliation
American Association of Variable Star Observers (AAVSO)
Testing

Ensemble photometry is clearly popular, but has anyone tried it out on standard stars to see if the ensemble-based magnitudes are in agreement with Landolt et al.? That would seem to be an important test.

Many standards are quite faint -  a possible alternative is to use CST stars from VSX: stars once thought to be variable but now deemed not.  A nice feature of these stars is that many are in the AAVSO system, so you can generate charts and report data.  An example is VZ Aur (AUID 000-BBK-893, V=10.92), but it's easy to search VSX for others.  These stars are certainly not vetted for stability at the same level as the photometric standards, but they would make convenient test targets to get a handle on the quality of ensemble results.

Tom

Affiliation
None
Re #17

As a matter of fact, I have been doing some research on that topic, particularly focused on the Tycho2 to Johnson conversion process, in pursuit of the goal of enabling use of 1 Million+ Tycho2 stars as valid comparison stars for ensemble photometry.  In the process of doing so, I have uncovered a number of problems with the way that uncertainties are expressed in SeqPlot.

I expect to wrap up the analysis within the next couple of months, at which time I will submit it for publication - probably in Pub ASP.  As noted in my earlier comment, I documented the SeqPlot problems in a briefing I had hoped to submit for the June meeting.

As I am interested primarily in DSLR photometry, the focus of my work is on bright stars.  Because of that, the severe limitations on Tycho2 accuracy at magnitudes of about 11 or dimmer are of no concern to me.  But the general problem is that aside from Landolt and the E-F Regions photometry, there are essentially no data to use for comparison.  By that I mean measurements that have both small measured variances and sample sizes sufficient to place reasonable confidence bounds on those variances.  Large sample sizes also do a good but not perfect job of detecting variability.

Affiliation
American Association of Variable Star Observers (AAVSO)
errors, errors, and more errors

My graduate school professors always said that a measurement is not scientific unless it also includes an error estimate.  I'll define error=uncertainty for now.  There are two main sources of error: random, due to the varying incident flux and measurement techniques; and systematic, due to some outside influence:  poor transformation, comp star offset from the standard system, etc.

Random error is most reliably measured by taking "groups" of exposure "sets".  A set is defined as whatever sequence of filters you wish to use: B,V,Rc,Ic for example.  A group is then multiple sets of those exposures.  You can do this either by BBBBVVVV or by BVBVBVBV.  If you obtain an average and standard deviation of the target star estimate from multiple exposures, then that accounts for every random error that I can think of.  This is a far better technique than trying to determine the error from a single frame, but of course more time consuming in acquisition and analysis.  So I always recommend obtaining 3-5 frames per filter and reporting the average value rather than obtaining an error estimate from a single frame.  Note that this is what is done in the photoelectric world as well, and why I respect those observers for the time they take to get a reliable estimate.  Their method also requires recentering the star in their physical measuring aperture, and so removes many systematic effects.  The number of observations that you can report in a given night will decrease, but the quality of those measures will increase.  Of course, time series are a different beast.
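As a concrete illustration of the group bookkeeping Arne describes (the values are invented, and whether one reports the frame-to-frame scatter or the error of the mean is a separate choice):

```python
import numpy as np

# Target magnitudes from five V frames in one group (invented values)
v_mags = np.array([13.421, 13.435, 13.428, 13.419, 13.431])

reported_mag = v_mags.mean()                     # the value to report
sigma_frame = v_mags.std(ddof=1)                 # frame-to-frame scatter
sigma_mean = sigma_frame / np.sqrt(len(v_mags))  # error of the reported mean
```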

If possible, the check star standard deviation may be a better choice for the reported error, as you then also remove any variation in the target star over the measurement interval.  This is very similar to the technique suggested by Tom Calderwood.  If the check star has a different magnitude, this estimate may be less accurate than reporting the target's standard deviation, if the target is slowly varying.  You may have to make that decision based on outside knowledge.  For time series, check star standard deviation may be your only choice.

There are many sources of systematic error.  As mentioned earlier, a comparison star sequence is often taken from a survey like APASS, where the reported comp star errors include both a random component and a component of how well it matches the standard system.  That systematic component is often identical for all stars in the sequence, so a bright star might show 0.05mag errors and a medium-bright star shows an identical 0.05mag error - obviously, not a random error.  You can often see this in an ensemble, where most of the comp stars give nearly identical results for the target, even though they have larger published error.  There are also a number of instrumental systematic effects:  how good your flatfielding is, whether you have a change with a GEM flip, whether there is optical distortion so that your comp star at the edge of the field has a different included flux in the measuring aperture than the target, how good your transformation (or lack thereof) might be, etc.

I think that most researchers are more interested in the random error component than in the systematic error when analyzing datasets from AAVSO observers. As Joe Patterson does, one can largely account for systematic offsets by applying an offset between observers, provided the monitoring cadence is sufficient that curves from different observers can be overlaid and adjusted. If you include the systematic error, which can be much larger than the random error, it degrades the internal precision of the report. It would, of course, be ideal to report two uncertainties per measure: one based on the random error, and one an estimate of the systematic error. Since it is usually impossible to make a clean separation, I prefer just reporting the random error from measurement groups.

I'll be discussing many of these techniques to improve precision and accuracy in my SAS workshop.

I think VPhot does as well as possible in determining the error from a single frame, but I'd consider it a lower bound on the true uncertainty of the measurement.

Arne


Affiliation
American Association of Variable Star Observers (AAVSO)
weighting

My thesis advisor wrote what I think is the seminal paper regarding ensemble photometry:

1992PASP..104..435H

In it, he weights comp stars based on their brightness (flux). My personal ensemble program weights objects based on their magnitude difference from the target, their distance in the FOV, and their color difference from the target. The main problems with any weighting scheme are deciding the weight factors (why are you emphasizing one object over another?) and choosing the strength of the weighting.

Stephen, APASS DR10 does include some n=1 objects, especially bright objects like V=8.5. I've told the Sequence Team that the uncertainty column reflects purely Poisson error in those cases and is unreliable. I don't think I added those words to the APASS web page, but I'll do so soon.

Arne

Affiliation
None
Weighting

Arne,

I have not yet had the chance to read the paper you link to, but I would like to share the results of my own analysis of the problem. As I am sure the paper says, in the situation where the variances are unequal, the optimal estimator is not the equal-weight Gauss-Markov one but rather a weighting scheme in which the weights are inversely proportional to the variances - provided all the processes are unbiased. The problem I have found in trying to apply general LMS estimation to ensemble photometry is that there are multiple contributors to the variance, not simply the Poisson noise process having to do with counting photons. Moreover, the comparison star magnitudes contain small biases (or in some cases not-so-small ones) that greatly affect the accuracy of the weighting process. My results indicate that, in light of the uncertainties, it is better to just stick to the Gauss-Markov method, because it seems to be about as optimal as one can achieve in a practical situation.
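A toy Monte Carlo, with every number invented, illustrates the point: a small bias in the most heavily weighted comp can make inverse-variance weighting worse than the plain (Gauss-Markov, equal-weight) mean:

```python
import numpy as np

rng = np.random.default_rng(0)

true_mag = 12.000
sigmas = np.array([0.005, 0.010, 0.020, 0.040])  # nominal per-comp errors
bias = np.array([0.030, 0.000, 0.000, 0.000])    # bias in the "best" comp

# 100,000 simulated ensembles of four per-comp target estimates
trials = rng.normal(true_mag + bias, sigmas, size=(100_000, 4))

w = 1.0 / sigmas**2
weighted = trials @ w / w.sum()                  # inverse-variance mean
plain = trials.mean(axis=1)                      # equal-weight mean

for name, est in (("weighted", weighted), ("plain", plain)):
    print(f"{name:8s} rms error: {np.sqrt(np.mean((est - true_mag)**2)):.4f}")
```

With these invented numbers the weighted mean inherits about three quarters of the 0.030 mag bias, and its rms error comes out noticeably larger than the plain mean's.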

Re APASS, thanks for that. Somehow the notion needs to be conveyed that standard deviations are to be computed from the results of repeated measurements. To take the discussion further, I've noticed that although the GCPD itself does not contain standard deviations, calls to SeqPlot result in citations of them, and these, in very many cases, are incorrectly computed. As the GCPD is a database of databases, many of its entries are averages of multiple entries and cannot be aggregated for the purpose of computing standard deviations.

CS,

Stephen