Statistics question: How good are our measurements?

SGEO

I'm looking for a way to evaluate how accurate my magnitude measurements are.

If I'm measuring S Hya, the software will generate an error value for the measurement; this is precision.

What if I use the same comp star and now measure the check star? There is good photometry for the check star (e.g., 000-BJQ-165 has a magnitude and error of 9.543 (0.043)).

Say my measurement is 9.3 +/- 0.05.  What is the confidence level that my data matches the reference?

The thought is that if I cannot accurately measure a known star, then I should be examining my system and process to understand why before I start publishing my measurements of a variable star.

This technique, of measuring the check star and comparing it to the reference, is included in the TransformApplier software as the Test TC feature. But I've never had enough confidence in my statistics knowledge to push it to a definitive judgment.

So, is anyone out there knowledgeable in statistics willing to offer a process to analyze this situation?





SGEO
A first guess

One way of looking at it might be to take the difference of the two estimates (9.543 - 9.3 = 0.243) and compute the error of that difference in quadrature: sqrt(0.05^2 + 0.043^2) ≈ 0.066.

If that error is one standard deviation, then the difference is almost 4 standard deviations from 0; not very good at all.

Should the criterion for a good measurement be that the difference lies within 1 sd of 0?
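A minimal sketch of the calculation in this post, assuming both quoted errors are independent Gaussian 1-sigma values. The function name check_star_z is hypothetical. It forms the difference, combines the errors in quadrature, and converts z into a two-sided tail probability. That probability is the chance of seeing a difference at least this large if the two values truly agreed; it is not the probability that they agree.

```python
import math

def check_star_z(measured, sigma_measured, catalog, sigma_catalog):
    """Return (difference, combined sigma, z, two-sided p) for a check-star comparison.

    Assumes both errors are independent, Gaussian 1-sigma values.
    """
    diff = catalog - measured
    sigma = math.hypot(sigma_measured, sigma_catalog)  # quadrature sum of the two errors
    z = diff / sigma
    # Two-sided tail probability of a standard normal beyond |z|
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, sigma, z, p

diff, sigma, z, p = check_star_z(9.3, 0.05, 9.543, 0.043)
print(f"diff={diff:.3f} sigma={sigma:.3f} z={z:.2f} p={p:.5f}")
```

With the numbers above, z comes out near 3.7 and the tail probability is well under 0.1%.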


Ed Wiley_WEY
Interesting question


I will be interested to see what comes of your query. As a biologist used to the normal statistics of populations, I find dealing with statistical parameters of single objects rather mysterious (except to say that the uncertainties are a function of precision).



Eric Dose
Known accuracy probably unattainable

First of all, because accuracy is a measurement's deviation from the correct value, we still don't know the accuracy of that Mira S Hya magnitude--and indeed we cannot know it without authoritative external knowledge of its magnitude, say, at the top of our atmosphere. For these stars, such is probably never known. So strictly speaking you're not going to determine accuracy as such from a check star. Maybe if the check star and target star are extremely similar (cannot happen consistently for high-color, high-amplitude Miras), you can use the check star for a lower limit on bias, but I'm afraid that's about it for accuracy. [Proper check stars for Miras don't really exist anyway--any check star as red as a Mira will certainly be variable].

So that leaves us with (1) withholding data when we know from external knowledge (like cosmic ray hit on the star profile) that it's likely to be biased (inaccurate), and (2) expressing uncertainty (imprecision) in all submitted measurements, so that others know just how literally they're being asked to trust the data. For an isolated Mira observation, numerous uncertainties arise including comp star uncertainties, color effects, extinction, background flux subtraction, and yes CCD shot noise. I try to account for all of these in every uncertainty ("error" as the Extended Format would have it--sigh) value I report.
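The post does not say how the listed components are combined. A common approach, assuming the components are independent, is a root-sum-square. The sketch below does exactly that; the component values are invented for illustration and are not anyone's actual error budget. If some terms are correlated, a plain quadrature sum will underestimate the total.

```python
import math

# Illustrative 1-sigma components in magnitudes (hypothetical values, not from the thread)
components = {
    "comp star catalog": 0.030,
    "color / transformation": 0.020,
    "extinction": 0.010,
    "background subtraction": 0.008,
    "CCD shot noise": 0.012,
}

def total_uncertainty(parts):
    """Combine independent 1-sigma terms in quadrature (root-sum-square)."""
    return math.sqrt(sum(s * s for s in parts.values()))

total = total_uncertainty(components)
print(f"total uncertainty = {total:.3f} mag")
```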

The situation might be relaxed in specific ways in other cases, especially for more time-based measurements on eclipsers and exoplanets, for example, where the absolute magnitudes are less important than the time and relative magnitude data (light curve shape alone). But for Miras and other LPVs where different observers' data will be merged into an aggregate light curve, absolute magnitudes do matter--a lot. To me, that logically means that taking steps to responsibly minimize bias (like transforming magnitudes) and then including all sources in reported uncertainties are both required, at least for LPVs.

Ed Wiley_WEY
Some thoughts on the problem


I thought about this overnight and was drawn to a discussion I had in my former work measuring visual double star separation and angle. The discussion centered on the accuracy of measures. My thought was that accuracy could only be determined by fitting data to models. The purpose of measuring true binary visual doubles (or even optical pairs with rectilinear motion) was to develop orbital or rectilinear models that were predictive of future measures. The "relative accuracy" of any particular measure could be judged by how well a datum fit the model. The point of making the model was to gain some astrophysical meaning of the phenomenon (for example, orbital motion or checking independent estimates of proper motion). As we know, models are never “true”; the function of a “good model” is accurate prediction of future data as a whole (with the scatter of uncertainty expected to be random across observers). The better the model, the more robust are any conclusions based on it.

Transfer that to variable stars. One wishes to derive some astrophysical insights into the behavior of Mira variables. Do they show a period? Do their max and min vary? What astrophysical insights can we derive about the evolution (or ontogeny, as we evolutionary biologists might say) of Mira variables based not only on this particular one, but on a large population of similar Miras? For these questions we do not need absolute accuracy; we need relative accuracy. We can assess relative accuracy, even for a single datum. Fit your data to the model and see how it performs.

If we have a robust model, then astrophysical insights follow. So, I would be interested in how well your measure of the variable conforms to the general trend of measures for that variable. So long as there are not systematic errors across observers it should all come out in the wash as some will underestimate, others will overestimate and a lucky few will hit the model on the money. If it is super bad the cleanup team will tag it as discrepant.

Does the error you report have real consequences for the measure of the Mira? I hope not, but I will be very interested in what a professional says as I have seen the same thing. Not as extreme, but certainly there. 

My prediction: as long as the check star and comps do not vary, you are peachy fine and so am I.


tcalderw
Return of the Broken Record

I remain baffled that AAVSO observers appear unwilling to test their systems against standard stars. If one cannot, within reason, reproduce Arlo Landolt's photometry, to what purpose is any analysis that is based upon non-standard stars?




I agree with you.  But it goes beyond that.  In the first place, as Arlo Landolt himself pointed out, some of his "standard" stars have been discovered to be variable.  So before anyone embarks on a program of characterizing the accuracy of their photometry, they need to do a thorough investigation of the proposed target(s), as well as the comparison and check stars.  (I have encountered numerous examples from my own program of comp/check stars from AAVSO photometry lists or SeqPlot that are variable.)

Second, a point measurement provides little insight.  The measurement system as well as the target and comparison stars' signals are corrupted by noise.  Consequently, it is necessary to perform a number of measurements, under conditions as nearly identical as possible, before computing the error measures and their statistics.
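A minimal sketch of the summary statistics one might compute from such a run, assuming a set of repeated check-star measurements and a catalog value. The measurements are invented for illustration. The mean residual estimates the bias, the sample standard deviation estimates the scatter, and the standard error says how well the bias itself is pinned down.

```python
import statistics

# Hypothetical repeated measurements of one check star under similar conditions (mag)
measurements = [9.531, 9.552, 9.540, 9.528, 9.561, 9.547, 9.535, 9.549]
catalog = 9.543  # catalog value for the check star

residuals = [m - catalog for m in measurements]
bias = statistics.mean(residuals)            # average offset from the catalog value
scatter = statistics.stdev(residuals)        # sample SD (n - 1 denominator)
std_error = scatter / len(residuals) ** 0.5  # uncertainty of the mean offset

print(f"n={len(residuals)} bias={bias:+.4f} sd={scatter:.4f} se={std_error:.4f}")
```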

Third, measurements made on one target with one comparison star or ensemble of comparison stars are not necessarily transferable to other stars. The accuracy will depend on the quality of the comp star(s), the measurement SNR, and other factors such as the color indices. That said, if the sets of measurements for a small ensemble of standard stars indicate good performance, that should be a sign that the job is being done correctly.

lmk
Ah, Errors, my favorite topic

Ah, Errors, my favorite topic! To the original poster, SGEO: never forget that the mathematics of statistics makes a strong assumption that the error is purely random, and typically Gaussian-distributed. We all remember that the TOTAL error is the random PLUS the systematic error! (In quadrature, of course.) Software/math knows "nada" about systematic errors, so it will always underestimate the total error. I think prior discussions have clearly shown that this systematic error between observers is the major contribution to the total error, and it is why there is significant scatter between observers of the same star, typically 0.05 to 0.1 magnitudes. Sadly, that remains similar to the total error in good visual observations.

Mike L.



Good points, but might I point out that sometimes we do know about the systematic errors.  As a for instance, I used the formula that the Sequence Team employs to translate Tycho2 magnitudes into the Johnson-Cousins system on the very data set that was used to derive it, resulting in a determination that the average bias of that formula is a bit over 6 millimagnitudes.  Of course knowing that the average bias is 6 mmag does not do much good because it varies from star to star. 

Another point is that if one followed the process I outlined in my earlier post, it would be possible to ascertain the existence of biases, provided the sample sizes were large enough.  Part of the problem with that, though, is that lots of the stars used as comps or checks have such small measurement support that one cannot be sure that they themselves are not significantly biased.

One last point:  The assumption that the errors are normally distributed can be tested - if the sample size is large enough.  This is something I learned to always do in my real job, but which most researchers other than mathematicians, EEs and physicists seem to rarely do.  It can be downright dangerous to assume a normal distribution when the underlying distribution is otherwise, and having small sample sizes to boot simply exacerbates the problem.

SGEO
Check star measurement analysis

I want to be able to do a computation and make a statement about a check star measurement. How certain am I that my check star measurement matches the reference value? The discussion so far is more about the theory of astronomical error. My question is more specific, looking at an observation record about to be submitted to webobs.

1- What is the appropriate statistical tool to compare two estimates of a star's magnitude? We have the reference value and its error, and we have the observation of the check star, where we assume the error in that measure is the same as the reported error of the target star. Can you make a statement that we are x% sure that the two measures are the same?

2- How useful will this number be? If you find that you are only 10% sure that your check star is right, should you throw out the observation of the target? Or should you increase the error estimate of the observation until that percentage goes up to 50%?
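To put a number on the second option, the sketch below solves for the observation error that would make the two-sided tail probability (from the quadrature method in the "A first guess" post) reach a target such as 50%. It assumes independent Gaussian errors, and the function names are hypothetical. Note that the tail probability is not the same thing as being "x% sure the two values are the same"; a significance test cannot give that directly.

```python
import math
from statistics import NormalDist

catalog, sigma_catalog = 9.543, 0.043
measured = 9.3
diff = abs(catalog - measured)

def z_for_two_sided_p(p):
    """|z| at which the two-sided standard-normal tail probability equals p."""
    return NormalDist().inv_cdf(1 - p / 2)

def required_sigma(diff, sigma_catalog, p_target):
    """Observation sigma that makes the difference give two-sided p == p_target."""
    total = diff / z_for_two_sided_p(p_target)  # combined sigma that would be needed
    if total <= sigma_catalog:
        return 0.0  # the catalog error alone already reaches the target
    return math.sqrt(total**2 - sigma_catalog**2)

print(f"{required_sigma(diff, sigma_catalog, 0.5):.3f}")
```

For SGEO's example the required error bar comes out around 0.36 mag, roughly seven times the 0.05 mag originally reported.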

Maybe more should be said about the purpose of the check star measurement. How is it used by researchers looking at the AID data?



MZK
Simple BUT not statistically valid choice?


When I compare my calculated check magnitude with the known magnitude, I do a quick check of the difference, e.g., 12.652 and 12.643 differs by 0.009 magnitude. If less than 0.01 mag, I happily accept that my process is very good and the target magnitude is also very good.

If the difference is <0.03 magnitude (a typical standard deviation for my own CCD observations), I consider the target magnitude good and acceptable.

If the difference is <0.1 magnitude, I look for any issues with comp selection, try to improve the result, and consider the target magnitude marginal but acceptable.

If the difference is >0.1 magnitude, I also look for issues with comp selection or process, and consider the target magnitude unacceptable (usually!) if it cannot be improved.
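The tiers above translate directly into a small function. This is a sketch using the thresholds from this post; the function name and labels are hypothetical, and a difference of exactly 0.1 mag (which the post leaves unassigned) falls into the last tier here.

```python
def classify_check(measured, catalog):
    """Tiered check-star acceptance using the thresholds from the post above."""
    diff = abs(measured - catalog)
    if diff < 0.01:
        return "very good"
    if diff < 0.03:
        return "good"
    if diff < 0.1:
        return "marginal: review comp selection"
    return "unacceptable unless improved"

print(classify_check(12.652, 12.643))
```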

You could translate that into a % difference if you wanted (e.g., 100 x (Mag1 - Mag2) / Mag1) and create an acceptance criterion of <1% or <10%, if you wanted? (Yes, I know, % Diff works better with flux than magnitude!)
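Dividing a magnitude difference by a magnitude gives numbers that depend on the arbitrary zero point of the magnitude scale (0.009 / 12.652 is only about 0.07%). Expressing the difference as a flux ratio via the Pogson relation, flux ratio = 10^(0.4 × Δm), avoids this. A short sketch (the function name is hypothetical):

```python
def percent_flux_difference(delta_mag):
    """Percent flux difference implied by a magnitude difference (Pogson relation)."""
    return 100.0 * (10 ** (0.4 * delta_mag) - 1.0)

for dm in (0.01, 0.03, 0.1):
    print(f"{dm:.2f} mag -> {percent_flux_difference(dm):.2f} % in flux")
```

In flux terms, 0.01 mag is about 0.9% and 0.1 mag is about 9.6%, so <1% and <10% flux criteria line up roughly with the 0.01 and 0.1 mag tiers.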

Since some will argue with any statistical analysis of a small population, perhaps a pragmatic criterion is better?


PS: Again, I have taken cover. Let the comments begin. 

MZK
One Sample t-test?

Hi George:

Note 1 - I am NOT a statistician by education!

Assuming the variable has a normal distribution, use a one sample t-test. Yes, this may/will be a big assumption but for lack of additional info, forge ahead. The calculated magnitude has a mean and standard deviation from VPhot (my assumption). The test value could be the known check magnitude from a catalog. Degrees of freedom is the tough part. Very (too) small but perhaps same as number of comps used for calculated magnitude from ensemble.  One comp would be really tough and make the confidence interval enormous and worthless?  Worth trying a few examples by hand?


<<A one sample t-test allows us to test whether a sample mean (of a normally distributed interval variable) significantly differs from a hypothesized value.  For example, using the hsb2 data file, say we wish to test whether the average writing score (write) differs significantly from 50.  We can do this as shown below.

t-test
 /testval = 50
 /variable = write.

The mean of the variable write for this particular sample of students is 52.775, which is statistically significantly different from the test value of 50. We would conclude that this group of students has a significantly higher mean on the writing test than 50.>>
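For readers without SPSS, here is a minimal pure-Python sketch of the same one-sample t statistic, applied to hypothetical repeated check-star magnitudes against a catalog value. The function name and data are assumptions for illustration. A p-value needs the t-distribution CDF; scipy.stats.ttest_1samp returns both t and p if SciPy is available.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """t statistic and degrees of freedom for a one-sample t-test.

    Needs at least two values, because the sample SD uses an (n - 1) denominator.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (mean - test_value) / se, n - 1

# Hypothetical nightly check-star magnitudes vs. a catalog value of 9.543
t, df = one_sample_t([9.531, 9.552, 9.540, 9.528, 9.561], 9.543)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Here |t| is far below 2.776, the two-sided 5% critical value for 4 degrees of freedom, so these invented data would not show a significant offset from the catalog value.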


PS: I have taken cover. Let the comments begin. 

sample size, t-tests, and all that

First off, Ed Wiley is right:  the very best we can do with a single observation is to compare the measured value of the check star to its catalog value, with error measured in standard deviations from the mean.  Based on concrete experience, however, I need to caution you that many of the posted standard deviations are in fact garbage, so to use this approach you need to snoop a bit into the measurements for the check star.  If there is, for instance, a single measurement, then the standard deviation is undefined and you cannot use this method.

Regarding the t-test: in the first place, the appropriate test is a variant (Welch's test) in which the variances are considered to be unequal, and possibly unknown. Secondly (and pay attention, Ken!), the number of degrees of freedom is equal to the sample size minus 1, and it appears in the denominator of the sample variance that goes into the formula for t. Ergo, with 1 sample that denominator is zero and t cannot be computed. [Lothar Sachs, Applied Statistics, 2nd Ed. Springer; pp 271-275].
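A minimal sketch of the unequal-variance (Welch) t statistic with the Welch-Satterthwaite degrees of freedom, in pure Python. The function name is hypothetical, and the example data (your own repeated measurements vs. the individual measurements behind a catalog value) are invented. If SciPy is available, scipy.stats.ttest_ind with equal_var=False performs the same test and also returns a p-value.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    va = statistics.variance(a) / na  # squared standard error of mean(a)
    vb = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df

# Hypothetical: own check-star measurements vs. the catalog's individual measurements
own = [9.531, 9.552, 9.540, 9.528, 9.561]
catalog_runs = [9.520, 9.515, 9.534, 9.509]
t, df = welch_t(own, catalog_runs)
print(f"t = {t:.2f}, df = {df:.1f}")
```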

There is yet another problem that needs to be considered.  The target and check star measurements are both subject to Poisson distributed photon noise, and will be contaminated unequally, even if the SNRs are the same.  This problem, like the problems of dissimilar backgrounds for the target and check stars, different target and check star pixel sets on the sensor (leading to different internal noise values), cannot be eliminated.  The best one can do is to ensure a large SNR, typically 23 dB or more.
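To relate an SNR target to magnitude uncertainty, a common approximation is sigma_m ≈ 2.5 log10(e) / SNR ≈ 1.0857 / SNR. The post does not say which dB convention is meant; the sketch below assumes 10·log10(SNR), under which 23 dB is an SNR of about 200 (under a 20·log10 convention it would be only about 14). Function names are hypothetical.

```python
import math

def mag_error_from_snr(snr):
    """Approximate 1-sigma magnitude error for a given SNR: 2.5/ln(10)/SNR ≈ 1.0857/SNR."""
    return 2.5 / math.log(10) / snr

def snr_from_db(db):
    """Convert an SNR in dB to a ratio, ASSUMING the 10*log10 convention."""
    return 10 ** (db / 10)

for db in (20, 23):
    snr = snr_from_db(db)
    print(f"{db} dB -> SNR ~ {snr:.0f} -> sigma_m ~ {mag_error_from_snr(snr):.4f} mag")
```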

Also, to repeat a couple of earlier posts, the best thing you can do to test the accuracy of your technique is to collect a reasonably sized sample (n > 30) of measurements on a high-quality standard star, and use the results to perform statistical testing. With that many samples, you can first determine whether it is reasonable to assume Gaussian statistics by performing, for example, a Kolmogorov-Smirnov test. If you fail to reject the hypothesis of normality, then you can use the t-test. Again, be careful in researching your "standard target".
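A minimal sketch of the Kolmogorov-Smirnov D statistic against a normal distribution, in pure Python. One caveat: because the mean and SD here are estimated from the same data, the classical KS critical value (about 1.36/sqrt(n) at 5% for large n) is conservative; the Lilliefors variant of the test is designed for this case. The residuals are simulated, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def ks_normal_statistic(sample):
    """KS D statistic of a sample against a normal with the sample's own mean and SD."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step
        d = max(d, f - i / n, (i + 1) / n - f)
    return d

def ks_critical_5pct(n):
    """Large-sample 5% critical value of the classical KS test (parameters known in advance)."""
    return 1.36 / math.sqrt(n)

# Hypothetical residuals: 40 simulated check-star offsets (mag), seeded for repeatability
random.seed(1)
residuals = [random.gauss(0.0, 0.02) for _ in range(40)]
d = ks_normal_statistic(residuals)
print(f"D = {d:.3f}, 5% critical value ~ {ks_critical_5pct(len(residuals)):.3f}")
```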

Eric Dose
Or Q-Q plot

As an alternative to the Kolmogorov-Smirnov test, one can make a Q-Q plot, which not only tests for Gaussian statistics but also reveals outlier observations.

I generate a Q-Q plot on my data before further reducing it. For each filter, for each night, without exception.
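A minimal sketch of the numbers behind a normal Q-Q plot: sorted observations paired with theoretical standard-normal quantiles, here using Blom plotting positions (one common choice; others exist). If the data are Gaussian, the points lie close to a straight line; an outlier sits off the line at one end. The magnitudes are invented, with one deliberate stray value, and the function name is hypothetical. The pairs can be fed to any plotting tool.

```python
from statistics import NormalDist

def qq_points(sample):
    """(theoretical normal quantile, observed value) pairs for a normal Q-Q plot.

    Uses Blom plotting positions p_i = (i - 3/8) / (n + 1/4), i = 1..n.
    """
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(theo, xs))

# Hypothetical magnitudes from one filter on one night; 9.612 is a deliberate stray value
mags = [9.541, 9.538, 9.547, 9.552, 9.535, 9.544, 9.612, 9.549]
for q, m in qq_points(mags):
    print(f"{q:+.2f}  {m:.3f}")
```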

CTX
Zero Pointing Option

As an aid to the comp/check star selection:

AIP4Win allowed me to zero point the instrumental V magnitude of the selected comp star to the Photometry Table value (Settings tab). Once this was done, clicking on any check star would return its measured magnitude. This made it quick to choose a check star with close agreement and/or to change to a different comp star (which required a new zero point such that the software's measured value equals the Photometry Table value).
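The arithmetic behind this technique is a single-comp differential calculation: the zero point is the comp star's catalog magnitude minus its instrumental magnitude, and adding it to a check star's instrumental magnitude gives that star's estimated V. This is a sketch of that arithmetic, not AIP4Win's code; the function names and magnitudes are hypothetical, and it ignores transformation and extinction corrections.

```python
def zero_point(comp_catalog, comp_instrumental):
    """Offset that makes the comp star's instrumental magnitude equal its catalog value."""
    return comp_catalog - comp_instrumental

def check_star_mag(check_instrumental, zp):
    """Check star magnitude on the catalog scale, using the comp-star zero point."""
    return check_instrumental + zp

# Hypothetical instrumental and catalog magnitudes
zp = zero_point(comp_catalog=10.212, comp_instrumental=-5.431)
check = check_star_mag(check_instrumental=-6.105, zp=zp)
print(f"ZP = {zp:.3f}, check = {check:.3f}, catalog 9.543, diff = {check - 9.543:+.3f}")
```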

Per Ardua Ad Astra

Tim Crawford

Zero point

Adjusting the zero point has nothing whatsoever to do with the accuracy of the measurement; moreover setting it in AIP4WIN has no effect whatsoever on the processed value.  If you don't believe me, try it!

The things that determine the accuracy of the measurement are, inter alia, the SNRs of the target and comparison stars, the errors in the comparison star magnitudes (catalog vs actual), errors in transforming magnitudes from the sensor to the standard, errors in accounting for extinction and differential extinction, internal sensor noise, and quantization error. 

CTX
Zero Point - A Revisit


I suspect some possible confusion regarding my rather abbreviated post.

  1. It was offered “As an aid to the comp/check star selection.”
  2. In the AIP4WIN Single Image Photometry measuring tool, the zero point value in the Settings tab can be adjusted so that the instrumental magnitude returned in the Result tab equals the actual V value of the comp star, along with its uncertainty.
  3. Once this has been done, each mouse click on a check star (within the Result tab of the AIP4WIN SIP tool) returns a V value (and uncertainty) that can be immediately and directly compared with the sequence value to see how closely they match.
  4. If the comparisons show unreasonable differences from the actual sequence value, it is easy to zero point on a different comp star and begin the process all over again.
  5. This technique avoids the need to use spreadsheets or apply a differential solution to do the comparisons.
  6. I always took the time to do this when selecting a comp and check star(s), even if I ultimately used a different tool.
  7. This is a quick way to determine the efficacy of both a selected comp star and check star(s).

Again, it is simply an aid, and an effective visual one, to the process of comp/check star selection.

It was not offered as a statistical analysis tool, if I somehow created that impression.

Per Ardua Ad Astra,

Tim Crawford

AAVSO 49 Bay State Rd. Cambridge, MA 02138 617-354-0484