A colleague and I are compiling a series of times of minima (ToMs) for a number of eclipsing binaries we observed. For poorly observed systems we are compiling additional ToMs using SuperWasp data. Obviously, some of the SuperWasp data are better than others. Just because Peranso can compute a ToM does not necessarily mean that the ToM is accurate enough to be useful. So, I am wondering just how accurate a ToM should be, relative to O-C, to be valuable enough to publish.
SuperWasp rules: All data with uncertainties of over 0.05 rejected, duplicates removed. Any and all minima thought to be analyzable were harvested.
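A minimal sketch of the two rules above, assuming each photometric point is a (HJD, magnitude, uncertainty) tuple, that the 0.05 threshold is in magnitudes, and that "duplicate" means an identical row. The function name `clean_photometry` and the tuple layout are made up for illustration and are not part of any SuperWasp tool.

```python
def clean_photometry(rows, max_err=0.05):
    """Drop points with uncertainty above max_err and remove exact duplicates.

    rows: iterable of (hjd, mag, mag_err) tuples (assumed layout).
    Returns the surviving rows sorted by time.
    """
    seen = set()
    kept = []
    for hjd, mag, err in rows:
        if err > max_err:          # rule 1: reject uncertainties over the threshold
            continue
        key = (hjd, mag, err)
        if key in seen:            # rule 2: skip rows already seen
            continue
        seen.add(key)
        kept.append(key)
    return sorted(kept)
```

Because the cut is fixed before any minimum is measured, it does not depend on how a ToM later falls on the O-C diagram.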
An example: V0644 Aur. Our observation: Cycle 7736, ToM = 2458225.644824 HJD, O-C = 0.001568. I consider this to be a good dataset: good SNR, uncertainty = 0.012, one small gap (n = 2 observations) just after minimum due to a meridian flip.
SuperWasp O-Cs range from -0.0001516 to +0.005864 and cover cycles 1986-2469 (p = ~0.78). So, the SuperWasp cycles are tightly grouped in the period August 2004-April 2008. Six of these O-Cs exceed +/-0.003 and an additional three exceed 0.002. The rest line up very nicely with our measure, clustered around +/-0.001. So, it is not likely that the period is changing, although the measures are a tad off the expectation vis-à-vis the Epoch.
Given the tight time frame of the SuperWasp data, it is unlikely that the outlier measures indicate anything other than suboptimal SuperWasp data, either in terms of coverage of data points for particular minima or some other factor. My inclination is to delete the outliers, certainly above 0.003 and perhaps above 0.002.
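For readers following the numbers, here is a minimal sketch of how an O-C value is obtained from linear elements, C = T0 + P × E. It assumes primary minima only (integer cycle numbers). The epoch and period in the usage lines are hypothetical placeholders, not the elements of V0644 Aur, which are not quoted in this thread.

```python
def o_minus_c(tom_hjd, epoch_hjd, period_days):
    """Return (cycle, O-C in days) for an observed time of minimum.

    Assumes linear elements and primary minima, so the cycle count
    is the nearest integer number of periods since the epoch.
    """
    cycle = round((tom_hjd - epoch_hjd) / period_days)
    computed = epoch_hjd + cycle * period_days
    return cycle, tom_hjd - computed


# Hypothetical elements and ToM, for illustration only.
EPOCH_HJD = 2452000.0
PERIOD_DAYS = 0.78
cycle, oc = o_minus_c(2452078.0015, EPOCH_HJD, PERIOD_DAYS)
print(cycle, round(oc, 4))
```

Every O-C in a set shares the same T0 and P, so an error in either one moves all of the points together. It does not single out individual minima.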
Opinions of experienced ToM researchers valued.
My inclination is to let the data tell its story. If a SWASP time series looks to be of reasonable quality, with no known problems, reasonably good sampling etc (subjective!), then determine a ToM and report it along with its uncertainty.
I apologize if I am misunderstanding your question, but by my read of it, rejecting some SWASP ToMs because they have an O-C a bit larger than others seems a lot like cherry-picking the data. Putting it another way, it is the ToMs that determine the overall O-C "curve" or trend, but you are saying you believe you know the O-C trend and it is telling you your ToM is wrong. If you really do believe that, perhaps you should reject all the ToMs you've derived from SWASP data, because it doesn't sound like you have a way of rejecting a particular night's data a priori.
Putting it yet another way, publish the ToMs, and those using the ToMs will judge them when they look at the entire suite of available ToMs, on an O-C diagram. When I make an O-C diagram, I usually code the graph to show the data sources. In the context of explaining the overall O-C trends, the user would, implicitly or explicitly, judge the various data sources and how much to honour each data point. If the SWASP points have greater scatter, so be it.
As a related matter, I don't have a study to prove it, but I believe that the error in a ToM is usually quite a lot greater than the error reported by, e.g., the KvW algorithm. In other words, I believe "external errors" result in the ToMs having greater error than those reported. Such additional error might be due to flat-fielding problems, changing sky conditions not fully compensated by differential photometry, light leaks, mirror flop issues, changing PSF through the night not fully accounted for by aperture photometry, etc. Anything that puts a "tilt" on a night's lightcurve, will shift the ToM, but probably won't show up in the uncertainty reported for the ToM.
All processes have variation, some more so than others. From the sounds of it, the SuperWasp data merely have a larger spread (i.e., standard deviation) around the population mean. That being the case, it would not be good practice to delete those data points simply because they have greater spread than data from another source that has smaller scatter. Consider, for example, the older visual timings of EB minima. They tend to scatter all over the place. Now, we wouldn't throw out that valuable history simply because the variability of visual observations is greater than CCD observations, would we?
Discarding observations because they don't fit some model puts the cart before the horse. Observations drive models, not the other way around. On the other hand, if there is a specific assignable cause that we understand (e.g., clouds, broken filter, faulty calibration, poor guiding, high airmass) to have influenced the observation, then it probably is OK to set the data point aside. Otherwise, I suggest not.
One of my interests is O-C diagrams of some delta Scuti stars. My practice has been to capture several ToMs each season, in a cluster over a few days or a few weeks. As the O-C diagrams I have been interested in cover many years, looking at the spread of my O-C values for each season gives me a pretty good measure of my error.
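The seasonal-spread idea can be sketched as follows, assuming each timing is tagged with a season label and that the real O-C trend is negligible within a single short cluster. The function name and the input layout are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def seasonal_scatter(points):
    """Group O-C values by season and return {season: (mean, sample std dev)}.

    points: iterable of (season_label, o_minus_c_days) pairs (assumed layout).
    Seasons with fewer than two timings are skipped, because a standard
    deviation cannot be estimated from one value.
    """
    by_season = defaultdict(list)
    for season, oc in points:
        by_season[season].append(oc)
    return {
        season: (mean(values), stdev(values))
        for season, values in by_season.items()
        if len(values) >= 2
    }
```

The per-season standard deviation is an empirical ("external") estimate of the timing error, and it can be compared with the formal error reported by the ToM algorithm.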
Within the BAV, the ToM and its error, e.g. for EBs, are recorded to 4 decimal places.
I like to use the software Minima27 by Bob Nelson of Variable South. There one can see that different algorithms produce different errors for the ToM. The error in Minima27 has 5 decimals. In the help section of Minima27, Bob explains a bit about which error estimates one can trust and which not. E.g., if a weighted mean of 6 algorithms produces a mathematical error of +/- 0.00001, this does not need to match reality.
Physics has an even stricter rule for publishing errors. If a physics paper publishes an error of, e.g., +/- 0.123456 JD, the author is thereby stating that the measurement is indeed accurate to the sixth decimal! So if a ToM is recorded to, e.g., the seventh digit, one has to round the measurement to the sixth digit to match the error. Even if a calculator produces, e.g., 12 digits, one has to think about the accuracy of the measurement chain itself (air, telescope, CCD).
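The rounding rule can be sketched like this, assuming the common convention of quoting the uncertainty to one significant figure and the value to the same decimal place. The 0.0012 d uncertainty in the usage line is a hypothetical number chosen for illustration, not a measured error for any star in this thread.

```python
import math


def round_to_uncertainty(value, uncertainty, sig_figs=1):
    """Format value +/- uncertainty with matching decimal places.

    The number of decimals is set by the leading significant
    figure(s) of the uncertainty, which must be positive.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))
    decimals = max(sig_figs - 1 - exponent, 0)
    return f"{value:.{decimals}f} +/- {uncertainty:.{decimals}f}"


# Hypothetical uncertainty of 0.0012 d on a six-decimal ToM.
print(round_to_uncertainty(2458225.644824, 0.0012))
```

With a realistic error in the third decimal place, the six-decimal ToM collapses to three decimals.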
So your error for V644 Aur has 6 decimals... If you observed and obtained this error with amateur telescopes, I would politely question it.
One can look for SWASP publications and look at the published errors there...
Or look at JAAVSO: which instruments achieve which errors? But even with a one-meter mirror, if the seeing and atmosphere are bad, one would only obtain low accuracy.
Discarding measurements: One can reject data points (not ToMs) where c-k is not in line. Then the JD error would be slightly better? A famous outlier story is that of the Hertzsprung-Russell diagram (HRD). It was meant to find the main sequence, and it was found. But there were other data points: outliers above and below the main sequence. At first they could not be interpreted, but the outliers were left in the HRD. Later it became clear that the outliers above were red giants and the outliers below were white dwarfs. So good for the two, H and R, that they did not delete them (-;
So one has to decide: where can I trust the measurements or data points, and is there a reason for which I can remove outliers for sure?
Apologies for not being able to show graphs.
The SuperWasp ToMs I measured for this particular system (V644 Aur) have a skewed distribution, not a normal distribution (O-C range is -0.0001 to +0.0058). I would not be bothered if the range of variation were more or less equal on both sides of the average, suggesting a normal distribution. This kind of normal distribution of O-C variation is common in the literature and in other ToMs I have compiled using SuperWasp data during this project; ranges of +/- 0.005 are not uncommon.
This particular skewness suggests that there is some systematic bias creeping into the overall results for this system. For example, in my zeal to take full advantage of the dataset and harvest every possible ToM, I may have picked some eclipses that I should have avoided given the data. But how do I know that a priori? What do I do about it a posteriori? Nothing, and just report what seem to me to be biased data? Make a judgement on the bias and eliminate it?
I was being lazy about the precision, sorry Bernhard. The usual is 4-5 decimal places. For the SuperWasp data it may be as low as 3 decimal places, but it doesn't really matter for the discussion, 3 decimal places would do and that is what I will report.
Strictly speaking, the statistical definition of skew is that one tail of a distribution is longer than the other. That's different from bias, which is the situation where data come from a population that has a mean different than what's expected.
To demonstrate that data are biased, a t-test of the means is in order. A low p-value would support the hypothesis that the observed minima are biased relative to the expectation that O-C = 0.
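A minimal sketch of the one-sample t-test suggested here, using only the Python standard library. It returns the t statistic and degrees of freedom. The p-value would then come from a t-table or from a statistics package (for example `scipy.stats.ttest_1samp`, which is not used in this sketch). The function name is made up for illustration, and the test itself assumes roughly normal, independent O-C values.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, expected=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == expected.

    values: O-C values in days; needs at least two values that are
    not all identical, otherwise the standard error is zero.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    standard_error = stdev(values) / sqrt(n)
    t_stat = (mean(values) - expected) / standard_error
    return t_stat, n - 1
```

A large |t| for the given degrees of freedom corresponds to a low p-value, i.e. evidence that the mean O-C differs from zero.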
Also, it would be useful to make a normal probability plot of the observed SuperWasp minima. This would yield insight as to whether the observations come from a non-normal distribution.
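Since graphs cannot be posted here, the coordinates of a normal probability plot can be computed and summarized numerically. This is a sketch assuming the simple plotting positions (i - 0.5)/n, which is one of several conventions. The function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_probability_points(values):
    """Return (theoretical normal quantile, sorted observation) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plot points; near 1 means nearly straight."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Points that fall close to a straight line are consistent with a normal distribution, while a curve bending away at one end indicates a longer tail on that side. The correlation is only a rough summary and is no substitute for looking at the plot.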
Appearances can, at times, be misleading. Statistical hypothesis tests are valuable in that they reduce subjectivity in decision-making.
I assume the data are skewed because the median is a better representation of the central tendency than the mean in this particular case and because of the nature of the distribution of the ToM values (one tail longer than the other). Given that, we are in the nonparametric world not the parametric world. I can't furnish you plots because we are not permitted to post them on the forum. But I will be glad to do so by email along with other rationale as to my reasoning. Shoot me a message with your email address and I will send you a summary.
This discussion has raised three issues: (1) O-C observations by SuperWasp are non-normal (i.e., they are skewed); (2) SuperWasp observations are inaccurate (i.e., they are biased); and (3) SuperWasp observations have more scatter (i.e., they are imprecise).
Of these three issues, I believe that bias is the greatest concern. The other two are surmountable. I've forwarded my email so that we can discuss off-line. Thanks.
After all that discussion: when I update the elements, the problem disappears. There is a huge spread of O-C values but a normal distribution of data points. A valuable lesson learned in my efforts to understand these data. (Why skewed with the original elements?) Thanks to all for pitching in with your advice and help. Given the spread, the ToM timing will be reported to two decimal places only.
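To show how the choice of elements moves every O-C value at once, here is a sketch of an ordinary least-squares straight-line fit of O-C against cycle number. The intercept corrects the epoch and the slope corrects the period. This is not what was done above, where newer published elements were adopted. It is only an illustration under the assumption of a constant period, and the function name is hypothetical.

```python
def refine_elements(cycles, ocs, epoch_hjd, period_days):
    """Fit O-C = a + b * E by least squares; return (new epoch, new period).

    cycles: cycle numbers E; ocs: O-C values in days computed with the
    old elements. Needs at least two distinct cycle numbers.
    """
    n = len(cycles)
    if n < 2 or len(set(cycles)) < 2:
        raise ValueError("need at least two distinct cycles")
    mean_e = sum(cycles) / n
    mean_oc = sum(ocs) / n
    sxx = sum((e - mean_e) ** 2 for e in cycles)
    sxy = sum((e - mean_e) * (oc - mean_oc) for e, oc in zip(cycles, ocs))
    slope = sxy / sxx                      # correction to the period, days/cycle
    intercept = mean_oc - slope * mean_e   # correction to the epoch, days
    return epoch_hjd + intercept, period_days + slope
```

A period that is slightly off makes the computed times drift linearly with cycle number, so the O-C values from a block of cycles end up offset from zero. That may be one reason a distribution looks lopsided with stale elements, though this is offered as a possibility, not a diagnosis of this dataset.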
So, Andy: The results are normally distributed; the ToMs are scattered, but not biased. You are on target. It was my failure, not SuperWasp bias or even my bias. That gives me some solace. But failing to find the most recent elements was my failure. Apologies to all for that, and thanks to John Greaves for pointing them out.