Bayesian Blocks for Timing Data
All the examples presented here were created with the example
script found on the SITAR downloads
page. To reproduce (most of) the results below, save the appropriate
file as bblocks_examp.sl, and
then download the following Chandra data file:
ps.evt.gz. (You do not need to uncompress
it.)
Note that the `event mode' examples shown below were produced with
sitar v0.3.00, wherein there was only one choice of `prior'
(i.e., the probability of one or more events being found in a single
`frame' being uniformly distributed between 0 and 1; this is still the
default). The new choices (selected by setting the alpha and
beta parameters in sitar_global_optimum) are not shown
below. I suggest trying the versions with alpha > 0 and
beta = 0, and with alpha = beta > 0. I believe that both of
these choices will yield a 'more sensitive' test, as well as result in
the ncp_prior parameter yielding `significance values' closer
to the naive frequentist interpretation. (This is only a rough
proxy, which should be used only as a guide, for example when running
Monte Carlo simulations to gauge a more realistic `significance'
level.)
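As a rough illustration of the kind of Monte Carlo check alluded to above, the Python sketch below (entirely independent of the SITAR routines; it implements a minimal version of the standard event-fitness Bayesian Blocks dynamic program, in the spirit of Scargle et al., on coarsely binned cells) simulates constant-rate Poisson lightcurves with an exposure and mean rate comparable to the data used below, and records how often more than one block is spuriously found for a given ncp_prior. The bin size, rate, and number of trials are arbitrary choices for illustration only.

import numpy as np

def bblocks(sizes, counts, ncp_prior):
    """Minimal Bayesian Blocks dynamic program for a set of cells, using the
    standard Poisson block fitness N*(ln N - ln T) and a fixed penalty
    ncp_prior per block.  Returns the starting cell index of each block."""
    n = len(sizes)
    best = np.zeros(n)
    last = np.zeros(n, dtype=int)
    for r in range(n):
        T = np.cumsum(sizes[:r + 1][::-1])[::-1]    # widths of candidate blocks ending at cell r
        N = np.cumsum(counts[:r + 1][::-1])[::-1]   # counts in those candidate blocks
        fit = N * (np.log(np.maximum(N, 1)) - np.log(T)) - ncp_prior
        fit[1:] += best[:r]                          # add the best partition of the earlier cells
        last[r] = int(np.argmax(fit))
        best[r] = fit[last[r]]
    # Peel off the optimal partition from the 'last' array
    cps, ind = [], n
    while ind > 0:
        ind = last[ind - 1]
        cps.append(ind)
    return sorted(cps)

# Null-hypothesis calibration: constant rate, no real change points
rng = np.random.default_rng(2)
tstop, dt = 40_000.0, 200.0                 # 40 ksec exposure split into 200 s cells
rate = 1.0 / 20.0                           # roughly one photon every 20 s
sizes = np.full(int(tstop / dt), dt)

for ncp_prior in (1.1395, 2.303, 7.0):
    false_alarms = 0
    for _ in range(200):
        counts = rng.poisson(rate * dt, size=sizes.size)
        if len(bblocks(sizes, counts, ncp_prior)) > 1:
            false_alarms += 1
    print(ncp_prior, false_alarms / 200.0)  # fraction of trials with spurious blocks

The resulting false-alarm fractions give a more direct handle on what a given ncp_prior buys you than the nominal confidence levels quoted below.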
The data file mentioned above contains a point source extraction from
a 40 ksec observation of a quiescent neutron star. The peak count
rate is roughly one photon every 20 seconds, and there are a total of
1300 photons. We therefore do not bin the data and we run
sitar_make_data_cells with type=1. (If you have, on
average, more than one event per frame time,
type=3 binned mode is a much better choice.) The tstart
and tstop times in the FITS file are actually slightly beyond
the endpoints of the data "good time intervals", so in creating the
cells we choose the start and stop times to be those of the first and
last detected photons.
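To make the type=1 cell definition concrete, here is a small Python sketch (illustrative only; it does not reproduce the internals of sitar_make_data_cells) that builds such cells from a list of event times: each cell extends from halfway back to the previous event to halfway forward to the next, with the outer boundaries set by the chosen start and stop times.

import numpy as np

def type1_cells(t, tstart=None, tstop=None):
    """Illustrative type=1 event cells: boundaries lie halfway between
    successive events; the outer boundaries are tstart/tstop (here
    defaulting to the first and last event times)."""
    t = np.sort(np.asarray(t, dtype=float))
    tstart = t[0] if tstart is None else tstart
    tstop = t[-1] if tstop is None else tstop
    mid = 0.5 * (t[1:] + t[:-1])              # halfway points between events
    edges = np.concatenate(([tstart], mid, [tstop]))
    sizes = np.diff(edges)                    # one cell per event
    counts = np.ones_like(sizes)              # each cell contains one event
    return edges, sizes, counts

# Example: five events; each interior cell spans the half-gaps on either side
edges, sizes, counts = type1_cells([0.0, 1.0, 1.5, 4.0, 10.0])
print(edges)   # -> [0.  0.5  1.25  2.75  7.  10.]
print(sizes)   # -> [0.5  0.75  1.5  4.25  3.]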
As a first cut, we set ncp_prior=1.1395, which nominally is a
"68% confidence level" (although see the caveats about this
interpretation on the Functions page - it is
best treated as a `starting guess' for the significance level, which
should be verified by simulations). As shown below, this clearly
detects the eclipses in the lightcurve, as well as structure
associated with the orbital modulation. Several spurious spikes,
however, are also detected.
As the value of ncp_prior is increased to 2.303 (nominally 90%
confidence per block) and then to 7 (nominally 99.9% confidence per
block), the number of detected blocks decreases. Note that in the
figures below, we have also increased the plotting bin size. This
is for plotting purposes only, and has no effect on the statistic.
(The figure on the right shows change points occurring within the
plotting bins.) One may wonder why, for the figure on the left, a
block with lower count rate wasn't placed near 30,000 sec, as in the
figure above. This would have required increasing from one to three
blocks between 22,000 and 34,000 sec, with each of these blocks
needing to pass the 90% confidence threshold. These two additional
blocks taken together imply a 99% confidence threshold to fit the low
count rate region around 30,000 sec. Such a high threshold is not
supported by the data (with the choice of uniform prior, alpha=beta
=0, used here; other choices of prior can reveal structure with
the same choice of ncp_prior).
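For reference, the nominal per-block confidence levels quoted here are consistent with ncp_prior ~ -ln(1 - p), which gives roughly 1.14 for 68%, 2.30 for 90%, and 6.9 for 99.9%. Since the prior is paid once per block, two additional blocks at the nominal 90% level carry a combined penalty of 2 x 2.303 ~ -ln(0.01), i.e., the nominal 99% threshold mentioned above. A quick check of that arithmetic (again, these nominal values are only a rough guide, as cautioned earlier):

import numpy as np

# Nominal mapping consistent with the values quoted above: ncp_prior ~ -ln(1 - p)
for p in (0.68, 0.90, 0.999):
    print(p, -np.log(1.0 - p))     # -> roughly 1.14, 2.30, 6.91

# Two extra blocks, each at the nominal 90% level, together carry a penalty
# of 2 * 2.303 = 4.606 = -ln(0.01), i.e. a nominal 99% threshold.
print(1.0 - np.exp(-2 * 2.303))    # -> ~0.99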
When creating the figure on the left above, we also set the driver
to write the results to a file, shown below. Note that the indices of
the change point locations are specific to the output structure from
sitar_make_data_cells. Because we have chosen to create
type=1 event cells (i.e., the cell boundaries extend from halfway
back to the previous event to halfway forward to the subsequent event),
the lightcurve eclipse is represented by two cells (i.e., the
last/first event detected before/after the eclipse) with cell sizes of
roughly half the eclipse width. These are cell indices 81 and 82 for
the first eclipse, and 772 and 773 for the second eclipse. If we had
instead chosen type=2 event cells (i.e., the cell boundaries
range from an event up until right before the subsequent event), each
eclipse would have been represented by one event cell.
Change Location - Lower Time - Upper Time - Rate (cps) - Error (cps)
  0  9.2019610e+07  9.2021300e+07  4.972e-02  6.164e-03
 81  9.2021300e+07  9.2022905e+07  1.293e-03  1.719e-03
 83  9.2022905e+07  9.2024582e+07  4.884e-02  6.139e-03
162  9.2024582e+07  9.2032733e+07  2.838e-02  2.031e-03
385  9.2032733e+07  9.2040151e+07  5.412e-02  2.894e-03
772  9.2040151e+07  9.2041828e+07  1.237e-03  1.644e-03
774  9.2041828e+07  9.2056183e+07  3.794e-02  1.729e-03
Here are several other examples of output using the Bayesian Blocks
driver (data files not provided). These are a source in the Chandra
field of the galactic center and Sgr A* itself (see the homepage of
Dr. Fred Baganoff). The first,
especially, represents a very sparse lightcurve: roughly 30 counts
were detected over a 150,000 second period. For such a sparse
lightcurve, how the endpoints are treated, and whether one chooses
type=1, 2, or 3 event cells/bins, can make a difference.
Here I have chosen type=2 binning, and have taken the start and
stop times of the observation as the start and stop times for the call
to sitar_make_data_cells. (This was to allow for the
possibility of the lightcurve really representing a discrete flare in
the middle of a very long observation.) Choosing, for example,
type=1 and a start and stop time equal to the time of the first
and last event, respectively, results in a less sensitive detection of
the flare. (For this latter case, the first and last events are not
included in the statistic, as they cannot be assigned cell widths on
both sides.) Note also that lowering the acceptable
`significance level' (as always, see the caveats on the functions
description page about the meaning of significance level), by
decreasing ncp_prior, does not result in the detection of
multiple blocks, but rather causes the code to detect a narrower,
brighter feature in the lightcurve.
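The endpoint sensitivity described above is easiest to see by writing the type=2 cell construction out explicitly. In the Python sketch below (illustrative only, and not the sitar_make_data_cells implementation; in particular, folding the interval between the chosen start time and the first event into the first cell is an assumption made here for simplicity), taking the full observation start and stop times stretches the first and last cells to cover the quiescent intervals around a short flare:

import numpy as np

def type2_cells(t, tstart=None, tstop=None):
    """Illustrative type=2 event cells: each cell starts at an event and
    ends just before the next one.  Folding the interval between tstart
    and the first event into the first cell is an assumption of this
    sketch; the SITAR routines may treat the endpoints differently."""
    t = np.sort(np.asarray(t, dtype=float))
    tstart = t[0] if tstart is None else tstart
    tstop = t[-1] if tstop is None else tstop
    edges = np.concatenate(([tstart], t[1:], [tstop]))
    return edges, np.diff(edges), np.ones(len(t))

# With the full observation start/stop, the first and last cells are
# stretched to cover the quiescent time around a short flare:
t = [50_000.0, 50_010.0, 50_025.0, 50_030.0]        # a handful of flare photons
edges, sizes, counts = type2_cells(t, tstart=0.0, tstop=150_000.0)
print(sizes)   # -> [50010.  15.  5.  119970.]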
It is also interesting to note that choosing type=3, i.e., a
binned mode with 250 sec bins in this case (plotting bins were 1000
sec), results in a more sensitive detection of the flare (including
resolving it into multiple components at lower significance
thresholds). Binning the lightcurve on time scales much shorter than
that of the characteristic variations can increase the sensitivity for
several reasons. Bins can have zero counts, whereas event cells
always have at least one count. This removes some of the ambiguity
about whether or not one should assign events via type=1 or
type=2. In the binned mode (with small bin size), effectively
the algorithm itself decides between type=1 or type=2,
and even allows for a mixture of the two in the same lightcurve. The
tradeoff, however, is much longer run times, as the algorithm scales
approximately as N^2 in the number of cells.
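For completeness, a type=3 style set of cells amounts to histogramming the events onto a fixed grid, so that zero-count cells are allowed. The sketch below (again independent of the SITAR call itself, with an arbitrary choice of bin size and simulated event times) shows the idea; the price, as noted above, is that the number of cells, and hence the roughly N^2 cost of the search, can become large.

import numpy as np

def binned_cells(t, tstart, tstop, dt):
    """Illustrative type=3 style cells: fixed-width bins of size dt;
    unlike event cells, bins are allowed to contain zero counts."""
    edges = np.arange(tstart, tstop + dt, dt)
    counts, _ = np.histogram(t, bins=edges)
    sizes = np.diff(edges)
    return edges, sizes, counts

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 150_000.0, size=30))    # ~30 counts over 150 ksec
edges, sizes, counts = binned_cells(t, 0.0, 150_000.0, 250.0)
print(len(counts), counts.sum(), (counts == 0).sum())  # many empty 250 s bins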
The danger of using a binned mode, of course, is that one may choose a
bin size wider than narrow, statistically significant features, and
thereby wash them out. An example is
shown below of a very long, fairly sparse lightcurve that contains
large, rapid events. Here, type=2 event mode was used, and the
flares were significantly detected. Bins of only a few thousand
seconds in duration might have missed these flares. Users are
encouraged to explore the effects of the various possible choices of
event type and binning on their own data.
As a final note, we would like to contrast the 'Bayesian Blocks'
algorithm to searches for variability with a 'Kolmogorov-Smirnov'
(K-S) test (i.e., comparing the event time distribution to that expected
from a constant Poisson process). The latter, applied to the same data, may appear to
yield higher 'significance levels' for any detected variability. It
is important to keep in mind, however, that the K-S test is only
answering the question, 'Does this lightcurve vary?' The Bayesian
Blocks algorithm, on the other hand, is being asked to provide
additional information: how many portions of the lightcurve vary,
where are they located, and what is the count rate within each portion
of the lightcurve? As one is asking for additional information, one
should naturally expect lower 'significances'. Furthermore, it is
rather more straightforward to come up with a `frequentist'
interpretation of an absolute `significance level' in the case of the
K-S test. Significance levels as described on this page should be
treated as rough proxies, and should be verified, for example, via
Monte Carlo simulations.
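As a concrete (and again purely illustrative) version of that comparison: for a constant Poisson rate the arrival times are uniformly distributed over the observation, so a one-sample K-S test of the rescaled event times against a uniform distribution answers only the 'does this lightcurve vary?' question. The sketch below uses scipy and simulated data; the rates and flare parameters are arbitrary and have nothing to do with the data sets above.

import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)

# A constant-rate stretch plus a short flare, over a 40 ksec exposure
tstart, tstop = 0.0, 40_000.0
t_quiet = rng.uniform(tstart, tstop, size=300)
t_flare = rng.uniform(20_000.0, 20_500.0, size=30)
t = np.sort(np.concatenate([t_quiet, t_flare]))

# One-sample K-S test of the arrival times against a uniform distribution:
# this only asks whether the lightcurve varies, not where, or by how much.
stat, pval = kstest((t - tstart) / (tstop - tstart), 'uniform')
print(stat, pval)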