Version 1.2.8

Bayesian Blocks Examples

Bayesian Blocks for Timing Data

All the examples presented here were created with the example script found on the SITAR downloads page. To reproduce (most of) the results below, save the appropriate file as bblocks_examp.sl, and then download the following Chandra data file: ps.evt.gz. (You do not need to uncompress it.)

Note that the `event mode' examples shown below were produced with sitar v0.3.00, wherein there was only one choice of `prior' (i.e., the probability of one or more events being found in a single `frame' being uniformly distributed between 0 and 1; this is still the default). The new choices (selected by setting the alpha and beta parameters in sitar_global_optimum) are not shown below. I suggest trying the versions where alpha>0 and beta = 0 and alpha=beta >0. I believe that both of these choices will yield a 'more sensitive' test, as well as result in the ncp_prior parameter yielding `significance values' closer to the naive frequentist interpretation. (This is only a rough proxy, which should be used to guide you in, for example, running Monte Carlo simulations to gauge a more realistic `significance' level.)

The data file mentioned above contains a point source extraction from a 40 ksec observation of a quiescent neutron star. The peak count rate is roughly one photon every 20 seconds, and there are a total of 1300 photons. We therefore do not bin the data and we run sitar_make_data_cells with type=1. (If you have on average more than one event per time represented by frame, type=3 binned mode is a much better choice.) The tstart and tstop times in the FITS file are actually slightly beyond the endpoints of the data "good time intervals", so in creating the cells we choose the start and stop times to be that of the first and last detected photon.

As a first cut, we set ncp_prior=1.1395, which nominally is a "68% confidence level" (although see the caveats about this interpretation on the Functions page - it is best treated as a `starting guess' for the significance level, which should be verified by simulations). As shown below, this clearly detects the eclipses in the lightcurve, as well as structure associated with the orbital modulation. Several spurious spikes, however, are also detected.

Increasing the value of ncp_prior to 2.303 (nominally 90% confidence per block) and 7 (nominally 99.9% confidence per block), the number of detected blocks decreases. Note that in the figures below, we have also increased the plotting bin size. This is for plotting purposes only, and has no effect on the statistic. (The figure on the right shows change points occurring within the plotting bins.) One may wonder why, for the figure on the left, a block with lower count rate wasn't placed near 30,000 sec, as in the figure above. This would have required increasing from one to three blocks between 22,000 and 34,000 sec, with each of these blocks needing to pass the 90% confidence threshold. These two additional blocks taken together imply a 99% confidence threshold to fit the low count rate region around 30,000 sec. Such a high threshold is not supported by the data (with the choice of uniform prior, alpha=beta =0, used here; other choices of prior can reveal structure with the same choice of ncp_prior).

When creating the figure on the left above, we also set the driver to write the results to a file, shown below. Note that the indices of the change point locations are specific to the output structure from sitar_make_data_cells. Because we have chosen to create type=1 event cells (i.e., the cell boundaries range from halfway from the previous event to halfway to the subsequent event), the lightcurve eclipse is represented by two cells (i.e., the last/first event detected before/after the eclipse) with cell sizes of roughly half the eclipse width. These are cell indices 81 and 82 for the first eclipse, and 772 and 773 for the second eclipse. If we had instead chosen type=2 event cells (i.e., the cell boundaries range from an event up until right before the subsequent event), each eclipse would have been represented by one event cell.

            0    9.2019610e+07    9.2021300e+07     4.972e-02     6.164e-03
           81    9.2021300e+07    9.2022905e+07     1.293e-03     1.719e-03
           83    9.2022905e+07    9.2024582e+07     4.884e-02     6.139e-03
          162    9.2024582e+07    9.2032733e+07     2.838e-02     2.031e-03
          385    9.2032733e+07    9.2040151e+07     5.412e-02     2.894e-03
          772    9.2040151e+07    9.2041828e+07     1.237e-03     1.644e-03
          774    9.2041828e+07    9.2056183e+07     3.794e-02     1.729e-03
    Change Location -  Lower Time - Upper Time - Rate (cps) - Error  (cps)

Here are several other examples of output using the Bayesian Blocks driver (data files not provided). These are a source in the Chandra field of the galactic center and Sgr A* itself (see the homepage of Dr. Fred Baganoff). The first, especially, represents a very sparse lightcurve: roughly 30 counts were detected over a 150,000 second period. For such a sparse lightcurve, how the endpoints are treated, and whether one chooses type=1,2, or 3 event cells/bins, can make a difference. Here I have chosen type=2 binning, and have taken the start and stop times of the observation as the start and stop times for the call to sitar_make_data_cells. (This was to allow for the possibility of the lightcurve really representing a discrete flare in the middle of a very long observation.) Choosing, for example, type=1 and a start and stop time equal to the time of the first and last event, respectively, results in a less sensitive detection of the flare. (For this latter case, the first and last events are not included in the statistic, as they cannot be assigned cell widths on both side of the events.) Note also that lowering the acceptable `significance level' (as always, see the caveats on the functions description page about the meaning of significance level), by decreasing ncp_prior, does not result in the detection of multiple blocks, but rather causes the code to detect a narrower, brighter feature in the lightcurve.

It is also interesting to note that by choosing type=3, i.e., a binned mode with 250 sec bins in this case (plotting bins were 1000 sec), results in a more sensitive detection of the flare (including resolving it into multiple components at lower significance thresholds). Binning the lightcurve on time scales much less than that of the characteristic variations can increase the sensitivity for several reasons. Bins can have zero counts, whereas event cells always have at least one count. This removes some of the ambiguity about whether or not one should assign events via type=1 or type=2. In the binned mode (with small bin size), effectively the algorithm itself decides between type=1 or type=2, and even allows for a mixture of the two in the same lightcurve. The tradeoff, however, is much longer run times, as the algorithm is approximately an N^2 routine.

The danger of using a binned mode, of course, is choosing a bin size wider than narrow, statistically significant features. An example is shown below of a very long, fairly sparse lightcurve that contains large, rapid events. Here, type=2 event mode was used, and the flares were significantly detected. Bins of only a few thousand seconds in duration might have missed these flares. Users are encouraged to explore the effects of the various possible choices of event type and binning on their own data.

As a final note, we would like to contrast the 'Bayesian Blocks' algorithm to searches for variability with a 'Kolmogorov-Smirnov' (K-S) test (i.e., comparing event time distributions to that expected from Poisson). The latter, applied to the same data, may appear to yield higher 'significance levels' for any detected variability. It is important to keep in mind, however, that the K-S test is only answering the question, 'Does this lightcurve vary?' The Bayesian Blocks algorithm, on the other hand, is being asked to provide additional information: how many portions of the lightcurve vary, where are they located, and what is the count rate within each portion of the lightcurve? As one is asking for additional information, one should naturally expect lower 'significances'. Furthermore, it is rather more straightforward to come up with a `frequentist' interpretation of an absolute `significance level' in the case of the K-S test. Significance levels as described on this page should be treated as rough proxies, and should be verified, for example, via Monte Carlo simulations.

This page was last updated Oct 21, 2020 by Michael Nowak. To comment on it or the material presented here, send email to mnowak@physics.wustl.edu.

MIT Accessibility