%+
% Name:
% deltoclk_spec_2.0.tex
%
% History:
% 2011 Jun 03, created (v1.0), Glenn E. Allen
% 2011 Jun 09, removed mention of acis_format_events; added the
% exposure-statistics file to the list of input files; added the
% parameter minnumframes; added the definition of a valid frame; added a
% description of what to do if frames are missing; added a definition of
% median for even and odd numbers of elements; added the section "TBD"
% (v1.1), GEA
% 2011 Jun 12, added the package epsfig; added Figures 1, 2, and 3 (v1.2),
% GEA
% 2011 Jun 17, tried to clarify the conditions in equations 2-5; added
% a figure of examples of anomalous DELTOCLKs (v1.3), GEA
% 2011 Nov 15, changed frame k to frame k+1 following a discussion with
% Peter; use the median instead of the actual value of OVRCLOCK to
% compute PHAS for frames with anomalous values of OVRCLOCK (v1.4), GEA
% 2012 Apr 01, pipeline only, switched OVRCLOCK to DELTOCLK, added
% parameter exrfile, added adjustment to event island, added adjustment
% to OVRCLOCK, added GRADED mode, added caveat about response tools
% (v2.0), GEA
%-
\documentclass{article}
\usepackage{cxo-memo-logo}
\usepackage{epsfig}
\usepackage{gea}
\usepackage{pstricks}
\newrgbcolor{red}{1 0 0}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\begin{document}
% 1. Header
\memobasic{
Jonathan McDowell, SDS Group Leader }{
Glenn E.\ Allen, SDS }{
\tt DELTOCLK spec }{
2.0 }{
http://space.mit.edu/CXC/docs/docs.html\#deltoclk }{
/nfs/cxc/h2/gea/sds/docs/memos/deltoclk\_spec\_2.0.tex }
\vspace{1.0\baselineskip}
\noindent
%
Since this version of the spec differs substantially from Rev.\ 1.4, the
changes are not highlighted in red.
% 2. Description
\section{ Description }
Infrequently, the values of {\tt DELTOCLK} in exposure-records files are
anomalously high or anomalously low for a particular frame of a particular
node. As a result, the bias- and {\tt DELTOCLK}-adjusted pulse-height data
are inaccurate. Recall that
%
\begin{equation}
{\tt PHAS}[i,j]_{k}
=
{\tt RAW\_PHAS}[i,j]_{k} -
{\tt BIAS}[i,j] -
{\tt DELTOCLK}_{k-1},
\end{equation}
%
where ${\tt PHAS}[i,j]_{k}$ is the adjusted pulse height for pixel $[i,j]$
of an event island in frame $k$, ${\tt RAW\_PHAS}[i,j]_{k}$ is the
unadjusted pulse height, ${\tt BIAS}[i,j]$ is the bias value of the pixel,
and ${\tt DELTOCLK}_{k-1}$ is the value of {\tt DELTOCLK} for the
corresponding node in the preceding frame. %, which may or may not be the
% preceding row.
This memo describes an algorithm to identify anomalous values of {\tt
DELTOCLK}. The identification process accommodates the following features.
The initial values of {\tt DELTOCLK} are typically invalid (\ie\ 4095). There
can be a steep positive or negative gradient in the values of {\tt DELTOCLK}
at the beginning of an observation until the temperature of a detector
stabilizes. Small-scale, periodic variations occur in some of the data. One
feature that is not handled by the algorithm, at least using the default
parameter settings, is the case when there is a large number of frames
between consecutive values of {\tt DELTOCLK} (``frame gaps''). An
examination of the data suggests that such cases are rare and that the
sparsely sampled values of {\tt DELTOCLK} are not anomalous. The
well-sampled data immediately preceding and following a gap are handled well.
Level 0 data files remain unchanged (\ie\ include the anomalies), but Level
1 files have pulse-height and overclock data that are corrected, if
possible, to compensate for the anomalies. Events that are adversely
affected by anomalies have a {\tt STATUS} bit set to one and are excluded
from Level 2 event files. A list of the anomalies is included in
exposure-statistics files so that users can compensate for any changes to
the exposure time.
% 3. Input
\section{ Input }
\begin{enumerate}
\item
Level 0 pulse-height event islands
\item
Level 0 {\tt DELTOCLK}s
\item
Level 0 {\tt EXPNO}s
\end{enumerate}
% 4. Output
\section{ Output }
\begin{enumerate}
\item
Level 1 pulse-height event islands that have been updated to compensate
for anomalies
\item
Level 1 event {\tt STATUS} bit information that has been updated to
indicate whether or not events are affected by anomalies
\item
A Level 1 exposure-statistics file with {\tt OVRCLOCK} values that have
been updated to compensate for anomalies
\item
A Level 1 exposure-statistics file that includes a list of the anomalies
in a {\tt BADOCLK} HDU
\end{enumerate}
% 5. Parameters
\section{ Parameters }
\begin{enumerate}
\item
{ \tt infile,f,a,"",,,"Name(s) of input event-data file(s)" }
\item
{ \tt outfile,f,a,"",,,"Name of output event-data file" }
\item
{ \tt exrfile,f,a,"",,,"Name(s) of input exposure-records file(s)" }
\item
{ \tt expstatfile,f,a,"",,,"Name of output exposure-statistics file" }
\item
{ \tt numframes,i,h,5,3,1001,"Nominal number of data points in the
sliding time window used \\ to smooth the data (must be odd and no less
than {\tt minnumframes})" }
\item
{ \tt maxframegap,i,h,11}%
%
\footnote{The largest {\tt DTYCYCLE} used to date is 10. Furthermore, as
shown in Figure~\ref{fig02}, the frame numbers in consecutive rows of
data typically differ by eleven or less.}%
%
{\tt,1,1001,"Maximum number of frames between consecutive data points in
window" }
\item
{ \tt minnumframes,i,h,3,3,1001,"Minimum number of data points in window" }
\item
{ \tt deltoclkthresh,i,h,3}%
%
\footnote{As shown in Figures~\ref{fig03}--\ref{fig05}, the algorithm is
relatively insensitive to the value of {\tt deltoclkthresh} for values
greater than or equal to three. Therefore, a value of three seems to
represent a good balance between maximizing the number of anomalies
identified and minimizing the false-positive rate.}%
%
{\tt,1,4096,"Minimum offset in adu that is considered anomalous" }
\item
{ \tt numiter,i,h,3,1,10,"Number of iterations performed to smooth the
data" }
\end{enumerate}
% 6. Processing
\section{ Processing }
\begin{enumerate}
\item
The validity of the input is verified.
%
The {\tt infile} and {\tt exrfile}(s) must exist.
%
The data in the {\tt exrfile}(s) must be valid. A valid {\tt exrfile}
is one where each row contains a numerical value for {\tt EXPNO} and
four numerical values for {\tt DELTOCLK} (\ie\ one value for each node).
The values of {\tt EXPNO} must increase from one row to the next.
%
The values of the parameters {\tt numframes}, {\tt maxframegap}, {\tt
minnumframes}, {\tt deltoclkthresh}, and {\tt numiter} must be in their
valid ranges.
%
The parameter {\tt numframes} must be an odd number that is greater than
or equal to {\tt minnumframes}.
%
If one or more of these conditions is not satisfied, then a warning
message is produced and the data are not searched for anomalous values
of {\tt DELTOCLK}.
\item
To determine if one or more values of {\tt DELTOCLK} is anomalous, the
data in an {\tt exrfile} are processed to estimate the actual values of
{\tt DELTOCLK}. Except as noted below, the estimates $M$ are obtained by
smoothing the data with a sliding median%
%
\footnote{Here, the median is defined as follows. If the number of
elements in the set for which the median is being computed is odd,
then the median is the middle value of the set after the set has been
sorted. For example, the median of the set [1,1,4,5,6] is 4. If the
number of elements is even, then the median is the mean of the middle
two values of the sorted set. If the mean is not an integer, then the
mean is truncated to obtain an integer. For example, the median of
[1,1,4,5] is 2, not 2.5.}
%
filter.
\begin{enumerate}
\item
The computation of the value of $M$ for the first row of data is
performed last as described in item~\ref{step02f}.
\item
If there are at least two rows of data, then for the second row
$M_{2} = {\tt DELTOCLK}_{2}$.
\item
\label{step02c}
%
If there are at least three rows of data and if $n \ge 2$, where
%
\begin{equation}
n = \frac{{\tt numframes} - 1}{2},
\end{equation}
%
then for rows $k$ from 3 to $n+1$, $M_{k}$ is the median of the set
$[{\tt DELTOCLK}_{k-1}$, ${\tt DELTOCLK}_{k}$, ${\tt
DELTOCLK}_{k+1}]$, provided
%
\begin{eqnarray}
{\tt EXPNO}_{k} - {\tt EXPNO}_{k-1} & \le & {\tt maxframegap}\
{\rm and}
\label{eqn03} \\
{\tt EXPNO}_{k+1} - {\tt EXPNO}_{k} & \le & {\tt maxframegap}.\
\label{eqn04}
\end{eqnarray}
%
If either equation~\ref{eqn03} or \ref{eqn04} is not satisfied, then
$M_{k} = {\tt DELTOCLK}_{k}$. The data at the beginning are handled
differently from the rest of the data because it is not uncommon to
have relatively steep {\tt DELTOCLK} gradients at the beginning of
an observation.
\item
\label{step02d}
%
If there are at least $n+2$ rows of data, then for rows $k \ge n+2$,
$M_{k}$ is the median of the set $[{\tt DELTOCLK}_{k-n}$, {\ldots},
${\tt DELTOCLK}_{k+n}]$, provided that the {\tt maxframegap}
constraint is satisfied. For example, if $n = 2$, then $M_{k}$ is
the median of $[{\tt DELTOCLK}_{k-2}$, ${\tt DELTOCLK}_{k-1}$, ${\tt
DELTOCLK}_{k}$, ${\tt DELTOCLK}_{k+1}$, ${\tt DELTOCLK}_{k+2}]$
provided
%
\begin{eqnarray}
{\tt EXPNO}_{k-1} - {\tt EXPNO}_{k-2} & \le & {\tt maxframegap},
\label{eqn05} \\
{\tt EXPNO}_{k} - {\tt EXPNO}_{k-1} & \le & {\tt maxframegap},
\label{eqn06} \\
{\tt EXPNO}_{k+1} - {\tt EXPNO}_{k} & \le & {\tt maxframegap},
{\rm and}
\label{eqn07} \\
{\tt EXPNO}_{k+2} - {\tt EXPNO}_{k+1} & \le & {\tt maxframegap}.
\label{eqn08}
\end{eqnarray}
%
If equation~\ref{eqn05} is not satisfied, then ${\tt
DELTOCLK}_{k-2}$ is excluded from the computation of the median
because the frame gap is too large.
%
If equation~\ref{eqn06} is not satisfied, then both ${\tt
DELTOCLK}_{k-2}$ and ${\tt DELTOCLK}_{k-1}$ are excluded from the
computation of the median.
%
If equation~\ref{eqn07} is not satisfied, then both ${\tt
DELTOCLK}_{k+1}$ and ${\tt DELTOCLK}_{k+2}$ are excluded from the
computation of the median.
%
If equation~\ref{eqn08} is not satisfied, then ${\tt
DELTOCLK}_{k+2}$ is excluded from the computation of the median.
%
If there are not at least {\tt minnumframes} values included in the
computation of the median, then $M_{k} = {\tt DELTOCLK}_{k}$ because
there is too little data to accurately estimate $M_{k}$ (\ie\ to
determine if {\tt DELTOCLK} is anomalous).
\item
Steps \ref{step02c} and \ref{step02d} are repeated an additional
${\tt numiter}-1$ times (for a total of {\tt numiter} times) to
remove some of the noise from the estimates $M$. For these
additional computations, the new values of $M$ are computed from the
previous values of $M$ instead of from the values of {\tt DELTOCLK}.
That is, $M_{k}$ is the median of the previous set of values
$[M_{k-n}$, {\ldots}, $M_{k+n}]$ instead of the median of $[{\tt
DELTOCLK}_{k-n}$, {\ldots}, ${\tt DELTOCLK}_{k+n}]$. This
computation is subject to the same {\tt maxframegap} constraints
noted above. However, the constraint on the number of values
required to compute the median is different. The value of $M_{k}$
is only recomputed if there are {\tt numframes} values instead of
{\tt minnumframes} values in the computation of the median. This
more restrictive condition on the number of values in the
computation of the median helps prevent the loss of accuracy in the
value of $M_{k}$ where there are more data points in the computation
on one side of frame $k$ than there are on the other side (as is the
case at the beginning of a data set and at the edges of frame gaps).
\item
\label{step02f}
%
If there is more than one value of {\tt DELTOCLK} in the {\tt
exrfile}, then the first value $M_{1} = M_{2}$, independent of the value
of ${\tt DELTOCLK}_{1}$. If there is only one value of {\tt DELTOCLK}
and if ${\tt DELTOCLK}_{1} = 4095$, then $M_{1} = 0$ because such a
value of 4095 is anomalous. If there is only one value of {\tt
DELTOCLK} and if ${\tt DELTOCLK}_{1} \ne 4095$, then $M_{1} = {\tt
DELTOCLK}_{1}$.
\end{enumerate}
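As an illustration of the steps above, consider the following
hypothetical example. The values are invented for this sketch and are
not taken from a real observation. Suppose that ${\tt numframes} = 5$
(\ie\ $n = 2$), ${\tt maxframegap} = 11$, ${\tt minnumframes} = 3$, and
${\tt numiter} = 1$, and that the first nine rows for one node of a
longer {\tt exrfile} are
%
\begin{center}
\begin{tabular}{lrrrrrrrrr}
\hline
Row $k$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
\hline
{\tt EXPNO}$_{k}$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 40 & 41 \\
{\tt DELTOCLK}$_{k}$ & 4095 & 2 & 3 & 3 & 25 & 3 & 4 & 4 & 5 \\
\hline
\end{tabular}
\end{center}
%
Under these assumptions, the estimates for the first seven rows would be
%
\begin{eqnarray*}
M_{2} & = & {\tt DELTOCLK}_{2} = 2, \\
M_{3} & = & {\rm median}\,[2,3,3] = 3, \\
M_{4} & = & {\rm median}\,[2,3,3,25,3] = 3, \\
M_{5} & = & {\rm median}\,[3,3,25,3,4] = 3, \\
M_{6} & = & {\rm median}\,[3,25,3,4] = 3, \\
M_{7} & = & {\rm median}\,[25,3,4] = 4,\ {\rm and} \\
M_{1} & = & M_{2} = 2.
\end{eqnarray*}
%
For row 6, ${\tt DELTOCLK}_{8}$ is excluded because ${\tt EXPNO}_{8} -
{\tt EXPNO}_{7} = 33$ exceeds {\tt maxframegap}. The four remaining
values satisfy the {\tt minnumframes} constraint, and the mean of the
middle two sorted values (3 and 4) is truncated from 3.5 to 3. For row
7, the same gap lies between rows $k$ and $k+1$, so both ${\tt
DELTOCLK}_{8}$ and ${\tt DELTOCLK}_{9}$ are excluded, which leaves
exactly {\tt minnumframes} values. Rows 8 and 9 are not evaluated here
because they depend on later rows that are not shown. Note that neither
the initial value of 4095 nor the spike of 25 in row 5 propagates into
the estimates $M$.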
\item
Once the estimates $M$ of the actual values of {\tt DELTOCLK} are
computed, the value of {\tt DELTOCLK} for frame $k$ is identified as
anomalous if
%
\begin{eqnarray}
{\tt DELTOCLK}_{k} & \ge & M_{k} + {\tt deltoclkthresh}\ {\rm or} \\
{\tt DELTOCLK}_{k} & \le & M_{k} - {\tt deltoclkthresh}.
\end{eqnarray}
\item
If the value of {\tt DELTOCLK} for frame $k$ is anomalous, then
%
\begin{enumerate}
\item
the {\tt outfile} is modified as follows
%
\begin{enumerate}
\item
\label{step04ai}
The pulse-height data are computed using
%
\begin{equation}
{\tt PHAS}[i,j]_{k+1} =
{\tt RAW\_PHAS}[i,j]_{k+1} -
{\tt BIAS}[i,j] -
M_{k}
\label{eqn11}
\end{equation}
%
instead of
%
\begin{equation}
{\tt PHAS}[i,j]_{k+1} =
{\tt RAW\_PHAS}[i,j]_{k+1} -
{\tt BIAS}[i,j] -
{\tt DELTOCLK}_{k}
\end{equation}
%
for all of the events that occur during frame $k+1$ (not frame
$k$) on the particular CCD and node with an anomaly. Since the
value of {\tt DELTOCLK} is computed at the end of a frame, not
the beginning, there is a one-frame offset between the time when
it is obtained and the time when it is used.
%
If an event occurs along the edge of a node that has an
anomalous value of {\tt DELTOCLK}, then only the pulse-height
values $[i,j]$ of the event island that are affected by the
anomaly are adjusted. The pulse heights of pixels that lie on
the adjacent, good node are not adjusted. Similarly, if an event
in frame $k+1$ occurs on a good node that is adjacent to a node
with an anomalous value of {\tt DELTOCLK}, then only the pulse
heights of pixels $[i,j]$ that lie on the adjacent, bad node are
adjusted.
%
The pulse-height computation described by equation~\ref{eqn11}
is only performed if the {\tt DATAMODE} is {\tt CC33\_FAINT},
{\tt FAINT}, {\tt FAINT\_BIAS}, or {\tt VFAINT}. The computation
is not performed if the {\tt DATAMODE} is {\tt CC33\_GRADED} or
{\tt GRADED}.
\item
{\tt STATUS} bit 10 (of 0--31) is set to one for all of the
events that are affected by an anomaly, whether the events are
on the node with the anomaly or an adjacent node (see
item~\ref{step04ai}).
\end{enumerate}
\item
and the {\tt expstatfile} is modified as follows.
%
\begin{enumerate}
\item
The {\tt OVRCLOCK}s are computed using
%
\begin{equation}
{\tt OVRCLOCK}_{k} =
M_{k} +
{\tt INITOCL}x
\end{equation}
%
instead of
%
\begin{equation}
{\tt OVRCLOCK}_{k} =
{\tt DELTOCLK}_{k} +
{\tt INITOCL}x,
\end{equation}
%
where $x = {\tt A}$, {\tt B}, {\tt C}, or {\tt D} for ${\tt
NODE\_ID} = 0$, 1, 2, or 3, respectively.
\item
A ``{\tt BADOCLK}'' HDU is included, which contains a list of
the {\tt EXPNO}, {\tt CCD\_ID}, and {\tt NODE\_ID} information
for the frame immediately following frame $k$.
\end{enumerate}
\end{enumerate}
\end{enumerate}
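\vspace{0.5\baselineskip}
\noindent
As a hypothetical illustration of the last two steps (all of the
numerical values here are invented for this sketch), suppose that ${\tt
deltoclkthresh} = 3$ and that, for ${\tt NODE\_ID} = 0$ of some CCD,
${\tt DELTOCLK}_{k} = 25$ and $M_{k} = 3$. Since
%
\[
{\tt DELTOCLK}_{k} = 25 \ge M_{k} + {\tt deltoclkthresh} = 3 + 3 = 6,
\]
%
the value for frame $k$ is identified as anomalous. Because the
comparisons are inclusive, a value of 6 would also be anomalous, but a
value of 5 would not, since $0 < 5 < 6$. Assuming that the {\tt
DATAMODE} is {\tt FAINT}, a pixel on that node in frame $k+1$ with ${\tt
RAW\_PHAS}[i,j]_{k+1} = 1500$ and ${\tt BIAS}[i,j] = 200$ would then
have, from equation~\ref{eqn11},
%
\[
{\tt PHAS}[i,j]_{k+1} = 1500 - 200 - 3 = 1297
\]
%
instead of $1500 - 200 - 25 = 1275$. Similarly, if ${\tt INITOCLA} =
500$ (an assumed value), then ${\tt OVRCLOCK}_{k} = 3 + 500 = 503$
instead of $25 + 500 = 525$. The affected events in frame $k+1$ would
have {\tt STATUS} bit 10 set to one, and the {\tt BADOCLK} HDU would
list the {\tt EXPNO} of frame $k+1$ together with the {\tt CCD\_ID} and
${\tt NODE\_ID} = 0$.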
% 7. Caveats
\section{ Caveats }
\begin{enumerate}
\item
As of the date of this memo, the tools {\tt mkarf}, {\tt mkgarf}, {\tt
mkinstmap}, and {\tt mkwarf} have not been modified to include the time
dependence introduced by specific nodes being bad during specific
frames. Note that the bad region includes not only the node with an
anomaly, but also a column in one or more adjacent nodes (see
item~\ref{step04ai}).
\end{enumerate}
% 8. TBD
% \section{ TBD }
% \begin{enumerate}
% Add to description:
% Fig -- spikes up
% Fig -- spikes down
% Fig -- 4095 in first frame, second frame OK (histograms)
% Fig -- steep positive gradients
% Fig -- steep negative gradients
% Fig -- periodic variations (acisf053060108N003_3_exr0.fits)
% Fig -- gaps
% List of files with gaps
% Fig of frame gaps: include an integral distribution
% Note to Kenny:
% DELTOCLK adjustment is done incorrectly (k instead of k-1)
% \item
% Make sure the algorithm works for the first frame of data, which often
% has a value of 4095.
% \item
% Make sure the algorithm works near the beginning of the data, when there
% can be steep gradients in the values of DELTOCLK.
% \item
% Make sure the algorithm works for short files.
% \item
% Make sure that there are no problems for the occasions when the frame
% gaps are too big to compute the median (e.g. interleaved mode). Should
% the value of {\tt maxframegap} be a multiple of {\tt DTYCYCLE}?
% \item
% Periodicity due to heater cycling (Catherine: 120426)?
% \item
% Verify that equations~11 and 12 are not off by one frame
% \item
% Are there specific pathological cases?
% What is wrong with acisf056324119N004\_5\_exr0.fits?
% \item
% What is the false positive rate?
% \item
% Try the CIAOX tool {\tt deltoclk}.
% \item
% Fig.~1: a case with negative anomalies and a case that curves up
% Add a case to Fig. 1 that has low values.
% \item
% Recreate Figures \ref{fig03}--\ref{fig05} (especially \ref{fig05}) to
% determine what the default value of {\tt maxframegap} should be.
% \item
% Verify that the invalid pulse heights are in the following frame.
% \item
% Optimize the parameter values. If ${\tt numframes} = 3$, then the
% values of $M_{k}$ may be inaccurate for the occasions when there are
% adjacent anomalous values of {\tt DELTOCLK}.
% \item
% Spec for V\&V {\tt INITOCL}$x + {\tt DELTOCLK}$ test
% \item
% Plot the number of anomalies as a function of pulse height.
% \item
% Add filenames to EXR\_INDEX
% \item
% Test EXR\_INDEX and histogram
% \end{enumerate}
% 9. Figures
\newpage
% 9.1. Fig. 1
\begin{figure}
\begin{center}
\epsfig{file=plot_deltoclk_examples,angle=-90,width=6.5in}
\end{center}
\caption{%
Three examples of the relative values of {\tt DELTOCLK} as a function of
frame number. The data are from exposure-statistics files and an
arbitrary constant has been added to the values of {\tt DELTOCLK} to
improve the visual clarity. The black, red, and green curves are the
data for $({\tt OBS\_ID}, {\tt CCD\_ID}, {\tt NODE\_ID}) = (1886,7,0)$,
(11793,9,1), and (13019,9,2), respectively. Anomalous ``spikes'' are
evident in the red and green curves. The black curve has no such
anomalies.
%
\label{fig01}
}
\end{figure}
% 9.2. Fig. 2
\begin{figure}
\begin{center}
\epsfig{file=plot_expno_diffs,angle=0,width=6.5in}
\end{center}
\caption{%
A histogram of the differences between consecutive frame numbers for
63,778 exposure-records files. A separate histogram was computed for
each node of each file. The histogram in this figure is a sum of each
of these histograms. If a difference is greater than 1000, then the
difference is set to 1000. If a difference is less than 0, then the
difference is set to 0. The inset histogram is a plot of the same data,
except that it includes only the data in the range from 0 to 100. The
histogram includes all frames.
%
\label{fig02}
}
\end{figure}
% 9.3. Fig. 3
\begin{figure}
\begin{center}
\epsfig{file=plot_deltoclk_diffsA,angle=-90,width=6.5in}
\end{center}
\caption{%
A histogram of the differences between the values of {\tt DELTOCLK} and
the values of the median {\tt DELTOCLK} for 63,778 exposure-records
files. A separate histogram was computed for each node of each file.
The histogram in this figure is a sum of each of these histograms. The
values of the median were computed using ${\tt numframes} = 5$, ${\tt
maxframegap} = \infty$, and ${\tt minnumframes} = 3$. If a value of
{\tt DELTOCLK} differs by more than 4096 adu from the corresponding
median, then the difference is set to 4096 or $-4096$ as appropriate.
%
\label{fig03}
}
\end{figure}
% 9.4. Fig. 4
\begin{figure}
\begin{center}
\epsfig{file=plot_deltoclk_diffsB,angle=-90,width=6.5in}
\end{center}
\caption{%
This figure is identical to Figure~\ref{fig03}, except that it includes
only the data in the range from $-100$ to 100 adu.
%
\label{fig04}
}
\end{figure}
% 9.5. Fig. 5
\begin{figure}
\begin{center}
\epsfig{file=plot_anom_frac,angle=0,width=6.5in}
\end{center}
\caption{%
A plot of the fraction of the values of {\tt DELTOCLK} that are
identified as anomalous as a function of the parameter {\tt
deltoclkthresh}. This plot was produced using the data that are shown in
Figure~\ref{fig03}. The inset histogram is a plot of the same data,
except that it includes only the data in the range from 0 to 20.
%
\label{fig05}
}
\end{figure}
% 10. Finish
\end{document}