# Theory¶

## Image processing¶

The first step of extracting features from a detector image is the application of a filter, also known as convolution. A convolution kernel is a small matrix that operates on every element (pixel) of an image, combining it with the surrounding pixels in a convolution, resulting in some modification of the image that makes it easier to process, for example, sharpening, blurring or edge-finding.

The general expression for convolution is,

where \(g(x, y)\) is the filtered or transformed image, \(f(x, y)\) is the original image, and \(w\) is the convolution kernel matrix. This is roughly equivalent to a weighted average of the elements of a sub-matrix of the image with the same dimensions as the kernel (the weight being the value of the element in the kernel), applied to the central element of the sub-matrix.

The effect of the filter obviously depends on the form of the convolution kernal, of which there are 5 types.

Delta

A dirac delta function that does not modify the image.

Constant

Apply a blur, as the average over the pixels surrounding a central pixel. A 3 x 3 example:

Radial

Taking elements in a radius between \(r_1\) and \(r_2\) to be one, and normalising by the sum of the elements of the matrix, this either becomes a notionally circular filter that has the same effect as a constant filter, or an annulus. Not generally of interest itself, but used to construct the annular kernel. The following (unnormalised) example has \(r_1 = 10\) and \(r_2 = 15\).

Annular

Consists of two regions: firstly, between 0 and \(r_1\), with a positive value, and secondly, between \(r_2\) and \(r_3\), with a negative value. This is equivalent to the sum of two radial kernels, with parameters \(\{0, r_1\}\) and \(\{r_2, r_3\}\) respectively, each individually normalised to the sum of its elements. The following (unnormalised) example has parameters \(r_1 = 5\), \(r_2 = 10\) and \(r_3 = 15\). The overall effect is a local average with a local mean background subtraction.

Enhanced annular

This is similar to the annular kernel, but with a different normalisation. Instead of normalising two radial kernels separately, the enhanced annular kernel is normalised by the sum of the absolute values of the elements.

The next state is thresholding, i.e. discarding any pixel with a count below the threshold value. After this point, it is possible to use either a standard blob-finding algorithm or an OpenCV implementation to detect connected components in the image. The difference between the OpenCV implementation in the Experiment section, and the proprietary implementation in the Peak Finder section is that the latter merges blobs across images to form what is notionally a 3D model, the third dimension being perpendicular to the plane of the images.

## Shape Prediction¶

### Definition of peak shape¶

In image analysis (peak finding), contiguous blobs (notionally detector spots) are used to construct an ellipsoid in 3D space (two detector coordinates and the frame number). This blob can be parameterised in terms of the moments of mass: the zeroth (total mass) \(m_0\), the first \(\mathbf{m}_1\) and second (moment of inertia) \(M_2\). These can be used to compute the inertia matrix, or variance/covariance matrix.

The metric tensor of the ellipsoid \(A\) d is then the inverse of
covariance, i.e. \(A = C^{-1}\). The eigenvalues of the covariance matrix
(\(\Lambda\) in \(3\times 3\) diagonal matrix form) of an ellipsoid
defines the radii around the principal axes, or alternatively the *variances* in
the directions of the principal axes. The eigenvalues (\(U\) as a matrix)
define the principal axes. These are related by \(CU = V\Lambda\), thus the
metric tensor can be derived from eigenvectors and eigenvalues via \(A =
U\Lambda^{-1}U^T\).

When constructing integration regions, we construct a central ellipsoid representing the peak, and a second concentric ellipsoid to represent the background. A third ellipsoid of intermediate radius may defin the beginning of the background region such that there is an ignored or “guard” region between the peak and background. If we interpret the eigenvalues \(\sigma_i, (i = 1, 2, 3)\) as the square root of thevariance in the direction of the principal axes, then scaling the peak ellipsoid by \(3\sigma\) means that 99.7% of the points are contained in the peak region.

An example is sketched above. The dotted ellipses are the integration regions where the ellipsoids intersects the detector images. The grid is a single image, each square representing one pixel. The principle axis of the ellipsoid is not generally perpendicular to the image plane, so centre of the ellipse intersecting the image will vary from frame to frame, as the sample rotates.

There are two schemes for defining an integration region in OpenHKL:

*Variable ellipsoid*— The peak and background ellipsoids are bounded using three parameters: peak end, background begin and background end, all of which are scaling factors. The peak ellipsoid is defined by the covariance matrix of a blob of voxels in the case of strong peaks, and as the mean covariance of neighbouring strong peaks in the case of weak peaks. The covariance matrix is scaled by the peak end factor to define the peak region, and the shell between the covariance matrix scaled by background begin and background end defines the background region. All three of these parameters are in units of multiples of \(\sigma\), the covariance of the ellipsoid.*Fixed ellipsoid*— The peak region is defined by the “mean radius” of the covariance ellipsoid in pixels; this is simply the mean of the half principal axes of the ellipsoid. The background begin and background end are scaling factors, which multiply the peak region.

The variable ellipsoid scheme is more natural, since it only requires three scaling factors; however, weak peaks with large variances will necessarily have larger integration regions, which is usually undesirable. Thus the fixed ellipsoid scheme is normally preferable.

### Rotating the beam profile¶

We make a simplifying assumption, that for a *perfect plane wave* \({\mathbf{{k}}}_\text{i}\), the observed scattering function has the form

i.e. that the peak shape is independent of its intensity and Miller index, specified by a single function \(f({\mathbf{{q}}})\).

Now suppose that the incoming plane wave actually has momentum \({\mathbf{{k}}}_\text{i}+ \delta {\mathbf{{k}}}_\text{i}\), with \(\delta {\mathbf{{k}}}_\text{i}\) sampled from a probability distribution \(P(\delta {\mathbf{{k}}}_\text{i})\). Let \(\mathbf{{u}}\) be the unit vector pointing from the sample origin to a given detector pixel. As we only consider elastic scattering, we can write the wavenumber as \(K:= k_\text{i}= k_\text{f}\). Then the outgoing momentum associated with this pixel is

where \({\mathbf{{k}}}_\text{f}= \mathbf{{u}}K\) and \(\delta {\mathbf{{k}}}_\text{f}= \mathbf{{u}}({\mathbf{{k}}}_\text{i}\cdot \delta {\mathbf{{k}}}_\text{i}) / K\). Therefore, we have

where \(\mathbf{{A}}\) is the matrix

Note that \(\mathbf{{A}} {\mathbf{{k}}}_\text{i}= {\mathbf{{q}}}\) and therefore \({\mathbf{{q}}}+ \delta {\mathbf{{q}}}= \mathbf{{A}}({\mathbf{{k}}}_\text{i}+ \delta {\mathbf{{k}}}_\text{i})\).

Therefore, the observed intensity at detector position \((x,y)\) should be proportional to

where \(\mathbf{{R}}\) is the rotation matrix taking lab coordinates to
sample-fixed coordinates. The matrix \(\mathbf{{A}}\) has
\(\det \mathbf{{A}} = -\frac{1}{2}{\mathbf{{q}}}^2\) and therefore is
invertible. [1] So we have
\(\delta {\mathbf{{k}}}_\text{i}= A^{-1} \delta {\mathbf{{q}}}\) and
\(d(\delta {\mathbf{{k}}}_\text{i}) = |\det \mathbf{{A}}|^{-1}
d(\delta {\mathbf{{q}}})\). Let \(\mathbf{{\Sigma}}_M\) denote the
variance-covariance matrix of the profile shape \(f\) and let
\(\mathbf{{\Sigma}}_D\) denote the variance-covariance matrix of the
beam divergence \(\delta {\mathbf{{k}}}_\text{i}\). Then from the above
formula we see that the *observed* profile shape will have (in
sample-fixed q-space) a variance-covariance matrix given by

where \(R\) is the rotation matrix from lab space to sample space and \(\mathbf{{A}} = K^{-2} {\mathbf{{k}}}_\text{f}{\mathbf{{k}}}_\text{i}^\intercal- \mathbf{{1}}\). Note that the matrix \(\mathbf{{A}}\) depends only on \({\mathbf{{k}}}_\text{f}\), i.e. on the detector pixel location, and the matrix \(\mathbf{{R}}\) depends on the sample orientation, i.e. the frame number.

Now make a simplifying assumption, \(\mathbf{{\Sigma}}_M = \sigma_M^2 \mathbf{{1}}\) and \(\mathbf{{\Sigma}}_D = \sigma_D^2 \mathbf{{1}}\) so that the expected variance-covariance matrix (1) becomes

Consider \(N\) observed blobs parameterized by \((\mathbf{{\Sigma}}_b, \mathbf{{R}}_b, \mathbf{{A}}_b)\). Write \(\mathbf{{S}}_b:=\mathbf{{R}}_b\mathbf{{A}}_b\). Form the penalty function

Determine \(\sigma_M^2\) and \(\sigma_D^2\) by minimizing the difference between the empirical \(\mathbf{{\Sigma}}_b\) and the expectation (2). Set \(\mathbf{{\nabla }}L = 0\) to obtain the 2x2 system of linear equations

which is easily solved. One can also solve for the the full covariance matrices \(\mathbf{{\Sigma}}_M, \mathbf{{\Sigma}}_D\) via gradient descent, since the gradient is easily computed analytically. [Here Jonathan says he tested “this” out in Python, and it seemed to work pretty well, so the assumptions may be justified. But we ignore whether “this” refers to the simplified \(\mathbf{{\Sigma }}= \sigma^2 \mathbf{{1}}\) or to the full computation with arbitrary \(\mathbf{{\Sigma}}\).]

Now, if we work in lab-based q-space, under the simplifying assumptions
above, we find a covariance matrix **TODO: this is notationally wrong
and totally obscure**

### Kabsch’s Coordinate System¶

In [T2] Kabsch introduced a per-peak coordinate system intented to undo effects from detector geometry. See also [T3] for an updated description of the coordinates and integration technique. The basis introduced by Kabsch is the following:

with corresponding coordinates

The coordinates \(\epsilon_1, \epsilon_2\) correspond to the angular distribution (in radians) of the peak, as if it were measured on the Ewald sphere. Hence this corresponds to beam divergence and we may model the intensity distribution as \(\exp(-(\epsilon_1^2 + \epsilon_2^2)/2 \sigma_D^2)\).

To understand the last coordinate, consider the following. Take a peak with center \({\mathbf{{q}}}\) and consider a nearby point \({\mathbf{{q}}}'\). We project \({\mathbf{{q}}}'\) back to the Ewald sphere by rotating along the axis \({\mathbf{{e}}}_1\) (which is the normal of the plane containing \({\mathbf{{k}}}_\text{f}\) and \({\mathbf{{k}}}_\text{i}\)). The velocity of \(q\) when it crosses the Ewald sphere by rotating along this axis is \({\mathbf{{e}}}_1 \times {\mathbf{{q}}}\). It is easy to verify that

and therefore the coordinate \(\epsilon_3\) may be interpreted as (approximately) and angular distance from the Ewald sphere.

To better understand \({\mathbf{{e}}}_3\), consider the following: we want to find the axis \(\mathbf{{a}}\) such that \({\mathbf{{q}}}\) passes through the Ewald sphere as fast as possible. Hence, we want to maximize \((\mathbf{{a}}\times {\mathbf{{q}}}) \cdot {\mathbf{{k}}}_\text{f}\) subject to the constraint \(\mathbf{{a}}\cdot \mathbf{{a}}= 1\). Now \((\mathbf{{a}}\times {\mathbf{{q}}}) \cdots {\mathbf{{k}}}_\text{f}) = \mathbf{{a}}\cdot (\mathbf{{a}}\times {\mathbf{{k}}}_\text{f}) = \mathbf{{a}}\cdot ({\mathbf{{k}}}_\text{f}\times {\mathbf{{k}}}_\text{i})\), so by the method of Langrange multipliers we must solve \({\mathbf{{k}}}_\text{f}\times {\mathbf{{k}}}_\text{i}= \lambda \mathbf{{a}}\), which tells us immediately that the axis is in the direction of \({\mathbf{{e}}}_1\).

## Least squares integration¶

### Fitted Intensity¶

As shown in [T4], the integration error for weak peaks is dominated by background subtraction and it is typically better to find the integrated intensity by fitting to a profile learned from strong peaks.

3D profile fitting is used by XDS [T5] and is described in some detail in [T2].

As in the previous subsection, using a covariance matrix and a parameters \(r_1 < r_2 < r_3\) we produce sets \(\mathcal{P}\) and \(\mathcal{B}\) of peak and background points. Assume that we know the resolution function \(R_i\), normalized as

We model the observed intensities \(M_p\) as

where \(B, I\) are the mean background and integrated intensity, yet to be fit. To find optimal values of \(B,I\) we minimize the chi-squared loss

For a fixed set of variances, minimizing \(\chi^2\) reduces to the 2x2 linear system below:

Write this equation as \(Ax = b\). It is easy to compute that the covariance matrix of \(b\) is exactly the coefficient matrix \(A\), and therefore the variance-covariance matrix of the solution vector \(x = (B, I)\) is given by \(A^{-1}\).

The solution given above depends on the pixel uncertainties \(\sigma_p^2\). As suggested by Kabsch 2010, we solve this iteratively. To begin, we set all \(\sigma^2_p\) equal to some fixed value, say 1. This allows us to solve for \(B\) and \(I\). We then put the solved values into the error model

and iterate until either \(I\) becomes negative, or \((B, I)\) do not change within some given convergence criterion.

The image above demonstrates the process of generating and fitting profiles in practice. One or more strong peaks in the neighbourhood of a weak reference peak are averaged over histogram with a different grid. In the image, a strong peak with a 7 x 6 pixel bounding box is binned on a 20 x 20 histogram, which is normalised after adding all of the neighbouring peaks. In practice, each pixel of a strong peak is subdivided on a finer mesh, in this case, a 5 x 5 subgrid, and 1/25 of the pixel signal is added to the relevant histogram bin. This results in smoother profiles. When profile integrating a weak peak, the profile value of pixel \(i\), \(p_i\), is taken from the nearest histogram bin.

## \(I/\sigma\) Integration¶

This is the integration technique used by RETREAT . The method is described in detail in [T6]. In the article [T7] there is a detailed comparison between this method and profile fitting. For a given peak with mean background \(\mu_b\), center \(x_0\), and covariance matrix \(\mathbf{{\Sigma}}\), define

Then the error of \(I_\sigma\) can be estimated (assuming Poisson statistics) as

where \(n_s = |X_s|\) is the number of points contributing to \(I_\sigma\) and \(n_b\) is the number of points used for background estimation.

**Important Remark:** The function \(I_\sigma\) is, to a good
approximation, *independent of the coordinate system x*. It is an
*intrinsic* property of the intensity distribution, independent of the
coordinates used to express the distribution. We therefore do not have
to worry about changes of coordinates, as in Kabsch’s paper.

Now, suppose that we take some value \(t\) to be the cutoff for strong peak integration. We can define the integrated peak profile

The uncertainty in \(p_s\):

Assuming \(s <\), we have

and therefore we have everything we need to estimate \(p_s\) and \(\sigma^2(p_s)\). Finally, if we have \(N\) independent strong peaks with measured profiles \(p^i_s, \sigma^2(p^i_s)\), then (assuming the peaks are non-overlapping) we can estimate the true profile as

**Assumptions:** We now assume that the intensity distributions for all
peaks are approximately equal, or at least slowly varying as a function
of detector position and sample orientation. Therefore, we model the
function \(I_\sigma\) as

where \(I_0\) is the “true” integrated intensity and \(P(\sigma)\) is a function independent of the particular peak. Given a collection of \(N\) strong peaks, we can estimate \(P(\sigma)\) as

**Remark** When calculating \(\sigma^2(I_\sigma / I_0)\) be very
careful, because \(I_\sigma\) and \(I_0\) are definitely
correlated!! Assuming \(s < t\), and the sets of peak points and
backgruond points are disjoint, *and Poisson statistics*, we have

Now, suppose that we estimate the true intensity as \(I = I_t\) for some \(t\). Then for \(s < t\) we have

**Integration Method:** Now suppose we have a good estimate of
\(p_\sigma, \sigma^2(p_\sigma)\) and we have computed
\(I_\sigma\) for some weak peak (note: this assumes we can
accurately predict the covariance matrix; see below). From the model
intensity distribution, we have \(I_\sigma \approx I_o p_\sigma\),
and therefore \(I_0 \approx I_\sigma / p_\sigma\). We have

Therefore, the relative error \(\sigma^2(I_\sigma / p_\sigma) / (I_\sigma/p_\sigma)^2\) is

The fitted intensity is then defined to be

A. Ostermann and T. Schrader. Biodiff: diffractometer for large unit cells. *Journal of large-scale research facilities JLSRF*, 1:2, 2015. doi:10.17815/jlsrf-1-19.

W. Kabsch. Evaluation of single-crystal x-ray diffraction data from a position-sensitive detector. *Journal of Applied Crystallography*, 21:916–924, 1988. doi:10.1107/S0021889888007903.

W. Kabsch. Integration, scaling, space-group assignment and post-refinement. *Acta Crystallographica Section D Biological Crystallography*, 66:133–144, 2010. doi:10.1107/s0907444909047374.

R. Diamond. Profile analysis in single crystal diffractometry. *Acta Crystalograhica*, 25:43–55, 1969. doi:10.1107/S0567739469000064.

W. Kabsch. Xds. *Acta Crystallographica Section D Biological Crystallography*, 66:125–132, 2010. doi:10.1107/s0907444909047337.

C. Wilkinson, H. W. Khamis, R. F. D. Stansfield, and G. J. McIntyre. Integration of single-crystal reflections using area multidetectors. *Journal of Applied Crystallography*, 21:471–478, 1988. doi:10.1107/S0021889888005400.

E. Prince, C. Wilkinson, and G. J. McIntyre. Comparison of the sigma(\it I)/\it I and least-squares methods for integration of bragg reflections. *Journal of Applied Crystallography*, 30:133–137, 1997. doi:10.1107/S0021889896012824.

Go to top.