Workflow¶
The workflow in OpenHKL approximately follows the order of the icons in
the sidebar: find peaks, filter, autoindex, predict, refine, merge. The
Home tab enables experiment creation and loading/saving, while the
Experiment tab allows inspection and editing of various aspects of the
physical experiment.
The virtual “experiment” is the highest level object, and contains all
information from the physical experiment (the data sets), plus any
derived/reduced data, such as the unit cell, peaks, indices, and merged
data statistics. This object can be saved at any stage in the workflow
by returning to the Home tab.
Home¶
The home tab shows a list of loaded experiments, and allows creation of new experiments, loading of a saved experiment state, and saving an existing experiment state.
Create new experiment opens a dialogue prompting the user to name the experiment and select the instrument used; the parameters specific to that instrument are loaded from a YAML-formatted instrument file found with the source code. Load from file loads an hdf5 file containing a saved experiment state. Save current saves the current experiment state as an hdf5 file. Save all saves all experiments in hdf5 format.
There are three tables summarising the state of the experiment on the left-hand side of this window.
Data sets:

| Column | Unit | Description |
|---|---|---|
| Name | | Name of the data set |
| Diffractometer | | Name of diffractometer used |
| Number of frames | | Number of images in this data set |
| Number of rows | pixels | Height of image |
| Number of columns | pixels | Width of image |
Peak collections:

| Column | Unit | Description |
|---|---|---|
| Name | | Name of the peak collection |
| Number of peaks | | Number of peaks in this collection |
| Number of invalid | | Number of rejected peaks in this collection |
| Is indexed | Y/N | Peaks in this collection have Miller indices |
| Is integrated | Y/N | Peaks in this collection have calculated intensities |
| Type | | Type of peak collection (e.g. found or predicted) |
Unit cells:

| Column | Unit | Description |
|---|---|---|
| ID | | Integer label of unit cell |
| Name | | Name of unit cell |
| Space group | | Assigned space group of unit cell |
| \(a\) | Å | \(a\) cell parameter |
| \(b\) | Å | \(b\) cell parameter |
| \(c\) | Å | \(c\) cell parameter |
| \(\alpha\) | degrees | \(\alpha\) cell angle |
| \(\beta\) | degrees | \(\beta\) cell angle |
| \(\gamma\) | degrees | \(\gamma\) cell angle |
Sets of detector images taken at different sample rotation angles can be added to the experiment either via the Data
menu or by clicking the “add data set” icon on the home panel. Image data sets can take the form of .tiff
images, .raw
images or Nexus data sets. After selecting the appropriate option to load data files, the user is prompted to select files, and then in the case of raw and tiff data, enter some metadata parameters via the raw/tiff data loader dialogue.
| Parameter | Unit | Description |
|---|---|---|
| Data arrangement | row/column major | Whether rows or columns are contiguous in memory |
| Data format | 16/32 bit integer/float | Bit depth and type of pixel value |
| Swap endian | T/F | Swap endianness of data (big/little) |
| Image resolution | pixels | Select image resolution, columns x rows (raw only) |
| Rebinning | pixels | Downscale image by given ratio (tiff only) |
| \(\Delta\chi\) | degrees | \(\chi\) angular stepping for sample |
| \(\Delta\omega\) | degrees | \(\omega\) angular stepping for sample |
| \(\Delta\phi\) | degrees | \(\phi\) angular stepping for sample |
| wavelength | Å | Wavelength of incident neutron beam |
| Use baseline/gain | T/F | Use the baseline and gain specified for the detector |
| baseline | | Value subtracted from each pixel before integration |
| gain | | Value dividing each pixel before integration |
Unlike raw images, tiff image files include a header containing most of the required metadata, so only the experimental parameters need to be entered. Nexus files contain all the necessary experimental metadata and have no associated loading dialogue.
Experiment¶
This panel contains three tabs with functionality that is normally required
before the data reduction process is started: Strategy, Histograms and Masks.
Strategy¶
The strategy tab contains controls for finding blobs (notionally
peaks) in a single image, using those blobs to determine the unit cell, and
predicting the completeness of the peaks given a sample rotation angle
increment. The Set initial direct beam position control sets the point at which
the direct beam intersects the detector image. In the first instance, this is
assumed to be in the centre of the image, but it may be off by a few pixels.
Clicking the checkbox allows the user to drag a resizable crosshair in the
detector image panel, which defines the exact direct beam position. The x
offset and y offset controls define the offset of this crosshair, in pixels,
with respect to the centre of the image.
The Find blobs in this image box allows the user to leverage image processing
algorithms from the OpenCV library (namely SimpleBlobDetector) to locate
detector spots.
| Parameter | Unit | Description |
|---|---|---|
| Convolution kernel | | Matrix for image filtering |
| Filtered image threshold | pixel counts | Pixels with value below threshold are discarded |
| Minimum blob threshold | pixel counts | Blob is discarded if it contains fewer points than this |
| Maximum blob threshold | pixel counts | Blob is discarded if it contains more points than this |
| Search all images | | Find spots in all images in data set |
| Apply threshold to preview | | Show the filtered and thresholded image |
The autoindexer parameters are described in Autoindexing, but it should be noted that indexing from a single image generally requires masking of “difficult” regions of the detector, such as the beam stop, and a good initial guess for the direct beam position.
| Parameter | Unit | Description |
|---|---|---|
| \(\Delta\chi\) | degrees | Angle increment for sample rotation about \(\chi\) axis |
| \(\Delta\omega\) | degrees | Angle increment for sample rotation about \(\omega\) axis |
| \(\Delta\phi\) | degrees | Angle increment for sample rotation about \(\phi\) axis |
| Number of increments | | Sample rotation increments or images to simulate |
| d range | Å | Resolution range for predicting peaks |
Histograms¶
The Histograms tab allows the user to plot histograms of pixel statistics
(as opposed to peak statistics).
The Per-pixel detector count histogram box allows the user to plot a histogram
of pixel counts for either the current single image, or for all images (by
checking the All images box). Checking the Plot intensity profiles box changes
the interaction mode in the detector image to draw a line (“Line plot”,
“Horizontal slice” or “Vertical slice”) through the image, and plot a histogram
of the intensity along that line with the given number of bins.
Masks¶
The Masks tab allows the user to add masks to the data set. A mask is
either an ellipse or a rectangle, present on all images in the data set,
within which detected spots or peaks are rejected and integration is not
valid. Possible reasons to add a mask include preventing peak finding on the
beam spot, or preventing integration of peaks on heterogeneous features such
as seams between detector plates. The Add detector image masks check box
changes the interaction mode in the detector image to draw a mask by dragging
and dropping, the shape of which is specified in the list (rectangular or
elliptical). Masks are displayed in the list below, and the extents of the
masks can be fine-tuned.
The screenshot above demonstrates masking the detector image to exclude invalid regions from the peak search. The beam stop and the seam between detector plates (thin white line in this context) have been masked using the masking tool in the bottom right-hand corner, such that any peaks found in these regions will be rejected. The region around the beam stop containing the air scattering halo has also been masked, because the heterogeneous background would result in poor integration.
Find peaks¶
The initial peak search is essentially a pure image processing step, with no crystallographic input. The technique is roughly as follows:
1. Apply an image filter to subtract local background
2. Apply a threshold to the resulting image
3. Find connected components (“blobs”) of the thresholded image
4. Merge blobs that overlap, according to some cutoff
In the first step, we apply a filter which consists of a central circular region with positive weight, and an outer annular region with negative weight. The weights are chosen so that the convolution computes the local average of the circular region minus the local average of the annular region, effectively giving a local background subtraction. The radii of the circle and annulus may be specified by the user.
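The filter described above can be sketched as follows. The helper `annular_kernel` and its normalisation are illustrative assumptions, not the OpenHKL implementation:

```python
import numpy as np

def annular_kernel(r_inner, r_outer):
    """Background-subtracting kernel sketch: positive weights on a central
    disc and negative weights on the surrounding annulus, each region
    normalised so that convolution yields
    (mean over disc) - (mean over annulus)."""
    rng = np.arange(-r_outer, r_outer + 1)
    y, x = np.meshgrid(rng, rng, indexing="ij")
    r2 = x * x + y * y
    disc = r2 <= r_inner ** 2
    annulus = (r2 > r_inner ** 2) & (r2 <= r_outer ** 2)
    kernel = np.zeros_like(r2, dtype=float)
    kernel[disc] = 1.0 / disc.sum()
    kernel[annulus] = -1.0 / annulus.sum()
    return kernel
```

Since the weights sum to zero, convolving a flat image gives zero everywhere; only features that differ from the local background survive the filter.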
To find connected components, we use a standard blob detection algorithm. In the last step, we compute inertia ellipsoids for each blob, and merge those blobs whose ellipsoids overlap, after a user-defined scaling factor has been applied. The merging process is repeated until there are no longer any overlapping ellipsoids.
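The inertia ellipsoid of a blob can be sketched as the count-weighted mean and covariance of its pixel coordinates; the helper below is illustrative, not the OpenHKL code:

```python
import numpy as np

def inertia_ellipsoid(points, weights):
    """Centre and covariance (inertia ellipsoid) of a blob, computed from
    its pixel coordinates and pixel counts (used as weights)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                    # normalise counts to weights
    centre = w @ points                # weighted centre of mass
    d = points - centre
    cov = (w[:, None] * d).T @ d       # weighted covariance matrix
    return centre, cov
```

Two blobs are then merged when their ellipsoids, scaled by the merging scale factor, overlap.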
The collision detection problem for ellipsoids is sped up by storing them in an octree.
| Parameter | Unit | Description |
|---|---|---|
| Threshold | pixel counts | After filtering, pixels above this value are set to 1, others to 0 |
| Merging scale | \(\sigma\) | Scale factor for covariance matrix to detect collisions between blobs |
| Blob size range | pixel counts | Only blobs with counts in this range will be kept |
| Maximum width | frames | Only blobs spanning fewer images than this number will be kept |
| Convolution kernel | | Type of convolution matrix to use in image filtering |
| Parameters | | Radius parameters used in construction of convolution matrix |
| Frame range | frames | Find peaks in this image range |
| Apply threshold to preview | | Switch detector image to filtered and thresholded view |
At this stage in the workflow, there are no profiles available for profile integration. The found peaks are therefore integrated using pixel sum integration, a simple summation of peak pixel counts with a mean background subtraction.
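The pixel sum idea can be sketched as follows; `pixel_sum` and its variance estimate are illustrative assumptions (Poisson counting statistics), not the OpenHKL implementation:

```python
import numpy as np

def pixel_sum(counts, peak_mask, bkg_mask):
    """Sum the counts in the peak region and subtract the mean background,
    with a simple counting-statistics estimate for the variance."""
    n_peak = peak_mask.sum()
    bkg = counts[bkg_mask]
    intensity = counts[peak_mask].sum() - n_peak * bkg.mean()
    # peak counting variance plus the variance of the background
    # estimate, scaled to the number of peak pixels
    variance = counts[peak_mask].sum() + n_peak ** 2 * bkg.var() / bkg.size
    return intensity, np.sqrt(variance)
```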
The following three integration parameters are explained in detail in Definition of peak shape. Briefly, however, they are scaling factors that determine the size of the ellipsoids representing the peak and background region. The covariance matrix is scaled by a dimensionless \(\sigma^2\), such that an ellipsoid scaled by a “peak end” of \(\sigma\) contains 68.3% of the points in the ellipsoid, 95.4% for \(2\sigma\) and 99.7% for \(3\sigma\). The ellipsoids (projected to ellipses on the detector scene) can be visualised via the “Show/hide” peaks widget.
| Parameter | Unit | Description |
|---|---|---|
| Peak end | \(\sigma\) | End of peak region in multiples of the blob covariance matrix |
| Background begin | \(\sigma\) | Beginning of background region in multiples of the blob covariance matrix |
| Background end | \(\sigma\) | End of background region in multiples of the blob covariance matrix |
| Compute gradient | | Whether to compute the image gradient |
| FFT gradient | | Whether to use Fast Fourier Transform to compute gradient |
| Gradient kernel | | Matrix kernel to use for gradient convolution |
Filter peaks¶
The filter peaks tab allows the user to remove peaks that meet certain criteria from a collection, and save this subset as a new collection. The following controls cause the filter to catch peaks that have:
- State — a specific (hidden) state flag set to “true”:
  - Selected — unselected peaks are generally unfit for integration for some reason
  - Masked — a peak is masked if it has been manually highlighted on the detector view
  - Predicted — the peak has been predicted, as opposed to found via the peak search algorithm
  - Indexed — the peak has a unit cell assigned
- Indexed peak — been indexed (i.e. have a unit cell assigned)
- Strength — a strength (\(I/\sigma\)) in the specified range
- d range — a d value (Å) in the specified range
- Frame range — a frame value (i.e. image number) in the specified range
- Overlapping — remove pairs of peaks for which the intensity region (“peak end”) overlaps an adjacent background region (“background end”); set these to the same value to remove only overlapping intensity regions
- Rejection reason — remove all peaks other than those with the selected rejection reason
- Sparse dataset — remove peaks from data sets which contain too few peaks
- Merged peak significance — reject peaks which fail a chi-squared test: a peak is rejected if the probability of it having an intensity less than the chi-squared value of the intensities of the merged peak of which it is a member is less than the expected variance
- Extinct from spacegroup — reject peaks that are forbidden by space group symmetry considerations

See Peak Table for a detailed list of options, with explanations.
Note that on this widget the peak table contains an extra column, caught by
filter, which allows the user to sort peaks caught by the filter to the top of
the peak table with a single click.
Autoindexing¶
The unit cell is determined in this tab using the 1D Fourier transform method [W1], and peaks are assigned Miller indices. A unit cell is required for all subsequent sections of the workflow.
The algorithm works as follows. We are given some set of \(\mathbf{q}\) vectors which lie approximately on a lattice, yet to be determined. To find candidate lattice directions, we take a random sample of directions using the Fibonacci sphere algorithm. For each direction, we perform the orthogonal projection of each \(\mathbf{q}\) vector onto the infinite line specified by the direction. We then take a finite number of bins along this line (the way the binning is performed can be controlled by user-defined parameters), and take the FFT of the resulting histogram. The histogram will be strongly periodic when the direction corresponds to a lattice direction, so we identify lattice vectors by taking the strongest Fourier modes of the histograms.
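The projection/FFT idea can be sketched on synthetic data. The reciprocal basis, the `fft_score` helper and its parameters below are illustrative assumptions, not OpenHKL code or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic reciprocal-lattice points from a made-up orthorhombic cell
recip_basis = np.diag([0.5, 0.7, 0.9])     # rows are reciprocal basis vectors
hkl = rng.integers(-10, 11, size=(300, 3))
q = hkl @ recip_basis

def fft_score(direction, q, nbins=256, qmax=6.0):
    """Project the q vectors onto `direction`, histogram the projections,
    and return the strongest non-zero Fourier amplitude. The score is
    large when `direction` is parallel to a lattice direction, because
    the histogram is then strongly periodic."""
    d = direction / np.linalg.norm(direction)
    proj = q @ d
    hist, _ = np.histogram(proj, bins=nbins, range=(-qmax, qmax))
    return np.abs(np.fft.rfft(hist))[1:].max()
```

A true lattice direction such as (1, 0, 0) scores far higher than an arbitrary direction, which is how candidate lattice vectors are identified.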
The FFT method produces a finite set of potential lattice vectors. To find a basis, we enumerate triples of these vectors and rank them according to:

- the percentage of peaks that can be indexed (with integer indices)
- the volume of the resulting unit cell

This provides a ranked list of candidate unit cells, from which the user may choose.
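The ranking step can be sketched as follows; `rank_triples` and its indexing tolerance are illustrative assumptions, not the OpenHKL implementation:

```python
import itertools
import numpy as np

def rank_triples(candidates, q, tol=0.2):
    """Rank triples of candidate reciprocal lattice vectors by (1) the
    fraction of q vectors that index to near-integer hkl and (2) the
    direct-cell volume (smaller preferred on ties)."""
    results = []
    for triple in itertools.combinations(range(len(candidates)), 3):
        B = np.array([candidates[i] for i in triple])  # rows: reciprocal basis
        det = np.linalg.det(B)
        if abs(det) < 1e-8:
            continue                                   # degenerate triple
        frac_hkl = q @ np.linalg.inv(B)                # fractional Miller indices
        indexed = np.all(np.abs(frac_hkl - np.round(frac_hkl)) < tol, axis=1).mean()
        volume = 1.0 / abs(det)                        # direct-cell volume
        results.append((indexed, -volume, triple))
    # best first: most peaks indexed, then smallest cell
    return sorted(results, reverse=True)
```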
| Parameter | Unit | Description |
|---|---|---|
| Image range | frames | Choose a limited (contiguous) subset of images over which to index |
| Resolution (d) range | Å | Peaks with q-vector outside this range will not be used in indexing |
| Strength range | | Peaks with strengths outside this range will not be used in indexing |
| Gruber tolerance | | Tolerance for Gruber cell reduction |
| Niggli tolerance | | Tolerance for Niggli cell reduction |
| Find Niggli cell | T/F | Whether to find the Niggli primitive cell |
| Max. cell dimension | Å | Maximum length of any cell vector |
| Num. Q-space trial vectors | | Number of reciprocal space directions to search for lattice vectors |
| Num. FFT histogram bins | | Number of reciprocal space bins for Fourier transform |
| Number of solutions | | Number of trial lattice vectors with which to construct triples |
| Minimum volume | \(Å^3\) | Minimum unit cell volume |
| Indexing tolerance | | Maximum difference between floating point \(hkl\) and integer \(hkl\) |
| Frequency tolerance | 0.0–1.0 | Minimum fraction of amplitude of zeroth Fourier frequency to accept as candidate lattice vector |
The FFT indexing method can be difficult to use correctly because there is no systematic method for reaching the correct solution, and there are many adjustable parameters. As a guide, the following tend to have a substantial effect on the success (or otherwise) of the procedure:
- Number of peaks/number of frames: using too many peaks/frames tends to result in failure. This is obviously strongly dependent on the nature of the sample; for example, on the BioDiff detector, up to 10 frames containing no more than 300 peaks seems to be sufficient to index complicated biological crystals.
- Subdivisions: the process is strongly dependent on the number of FFT histogram bins.
- Q vertices: this is the easiest parameter to vary systematically, since more Q vectors will increase the likelihood of finding one that is parallel to the normal of a lattice plane. Increasing this value will usually (but not invariably) enhance the odds of finding a lattice vector.
- Frequency tolerance: the FFT algorithm will discard any candidate reciprocal lattice vector whose amplitude is less than this fraction of the zeroth Fourier frequency. Use with care!
The closest unit cell can then be selected as a row from the table of solutions
and assigned to a peak collection (usually the collection of found peaks). Note
that it is important to find the cell with the correct centering (Bravais type),
or the correct space group may not be visible in the list in the Assign unit
cell dialogue box. This may require additional experimentation with the
parameters.
In practice, the position of the direct beam is the parameter that usually determines the success of this algorithm. In the first instance, OpenHKL will assume that the direct beam position is at the exact centre of the detector image, when it is in fact likely to be off by a few pixels, enough to prevent the algorithm from finding a solution. At this stage, we have no unit cell, so refinement is not an option, leaving the option of manually adjusting the direct beam position. This can be done by checking the “set initial direct beam position” box and dragging and dropping a crosshair in the detector scene. The “x offset” and “y offset” boxes show the offset in pixels from the centre of the image, and the “crosshair size” and “crosshair linewidth” controls offer a guide to the eye when determining the direct beam position.
An example of this procedure is shown above. The air scattering halo in this instance can be used to give a better estimate of the direct beam position, which is off by 2–3 pixels in each direction. This small adjustment is enough to successfully find the correct unit cell, orientation and Bravais lattice with the default autoindexing parameters.
Shape model¶
The details of the shape model are explained in Definition of peak shape, but for the purposes of this section it is enough to know that each peak is modelled as an ellipsoid extending over several frames (specifically over a finite sample rotation angle). The shape model is intended to define the shape of peaks which do not have strong intensity regions on the detector image, and whose shape (covariance matrix) is unknown, even though the position of the centre of the peak is known. A shape model is constructed by adding the shapes of strong peaks from a peak collection to the model; the model can then predict the shape of a peak with its centre at given coordinates by taking the mean of the covariance matrices of the neighbouring peaks, within a cutoff.
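The mean-of-neighbours prediction can be sketched as follows; `predicted_shape` and its arguments are illustrative, not the OpenHKL API, and the unweighted mean corresponds to the “none” weighting scheme described later:

```python
import numpy as np

def predicted_shape(centre, neighbour_centres, neighbour_covs, radius):
    """Predict the covariance (shape) of a peak at `centre` as the mean
    covariance of strong peaks within `radius` of it."""
    d = np.linalg.norm(neighbour_centres - centre, axis=1)
    near = d < radius
    if not near.any():
        raise ValueError("no neighbouring strong peaks within the cutoff")
    return neighbour_covs[near].mean(axis=0)
```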
The first set of parameters determines the shape model, and includes:

- the size and shape of the histogram on which to construct the mean profile
- the number of subdivisions per pixel to use when binning
- the coordinate system (Kabsch or detector)
- the parameters used by the Kabsch coordinate system
- parameters to filter unwanted peaks from the model
- integration parameters for the shape model
The binning scheme for constructing the shape model is described in
Least squares integration. Once the parameters are set, the shape model is
constructed by clicking Build shape model. The shape model is used later,
in assigning shapes to predicted peaks and in profile integration.
| Parameter | Unit | Description |
|---|---|---|
| Histogram bins x | | Number of bins to sample peak pixels in detector x direction |
| Histogram bins y | | Number of bins to sample peak pixels in detector y direction |
| Histogram bins frames | | Number of bins to sample peak pixels in detector frame (rotation) direction |
| Subdivisions | | Number of sampling subdivisions along each axis, per pixel |
| Kabsch coordinates | T/F | Use Kabsch coordinate system to undo effects of detector geometry on profiles |
| Beam divergence \(\sigma\) | | Peak variance due to beam divergence in Kabsch model (\(\sigma_D\)) |
| Mosaicity \(\sigma\) | | Peak variance due to crystal mosaicity in Kabsch model (\(\sigma_M\)) |
| Minimum \(I/\sigma\) | | Minimum strength of peak to use in shape model |
| Resolution (d) range | Å | Only include peaks in this resolution range in the model |
| Integration region type | | Switch between variable and fixed-size integration regions |
| Show single integration region | | Display integration region of single clicked peak on detector image |
| Peak end | \(\sigma\) | End of peak region in multiples of the blob covariance matrix |
| Background begin | \(\sigma\) | Beginning of background region in multiples of the blob covariance matrix |
| Background end | \(\sigma\) | End of background region in multiples of the blob covariance matrix |
The second set of parameters controls the preview images generated in the “Shape preview” panel. These include the coordinates of the chosen peak (these can also be set by clicking on a peak in the detector image), the minimum number of neighbouring strong peaks in the given radius required to construct a sensible shape, and two radii for neighbour searches, in the plane of the detector image (in pixels) and perpendicular to the detector image (in frames). The weighting scheme determines the weights used in averaging neighbouring strong peaks to construct a profile: this can be set to “none” (a weight of 1), “inverse distance” (peaks further from the reference peak have a smaller contribution) and “intensity” (weaker peaks have a smaller contribution).
| Parameter | Unit | Description |
|---|---|---|
| x coordinate | pixels | x coordinate of target peak to visualise |
| y coordinate | pixels | y coordinate of target peak to visualise |
| frame coordinate | image number | Image number of target peak to visualise |
| Minimum neighbors | | Minimum number of neighbouring profiles to construct a profile/shape |
| Search radius (pixels) | pixels | Pixel radius in image to search for neighbouring profiles |
| Search radius (images) | image number | Image radius in data set to search for neighbouring profiles |
| Interpolation type | | Weighting scheme to use when averaging profiles |
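The three weighting schemes can be sketched as follows; `profile_weights` is an illustrative helper whose option names mirror the combo-box labels, not OpenHKL internals:

```python
import numpy as np

def profile_weights(scheme, distances, intensities=None):
    """Normalised weights for averaging neighbouring strong-peak profiles:
    "none" weights all neighbours equally, "inverse distance" favours
    nearby peaks, and "intensity" favours strong peaks."""
    if scheme == "none":
        w = np.ones_like(distances, dtype=float)
    elif scheme == "inverse distance":
        w = 1.0 / np.maximum(distances, 1e-6)  # guard against zero distance
    elif scheme == "intensity":
        w = np.asarray(intensities, dtype=float)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w / w.sum()
```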
A preview shape can be constructed either by clicking on a peak in the detector
image, or by entering the coordinates of the peak and clicking Calculate profile.
Either way, a shape model must have been built beforehand. The preview panel
shows two peaks side by side: on the left, the reference peak as it appears on
the detector image, and on the right, the mean profile as computed by the shape
model. The selected peak is highlighted with a red box. This is the shape that
will be either assigned to a predicted peak collection (by clicking Apply shape
model, if such a peak collection exists), or used in profile integration.
An example of a shape generated from a model is shown above: clicking on a peak from the selected predicted peak collection (“target peak collection”) displays the integration region for the shape in the Preview widget, and plots the corresponding mean profile.
The beam divergence and mosaicity variances are estimated as in the section on Rotating the beam profile. The beam divergence variance \(\sigma_D\) affects the spread of the detector spot in the plane of the detector image, and the mosaicity variance \(\sigma_M\) affects the spread in the direction of the frames (i.e. the sample rotation axis). These parameters can be adjusted to control the extent of the detector spots if it seems that the model is not representative of the detector images. Physically, a higher \(\sigma_M\) will change the number of spots on an image, since the spots will extend onto more frames, and a higher \(\sigma_D\) will increase the size of the integration regions.
Predict peaks¶
Given the unit cell, an exhaustive set of Miller-indexed reflections can be generated within the specified resolution (d) range, with space-group-forbidden reflections rejected (marked in red).
A complete set of Miller index \((hkl)\) triples is generated within a given resolution range; then, for each triple, a reciprocal space vector \(\mathbf{q}\) is computed by multiplying the \((hkl)\) vector by the reciprocal basis. For each \(\mathbf{q}\), the rotation angle at which it intersects the Ewald sphere is located using a bisection algorithm (essentially finding the non-integer frame coordinate at which the sign of \(|\mathbf{k}_f| - |\mathbf{k}_i|\) changes, bearing in mind that this can happen more than once over the rotation range).
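The bisection step can be sketched in a simplified geometry (rotation about the z axis, elastic condition \(|\mathbf{k}_f| = |\mathbf{k}_i|\)); the helper names are illustrative, not the OpenHKL implementation:

```python
import numpy as np

def rot_z(phi):
    """Rotation matrix about the z (sample rotation) axis."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def ewald_offset(q, ki, phi):
    """Signed violation of the elastic condition |ki + R(phi) q| = |ki|."""
    kf = ki + rot_z(phi) @ q
    return kf @ kf - ki @ ki

def find_crossing(q, ki, lo, hi, tol=1e-10):
    """Bisect for the rotation angle at which q crosses the Ewald sphere,
    assuming the offset changes sign exactly once in [lo, hi]."""
    flo = ewald_offset(q, ki, lo)
    assert flo * ewald_offset(q, ki, hi) < 0, "no sign change in bracket"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * ewald_offset(q, ki, mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, ewald_offset(q, ki, mid)
    return 0.5 * (lo + hi)
```

In practice the rotation angle maps linearly to a (non-integer) frame coordinate, and the full rotation range must be scanned for multiple crossings.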
The position of the direct beam is of crucial importance at this stage. If it is off by a few pixels, the predicted peak positions may be off-centre to an extent that cannot be corrected by least squares refinement. If the direct beam position was set in the autoindexing step, this should not be necessary, but it can also be done at this stage.
| Parameter | Unit | Description |
|---|---|---|
| Set initial direct beam position | T/F | Add a draggable crosshair to the detector image to adjust direct beam position |
| x offset | pixels | Offset of the direct beam relative to the image centre, x direction |
| y offset | pixels | Offset of the direct beam relative to the image centre, y direction |
| Crosshair size | pixels | Radius of the crosshair |
Moreover, now that the approximate unit cell is known, the beam position can be adjusted by refinement, as discussed in Refine.
| Parameter | Unit | Description |
|---|---|---|
| Found peaks | | Peaks from image analysis step |
| Number of batches | | Split peaks into this many batches, sorted by rotation angle (i.e. image number) |
| Maximum iterations | | Maximum number of steps for least squares refinement |
| Show direct beam | | Add a black circle to the detector image indicating the direct beam position |
Peak prediction requires only a unit cell and a resolution range over which to limit the predictions.
| Parameter | Unit | Description |
|---|---|---|
| Unit cell | | Unit cell used to predict peaks |
| Maximum resolution (min. d) | Å | Upper resolution limit for predicted peaks |
| Minimum resolution (max. d) | Å | Lower resolution limit for predicted peaks |
At this point, the predicted peaks (detector spots) have a position, but no shape. A saved shape model (generated in Shape model) can be applied to the predicted peaks.
For the purposes of refinement, it is extremely important to assign a shape model to the predicted peak collection. Each peak can be considered to be an ellipsoid in real space (see Definition of peak shape), and the detector spots are ellipses where the ellipsoid intersects the detector image. In general, the principal axes of the ellipsoid will not coincide with the plane of the detector image, and as a result the ellipse for a single peak will generally have different centre coordinates on each frame on which it appears (this results in the “precession” of the spot across the detector if one scrolls through the images). If we do not have a good initial guess for the shape of the ellipsoid before refinement, it will be impossible for the refiner to improve the positions of the detector spots across all frames. This can be seen by comparing the integration regions of a predicted peak before and after the shape model is assigned.
If a shape is not assigned, the predicted peak retains its default shape (spherical), which will be grossly inaccurate. Note that the above window can be opened by double clicking on a peak in the detector image.
Refine¶
In this tab, nonlinear leastsquares minimisation is used to find the unit cell and instrument states that best fit the given peak collection. The instrument states optimised are the detector position offset, the sample position offset, the sample orientation offset and the incident wavevector.
Since detector images are generated over a period of time as well as over an angular range, the conditions of the experiment may have changed between the first frame and the last, for example, the temperature, which would affect the unit cell. As such the peaks are refined in batches, each encompassing a few frames in a limited subset of the angular range of the experiment. For example, if we specify 10 batches for an experiment with 100 frames (detector images), we will get 10 batches of equal numbers of peaks in partially overlapping but distinct angular ranges.
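The batching scheme can be sketched as follows; `make_batches` and its 50% overlap are illustrative assumptions, not the OpenHKL implementation:

```python
import numpy as np

def make_batches(frames, n_batches):
    """Split peak indices into n_batches sorted by frame number (i.e.
    rotation angle), with consecutive batches overlapping by roughly
    half a batch, so each batch covers a limited angular range."""
    order = np.argsort(frames)           # sort peaks by rotation angle
    step = len(order) // n_batches
    size = step + step // 2              # ~50% overlap with the next batch
    return [order[i * step: i * step + size] for i in range(n_batches)]
```

Each batch is then refined separately, giving per-batch unit cells and instrument states.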
The change in each of these quantities can be plotted as a function of frame (or equivalently angle) in the bottom panel. The perframe values for the unit cell and each instrument state before and after refinement are visible in the tables.
The refinement uses the nonlinear least squares minimisation routines from the
GNU Scientific Library (GSL). The free parameters, as determined by the
checkboxes under parameters to refine, are varied such that the sum of
residuals is minimised. These residuals can be computed in two ways, selected
using the residual type combo:

- Real space — the residual is computed as the difference in real space (i.e. detector coordinates) between the integer Miller indices and floating point Miller indices.
- Reciprocal space — the residual is computed as the difference in reciprocal space between the integer Miller indices and floating point Miller indices.

These are described in [W2].
| Parameter | Unit | Description |
|---|---|---|
| Use refined cell | T/F | Use per-batch unit cells from previous refinement |
| Number of batches | | Split peaks into this many batches, sorted by rotation angle (i.e. image number) |
| Maximum iterations | | Maximum number of steps for least squares refinement |
| Residual type | | Reciprocal or real space residuals |
| Cell vectors | T/F | Refine unit cell vectors |
| Sample position | T/F | Refine sample position offset |
| Sample orientation | T/F | Refine sample orientation matrix |
| Detector position | T/F | Refine detector position offset |
| Incident wavevector | T/F | Refine direct beam position |
After refinement, clicking Update in the Update predictions panel will
update the peak centre coordinates that changed as a result of unit cell and
instrument state refinement. The change in peak centre coordinates after
refinement is usually significant, as shown in the example below (pre-refinement
positions are shown in dark green, post-refinement positions in light green).
Note that floating point Miller indices are generated from the “found” peaks, the peaks derived from image processing. The predicted peaks by definition have integer Miller indices, and are purely a function of the unit cell and instrument states. Thus the peak collection undergoing refinement will always be a “found” collection.
Under the tables tab, the values of each free variable are shown before (left)
and after (right) refinement. By switching to the detector tab, the change in
the peak centres before and after refinement can be visualised.
Integrate peaks¶
In this section, the peaks, usually a set of predicted peaks, are integrated to compute their intensities and variances (sigmas). Integrating a predicted peak collection using the basic pixel sum integrator is somewhat flawed, because many (indeed, most) of the predicted peaks will have intensities that are difficult to distinguish from the background, and simply summing the pixels and subtracting the background gives only a rough estimate. Profile integration can improve on this; here we use “profile” as a catch-all term to encompass all integrators implemented in OpenHKL that are not the pixel sum integrator. These integrators will usually improve the integration results, given a judicious parameter choice.
Note that only the parameters Peak end, Background begin and Background end
apply to pixel sum integration; the rest are specific to profile fitting
integration.
| Parameter | Unit | Description |
|---|---|---|
| Integration region type | | Switch between variable and fixed-size integration regions |
| Peak end | \(\sigma\)/pixels (see below) | End of peak region in multiples of the blob covariance matrix |
| Background begin | \(\sigma\)/factor (see below) | Beginning of background region in multiples of the blob covariance matrix |
| Background end | \(\sigma\)/factor (see below) | End of background region in multiples of the blob covariance matrix |
| Integrator | | Select from pixel sum or profile integrators |
| Fit peak center | T/F | Adjust peak centre coordinates during integration |
| Fit peak covariance | T/F | Adjust peak covariance matrix during integration |
| Remove overlaps | T/F | Reject peaks with overlapping peak areas |
| Remove masked peaks | T/F | Remove peaks intersecting detector image masks |
| Compute gradient | T/F | Compute the image gradient (pixel sum only) |
| Gradient kernel | | Convolution kernel to use when computing image gradient |
| FFT gradient | T/F | Use Fast Fourier Transform to compute the image gradient |
| Discard saturated | T/F | Discard peaks containing saturated pixels |
| Maximum count | counts | Count threshold for discarding saturated pixels |
| Maximum strength for profile integration | T/F | Only profile integrate strong peaks |
| Maximum strength | \(I/\sigma\) | Strength threshold defining a weak peak, to be profile integrated |
| Search radius (pixels) | pixels | Pixel radius in image to search for neighbouring profiles |
| Search radius (images) | image number | Image radius in data set to search for neighbouring profiles |
| Minimum neighbors | | Minimum number of neighbouring profiles to construct a profile/shape |
| Interpolation type | | Weighting scheme to use when averaging profiles |
| Shape model | | Shape model to use for profile integration |
The integration region type can be switched between a variable integration region and a fixed integration region. For the former, the covariance matrix of the peak, \(\sigma\), is the starting point. \(\sigma\) is scaled by a factor to define the integration region bounds; for example, the default “peak end” value, i.e. the end of the peak region, occurs at \(3\sigma\), meaning the covariance matrix is scaled by a factor of three and thus, according to Gaussian statistics, contains 99.5% of the counts in the peak. The background begin and background end scaling factors determine the beginning and end of the background region in the same way. The construction of the integration region is described in Definition of peak shape.
When the fixed ellipsoid integration region is selected, the definitions of these parameters change. Peak end is now in units of pixels, and determines the size of the peak region ellipsoid: if a value of r is given, the ellipsoid is scaled to have a volume equal to that of a sphere of radius r. Background begin and background end are now simple scaling factors for the covariance matrix, with a value of 1 corresponding exactly to the peak end limit.
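The fixed-ellipsoid scaling described above can be sketched as follows. This is a minimal illustration of the geometry, not OpenHKL's implementation; the helper name is hypothetical:

```python
import numpy as np

def fixed_ellipsoid_scale(cov: np.ndarray, r: float) -> float:
    """Scale factor s such that the ellipsoid defined by s * cov has the
    same volume as a sphere of radius r (hypothetical helper)."""
    # The 1-sigma ellipsoid has volume (4/3) * pi * sqrt(det(cov));
    # scaling its half-axes by s multiplies the volume by s**3, so
    # equating volumes gives s = r / det(cov)**(1/6).
    return r / np.linalg.det(cov) ** (1.0 / 6.0)

# A spherical "peak" with unit covariance needs no rescaling:
print(fixed_ellipsoid_scale(np.eye(3), 1.0))  # 1.0
```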
The Fit center and Fit covariance options apply only to pixel sum integration, and set the peak centre coordinates and covariance matrix to those of the blob of pixels (notionally an ellipsoid) found during integration, rather than the ellipsoid specified as the peak shape.
The remove overlaps checkbox rejects any peak whose peak (intensity) region intersects that of an adjacent peak, since such an overlap results in inaccurate integrated intensities for both. Note that peak pixels are automatically excluded from local background calculations, so background estimates are not spoiled by intruding peak intensity regions. Overlaps can also be prevented by adjusting the integration region parameters “peak end”, “background begin” and “background end”, which control the scaling of the peak region, the start of the background region and the end of the background region respectively. The remove masked checkbox ensures that any peaks intersecting a masked region of the detector image are rejected.
The Compute gradient checkbox enables computation of the background gradient, and is only available during pixel sum integration. The selected kernel is convolved with the image, yielding a gradient in the x/y directions, i.e. only in the image plane. This can be done in real space or, more efficiently, in reciprocal space using the FFT option. The background gradient can be used as a rejection criterion later in the workflow.
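The real-space versus FFT equivalence can be illustrated with a short sketch. This is not OpenHKL code; the image is synthetic and the Sobel kernel is just one possible gradient kernel:

```python
import numpy as np
from scipy.ndimage import convolve
from scipy.signal import fftconvolve

# Hypothetical detector image with Poisson counting noise
image = np.random.default_rng(0).poisson(100, size=(64, 64)).astype(float)

# Sobel kernel for the gradient along x (one of several kernel choices)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Real-space convolution
grad_real = convolve(image, sobel_x, mode="nearest")

# The same convolution evaluated via FFT (faster for large images/kernels)
grad_fft = fftconvolve(image, sobel_x, mode="same")

# Away from the image borders (where boundary handling differs),
# the two methods agree to floating-point precision.
assert np.allclose(grad_real[1:-1, 1:-1], grad_fft[1:-1, 1:-1])
```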
A 16-bit detector image (for example) can hold a maximum of 65535 counts per pixel, so if the detector image is overexposed, pixels will overflow and become saturated. Such pixels result in incorrect integrated intensities, so in cases where an accurate integration is required, peaks containing saturated pixels should be rejected.
The remaining options apply only to profile integration.
The pixel sum integrator will attempt to integrate all peaks in a collection, but profile integrators will generally only be used to integrate weak peaks with a low signal-to-noise ratio. This is because (image resolution and finite sample rotation angle notwithstanding) pixel sum integration is more reliable than profile integration for strong peaks: weighting the pixels of a well-defined peak by a mean profile will only degrade its quality. Therefore, when profile integrating, it is generally advisable to integrate only weak peaks by setting a strength threshold. This can be achieved by checking Maximum strength for profile integration and setting the maximum strength threshold appropriately. Given the correct baseline and gain for the instrument, this should be of the order of one.
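The strength-threshold selection amounts to a simple filter on \(I/\sigma\). A minimal sketch, with made-up peak values:

```python
# Hypothetical peak records: (intensity, sigma)
peaks = [(5000.0, 50.0), (12.0, 10.0), (3.0, 4.0), (150.0, 15.0)]

max_strength = 1.0  # I/sigma threshold below which a peak is "weak"

weak = [(I, s) for I, s in peaks if I / s < max_strength]
strong = [(I, s) for I, s in peaks if I / s >= max_strength]

# Only the weak peaks would be profile-integrated; the strong ones
# keep their pixel-sum intensities.
print(len(weak), len(strong))  # 1 3
```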
The Search radius controls determine the search radius for neighbouring profiles (i.e. strong peaks) used to construct a mean profile for the given peak. A minimum number of peaks required to construct a mean profile can also be specified. The Peak interpolation combo sets the type of interpolation used when computing the shape of a peak. A predicted peak is given a shape that is the mean of all found peaks within a given radius in pixels on the detector image and in rotation increments (i.e. frames). When computing the mean, each neighbouring peak contributes with a weight determined by the chosen interpolation method. For none, all peaks are given a weight of 1.0. For inverse distance, a neighbouring peak is weighted by the inverse of its distance from the reference peak in reciprocal space, i.e. peaks further away in reciprocal space have a lower weight. For intensity, a neighbouring peak is weighted by its intensity divided by its variance, i.e. weaker peaks have a lower weight.
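The three weighting schemes can be summarised in a short sketch. The helper below is hypothetical (OpenHKL's internals may differ); it only demonstrates how the weights are formed and normalised:

```python
import numpy as np

def mean_profile(profiles, q_dists, intensities, sigmas, scheme="none"):
    """Weighted average of neighbouring peak profiles (hypothetical helper).

    profiles    : list of equally shaped arrays, one per neighbour
    q_dists     : distance of each neighbour from the target in q-space
    intensities : neighbour integrated intensities
    sigmas      : neighbour intensity errors
    """
    if scheme == "none":
        w = np.ones(len(profiles))                      # equal weights
    elif scheme == "inverse distance":
        w = 1.0 / np.asarray(q_dists)                   # nearer = heavier
    elif scheme == "intensity":
        w = np.asarray(intensities) / np.asarray(sigmas) ** 2  # I / variance
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, np.asarray(profiles, dtype=float)))
```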
Merge peaks¶
This section displays the results of the data reduction process: a set of indexed and integrated peaks, with statistics to determine whether the process yielded a sensible result. The quality statistics are visible in the D-shell statistics tab, and all peaks, in their merged and unmerged representations, in their respective tabs.
The interface makes it possible to merge two peak collections, although only one is normally used. By selecting a peak collection in peak collection 1, any symmetry-related peaks are merged into one; the number of peaks merged is the “redundancy”. The R-factor and CC quality metrics are intended to sanity-check the data, which can be saved in a merged or unmerged representation.
D-shell statistics tab¶
The data quality metrics described in Measures of Data Quality are computed under the “Merger” tab, and tabulated as a function of resolution shell (including a row for the whole resolution range). These measures can be plotted as a function of resolution in the panel at the bottom.
The sphere in q-space defined by d range is divided into a number of concentric resolution shells of equal reciprocal volume, determined by number of d-shells. For each shell, and for the overall volume, R-factors and CC values are calculated, allowing the user to determine the maximum resolution (if any) to which the data set is reliable. The merge is controlled by the following parameters.
| Parameter | Unit | Description |
|---|---|---|
| Resolution (d) range | Å | Limit merged peaks to this resolution range |
| Image range | | Limit merged peaks to this range of images |
| Num. resolution shells | | Number of resolution shells into which to divide reciprocal space |
| Space group | | Space group of the unit cell |
| Include Friedel | T/F | Include the Friedel relation if not part of the space group |
| Plot y axis | | Statistic to plot on the graph, as a function of resolution shell |
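The division into shells of equal reciprocal volume can be sketched as follows. Since the volume enclosed by resolution d scales as \(q^3\) with \(q = 1/d\), equal-volume shells are equally spaced in \(q^3\). This is an illustration only; OpenHKL's exact binning may differ:

```python
import numpy as np

def shell_edges(d_min: float, d_max: float, n_shells: int) -> np.ndarray:
    """d-values bounding n_shells resolution shells of equal reciprocal
    volume (hypothetical helper)."""
    # Equal-volume shells in q-space are equally spaced in q**3.
    q3 = np.linspace((1.0 / d_max) ** 3, (1.0 / d_min) ** 3, n_shells + 1)
    return 1.0 / q3 ** (1.0 / 3.0)

# 10 shells spanning 50 A (low resolution) down to 1.5 A
edges = shell_edges(1.5, 50.0, 10)
print(edges[0], edges[-1])  # 50.0 ... 1.5
```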
Note that it is possible to merge only those peaks within a specific frame range; the rationale is that it may be better to ignore peaks on the first and last frames, for which it is impossible to interpolate the frame coordinate.
The tabulated statistics comprise the following fields:

| Abbreviation | Description |
|---|---|
| dmax | Maximum value of d for this resolution shell |
| dmin | Minimum value of d for this resolution shell |
| nobs | Number of observed peaks in this shell |
| nmerge | Number of merged (symmetry-unique) peaks in this shell |
| redundancy | Average peak redundancy (nobs/nmerge) |
| Rmeas | Redundancy-independent merging R-factor |
| Rmeas (est.) | Expected value of Rmeas |
| Rmerge/Rsym | Merging R-factor over symmetry-equivalent intensities |
| Rmerge/Rsym (est.) | Expected value of Rmerge/Rsym |
| Rpim | Precision-indicating merging R-factor |
| Rpim (est.) | Expected value of Rpim |
| CChalf | Correlation coefficient between random half data sets (CC1/2) |
| CC* | Estimate of the correlation between merged and true intensities, derived from CC1/2 |
| Completeness | Number of valid peaks / theoretical maximum number of peaks |
A high quality data set will have R-factors close to zero, CC values close to one and a completeness close to 100%.
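The merging R-factors follow textbook formulas and can be sketched directly. This is an illustration using the standard definitions, not OpenHKL's implementation:

```python
import numpy as np

def merge_statistics(groups):
    """Rmerge, Rmeas and Rpim from groups of symmetry-equivalent
    intensities (textbook formulas; hypothetical helper)."""
    num_merge = num_meas = num_pim = denom = 0.0
    for I in groups:
        I = np.asarray(I, dtype=float)
        n = len(I)
        if n < 2:
            continue  # singletons contribute nothing to the spread
        dev = np.abs(I - I.mean()).sum()
        num_merge += dev                                # Rmerge numerator
        num_meas += np.sqrt(n / (n - 1.0)) * dev        # redundancy-corrected
        num_pim += np.sqrt(1.0 / (n - 1.0)) * dev       # precision-indicating
        denom += I.sum()
    return num_merge / denom, num_meas / denom, num_pim / denom

# Two merged reflections with small spreads give small R-factors
rmerge, rmeas, rpim = merge_statistics([[100, 102, 98], [50, 52]])
```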
Merged representation tab¶
A list of merged peaks is displayed in this section.

| Abbreviation | Description |
|---|---|
| h | h Miller index |
| k | k Miller index |
| l | l Miller index |
| I | Mean integrated intensity of unmerged peaks |
| \(\sigma\) | Variance of the integrated intensity of unmerged peaks |
| nobs | Redundancy of this peak (number of symmetry equivalents observed) |
| \(\chi^2\) | Chi-squared of the equivalent intensities |
| p | Probability that a chi-squared variable takes a value less than the observed statistic |
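The \(\chi^2\) and p columns can be sketched with standard statistics. A hedged sketch using textbook formulas (the helper name and the choice of n-1 degrees of freedom are assumptions, not taken from OpenHKL):

```python
import numpy as np
from scipy.stats import chi2

def merged_chi2(intensities, sigmas):
    """Chi-squared of symmetry-equivalent intensities about their mean,
    and the associated cumulative probability (hypothetical helper)."""
    I = np.asarray(intensities, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    x2 = np.sum((I - I.mean()) ** 2 / s ** 2)
    dof = len(I) - 1
    # p: probability that a chi-squared variable with n-1 degrees of
    # freedom takes a value less than the observed statistic
    return x2, chi2.cdf(x2, dof)
```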
The merged peaks can be saved in CCP4 (.mtz), ShelX, FullProf or Phenix format. The Phenix format is fixed-width, and some instruments such as BioDiff have a photomultiplier, meaning that one count on the detector corresponds not to one neutron, but to some factor greater than one. This can cause the intensities to become too large for the column, making them unreadable by Phenix. The intensity scale factor control allows the user to post-multiply the intensity and its associated variance by some factor such that the columns no longer overlap.
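The column-overflow problem can be made concrete with a short sketch. The column width and precision below are hypothetical, chosen only for illustration:

```python
def fits_column(value: float, width: int = 8, decimals: int = 2) -> bool:
    """True if the formatted value fits a fixed-width column
    (hypothetical width/precision, for illustration only)."""
    return len(f"{value:.{decimals}f}") <= width

intensity = 12345678.9           # too wide for an 8-character column
scale = 0.001                    # user-chosen intensity scale factor

print(fits_column(intensity))          # False
print(fits_column(intensity * scale))  # True
```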
Unmerged representation tab¶
A list of unmerged peaks is displayed in this section.

| Abbreviation | Description |
|---|---|
| h | h Miller index |
| k | k Miller index |
| l | l Miller index |
| I | Integrated intensity |
| \(\sigma\) | Variance of the integrated intensity |
| x | x coordinate of the peak (pixels) |
| y | y coordinate of the peak (pixels) |
| frame | frame coordinate of the peak |
The unmerged peaks can likewise be saved in CCP4 (.mtz), ShelX, FullProf or Phenix format; as with merged peaks, the intensity scale factor control allows the user to post-multiply the intensity by some factor such that the fixed-width Phenix columns no longer overlap.