hardklor

Usage:

crux hardklor [options] <spectra>

Description:

Hardklör analyzes high-resolution mass spectra, identifying protein or peptide isotope distributions and determining the corresponding monoisotopic masses and charge states. The algorithm aims to identify persistent peptide isotope distributions (PPIDs), i.e., isotope distributions that recur over multiple scans. Hardklör is specifically designed to handle overlapping isotope distributions in a single spectrum. A detailed description of the Hardklör algorithm is given in:

Hoopmann MR, Finney GL and MacCoss MJ. "High speed data reduction, feature selection, and MS/MS spectrum quality assessment of shotgun proteomics datasets using high resolution mass spectrometry." Analytical Chemistry. 79:5630-5632 (2007).

Input:

spectra – The name of a file from which to parse high-resolution spectra. The file may be in MS1 (.ms1), binary MS1 (.bms1), compressed MS1 (.cms1), or mzXML (.mzXML) format.

Output:

The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:

hardklor.mono.txt – a tab-delimited text file containing one line for each isotope distribution. The columns appear in the following order:
1. scan: The scan number assigned to this spectrum in the input file.
2. retention time: The time (in seconds) at which the spectrum was collected.
3. mass: The uncharged monoisotopic mass of the protein or peptide.
4. charge: The inferred charge state of the protein or peptide.
5. intensity: The intensity of the base isotope peak of the model used to predict the protein or peptide.
6. m/z: The m/z of the base peak.
7. s/n: The signal-to-noise threshold, i.e., the relative abundance a peak must exceed in the spectrum window to be considered in the scoring algorithm. Note that this is a local noise threshold for the area of the spectrum that the peptide was identified in.
8. modifications: Deviations from the averagine model. Only modifications specified by the user are considered. If no modifications are found in a particular PPID, then the column is marked with an underscore.
9. dotp: The dot product score applies to all predictions in a given spectrum window. Thus, if two protein or peptide predictions share the same spectrum window, then they have a single dot product score that is the score of their combined peaks.
hardklor.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
hardklor.log.txt – a log file containing a copy of all messages that were printed to stderr.
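
The column order listed above for hardklor.mono.txt can be read by position. The following is a minimal Python sketch, not part of Crux: the names Feature and read_features are hypothetical, and the handling of a possible header row is an assumption (any row whose first field is not an integer is skipped), since the description above does not say whether the file starts with a header line.

```python
import csv
from typing import Iterator, NamedTuple, TextIO


class Feature(NamedTuple):
    """One row of hardklor.mono.txt, in the documented column order."""
    scan: int
    retention_time: float  # seconds
    mass: float            # uncharged monoisotopic mass
    charge: int
    intensity: float
    mz: float              # m/z of the base peak
    sn: float              # local signal-to-noise threshold
    modifications: str     # "_" when no modification was found
    dotp: float


def read_features(handle: TextIO) -> Iterator[Feature]:
    """Yield one Feature per data row, reading the nine columns by position."""
    for row in csv.reader(handle, delimiter="\t"):
        # Assumption: a header row, if present, has a non-integer first field.
        if len(row) < 9 or not row[0].strip().isdigit():
            continue
        yield Feature(
            int(row[0]), float(row[1]), float(row[2]), int(row[3]),
            float(row[4]), float(row[5]), float(row[6]), row[7], float(row[8]),
        )
```

With the default output folder, a call might look like `with open("crux-output/hardklor.mono.txt") as fh: features = list(read_features(fh))`. Reading by position rather than by column name avoids depending on the exact header text.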

Options:

hardklor options
- --hardklor-algorithm basic|version1|version2 – Determines which spectral feature detection algorithm to use. Different results are possible with each algorithm, and there are pros and cons to each. There are three algorithms to choose from:
  - basic – Performs unoptimized deconvolution and is provided for legacy purposes only.
  - version1 – Uses the optimizations developed during the 1.0+ series. It is very accurate, but has limited sensitivity, and moderate speed improvements.
  - version2 – Uses the optimizations developed for version 2.0+. It is highly sensitive, but less accurate for very low abundance features, and performs exceptionally fast.
  Default = version1.
- --averagine-mod <string> – Defines alternative averagine models in the analysis that incorporate additional atoms and/or isotopic enrichments. Modifications are represented as text strings. Additional atoms are included in the model by entering an atomic formula, such as PO2 or Cl. Isotopic enrichment is included by specifying the fractional enrichment (as a decimal), followed by the atom being enriched and an index of the isotope. For example, 0.75H1 specifies 75% enrichment of the first heavy isotope of hydrogen, i.e., 75% deuterium enrichment. Two or more modifications can be combined into the same model by separating them with spaces, e.g., B2 0.5B1. Default = <empty>.
- --boxcar-averaging <integer> – Boxcar averaging is a sliding window that averages n adjacent spectra prior to feature detection. Averaging generally improves the signal-to-noise ratio of features in the spectra, as well as improving the shape of isotopic envelopes. However, averaging will also change the observed peak intensities. Averaging with too wide a window will increase the occurrence of overlapping features and broaden the chromatographic profiles of observed features. The number specified is the total adjacent scans to be combined, centered on the scan being analyzed. Therefore, an odd number is recommended to center the boxcar window. For example, a value of 3 would produce an average of the scan of interest, plus one scan on each side. A value of 0 disables boxcar averaging. Default = 0.
- --boxcar-filter <integer> – This parameter is only functional when boxcar-averaging is used. The filter will remove any peaks not seen in n scans in the boxcar window. The effect is to reduce peak accumulation due to noise and reduce chromatographic broadening of peaks. Caution should be used as over-filtering can occur. The suggested number of scans to set for filtering should be equal to or less than the boxcar-averaging window size. A value of 0 disables filtering. Default = 0.
- --boxcar-filter-ppm <float> – This parameter is only functional when boxcar-filter is used. The value specifies the mass tolerance in ppm for declaring a peak the same prior to filtering across all scans in the boxcar window. Default = 10.
- --centroided T|F – Indicates whether the data contain centroided peaks (true) or profile peaks (false). Default = false.
- --cdm B|F|P|Q|S – Choose the charge state determination method. There are five methods to choose from:
  - B – Basic method, assume all charge states are possible.
  - F – Fast Fourier transform.
  - P – Patterson algorithm.
  - Q – QuickCharge method, uses inverse peak distances.
  - S – Senko method, or combined Fast Fourier Transform and Patterson algorithm.
  Default = Q.
- --min-charge <integer> – Specifies the minimum charge state to allow when finding spectral features. It is best to set this value to the lowest charge state expected to be present. If it is set higher than charge states that are actually present, those features will either not be identified or be incorrectly assigned a different charge state and mass. Default = 1.
- --max-charge <integer> – Specifies the maximum charge state to allow when finding spectral features. It is best to set this value to a practical number (e.g., do not set it to 20 when doing a tryptic shotgun analysis). If it is set higher than the charge states that are actually present, the algorithm will run significantly slower without any improvement in results. Default = 5.
- --corr <float> – Sets the correlation threshold (cosine similarity) for accepting each predicted feature. Default = 0.85.
- --depth <integer> – Sets the depth of combinatorial analysis. For a given set of peaks in a spectrum, the search considers up to this number of combined peptides to explain the observed peaks. The analysis stops before depth is reached if the current number of deconvolved features explains the observed peaks with a correlation score above the threshold set with the --corr option. Default = 3.
- --distribution-area T|F – When reporting each feature, report abundance as the sum of all isotope peaks. The value reported is the estimate of the correct peak heights based on the averagine model scaled to the observed peak heights. Default = false.
- --hardklor-data-file <string> – Specifies an ASCII text file that defines symbols for the periodic table. Default = <empty>.
- --instrument fticr|orbitrap|tof|qit – Indicates the type of instrument used to collect data. This parameter, combined with the --resolution parameter, defines how spectra will be centroided (if you provide profile spectra) and the accuracy used when aligning observed peaks to the models. Default = fticr.
- --isotope-data-file <string> – Specifies an ASCII text file that can be read to override the natural isotope abundances for all elements. Default = <empty>.
- --max-features <integer> – Specifies the maximum number of models to build for a set of peaks being analyzed. Regardless of the setting, the number of models will never exceed the number of peaks in the current set. However, as many of the low abundance peaks are noise or tail ends of distributions, defining models for them is detrimental to the analysis. Default = 10.
- --mzxml-filter <integer> – Filters the spectra prior to analysis for the requested MS level. For example, if the data contain MS and MS/MS spectra, setting mzxml-filter = 1 will analyze only the MS scan events. Setting mzxml-filter = 2 will analyze only the MS/MS scan events. Default = 1.
- --mz-max <float> – Constrains the search in each spectrum to signals below this value in Thomsons. Setting to 0 disables this feature. Default = 0.
- --mz-min <float> – Constrains the search in each spectrum to signals above this value in Thomsons. Setting to 0 disables this feature. Default = 0.
- --mz-window <float> – Only used when --hardklor-algorithm is set to version1. Defines the maximum window size in Thomsons to analyze when deconvolving peaks in a spectrum into features. Default = 4.
- --resolution <float> – Specifies the resolution of the instrument at 400 m/z for the data being analyzed. Default = 100000.
- --scan-range-max <integer> – Used to restrict analysis to spectra with scan numbers below this parameter value. A value of 0 disables this feature. Default = 0.
- --scan-range-min <integer> – Used to restrict analysis to spectra with scan numbers above this parameter value. A value of 0 disables this feature. Default = 0.
- --sensitivity <integer> – Set the sensitivity level. There are four levels: 0 (low), 1 (moderate), 2 (high), and 3 (max). Increasing the sensitivity will increase computation time, but will also yield more isotope distributions. Default = 2.
- --signal-to-noise <float> – Filters spectra to remove peaks below this signal-to-noise ratio prior to finding features. Default = 1.
- --smooth <integer> – Uses Savitzky-Golay smoothing on profile peak data prior to centroiding the spectra. This parameter is recommended for low resolution spectra only. Smoothing data causes peak depression and broadening. Only use odd numbers for the degree of smoothing (as it defines a window centered on each data point). Higher values will produce smoother peaks, but with greater depression and broadening. Setting this parameter to 0 disables smoothing. Default = 0.
- --sn-window <float> – Set the signal-to-noise window length (in m/z). Because noise may be non-uniform across a spectrum, this value adjusts the segment size considered when calculating a signal-over-noise ratio. Default = 250.
- --static-sn T|F – Applies the lowest noise threshold of any sn-window across the entire mass range for a spectrum. Setting this parameter to false turns off this feature, so that a different noise threshold is used for each local mass window in a spectrum. Default = true.
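
To illustrate how --boxcar-averaging, --boxcar-filter and --boxcar-filter-ppm interact, the following Python sketch averages centroided peaks across a window of scans centered on one scan, then drops peaks seen in too few scans. It is a simplified illustration under stated assumptions, not Hardklör's implementation: the function names are hypothetical, peaks are grouped greedily by ppm distance from the first peak of each group, and a scan in which a peak is missing contributes zero intensity to the average.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _merge(cluster: List[Tuple[float, float, int]], n_scans: int,
           min_scans: int) -> Optional[Peak]:
    """Collapse one group of matching peaks into a single averaged peak."""
    scans_seen = {idx for _, _, idx in cluster}
    if len(scans_seen) < min_scans:
        return None  # filtered: not observed in enough scans of the window
    total = sum(inten for _, inten, _ in cluster)
    if total <= 0:
        return None
    mz = sum(m * inten for m, inten, _ in cluster) / total  # weighted m/z
    return (mz, total / n_scans)  # absent scans count as zero intensity


def boxcar_average(scans: List[List[Peak]], center: int, width: int,
                   min_scans: int = 0, ppm: float = 10.0) -> List[Peak]:
    """Average `width` adjacent scans centered on scans[center].

    width mirrors --boxcar-averaging, min_scans mirrors --boxcar-filter,
    and ppm mirrors --boxcar-filter-ppm (0 disables averaging/filtering).
    """
    if width <= 0:
        return list(scans[center])
    half = width // 2
    lo, hi = max(0, center - half), min(len(scans), center + half + 1)
    pooled = sorted(
        (mz, inten, idx) for idx in range(lo, hi) for mz, inten in scans[idx]
    )
    clusters: List[List[Tuple[float, float, int]]] = []
    for peak in pooled:
        anchor = clusters[-1][0][0] if clusters else None
        if anchor is not None and (peak[0] - anchor) / anchor * 1e6 <= ppm:
            clusters[-1].append(peak)   # same peak within the ppm tolerance
        else:
            clusters.append([peak])     # start a new group
    merged = (_merge(c, hi - lo, min_scans) for c in clusters)
    return [p for p in merged if p is not None]
```

In this sketch, a peak present in only one scan of a three-scan window is kept at one third of its intensity when min_scans is 0 and is removed when min_scans is 2, which mirrors the trade-off between noise suppression and over-filtering described for --boxcar-filter.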
Input and output
- --fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = <empty>.
- --output-dir <string> – The name of the directory where output files will be created. Default = crux-output.
- --overwrite T|F – Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
- --parameter-file <string> – A file containing parameters. See the parameter documentation page for details. Default = <empty>.
- --verbosity <integer> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
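
As a usage illustration, the following Python sketch assembles a command line in the form shown under Usage, using only options documented above. The helper name build_hardklor_command, the input file name sample.ms1, and the option values are hypothetical and chosen for illustration only; Boolean options are written as T or F, following the T|F notation used on this page.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, float, bool]


def build_hardklor_command(spectra: str, options: Dict[str, Value]) -> List[str]:
    """Return the argument list for `crux hardklor [options] <spectra>`."""
    cmd = ["crux", "hardklor"]
    for name, value in options.items():
        # bool is checked first because bool is a subclass of int in Python.
        if isinstance(value, bool):
            value = "T" if value else "F"
        cmd += [f"--{name}", str(value)]
    cmd.append(spectra)  # the spectra file is the final positional argument
    return cmd


# Hypothetical settings for centroided Orbitrap data.
command = build_hardklor_command(
    "sample.ms1",
    {
        "instrument": "orbitrap",
        "resolution": 60000,
        "centroided": True,
        "max-charge": 4,
        "output-dir": "hardklor-out",
        "overwrite": True,
    },
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a system where crux is installed. Building the arguments as a list avoids shell quoting problems with file names that contain spaces.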