tide-search
Usage:
crux tide-search [options] <tide spectra file> <tide database>
Description:
Tide is a tool for identifying peptides from tandem mass spectra. It is an independent reimplementation of the SEQUEST® algorithm, which assigns peptides to spectra by comparing the observed spectra to a catalog of theoretical spectra derived from a database of known proteins. Tide's primary advantage is its speed. Our published paper provides more detail on how Tide works. If you use Tide in your research, please cite:
Benjamin J. Diament and William Stafford Noble. "Faster SEQUEST Searching for Peptide Identification from Tandem Mass Spectra". Journal of Proteome Research. 10(9):3871-9, 2011.
When tide-search runs, it performs several intermediate steps, as follows:
- If a FASTA file was provided, convert it to an index using tide-index.
- Convert the given fragmentation spectra to a binary format.
- Search the spectra against the database and store the results in binary format.
- Convert the results to one or more requested output formats.
By default, the intermediate binary files are stored in the output directory and deleted when Tide finishes execution. If you plan to search against a given database more than once or search a given set of spectra more than once, then you can direct Tide to save the binary index and spectrum files using the --store-index and --store-spectra options. Subsequent runs of the program will go faster if provided with inputs in binary format.
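As an illustration of reusing binary inputs, the following Python sketch assembles the argument lists for a first run that stores the index and spectra, and for a later run that reuses them. The helper build_tide_search_args is hypothetical and not part of Crux, and all file and directory names are placeholders; only the flags named above are used.

```python
from typing import Dict, List, Optional


def build_tide_search_args(spectra: List[str], database: str,
                           options: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble a `crux tide-search` argument list (hypothetical helper)."""
    args = ["crux", "tide-search"]
    for name, value in (options or {}).items():
        args += [f"--{name}", value]
    # Spectrum files come first, then the database, as in the usage line.
    return args + list(spectra) + [database]


# First run: FASTA and original spectra in; keep the binary versions for later.
first_run = build_tide_search_args(
    ["run1.mzML"], "proteins.fasta",
    {"store-index": "proteins-index", "store-spectra": "run1.spectrumrecords"},
)

# Later run: pass the stored binary files instead of the originals.
second_run = build_tide_search_args(["run1.spectrumrecords"], "proteins-index")

print(" ".join(first_run))
print(" ".join(second_run))
```

Each list could be handed to subprocess.run to launch the search. Note that a second run into the same output directory would also need --overwrite T, since overwriting is disabled by default.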
Input:
tide spectra file
– The name of one or more files from which to parse the fragmentation spectra, in any of the file formats supported by ProteoWizard. Alternatively, the argument may be one or more binary spectrum files produced by a previous run of crux tide-search using the store-spectra parameter. Multiple files can be included on the command line (space delimited), prior to the name of the database.
tide database
– Either a FASTA file or a directory containing a database index created by a previous run of crux tide-index.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
tide-search.target.txt
– a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
tide-search.decoy.txt
– a tab-delimited text file containing the decoy PSMs. This file will only be created if the index was created with decoys.
tide-search.params.txt
– a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other Crux programs.
tide-search.log.txt
– a log file containing a copy of all messages that were printed to the screen during execution.
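Because the PSM files are tab-delimited text, they can be read with standard tools. The Python sketch below reads such a file and picks the row with the highest XCorr. The column name "xcorr score" appears in the option descriptions on this page; the "scan" and "sequence" columns and the sample rows are assumptions made up for illustration, so check the txt file format page for the actual field list.

```python
import csv
import io
from typing import Dict, List


def read_psms(handle) -> List[Dict[str, str]]:
    """Read a tab-delimited PSM file into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def top_by_score(rows: List[Dict[str, str]],
                 score_column: str = "xcorr score") -> Dict[str, str]:
    """Return the row with the highest value in the given score column."""
    return max(rows, key=lambda row: float(row[score_column]))


# Tiny made-up table standing in for tide-search.target.txt.
example = io.StringIO(
    "scan\txcorr score\tsequence\n"
    "12\t1.25\tPEPTIDEK\n"
    "13\t2.50\tSAMPLER\n"
)
rows = read_psms(example)
best = top_by_score(rows)
print(best["scan"], best["sequence"], best["xcorr score"])
```

For a real run, open the file in the output directory (for example crux-output/tide-search.target.txt) with newline="" and pass the handle to read_psms.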
Options:
- tide-search options
--use-tailor-calibration T|F
– Fast, but heuristic PSM score calibration as described in this article. Default = false.
- Amino acid modifications
--mod-precision <integer>
– Set the precision for modifications as written to .txt files. Default = 4.
- Precursor selection
--auto-precursor-window false|warn|fail
– Automatically estimate the optimal value for the precursor-window parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
--max-precursor-charge <integer>
– The maximum charge state of a spectrum to consider in the search. Default = 5.
--min-precursor-charge <integer>
– The minimum charge state of a spectrum to consider in the search. Default = 1.
--precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum value. The precursor window units depend upon precursor-window-type. Default = 50.
--precursor-window-type mass|mz|ppm
– Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor m+h value is converted to mass, and the window is defined as that mass +/- precursor-window. If the m+h value is not available, then the mass is calculated from the precursor m/z and provided charge. The peptide mass is computed as the sum of the monoisotopic amino acid masses plus 18 Da for the terminal OH group. The mz window-type calculates the window as spectrum precursor m/z +/- precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as the spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound is defined as spectrum mass / (1.0 - (precursor-window / 1000000)). Default = ppm.
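The three window types can be compared with a short Python sketch. It is not Tide's implementation: the proton mass constant, the m/z-to-mass conversion, and the function names are assumptions, and the ppm bounds are taken to be mass / (1 + w/1000000) and mass / (1 - w/1000000), so that the lower bound falls below the spectrum mass.

```python
from typing import Tuple

PROTON_MASS = 1.00727646688  # Da; assumed constant, not taken from this page


def neutral_mass(mz: float, charge: int) -> float:
    """Convert a precursor m/z and charge to a neutral mass (assumed conversion)."""
    return (mz - PROTON_MASS) * charge


def precursor_mass_window(mz: float, charge: int, window: float = 50.0,
                          window_type: str = "ppm") -> Tuple[float, float]:
    """Return (low, high) bounds on the candidate peptide mass."""
    mass = neutral_mass(mz, charge)
    if window_type == "mass":
        # Window applied directly to the mass, in Da.
        return mass - window, mass + window
    if window_type == "mz":
        # Window applied in m/z, then converted to mass using the charge.
        return neutral_mass(mz - window, charge), neutral_mass(mz + window, charge)
    if window_type == "ppm":
        fraction = window / 1_000_000
        return mass / (1.0 + fraction), mass / (1.0 - fraction)
    raise ValueError(f"unknown window type: {window_type}")


for kind, width in (("mass", 3.0), ("mz", 3.0), ("ppm", 50.0)):
    low, high = precursor_mass_window(500.0, 2, width, kind)
    print(f"{kind:>4}: {low:.4f} to {high:.4f}")
```

Note that an mz window widens with charge once converted to mass (a 3 Th window at charge 2 spans 12 Da), whereas a ppm window scales with the precursor mass itself.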
- Search parameters
--auto-mz-bin-width false|warn|fail
– Automatically estimate the optimal value for the mz-bin-width parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
--compute-sp T|F
– Compute the preliminary score Sp for all candidate peptides. Report this score in the output, along with the corresponding rank, the number of matched ions and the total number of ions. This option is recommended if results are to be analyzed by Percolator. If sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Note that the Sp computation requires re-processing each observed spectrum, so turning on this switch involves significant computational overhead. Default = false.
--deisotope <float>
– Perform a simple deisotoping operation across each MS2 spectrum. For each peak in an MS2 spectrum, consider lower m/z peaks. If the current peak occurs where an expected peak would lie for any charge state less than the charge state of the precursor, within mass tolerance, and if the current peak is of lower abundance, then the peak is removed. The value of this parameter is the mass tolerance, in units of parts-per-million. If set to 0, no deisotoping is performed. Default = 0.
--exact-p-value T|F
– Enable the calculation of exact p-values for the XCorr score as described in this article. Calculation of p-values increases the running time but increases the number of identifications at a fixed confidence threshold. The p-values will be reported in a new column with header "exact p-value", and the "xcorr score" column will be replaced with a "refactored xcorr" column. Note that, currently, p-values can only be computed when the mz-bin-width parameter is set to its default value. Variable and static mods are allowed on non-terminal residues in conjunction with p-value computation, but currently only static mods are allowed on the N-terminus, and no mods on the C-terminus. Default = false.
--isotope-error <string>
– List of positive, non-zero integers. Default = <empty>.
--min-peaks <integer>
– The minimum number of peaks a spectrum must have for it to be searched. Default = 20.
--mz-bin-offset <float>
– In the discretization of the m/z axes of the observed and theoretical spectra, this parameter specifies the location of the left edge of the first bin, relative to mass = 0 (i.e., mz-bin-offset = 0.xx means the left edge of the first bin will be located at +0.xx Da). Default = 0.4.
--mz-bin-width <float>
– Before calculation of the XCorr score, the m/z axes of the observed and theoretical spectra are discretized. This parameter specifies the size of each bin. The exact formula for computing the discretized m/z value is floor((x/mz-bin-width) + 1.0 - mz-bin-offset), where x is the observed m/z value. For low-resolution ion trap MS/MS data, 1.0005079 is recommended; for high-resolution MS/MS data, 0.02 is recommended. Default = 0.02.
--peptide-centric-search T|F
– Carries out a peptide-centric search. For each peptide the top-scoring spectra are reported, in contrast to the standard spectrum-centric search where the top-scoring peptides are reported. Note that in this case the "xcorr rank" column will contain the rank of the given spectrum with respect to the given candidate peptide, rather than vice versa (which is the default). Default = false.
--score-function xcorr|residue-evidence|both
– Function used for scoring PSMs. 'xcorr' is the original scoring function used by SEQUEST; 'residue-evidence' is designed to score high-resolution MS2 spectra; and 'both' calculates both scores. The latter requires that exact-p-value=T. Default = xcorr.
--fragment-tolerance <float>
– Mass tolerance (in Da) for scoring pairs of peaks when creating the residue evidence matrix. This parameter only makes sense when score-function is 'residue-evidence' or 'both'. Default = 0.02.
--evidence-granularity <integer>
– This parameter controls the granularity of the entries in the dynamic programming matrix used in residue-evidence scoring. Smaller values make the program run faster but give less accurate p-values; larger values make the program run more slowly but give more accurate p-values. Default = 25.
--remove-precursor-peak T|F
– If true, all peaks around the precursor m/z will be removed, within a range specified by the --remove-precursor-tolerance option. Default = false.
--remove-precursor-tolerance <float>
– This parameter specifies the tolerance (in Th) around each precursor m/z that is removed when the --remove-precursor-peak option is invoked. Default = 1.5.
--scan-number <string>
– A single scan number or a range of numbers to be searched. Range should be specified as 'first-last' which will include scans 'first' and 'last'. Default = <empty>.
--skip-preprocessing T|F
– Skip preprocessing steps on spectra. Default = false.
--spectrum-max-mz <float>
– The highest spectrum m/z to search in the ms2 file. Default = 1e+09.
--spectrum-min-mz <float>
– The lowest spectrum m/z to search in the ms2 file. Default = 0.
--use-flanking-peaks T|F
– Include flanking peaks around singly charged b and y theoretical ions. Each flanking peak occurs in the adjacent m/z bin and has half the intensity of the primary peak. Default = false.
--use-neutral-loss-peaks T|F
– Controls whether neutral loss ions are considered in the search. For XCorr, the loss of ammonia (NH3, 17.0086343 Da) is applied to singly charged b- and y-ions, and the loss of water (H2O; 18.0091422) is applied to b-ions. If the precursor charge is >=3, then a doubly-charged version of each ion is added. For XCorr p-value, three types of neutral losses are included. Loss of ammonia and water are applied to b- and y-ions, and a carbon monoxide loss (CO, 27.9949) is also applied to b-ions. Higher charge fragments are included for all possible charges less than the precursor charge. All neutral loss peaks have an intensity 1/10 of the primary peak. Neutral losses are not yet implemented for the res-ev score function. Default = true.
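The discretization formula given for mz-bin-width can be written out directly. The Python sketch below is a literal transcription of floor((x/mz-bin-width) + 1.0 - mz-bin-offset); the function name mz_to_bin is hypothetical, and the defaults mirror the values listed above.

```python
import math


def mz_to_bin(mz: float, bin_width: float = 0.02, bin_offset: float = 0.4) -> int:
    """Discretize an m/z value: floor((x / mz-bin-width) + 1.0 - mz-bin-offset)."""
    return math.floor((mz / bin_width) + 1.0 - bin_offset)


# With the low-resolution width, peaks about 0.5 Th apart land in adjacent bins.
low_res_width = 1.0005079
print(mz_to_bin(500.3, low_res_width), mz_to_bin(500.8, low_res_width))

# With the high-resolution default, the same two peaks are many bins apart.
print(mz_to_bin(500.8) - mz_to_bin(500.3))
```

The bin index grows by one for every mz-bin-width step in m/z, so a narrower width separates nearby fragment peaks at the cost of more bins per spectrum.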
- CPU threads
--num-threads <integer>
– 0=poll CPU to set num threads; else specify num threads directly. Default = 1.
- param-medic options
--pm-charges <string>
– Precursor charge states to consider MS/MS spectra from, in measurement error estimation, provided as comma-separated values. Default = 0,2,3,4.
--pm-max-frag-mz <float>
– Maximum fragment m/z value to use in measurement error estimation. Default = 1800.
--pm-max-precursor-delta-ppm <float>
– Maximum ppm distance between precursor m/z values to consider two scans potentially generated by the same peptide for measurement error estimation. Default = 50.
--pm-max-precursor-mz <float>
– Maximum precursor m/z value to use in measurement error estimation. Default = 1800.
--pm-max-scan-separation <integer>
– Maximum number of scans two spectra can be separated by in order to be considered potentially generated by the same peptide, for measurement error estimation. Default = 1000.
--pm-min-common-frag-peaks <integer>
– Number of the most-intense peaks that two spectra must share in order to potentially be generated by the same peptide, for measurement error estimation. Default = 20.
--pm-min-frag-mz <float>
– Minimum fragment m/z value to use in measurement error estimation. Default = 150.
--pm-min-peak-pairs <integer>
– Minimum number of peak pairs (for precursor or fragment) that must be successfully paired in order to attempt to estimate measurement error distribution. Default = 200.
--pm-min-precursor-mz <float>
– Minimum precursor m/z value to use in measurement error estimation. Default = 400.
--pm-min-scan-frag-peaks <integer>
– Minimum fragment peaks an MS/MS scan must contain to be used in measurement error estimation. Default = 40.
--pm-pair-top-n-frag-peaks <integer>
– Number of fragment peaks per spectrum pair to be used in fragment error estimation. Default = 5.
--pm-top-n-frag-peaks <integer>
– Number of most-intense fragment peaks to consider for measurement error estimation, per MS/MS spectrum. Default = 30.
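To make the pairing thresholds concrete, the Python sketch below checks whether two scans pass the precursor-distance and scan-separation limits described for pm-max-precursor-delta-ppm and pm-max-scan-separation. It is only an illustration of those two criteria, not the param-medic implementation: the function names are hypothetical, the ppm distance is assumed to be measured relative to the first precursor, and the shared-fragment-peak requirement is left out.

```python
def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Absolute m/z difference in parts per million, relative to mz_a (assumed)."""
    return abs(mz_a - mz_b) / mz_a * 1_000_000


def may_share_peptide(scan_a: int, mz_a: float, scan_b: int, mz_b: float,
                      max_delta_ppm: float = 50.0,
                      max_scan_separation: int = 1000) -> bool:
    """Check the two precursor-level pairing thresholds for a pair of scans."""
    close_in_time = abs(scan_a - scan_b) <= max_scan_separation
    close_in_mz = ppm_difference(mz_a, mz_b) <= max_delta_ppm
    return close_in_time and close_in_mz


print(may_share_peptide(100, 500.00, 150, 500.01))   # about 20 ppm apart, 50 scans
print(may_share_peptide(100, 500.00, 150, 500.05))   # about 100 ppm apart
print(may_share_peptide(100, 500.00, 2000, 500.01))  # 1900 scans apart
```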
- Input and output
--concat T|F
– When set to T, target and decoy search results are reported in a single file, and only the top-scoring N matches (as specified via --top-match) are reported for each spectrum, irrespective of whether the matches involve target or decoy peptides. Default = false.
--file-column T|F
– Include the file column in tab-delimited output. Default = true.
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = <empty>.
--mass-precision <integer>
– Set the precision for masses and m/z written to sqt and text files. Default = 4.
--mzid-output T|F
– Output an mzIdentML results file to the output directory. Default = false.
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>.
--pepxml-output T|F
– Output a pepXML results file to the output directory. Default = false.
--pin-output T|F
– Output a Percolator input (PIN) file to the output directory. Default = false.
--precision <integer>
– Set the precision for scores written to sqt and text files. Default = 8.
--print-search-progress <integer>
– Show search progress by printing every n spectra searched. Set to 0 to show no search progress. Default = 1000.
--spectrum-parser pwiz|mstoolkit
– Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser can read the MS/MS file formats listed here. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz.
--sqt-output T|F
– Outputs an SQT results file to the output directory. Note that if sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Default = false.
--store-index <string>
– When providing a FASTA file as the index, the generated binary index will be stored at the given path. This option has no effect if a binary index is provided as the index. Default = <empty>.
--store-spectra <string>
– Specify the name of the file where the binarized fragmentation spectra will be stored. Subsequent runs of crux tide-search will execute more quickly if provided with the spectra in binary format. The filename is specified relative to the current working directory, not the Crux output directory (as specified by --output-dir). This option is not valid if multiple input spectrum files are given. Default = <empty>.
--top-match <integer>
– Specify the number of matches to report for each spectrum. Default = 5.
--txt-output T|F
– Output a tab-delimited results file to the output directory. Default = true.
--brief-output T|F
– Output in tab-delimited text only the file name, scan number, charge, score and peptide. Incompatible with mzid-output=T, pin-output=T, pepxml-output=T or txt-output=F. Default = false.
--use-z-line T|F
– Specify whether, when parsing an MS2 spectrum file, Crux obtains the precursor mass information from the "S" line or the "Z" line. Default = true.
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.