pipeline

Usage:

crux pipeline [options] <mass spectra> <peptide source>

Description:

Given one or more sets of tandem mass spectra as well as a protein database, this command runs a series of Crux tools and reports all of the results in a single output directory. There are four steps in the pipeline:

Bullseye to assign high-resolution precursor m/z values to MS/MS data. This step is optional.
Database searching using either Tide-search or Comet. The database can be provided as a file in FASTA format or, alternatively, as an index produced by tide-index.
Post-processing using either assign-confidence or Percolator.
Pseudo quantitation using spectral-counts.

All of the command line options associated with the individual tools in the pipeline can be used with the pipeline command.

Input:

mass spectra – The name of the file(s) from which to parse the fragmentation spectra, in any of the file formats supported by ProteoWizard. Alternatively, with Tide-search, these files may be binary spectrum files produced by a previous run of crux tide-search using the store-spectra parameter. Multiple files can be included on the command line (space delimited), prior to the name of the database.
peptide source – Either the name of a file in FASTA format from which to retrieve proteins and peptides or an index created by a previous run of crux tide-index (for Tide searching).
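
The argument order described above (options first, then one or more spectrum files, then the peptide source last) can be sketched as a small helper that assembles the command line. This is only an illustration: the function name `build_pipeline_command` and the file names are hypothetical, and the helper builds the argument list without running Crux.

```python
def build_pipeline_command(spectra, peptide_source, options=None):
    """Assemble: crux pipeline [options] <mass spectra> <peptide source>."""
    if not spectra:
        raise ValueError("at least one spectrum file is required")
    argv = ["crux", "pipeline"]
    # Options are passed as "--name value" pairs, in insertion order.
    for name, value in (options or {}).items():
        argv += ["--" + name, str(value)]
    argv += list(spectra)        # spectrum files come before the database
    argv.append(peptide_source)  # FASTA file or tide-index output comes last
    return argv


cmd = build_pipeline_command(
    ["run1.ms2", "run2.ms2"],  # hypothetical spectrum files
    "proteins.fasta",          # hypothetical FASTA file
    {
        "search-engine": "tide-search",
        "post-processor": "percolator",
        "output-dir": "crux-output",
    },
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where Crux is installed.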

Output:

The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:

bullseye.pid. – a file containing the fragmentation spectra for which accurate masses were successfully inferred. Unless otherwise specified (with the --spectrum-format option), the output file format is ".ms2". Note that if the output format is ".ms2", then a single spectrum may have multiple "Z" lines, each indicating a charge state and accurate mass. In addition, Bullseye inserts an "I" line (for charge-dependent analysis) corresponding to each "Z" line. The "I" line contains "EZ" in the second column, the charge and mass from the associated "Z" line in the third and fourth columns, followed by the chromatographic apex and the intensity at the chromatographic apex.
bullseye.no-pid. – a file containing the fragmentation spectra for which accurate masses were not inferred.
hardklor.mono.txt – a tab-delimited text file containing one line for each isotope distribution, as described here.
bullseye.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
bullseye.log.txt – a log file containing a copy of all messages that were printed to standard error.
tide-search.target.txt – a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
tide-search.decoy.txt – a tab-delimited text file containing the decoy PSMs. This file will only be created if the index was created with decoys.
tide-search.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other Crux programs.
tide-search.log.txt – a log file containing a copy of all messages that were printed to the screen during execution.
comet.target.txt – a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
comet.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
comet.log.txt – a log file containing a copy of all messages that were printed to standard error.
percolator.target.proteins.txt – a tab-delimited file containing the target protein matches. See here for a list of the fields.
percolator.decoy.proteins.txt – a tab-delimited file containing the decoy protein matches. See here for a list of the fields.
percolator.target.peptides.txt – a tab-delimited file containing the target peptide matches. See here for a list of the fields.
percolator.decoy.peptides.txt – a tab-delimited file containing the decoy peptide matches. See here for a list of the fields.
percolator.target.psms.txt – a tab-delimited file containing the target PSMs. See here for a list of the fields.
percolator.decoy.psms.txt – a tab-delimited file containing the decoy PSMs. See here for a list of the fields.
percolator.params.txt – a file containing the name and value of all parameters for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
percolator.log.txt – a log file containing a copy of all messages that were printed to standard error.
assign-confidence.target.txt – a tab-delimited text file that contains the targets, sorted by score. The file will contain one new column, named "<method> q-value", where <method> is either "tdc" or "mix-max".
assign-confidence.log.txt – a log file containing a copy of all messages that were printed to stderr.
assign-confidence.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
spectral-counts.target.txt – a tab-delimited text file containing the protein IDs and their corresponding scores, in sorted order.
spectral-counts.params.txt – a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other Crux programs.
spectral-counts.log.txt – All messages written to standard error.

Options:

pipeline options
- --bullseye T|F – Run the Bullseye algorithm on the given MS data, using it to assign high-resolution precursor values to the MS/MS data. If a spectrum file ends with .ms2 or .cms2, matching .ms1/.cms1 files will be used as the MS1 file. Otherwise, it is assumed that the spectrum file contains both MS1 and MS2 scans. Default = false.
- --search-engine comet|tide-search – Specify which search engine to use. Default = tide-search.
- --post-processor percolator|assign-confidence – Specify which post-processor to apply to the search results. Default = percolator.
- --memory-limit <integer> – The maximum amount of memory (i.e., RAM), in GB, to be used by tide-index. Default = 4.
- --auto-modifications-spectra <string> – Specify the spectra file to be used for modification inference when the auto-modifications option is enabled. Multiple files may be separated by commas. Default = <empty>.
- --max-charge-feature <integer> – Specifies the maximum charge state feature. When set to zero, use the maximum observed charge state. Default = 0.
- --no-terminate T|F – Do not stop execution when encountering questionable SVM inputs or results. Default = false.
- --protein-name-separator <string> – Determines the character to separate the protein IDs in the tab-delimited output format. Default = ,.
- --spectral-counting-fdr <float> – Report the number of unique PSMs and total (including shared peptides) PSMs as two extra columns in the protein tab-delimited output. Default = 0.
- --train-best-positive T|F – Enforce that, for each spectrum, at most one PSM is included in the positive set during each training iteration. Note that if the user only provides one PSM per spectrum, then this option will have no effect. Default = false.
- --estimation-method mix-max|tdc|peptide-level – Specify the method used to estimate q-values. The mix-max procedure or target-decoy competition apply to PSMs. The peptide-level option eliminates any PSM for which there exists a better scoring PSM involving the same peptide, and then uses decoys to assign confidence estimates. Default = tdc.
- --score <string> – Specify the column (for tab-delimited input) or tag (for XML input) used as input to the q-value estimation procedure. If this parameter is unspecified, then the program searches for "xcorr score", "evalue" (comet), "exact p-value" score fields in this order in the input file. Default = <empty>.
- --sidak T|F – Adjust the scores using the Sidak adjustment and report them in a new column in the output file. Note that this adjustment only makes sense if the given scores are p-values, and that it requires the presence of the "distinct matches/spectrum" feature for each PSM. Default = false.
- --top-match-in <integer> – Specify the maximum rank to allow when parsing results files. Matches with ranks higher than this value will be ignored (a value of zero allows matches with any rank). Default = 0.
- --combine-charge-states T|F – Set this parameter to T in order to combine charge states with peptide sequences in peptide-centric search. Works only if estimation-method = peptide-level. Default = false.
- --combine-modified-peptides T|F – Set this parameter to T in order to treat peptides carrying different or no modifications as being the same. Works only if estimation-method = peptide-level. Default = false.
- --parsimony none|simple|greedy – Perform a parsimony analysis on the proteins, and report a "parsimony rank" column in the output file. This column contains integers indicating the protein's rank in a list sorted by spectral counts. If the parsimony analysis results in two proteins being merged, then their parsimony rank is the same. In such a case, the rank is assigned based on the largest spectral count of any protein in the merged meta-protein. The "simple" parsimony algorithm only merges two proteins A and B if the peptides identified in protein A are the same as or a subset of the peptides identified in protein B. The "greedy" parsimony algorithm does additional merging, by identifying the longest protein (i.e., the protein with the most peptides) that contains one or more shared peptides. The shared peptides are assigned to the identified protein and removed from any other proteins that contain them, and the process is then repeated. Note that, with this option, some proteins end up being assigned no peptides at all; these orphan proteins are not reported in the output. Default = none.
- --threshold <float> – Only consider PSMs that pass this threshold value. By default, the threshold is applied to q-values. This behavior can be changed using the --custom-threshold-name and --custom-threshold-min parameters. Default = 0.01.
- --threshold-type none|qvalue|custom – Determines what type of threshold to use when filtering matches. none : read all matches, qvalue : use calculated q-value from percolator, custom : use --custom-threshold-name and --custom-threshold-min parameters. Default = qvalue.
- --input-ms2 <string> – MS2 file corresponding to the psm file. Required to measure the SIN. Ignored for NSAF, dNSAF and EMPAI. Default = <empty>.
- --unique-mapping T|F – Ignore peptides that map to multiple proteins. Default = false.
- --quant-level protein|peptide – Quantification at protein or peptide level. Default = protein.
- --measure RAW|NSAF|dNSAF|SIN|EMPAI – Type of analysis to make on the match results: (RAW|NSAF|dNSAF|SIN|EMPAI). With the exception of the RAW metric, the database of sequences needs to be provided using --protein-database. Default = NSAF.
- --custom-threshold-name <string> – Specify which field to apply the threshold to. The direction of the threshold (<= or >=) is governed by --custom-threshold-min. By default, the threshold applies to the q-value, specified by "percolator q-value", "decoy q-value (xcorr)". Default = <empty>.
- --custom-threshold-min T|F – When selecting matches with a custom threshold, custom-threshold-min determines whether to filter matches with custom-threshold-name values that are greater than or equal to (F) or less than or equal to (T) the threshold. Default = true.
- --mzid-use-pass-threshold T|F – Use mzid's passThreshold attribute to filter matches. Default = false.
- --protein-database <string> – The name of the file in FASTA format. Default = <empty>.
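
As a rough illustration of what the --sidak option does to a p-value, the sketch below applies the textbook Šidák correction, 1 - (1 - p)^n. It assumes that n is the "distinct matches/spectrum" value of the PSM; the function name `sidak_adjust` is hypothetical and this is not taken from the Crux source.

```python
def sidak_adjust(p_value, num_candidates):
    """Textbook Sidak correction of a p-value for num_candidates tests."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    # Probability that at least one of n independent tests scores this well.
    return 1.0 - (1.0 - p_value) ** num_candidates


# A p-value of 0.5 over two candidate peptides becomes 0.75.
print(sidak_adjust(0.5, 2))
```

Because the adjusted value grows with the number of candidates, a spectrum compared against many peptides needs a smaller raw p-value to remain significant.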
Identifying PPIDs in MS1 spectra
- --max-persist <float> – Ignore PPIDs that persist for longer than this length of time in the MS1 spectra. The unit of time is whatever unit is used in your data file (usually minutes). These PPIDs are considered contaminants. Default = 2.
- --persist-tolerance <float> – Set the mass tolerance (+/-ppm) for finding PPIDs in consecutive MS1 scans. Default = 10.
- --gap-tolerance <integer> – Allowed gap size when checking for PPIDs across consecutive MS1 scans. Default = 1.
- --scan-tolerance <integer> – Total number of MS1 scans over which a PPID must be observed to be considered real. Gaps in persistence are allowed by setting --gap-tolerance. Default = 3.
- --bullseye-max-mass <float> – Only consider PPIDs below this maximum mass in daltons. Default = 8000.
- --bullseye-min-mass <float> – Only consider PPIDs above this minimum mass in daltons. Default = 600.
Matching PPIDs to MS2 spectra
- --exact-match T|F – When true, require an exact match (as defined by --exact-tolerance) between the center of the precursor isolation window in the MS2 scan and the base isotopic peak of the PPID. If this option is set to false and no exact match is observed, then attempt to match using a wider m/z tolerance. This wider tolerance is calculated using the PPID's monoisotopic mass and charge (the higher the charge, the smaller the window). Default = false.
- --exact-tolerance <float> – Set the tolerance (+/-ppm) for --exact-match. Default = 10.
- --retention-tolerance <float> – Set the tolerance (+/-units) around the retention time over which a PPID can be matched to the MS2 spectrum. The unit of time is whatever unit is used in your data file (usually minutes). Default = 0.5.
Peptide properties
- --clip-nterm-methionine T|F – When set to T, for each protein that begins with methionine, tide-index will put two copies of the leading peptide into the index, with and without the N-terminal methionine. Default = false.
- --isotopic-mass average|mono – Specify the type of isotopic masses to use when calculating the peptide mass. Default = mono.
- --max-length <integer> – The maximum length of peptides to consider. Default = 50.
- --max-mass <float> – The maximum mass (in Da) of peptides to consider. Default = 7200.
- --min-length <integer> – The minimum length of peptides to consider. Default = 6.
- --min-mass <float> – The minimum mass (in Da) of peptides to consider. Default = 200.
Amino acid modifications
- --cterm-peptide-mods-spec <string> – Specify peptide c-terminal modifications. See nterm-peptide-mods-spec for syntax. Default = <empty>.
- --cterm-protein-mods-spec <string> – Specifies C-terminal static and variable mass modifications on proteins. Mod specification syntax is the same as for peptide mods (see nterm-peptide-mods-spec option), but these mods are applied only to peptide C-terminals that are also protein terminals. If variable modifications are provided for both peptide and protein terminals, they will be applied one at a time. Default = <empty>.
- --max-mods <integer> – The maximum number of modifications that can be applied to a single peptide. Default = 255.
- --min-mods <integer> – The minimum number of modifications that can be applied to a single peptide. Default = 0.
- --mod-precision <integer> – Set the precision for modifications as written to .txt files. Default = 4.
- --mods-spec <string> – The general form of a modification specification has three components, as exemplified by 1STY+79.966331.
  The three components are: [max_per_peptide]residues[+/-]mass_change
  In the example, max_per_peptide is 1, residues are STY, and mass_change is +79.966331. To specify a static modification, the number preceding the amino acid must be omitted; i.e., C+57.02146 specifies a static modification of 57.02146 Da to cysteine. Note that Tide allows at most one modification per amino acid. Also, the default modification (C+57.02146) will be added to every mods-spec string unless an explicit C+0 is included. Default = C+57.02146.
- --nterm-peptide-mods-spec <string> – Specify peptide n-terminal modifications. Like --mods-spec, this specification has three components, but with a slightly different syntax. The max_per_peptide can be either "1", in which case it defines a variable terminal modification, or missing, in which case the modification is static. The residues field indicates which amino acids are subject to the modification, with the residue X corresponding to any amino acid. Finally, added_mass is defined as before. Default = <empty>.
- --nterm-protein-mods-spec <string> – Same as cterm-protein-mods-spec, but for the protein N-terminal. Default = <empty>.
- --auto-modifications T|F – Automatically infer modifications from the spectra themselves. Default = false.
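
The three-part form of a --mods-spec entry, [max_per_peptide]residues[+/-]mass_change, can be illustrated with a small parser. This is a sketch for a single specification only; the function name `parse_mod_spec` and the returned dictionary layout are hypothetical and do not reflect how Crux parses the option internally.

```python
import re

# Optional count, one or more residue letters, then a signed mass change.
_MOD_SPEC = re.compile(r"^(\d*)([A-Z]+)([+-]\d+(?:\.\d+)?)$")


def parse_mod_spec(spec):
    """Split one modification specification into its three components."""
    match = _MOD_SPEC.match(spec.strip())
    if match is None:
        raise ValueError("not a valid modification specification: %r" % spec)
    max_per_peptide, residues, mass_change = match.groups()
    return {
        # A missing count means the modification is static.
        "static": max_per_peptide == "",
        "max_per_peptide": int(max_per_peptide) if max_per_peptide else None,
        "residues": residues,
        "mass_change": float(mass_change),
    }


print(parse_mod_spec("1STY+79.966331"))  # variable modification on S, T or Y
print(parse_mod_spec("C+57.02146"))      # static modification on cysteine
```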
Decoy database generation
- --allow-dups T|F – Prevent duplicate peptides between the target and decoy databases. When set to "F", the program keeps all target and previously generated decoy peptides in memory. A shuffled decoy will be re-shuffled multiple times to avoid duplication. If a non-duplicated peptide cannot be generated, the decoy is skipped entirely. When set to "T", every decoy is added to the database without checking for duplication. This option reduces the memory requirements significantly. Default = false.
- --decoy-format none|shuffle|peptide-reverse – Include a decoy version of every peptide by shuffling or reversing the target sequence or protein. In shuffle or peptide-reverse mode, each peptide is either reversed or shuffled, leaving the N-terminal and C-terminal amino acids in place. Note that peptides that appear multiple times in the target database are only shuffled once. In peptide-reverse mode, palindromic peptides are shuffled. Also, if a shuffled peptide produces an overlap with the target or decoy database, then the peptide is re-shuffled up to 5 times. Note that, despite this repeated shuffling, homopolymers will appear in both the target and decoy database. Default = shuffle.
- --keep-terminal-aminos N|C|NC|none – When creating decoy peptides using decoy-format=shuffle or decoy-format=peptide-reverse, this option specifies whether the N-terminal and C-terminal amino acids are kept in place or allowed to be shuffled or reversed. For a target peptide "EAMPK" with decoy-format=peptide-reverse, setting keep-terminal-aminos to "NC" will yield "EPMAK"; setting it to "C" will yield "PMAEK"; setting it to "N" will yield "EKPMA"; and setting it to "none" will yield "KPMAE". Default = NC.
- --num-decoys-per-target <integer> – The number of decoys to generate per target. When set to a value n, then with concat=F tide-search will output one target and n decoys. The resulting files can be used to run the "average target-decoy competition" method in assign-confidence. This parameter only applies when decoy-format=shuffle and should always be used in combination with allow-dups=T. Default = 1.
- --seed <string> – When given an unsigned integer value, seeds the random number generator with that value. When given the string "time", seeds the random number generator with the system time. Default = 1.
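
The interaction between decoy-format=peptide-reverse and --keep-terminal-aminos can be reproduced with a few lines of code. The sketch below uses a hypothetical helper named `reverse_decoy` and covers only the reversal itself; it does not handle the palindrome and duplicate cases, which the text above says are resolved by shuffling.

```python
def reverse_decoy(peptide, keep="NC"):
    """Reverse a peptide, optionally holding the terminal residues in place."""
    if keep not in ("N", "C", "NC", "none"):
        raise ValueError("keep must be one of N, C, NC, none")
    start = 1 if keep in ("N", "NC") else 0
    end = len(peptide) - 1 if keep in ("C", "NC") else len(peptide)
    if end <= start:
        return peptide  # too short to have an interior to reverse
    return peptide[:start] + peptide[start:end][::-1] + peptide[end:]


for setting in ("NC", "C", "N", "none"):
    print(setting, reverse_decoy("EAMPK", setting))
```

For the target peptide "EAMPK" this prints EPMAK, PMAEK, EKPMA and KPMAE, matching the four cases listed for --keep-terminal-aminos.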
Enzymatic digestion
- --custom-enzyme <string> – Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given enclosed in square brackets or curly braces and separated by a |. The first list contains residues required/prohibited before the cleavage site and the second list is residues after the cleavage site. If the residues are required for digestion, they are in square brackets, '[' and ']'. If the residues prevent digestion, then they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P which is represented as [RK]|{P}. AspN cuts after any residue but only before D which is represented as [X]|[D]. To prevent the sequences from being digested at all, use {X}|{X}. Default = <empty>.
- --digestion full-digest|partial-digest|non-specific-digest – Specify whether every peptide in the database must have two enzymatic termini (full-digest) or if peptides with only one enzymatic terminus are also included (partial-digest). Default = full-digest.
- --enzyme no-enzyme|trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|lysarginase|custom-enzyme – Specify the enzyme used to digest the proteins in silico. Available enzymes (with the corresponding digestion rules indicated in parentheses) include no-enzyme ([X]|[X]), trypsin ([RK]|{P}), trypsin/p ([RK]|[]), chymotrypsin ([FWYL]|{P}), elastase ([ALIV]|{P}), clostripain ([R]|[]), cyanogen-bromide ([M]|[]), iodosobenzoate ([W]|[]), proline-endopeptidase ([P]|[]), staph-protease ([E]|[]), asp-n ([]|[D]), lys-c ([K]|{P}), lys-n ([]|[K]), arg-c ([R]|{P}), glu-c ([DE]|{P}), pepsin-a ([FL]|{P}), elastase-trypsin-chymotrypsin ([ALIVKRWFY]|{P}), lysarginase ([]|[KR]). Specifying --enzyme no-enzyme yields a non-enzymatic digest. Warning: the resulting index may be quite large. Default = trypsin.
- --missed-cleavages <integer> – Maximum number of missed cleavages per peptide to allow in enzymatic digestion. Default = 0.
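
The --custom-enzyme rule syntax can be made concrete with a small digestion sketch. It assumes zero missed cleavages and full digestion, and it treats an empty pair of brackets (as in the built-in rule [RK]|[]) as "no constraint", which is an interpretation rather than something stated above. The names `parse_side` and `digest` are hypothetical.

```python
def parse_side(token):
    """Turn one side of a rule, e.g. '[RK]' or '{P}', into a predicate."""
    token = token.strip()
    if len(token) < 2 or (token[0], token[-1]) not in (("[", "]"), ("{", "}")):
        raise ValueError("bad rule component: %r" % token)
    residues = token[1:-1]
    required = token[0] == "["  # square brackets require, curly braces prohibit

    def allows(amino_acid):
        if residues == "":
            return True  # assumption: empty brackets impose no constraint
        member = "X" in residues or amino_acid in residues  # X means any residue
        return member if required else not member

    return allows


def digest(protein, rule):
    """Cut a protein at every site permitted by the rule (no missed cleavages)."""
    before_token, after_token = rule.split("|")
    before_ok, after_ok = parse_side(before_token), parse_side(after_token)
    cuts = [
        i
        for i in range(1, len(protein))
        if before_ok(protein[i - 1]) and after_ok(protein[i])
    ]
    bounds = [0] + cuts + [len(protein)]
    return [protein[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AKPRGKD", "[RK]|{P}"))  # trypsin: no cut between K and P
print(digest("ADKDE", "[X]|[D]"))     # AspN: cut before every D
print(digest("AKPRGKD", "{X}|{X}"))   # never cut
```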
Precursor selection
- --auto-precursor-window false|warn|fail – Automatically estimate optimal value for the precursor-window parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
- --max-precursor-charge <integer> – The maximum charge state of a spectrum to consider in search. Default = 5.
- --min-precursor-charge <integer> – The minimum charge state of a spectrum to consider in search. Default = 1.
- --precursor-window <float> – Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum value. The precursor window units depend upon precursor-window-type. Default = 50.
- --precursor-window-type mass|mz|ppm – Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor m+h value is converted to mass, and the window is defined as that mass +/- precursor-window. If the m+h value is not available, then the mass is calculated from the precursor m/z and provided charge. The peptide mass is computed as the sum of the monoisotopic amino acid masses plus 18 Da for the terminal OH group. The mz window-type calculates the window as spectrum precursor m/z +/- precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as the spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound is defined as spectrum mass / (1.0 - (precursor-window / 1000000)). Default = ppm.
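
The three window types can be compared side by side with a short sketch. It rests on several assumptions: the neutral mass is derived from the precursor m/z and charge using a proton mass of 1.00727646688 Da, and the ppm bounds are obtained by dividing the spectrum mass by (1 + w/1e6) and (1 - w/1e6), which keeps the lower bound below the upper bound. The function name `precursor_mass_window` is hypothetical.

```python
PROTON_MASS = 1.00727646688  # Da; assumed value, not taken from the text above


def precursor_mass_window(precursor_mz, charge, window, window_type):
    """Return (low, high) neutral-mass bounds for candidate peptides."""
    mass = (precursor_mz - PROTON_MASS) * charge
    if window_type == "mass":
        return mass - window, mass + window
    if window_type == "mz":
        # Widen the m/z value first, then convert both ends to mass.
        low_mz, high_mz = precursor_mz - window, precursor_mz + window
        return (low_mz - PROTON_MASS) * charge, (high_mz - PROTON_MASS) * charge
    if window_type == "ppm":
        fraction = window / 1_000_000
        return mass / (1.0 + fraction), mass / (1.0 - fraction)
    raise ValueError("window_type must be mass, mz or ppm")


for kind, width in (("mass", 3.0), ("mz", 0.5), ("ppm", 50.0)):
    low, high = precursor_mass_window(500.0, 2, width, kind)
    print(kind, round(low, 4), round(high, 4))
```

Note that an mz window scales with charge once converted to mass, whereas a ppm window scales with the mass itself.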
Search parameters
- --auto-mz-bin-width false|warn|fail – Automatically estimate optimal value for the mz-bin-width parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
- --deisotope <float> – Perform a simple deisotoping operation across each MS2 spectrum. For each peak in an MS2 spectrum, consider lower m/z peaks. If the current peak occurs where an expected peak would lie for any charge state less than the charge state of the precursor, within mass tolerance, and if the current peak is of lower abundance, then the peak is removed. The value of this parameter is the mass tolerance, in units of parts-per-million. If set to 0, no deisotoping is performed. Default = 0.
- --fragment-tolerance <float> – Mass tolerance (in Da) for scoring pairs of peaks when creating the residue evidence matrix. This parameter only makes sense when score-function is 'residue-evidence' or 'both'. Default = 0.02.
- --isotope-error <string> – List of positive, non-zero integers. Default = <empty>.
- --min-peaks <integer> – The minimum number of peaks a spectrum must have for it to be searched. Default = 20.
- --mz-bin-offset <float> – In the discretization of the m/z axes of the observed and theoretical spectra, this parameter specifies the location of the left edge of the first bin, relative to mass = 0 (i.e., mz-bin-offset = 0.xx means the left edge of the first bin will be located at +0.xx Da). Default = 0.4.
- --mz-bin-width <float> – Before calculation of the XCorr score, the m/z axes of the observed and theoretical spectra are discretized. This parameter specifies the size of each bin. The exact formula for computing the discretized m/z value is floor((x/mz-bin-width) + 1.0 - mz-bin-offset), where x is the observed m/z value. A value of 1.0005079 is recommended for low-resolution ion trap MS/MS data, and 0.02 for high-resolution MS/MS data. Default = 0.02.
- --override-charges T|F – If this is set to T, then all spectra are searched in all charge states from min-charge to max-charge. Otherwise, the default behavior is to search with all charge states only if a spectrum has no charge or charge=0. Default = false.
- --remove-precursor-peak T|F – If true, all peaks around the precursor m/z will be removed, within a range specified by the --remove-precursor-tolerance option. Default = false.
- --remove-precursor-tolerance <float> – This parameter specifies the tolerance (in Th) around each precursor m/z that is removed when the --remove-precursor-peak option is invoked. Default = 1.5.
- --scan-number <string> – A single scan number or a range of numbers to be searched. Range should be specified as 'first-last' which will include scans 'first' and 'last'. Default = <empty>.
- --score-function xcorr|combined-p-values – Function used for scoring PSMs. 'xcorr' is the original scoring function used by SEQUEST; 'combined-p-values' combines (1) exact-p-value: a calibrated version of XCorr that uses dynamic programming and (2) residue-evidence-pvalue: a calibrated version of the ResEV score that considers pairs of peaks, rather than single peaks. Default = xcorr.
- --skip-preprocessing T|F – Skip preprocessing steps on spectra. Default = false.
- --spectrum-max-mz <float> – The highest spectrum m/z to search in the ms2 file. Default = 1e+09.
- --spectrum-min-mz <float> – The lowest spectrum m/z to search in the ms2 file. Default = 0.
- --use-flanking-peaks T|F – Include flanking peaks around singly charged b and y theoretical ions. Each flanking peak occurs in the adjacent m/z bin and has half the intensity of the primary peak. Default = false.
- --use-neutral-loss-peaks T|F – Controls whether neutral loss ions are considered in the search. For XCorr, the loss of ammonia (NH3, 17.0086343 Da) is applied to singly charged b- and y-ions, and the loss of water (H2O; 18.0091422) is applied to b-ions. If the precursor charge is >=3, then a doubly-charged version of each ion is added. For XCorr p-value, three types of neutral losses are included. Loss of ammonia and water are applied to b- and y-ions, and a carbon monoxide loss (CO, 27.9949) is also applied to b-ions. Higher charge fragments are included for all possible charges less than the precursor charge. All neutral loss peaks have an intensity 1/10 of the primary peak. Neutral losses are not yet implemented for the res-ev score function. Default = true.
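
The discretization formula given for --mz-bin-width, floor((x/mz-bin-width) + 1.0 - mz-bin-offset), translates directly into code. The function name `mz_to_bin` is hypothetical; the default arguments mirror the documented defaults of --mz-bin-width and --mz-bin-offset.

```python
import math


def mz_to_bin(mz, bin_width=0.02, bin_offset=0.4):
    """Map an m/z value to its integer bin index."""
    return math.floor(mz / bin_width + 1.0 - bin_offset)


# Low-resolution setting: bins about 1 Da wide.
print(mz_to_bin(500.0, bin_width=1.0005079))
# High-resolution default: 0.02-wide bins.
print(mz_to_bin(500.0))
```

With the low-resolution width of 1.0005079, a peak near m/z 500 lands in bin 500, so bin indices track nominal mass.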
Protein inference options
- --picked-protein <string> – Use the picked protein-level FDR to infer protein probabilities. Provide the FASTA file as the argument to this flag. Default = <empty>.
- --protein-enzyme no_enzyme|elastase|pepsin|proteinasek|thermolysin|trypsinp|chymotrypsin|lys-n|lys-c|arg-c|asp-n|glu-c|lysarginase|trypsin – Type of enzyme. Default = trypsin.
- --protein-report-duplicates T|F – If multiple database proteins contain exactly the same set of peptides, then Percolator will randomly discard all but one of the proteins. If this option is set, then the IDs of these duplicated proteins will be reported as a comma-separated list. Not available for Fido. Default = false.
- --protein-report-fragments T|F – By default, if the peptides associated with protein A are a proper subset of the peptides associated with protein B, then protein A is eliminated and all the peptides are considered as evidence for protein B. Note that this filtering is done based on the complete set of peptides in the database, not based on the identified peptides in the search results. Alternatively, if this option is set and if all of the identified peptides associated with protein B are also associated with protein A, then Percolator will report a comma-separated list of protein IDs, where the full-length protein B is first in the list and the fragment protein A is listed second. Not available for Fido. Default = false.
Fido options
- --fido-alpha <float> – Specify the probability with which a present protein emits an associated peptide. Set by grid search (see --fido-gridsearch-depth parameter) if not specified. Default = 0.
- --fido-beta <float> – Specify the probability of the creation of a peptide from noise. Set by grid search (see --fido-gridsearch-depth parameter) if not specified. Default = 0.
- --fido-empirical-protein-q T|F – Estimate empirical p-values and q-values for proteins using target-decoy analysis. Default = false.
- --fido-fast-gridsearch <float> – Apply the specified threshold to PSM, peptide and protein probabilities to obtain a faster estimate of the alpha, beta and gamma parameters. Default = 0.
- --fido-gamma <float> – Specify the prior probability that a protein is present in the sample. Set by grid search (see --fido-gridsearch-depth parameter) if not specified. Default = 0.
- --fido-gridsearch-depth <integer> – Set depth of the grid search for alpha, beta and gamma estimation. The values considered, for each possible value of the --fido-gridsearch-depth parameter, are as follows:
  - 0: alpha = {0.01, 0.04, 0.09, 0.16, 0.25, 0.36, 0.5}; beta = {0.0, 0.01, 0.15, 0.025, 0.035, 0.05, 0.1}; gamma = {0.1, 0.25, 0.5, 0.75}.
  - 1: alpha = {0.01, 0.04, 0.09, 0.16, 0.25, 0.36}; beta = {0.0, 0.01, 0.15, 0.025, 0.035, 0.05}; gamma = {0.1, 0.25, 0.5}.
  - 2: alpha = {0.01, 0.04, 0.16, 0.25, 0.36}; beta = {0.0, 0.01, 0.15, 0.030, 0.05}; gamma = {0.1, 0.5}.
  - 3: alpha = {0.01, 0.04, 0.16, 0.25, 0.36}; beta = {0.0, 0.01, 0.15, 0.030, 0.05}; gamma = {0.5}.
  Default = 0.
- --fido-gridsearch-mse-threshold <float> – Q-value threshold that will be used in the computation of the MSE and ROC AUC score in the grid search. Default = 0.05.
- --fido-no-split-large-components T|F – Do not approximate the posterior distribution by allowing large graph components to be split into subgraphs. The splitting is done by duplicating peptides with low probabilities. Splitting continues until the number of possible configurations of each subgraph is below 2^18. Default = false.
- --fido-protein-truncation-threshold <float> – To speed up inference, proteins for which none of the associated peptides has a probability exceeding the specified threshold will be assigned probability = 0. Default = 0.01.
- --protein T|F – Use the Fido algorithm to infer protein probabilities. Must be true to use any of the Fido options. Default = false.
Database
- --decoy_search <integer> – 0=no, 1=concatenated search, 2=separate search. Default = 0.
- --peff_format <integer> – 0=normal FASTA format, 1=PEFF PSI-MOD modifications and amino acid variants, 2=PEFF Unimod modifications and amino acid variants, 3=PEFF PSI-MOD modifications, skipping amino acid variants, 4=PEFF Unimod modifications, skipping amino acid variants, 5=PEFF amino acid variants, skipping PEFF modifications. Default = 0.
- --peff_obo <string> – A full or relative path to the OBO file used with a PEFF search. Supported OBO formats are PSI-Mod and Unimod OBO files. Which OBO file you use depends on your PEFF input file. This parameter is ignored if "peff_format = 0". There is no default value if this parameter is missing. Default = <empty>.
CPU threads
- --num-threads <integer> – 0=poll CPU to set num threads; else specify num threads directly. Default = 1.
- --num_threads <integer> – 0=poll CPU to set num threads; else specify num threads directly. Default = 0.
Masses
- --peptide_mass_tolerance <float> – Controls the mass tolerance value. The mass tolerance is set at +/- the specified number, i.e., an entered value of "1.0" applies a -1.0 to +1.0 tolerance. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = 3.
- --peptide_mass_tolerance_lower <float> – Controls the lower bound of the precursor mass tolerance value. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = -3.
- --peptide_mass_tolerance_upper <float> – Controls the upper bound of the precursor mass tolerance value. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = 3.
- --auto_peptide_mass_tolerance false|warn|fail – Automatically estimate optimal value for the peptide_mass_tolerance parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
- --peptide_mass_units <integer> – 0=amu, 1=mmu, 2=ppm. Default = 0.
- --mass_type_parent <integer> – 0=average masses, 1=monoisotopic masses. Default = 1.
- --mass_type_fragment <integer> – 0=average masses, 1=monoisotopic masses. Default = 1.
- --precursor_tolerance_type <integer> – 0=singly charged peptide mass, 1=precursor m/z. Default = 0.
- --isotope_error <integer> – 0=off, 1=0/1 (C13 error), 2=0/1/2, 3=0/1/2/3, 4=-1/0/1/2/3, 5=-1/0/1, 6=-3/-2/-1/0/+1/+2/+3, 7=-8/-4/0/+4/+8 (for +4/+8 stable isotope labeling). Default = 0.
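
The interaction between the tolerance bounds and peptide_mass_units can be sketched as follows. This is a minimal illustration using the usual unit conversions (1 mmu = 0.001 amu; ppm is relative to the mass being tested), not Comet's actual code; the function name `tolerance_window` is hypothetical.

```python
def tolerance_window(mass, lower, upper, units):
    """Return the (low, high) mass bounds around `mass`.

    `lower` and `upper` play the role of peptide_mass_tolerance_lower and
    peptide_mass_tolerance_upper; `units` follows peptide_mass_units:
    0=amu, 1=mmu, 2=ppm.
    """
    if units == 0:
        scale = 1.0            # tolerance already in amu
    elif units == 1:
        scale = 1e-3           # milli-mass units -> amu
    elif units == 2:
        scale = mass * 1e-6    # parts per million of the mass itself
    else:
        raise ValueError("units must be 0, 1, or 2")
    return mass + lower * scale, mass + upper * scale


# Default bounds (-3, 3) interpreted as ppm around a 1000 Da peptide.
low, high = tolerance_window(1000.0, -3.0, 3.0, units=2)
print(low, high)
```

With the default units (0=amu), the same bounds would instead span a 6 Da window, which is why the units setting matters as much as the numeric tolerance.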
Search enzyme
- --search_enzyme_number <integer> – Specify a search enzyme from the end of the parameter file. Default = 1.
- --search_enzyme2_number <integer> – Specify a second search enzyme from the end of the parameter file. Default = 0.
- --num_enzyme_termini <integer> – Valid values are 1 (semi-digested), 2 (fully digested), 8 (N-term), 9 (C-term). Default = 2.
- --allowed_missed_cleavage <integer> – Maximum value is 5; for enzyme search. Default = 2.
Fragment ions
- --fragment_bin_tol <float> – Binning to use on fragment ions. Default = 1.000507.
- --fragment_bin_offset <float> – Offset position to start the binning (0.0 to 1.0). Default = 0.4.
- --auto_fragment_bin_tol false|warn|fail – Automatically estimate optimal value for the fragment_bin_tol parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
- --theoretical_fragment_ions <integer> – 0=default peak shape, 1=M peak only. Default = 1.
- --use_A_ions <integer> – Controls whether or not A-ions are considered in the search (0 - no, 1 - yes). Default = 0.
- --use_B_ions <integer> – Controls whether or not B-ions are considered in the search (0 - no, 1 - yes). Default = 1.
- --use_C_ions <integer> – Controls whether or not C-ions are considered in the search (0 - no, 1 - yes). Default = 0.
- --use_X_ions <integer> – Controls whether or not X-ions are considered in the search (0 - no, 1 - yes). Default = 0.
- --use_Y_ions <integer> – Controls whether or not Y-ions are considered in the search (0 - no, 1 - yes). Default = 1.
- --use_Z_ions <integer> – Controls whether or not Z-ions are considered in the search (0 - no, 1 - yes). Default = 0.
- --use_Z1_ions <integer> – Controls whether or not Z1-ions are considered in the search (0 - no, 1 - yes). Default = 0.
- --use_NL_ions <integer> – 0=no, 1=yes to consider NH3/H2O neutral loss peak. Default = 1.
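
The roles of fragment_bin_tol and fragment_bin_offset can be illustrated with a small binning function. The formula below, bin = floor(mass / width + (1 - offset)), is an assumed convention used here for illustration; this page does not state the exact expression the search engine uses.

```python
def fragment_bin(mass, bin_tol=1.000507, bin_offset=0.4):
    """Map a fragment mass to an integer bin index.

    Assumed convention: floor(mass / bin_tol + (1 - bin_offset)).
    Defaults mirror fragment_bin_tol and fragment_bin_offset.
    """
    return int(mass / bin_tol + (1.0 - bin_offset))


# Peaks half a Dalton apart can share a bin at the default width.
for mass in (500.0, 500.5, 501.0):
    print(mass, fragment_bin(mass))
```

A narrower bin_tol separates peaks that the default width would merge, at the cost of more bins to store and score.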
mzXML/mzML parameters
- --scan_range <string> – Start and end scan range to search; 0 as first entry ignores parameter. Default = 0 0.
- --precursor_charge <string> – Precursor charge range to analyze; does not override mzXML charge; 0 as first entry ignores parameter. Default = 0 0.
- --override_charge <integer> – Specifies whether to override existing precursor charge state information, when present in the files, with the charge range specified by the "precursor_charge" parameter. Default = 0.
- --ms_level <integer> – MS level to analyze, valid are levels 2 or 3. Default = 2.
- --activation_method ALL|CID|ECD|ETD+SA|ETD|PQD|HCD|IRMPD – Specifies which scan types are searched. Default = ALL.
Miscellaneous parameters
- --clip_nterm_methionine <integer> – 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine. Default = 0.
- --decoy_prefix <string> – Specifies the prefix of the protein names that indicates a decoy. Default = decoy_.
- --digest_mass_range <string> – MH+ peptide mass range to analyze. Default = 600.0 5000.0.
- --equal_I_and_L <integer> – This parameter controls whether Comet treats isoleucine (I) and leucine (L) as the same/equivalent with respect to a peptide identification. 0 treats I and L as different, 1 treats I and L as the same. Default = 1.
- --mass_offsets <string> – Specifies one or more mass offsets to apply. These values are effectively subtracted from each precursor mass such that peptides that are smaller than the precursor mass by the offset value can still be matched to the respective spectrum. Default = <empty>.
- --max_duplicate_proteins <integer> – Defines the maximum number of proteins (identifiers/accessions) to report. If a peptide is present in 6 total protein sequences, there is one (first) reference protein and 5 additional duplicate proteins. This parameter controls how many of those 5 additional duplicate proteins are reported. If "decoy_search = 2" is set to report separate target and decoy results, this parameter will be applied to the target and decoy outputs separately. If set to "-1", there will be no limit on the number of reported additional proteins. Default = 20.
- --max_fragment_charge <integer> – Set maximum fragment charge state to analyze (allowed max 5). Default = 3.
- --max_precursor_charge <integer> – Set maximum precursor charge state to analyze (allowed max 9). Default = 6.
- --num_results <integer> – Number of search hits to store internally. Default = 50.
- --nucleotide_reading_frame <integer> – 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six. Default = 0.
- --output_suffix <string> – Specifies the suffix string that is appended to the base output name for the pep.xml, pin.xml, txt and sqt output files. Default = <empty>.
- --peptide_length_range <string> – Defines the length range of peptides to search. This parameter has two integer values. The first value is the minimum length cutoff and the second value is the maximum length cutoff. Only peptides within the specified length range are analyzed. The maximum peptide length that Comet can analyze is 63. Default = 6 50.
- --precursor_NL_ions <string> – Controls whether or not precursor neutral loss peaks are considered in the xcorr scoring. If left blank, this parameter is ignored. To consider precursor neutral loss peaks, add one or more neutral loss mass values separated by a space. Each entered mass value will be subtracted from the experimental precursor mass and the resulting neutral loss m/z values for all charge states (from 1 to precursor charge) will be analyzed. As these neutral loss peaks are analyzed alongside fragment ion peaks, the fragment tolerance settings (fragment_bin_tol, fragment_bin_offset, theoretical_fragment_ions) apply to the precursor neutral loss peaks. Default = <empty>.
- --spectrum_batch_size <integer> – Maximum number of spectra to search at a time; 0 to search the entire scan range in one loop. Default = 20000.
- --text_file_extension <string> – Specifies a custom extension for the output text file. Default = <empty>.
- --resolve_fullpaths <integer> – Controls whether or not to resolve the full paths of the input files. 0=Comet will not resolve full paths, 1=Comet will resolve full paths. Default = 0.
- --pinfile_protein_delimiter <string> – The default delimiter for the protein field is a tab. This parameter allows one to specify a different character or string. If this parameter is left blank or is missing, the default tab delimiter is used. Default = <empty>.
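
The precursor_NL_ions description above (subtract each loss from the precursor mass, then consider every charge state from 1 up to the precursor charge) can be sketched as below. This is an illustrative calculation only: it assumes the precursor mass is the neutral mass and uses an approximate proton mass; whether the search engine starts from the neutral or the singly protonated mass is not stated here.

```python
PROTON = 1.007276  # approximate proton mass in Da (assumption for this sketch)


def neutral_loss_mzs(neutral_precursor_mass, precursor_charge, losses):
    """Return {loss: [m/z at charge 1, ..., m/z at precursor_charge]}."""
    result = {}
    for loss in losses:
        remaining = neutral_precursor_mass - loss
        result[loss] = [
            (remaining + z * PROTON) / z
            for z in range(1, precursor_charge + 1)
        ]
    return result


# The parameter value is a space-separated string of loss masses;
# "98.0" is a made-up illustrative value.
losses = [float(x) for x in "98.0".split()]
print(neutral_loss_mzs(1000.0, 2, losses))
```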
Spectral processing
- --minimum_peaks <integer> – Minimum number of peaks in spectrum to search. Default = 10.
- --minimum_intensity <float> – Minimum intensity value to read in. Default = 0.
- --remove_precursor_peak <integer> – 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD). Default = 0.
- --remove_precursor_tolerance <float> – +- Da tolerance for precursor removal. Default = 1.5.
- --clear_mz_range <string> – For iTRAQ/TMT type data; will clear out all peaks in the specified m/z range. Default = 0.0 0.0.
Variable modifications
- --variable_mod01 <string> – Up to 15 variable modifications are supported. Each modification is specified using seven entries: "<mass> <residues> <type> <max> <distance> <terminus> <force>". Type is 0 for static mods and non-zero for variable mods. Note that if you set the same type value on multiple modification entries, Comet will treat those variable modifications as a binary set. This means that all modifiable residues in the binary set must be unmodified or modified. Multiple binary sets can be specified by setting a different binary modification value. Max is an integer specifying the maximum number of modified residues possible in a peptide for this modification entry. Distance specifies the distance the modification is applied to from the respective terminus: -1 = no distance constraint; 0 = only applies to terminal residue; N = only applies to terminal residue through next N residues. Terminus specifies which terminus the distance constraint is applied to: 0 = protein N-terminus; 1 = protein C-terminus; 2 = peptide N-terminus; 3 = peptide C-terminus. Force specifies whether peptides must contain this modification: 0 = not forced to be present; 1 = modification is required. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod02 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod03 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod04 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod05 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod06 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod07 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod08 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod09 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod10 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod11 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod12 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod13 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod14 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --variable_mod15 <string> – See syntax for variable_mod01. Default = 0.0 null 0 3 -1 0 0.
- --auto_modifications T|F – Automatically infer modifications from the spectra themselves. Default = false.
- --max_variable_mods_in_peptide <integer> – Specifies the total/maximum number of residues that can be modified in a peptide. Default = 5.
- --require_variable_mod 0|1 – Controls whether the analyzed peptides must contain at least one variable modification. Default = 0.
- --protein_modlist_file <string> – Specify a full or relative path to a protein modifications file. If this entry points to a modifications file, Comet will parse the modification numbers and protein strings from the file and limit the application of the specified variable modifications to the sequence entries that match the protein string. Default = <empty>.
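
The seven-entry syntax shared by variable_mod01 through variable_mod15 is easier to check when split into named fields. The parser below is a hypothetical helper written for this page, not part of Crux or Comet; the field names follow the "<mass> <residues> <type> <max> <distance> <terminus> <force>" order described for variable_mod01, and the oxidation entry is an illustrative value.

```python
from dataclasses import dataclass


@dataclass
class VariableMod:
    mass: float           # mass shift in Da
    residues: str         # residues the modification may apply to
    mod_type: int         # the <type> entry
    max_per_peptide: int  # the <max> entry
    distance: int         # -1 = no distance constraint
    terminus: int         # 0-3, see variable_mod01
    force: int            # 1 = modification is required


def parse_variable_mod(value):
    """Split a variable_modNN string into its seven entries."""
    fields = value.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 entries, got {len(fields)}: {value!r}")
    mass, residues, mod_type, max_n, distance, terminus, force = fields
    return VariableMod(float(mass), residues, int(mod_type), int(max_n),
                       int(distance), int(terminus), int(force))


print(parse_variable_mod("0.0 null 0 3 -1 0 0"))   # the documented default
print(parse_variable_mod("15.9949 M 0 3 -1 0 0"))  # illustrative entry
```

Validating the entry count up front catches the most common mistake, a missing or extra field, before the value ever reaches the search.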
Static modifications
- --add_Cterm_peptide <float> – Specify a static modification to the c-terminus of all peptides. Default = 0.
- --add_Nterm_peptide <float> – Specify a static modification to the n-terminus of all peptides. Default = 0.
- --add_Cterm_protein <float> – Specify a static modification to the c-terminal peptide of each protein. Default = 0.
- --add_Nterm_protein <float> – Specify a static modification to the n-terminal peptide of each protein. Default = 0.
- --add_A_alanine <float> – Specify a static modification to the residue A. Default = 0.
- --add_C_cysteine <float> – Specify a static modification to the residue C. Default = 57.021464.
- --add_D_aspartic_acid <float> – Specify a static modification to the residue D. Default = 0.
- --add_E_glutamic_acid <float> – Specify a static modification to the residue E. Default = 0.
- --add_F_phenylalanine <float> – Specify a static modification to the residue F. Default = 0.
- --add_G_glycine <float> – Specify a static modification to the residue G. Default = 0.
- --add_H_histidine <float> – Specify a static modification to the residue H. Default = 0.
- --add_I_isoleucine <float> – Specify a static modification to the residue I. Default = 0.
- --add_K_lysine <float> – Specify a static modification to the residue K. Default = 0.
- --add_L_leucine <float> – Specify a static modification to the residue L. Default = 0.
- --add_M_methionine <float> – Specify a static modification to the residue M. Default = 0.
- --add_N_asparagine <float> – Specify a static modification to the residue N. Default = 0.
- --add_O_pyrrolysine <float> – Specify a static modification to the residue O. Default = 0.
- --add_P_proline <float> – Specify a static modification to the residue P. Default = 0.
- --add_Q_glutamine <float> – Specify a static modification to the residue Q. Default = 0.
- --add_R_arginine <float> – Specify a static modification to the residue R. Default = 0.
- --add_S_serine <float> – Specify a static modification to the residue S. Default = 0.
- --add_T_threonine <float> – Specify a static modification to the residue T. Default = 0.
- --add_U_selenocysteine <float> – Specify a static modification to the residue U. Default = 0.
- --add_V_valine <float> – Specify a static modification to the residue V. Default = 0.
- --add_W_tryptophan <float> – Specify a static modification to the residue W. Default = 0.
- --add_Y_tyrosine <float> – Specify a static modification to the residue Y. Default = 0.
- --add_B_user_amino_acid <float> – Specify a static modification to the residue B. Default = 0.
- --add_J_user_amino_acid <float> – Specify a static modification to the residue J. Default = 0.
- --add_X_user_amino_acid <float> – Specify a static modification to the residue X. Default = 0.
- --add_Z_user_amino_acid <float> – Specify a static modification to the residue Z. Default = 0.
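
The effect of a static modification is that every occurrence of the residue carries the extra mass. The sketch below shows this for the default --add_C_cysteine value of 57.021464; the residue table is deliberately tiny, uses standard monoisotopic residue masses rounded to five decimals, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da), rounded; only what the example needs.
MONO = {"A": 71.03711, "C": 103.00919, "K": 128.09496}
WATER = 18.010565  # mass of H2O added to the residue sum


def peptide_mass(sequence, static_mods):
    """Neutral monoisotopic mass with per-residue static modifications."""
    total = WATER
    for residue in sequence:
        total += MONO[residue] + static_mods.get(residue, 0.0)
    return total


unmodified = peptide_mass("ACK", {})
modified = peptide_mass("ACK", {"C": 57.021464})
print(round(unmodified, 4), round(modified, 4))
```

Setting --add_C_cysteine to 0 corresponds to the `unmodified` case; the difference between the two results is exactly the static modification mass times the number of cysteines.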
Indexing
- --fragindex_skipreadprecursors <integer> – This parameter controls whether or not Comet reads all precursors from the input files. It uses this information to limit the peptides that are included in the fragment ion index. Default = 0.
- --fragindex_num_spectrumpeaks <integer> – This parameter defines the number of mass/intensity pairs that would be queried against the fragment ion index. Default = 100.
- --fragindex_min_ions_score <integer> – This parameter sets the minimum number of fragment ions a peptide must match against the fragment ion index in order to proceed to xcorr scoring. Default = 3.
- --fragindex_min_ions_report <integer> – This parameter sets the minimum number of fragment ions a peptide must match against the fragment ion index in order to report this peptide in the output. Default = 3.
- --fragindex_min_fragmentmass <float> – This parameter defines the minimum fragment ion mass to include in the fragment ion index. Default = 200.
- --fragindex_max_fragmentmass <float> – This parameter defines the maximum fragment ion mass to include in the fragment ion index. Default = 2000.
General options
- --only-psms T|F – Report results only at the PSM level. This flag causes Percolator to skip the step that selects the top-scoring PSM per peptide; hence, peptide-level results are left out and only PSM-level results are reported. Default = false.
- --search-input auto|separate|concatenated – Specify the type of target-decoy search. Using 'auto', percolator attempts to detect the search type automatically. Using 'separate' specifies two searches: one against target and one against decoy protein db. Using 'concatenated' specifies a single search on concatenated target-decoy protein db. Default = auto.
- --tdc T|F – Use target-decoy competition to assign q-values and PEPs. When set to F, the mix-max method, which estimates the proportion pi0 of incorrect target PSMs, is used instead. Default = true.
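
For readers unfamiliar with target-decoy competition, the following is a simplified sketch of how q-values can be derived from a ranked list of target and decoy PSMs: the FDR at each score threshold is estimated as decoys divided by targets, and the q-value is the smallest FDR at that threshold or any more permissive one. It is not Percolator's implementation (for example, it omits any "+1" correction and the mix-max alternative), and the scores are made up.

```python
def tdc_qvalues(psms):
    """psms: iterable of (score, is_decoy). Returns [(score, q)] for targets."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything scoring at least this well.
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: minimum FDR over this threshold and all lower-scoring ones.
    qvalues = [0.0] * len(ranked)
    running_min = 1.0
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(score, q)
            for (score, is_decoy), q in zip(ranked, qvalues)
            if not is_decoy]


example = [(10, False), (9, False), (8, True), (7, False), (6, True), (5, False)]
for score, q in tdc_qvalues(example):
    print(score, round(q, 3))
```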
SVM training options
- --c-neg <float> – Penalty for mistakes made on negative examples. If not specified, then this value is set by cross validation over {0.1, 1, 10}. Default = 0.
- --c-pos <float> – Penalty for mistakes made on positive examples. If this value is set to 0, then it is set via cross validation over the values {0.1, 1, 10}, selecting the value that yields the largest number of PSMs identified at the q-value threshold set via the --test-fdr parameter. Default = 0.
- --maxiter <integer> – Maximum number of iterations for training. Default = 10.
- --percolator-seed <string> – When given an unsigned integer value, seeds the random number generator with that value. When given the string "time", seeds the random number generator with the system time. Default = 1.
- --quick-validation T|F – Quicker execution by reduced internal cross-validation. Default = false.
- --static T|F – Use the provided initial weights as a static model. If used, the --init-weights option must be specified. Default = false.
- --subset-max-train <integer> – Only train Percolator on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. Default = 0.
- --test-each-iteration T|F – Measure performance on test set each iteration. Default = false.
- --test-fdr <float> – False discovery rate threshold used in selecting hyperparameters during internal cross-validation and for reporting the final results. Default = 0.01.
- --train-fdr <float> – False discovery rate threshold to define positive examples in training. Default = 0.01.
SVM feature input options
- --default-direction <string> – In its initial round of training, Percolator uses one feature to induce a ranking of PSMs. By default, Percolator will select the feature that produces the largest set of target PSMs at a specified FDR threshold (cf. --train-fdr). This option allows the user to specify which feature is used for the initial ranking, using the name as a string from this table. The name can be preceded by a hyphen (e.g. "-XCorr") to indicate that a lower value is better. Default = <empty>.
- --init-weights <string> – Read the unnormalized initial weights from the third line of the given file. This can be the output of the --output-weights option from a previous Percolator analysis. Note that the weights must be in the same order as features in the PSM input file(s). Default = <empty>.
- --klammer T|F – Use retention time features calculated as in "Improving tandem mass spectrum identification using peptide retention time prediction across diverse chromatography conditions" by Klammer AA, Yi X, MacCoss MJ and Noble WS. (Analytical Chemistry. 2007 Aug 15;79(16):6111-8.). Default = false.
- --output-weights T|F – Output final weights to a file named "percolator.weights.txt". Default = false.
- --override T|F – By default, Percolator will examine the learned weights for each feature, and if the weight appears to be problematic, then Percolator will discard the learned weights and instead employ a previously trained, static score vector. This switch allows this error checking to be overridden. Default = false.
- --unitnorm T|F – Use unit normalization (i.e., linearly rescale each PSM's feature vector to have a Euclidean length of 1), instead of standard deviation normalization. Default = false.
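
The unit normalization described for --unitnorm, rescaling each PSM's feature vector to a Euclidean length of 1, amounts to the following. This is a minimal sketch of the arithmetic only; the handling of an all-zero vector is a choice made for this example.

```python
import math


def unit_normalize(features):
    """Linearly rescale a feature vector to Euclidean length 1."""
    length = math.sqrt(sum(x * x for x in features))
    if length == 0.0:
        return list(features)  # nothing to rescale (choice for this sketch)
    return [x / length for x in features]


print(unit_normalize([3.0, 4.0]))
```

Unlike standard deviation normalization, which rescales each feature across all PSMs, this rescales each PSM across its own features.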
param-medic options
- --pm-charges <string> – Precursor charge states to consider MS/MS spectra from, in measurement error estimation, provided as comma-separated values. Default = 0,2,3,4.
- --pm-max-frag-mz <float> – Maximum fragment m/z value to use in measurement error estimation. Default = 1800.
- --pm-max-precursor-delta-ppm <float> – Maximum ppm distance between precursor m/z values to consider two scans potentially generated by the same peptide for measurement error estimation. Default = 50.
- --pm-max-precursor-mz <float> – Maximum precursor m/z value to use in measurement error estimation. Default = 1800.
- --pm-max-scan-separation <integer> – Maximum number of scans two spectra can be separated by in order to be considered potentially generated by the same peptide, for measurement error estimation. Default = 1000.
- --pm-min-common-frag-peaks <integer> – Number of the most-intense peaks that two spectra must share in order to potentially be generated by the same peptide, for measurement error estimation. Default = 20.
- --pm-min-frag-mz <float> – Minimum fragment m/z value to use in measurement error estimation. Default = 150.
- --pm-min-peak-pairs <integer> – Minimum number of peak pairs (for precursor or fragment) that must be successfully paired in order to attempt to estimate measurement error distribution. Default = 200.
- --pm-min-precursor-mz <float> – Minimum precursor m/z value to use in measurement error estimation. Default = 400.
- --pm-min-scan-frag-peaks <integer> – Minimum fragment peaks an MS/MS scan must contain to be used in measurement error estimation. Default = 40.
- --pm-pair-top-n-frag-peaks <integer> – Number of fragment peaks per spectrum pair to be used in fragment error estimation. Default = 5.
- --pm-top-n-frag-peaks <integer> – Number of most-intense fragment peaks to consider for measurement error estimation, per MS/MS spectrum. Default = 30.
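
Two of the param-medic thresholds, --pm-max-precursor-delta-ppm and --pm-max-scan-separation, decide whether two scans are even considered as coming from the same peptide. A rough sketch of that pre-filter follows; it is an illustration of the two criteria as described above, not param-medic's code, and using the first scan's m/z as the ppm reference is an assumption.

```python
def ppm_delta(mz_a, mz_b):
    """Absolute m/z difference in parts per million, relative to mz_a."""
    return abs(mz_a - mz_b) / mz_a * 1e6


def may_share_peptide(scan_a, scan_b,
                      max_delta_ppm=50.0, max_scan_separation=1000):
    """Each scan is (scan_number, precursor_mz); defaults mirror the options."""
    (num_a, mz_a), (num_b, mz_b) = scan_a, scan_b
    close_in_time = abs(num_a - num_b) <= max_scan_separation
    close_in_mz = ppm_delta(mz_a, mz_b) <= max_delta_ppm
    return close_in_time and close_in_mz


print(may_share_peptide((100, 500.0), (150, 500.01)))   # about 20 ppm apart
print(may_share_peptide((100, 500.0), (150, 500.05)))   # about 100 ppm apart
```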
Input and output
- --fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = <empty>.
- --output-dir <string> – The name of the directory where output files will be created. Default = crux-output.
- --overwrite T|F – Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
- --spectrum-format |ms2|bms2|cms2|mgf – The format to write the output spectra to. If empty, the spectra will be output in the same format as the MS2 input. Default = <empty>.
- --parameter-file <string> – A file containing parameters. See the parameter documentation page for details. Default = <empty>.
- --verbosity <integer> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
- --decoy-prefix <string> – Specifies the prefix of the protein names that indicate a decoy. Default = decoy_.
- --mass-precision <integer> – Set the precision for masses and m/z written to sqt and text files. Default = 4.
- --peptide-list T|F – Create in the output directory a text file listing of all the peptides in the database, along with their corresponding decoy peptides, neutral masses and proteins, one per line. Default = false.
- --temp-dir <string> – The name of the directory where temporary files will be created. If this parameter is blank, then the system temporary directory will be used. Default = <empty>.
- --concat T|F – When set to T, target and decoy search results are reported in a single file, and only the top-scoring N matches (as specified via --top-match) are reported for each spectrum, irrespective of whether the matches involve target or decoy peptides. Default = false.
- --mzid-output T|F – Output an mzIdentML results file to the output directory. Default = false.
- --mztab-output T|F – Output results in mzTab file to the output directory. Default = false.
- --pepxml-output T|F – Output a pepXML results file to the output directory. Default = false.
- --pin-output T|F – Output a Percolator input (PIN) file to the output directory. Default = false.
- --precision <integer> – Set the precision for scores written to sqt and text files. Default = 8.
- --print-search-progress <integer> – Show search progress by printing every n spectra searched. Set to 0 to show no search progress. Default = 1000.
- --spectrum-parser pwiz|mstoolkit – Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser can read the MS/MS file formats listed here. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz.
- --sqt-output T|F – Outputs an SQT results file to the output directory. Note that if sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Default = false.
- --store-index <string> – When providing a FASTA file as the index, the generated binary index will be stored at the given path. This option has no effect if a binary index is provided as the index. Default = <empty>.
- --store-spectra <string> – Specify the name of the file where the binarized fragmentation spectra will be stored. Subsequent runs of crux tide-search will execute more quickly if provided with the spectra in binary format. The filename is specified relative to the current working directory, not the Crux output directory (as specified by --output-dir). This option is not valid if multiple input spectrum files are given. Default = <empty>.
- --top-match <integer> – Specify the number of matches to report for each spectrum. Default = 5.
- --txt-output T|F – Output a tab-delimited results file to the output directory. Default = true.
- --use-z-line T|F – Specify whether, when parsing an MS2 spectrum file, Crux obtains the precursor mass information from the "S" line or the "Z" line. Default = true.
- --output_mzidentmlfile <integer> – 0=no, 1=yes write mzIdentML file. Default = 0.
- --output_pepxmlfile <integer> – 0=no, 1=yes write pep.xml file. Default = 1.
- --output_percolatorfile <integer> – 0=no, 1=yes write percolator file. Default = 0.
- --output_sqtfile <integer> – 0=no, 1=yes write sqt file. Default = 0.
- --output_sqtstream <integer> – 0=no, 1=yes write sqt output to standard output. Default = 0.
- --output_txtfile <integer> – 0=no, 1=yes write tab-delimited text file. Default = 1.
- --num_output_lines <integer> – Number of peptide results to show. Default = 5.
- --decoy-xml-output T|F – Include decoys (PSMs, peptides, and/or proteins) in the XML output. Default = false.
- --feature-file-out T|F – Output the computed features in tab-delimited Percolator input (.pin) format. The features will be normalized, using either unit norm or standard deviation normalization (depending upon the value of the --unitnorm option). Default = false.
- --pout-output T|F – Output a Percolator pout.xml format results file to the output directory. Default = false.
- --list-of-files T|F – Specify that the search results are provided as lists of files, rather than as individual files. Default = false.
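
When scripting runs, it can help to assemble the command line programmatically so that option names and T|F values stay consistent. The helper below is a hypothetical convenience function, not part of Crux; it only builds the argument list following the usage line "crux pipeline [options] <mass spectra> <peptide source>", and the file names are placeholders.

```python
def build_pipeline_command(spectra, peptide_source, options):
    """Return an argv list for `crux pipeline`.

    `options` maps option names exactly as documented (without the
    leading "--") to values; booleans are rendered as T or F.
    """
    command = ["crux", "pipeline"]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        command += ["--" + name, str(value)]
    command += list(spectra)        # one or more spectrum files
    command.append(peptide_source)  # FASTA file or tide-index directory
    return command


cmd = build_pipeline_command(
    ["demo.ms2"],                   # placeholder spectrum file
    "proteins.fasta",               # placeholder protein database
    {"output-dir": "results", "overwrite": True, "top-match": 1},
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where Crux is installed; passing option names through unchanged matters because the options above mix hyphenated and underscored spellings.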
