comet
Usage:
crux comet [options] <input spectra> <database_name>
Description:
This command searches a protein database with a set of spectra, assigning peptide sequences to the observed spectra. This search engine was developed by Jimmy Eng at the University of Washington Proteomics Resource.
Although its history goes back two decades, the Comet search engine was first made publicly available in August 2012 on SourceForge. Comet is multithreaded and supports multiple input and output formats.
"Comet: an open source tandem mass spectrometry sequence database search tool." Eng JK, Jahan TA, Hoopmann MR. Proteomics. 2012 Nov 12. doi: 10.1002/pmic.201200439
Input:
input spectra
– The name of one or more files from which to parse the spectra. Valid formats include mzXML, mzML, mz5, raw, ms2, and cms2. Files in mzML or mzXML format may be compressed with gzip. RAW files can be parsed only under Windows, and only if the appropriate libraries were included at compile time. Multiple files can be given on the command line (space delimited), before the name of the database.
database_name
– A full or relative path to the sequence database to search, in FASTA or PEFF format. Example databases include RefSeq or UniProt. The database can contain amino acid sequences or nucleic acid sequences. If the sequences are amino acid sequences, set the parameter "nucleotide_reading_frame = 0". If the sequences are nucleic acid sequences, you must instruct Comet to translate them to amino acid sequences by setting "nucleotide_reading_frame" to a value between 1 and 9.
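The argument order described above (options first, then one or more spectrum files, then the database last) can be sketched in Python as follows. This is a minimal illustration, not part of Crux: the helper name build_comet_command and the file names are hypothetical, and the command is only assembled and printed, not executed.

```python
import shlex


def build_comet_command(spectra_files, database, output_dir="crux-output", decoy_search=0):
    """Assemble an argv list for `crux comet` (hypothetical helper, nothing is run)."""
    if not spectra_files:
        raise ValueError("at least one spectrum file is required")
    argv = [
        "crux", "comet",
        "--output-dir", output_dir,
        "--decoy_search", str(decoy_search),
    ]
    # Spectrum files are space delimited and must come before the database name.
    argv.extend(spectra_files)
    argv.append(database)
    return argv


# Hypothetical input names, used only for illustration.
argv = build_comet_command(
    ["run1.mzML", "run2.mzML"], "uniprot_human.fasta", decoy_search=1
)
print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.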
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
comet.target.txt
– a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
comet.params.txt
– a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
comet.log.txt
– a log file containing a copy of all messages that were printed to standard error.
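Because comet.target.txt is tab-delimited, it can be read with a standard CSV reader. The sketch below assumes the default output folder and that the first line of the file is a header row. It makes no assumption about which columns are present; the helper name read_target_psms is hypothetical.

```python
import csv
from pathlib import Path


def read_target_psms(output_dir="crux-output"):
    """Return the target PSMs as a list of dicts keyed by the file's header row."""
    path = Path(output_dir) / "comet.target.txt"
    with path.open(newline="") as handle:
        # The file is tab-delimited; column names are taken from the first line.
        return list(csv.DictReader(handle, delimiter="\t"))
```

If --output-dir or --fileroot was set for the search, the path has to be adjusted accordingly.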
Options:
-
Database
--decoy_search <integer>
– 0=no, 1=concatenated search, 2=separate search. Default = 0.
--peff_format <integer>
– 0=normal FASTA format; 1=PEFF PSI-MOD modifications and amino acid variants; 2=PEFF Unimod modifications and amino acid variants; 3=PEFF PSI-MOD modifications, skipping amino acid variants; 4=PEFF Unimod modifications, skipping amino acid variants; 5=PEFF amino acid variants, skipping PEFF modifications. Default = 0.
--peff_obo <string>
– A full or relative path to the OBO file used with a PEFF search. Supported OBO formats are PSI-MOD and Unimod OBO files. Which OBO file you use depends on your PEFF input file. This parameter is ignored if "peff_format = 0". There is no default value if this parameter is missing. Default = <empty>.
-
CPU threads
--num_threads <integer>
– 0=poll the CPU to set the number of threads; otherwise, specify the number of threads directly. Default = 0.
-
Masses
--peptide_mass_tolerance <float>
– Controls the mass tolerance value. The tolerance is applied as +/- the specified number, i.e., an entered value of "1.0" applies a -1.0 to +1.0 tolerance. The units of the mass tolerance are controlled by the parameter "peptide_mass_units". Default = 3.
--auto_peptide_mass_tolerance false|warn|fail
– Automatically estimate the optimal value for the peptide_mass_tolerance parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
--peptide_mass_units <integer>
– 0=amu, 1=mmu, 2=ppm. Default = 0.
--mass_type_parent <integer>
– 0=average masses, 1=monoisotopic masses. Default = 1.
--mass_type_fragment <integer>
– 0=average masses, 1=monoisotopic masses. Default = 1.
--precursor_tolerance_type <integer>
– 0=singly charged peptide mass, 1=precursor m/z. Default = 0.
--isotope_error <integer>
– 0=off, 1=0/1 (C13 error), 2=0/1/2, 3=0/1/2/3, 4=-8/-4/0/4/8 (for +4/+8 labeling), 5=-1/0/1/2/3. Default = 0.
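As a rough illustration of how peptide_mass_tolerance and peptide_mass_units combine, the sketch below converts a tolerance to amu and returns the symmetric window around a mass. It is a simplified reading of the option descriptions, not Comet's implementation: it assumes the tolerance is applied directly to the given mass and ignores precursor_tolerance_type. The function names are hypothetical; the ISOTOPE_OFFSETS table simply restates the isotope_error settings listed above.

```python
# Isotope offsets searched for each isotope_error setting, as listed in the option text.
ISOTOPE_OFFSETS = {
    0: (0,),
    1: (0, 1),
    2: (0, 1, 2),
    3: (0, 1, 2, 3),
    4: (-8, -4, 0, 4, 8),
    5: (-1, 0, 1, 2, 3),
}


def tolerance_in_amu(mass, tolerance, units):
    """Convert a tolerance to amu; units follow peptide_mass_units (0=amu, 1=mmu, 2=ppm)."""
    if units == 0:
        return tolerance
    if units == 1:
        return tolerance / 1000.0
    if units == 2:
        return mass * tolerance / 1_000_000.0
    raise ValueError(f"unknown peptide_mass_units value: {units}")


def mass_window(mass, tolerance, units):
    """Return the (low, high) window of +/- tolerance around mass (assumed symmetric)."""
    delta = tolerance_in_amu(mass, tolerance, units)
    return mass - delta, mass + delta


print(mass_window(1000.0, 1.0, 0))   # tolerance given in amu
print(mass_window(1000.0, 20.0, 2))  # tolerance given in ppm
```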
-
Search enzyme
--search_enzyme_number <integer>
– Specify a search enzyme from the end of the parameter file. Default = 1.
--search_enzyme2_number <integer>
– Specify a second search enzyme from the end of the parameter file. Default = 0.
--num_enzyme_termini <integer>
– Valid values are 1 (semi-digested), 2 (fully digested), 8 (N-term), 9 (C-term). Default = 2.
--allowed_missed_cleavage <integer>
– Maximum value is 5; for enzyme search. Default = 2.
-
Fragment ions
--fragment_bin_tol <float>
– Binning to use on fragment ions. Default = 1.000507.
--fragment_bin_offset <float>
– Offset position to start the binning (0.0 to 1.0). Default = 0.4.
--auto_fragment_bin_tol false|warn|fail
– Automatically estimate the optimal value for the fragment_bin_tol parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
--theoretical_fragment_ions <integer>
– 0=default peak shape, 1=M peak only. Default = 1.
--use_A_ions <integer>
– Controls whether or not A-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_B_ions <integer>
– Controls whether or not B-ions are considered in the search (0 - no, 1 - yes). Default = 1.
--use_C_ions <integer>
– Controls whether or not C-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_X_ions <integer>
– Controls whether or not X-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_Y_ions <integer>
– Controls whether or not Y-ions are considered in the search (0 - no, 1 - yes). Default = 1.
--use_Z_ions <integer>
– Controls whether or not Z-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_Z1_ions <integer>
– Controls whether or not Z1-ions are considered in the search (0 - no, 1 - yes). Default = 0.
--use_NL_ions <integer>
– 0=no, 1=yes to consider NH3/H2O neutral loss peaks. Default = 1.
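To make the roles of fragment_bin_tol and fragment_bin_offset concrete, the sketch below maps a fragment m/z to an integer bin. The formula, bin = int(mz / bin_tol + (1 - bin_offset)), is an assumption about how a bin width and offset are commonly combined and is not taken from this page. The function name fragment_bin is hypothetical.

```python
def fragment_bin(mz, bin_tol=1.000507, bin_offset=0.4):
    """Map a fragment m/z to an integer bin index (assumed binning formula)."""
    if bin_tol <= 0:
        raise ValueError("fragment_bin_tol must be positive")
    if not 0.0 <= bin_offset <= 1.0:
        raise ValueError("fragment_bin_offset must be between 0.0 and 1.0")
    # The offset shifts where bin boundaries fall relative to multiples of bin_tol.
    return int(mz / bin_tol + (1.0 - bin_offset))


# With the default settings, 99.4 and 99.5 fall on opposite sides of a bin boundary.
for mz in (99.4, 99.5, 100.0):
    print(mz, fragment_bin(mz))
```

Under this formula a smaller fragment_bin_tol gives narrower bins, while fragment_bin_offset only moves the bin boundaries.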
-
mzXML/mzML parameters
--scan_range <string>
– Start and end scan range to search; 0 as first entry ignores parameter. Default = 0 0.
--precursor_charge <string>
– Precursor charge range to analyze; does not override mzXML charge; 0 as first entry ignores parameter. Default = 0 0.
--override_charge <integer>
– Specifies whether to override existing precursor charge state information, when present in the files, with the charge range specified by the "precursor_charge" parameter. Default = 0.
--ms_level <integer>
– MS level to analyze; valid levels are 2 and 3. Default = 2.
--activation_method ALL|CID|ECD|ETD+SA|ETD|PQD|HCD|IRMPD
– Specifies which scan types are searched. Default = ALL.
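The scan_range and precursor_charge values are both two-integer strings in which a leading 0 disables the parameter. A minimal sketch of that convention, using a hypothetical helper name parse_range, might look like this:

```python
def parse_range(value):
    """Parse a "<first> <last>" string; return None when the first entry is 0."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected two integers, got: {value!r}")
    first, last = int(parts[0]), int(parts[1])
    if first == 0:
        # A 0 as the first entry means the parameter is ignored.
        return None
    return first, last


print(parse_range("0 0"))        # parameter ignored
print(parse_range("1000 2000"))  # restrict to this range
```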
-
Miscellaneous parameters
--clip_nterm_methionine <integer>
– 0=leave sequences as-is; 1=also consider sequence w/o N-term methionine. Default = 0.
--decoy_prefix <string>
– Specifies the prefix of the protein names that indicates a decoy. Default = decoy_.
--digest_mass_range <string>
– MH+ peptide mass range to analyze. Default = 600.0 5000.0.
--equal_I_and_L <integer>
– Controls whether Comet treats isoleucine (I) and leucine (L) as equivalent with respect to a peptide identification. 0 treats I and L as different; 1 treats I and L as the same. Default = 1.
--mass_offsets <string>
– Specifies one or more mass offsets to apply. These values are effectively subtracted from each precursor mass such that peptides that are smaller than the precursor mass by the offset value can still be matched to the respective spectrum. Default = <empty>.
--max_duplicate_proteins <integer>
– Defines the maximum number of proteins (identifiers/accessions) to report. If a peptide is present in 6 total protein sequences, there is one (first) reference protein and 5 additional duplicate proteins. This parameter controls how many of those 5 additional duplicate proteins are reported. If "decoy_search = 2" is set to report separate target and decoy results, this parameter will be applied to the target and decoy outputs separately. If set to "-1", there will be no limit on the number of reported additional proteins. Default = 20.
--max_fragment_charge <integer>
– Set maximum fragment charge state to analyze (allowed max 5). Default = 3.
--max_index_runtime <integer>
– Sets the maximum indexed database search run time for a scan/query. Valid values are integers 0 or higher representing the maximum run time in milliseconds. As Comet loops through analyzing peptides from the database index file, it checks the cumulative run time of that spectrum search after each peptide is analyzed. If the run time exceeds the value set for this parameter, the search is aborted and the best peptide result analyzed up to that point is returned. To have no maximum search time, set this parameter value to "0". Default = 0.
--max_precursor_charge <integer>
– Set maximum precursor charge state to analyze (allowed max 9). Default = 6.
--num_results <integer>
– Number of search hits to store internally. Default = 50.
--nucleotide_reading_frame <integer>
– 0=proteinDB, 1-6, 7=forward three, 8=reverse three, 9=all six. Default = 0.
--output_suffix <string>
– Specifies the suffix string that is appended to the base output name for the pep.xml, pin.xml, txt and sqt output files. Default = <empty>.
--peptide_length_range <string>
– Defines the length range of peptides to search. This parameter has two integer values. The first value is the minimum length cutoff and the second value is the maximum length cutoff. Only peptides within the specified length range are analyzed. The maximum peptide length that Comet can analyze is 63. Default = 1 63.
--precursor_NL_ions <string>
– Controls whether or not precursor neutral loss peaks are considered in the xcorr scoring. If left blank, this parameter is ignored. To consider precursor neutral loss peaks, add one or more neutral loss mass values separated by spaces. Each entered mass value will be subtracted from the experimental precursor mass, and the resulting neutral loss m/z values for all charge states (from 1 to the precursor charge) will be analyzed. As these neutral loss peaks are analyzed alongside fragment ion peaks, the fragment tolerance settings (fragment_bin_tol, fragment_bin_offset, theoretical_fragment_ions) apply to the precursor neutral loss peaks. Default = <empty>.
--skip_researching <integer>
– For '.out' file output only, 0=search everything again, 1=don't search if .out exists. Default = 1.
--spectrum_batch_size <integer>
– Maximum number of spectra to search at a time; 0 to search the entire scan range in one loop. Default = 0.
--text_file_extension <string>
– Specifies a custom extension for the output text file. Default = <empty>.
--explicit_deltacn <integer>
– 0=Comet deltaCn reported between the top peptide and the first dissimilar peptide, 1=Comet deltaCn reported between the top two peptides. Default = 0.
--old_mods_encoding <integer>
– 0=Comet will use mass based modification encodings, 1=Comet will use the old character based modification encodings. Default = 0.
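The max_duplicate_proteins rule can be restated as a short sketch: the first protein is always the reference, and the limit applies only to the additional duplicates, with -1 meaning no limit. The function name and protein identifiers below are hypothetical, and the code illustrates the described behavior rather than Comet's internals.

```python
def reported_proteins(proteins, max_duplicate_proteins=20):
    """Return the reference protein plus at most max_duplicate_proteins duplicates."""
    if not proteins:
        return []
    reference, duplicates = proteins[0], proteins[1:]
    if max_duplicate_proteins >= 0:
        # The limit counts only the additional proteins, not the reference.
        duplicates = duplicates[:max_duplicate_proteins]
    return [reference, *duplicates]


# A peptide found in 6 proteins: one reference and 5 additional duplicates.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
print(reported_proteins(proteins, 2))   # reference plus two duplicates
print(reported_proteins(proteins, -1))  # no limit
```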
-
Spectral processing
--minimum_peaks <integer>
– Minimum number of peaks in spectrum to search. Default = 10.
--minimum_intensity <float>
– Minimum intensity value to read in. Default = 0.
--remove_precursor_peak <integer>
– 0=no, 1=yes, 2=all charge reduced precursor peaks (for ETD). Default = 0.
--remove_precursor_tolerance <float>
– +/- Da tolerance for precursor removal. Default = 1.5.
--clear_mz_range <string>
– For iTRAQ/TMT type data; will clear out all peaks in the specified m/z range. Default = 0.0 0.0.
-
Variable modifications
--variable_mod01 <string>
– Up to 9 variable modifications are supported. Each modification is specified using seven entries: "<mass> <residues> <type> <max> <distance> <terminus> <force>". Type is 0 for static mods and non-zero for variable mods. Note that if you set the same type value on multiple modification entries, Comet will treat those variable modifications as a binary set. This means that all modifiable residues in the binary set must be either unmodified or modified. Multiple binary sets can be specified by setting a different binary modification value. Max is an integer specifying the maximum number of modified residues possible in a peptide for this modification entry. Distance specifies the distance the modification is applied to from the respective terminus: -1 = no distance constraint; 0 = only applies to the terminal residue; N = only applies to the terminal residue through the next N residues. Terminus specifies which terminus the distance constraint is applied to: 0 = protein N-terminus; 1 = protein C-terminus; 2 = peptide N-terminus; 3 = peptide C-terminus. Force specifies whether peptides must contain this modification: 0 = not forced to be present; 1 = modification is required. Default = 0.0 null 0 4 -1 0 0.
--variable_mod02 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod03 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod04 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod05 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod06 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod07 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod08 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--variable_mod09 <string>
– See syntax for variable_mod01. Default = 0.0 null 0 4 -1 0 0.
--auto_modifications T|F
– Automatically infer modifications from the spectra themselves. Default = false.
--max_variable_mods_in_peptide <integer>
– Specifies the total/maximum number of residues that can be modified in a peptide. Default = 5.
--require_variable_mod <integer>
– Controls whether the analyzed peptides must contain at least one variable modification. Default = 0.
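The seven-entry variable_mod syntax can be unpacked as in the sketch below. The VariableMod class, its field names, and parse_variable_mod are hypothetical and follow the order given for variable_mod01; the oxidation example value (15.9949 on M) is an illustrative choice, not a default from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMod:
    mass: float          # modification mass
    residues: str        # residues the modification applies to
    mod_type: int        # the "type" entry (shared values form a binary set)
    max_per_peptide: int # maximum modified residues per peptide for this entry
    distance: int        # -1 = no distance constraint
    terminus: int        # 0/1 = protein N/C-terminus, 2/3 = peptide N/C-terminus
    force: int           # 1 = modification is required


def parse_variable_mod(value):
    """Split a "<mass> <residues> <type> <max> <distance> <terminus> <force>" string."""
    fields = value.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 entries, got {len(fields)}: {value!r}")
    mass, residues, *rest = fields
    mod_type, max_per_peptide, distance, terminus, force = (int(x) for x in rest)
    return VariableMod(float(mass), residues, mod_type, max_per_peptide,
                       distance, terminus, force)


print(parse_variable_mod("0.0 null 0 4 -1 0 0"))   # the documented default
print(parse_variable_mod("15.9949 M 0 3 -1 0 0"))  # illustrative example value
```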
-
Static modifications
--add_Cterm_peptide <float>
– Specify a static modification to the C-terminus of all peptides. Default = 0.
--add_Nterm_peptide <float>
– Specify a static modification to the N-terminus of all peptides. Default = 0.
--add_Cterm_protein <float>
– Specify a static modification to the C-terminal peptide of each protein. Default = 0.
--add_Nterm_protein <float>
– Specify a static modification to the N-terminal peptide of each protein. Default = 0.
--add_A_alanine <float>
– Specify a static modification to the residue A. Default = 0.
--add_C_cysteine <float>
– Specify a static modification to the residue C. Default = 57.021464.
--add_D_aspartic_acid <float>
– Specify a static modification to the residue D. Default = 0.
--add_E_glutamic_acid <float>
– Specify a static modification to the residue E. Default = 0.
--add_F_phenylalanine <float>
– Specify a static modification to the residue F. Default = 0.
--add_G_glycine <float>
– Specify a static modification to the residue G. Default = 0.
--add_H_histidine <float>
– Specify a static modification to the residue H. Default = 0.
--add_I_isoleucine <float>
– Specify a static modification to the residue I. Default = 0.
--add_K_lysine <float>
– Specify a static modification to the residue K. Default = 0.
--add_L_leucine <float>
– Specify a static modification to the residue L. Default = 0.
--add_M_methionine <float>
– Specify a static modification to the residue M. Default = 0.
--add_N_asparagine <float>
– Specify a static modification to the residue N. Default = 0.
--add_O_pyrrolysine <float>
– Specify a static modification to the residue O. Default = 0.
--add_P_proline <float>
– Specify a static modification to the residue P. Default = 0.
--add_Q_glutamine <float>
– Specify a static modification to the residue Q. Default = 0.
--add_R_arginine <float>
– Specify a static modification to the residue R. Default = 0.
--add_S_serine <float>
– Specify a static modification to the residue S. Default = 0.
--add_T_threonine <float>
– Specify a static modification to the residue T. Default = 0.
--add_U_selenocysteine <float>
– Specify a static modification to the residue U. Default = 0.
--add_V_valine <float>
– Specify a static modification to the residue V. Default = 0.
--add_W_tryptophan <float>
– Specify a static modification to the residue W. Default = 0.
--add_Y_tyrosine <float>
– Specify a static modification to the residue Y. Default = 0.
--add_B_user_amino_acid <float>
– Specify a static modification to the residue B. Default = 0.
--add_J_user_amino_acid <float>
– Specify a static modification to the residue J. Default = 0.
--add_X_user_amino_acid <float>
– Specify a static modification to the residue X. Default = 0.
--add_Z_user_amino_acid <float>
– Specify a static modification to the residue Z. Default = 0.
-
param-medic options
--pm-min-precursor-mz <float>
– Minimum precursor m/z value to use in measurement error estimation. Default = 400.
--pm-max-precursor-mz <float>
– Maximum precursor m/z value to use in measurement error estimation. Default = 1800.
--pm-min-frag-mz <float>
– Minimum fragment m/z value to use in measurement error estimation. Default = 150.
--pm-max-frag-mz <float>
– Maximum fragment m/z value to use in measurement error estimation. Default = 1800.
--pm-min-scan-frag-peaks <integer>
– Minimum fragment peaks an MS/MS scan must contain to be used in measurement error estimation. Default = 40.
--pm-max-precursor-delta-ppm <float>
– Maximum ppm distance between precursor m/z values to consider two scans potentially generated by the same peptide for measurement error estimation. Default = 50.
--pm-charges <string>
– Precursor charge states to consider MS/MS spectra from, in measurement error estimation, provided as comma-separated values. Default = 0,2,3,4.
--pm-top-n-frag-peaks <integer>
– Number of most-intense fragment peaks to consider for measurement error estimation, per MS/MS spectrum. Default = 30.
--pm-pair-top-n-frag-peaks <integer>
– Number of fragment peaks per spectrum pair to be used in fragment error estimation. Default = 5.
--pm-min-common-frag-peaks <integer>
– Number of the most-intense peaks that two spectra must share in order to potentially be generated by the same peptide, for measurement error estimation. Default = 20.
--pm-max-scan-separation <integer>
– Maximum number of scans two spectra can be separated by in order to be considered potentially generated by the same peptide, for measurement error estimation. Default = 1000.
--pm-min-peak-pairs <integer>
– Minimum number of peak pairs (for precursor or fragment) that must be successfully paired in order to attempt to estimate measurement error distribution. Default = 200.
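Two of the pairing criteria above, pm-max-precursor-delta-ppm and pm-max-scan-separation, can be sketched as a simple predicate. This is a simplified illustration under stated assumptions, not the param-medic implementation: the ppm distance is normalized by the first precursor m/z, the shared-peak test (pm-min-common-frag-peaks) is omitted, and the function name is hypothetical.

```python
def may_share_peptide(scan_a, mz_a, scan_b, mz_b,
                      max_delta_ppm=50.0, max_scan_separation=1000):
    """Check the precursor ppm distance and scan separation for a pair of spectra."""
    # Assumption: ppm distance is measured relative to the first precursor m/z.
    delta_ppm = abs(mz_a - mz_b) / mz_a * 1_000_000.0
    close_in_mz = delta_ppm <= max_delta_ppm
    close_in_time = abs(scan_a - scan_b) <= max_scan_separation
    return close_in_mz and close_in_time


print(may_share_peptide(1, 500.0, 10, 500.01))    # about 20 ppm apart, 9 scans apart
print(may_share_peptide(1, 500.0, 2000, 500.0))   # same m/z, but too many scans apart
```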
-
Input and output
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = <empty>.
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>.
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
--output_mzidentmlfile <integer>
– 0=no, 1=yes write mzIdentML file. Default = 0.
--output_pepxmlfile <integer>
– 0=no, 1=yes write pep.xml file. Default = 1.
--output_percolatorfile <integer>
– 0=no, 1=yes write percolator file. Default = 0.
--output_sqtfile <integer>
– 0=no, 1=yes write sqt file. Default = 0.
--output_sqtstream <integer>
– 0=no, 1=yes write sqt output to standard output. Default = 0.
--output_txtfile <integer>
– 0=no, 1=yes write tab-delimited text file. Default = 1.
--print_expect_score <integer>
– 0=no, 1=yes to replace Sp with expect in out & sqt. Default = 1.
--num_output_lines <integer>
– Number of peptide results to show. Default = 5.
--show_fragment_ions <integer>
– 0=no, 1=yes for out files only. Default = 0.
--sample_enzyme_number <integer>
– Sample enzyme, which may differ from the one applied to the search. Used to calculate NTT & NMC in pepXML output. Default = 1.