tide-index
Usage:
crux tide-index [options] <protein fasta file> <index name>
Description:
Tide is a tool for identifying peptides from tandem mass spectra. It is an independent reimplementation of the SEQUEST® algorithm, which assigns peptides to spectra by comparing the observed spectra to a catalog of theoretical spectra derived from a database of known proteins. Tide's primary advantage is its speed. Our published paper provides more detail on how Tide works. If you use Tide in your research, please cite:
Benjamin J. Diament and William Stafford Noble. "Faster SEQUEST Searching for Peptide Identification from Tandem Mass Spectra." Journal of Proteome Research. 10(9):3871-9, 2011.
The tide-index command performs an optional pre-processing step on the protein database, converting it to a binary format suitable for input to the tide-search command.
Tide considers only the standard set of 21 amino acids. Peptides containing alphanumeric characters that do not denote one of these amino acids (B, J, X, Z) are skipped. Non-alphanumeric characters are ignored completely.
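A minimal Python sketch of the residue filtering described above, assuming that non-alphanumeric characters are stripped from each protein sequence before its peptides are checked. The helper names clean_sequence and keep_peptide are hypothetical, only the four codes listed above are checked, and this is not Tide's actual implementation.

```python
# Hypothetical helpers illustrating the filtering rules; not Tide's code.
SKIPPED_CODES = frozenset("BJXZ")  # codes that cause a peptide to be skipped


def clean_sequence(raw: str) -> str:
    """Drop every non-alphanumeric character (e.g. '*', '-', whitespace)."""
    return "".join(ch for ch in raw if ch.isalnum())


def keep_peptide(peptide: str) -> bool:
    """Return False if the peptide contains any of B, J, X or Z."""
    return not any(ch in SKIPPED_CODES for ch in peptide)
```

For example, clean_sequence("PEP*TIDE-K") yields "PEPTIDEK", which keep_peptide accepts, while "PEPXIDEK" would be skipped.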
Input:
protein fasta file
– The name of the file in FASTA format from which to retrieve proteins.
index name
– The desired name of the binary index.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
index
– A binary index, using the name specified on the command line.
tide-index.params.txt
– A file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other Crux programs.
tide-index.log.txt
– A log file containing a copy of all messages that were printed to the screen during execution.
Options:
tide-index options
--memory-limit <integer>
– The maximum amount of memory (i.e., RAM), in GB, to be used by tide-index. Default = 4.
--auto-modifications-spectra <string>
– Specify the spectra file to be used for modification inference when the --auto-modifications option is enabled. Multiple files may be separated by commas. Default = <empty>.
Peptide properties
--clip-nterm-methionine T|F
– When set to T, for each protein that begins with methionine, tide-index will put two copies of the leading peptide into the index, one with and one without the N-terminal methionine. Default = false.
--isotopic-mass average|mono
– Specify the type of isotopic masses to use when calculating the peptide mass. Default = mono.
--max-length <integer>
– The maximum length of peptides to consider. Default = 50.
--max-mass <float>
– The maximum mass (in Da) of peptides to consider. Default = 7200.
--min-length <integer>
– The minimum length of peptides to consider. Default = 6.
--min-mass <float>
– The minimum mass (in Da) of peptides to consider. Default = 200.
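The peptide properties above can be illustrated with a short Python sketch. It assumes --isotopic-mass mono, standard monoisotopic residue masses, and unmodified peptides (the default C+57.02146 modification is not added); whether Tide applies the mass bounds before or after modifications is not covered here. All function names are hypothetical.

```python
from typing import List

# Standard monoisotopic residue masses (Da) for unmodified residues.
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.010565  # one H2O for the free peptide termini


def neutral_mass(peptide: str) -> float:
    """Unmodified monoisotopic neutral mass (the --isotopic-mass mono case)."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER_MONO


def passes_filters(peptide: str, min_length: int = 6, max_length: int = 50,
                   min_mass: float = 200.0, max_mass: float = 7200.0) -> bool:
    """Apply length and mass bounds, using the documented defaults."""
    if not min_length <= len(peptide) <= max_length:
        return False
    # Assumption: the mass bounds are compared against the unmodified mass.
    return min_mass <= neutral_mass(peptide) <= max_mass


def leading_peptide_variants(peptide: str,
                             clip_nterm_methionine: bool = False) -> List[str]:
    """Index entries for a protein's leading peptide (cf. --clip-nterm-methionine)."""
    variants = [peptide]
    if clip_nterm_methionine and peptide.startswith("M"):
        variants.append(peptide[1:])  # copy without the N-terminal methionine
    return variants
```

With these values, neutral_mass("PEPTIDE") is about 799.36 Da, and leading_peptide_variants("MPEPTIDEK", True) returns both "MPEPTIDEK" and "PEPTIDEK".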
Amino acid modifications
--cterm-peptide-mods-spec <string>
– Specify peptide C-terminal modifications. See --nterm-peptide-mods-spec for syntax. Default = <empty>.
--cterm-protein-mods-spec <string>
– Specifies C-terminal static and variable mass modifications on proteins. Mod specification syntax is the same as for peptide mods (see the --nterm-peptide-mods-spec option), but these mods are applied only to peptide C-termini that are also protein C-termini. If variable modifications are provided for both the peptide and the protein terminus, they will be applied one at a time. Default = <empty>.
--max-mods <integer>
– The maximum number of modifications that can be applied to a single peptide. Default = 255.
--min-mods <integer>
– The minimum number of modifications that can be applied to a single peptide. Default = 0.
--mod-precision <integer>
– Set the precision for modifications as written to .txt files. Default = 4.
--mods-spec <string>
– The general form of a modification specification has three components, as exemplified by 1STY+79.966331.
The three components are: [max_per_peptide]residues[+/-]mass_change
In the example, max_per_peptide is 1, residues are STY, and mass_change is +79.966331. To specify a static modification, omit the number preceding the amino acid; for example, C+57.02146 specifies a static modification of 57.02146 Da to cysteine. Note that Tide allows at most one modification per amino acid. Also, the default modification (C+57.02146) will be added to every mods-spec string unless an explicit C+0 is included. Default = C+57.02146.
--nterm-peptide-mods-spec <string>
– Specify peptide N-terminal modifications. Like --mods-spec, this specification has three components, but with a slightly different syntax. The max_per_peptide can be either "1", in which case it defines a variable terminal modification, or missing, in which case the modification is static. The residues field indicates which amino acids are subject to the modification, with the residue X corresponding to any amino acid. Finally, mass_change is defined as before. Default = <empty>.
--nterm-protein-mods-spec <string>
– Same as --cterm-protein-mods-spec, but for the protein N-terminus. Default = <empty>.
--auto-modifications T|F
– Automatically infer modifications from the spectra themselves. Default = false.
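The [max_per_peptide]residues[+/-]mass_change syntax of --mods-spec can be illustrated with a small Python parser. This is a sketch under stated assumptions: the comma separator between multiple specifications is assumed, the names ModSpec and parse_mods_spec are hypothetical, and the rule that adds the default C+57.02146 is not modeled. The same pattern also accepts terminal specifications that use X for any residue.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# [max_per_peptide]residues[+/-]mass_change, e.g. "1STY+79.966331" or "C+57.02146".
MOD_PATTERN = re.compile(r"^(\d*)([A-Z]+)([+-]\d+(?:\.\d+)?)$")


@dataclass
class ModSpec:
    max_per_peptide: Optional[int]  # None means a static modification
    residues: str                   # "X" stands for any residue in terminal specs
    mass_change: float

    @property
    def is_static(self) -> bool:
        return self.max_per_peptide is None


def parse_mods_spec(spec: str) -> List[ModSpec]:
    """Parse a modification string; assumes entries are separated by commas."""
    mods = []
    for item in (part.strip() for part in spec.split(",")):
        if not item:
            continue
        match = MOD_PATTERN.match(item)
        if match is None:
            raise ValueError(f"cannot parse modification: {item!r}")
        count, residues, mass = match.groups()
        mods.append(ModSpec(int(count) if count else None, residues, float(mass)))
    return mods
```

For the documented example, parse_mods_spec("1STY+79.966331") yields one variable modification with max_per_peptide 1 on residues STY, while "C+57.02146" parses as a static modification.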
Decoy database generation
--allow-dups T|F
– Control whether duplicate peptides are allowed between the target and decoy databases. When set to "F", the program keeps all target and previously generated decoy peptides in memory. A shuffled decoy will be re-shuffled multiple times to avoid duplication. If a non-duplicated peptide cannot be generated, the decoy is skipped entirely. When set to "T", every decoy is added to the database without checking for duplication; this setting reduces the memory requirements significantly. Default = false.
--decoy-format none|shuffle|peptide-reverse
– Include a decoy version of every peptide by shuffling or reversing the target sequence or protein. In shuffle or peptide-reverse mode, each peptide is shuffled or reversed, respectively, leaving the N-terminal and C-terminal amino acids in place by default (see --keep-terminal-aminos). Note that peptides that appear multiple times in the target database are only shuffled once. In peptide-reverse mode, palindromic peptides are shuffled. Also, if a shuffled peptide produces an overlap with the target or decoy database, then the peptide is re-shuffled up to 5 times. Note that, despite this repeated shuffling, homopolymers will appear in both the target and decoy database. Default = shuffle.
--keep-terminal-aminos N|C|NC|none
– When creating decoy peptides using decoy-format=shuffle or decoy-format=peptide-reverse, this option specifies whether the N-terminal and C-terminal amino acids are kept in place or allowed to be shuffled or reversed. For a target peptide "EAMPK" with decoy-format=peptide-reverse, setting keep-terminal-aminos to "NC" will yield "EPMAK"; setting it to "C" will yield "PMAEK"; setting it to "N" will yield "EKPMA"; and setting it to "none" will yield "KPMAE". Default = NC.
--num-decoys-per-target <integer>
– The number of decoys to generate per target. When set to a value n, tide-search run with concat=F will output one target and n decoys. The resulting files can be used to run the "average target-decoy competition" method in assign-confidence. This parameter only applies when decoy-format=shuffle and should always be used in combination with allow-dups=T. Default = 1.
--seed <string>
– When given an unsigned integer value, seeds the random number generator with that value. When given the string "time", seeds the random number generator with the system time. Default = 1.
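The following Python sketch illustrates how --keep-terminal-aminos, --decoy-format and --allow-dups interact, assuming the rules behave exactly as stated above. It reproduces the EAMPK examples for peptide-reverse; for shuffle it tries one initial shuffle plus up to 5 re-shuffles and returns None (the decoy is skipped) when every attempt collides with a forbidden peptide. Python's random module is not Tide's generator, so the shuffled decoys will not match Tide's output even with the same --seed. All function names are hypothetical.

```python
import random
from typing import Optional, Set, Tuple


def split_terminals(peptide: str, keep: str = "NC") -> Tuple[str, str, str]:
    """Split into (fixed N-term, movable middle, fixed C-term) per keep-terminal-aminos."""
    n_fixed = 1 if keep in ("N", "NC") else 0
    c_fixed = 1 if keep in ("C", "NC") else 0
    if len(peptide) <= n_fixed + c_fixed:
        return peptide, "", ""  # nothing left to rearrange
    end = len(peptide) - c_fixed
    return peptide[:n_fixed], peptide[n_fixed:end], peptide[end:]


def reverse_decoy(peptide: str, keep: str = "NC") -> str:
    """peptide-reverse: reverse only the movable residues."""
    head, middle, tail = split_terminals(peptide, keep)
    return head + middle[::-1] + tail


def shuffle_decoy(peptide: str, forbidden: Set[str], keep: str = "NC",
                  allow_dups: bool = False, rng: Optional[random.Random] = None,
                  max_reshuffles: int = 5) -> Optional[str]:
    """shuffle: retry on collision with target/decoy peptides unless allow_dups."""
    rng = rng if rng is not None else random.Random(1)  # mirrors the --seed default of 1
    head, middle, tail = split_terminals(peptide, keep)
    for _ in range(1 + max_reshuffles):  # first shuffle plus up to 5 re-shuffles
        residues = list(middle)
        rng.shuffle(residues)
        candidate = head + "".join(residues) + tail
        if allow_dups or candidate not in forbidden:
            return candidate
    return None  # no non-duplicated decoy found, so this decoy is skipped
```

For example, reverse_decoy("EAMPK", "C") returns "PMAEK", matching the example above, and a homopolymer such as "AAAAAAK" can never be shuffled away from its target, so shuffle_decoy returns None unless allow_dups is set.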
Enzymatic digestion
--custom-enzyme <string>
– Specify rules for in silico digestion of protein sequences. Overrides the --enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains residues required or prohibited before the cleavage site, and the second list contains residues required or prohibited after the cleavage site. If the residues are required for digestion, they are enclosed in square brackets, '[' and ']'. If the residues prevent digestion, they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D]. To prevent the sequences from being digested at all, use {X}|{X}. Default = <empty>.
--digestion full-digest|partial-digest|non-specific-digest
– Specify whether every peptide in the database must have two enzymatic termini (full-digest) or if peptides with only one enzymatic terminus are also included (partial-digest). Default = full-digest.
--enzyme no-enzyme|trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|lysarginase|custom-enzyme
– Specify the enzyme used to digest the proteins in silico. Available enzymes (with the corresponding digestion rules indicated in parentheses) include no-enzyme ([X]|[X]), trypsin ([RK]|{P}), trypsin/p ([RK]|[]), chymotrypsin ([FWYL]|{P}), elastase ([ALIV]|{P}), clostripain ([R]|[]), cyanogen-bromide ([M]|[]), iodosobenzoate ([W]|[]), proline-endopeptidase ([P]|[]), staph-protease ([E]|[]), asp-n ([]|[D]), lys-c ([K]|{P}), lys-n ([]|[K]), arg-c ([R]|{P}), glu-c ([DE]|{P}), pepsin-a ([FL]|{P}), elastase-trypsin-chymotrypsin ([ALIVKRWFY]|{P}), lysarginase ([]|[KR]). Specifying --enzyme no-enzyme yields a non-enzymatic digest. Warning: the resulting index may be quite large. Default = trypsin.
--missed-cleavages <integer>
– Maximum number of missed cleavages per peptide to allow in enzymatic digestion. Default = 0.
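A Python sketch of full-digest in silico digestion driven by the bracket/brace rule syntax of --custom-enzyme, with --missed-cleavages and the length bounds. Treating an empty list (the [] in rules such as trypsin/p and asp-n) as "no constraint on this side" is an interpretation of the enzyme table above, not documented behavior. partial-digest and non-specific-digest are not covered, and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Predicate = Callable[[str], bool]


def parse_side(text: str) -> Predicate:
    """Turn '[RK]' (required) or '{P}' (prohibited) into a residue test."""
    if len(text) < 2 or (text[0], text[-1]) not in {("[", "]"), ("{", "}")}:
        raise ValueError(f"bad rule side: {text!r}")
    residues = set(text[1:-1])
    required = text[0] == "["
    if not residues:
        return lambda aa: True  # assumption: an empty list places no constraint
    if "X" in residues:
        return lambda aa: required  # [X]: any residue allowed; {X}: none allowed
    if required:
        return lambda aa: aa in residues
    return lambda aa: aa not in residues


def parse_enzyme_rule(rule: str) -> Tuple[Predicate, Predicate]:
    before, after = rule.split("|")
    return parse_side(before.strip()), parse_side(after.strip())


def cleavage_sites(protein: str, rule: str) -> List[int]:
    """Indices i where the protein is cut between protein[i-1] and protein[i]."""
    before_ok, after_ok = parse_enzyme_rule(rule)
    return [i for i in range(1, len(protein))
            if before_ok(protein[i - 1]) and after_ok(protein[i])]


def digest(protein: str, rule: str = "[RK]|{P}", missed_cleavages: int = 0,
           min_length: int = 6, max_length: int = 50) -> List[str]:
    """Full-digest peptides spanning at most missed_cleavages internal sites."""
    bounds = [0] + cleavage_sites(protein, rule) + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptide = protein[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides
```

For example, digest("GASKPLLRTTK", min_length=1) returns ["GASKPLLR", "TTK"]: the protein is cut after R but not after the K that precedes P.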
Input and output
--decoy-prefix <string>
– Specifies the prefix of the protein names that indicate a decoy. Default = decoy_.
--mass-precision <integer>
– Set the precision for masses and m/z written to sqt and text files. Default = 4.
--output-dir <string>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true or fail when trying to overwrite a file if false. Default = false.
--parameter-file <string>
– A file containing parameters. See the parameter documentation page for details. Default = <empty>.
--peptide-list T|F
– Create in the output directory a text file listing all the peptides in the database, along with their corresponding decoy peptides, neutral masses and proteins, one per line. Default = false.
--temp-dir <string>
– The name of the directory where temporary files will be created. If this parameter is blank, then the system temporary directory will be used. Default = <empty>.
--verbosity <integer>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
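The cumulative verbosity levels listed above can be sketched in Python: a message category is printed when its level is at or below the --verbosity setting. The dictionary and the function name categories_printed are hypothetical and only restate the documented table.

```python
from typing import List

# Documented --verbosity levels (hypothetical lookup table).
VERBOSITY_LEVELS = {
    0: "fatal errors",
    10: "non-fatal errors",
    20: "warnings",
    30: "information on the progress of execution",
    40: "more progress information",
    50: "debug info",
    60: "detailed debug info",
}


def categories_printed(verbosity: int = 30) -> List[str]:
    """Categories shown at a given setting: every level <= verbosity."""
    return [name for level, name in sorted(VERBOSITY_LEVELS.items())
            if level <= verbosity]
```

With --verbosity 20, for instance, only fatal errors, non-fatal errors and warnings are printed.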