Customization and search options
Crux gives the user the flexibility to change many of the search and analysis parameters. Numerous options control attributes such as the output format and which peptides are selected from the protein database. This page starts with some general information about options and then describes the use of some key Crux options.
Introduction to options
A crux command is made up of four parts: executable name, command, options, and required arguments. Let's use a crux tide-search command as an example. Here is the general form.
$ crux tide-search [options] <mass spectra> <peptide index>
In this case, the command is tide-search. This is followed by zero or more optional arguments. Finally, the required arguments, listed above inside angle brackets, include the name of the file containing the spectra to be identified and the name of a peptide index produced previously by the tide-index command.
All of the available options are described for each command on the documentation pages. You can also get a list of available options by running a command with no arguments. For example, the command
$ crux tide-search
will produce output that looks like this:
bash-3.2$ ~/proj/crux/trunk/src/c/crux tide-search
FATAL: Expected at least 2 arguments, but found 0
USAGE:
  crux tide-search [options] <tide spectra file> <tide database index>
REQUIRED ARGUMENTS:
  <tide spectra file> The name of the file from which to parse the fragmentation spectra, in any of the file formats supported by ProteoWizard. Alternatively, the argument may be a binary spectrum file produced by a previous run of crux tide-search using the store-spectra parameter. Multiple files can be included on the command line (space delimited), prior to the name of the database.
  <tide database> Either a FASTA file or a directory containing a database index created by a previous run of crux tide-index.
OPTIONAL ARGUMENTS:
  [--auto-mz-bin-width false|warn|fail] Automatically estimate optimal value for the mz-bin-width parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
  [--auto-precursor-window false|warn|fail] Automatically estimate optimal value for the precursor-window parameter from the spectra themselves. false=no estimation, warn=try to estimate but use the default value in case of failure, fail=try to estimate and quit in case of failure. Default = false.
  [--compute-sp T|F] Compute the preliminary score Sp for all candidate peptides. Report this score in the output, along with the corresponding rank, the number of matched ions and the total number of ions. This option is recommended if results are to be analyzed by Percolator. If sqt-output is enabled, then compute-sp is automatically enabled and cannot be overridden. Note that the Sp computation requires re-processing each observed spectrum, so turning on this switch involves significant computational overhead. Default = false.
  [--concat T|F] When set to T, target and decoy search results are reported in a single file, and only the top-scoring N matches (as specified via --top-match) are reported for each spectrum, irrespective of whether the matches involve target or decoy peptides. Default = false.
  [--deisotope] Perform a simple deisotoping operation across each MS2 spectrum. For each peak in an MS2 spectrum, consider lower m/z peaks. If the current peak occurs where an expected peak would lie for any charge state less than the charge state of the precursor, within mass tolerance, and if the current peak is of lower abundance, then the peak is removed. The value of this parameter is the mass tolerance, in units of parts-per-million. If set to 0, no deisotoping is performed. Default = 0.
  ...
The first lines of the output tell you that the required arguments are missing and remind you what they are. The remaining lines list all the options (only five of which are shown above). Crux options all begin with two dashes followed by the option name. The name is followed by a space and the appropriate argument. For example, this command increases the verbosity to 40:
$ crux tide-search --verbosity 40 demo.ms2 small-yeast.fasta
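To make the four-part structure concrete, here is a minimal Python sketch that splits the example command above into executable, command, options, and required arguments. It is an illustration only, not how Crux itself parses its arguments. The helper name split_crux_command is hypothetical, and the sketch assumes, as described above, that every option is written as two dashes plus a name followed by exactly one value.

```python
import shlex


def split_crux_command(line: str):
    """Split a crux command line into its four parts.

    Hypothetical helper: assumes every option is "--name value",
    i.e. each option consumes exactly one following token.
    """
    tokens = shlex.split(line)
    executable, command, rest = tokens[0], tokens[1], tokens[2:]
    options = {}
    required = []
    i = 0
    while i < len(rest):
        token = rest[i]
        if token.startswith("--"):
            # Option: strip the two dashes and take the next token as its value.
            if i + 1 >= len(rest):
                raise ValueError(f"option {token} is missing its value")
            options[token[2:]] = rest[i + 1]
            i += 2
        else:
            # Anything else is treated as a required (positional) argument.
            required.append(token)
            i += 1
    return executable, command, options, required


exe, cmd, opts, args = split_crux_command(
    "crux tide-search --verbosity 40 demo.ms2 small-yeast.fasta"
)
print(exe, cmd, opts, args)
# crux tide-search {'verbosity': '40'} ['demo.ms2', 'small-yeast.fasta']
```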
Specifying options via a parameter file
One of the options listed, --parameter-file, is available for all Crux commands. The parameter file allows multiple options to be specified in a file. All of the command line options can be put in a parameter file, but the format is slightly different. In the parameter file, the two leading dashes are removed from the option name, and the option name and value must be separated by an equal sign instead of a space:
<option name>=<argument>
The above example, in which we changed the verbosity, would look like this in a parameter file:
verbosity=40
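Because the translation from command-line form to parameter-file form is mechanical (drop the two leading dashes, join name and value with an equal sign), it can be sketched in a few lines of Python. The helper name to_parameter_lines is hypothetical, and it assumes its input contains only options, each followed by exactly one value.

```python
def to_parameter_lines(argv):
    """Convert "--name value" pairs into "name=value" parameter-file lines.

    Hypothetical helper: assumes argv holds only options, each followed
    by exactly one value.
    """
    if len(argv) % 2 != 0:
        raise ValueError("every option needs exactly one value")
    lines = []
    # Walk the list in (flag, value) pairs.
    for flag, value in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected an option, got {flag!r}")
        lines.append(f"{flag[2:]}={value}")
    return lines


print("\n".join(to_parameter_lines(["--verbosity", "40", "--overwrite", "T"])))
# verbosity=40
# overwrite=T
```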
The parameter file allows only one option per line. Lines beginning with "#" are treated as comments and ignored. A sample parameter file can be found here. Command-line and parameter-file options may be used separately or together. If an option is specified in both places, then the value on the command line is used.
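The rules above (one name=value pair per line, "#" comments, command line overrides the file) can be illustrated with a small Python sketch. This is not Crux's own parser: the helper names parse_parameter_file and merge_options are hypothetical, and tolerating whitespace around names and values is an assumption made for convenience.

```python
def parse_parameter_file(text: str) -> dict:
    """Parse "name=value" lines; skip blank lines and "#" comments.

    Hypothetical helper, not Crux's own parser. Assumes one option per
    line, splits on the first "=" only, and (as an assumption) strips
    surrounding whitespace.
    """
    params = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number}: expected name=value, got {raw!r}")
        params[name.strip()] = value.strip()
    return params


def merge_options(file_params: dict, command_line: dict) -> dict:
    """Combine both sources; command-line values win on conflicts."""
    merged = dict(file_params)
    merged.update(command_line)
    return merged


params_text = """
# lines starting with '#' are comments
verbosity=40
overwrite=T
"""
settings = merge_options(parse_parameter_file(params_text), {"verbosity": "50"})
print(settings)  # {'verbosity': '50', 'overwrite': 'T'}
```

Here verbosity is set to 40 in the file but to 50 on the (simulated) command line, so the command-line value wins.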
During execution of any Crux command, a parameter file containing the name and value of all the options for the current operation is automatically saved in the output directory. Note that not all parameters in the file may have been used in the operation. The parameter file is named <command>.params.txt, where <command> is the name of the command that was executed.
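The naming rule can be sketched in Python as follows. The helper save_parameters is hypothetical, and writing each option as a name=value line is an assumption based on the parameter-file format described earlier; the file Crux itself writes may be laid out differently.

```python
import os
import tempfile


def save_parameters(output_dir: str, command: str, options: dict) -> str:
    """Write each option as "name=value" to <output_dir>/<command>.params.txt.

    Hypothetical helper illustrating the naming rule described above;
    the layout of the file Crux itself writes may differ.
    """
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"{command}.params.txt")
    with open(path, "w") as handle:
        for name, value in options.items():
            handle.write(f"{name}={value}\n")
    return path


# Demonstrate in a throwaway directory standing in for crux-output.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_parameters(os.path.join(tmp, "crux-output"), "tide-search",
                            {"verbosity": "40", "overwrite": "F"})
    saved_name = os.path.basename(saved)
    with open(saved) as handle:
        saved_text = handle.read()

print(saved_name)  # tide-search.params.txt
print(saved_text, end="")
```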
Shared options
In addition to --parameter-file, Crux includes several other options that are shared across all, or nearly all, Crux commands.
- --output-dir <filename> – The name of the directory where output files will be created. Default = crux-output.
- --fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = none.
- --overwrite <T|F> – By default, if Crux detects that the output file it is about to produce already exists, then Crux will exit with an error. This option allows Crux to overwrite existing files.
- --verbosity <0-100> – Specify the verbosity of the current command. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
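The following Python sketch shows one way the shared options could be applied: a message is printed when its level is at or below the --verbosity setting, and an output path is built from --output-dir and --fileroot, refusing to replace an existing file unless --overwrite is set. The helper names should_print and output_path are hypothetical, and joining the fileroot and file name with a "." is an assumption made for illustration, not a documented Crux rule.

```python
import os

# Verbosity thresholds as listed above.
VERBOSITY_LEVELS = {
    0: "fatal errors",
    10: "non-fatal errors",
    20: "warnings",
    30: "progress information",
    40: "more progress information",
    50: "debug info",
    60: "detailed debug info",
}


def should_print(message_level: int, verbosity: int = 30) -> bool:
    """A message is shown when its level is at or below --verbosity."""
    return message_level <= verbosity


def output_path(name: str, output_dir: str = "crux-output",
                fileroot: str = "", overwrite: bool = False) -> str:
    """Build an output path; refuse to clobber unless overwrite is set.

    Hypothetical helper. Joining fileroot and name with "." is an
    assumption made for illustration.
    """
    filename = f"{fileroot}.{name}" if fileroot else name
    path = os.path.join(output_dir, filename)
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; rerun with --overwrite T")
    return path


# At the default verbosity of 30, everything up to progress information is shown.
shown = [label for level, label in VERBOSITY_LEVELS.items() if should_print(level)]
print(shown[-1])  # progress information
```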
In addition, many Crux commands include various options of the form --<format>-output. These options take Boolean arguments (specified as "T" or "F") and indicate whether output files in the specified format should be produced. For example, in addition to tab-delimited text format, tide-search can produce output in PepXML, MZid, SQT and PINxml formats.
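Since these switches all follow the --<format>-output pattern, a list of them can be generated programmatically, for example when building an argument list for a script. The helper format_output_flags below is hypothetical; check each command's documentation for which format names it actually accepts.

```python
def format_output_flags(formats: dict) -> list:
    """Turn {"pepxml": True, ...} into ["--pepxml-output", "T", ...].

    Hypothetical helper built on the --<format>-output pattern described
    above. Check each command's documentation for the format names it
    actually accepts.
    """
    flags = []
    for fmt, enabled in formats.items():
        flags += [f"--{fmt}-output", "T" if enabled else "F"]
    return flags


argv = ["crux", "tide-search",
        *format_output_flags({"pepxml": True, "sqt": False}),
        "demo.ms2", "small-yeast.fasta"]
print(" ".join(argv))
# crux tide-search --pepxml-output T --sqt-output F demo.ms2 small-yeast.fasta
```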
Changing the indexing and searching parameters
Various options to tide-index control how the proteins in the database are converted to peptides. These options fall into several categories, allowing specification of peptide properties such as minimum and maximum length, enzymatic digestion rules, decoy database generation, and various static and variable modifications. These options are fully documented here. Similarly, the tide-search documentation describes options for selecting which spectra to score, the rules for selecting candidate peptides for a given spectrum, and which kinds of scores to report.