Customization and search options

Crux lets you change many of its search and analysis parameters. Numerous options control attributes such as the output format and which peptides are selected from the protein database. This page starts with some general information about options and then describes the use of some key Crux options.

Introduction to options

A crux command is made up of four parts: executable name, command, options, and required arguments. Let's use a crux tide-search command as an example. Here is the general form.

$ crux tide-search [options] <mass spectra> <peptide index>

In this case, the executable is crux and the command is tide-search. The command is followed by zero or more options. Finally, the required arguments, shown above inside angle brackets, are the name of the file containing the spectra to be identified and the name of a peptide index produced previously by the tide-index command.

All of the available options are described for each command on the documentation pages. You can also get a list of available options by running a command with no arguments. For example, the command

$ crux tide-search

will produce output that looks like this:

bash-3.2$ ~/proj/crux/trunk/src/c/crux tide-search
FATAL: Expected at least 2 arguments, but found 0

USAGE:

   crux tide-search [options] <tide spectra file> <tide database index> 

REQUIRED ARGUMENTS:

  <tide spectra file> The name of the file from which to parse the
  fragmentation spectra, in any of the file formats supported by ProteoWizard.
  Alternatively, the argument may be a binary spectrum file produced
  by a previous run of crux tide-search using the store-spectra parameter.
  Multiple files can be included on the command line (space delimited), prior to
  the name of the database.
  <tide database> Either a FASTA file or a directory containing a database index
  created by a previous run of crux tide-index.

OPTIONAL ARGUMENTS:

  [--auto-mz-bin-width false|warn|fail]
     Automatically estimate optimal value for the mz-bin-width parameter from
     the spectra themselves. false=no estimation, warn=try to estimate but use
     the default value in case of failure, fail=try to estimate and quit in case
     of failure. Default = false.
  [--auto-precursor-window false|warn|fail]
     Automatically estimate optimal value for the precursor-window parameter
     from the spectra themselves. false=no estimation, warn=try to estimate but
     use the default value in case of failure, fail=try to estimate and quit in
     case of failure. Default = false.
  [--compute-sp T|F]
     Compute the preliminary score Sp for all candidate peptides. Report this
     score in the output, along with the corresponding rank, the number of
     matched ions and the total number of ions. This option is recommended if
     results are to be analyzed by Percolator. If sqt-output is enabled, then
     compute-sp is automatically enabled and cannot be overridden. Note that the
     Sp computation requires re-processing each observed spectrum, so turning on
     this switch involves significant computational overhead. Default = false.
  [--concat T|F]
     When set to T, target and decoy search results are reported in a single
     file, and only the top-scoring N matches (as specified via --top-match) are
     reported for each spectrum, irrespective of whether the matches involve
     target or decoy peptides. Default = false.
  [--deisotope ]
     Perform a simple deisotoping operation across each MS2 spectrum. For each
     peak in an MS2 spectrum, consider lower m/z peaks. If the current peak
     occurs where an expected peak would lie for any charge state less than the
     charge state of the precursor, within mass tolerance, and if the current
     peak is of lower abundance, then the peak is removed.  The value of this
     parameter is the mass tolerance, in units of parts-per-million.  If set to
     0, no deisotoping is performed. Default = 0.
...

The first few lines tell you that the required arguments are missing and remind you what they are. The following lines list all the options (only five of which are shown above). Every Crux option begins with two dashes followed by the option name. The name is followed by a space and the appropriate argument. This example increases the verbosity to 40:

$ crux tide-search --verbosity 40 demo.ms2 small-yeast.fasta

Specifying options via a parameter file

One of the options listed, --parameter-file, is available for all Crux commands. A parameter file lets you specify multiple options in a single file. Any command line option can be put in a parameter file, but the format is slightly different. In the parameter file, the two leading dashes are removed from the option name, and the option name and value must be separated by an equals sign instead of a space:

<option name>=<argument>

The above example, in which we changed the verbosity, would look like this in a parameter file:

verbosity=40
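
The rule above (drop the two leading dashes, join name and value with an equals sign) can be sketched as a small shell helper. This is only an illustration: opts_to_params is a hypothetical function, not part of Crux, and it assumes every option is followed by exactly one value.

```shell
# Hypothetical helper (not part of Crux): convert "--name value" pairs
# into "name=value" lines suitable for a parameter file.
opts_to_params() {
  while [ "$#" -ge 2 ]; do
    name="${1#--}"                 # drop the two leading dashes
    printf '%s=%s\n' "$name" "$2"  # one option per line
    shift 2
  done
}

# Prints one line per option: verbosity=40, then concat=T.
opts_to_params --verbosity 40 --concat T
```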

The parameter file allows only one option per line. Lines beginning with "#" are considered comments and are ignored. A sample parameter file can be found here. Command line and parameter file options may be used separately or together. If an option is specified in both places, then the value on the command line will be used.
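
As a minimal sketch of using both sources together, the script below writes a small parameter file and then passes it to tide-search while also setting --verbosity on the command line. The file name my-search.params is hypothetical, the option values are illustrative only, and the input files are the ones from the earlier example. Because the command line takes precedence, this run would use verbosity 30 rather than the 40 given in the file.

```shell
# Write a small parameter file (the name my-search.params is hypothetical).
cat > my-search.params <<'EOF'
# Lines beginning with "#" are comments and are ignored.
verbosity=40
compute-sp=T
concat=T
EOF

# The command line value (30) overrides the parameter file value (40).
# The guard only skips the run when crux is not installed.
if command -v crux >/dev/null 2>&1; then
  crux tide-search --parameter-file my-search.params --verbosity 30 \
    demo.ms2 small-yeast.fasta
else
  echo "crux not found on PATH; skipping the search"
fi
```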

During execution of any Crux command, a parameter file containing the name and value of every option for the current operation is automatically saved in the output directory. Note that not all parameters in the file may have been used in the operation. The parameter file is named <command>.params.txt, where <command> is the name of the command that was executed.
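
One way to review the saved settings after a run is to print only the option lines of that file. The sketch below assumes the output directory is named crux-output and that the command was tide-search; adjust both to match your run. show_params is a hypothetical helper, not a Crux command.

```shell
# Hypothetical helper: print the option=value lines of a parameter file,
# skipping comment lines and blank lines.
show_params() {
  grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}

# Assumed location: output directory "crux-output", command "tide-search".
params_file="crux-output/tide-search.params.txt"
if [ -f "$params_file" ]; then
  show_params "$params_file"
fi
```

Redirecting the output of show_params to a new file gives a compact parameter file that can be passed back to a later run with --parameter-file.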

Shared options

In addition to --parameter-file, Crux includes several other options that are shared across all, or nearly all, Crux commands.

Changing the indexing and searching parameters

Various options to tide-index control how the proteins in the database are converted to peptides. These options fall into several categories, covering peptide properties such as minimum and maximum length, enzymatic digestion rules, decoy database generation, and the specification of static and variable modifications. These options are fully documented here. Similarly, the tide-search documentation describes options for selecting which spectra to score, for choosing candidate peptides for a given spectrum, and for deciding what kinds of scores to report.