Release notes for Crux
Version 3.2 (May 20, 2018)
- Added the residue-evidence (res-ev), residue-evidence p-value (res-ev p-value), and combined p-value score functions to tide-index and cascade-search, and modified assign-confidence and percolator to handle these score functions. More details are available in "Combining high resolution and exact calibration to boost statistical power: A well-calibrated score function for high-resolution MS2 data".
- Implemented the "average target-decoy competition" protocol in assign-confidence. More details are available in "Progressive calibration and averaging for tandem mass spectrometry statistical confidence estimation: Why settle for a single decoy?".
- Added the localize-modification command to find the most likely locations of post-translational modifications on peptides from a set of peptide-spectrum matches.
- Improved handling of modifications in tide-index, allowing for indices containing many different types of modifications.
- Added 'equal-I-and-L' option to tide-index and spectral-counts commands. This directs the commands to treat I and L as equivalent amino acids.
- Percolator and make-pin can now accept multiple files on the command line.
- Updated Comet to version 2018.01 rev. 0.
- Improved I/O speed for the mzIdentML format.
- Crux distributions are now identified using an abbreviated unique string from the version control system.
- Added support for mzML to qranker and barista.
- Modified tide-search to report peak counts separately for each thread.
- Added "deisotope" option to tide-search.
- Added "max-charge-feature" option to make-pin so that the user can control how many charge features are created.
- Improved tide-index so that it automatically recognizes when too many temporary files would be required and, in that case, indexes modified peptides using an alternate method.
- Added the "mod-precision" option to control the number of digits used to represent post-translational modifications in various output files.
- Modified cascade-search so that the q-value threshold is not applied to the final database in the cascade.
- Tide-index will now clip methionines at the beginning of proteins that do not have an internal cleavage site.
- Enabled search-for-xlinks to search multiple spectra files, in any of the file formats supported by Proteowizard. Support for the
- Added the print-progress, concat, and mono-link options to search-for-xlinks and simplified the mod option.
- Changed the tide-search precursor-window option upper bound from 100 to 1,000,000,000 to allow open modification searching.
- Added PepXML support for mod_nterm_mass and mod_cterm_mass on modification_info.
- Removed the feature-file-in option to Percolator and added
- Improved handling of the isotopic-mass parameter in search-for-xlinks.
- Allowed amino acids J (leucine or isoleucine), O (pyrrolysine), and U (selenocysteine) with the
- Added target/decoy column to tide-search and search-for-xlinks output files.
- Fixed mstoolkit to allow building with more recent versions of GCC.
- Fixed bug in tide-search that used total candidate count in output, rather than target candidate count.
- Fixed the percolator picked-protein option.
- Fixed error in handling of the custom-enzyme option.
- Fixed subtract-index so that it performs the subtraction only on the target peptides, and then removes for each subtracted target its corresponding decoy. Previously, the command did a simple subtraction separately on the targets and on the decoys.
- Fixed error that caused subtract-index to ignore the mass-precision option.
- Fixed error in search-for-xlinks that allowed too many missed cleavages.
- Fixed error in search-for-xlinks related to average mass.
- Fixed an error in the processing of the peptide-list option to subtract-index.
- Fixed error in search-for-xlinks that attempted to use reversed decoys. search-for-xlinks only supports shuffled decoys.
- Numerous minor bug fixes and improvements to the documentation.
- Added 'How can I search my isotopically labeled data with Tide or Comet' to the FAQ.
- Added 'How do Tide and Comet handle combinations of static and variable modifications, as well as n-terminal modifications?' to the FAQ.
- Updated tutorial for test-fdr in percolator.
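The equal-I-and-L option exists because isoleucine and leucine are isomers with identical residue masses, so their fragment spectra cannot be told apart by mass alone. The Python sketch below illustrates one way such an equivalence can be applied when grouping peptides, assuming a plain rewrite of every I to L; the helper names are hypothetical and this is not Crux's implementation.

```python
from collections import defaultdict


def normalize_il(peptide: str) -> str:
    """Map every I to L so sequences differing only at I/L positions compare equal."""
    return peptide.replace("I", "L")


def group_il_equivalent(peptides):
    """Group sequences that become identical once I and L are treated as one residue."""
    groups = defaultdict(list)
    for pep in peptides:
        groups[normalize_il(pep)].append(pep)
    return dict(groups)


if __name__ == "__main__":
    for key, members in group_il_equivalent(["PEPTIDE", "PEPTLDE", "ELVIS"]).items():
        print(key, members)
```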
Version 3.1 (January 10, 2017)
- Updated Crux to always use the latest version of Percolator on GitHub, rather than the most recent tagged release.
- Added the Param-medic command to perform automated inference of precursor and fragment m/z parameters. Param-medic options are also available for Comet and Tide.
- Added the isotope-error parameter to Tide.
- Added support for several new options to Percolator (search-input, tdc, subset-max-train).
- Allow assign-confidence to rank by Percolator score.
- Added +1 to the numerator of the FDR estimation when carrying out target-decoy competition. This correction was proposed by Barber and Candes, Annals of Statistics, 2015, as well as by He et al., arXiv:1501.00537, 2015.
- Fixed incorrect default value for the cpos parameter to percolator.
- Many other minor bug fixes and enhancements.
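The +1 correction described in the 3.1 notes changes the target-decoy competition FDR estimate from decoys / targets to (decoys + 1) / targets, counted among competition winners at or above a score threshold. A minimal Python sketch, assuming exactly one target score and one decoy score per spectrum; the function names are hypothetical, and ties are resolved in favor of the target for simplicity, whereas Crux 3.0 and later breaks such ties randomly.

```python
def compete(target_scores, decoy_scores):
    """Target-decoy competition: per spectrum, keep only the better of the target
    and decoy scores. Returns (score, is_decoy) pairs.
    Simplification: a tie goes to the target here."""
    winners = []
    for t, d in zip(target_scores, decoy_scores):
        winners.append((d, True) if d > t else (t, False))
    return winners


def estimated_fdr(winners, threshold, plus_one=True):
    """FDR estimate among winners scoring >= threshold:
    (decoys + 1) / targets, or decoys / targets when plus_one is False; capped at 1."""
    targets = sum(1 for s, is_decoy in winners if s >= threshold and not is_decoy)
    decoys = sum(1 for s, is_decoy in winners if s >= threshold and is_decoy)
    numerator = decoys + (1 if plus_one else 0)
    return min(1.0, numerator / max(targets, 1))


if __name__ == "__main__":
    winners = compete([9, 8, 7, 6, 5], [1, 2, 3, 7, 4])
    print(estimated_fdr(winners, 5))
    print(estimated_fdr(winners, 5, plus_one=False))
```

With four targets and one decoy above the threshold, the corrected estimate is 2/4 rather than 1/4, which shows how conservative the +1 term is when few PSMs pass.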
Version 3.0 (August 1, 2016)
- The tide-search command now supports threading. See the num-threads parameter for details.
- Added the pipeline command to run a series of commands, and the cascade-search command to run searches across a series of databases.
- Renamed the command
assign-confidence and completely revamped its functionality.
- The search-for-xlinks command offers parameters to control categories of candidates:
- Added the peptide-centric-search option to tide-search.
- Compiling the Mac OS version now requires OS X Yosemite. It can be built from source using the Clang compilers distributed with the latest version of Xcode.
- 32-bit and 64-bit Windows versions are now available, built using Visual Studio 2013.
- Updated Percolator to version 2.10.
- Updated Hardklor to version 2.30.
- Updated Comet to version 2016.01 rev. 1.
- The Windows version of Crux can be built with support for vendor specific file formats disabled.
- Added the
- The utilities
psm-convert were added to Crux.
- The generate-peptides application has been revamped.
- Fixed a bug in barista that occurred when the number of peptides exceeded the number of PSMs.
- Precursor mass selection is fixed in tide-search when units are m/z.
bullseye now use standard logging.
- Version number added to log files. Additionally, the version numbers of boost are shown in the output of the
- Short usage message is displayed on error (the full usage message is still displayed when a command is entered with no arguments).
feature-in-file parameter added to
percolator has been renamed to
- The comet command exits with return code 1 on failure, rather than 0.
- Default value for
- Default value for
- The xlink-score-method parameter was added to
- deltaLCn column added to
- Bug fixed in reporting flanking amino acids in
- The mix-max procedure was updated to handle ties among target and decoy scores.
- Default value of
- The exact p-value calculation now takes into account the flanking and neutral loss peaks.
- A bug was fixed in the naming of the Percolator input (.pin) file that is produced by Comet when the output_percolatorfile option is turned on. Previously, the file was named "comet.tsv"; now it is "comet.target.pin."
- Assign-confidence can take multiple input files.
- The comet command can take multiple input files.
- The sidak option has been added to assign-confidence to perform a Sidak adjustment.
- The peptide-level option has been added to
- The refactored XCorr score has been re-scaled (by dividing by 20) to put it into a range that is comparable to that of the original XCorr score.
- The default value of mz-bin-offset has been changed from 0.68 to 0.40.
- The percolator command outputs the pout XML format again (controlled by the
- The percolator command can produce its native output by using the
- Fixed a bug in tide-search that caused it to fail when remove-precursor-peak removed all peaks in a spectrum.
tide-search has been introduced.
- Fixed a bug in candidate peptide selection when m/z tolerance is used.
- Added the read-tide-index command, which reads an index produced by tide-index and prints a list of peptides it contains.
- Added the stop-after option to print-processed-spectra, which controls the point at which to stop preprocessing.
- The tide-search command now accepts multiple spectrum input files.
- Added the use-z-line option to specify whether precursor information is taken from the S or Z line when parsing MS2 files using ProteoWizard.
- Added missing options to the
- The tide-search command may now accept a FASTA database in place of the index, in which case tide-index will be run prior to the search. The store-index option allows the generated index to be saved.
- The assign-confidence command will automatically detect the score type if the score parameter is not specified, searching for xcorr, e-value, or exact p-values.
- The cascade-search command will report the source index database of the peptide for all identified PSMs.
- The top-match parameter has been removed from
- Assign-confidence has new options: "combine-modified-peptide" and "combine-charge-states".
- Assign-confidence now supports top-match > 1 in peptide-level filtering mode.
- Assign-confidence can use smoothed-p-value scores for PSM ranking.
- Assign-confidence no longer carries out target-decoy competition in peptide-level filtering mode.
- Added the
- Rearranged estimation methods in
- Added top-match option for
- Modified assign-confidence so that ties during target-decoy competition are broken randomly.
- Changed the Percolator output feature "file" to contain the name of the file containing the spectrum (if available), rather than the name of the file containing the PSM.
- Added parameters of the form "use-a-ion" for a, b, c, x, y, and z-ions to xlink-score-spectrum.
- Various other minor bug/leak fixes and performance enhancements.
- Added the allow-dups option to tide-index.
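The sidak option added to assign-confidence refers to the Šidák correction, which turns a per-test p-value p into the probability that at least one of n independent tests scores at least that well by chance: 1 - (1 - p)^n. A Python sketch, assuming n is the number of candidate peptides scored against a spectrum (an assumption about how Crux applies the adjustment); sidak_adjust is a hypothetical name.

```python
import math


def sidak_adjust(p: float, n: int) -> float:
    """Sidak-adjusted p-value 1 - (1 - p)**n.

    Computed as -expm1(n * log1p(-p)) so that tiny p-values combined with a large
    number of candidates n do not lose precision to floating-point cancellation.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    if p == 1.0:
        return 1.0  # log1p(-1) is undefined; the adjusted value is trivially 1
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    # e.g. a PSM with p = 1e-6 that won against 10,000 candidate peptides
    print(sidak_adjust(1e-6, 10_000))
```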
Version 2.1 (October 8, 2014)
- The tide-search command now supports calculation of exact p-values via dynamic programming, as described in this article. The p-value calculation is controlled with the
- The search-for-xlinks command now supports variable modifications and larger protein databases. This new code is disabled by default, but can be turned on by setting the parameter
- The default value of mz_bin_offset has been changed to 0.40 from 0.68.
- The centroided parameter for the hardklor command is now implemented.
- The spectrum-format parameter for the bullseye command now works properly.
- The tide-search command now uses information from the Z line rather than the S line when reading spectrum files in the MS2 file format.
- The tide-search command now accepts the
- The spectral-counts command now accepts only FASTA files. Parsing of a protein index is no longer supported.
- A bug was fixed in the spectral-counts command that prevented tide-search output files from being parsed correctly.
- The default-direction parameter now takes a string value of the feature name rather than an integer.
- The parameters mz-bin-offset have been added to
- Percolator output now includes, for each PSM, the name of the file in which the spectrum resides.
- The parameters min-mods have been added to
- The parameters use-neutral-loss-peaks have been added to
- The calculation of XCorr was simplified in Tide. In the new version, tide-index only generates peptides and sorts them, leaving the generation of theoretical peaks to be done entirely on the fly by tide-search.
- The tide-search command now prints search progress reports both to the screen and to the log file. The interval at which progress is printed can be controlled using the
- The calibrate-scores command now outputs files with the stem "calibrate-scores" rather than "qvalue."
- The use-flanking-peaks parameter now has a default value of false.
- The Comet search engine was updated to version 2014011.
- The MSToolkit parser was updated to the latest version (r73).
- The version command now outputs the revision number.
- Various improvements in performance and error handling were implemented.
Version 2.0 (June 6, 2014)
- Two new search engines are now included in Crux: Comet and Tide. The old search engine, search-for-matches, has been retired.
- Percolator has been updated to version 2.07.
- Crux now compiles in native Windows, rather than requiring Cygwin. Consequently, Crux running under Windows can parse vendor proprietary formats using the appropriate Proteowizard libraries.
- The "enzyme" parameters "lysc", "lysn", "arg_c", "glue_c", and "pepsin_a" were renamed to "lys-c", "lys-n", "arg-c", "glue-c", and "pepsin-a".
- Percolator outputs decoy files in addition to target files.
- The Percolator tab-delimited peptide output now contains a posterior error probability (PEP) column.
- The Percolator tab-delimited peptide output has been corrected to show the PSM with the best score, rather than the worst.
- The Percolator tab-delimited PSMs output "matches/spectrum" value has been corrected.
- A new utility command, make-pin, is provided to create input files for use by Percolator.
spectrum-max-mass options have been renamed to
- The spectrum-parser option was added to
- A bug was fixed in barista when reading peptides with modifications.
- Fixed the q-ranker tab-delimited output so that, for a given peptide, flanking amino acids are reported for each protein that contains that peptide, rather than only reporting one set of flanking amino acids.
- A bug was fixed in percolator, wherein the first and last two characters were being truncated from peptide sequences when the input was missing the flanking amino acids or when the charge state was not indicated in the PSM IDs.
- The crux spectrum parser has been removed.
- Q-ranker now outputs decoy files.
- The Boost.Random library is now being used for random number generation.
- The calibrate-scores command no longer requires a protein input.
- The calibrate-scores command now only takes the top match per spectrum and charge.
- The get-ms2-spectrum command now prints Z lines with Bullseye files.
- Various performance and error handling improvements.
Version 1.40 (May 22, 2013)
- Crux Percolator was updated from version 1.05 to the latest release, version 2.04.
- The formula that is used to convert fragment m/z values x from real numbers into integers was modified. Previously, the conversion was
    floor( (x / mz-bin-width) + 0.5 + mz-bin-offset )
  The new conversion formula is
    floor( (x / mz-bin-width) + 1.0 - mz-bin-offset )
  The default values of the two parameters (mz-bin-width and mz-bin-offset) have not changed. The allowed range of the mz-bin-offset parameter was previously [-1,1], but has been changed to [0,1].
  To understand the motivation for the changed formula, note that the mz-bin-offset parameter controls the location of bin edges relative to integer masses on the mass scale. In general, mz-bin-offset should be chosen so that bin edges fall between the expected clusters of fragment masses. For fragments with 1+ charge, masses will cluster near integer values (after dividing by mz-bin-width), and the ideal value of mz-bin-offset would be 0.5. For mixtures of fragments with 1+ and 2+ charges, masses will cluster near integer and half-integer values, so an mz-bin-offset near 0.25 or 0.75 would produce bin edges that best avoid the clusters of fragment masses. The old conversion formula did not correctly locate bin edges for a chosen mz-bin-offset; the new formula translates mz-bin-offset directly into bin edge locations.
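To see the difference between the two conversions numerically, here is a minimal Python sketch that transcribes both formulas as written in the 1.40 notes; the function names are hypothetical, and a bin width of 1.0 is used only to keep the arithmetic easy to follow.

```python
import math


def bin_old(mz: float, width: float, offset: float) -> int:
    """Pre-1.40 conversion: floor(mz / width + 0.5 + offset)."""
    return math.floor(mz / width + 0.5 + offset)


def bin_new(mz: float, width: float, offset: float) -> int:
    """1.40 conversion: floor(mz / width + 1.0 - offset)."""
    return math.floor(mz / width + 1.0 - offset)


def bin_new_lower_edge(k: int, width: float, offset: float) -> float:
    """Left edge of bin k under the new formula.

    floor(x/width + 1 - offset) == k  <=>  (k - 1 + offset) * width <= x < (k + offset) * width,
    so the edges sit at fractional position `offset` in units of `width`.
    """
    return (k - 1 + offset) * width


if __name__ == "__main__":
    # Singly charged fragments cluster near integer m/z. With offset 0.5 the new
    # formula places edges at x.5, so 99.999 and 100.001 land in the same bin;
    # the old formula places edges exactly on the integers and splits them.
    for mz in (99.999, 100.001):
        print(mz, bin_old(mz, 1.0, 0.5), bin_new(mz, 1.0, 0.5))
```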
- A seed parameter is now available to control the seeding of the random number generator. The default value is 1. The seed is available as a parameter from all applications. It can be set as a command-line option for
- The default value of the
crux search-for-matches was incorrectly documented. The default is actually peptide-shuffle, but the documentation said the default was protein-shuffle. The documentation has been corrected.
- The incorrectly labeled column header xcorr rank in barista and q-ranker feature files has been corrected to
- The q-ranker output file qranker_output.xml has been renamed to
- The deltaCn computation has been modified to be consistent with Percolator. Rather than using
    deltaCn_i = (xcorr_1 - xcorr_(i+1)) / xcorr_1
  it is now computed using
    deltaCn_i = (xcorr_i - xcorr_(i+1)) / max(xcorr_i, 1.0)
- Percolator now accepts inputs in SQT, PepXML, and tab-delimited formats in addition to the PinXML format.
- Flags were added to several commands allowing the user to control whether various results files are created in the output directory:
--mzid-output <T|F> (search-for-matches, percolator)
--pepxml-output <T|F> (search-for-matches, q-ranker, barista, percolator)
--txt-output <T|F> (search-for-matches, q-ranker, barista, percolator)
By default, all of these flags are set to false except
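The two deltaCn definitions from the 1.40 notes can be compared with a short Python sketch. It assumes the XCorr list for one spectrum is already sorted best first and simply omits the last rank, which has no successor; how Crux fills that value is not specified here. Function names are hypothetical.

```python
def delta_cn_old(xcorrs):
    """Pre-1.40 definition: (xcorr_1 - xcorr_(i+1)) / xcorr_1, always relative to the top hit."""
    best = xcorrs[0]
    return [(best - xcorrs[i + 1]) / best for i in range(len(xcorrs) - 1)]


def delta_cn_new(xcorrs):
    """1.40 definition: (xcorr_i - xcorr_(i+1)) / max(xcorr_i, 1.0).

    Each rank is compared with its immediate successor, and the max(..., 1.0)
    guard keeps the ratio bounded when xcorr_i is small or zero.
    """
    return [(xcorrs[i] - xcorrs[i + 1]) / max(xcorrs[i], 1.0) for i in range(len(xcorrs) - 1)]


if __name__ == "__main__":
    scores = [3.0, 2.4, 0.5, 0.2]  # illustrative XCorr values, best first
    print([round(v, 3) for v in delta_cn_old(scores)])
    print([round(v, 3) for v in delta_cn_new(scores)])
```

For the top-ranked PSM both definitions coincide; they diverge only for lower ranks, where the new form measures the gap to the next candidate rather than to the best one.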
Version 1.39 (October 6, 2012)
- Crux is now released under an Apache license for all users.
- Parsing MS/MS spectra is now supported using Proteowizard (v. 3.0.3950). This allows Crux to handle MS/MS spectra in mzML (1.0 and 1.1), mzXML, MGF, MS2 and CMS2 formats. Use spectrum-parser=pwiz to enable Proteowizard parsing.
- Crux now uses cmake for the build process rather than
- The Tide database search software is no longer distributed as part of Crux because its license is not compatible with the new Crux license.
- Support for PIN XML output was added to
- Fixed a bug in crux create-index on Windows and Cygwin that caused the program to fail with the message: "WARNING: Cannot rename directory FATAL: Failed to create index"
- The parameter protein database is now an option rather than a required argument for
- mzIdentML file support has been added for
- PepXML file support has been fixed for
- Fixed bug in packedNorm() that sometimes generated incorrect values for the posterior error probability.
- Fixed a bug in crux create-index that caused it to fail on Windows and Cygwin.
Version 1.38 (July 20, 2012)
- Hardklor and Bullseye, tools for analyzing high-resolution precursor spectra, are now incorporated into Crux.
- Barista has been modified extensively so that it more closely resembles the other tools in Crux:
- Barista and Q-ranker accept search results in tab-delimited format, in addition to SQT format.
- Barista now reports posterior error probabilities, in addition to q-values, for PSMs, peptides and proteins.
- Barista's pep.xml output has been updated to be compatible with the TransProteomic Pipeline (TPP).
- The sequest-search command has been removed. The functionality of this command is still available using options for the search-for-matches command (see the frequently asked questions list for details).
- crux predict-peptide-ions now correctly calculates the mass shift resulting from an NH3 or H2O neutral loss. Previously, a neutral loss resulted in an addition of mass rather than a subtraction.
- crux predict-peptide-ions reports ion-type as 'b', 'y', 'a', or 'p' rather than as a number (i.e., 0, 1, 2, 3).
- crux predict-peptide-ions now only allows one type of neutral loss modification per ion. Previously, multiple different types of neutral loss could be applied simultaneously to a single ion.
- The neutral-losses option was removed from crux predict-peptide-ions, due to its redundancy with the supported the
- crux predict-peptide-ions supports 'bya' in the --primary-ions parameter, which will generate the a-ion series in addition to the 'b' and 'y' ions.
- The spectrum parsing is updated so that peaks with zero intensity are ignored.
- crux spectral-counts now has an option to compute the raw spectral count.
- crux spectral-counts now has an option to compute dNSAF values.
- A bug in crux spectral-counts was fixed so that it now properly normalizes the NSAF value.
- Multiple missed cleavages are now handled by crux search-for-xlinks using the missed-cleavages parameter. Previously, this was unsupported.
- Ambiguous amino acids are handled differently. 'J' is interpreted to indicate 'I' or 'L'. Peptides containing B, Z, and X are no longer allowed, and a warning is issued when these amino acid codes are encountered.
- The stand-alone crux-generate-peptides applications have been integrated into the main
- A bug has been fixed that affected the intensity adjustment of observed spectra when the bin width was set much larger or smaller than 1.
- Beginning with version 1.37, q-ranker erroneously included decoy PSMs along with target PSMs in the q-ranker.target.psms.txt file. All PSMs were sorted by q-ranker score so the targets and decoys were mixed together with no accompanying labels. This bug has been fixed so that Q-ranker once again returns only target PSMs.
- The search option display-summed-masses has been replaced by mod-mass-format. With this option, there is a new format for reporting modifications: the mass in the square brackets is that of the preceding amino acid plus the modification mass. This format is compliant with TPP pep.xml output.
- Modifications are reported differently in the pep.xml header to be compliant with the TPP.
- The command crux compute-q-values now reports posterior error probabilities in addition to q-values. Consequently, the command was renamed
- Crux now provides an explicit error message when an incorrectly formatted MS2 file is parsed. Previously, crux simply reported that no spectra were found.
- Several additional scores were added to the
- Fixed the build procedure to work on both Lion and pre-Lion versions of OS X.
- Fixed a bug in the normalization of the observed spectra. Previously, after the 10-bin normalization of intensities, peaks with heights below 5% of the maximum were not being removed.
- When a fragmentation spectra file is read, the assumed peptide charge state information can be missing for the scans. In cases where this information has not been provided, Crux will now estimate the possible charge states.
- The feature files for Q-ranker and Barista now contain a header describing each feature.
- Q-ranker and Barista now output all of the fields provided in the input search files.
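The NSAF normalization fixed in 1.38 follows the usual definition: each protein's spectral count is divided by its length, and the results are rescaled to sum to 1 across proteins. A minimal Python sketch with hypothetical protein IDs and function name; it does not model how Crux assigns shared peptides or how it computes dNSAF.

```python
def nsaf(spectral_counts: dict, lengths: dict) -> dict:
    """Normalized spectral abundance factor per protein:

        NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)

    where SpC is the spectral count and L the protein length in residues.
    """
    saf = {prot: spectral_counts[prot] / lengths[prot] for prot in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {prot: 0.0 for prot in saf}  # no spectra assigned to any protein
    return {prot: value / total for prot, value in saf.items()}


if __name__ == "__main__":
    # Equal counts, but P2 is twice as long, so it receives half the abundance of P1.
    print(nsaf({"P1": 10, "P2": 10}, {"P1": 100, "P2": 200}))
```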
Version 1.37 (December 22, 2011)
crux barista, a tool for inferring protein identifications.
crux q-ranker with new outputs and command-line syntax.
- Percolator and compute-q-values now compute and report posterior error probabilities, in addition to q-values.
- crux create-index now generates and stores decoy peptides in the index. Decoys can still be generated on the fly when searching .fasta files.
- A pair of options, nterm-fixed, allow fixed terminal peptide modifications, i.e., a mass shift applied to every peptide on either or both termini.
- A bug in the spectrum processing code was fixed, which was erroneously eliminating a few of the highest m/z fragment peaks. Xcorrs have changed slightly as a result.
- Simplified the --mz-bin-width and --mz-offset parameters so that --xcorr-var-bin is no longer needed.
- Indexes now store information about any static mods used when they were created.
- In the tab-delimited output files the 'matches/spectrum' column gives the number of target peptides compared to the spectrum and the decoy file has an additional column, 'decoy matches/spectrum' that gives the number of decoy peptides compared to the spectrum. These two numbers may differ when decoys are not generated on-the-fly.
- The header lines of Barista and QRanker include all of the columns in the search results.
Version 1.36 (August 4, 2011)
- Added support for user-specifiable variable and static amino acid modifications.
- Tide's version of XCorr has been modified so that its scores are always identical to those of a recent version of SEQUEST®.
- Added support for user-specifiable maximum number of missed enzyme cleavage points.
- Tide now performs basic charge state inference when this information is missing from the spectrum input files.
- User-specifiable output formatting, using the new
- Text format with user-specifiable fields.
- SQT format.
- PepXML format.
- Choose whether to display all proteins in which each peptide occurs or just one representative.
- Note that new file formats are created by tide-index. Index files created by previous versions of Tide should not be used with the new version.
- Q-ranker now reports q-values instead of false discovery rates.
- Corrected several errors in the pep.xml output, related to reporting the spectrum filename, charge, and neutral mass.
- Changed the missed-cleavages option to allow specifying the maximum number of missed cleavages in a peptide rather than specifying none/any.
- Fixed a bug that did not print the spectrum file name correctly in the pep.xml files.
- Improved the error message produced by crux spectral-counts when the input PSM file does not contain the required q-values.
- Previously, Crux's theoretical spectrum included two flanking peaks around each b- and y-ion. The new Boolean flag use-flanking-peaks allows these to be turned on or off. Flanking peaks are used by default in crux sequest-search but are not used by default in
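The missed-cleavages option limits how many internal enzyme sites a candidate peptide may contain. A Python sketch for trypsin only, assuming the common rule that trypsin cleaves after K or R unless the next residue is P; Crux supports other enzymes and its exact rules may differ, and the function names are hypothetical.

```python
def count_missed_cleavages(peptide: str) -> int:
    """Count internal tryptic sites: K or R not followed by P.

    The C-terminal residue is skipped because cleavage there is what produced
    the peptide, so it is not a missed site.
    """
    return sum(
        1
        for i in range(len(peptide) - 1)
        if peptide[i] in "KR" and peptide[i + 1] != "P"
    )


def passes_missed_cleavage_limit(peptide: str, max_missed: int) -> bool:
    """True if the peptide has at most max_missed missed tryptic cleavages."""
    return count_missed_cleavages(peptide) <= max_missed


if __name__ == "__main__":
    for pep in ("PEPTIDEK", "PEPKTIDER", "PEPKPTIDER", "AKRLK"):
        print(pep, count_missed_cleavages(pep))
```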
Version 1.35 (March 23, 2011)
- Added the crux spectral-counts command for estimating protein quantification.
- The spectrum neutral mass is now calculated from the m+h value provided by the Z-lines of an MS2 file. Also, multiple Z-lines of a spectrum that have the same charge are now supported. With these changes, crux sequest-search can now take advantage of accurate precursor masses in MS2 files generated by Bullseye (Hsieh et al. J. Proteome Res. 9(2):1138-43, 2010).
- Made the pep.xml file header produced by Crux compatible with the Transproteomic Pipeline.
- Introduced several changes to improve performance on MacOS X.
- Added an nterm option to the link argument of
Version 1.34 (December 22, 2010)
- The default value of the "pi-zero" parameter has been changed from 0.9 to 1.0.
- A new option --peptide-list has been added to crux create-index. When set to T, this option causes an ASCII file of peptides to be included in the output directory. Each line of the file lists the peptide sequence and its neutral mass.
- The default bin offset was returned to 0.68. It had erroneously been changed to 0 in version 1.33.
- A new option --compute-sp has been added to crux search-for-matches. When set to true, all candidate peptides will be scored by Sp in addition to xcorr. This option is recommended for generating input to
max-ion-charge parameter to designate the maximum charge for theoretical ions in
- Modifications may optionally indicate if they prevent cleavage or prevent cross linking.
- Results are printed to .pep.xml files in addition to .txt files.
- Added a mass-precision option which sets how many digits are written for all mass and m/z values in .txt, .sqt, and .ms2 files.
- The columns printed to the .txt files now vary with the crux command being run and the settings so that unused columns are not printed.
- Fixed a bug that limited what files were used as input for post-search commands (e.g., crux percolator) based on fileroot. Now files with different fileroots can be analyzed together.
- Fixed a bug that caused searches using fasta files to search peptides multiple times if they appeared in a file multiple times.
Version 1.33 (July 7, 2010)
- The command line syntax for q-ranker was modified to allow the programs to read from and write to separate directories.
- A bug in q-ranker and percolator was fixed. This bug was introduced several versions ago, and it caused too many PSMs to receive erroneously small q-values. The problem relates to the scaling of the delta Cn feature, and the fix is a temporary stopgap: we remove the feature entirely. The next version should have the corrected feature in place.
- A new option --min-peaks has been added to set a filter for the minimum number of peaks a spectrum must have in order for it to be searched. The default minimum is 20 peaks.
- The hardcoded limit for the number of proteins allowed in a database has been removed.
- The hardcoded limit for the number of spectra and the number of peaks/spectrum has been removed.
- A new option --xcorr-var-bin has been added to toggle the new binning of the m/z axis. The default bin offset was returned back to 0, but the default bin width is still 1.000508. (Note: this erroneous change was subsequently reverted in version 1.34.)
- The option --max-ion-charge now places a limit on the maximum charge state of the ions generated for the xcorr theoretical spectra for crux search-for-xlinks. This change is also supported by
- The option --no-xval was added to q-ranker, allowing faster execution by skipping the internal cross-validation to select hyperparameters.
- The precision of modification masses written to the .txt files now matches the maximum precision given in the parameter file.
prevents cross-link have been added to variable modifications and support has been provided for the
Version 1.32 (July 6, 2010)
- Version 1.32 includes changes only to Tide. A new binary is introduced — a modified version of ProteoWizard's msconvert program — that is capable of converting spectrum data from a wide range of file formats into Tide-readable .spectrumrecords input files.
- tide-import-spectra is removed from the distribution. Although it was faster than msconvert, it was specific to ms2 files, whereas msconvert now works with any ProteoWizard-supported file format.
- A standalone utility for reading binary .spectrumrecords files is introduced, called read-spectrumrecords. This program is useful, for example, for visually checking the output of msconvert.
- Demo input files worm-06-10000.spectrumrecords and yeast-02-10000.spectrumrecords now replace their counterparts worm-06-10000.ms2 and yeast-02-10000.ms2. With the introduction of Tide-compatible msconvert, ms2 files are in no way special to Tide anymore. Any ProteoWizard-supported file format can be converted to the .spectrumrecords format for use with Tide. Starting with data files in Tide's spectrumrecords format eliminates the conversion step in the Tide search demo.
- The old demo scripts worm-demo.sh and yeast-demo.sh have each been replaced by a pair of new scripts that demonstrate the indexing and searching steps separately. This is in order to highlight the two-step process and the fact that indexing needs to be done just once for each fasta file, whereafter the index may be reused indefinitely to perform searches.
Version 1.31 (June 2, 2010)
- This version of Crux is released with a demo version of a new search engine, Tide. Tide is an independent reimplementation of the SEQUEST® algorithm. The immediate ancestor of Tide is the Crux search-for-matches command, but Tide has been completely re-engineered to achieve a thousandfold improvement in speed while exactly replicating Crux XCorr scores. Currently, Tide is not fully integrated with Crux, and is available only in binary executable format.
- Two options have been added to
--mz-bin-offset) to allow control of the binning of the m/z axis.
- The ion-tolerance option was removed from the search tools.
- The default location of the left edge of the first bin along the m/z axis is 0.68, rather than 0.0. This change makes bin edges less likely to fall near fragment peak locations. The default bin width has also changed from 1.001141 to 1.000508.
- The log file now includes information about the date and time that the command was issued, the name of the computer on which the command was run, and the elapsed wall clock time at the end of the run.
- The command line option --version T has been replaced with a command, crux version, that prints the version number to standard output and then exits.
- The header for the feature file produced optionally by crux q-ranker now includes names for the first two columns, scan number and label (i.e., target or decoy peptide).
Version 1.30 (May 5, 2010)
- A new command, xlink-search, has been added. This command searches a collection of spectra against a sequence database, finding cross-linked peptide matches. The algorithm was described in the following article: Sean McIlwain, Paul Draghicescu, Pragya Singh, David R. Goodlett and William Stafford Noble. "Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs." Journal of Proteome Research. 2010.
- A new search command, sequest-search, was added. The original command, search-for-matches, behaves as before except that no .sqt files are printed and no Sp scoring is performed. The new sequest-search emulates SEQUEST® searching. It first scores all candidate peptides with the Sp score, then ranks and filters the results by that score, scoring the remaining candidates with xcorr. Results are printed to .txt and .sqt files.
- The .mzXML file format is now supported for crux search-for-matches and crux sequest-search when the --use-mstoolkit option is set to TRUE.
- The .csm output files from search-for-matches are no longer produced, and post-search operations take .txt files as input.
- … precursor-window-type for selecting windows of type mass, m/z, or ppm.
- The feature-file option is now true/false instead of taking a file name. The file is named '<fileroot>.<qranker|percolator>.features.txt'. A header with the column names was added to the file.
- The new option, --scan-number, performs a search on a specified subset of the spectra in the given file.
- When only one decoy file is produced, the name is now 'decoy.txt' instead of 'decoy-1.txt'.
- In the .txt files, protein names are now followed by the start index of the peptide.
- Modified sequences are reported differently in the .txt files. Instead of modifications being represented by symbols (*, @, #, etc.), they are indicated with the mass shift of the modification within square brackets. As before, the modification information follows the residue that is modified. If multiple modifications appear on one residue, the masses may either be summed together or printed separately in a comma-separated list. This behavior is controlled by the …
- The modified decoy sequences are generated differently. Before, all modified peptides were generated and each one was shuffled to create a decoy peptide. Now a peptide is shuffled once for each peptide_mod and all modifications are applied to that same shuffled peptide.
- The 'percolator rank' column is now based on percolator scores instead of percolator q-values. For cases where two PSMs have different percolator scores but the same q-value, the ranks will reflect the score differences.
- The Weibull parameters used for computing p-values are now printed to the .txt files.
- The reporting of DeltaCn has changed. For the PSM of rank i, deltaCn used to be computed as deltaCn_i = (xcorr_0 - xcorr_i) / xcorr_0, so deltaCn for the top-ranked PSM was always 0. Now deltaCn_i = (xcorr_0 - xcorr_(i+1)) / xcorr_0. This change only affects sequest-search. It also means that … percolator use the correct DeltaCn values.
- A new command, print-processed-spectra, performs XCorr-style pre-processing on all spectra in a given file.
- A small bug was fixed in which the search programs used monoisotopic mass in the calculation of XCorr and Sp even when the user requested average mass.
- Protein fasta files are now parsed protein-by-protein so that parsing large files does not use excessive amounts of memory.
- A limit was removed on how many peptides a protein can produce.
- Crux is now built with g++ instead of gcc.
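The precursor-window-type note in this release distinguishes windows of type mass, m/z, and ppm. The Python sketch below shows one way such a tolerance check can work; the function names are hypothetical, and it assumes the precursor carries `charge` protons, so that neutral mass = (m/z - proton mass) × charge. It is an illustration, not Crux's implementation.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Da

def mz_to_neutral_mass(mz: float, charge: int) -> float:
    """Convert a precursor m/z carrying `charge` protons to a neutral mass."""
    return (mz - PROTON_MASS) * charge

def within_precursor_window(peptide_mass: float, precursor_mz: float,
                            charge: int, window: float, window_type: str) -> bool:
    """Return True if a peptide (neutral mass) falls inside the window.

    'mass' compares neutral masses in Da, 'mz' compares m/z values, and
    'ppm' compares neutral masses in parts per million.
    """
    observed_mass = mz_to_neutral_mass(precursor_mz, charge)
    if window_type == "mass":
        return abs(observed_mass - peptide_mass) <= window
    if window_type == "mz":
        peptide_mz = peptide_mass / charge + PROTON_MASS
        return abs(precursor_mz - peptide_mz) <= window
    if window_type == "ppm":
        return abs(observed_mass - peptide_mass) / peptide_mass * 1e6 <= window
    raise ValueError(f"unknown window type: {window_type!r}")

# A 1000 Da peptide at charge 2, observed 0.01 m/z above its theoretical m/z
# (0.02 Da in neutral mass, i.e. about 20 ppm).
mz = 1000.0 / 2 + PROTON_MASS + 0.01
print(within_precursor_window(1000.0, mz, 2, 3.0, "mass"))  # True
print(within_precursor_window(1000.0, mz, 2, 10.0, "ppm"))  # False
```

The same 0.01 m/z error counts twice as much in neutral mass at charge 2, which is why the window type matters.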
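The bracketed modification notation described in this release can be illustrated with a small Python formatter. The function name, the 0-based position mapping, and the two-decimal rounding are assumptions made for the example; the output is meant to resemble the .txt notation, not to reproduce Crux's exact formatting.

```python
from typing import Dict, List

def format_modified_sequence(sequence: str, mods: Dict[int, List[float]],
                             sum_masses: bool = True, decimals: int = 2) -> str:
    """Render a peptide with bracketed mass shifts after modified residues.

    mods maps a 0-based residue position to the mass shifts on that residue.
    If sum_masses is True, shifts on one residue are summed; otherwise they
    are printed as a comma-separated list.
    """
    out = []
    for pos, residue in enumerate(sequence):
        out.append(residue)
        shifts = mods.get(pos)
        if shifts:
            values = [sum(shifts)] if sum_masses else shifts
            out.append("[" + ",".join(f"{v:.{decimals}f}" for v in values) + "]")
    return "".join(out)

print(format_modified_sequence("PEPTIDE", {3: [79.9663]}))         # PEPT[79.97]IDE
print(format_modified_sequence("MK", {0: [15.9949, 1.0]}))         # M[16.99]K
print(format_modified_sequence("MK", {0: [15.9949, 1.0]}, False))  # M[15.99,1.00]K
```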
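To compare the old and new DeltaCn definitions given in this release, the sketch below computes both for one spectrum's xcorr scores, sorted from rank 0 downward. The function names are hypothetical, and returning None for the lowest-ranked PSM (which has no successor) is an assumption of the sketch.

```python
from typing import List, Optional

def delta_cn_old(xcorrs: List[float]) -> List[float]:
    """Old definition: deltaCn_i = (xcorr_0 - xcorr_i) / xcorr_0."""
    x0 = xcorrs[0]
    return [(x0 - x) / x0 for x in xcorrs]

def delta_cn_new(xcorrs: List[float]) -> List[Optional[float]]:
    """New definition: deltaCn_i = (xcorr_0 - xcorr_(i+1)) / xcorr_0.

    The lowest-ranked PSM has no successor, so None is returned for it.
    """
    x0 = xcorrs[0]
    return [(x0 - xcorrs[i + 1]) / x0 if i + 1 < len(xcorrs) else None
            for i in range(len(xcorrs))]

scores = [2.0, 1.5, 1.0]  # xcorr values sorted from rank 0 downward
print(delta_cn_old(scores))  # [0.0, 0.25, 0.5]
print(delta_cn_new(scores))  # [0.25, 0.5, None]
```

Under the new definition the top-ranked PSM carries the gap to the runner-up, which is the quantity downstream rescoring uses as a feature.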
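Parsing FASTA files protein by protein, as described in this release, amounts to streaming the file and emitting one record at a time. A minimal Python generator that behaves this way is sketched below; the function name `read_fasta` is hypothetical, and this is not Crux's parser.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one protein at a time.

    Only the current protein's lines are held in memory, so memory use grows
    with the longest protein rather than with the size of the file.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore sequence lines before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

example = io.StringIO(">prot1 first\nMKT\nAYIAK\n>prot2\nPEPTIDE\n")
for name, seq in read_fasta(example):
    print(name, seq)
```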
Version 1.22 (September 16, 2009)
- Crux is now distributed in two versions, a full version that is covered by the same type of license as before (free to non-profit users, and via a licensing fee to commercial users), as well as a stripped-down version that is released under an open source license. The stripped-down version does not include the database search functionality but does include all of the post-processing tools. We are unable to release the entire Crux package under an open source license due to intellectual property issues. Both versions of Crux are available via the Crux web page: http://noble.gs.washington.edu/proj/crux/.
- A new tool, q-ranker, is available for estimating peptide-spectrum match q-values. This tool was described in the following article: Marina Spivak, Jason Weston, Leon Bottou, Lukas Käll and William Stafford Noble. "Improvements to the Percolator algorithm for peptide identification from shotgun proteomics data sets." Journal of Proteome Research.
- Version 1.05 of percolator has now been integrated into the Crux source tree. A separate installation of percolator is no longer needed for basic percolator functionality. Note, however, that percolator remains under active development. You may therefore wish to install the current, stand-alone version of percolator and run it separately to take advantage of new features.
- The internal normalization of the observed spectra has been modified to drop peaks whose intensity is less than 1/20 of the maximum intensity in the spectrum. This brings the xcorr score for crux into closer agreement with the xcorr score for SEQUEST®.
- Compute-q-values now generates three different q-values: (1) from p-values using an analytical null model, (2) from decoys and xcorr using an empirical null model, and (3) from decoys and p-values using an empirical null model. All three types of q-values are computed when p-values and decoys are present in the search results.
- A parameter file is now automatically written to the output directory.
- A log file recording messages sent to stderr has been added for …
- The --use-mz-window parameter is now available for search-for-matches. When enabled, peptides must be within +/- 'm/z-window' of the spectrum m/z. The m/z-window value is taken from …
- A numerical bug in the Weibull p-value calculation was fixed, which had previously caused occasional erroneous NaNs to be output.
- The Weibull-estimated p-values generated by search-for-matches are now returned as p-values instead of as -log(p-value). The corresponding q-values returned from compute-q-values are also now returned without the -log transform.
- The --precision option has been changed to control the total number of significant digits printed instead of the number of digits after the decimal point. The default precision has changed from 6 to 8.
- The parameters of the Weibull distribution (used for computing p-values) are now estimated from the xcorrs of all PSMs for a spectrum instead of a random selection of 500.
- The estimation of Weibull distribution parameters requires a minimum number of scored PSMs. In the previous version, spectra with fewer PSMs than the minimum were not given a p-value. Crux will now generate extra decoys until there are enough scores.
- The p-values for decoy PSMs are now generated from the same Weibull distribution parameters as are used for the targets of the same spectrum.
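The 1/20-of-maximum peak filter described in this release is simple to state in code. The Python sketch below shows only that thresholding step, not the rest of Crux's spectrum preprocessing, and the function name is hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

def drop_weak_peaks(peaks: List[Peak], divisor: float = 20.0) -> List[Peak]:
    """Drop peaks whose intensity is less than max_intensity / divisor."""
    if not peaks:
        return []
    threshold = max(intensity for _, intensity in peaks) / divisor
    return [(mz, inten) for mz, inten in peaks if inten >= threshold]

spectrum = [(100.1, 1000.0), (200.2, 40.0), (300.3, 50.0)]
print(drop_weak_peaks(spectrum))  # [(100.1, 1000.0), (300.3, 50.0)]
```

A peak exactly at the threshold is kept, matching the wording "less than 1/20" for the peaks that are dropped.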
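Option (2) in the compute-q-values note, q-values from decoys and xcorr under an empirical null, follows the general target-decoy idea sketched below in Python. This is the textbook estimator under the assumption of one decoy per target, not necessarily the exact formula Crux uses, and the function name is hypothetical.

```python
import bisect
from typing import List

def decoy_qvalues(target_scores: List[float], decoy_scores: List[float]) -> List[float]:
    """Estimate a q-value for each target PSM from decoy PSM scores.

    FDR at threshold t is estimated as (#decoys >= t) / (#targets >= t),
    assuming one decoy per target; a PSM's q-value is the minimum estimated
    FDR over all thresholds at or below its score.
    """
    targets = sorted(target_scores)
    decoys = sorted(decoy_scores)
    fdr = []
    for t in target_scores:
        n_target = len(targets) - bisect.bisect_left(targets, t)
        n_decoy = len(decoys) - bisect.bisect_left(decoys, t)
        fdr.append(min(1.0, n_decoy / n_target))
    # Walk from the lowest score upward, carrying the running minimum FDR.
    order = sorted(range(len(target_scores)), key=lambda i: target_scores[i])
    qvalues = [0.0] * len(target_scores)
    running_min = 1.0
    for i in order:
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues

print(decoy_qvalues([5.0, 4.0, 3.0, 2.0, 1.0], [3.5, 0.5]))  # [0.0, 0.0, 0.2, 0.2, 0.2]
```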
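Several notes in this release concern Weibull-based p-values. Assuming a three-parameter Weibull null (scale eta, shape beta, location shift; this parameterization and the sign convention for shift are assumptions, not taken from these notes), the p-value of a score is the Weibull survival function, and the old -log(p-value) output is simply its exponent. The function name below is hypothetical.

```python
import math

def weibull_pvalue(score: float, eta: float, beta: float, shift: float) -> float:
    """P(X >= score) under an assumed shifted Weibull fit to null scores.

    Survival function: exp(-((score + shift) / eta) ** beta).
    """
    z = score + shift
    if z <= 0:
        return 1.0  # at or below the distribution's origin
    return math.exp(-((z / eta) ** beta))

p = weibull_pvalue(1.2, eta=0.5, beta=2.0, shift=0.0)
# p is the value now reported; -log(p) is what was reported before.
print(p, -math.log(p))
```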
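The change to --precision, from digits after the decimal point to total significant digits, mirrors the difference between fixed-point and general number formatting. Python format specifications illustrate it below; this shows the distinction only and does not claim that Crux formats numbers with these exact routines.

```python
value = 1234.56789

# Old meaning: digits after the decimal point (old default 6).
print(f"{value:.6f}")           # 1234.567890
# New meaning: total significant digits (new default 8).
print(f"{value:.8g}")           # 1234.5679
print(f"{0.000123456789:.8g}")  # 0.00012345679
```

Significant-digit precision keeps small values such as p-values informative, where a fixed number of decimals would round them toward zero.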
Version 1.21 (May 14, 2009)
- The output for percolator has been revised extensively.
- crux will now create a directory, and all output files will be created in that directory. By default the directory will be named crux-output, but this can be changed using the new …
- The output files for …
- The output files for …
- The output files for …
- A fileroot option has been added. This option is used to specify a string that will be added as a prefix to all output files.
- The option cleavages was replaced with two options: enzyme, which specifies the name of an enzyme (e.g., trypsin), and digestion, which indicates the degree of specificity (partial or full digest). The full list of available enzymes is in the HTML docs and in the usage statement. See also …
- The option custom-enzyme allows users to define arbitrary digestion rules. This overrides the enzyme option. The syntax for the custom digestion rule is the same as the syntax used by X!Tandem and is described in the HTML docs.
- The number of PSMs per spectrum printed to the output files is now controlled by one option, top-match. This makes …
- It is now possible to control how many decoy sequences are generated and in which file(s) they are returned. There is a new option, num-decoys-per-target, which can be used to generate more than one shuffled peptide per spectrum. This replaces …
- A new option, decoy-location, has been introduced. The three possible values are 'target-file', where all PSMs (target and decoys) are sorted together for each spectrum and returned in one file; 'one-decoy-file', where target PSMs are printed to one file and all decoys are printed to another; and 'separate-decoy-files', where there are as many decoy files as there are decoys per target.
- Protein names for decoy matches are now prefixed with 'rand_' in the SQT files, as in 'L rand_Y45678'.
- The option unique-peptides only applies to crux-generate-matches. Each peptide is stored in the index exactly once with references to all protein sources. Searches with fasta files print each peptide only once.
- The precision of the masses and scores printed to the sqt and text files can now be specified by the user. The default precision changed from 2 to 6.
- Search progress is now reported by printing every 10th spectrum that is searched. The verbosity can be adjusted with the parameter
- Decoy (shuffled) sequences now keep the same first and last residues as the target sequence from which they were shuffled. This is a reversion to previous behavior.
- It is now possible to skip the Sp score and score all PSMs with xcorr. The default procedure is still to score all peptides for one spectrum with Sp, rank by Sp, and eliminate all but the best-ranking PSMs (by default, the top 500). The remaining PSMs are scored by xcorr, re-ranked by xcorr and the top results returned. By setting max-rank-preliminary=0, the Sp scoring is skipped and xcorr is computed for all PSMs.
- A new parameter, reverse-sequence, can be used to generate decoy peptides by reversing them rather than shuffling. The first and last residues are left unmoved. If the sequence is a palindrome, a decoy will be generated by shuffling, and a note to that effect will be printed at the DETAILED INFO level of output (verbosity = 40).
- P-values are now computed for decoy peptides.
- The algorithm used to calculate the xcorr score has been modified so that xcorr scores are in better agreement with those generated by SEQUEST®.
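The custom-enzyme rule syntax can be illustrated with trypsin, usually written [KR]|{P}. As we understand the X!Tandem-style convention, residues in square brackets are required, residues in curly braces prevent cleavage, the list left of | applies to the residue before the cut and the list right of it to the residue after, and X stands for any residue. The Python sketch below parses such a rule and performs a full digest with no missed cleavages; the function names are hypothetical and the parser handles only this basic form.

```python
from typing import List, Set, Tuple

Side = Tuple[bool, Set[str]]  # (True if residues are required, residue set)

def parse_side(text: str) -> Side:
    """Parse '[KR]' (required residues) or '{P}' (forbidden residues)."""
    if len(text) >= 2 and text[0] == "[" and text[-1] == "]":
        return True, set(text[1:-1])
    if len(text) >= 2 and text[0] == "{" and text[-1] == "}":
        return False, set(text[1:-1])
    raise ValueError(f"cannot parse rule side: {text!r}")

def side_matches(side: Side, residue: str) -> bool:
    required, residues = side
    listed = "X" in residues or residue in residues  # X = any residue
    return listed if required else not listed

def digest(protein: str, rule: str) -> List[str]:
    """Full digest: cut at every site where both sides of the rule match."""
    before_text, after_text = rule.split("|")
    before, after = parse_side(before_text), parse_side(after_text)
    peptides, start = [], 0
    for i in range(len(protein) - 1):  # candidate cut between i and i + 1
        if side_matches(before, protein[i]) and side_matches(after, protein[i + 1]):
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides

print(digest("PEPTIDEKAAARPGGK", "[KR]|{P}"))  # ['PEPTIDEK', 'AAARPGGK']
print(digest("AKDLD", "[X]|[D]"))              # ['AK', 'DL', 'D']
```

The R in "AAARPGGK" is not cleaved because it is followed by P, which the {P} side forbids.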
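The default Sp-then-xcorr procedure, and the effect of max-rank-preliminary=0, can be sketched as follows. The scoring functions are passed in as placeholders; the Python function name and the top_match default of 5 are assumptions of the example, not defaults taken from these notes.

```python
from typing import Callable, List, Sequence

def rank_candidates(peptides: Sequence[str],
                    sp: Callable[[str], float],
                    xcorr: Callable[[str], float],
                    max_rank_preliminary: int = 500,
                    top_match: int = 5) -> List[str]:
    """Sp prefilter followed by xcorr ranking for one spectrum.

    With max_rank_preliminary > 0, only that many of the best candidates by
    Sp are scored with xcorr; with 0, every candidate is scored with xcorr.
    """
    if max_rank_preliminary > 0:
        candidates = sorted(peptides, key=sp, reverse=True)[:max_rank_preliminary]
    else:
        candidates = list(peptides)
    return sorted(candidates, key=xcorr, reverse=True)[:top_match]

# Toy scores in which Sp and xcorr disagree completely.
sp_scores = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}
xcorr_scores = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
print(rank_candidates(list(sp_scores), sp_scores.__getitem__, xcorr_scores.__getitem__, 2))
print(rank_candidates(list(sp_scores), sp_scores.__getitem__, xcorr_scores.__getitem__, 0))
```

The toy case shows the trade-off: the Sp prefilter saves xcorr computations but can discard candidates that xcorr would have ranked highest.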
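The two decoy-generation rules in this release, shuffling with the first and last residues held fixed and the reverse-sequence option with a shuffle fallback for palindromes, are sketched below in Python. The function name is hypothetical; the sketch treats a sequence as palindromic when reversing its interior leaves it unchanged, and it does not retry a shuffle that happens to reproduce the target.

```python
import random
from typing import Optional

def make_decoy(peptide: str, reverse: bool = False,
               rng: Optional[random.Random] = None) -> str:
    """Return a decoy that keeps the target's first and last residues.

    With reverse=True the interior residues are reversed; if that leaves
    the interior unchanged (a palindrome), the interior is shuffled instead.
    Otherwise the interior is shuffled.
    """
    if len(peptide) <= 3:
        return peptide  # at most one interior residue: nothing to rearrange
    rng = rng or random.Random()
    interior = list(peptide[1:-1])
    if reverse:
        reversed_interior = interior[::-1]
        if reversed_interior != interior:
            return peptide[0] + "".join(reversed_interior) + peptide[-1]
        # Palindromic interior: fall back to shuffling.
    rng.shuffle(interior)
    return peptide[0] + "".join(interior) + peptide[-1]

print(make_decoy("PEPTIDEK", reverse=True))  # PEDITPEK
print(make_decoy("PEPTIDEK", rng=random.Random(7)))
```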
Version 1.20 (January 6, 2009)
- Generating peptides and searching with up to eleven different dynamic modifications is now possible. New options associated with this feature are mod, cmod, nmod, max-mods, max-aas-modified.
- The format of the .csm files has changed and files written by older versions of crux are not readable with crux version 1.2.
- When the option cleavages is set to all, peptide generation ignores all tryptic cleavage sites, effectively setting the … TRUE regardless of user settings.
- When one spectrum has identical xcorr scores for different sequences, the rank of all those matches will be the same. Matches with the next highest score will rank one below.
- The options for setting the preliminary and primary score type have been removed and are fixed as Sp and xcorr, respectively. A new option, compute-p-values=<T | F>, was added to control p-value computation.
- The SQT file contains the spectrum calculated mass instead of observed mass/charge on the S line.
- There is now a test for confirming that the file downloaded from the crux website was not corrupted. See installation instructions for details.
- Calculating a p-value requires a minimum of 40 matches. Spectra with fewer than 40 matches will have p-value scores returned as NaN and a warning will be printed at the DETAILED_INFO level (40) of verbosity.
- Fixed an error in generating the neutral-loss peaks of the theoretical spectrum.
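The tie-handling rule for xcorr ranks in this release corresponds to dense ranking: equal scores share a rank, and the next lower score gets the next integer. A minimal Python sketch, with a hypothetical function name:

```python
from typing import List

def xcorr_ranks(scores: List[float]) -> List[int]:
    """Rank one spectrum's PSMs so that equal scores share a rank.

    Ranks start at 1, and the next lower distinct score gets the next
    integer (dense ranking).
    """
    distinct = sorted(set(scores), reverse=True)
    rank_of = {s: r for r, s in enumerate(distinct, start=1)}
    return [rank_of[s] for s in scores]

print(xcorr_ranks([3.0, 2.5, 3.0, 1.0]))  # [1, 2, 1, 3]
```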
Version 1.02 (December 1, 2008)
- Three programs, … crux-analyze-matches, were merged into one program named …
- Percolator is now truly optional as all Crux programs will build without it.
- Fragment masses can now be calculated as average or monoisotopic. This is controlled by the fragment-mass option in the parameter file.
- The name of the score-type option that calculates p-values was changed from xcorr-logp to xcorr-pvalue.
- SQT files have two new lines in the header which describe the arrangement of values in the results.
- HTML documentation was updated to reflect the above changes.
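The fragment-mass option chooses between average and monoisotopic masses. The Python sketch below computes a singly charged b-ion m/z with either table; it includes only four residues with standard published residue masses, and the function name and the 'mono'/'average' mode strings are assumptions of the example.

```python
# Residue masses in Da (standard published values) for a few amino acids.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276}
AVERAGE = {"G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167}
TABLES = {"mono": MONO, "average": AVERAGE}
PROTON_MASS = 1.00727646688  # Da

def b_ion_mz(prefix: str, fragment_mass: str = "mono") -> float:
    """m/z of a singly charged b-ion: prefix residue masses plus one proton."""
    table = TABLES[fragment_mass]  # KeyError for an unknown mode
    return sum(table[aa] for aa in prefix) + PROTON_MASS

for mode in ("mono", "average"):
    print(mode, round(b_ion_mz("GASP", mode), 4))
```

The gap between the two modes grows with fragment length, which is why using the wrong table shifts theoretical peaks across bins for larger fragments.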
October 15, 2008
- A bug limiting the length of the name of an index file was fixed.
- Modifications were made so that Crux will build with version 1.05 of Percolator. This is the only supported version of Percolator.
- Memory leaks in …
- A --version option was added.
March 4, 2008: Initial release