assign-confidence

Usage:

crux assign-confidence [options] <target input>

Description:

Given target and decoy scores, estimate a q-value for each target score. The q-value is analogous to a p-value but incorporates false discovery rate (FDR) multiple testing correction. The q-value associated with a score threshold T is defined as the minimal FDR at which a score of T is deemed significant. In this setting, the q-value accounts for the fact that we are analyzing a large collection of scores. For confidence estimation aficionados, please note that this definition of "q-value" is independent of the notion of "positive FDR" as defined in (Storey, Annals of Statistics 31:2013-2035, 2003).

To estimate FDRs, assign-confidence uses one of two procedures. Both require that the input contain both target and decoy scores. The default procedure, target-decoy competition (TDC), is described in this article:

Josh E. Elias and Steve P. Gygi. "Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry." Nature Methods. 4(3):207-214, 2007.

Note that assign-confidence implements a variant of the protocol proposed by Elias and Gygi: rather than reporting a list that contains both targets and decoys, assign-confidence reports only the targets. The FDR estimate is adjusted accordingly (by dividing by 2).
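As a minimal sketch of the idea (not crux's implementation), assume the competition has already produced one winning score per spectrum, each labeled target or decoy, with higher scores being better. One common formulation estimates the FDR among the targets accepted at a threshold T as the number of decoys scoring at least T divided by the number of targets scoring at least T. The function name `tdc_fdr` and the input layout are hypothetical, and crux's exact arithmetic (tie handling, any small-sample correction) may differ.

```python
from typing import List, Tuple


def tdc_fdr(psms: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Estimate an FDR for each target after target-decoy competition.

    psms: (score, is_decoy) pairs, one winner per spectrum; higher is better.
    Returns (score, fdr) for every target, in descending score order.
    Hypothetical helper for illustration; ties are not specially handled.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_targets = 0
    n_decoys = 0
    result = []
    for score, is_decoy in ranked:
        if is_decoy:
            # Decoys are counted but never reported.
            n_decoys += 1
        else:
            n_targets += 1
            # Estimated FDR among targets accepted at this threshold.
            result.append((score, n_decoys / n_targets))
    return result
```

Only targets appear in the returned list, mirroring the target-only report described above; these raw estimates are not yet q-values, since they need not be monotone in the score.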

The alternative procedure, mix-max, is described in this article:

Uri Keich, Attila Kertesz-Farkas and William Stafford Noble. "An improved false discovery rate estimation procedure for shotgun proteomics." Journal of Proteome Research. 14(8):3148-3161, 2015.

Note that the mix-max procedure requires calibrated scores as input, such as Comet E-values or p-values produced using Tide's --exact-p-value option.

The mix-max procedure requires that scores be reported from separate target and decoy searches. Thus, this approach is incompatible with a search that is run using the --concat T option to tide-search or the --decoy_search 1 option to comet. By contrast, the TDC procedure can take as input searches conducted in either mode (concatenated or separate). If given separate search results and asked to do TDC estimation, assign-confidence will carry out the target-decoy competition as part of the confidence estimation procedure.
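When the target and decoy searches were run separately, the competition step can be pictured as keeping, for each spectrum, whichever of its best target match and best decoy match scores higher. This sketch assumes each search has already been reduced to its best score per spectrum, stored in a dict keyed by a hypothetical spectrum identifier, with higher scores being better. Ties go to the target here, an arbitrary illustrative choice that may not match crux.

```python
from typing import Dict, List, Tuple


def compete(target_best: Dict[str, float],
            decoy_best: Dict[str, float]) -> List[Tuple[str, float, bool]]:
    """Keep one winner per spectrum: the better of its target and decoy match.

    Returns (spectrum_id, score, is_decoy) triples, sorted by spectrum_id.
    A spectrum with only one kind of match keeps that match.
    Ties go to the target (an illustrative assumption).
    """
    winners = []
    for spectrum in sorted(set(target_best) | set(decoy_best)):
        t = target_best.get(spectrum)
        d = decoy_best.get(spectrum)
        if d is None or (t is not None and t >= d):
            winners.append((spectrum, t, False))
        else:
            winners.append((spectrum, d, True))
    return winners
```

The resulting list contains exactly one match per spectrum, which is the input a TDC-style FDR estimate expects.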

In either case, the estimated FDRs are converted to q-values by sorting the scores from best to worst and then taking, for each score, the minimum of its own FDR and the FDRs of all scores ranked below it.
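This conversion amounts to a running minimum taken from the bottom of the ranked list upward. A minimal sketch, assuming the FDR estimates are already ordered from the best score to the worst (the function name is hypothetical):

```python
from typing import List


def fdrs_to_qvalues(fdrs: List[float]) -> List[float]:
    """Convert ranked FDR estimates (best score first) to q-values.

    q[i] = min(fdrs[i], fdrs[i + 1], ..., fdrs[-1])
    """
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    # Walk from the worst score to the best, carrying the smallest FDR seen so far.
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return qvalues
```

For example, `fdrs_to_qvalues([0.0, 0.5, 0.25, 0.4])` returns `[0.0, 0.25, 0.25, 0.4]`: the second score inherits the lower FDR of the third, so q-values never decrease as the threshold is relaxed.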

If tide-index was used to create multiple decoys per target using the --num-decoys-per-target option and --estimation-method is set to tdc, then assign-confidence will automatically carry out the average TDC (aTDC) procedure, which aims to reduce decoy-induced variability in the FDR estimates produced by TDC. The aTDC procedure is described in this article:

Uri Keich, Kaipo Tamura, and William Stafford Noble. "Averaging strategy to reduce variability in target-decoy estimates of false discovery rate." Journal of Proteome Research, 18(2):585-593, 2019.

A primer on multiple testing correction can be found here:

William Stafford Noble. "How does multiple testing correction work?" Nature Biotechnology. 27(12):1135-1137, 2009.

Input:

Output:

The program writes files to the folder crux-output by default. The name of the output folder can be changed using the --output-dir option. The following files will be created:

Options: