spectral-counts

Usage:

crux spectral-counts [options] <input PSMs>

Description:

Given a collection of scored PSMs, produce a list of proteins or peptides ranked by a quantification score. Spectral-counts supports four types of quantification: Normalized Spectral Abundance Factor (NSAF), Distributed Normalized Spectral Abundance Factor (dNSAF), Normalized Spectral Index (SIN) and Exponentially Modified Protein Abundance Index (emPAI). The NSAF method is from Paoletti et al. (2006). The SIN method is from Griffin et al. (2010). The emPAI method was first described in Ishihama et al. (2005). The quantification methods are defined below and in the following paper:

S McIlwain, M Mathews, M Bereman, EW Rubel, MJ MacCoss, and WS Noble. "Estimating relative abundances of proteins from shotgun proteomics data." BMC Bioinformatics. 13:308, 2012.

Protein Quantification

  1. For each protein in a given database, the NSAF score is:
    $$NSAF_N=\frac{S_N/L_N}{\sum_{i=1}^nS_i/L_i}$$
    where:
    • N is the protein index
    • S_N is the number of peptide spectra matched to protein N
    • L_N is the length of protein N
    • n is the total number of proteins in the input database
  2. For each protein in a given database, the dNSAF score is:
    $$dNSAF_N=\frac{\frac{uSpc_N+(d)sSpc_N}{uL_N+sL_N}}{\sum_{i=1}^n\frac{uSpc_i+(d)sSpc_i}{uL_i+sL_i}}$$
    where:
    • N is the protein index
    • uSpc_N is the number of unique peptide spectra matched to protein N
    • sSpc_N is the number of shared peptide spectra matched to protein N
    • L_N is the length of protein N
    • n is the total number of proteins in the input database
    • d is the distribution factor of peptide K to protein N, given by
      $$d=\frac{uSpc_N}{\sum_{i=1}^nuSpc_i}$$
  3. For each protein in a given database, the SIN score is:
    $$SI_N=\frac{\sum_{j=1}^{p_N}(\sum_{k=1}^{s_j}i_k)}{L_N(\sum_{j=1}^nSI_j)}$$
    where:
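    • N is the protein index
    • p_N is the number of unique peptides in protein N
    • s_j is the number of spectra assigned to peptide j
    • i_k is the total fragment ion intensity of spectrum k
    • L_N is the length of protein N
    • n is the total number of proteins in the input database
    • SI_j in the denominator is the un-normalized spectral index of protein j, i.e. the numerator above evaluated for protein j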
  4. For each protein in a given database, the emPAI score is:
    $$emPAI=10^{\frac{N_{observed}}{N_{observable}}}-1$$
    where:
    • N_observed is the number of experimentally observed peptides with scores above a specified threshold.
    • N_observable is the calculated number of observable peptides for the protein given the search constraints.
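
The NSAF, SIN and emPAI definitions above can be sketched in a few lines of Python. This is only an illustration of the formulas and not the code that Crux runs. The function names, the dictionary-based inputs and the toy numbers are all assumptions made for the example, and the sketch assumes every protein has a non-zero length and that the totals are non-zero.

```python
def nsaf(spectral_counts, lengths):
    """NSAF_N = (S_N / L_N) / sum_i (S_i / L_i), for every protein N.

    spectral_counts and lengths are hypothetical dicts keyed by protein id.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def sin_scores(peptide_intensities, lengths):
    """SI_N = SI(N) / (L_N * sum_j SI(j)), where SI(N) is the summed
    fragment ion intensity over all spectra of all peptides of protein N.

    peptide_intensities maps protein id -> list of peptides, each peptide
    being a list of per-spectrum total fragment ion intensities.
    """
    raw = {
        p: sum(sum(spectra) for spectra in peptides)
        for p, peptides in peptide_intensities.items()
    }
    total = sum(raw.values())
    return {p: raw[p] / (lengths[p] * total) for p in raw}


def empai(n_observed, n_observable):
    """emPAI = 10^(N_observed / N_observable) - 1."""
    return 10 ** (n_observed / n_observable) - 1


# Toy data (made up for illustration): two proteins of equal length.
lengths = {"P1": 100, "P2": 100}
counts = {"P1": 10, "P2": 5}
intensities = {"P1": [[100.0, 200.0], [300.0]], "P2": [[400.0]]}

nsaf_result = nsaf(counts, lengths)              # P1 gets 2/3, P2 gets 1/3
sin_result = sin_scores(intensities, lengths)    # P1: 600 / (100 * 1000)
empai_result = empai(3, 6)                       # 10 ** 0.5 - 1
```

With equal lengths, the NSAF values reduce to each protein's share of the total spectral count, which makes the toy case easy to check by hand.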

Peptide Quantification

  1. For each peptide in a given database, the NSAF score is:
    $$NSAF_N=\frac{S_N/L_N}{\sum_{i=1}^nS_i/L_i}$$
    where:
    • N is the peptide index
    • S_N is the number of spectra matched to peptide N
    • L_N is the length of peptide N
    • n is the total number of peptides in the input database
  2. For each peptide in a given database, the SIN score is:
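    $$SI_N=\frac{\sum_{k=1}^{S_N}i_k}{L_N(\sum_{j=1}^nSI_j)}$$
    where:
    • N is the peptide index
    • S_N is the number of spectra assigned to peptide N
    • i_k is the total fragment ion intensity of spectrum k
    • L_N is the length of peptide N
    • n is the total number of peptides in the input database
    • SI_j in the denominator is the un-normalized spectral index of peptide j, i.e. the numerator above evaluated for peptide j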
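
At the peptide level, both scores can be computed directly from a list of PSMs. The sketch below is a hypothetical illustration and not Crux's implementation: it assumes each PSM is a `(peptide_sequence, total_fragment_ion_intensity)` tuple, and that the peptide length L_N is simply the number of characters in an unmodified sequence string.

```python
from collections import defaultdict


def peptide_scores(psms):
    """Return (nsaf, sin) dicts keyed by peptide sequence.

    psms: iterable of (sequence, intensity) tuples; this layout is an
    assumption for the example, not a Crux file format.
    """
    counts = defaultdict(int)        # S_N: spectra matched to each peptide
    intensity = defaultdict(float)   # sum of i_k over each peptide's spectra
    for sequence, total_intensity in psms:
        counts[sequence] += 1
        intensity[sequence] += total_intensity

    # Denominators shared by all peptides.
    saf_total = sum(counts[s] / len(s) for s in counts)
    si_total = sum(intensity.values())

    nsaf = {s: (counts[s] / len(s)) / saf_total for s in counts}
    sin = {s: intensity[s] / (len(s) * si_total) for s in counts}
    return nsaf, sin


# Toy PSMs (made up): two spectra for one peptide, one for another.
psms = [("PEPTIDEK", 100.0), ("PEPTIDEK", 300.0), ("SAMPLER", 600.0)]
pep_nsaf, pep_sin = peptide_scores(psms)
# pep_nsaf["PEPTIDEK"] is (2/8) / (2/8 + 1/7) = 7/11
# pep_sin["PEPTIDEK"] is 400 / (8 * 1000) = 0.05
```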

Input:

Output:

The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:

Options: