Crux Tab-delimited File Format

Various Crux programs write their output and read their input in tab-delimited text format. Each such file begins with a single header line containing the tab-separated field names, followed by one or more lines giving the corresponding field values. In the table below, a check mark (✓) indicates that the field appears in files associated with that program.
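Because the layout is simply one header line followed by value lines, such a file can be read with any tab-aware parser. The following is a minimal sketch using Python's standard `csv` module. The function name `read_crux_tsv` and the sample values are hypothetical and serve only as illustration; the column names are taken from the table below.

```python
import csv
import io


def read_crux_tsv(handle):
    """Return one dict per data line, keyed by the names in the header line."""
    # QUOTE_NONE keeps quote characters in field values untouched.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Invented two-line example; real files contain many more columns.
SAMPLE = (
    "scan\tcharge\txcorr score\tsequence\n"
    "8000\t2\t3.21\tLVNELTEFAK\n"
    "8001\t3\t1.07\tHLVDEPQNLIK\n"
)

rows = read_crux_tsv(io.StringIO(SAMPLE))
for row in rows:
    # All values arrive as strings; convert numeric fields explicitly.
    print(row["scan"], row["sequence"], float(row["xcorr score"]))
```

To read a file on disk, pass an open text handle instead, for example `open(path, newline="")`. Field names that contain spaces, such as "xcorr score", work unchanged as dictionary keys.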

Name	Description	comet	tide	cascade-search	kojak	percolator (PSM)	percolator (peptides)	percolator (proteins)	diameter search	spectral-counts
file	The name of the file containing the scan.		✓	✓					✓
scan	The identifying number of each scan.	✓	✓	✓					✓
charge	The charge state for this spectrum in this PSM.	✓	✓	✓	✓				✓
spectrum precursor m/z	The observed m/z of the spectrum precursor ion.	✓	✓	✓					✓
spectrum neutral mass	The computed mass of the spectrum precursor at the given charge state. This is equal to the precursor m/z minus the mass of a proton (1.00727646677 Da), all multiplied by the charge.	✓	✓	✓					✓
peptide mass	The mass of the peptide sequence, computed as the sum of the amino acid masses plus the mass of water (18.010564684 Da or 18.0153 Da, depending on whether we are using monoisotopic or average mass).	✓	✓	✓					✓
delta_cn	The normalized difference in XCorr between this PSM and the next ranked PSM for the same spectrum and charge. The denominator for normalization is the maximum of the current XCorr and 1.0. If `exact-p-value=T`, then the difference is computed between -log(p-value) values rather than XCorr scores, and no normalization is applied.	✓	✓	✓					✓
delta_lcn	Similar to delta_cn, except that the difference is computed with respect to the lowest reported XCorr score for a given spectrum and charge state.		✓						✓
sp score	The SEQUEST-type preliminary score.	✓	✓	✓
sp rank	The rank of this PSM when sorted by Sp score. Note that, in `tide-search`, the Sp score is only computed for PSMs that are reported to the user. Hence, the rank of the Sp score will be in the range from 1 to n, where the value of n is determined by the `--top-match` parameter.	✓	✓	✓
xcorr score	The SEQUEST-type cross correlation score.	✓	✓	✓					✓
exact p-value	The p-value computed as described in "Computing Exact p-values for a Cross-correlation Shotgun Proteomics Score Function."		✓	✓
refactored xcorr	A discretized version of XCorr, used to compute the p-value.		✓	✓
res-ev p-value	The high-resolution p-value computed as described in "Combining High-Resolution and Exact Calibration to Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data."		✓	✓
res-ev score	The high-resolution score computed as described in "Combining High-Resolution and Exact Calibration to Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data."		✓	✓
combined p-value	The combined p-value computed as described in "Combining High-Resolution and Exact Calibration to Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data."		✓	✓
tailor score	A calibrated version of the XCorr score, normalized by dividing by the 99th percentile of the XCorr scores for each spectrum, as described in "Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics."		✓	✓
xcorr rank	The rank of this PSM when sorted by xcorr.	✓	✓	✓					✓
res-ev rank	The rank of this PSM when sorted by res-ev score or by res-ev p-value.		✓	✓
combined p-value rank	The rank of this PSM when sorted by combined p-value.		✓	✓
b/y ions matched	The number of b-ions and y-ions in the peptide that have a corresponding peak in the spectrum.		✓
b/y ions total	The total number of b- and y-ions predicted for this peptide.	✓	✓
total matches/spectrum	The number of candidate peptides in the database found for this spectrum, including duplicates. Note that this is always the number of target candidate peptides, even if the PSM involves a decoy peptide.	✓
distinct matches/spectrum	The number of unique candidate peptides in the database found for this spectrum. Note that this is always the number of target candidate peptides, even if the PSM involves a decoy peptide.		✓	✓					✓
sequence	The peptide sequence.	✓	✓	✓					✓
modifications	Variable modifications applied to the sequence. This is a comma-separated list of modifications in the format "position_code_massdiff", e.g. "3_V_15.9949". The "position" field is the position of the peptide residue, where position 1 is the first residue. A position of "0" denotes the previous flanking amino acid and a position of 1 greater than the peptide length denotes the following flanking amino acid. The "code" field can be "S" for a static modification or "V" for a variable modification. The "modifications" string can be appended with "_N" to denote an N-term protein modification, e.g. "1_S_-17.0265_N"; "_n" to denote an N-term peptide modification, e.g. "1_A_42.0146_n"; "_C" to denote a C-term protein modification, e.g. "9_R_356.1882_C"; or "_c" to denote a C-term peptide modification, e.g. "12_K_42.0106_c".		✓	✓					✓
cleavage type	The cleavage rules for generating this peptide based on the user-specified enzyme specificity.	✓							✓
unmodified sequence	The peptide sequence stripped of all modification-related information.		✓	✓
protein id	A comma-separated list of proteins in which this peptide appears. Optionally, the protein name may be followed by a number in parentheses giving the start position of the peptide in the protein.	✓	✓	✓					✓	✓
flanking aa	The amino acids that precede and follow this peptide in the parent protein ID. If the peptide occurs in more than one protein, then this column will contain a comma-separated list of pairs of amino acids.	✓	✓	✓					✓
original target sequence	The unmodified target sequence. For a target PSM, the value in this column will be identical to the value in the "sequence" column. For a decoy PSM, this column will contain the corresponding target sequence.		✓
PSMId	Identifier of this peptide-spectrum match. If the PIN file was created by Crux, then the ID will be of the form target_0_8000_2_1, where the components are "target" or "decoy," the file index, scan number, charge, and PSM rank.					✓	✓
score	The discriminant score assigned by percolator.					✓	✓
filename	Name of the file in which the scan was found.					✓	✓	✓
q-value	The q-value assigned by percolator.					✓	✓	✓
posterior_error_prob	The posterior error probability assigned by percolator.					✓	✓	✓
peptide	The peptide sequence.					✓	✓
proteinIds	A comma-separated list of proteins in which this peptide appears.					✓	✓
proteinId	Identifier for this protein or protein group							✓
proteinGroupId	Identifier associated with this protein group							✓
peptidesIds	Peptides belonging to each protein in the group							✓
precursor intensity logrank M0	The log-rank of the precursor intensity (for the monoisotope) among all observed peaks in the MS1 scan.								✓
precursor intensity logrank M1	The log-rank of the precursor intensity (for the M + 1 isotope) among all observed peaks in the MS1 scan.								✓
precursor intensity logrank M2	The log-rank of the precursor intensity (for the M + 2 isotope) among all observed peaks in the MS1 scan.								✓
rt-diff	The difference between observed and predicted retention time								✓
dynamic fragment p-value	Fragment matching p-value, as described in "MS Amanda, a Universal Identification Algorithm Optimized for High Accuracy Tandem Mass Spectra". When calculating the score, the m/z range (dependent on the observed peaks) is divided into 10 equal-length segments and the 10 most intense peaks within each bin are preserved.								✓
static fragment p-value	Similar to the dynamic fragment p-value, except that the m/z range is set independently of the observed peaks.								✓
precursor coelution	The normalized dot product between the elution profiles of precursor monoisotope and M+1 and M+2 isotopes								✓
fragment coelution	The normalized dot product between the elution profiles of fragment ions								✓
precursor fragment coelution	The normalized dot product between the elution profiles of precursor isotopes and fragment ions								✓
ensemble score	The aggregated score calculated as the weighted sum of used features								✓
target/decoy	Is this peptide a target or a decoy? Value is "target" or "decoy."								✓
RAW	The raw (unnormalized) count of spectra per peptide									✓
SIN	A protein quantification score									✓
NSAF	The normalized spectral abundance factor.									✓
dNSAF	The distributed normalized spectral abundance factor.									✓
EMPAI	The exponentially modified protein abundance index									✓
parsimony rank	The protein rank based on its spectral-counts score.									✓
Scan Number	The identifying number of each scan.				✓
Ret Time	The retention time in minutes.				✓
Obs Mass	The observed mass of the precursor ion peak.				✓
PSM Mass	The theoretically computed precursor mass of the PSM.				✓
PPM Error	The difference (in parts per million) between the PSM mass and the Obs Mass.				✓
Score	The Kojak cross-correlation score.				✓
dScore	The difference between the reported PSM score and the next best PSM score.				✓
E-value	The overall E-value of the PSM.				✓
Peptide #1 Score	The Kojak xcorr score of the higher scoring peptide.				✓
Peptide #1 E-value	The E-value of the higher scoring peptide.				✓
Peptide #1	The peptide sequence of the higher scoring peptide.				✓
Linked AA #1	The site of the cross-link in Peptide #1				✓
Protein #1	The protein(s) that generated Peptide #1.				✓
Protein #1 Site	The site(s) of the cross-link in Protein #1				✓
Peptide #2 Score	The Kojak xcorr score of the lower scoring peptide.				✓
Peptide #2 E-value	The E-value of the lower scoring peptide.				✓
Peptide #2	The peptide sequence of the lower scoring peptide.				✓
Linked AA #2	The site of the cross-link in Peptide #2				✓
Protein #2	The protein(s) that generated Peptide #2.				✓
Protein #2 Site	The site(s) of the cross-link in Protein #2				✓
Linker Mass	The mass contribution of the cross-linker for cross-linked PSMs or zero otherwise.				✓
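
Several of the descriptions above state a formula or a string layout: "spectrum neutral mass", "delta_cn", "modifications", and "PSMId". The sketch below restates them as small Python helpers so the arithmetic and parsing are easier to check. All function and class names are hypothetical and are not part of Crux. The code assumes the fields follow exactly the layouts described in the table, and `delta_cn` covers only the XCorr case, not the `exact-p-value=T` case.

```python
from typing import Dict, List, NamedTuple, Optional, Union

PROTON_MASS = 1.00727646677  # Da, as given in the "spectrum neutral mass" row


def spectrum_neutral_mass(precursor_mz: float, charge: int) -> float:
    """(precursor m/z minus the proton mass), all multiplied by the charge."""
    return (precursor_mz - PROTON_MASS) * charge


def delta_cn(xcorr: float, next_xcorr: float) -> float:
    """XCorr difference to the next ranked PSM, divided by max(xcorr, 1.0)."""
    return (xcorr - next_xcorr) / max(xcorr, 1.0)


class Modification(NamedTuple):
    position: int            # 1-based residue position; 0 = previous flanking aa
    code: str                # second component of "position_code_massdiff"
    mass_diff: float         # mass difference, may be negative
    terminus: Optional[str]  # "N", "n", "C", "c", or None if no suffix


def parse_modifications(field: str) -> List[Modification]:
    """Parse a comma-separated list of "position_code_massdiff[_term]" items."""
    mods = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # empty field means no modifications
        parts = item.split("_")
        terminus = parts[3] if len(parts) > 3 else None
        mods.append(Modification(int(parts[0]), parts[1], float(parts[2]), terminus))
    return mods


def parse_psm_id(psm_id: str) -> Dict[str, Union[str, int]]:
    """Split a Crux-style PSMId such as "target_0_8000_2_1" into components."""
    label, file_index, scan, charge, rank = psm_id.split("_")
    return {
        "label": label,  # "target" or "decoy"
        "file_index": int(file_index),
        "scan": int(scan),
        "charge": int(charge),
        "rank": int(rank),
    }


print(spectrum_neutral_mass(500.0, 2))
print(delta_cn(2.0, 1.5))
print(parse_modifications("3_V_15.9949,1_S_-17.0265_N"))
print(parse_psm_id("target_0_8000_2_1"))
```

`parse_psm_id` applies only to IDs in the Crux-generated form; PSMIds from PIN files created by other tools may not split into five components.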