SQT file format
The SQT file format is used to record the matches between MS/MS spectra and a sequence database. A full description of the SQT file format may be found in: McDonald, W.H. et al. "MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications." Rapid Communications in Mass Spectrometry. 18, 2162-2168 (2004).
An SQT file consists of a header followed by one or more spectrum matches. The header and match data are broken into records, one record per line. Fields within a record are separated by white space.
A sample SQT file may be found here.
Each line in the header must begin with an H.
This is followed by a field label and then a field value,
all separated by white space.
The field label must be one of the labels listed below,
while the field value can be an arbitrary string.
A typical header is shown below.
H SQTGenerator Crux
H SQTGeneratorVersion 1.0
H Comment Crux was written by...
H Comment ref...
H StartTime Tue Dec 2 15:22:50 2008
H EndTime
H Database test_crux_index/test-binary-fasta
H DBSeqLength ?
H DBLocusCount 4
H PrecursorMasses average
H FragmentMasses mono
H Alg-PreMasTol 3.0
H Alg-FragMassTol 0.50
H Alg-XCorrMode 0
H Comment preliminary algorithm sp
H Comment final algorithm xcorr
H StaticMod C=160.139
H Alg-DisplayTop 5
H EnzymeSpec tryptic
The following field labels must appear in the header:
|SQTGenerator||The name of the software used to create the SQT file|
|SQTGeneratorVersion||The version number of the SQTGenerator software|
|Database||Path to sequence database used to generate the SQT file|
|FragmentMasses||Whether average or monoisotopic residue masses were used to predict the fragment ion mass|
|PrecursorMasses||Whether average or monoisotopic residue masses were used to predict the precursor ion mass|
|StartTime||Time when SQT file was created|
|StaticMod||Non-standard amino-acid masses used in identification (repeat this record if there are multiple non-standard masses)|
|DynamicMod||List of dynamic modifications used in identification|
The following field labels are optional, and may appear in the header:
|Comment||Remarks. Multiple comment lines are allowed|
|DBSeqLength||Number of amino acids in sequence database|
|DBLocusCount||Number of proteins in sequence database|
|DBMD5Sum||MD5 checksum of sequence database|
|SortedBy||Name of field used to sort spectra|
|Alg-||Field names beginning with Alg- are algorithm specific|
|*||Other field names are allowed, but may not contain white-space|
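The header rules above can be sketched in a few lines of Python. This is only an illustration, not part of Crux: the names `REQUIRED_LABELS`, `parse_header`, and `missing_required` are hypothetical, and the sketch assumes that a label may repeat (as Comment and StaticMod do) and that a record may have an empty value (as EndTime does in the sample header).

```python
# Labels that the tables above list as required (assumed to be the complete set).
REQUIRED_LABELS = {
    "SQTGenerator", "SQTGeneratorVersion", "Database", "FragmentMasses",
    "PrecursorMasses", "StartTime", "StaticMod", "DynamicMod",
}


def parse_header(lines):
    """Collect header records as {label: [value, ...]}, keeping repeats in order."""
    header = {}
    for line in lines:
        # Split into at most three pieces: "H", the label, and the rest of the line.
        fields = line.split(None, 2)
        if len(fields) < 2 or fields[0] != "H":
            continue  # not a header record
        label = fields[1]
        value = fields[2].strip() if len(fields) > 2 else ""
        header.setdefault(label, []).append(value)
    return header


def missing_required(header):
    """Return the required labels that do not appear in the parsed header."""
    return sorted(REQUIRED_LABELS - set(header))
```

Applied to the sample header shown earlier, `missing_required` would report `DynamicMod`, since that sample contains no DynamicMod record.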
Each spectrum match begins with a spectrum (S) record, each S record is followed by one or more match (M) records, and each M record is followed by one or more locus (L) records. The S record contains the scan number and other details for the spectrum, the M records contain the highest scoring peptide matches for the parent spectrum, and the L records give the names of proteins containing the parent match peptide.
S 45894 45894 2 1 maccoss007 2038.59 9199.5 147.1 153628
M 1 27 2040.244 0.0000 1.5881 245.6 11 34 V.YKCAADKQDATVVELTNL.T U
L YCR102C
M 2 68 2038.265 0.0116 1.5698 208.4 11 36 S.TQSGIVAEQALLHSLNENL.S U
L YGR080W
M 3 34 2039.247 0.1582 1.3369 239.3 11 36 I.NEKTSPALVIPTPDAENEI.S U
L YLR035C
M 4 322 2040.365 0.1699 1.3183 160.0 9 36 I.LKESKSVQPGKAIPDIIES.P U
L YJL126W
M 5 74 2039.453 0.2288 1.2248 203.6 10 32 D.MISVDLKTPLVIFKCHH.G U
L YAL002W
M 65 1 2041.246 0.4179 0.9245 370.2 13 32 S.CCGLSLPGLHDLLRHYE.E U
L YLR403W
S 45904 45904 3 1 maccoss007 2834.54 10103.7 246.4 152668
M 1 237 2833.059 0.0000 1.9175 273.1 20 108 N.NSGSDTVDPLAGLNNLRNSIKSAGNGME.N U
L YDR505C
M 2 223 2834.278 0.1390 1.6510 274.8 18 96 G.HLSRISNIDDILISMRMDAFDSLIG.Y U
L YLR247C
M 3 52 2835.100 0.1503 1.6292 324.1 20 96 S.KSTTEPIQLNNKHDLHLGQELTEST.V U
L YDR098C-B
L YDR365W-B
L YER138C
L YGR027W-B
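The S/M/L nesting described above could be read with a small state-tracking loop such as the following Python sketch. The function name `parse_matches` and the dictionary keys are hypothetical. Because this page does not define the individual S and M fields, the sketch keeps them as raw lists of strings and assumes only that the protein name is the first field of an L record and that each record sits on its own line.

```python
def parse_matches(lines):
    """Group S, M, and L records into a nested list of spectra."""
    spectra = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind, values = fields[0], fields[1:]
        if kind == "S":
            # A new spectrum; its M records will be appended as they arrive.
            spectra.append({"fields": values, "matches": []})
        elif kind == "M":
            if not spectra:
                raise ValueError("M record found before any S record")
            spectra[-1]["matches"].append({"fields": values, "loci": []})
        elif kind == "L":
            if not spectra or not spectra[-1]["matches"]:
                raise ValueError("L record found before any M record")
            if values:
                # Attach the protein name to the most recent match.
                spectra[-1]["matches"][-1]["loci"].append(values[0])
        # Any other record type (for example, header lines) is ignored here.
    return spectra
```

Attaching each record to the most recently seen parent mirrors the ordering rule in the text, so a match that maps to several proteins simply collects several entries in its `loci` list.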
Differences between Crux's SQT format and the published version
- Relative to the original specification of SQT, Crux expects an additional field [total ion intensity] in the "S" lines.
- In the original specification, the FragmentMasses and PrecursorMasses header lines used the values "AVG" or "MONO", but Crux uses "average" and "mono".
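A reader that must accept files in either dialect could fold the two spellings of the mass type together. The following Python sketch is one way to do that; the helper name `normalize_mass_type` is hypothetical, and treating the values case-insensitively is an assumption rather than something either specification states.

```python
def normalize_mass_type(value):
    """Map 'AVG'/'MONO' (published format) and 'average'/'mono' (Crux) to Crux's spelling."""
    aliases = {"avg": "average", "average": "average", "mono": "mono"}
    key = value.strip().lower()
    if key not in aliases:
        raise ValueError(f"unrecognized mass type: {value!r}")
    return aliases[key]
```

The same function can be applied to the values of both the FragmentMasses and PrecursorMasses header records.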