SQT file format

The SQT file format is used to record the matches between MS/MS spectra and a sequence database. A full description of the SQT file format may be found in:

McDonald, W.H. et al. "MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications." Rapid Communications in Mass Spectrometry. 18, 2162-2168 (2004).

An SQT file consists of a header followed by one or more spectrum matches. The header and match data are broken into records, one record per line. Fields within a record are separated by white space.

A sample SQT file may be found here.

Header details

Each line in the header must begin with an H, followed by a field label and then a field value, all separated by white space. The field label must be one of the labels listed below, while the field value can be an arbitrary string. A typical header is shown below.

H	SQTGenerator Crux
H	SQTGeneratorVersion 1.0
H	Comment Crux was written by...
H	Comment ref...
H	StartTime	Tue Dec  2 15:22:50 2008
H	EndTime                               
H	Database	test_crux_index/test-binary-fasta
H	DBSeqLength	?
H	DBLocusCount	4
H	PrecursorMasses	average
H	FragmentMasses	mono
H	Alg-PreMasTol	3.0
H	Alg-FragMassTol	0.50
H	Alg-XCorrMode	0
H	Comment	preliminary algorithm sp
H	Comment	final algorithm xcorr
H	StaticMod	C=160.139
H	Alg-DisplayTop	5
H	EnzymeSpec	tryptic
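The header layout described above (an H, a field label, then a free-form value) can be read with a few lines of Python. This is only a minimal sketch, not the parser Crux itself uses; the function names parse_header_line and parse_header are hypothetical, and collecting values in a list per label is an assumption made here so that repeated labels such as Comment are preserved.

```python
def parse_header_line(line):
    """Split one header record into (label, value)."""
    # Split on runs of white space at most twice, so that a value such as
    # "Tue Dec  2 15:22:50 2008" is kept as a single string.
    parts = line.split(None, 2)
    if len(parts) < 2 or parts[0] != "H":
        raise ValueError("not a header record: %r" % (line,))
    value = parts[2].strip() if len(parts) > 2 else ""
    return parts[1], value


def parse_header(lines):
    """Collect header records into a dict mapping label -> list of values."""
    header = {}
    for line in lines:
        # The header ends at the first line whose first field is not H.
        if line.split(None, 1)[:1] != ["H"]:
            break
        label, value = parse_header_line(line)
        header.setdefault(label, []).append(value)
    return header
```

A label with no value, such as the EndTime line in the sample, yields an empty string in this sketch.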

The following field labels must appear in the header:

Required header field labels
Field Label          Description
SQTGenerator         The name of the software used to create the SQT file
SQTGeneratorVersion  The version number of the SQTGenerator software
Database             Path to the sequence database used to generate the SQT file
FragmentMasses       Whether average or monoisotopic residue masses were used to predict the fragment ion masses
PrecursorMasses      Whether average or monoisotopic residue masses were used to predict the precursor ion mass
StartTime            Time when the SQT file was created
StaticMod            Non-standard amino acid masses used in identification (repeat this record if there are multiple non-standard masses)
DynamicMod           List of dynamic modifications used in identification
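A reader of SQT files may want to confirm that all of the required labels are present before processing the matches. The following is a rough sketch of such a check; the names REQUIRED_LABELS and missing_required_labels are hypothetical, and the tuple simply restates the table above.

```python
# Required labels, copied from the table of required header field labels.
REQUIRED_LABELS = (
    "SQTGenerator",
    "SQTGeneratorVersion",
    "Database",
    "FragmentMasses",
    "PrecursorMasses",
    "StartTime",
    "StaticMod",
    "DynamicMod",
)


def missing_required_labels(header_lines):
    """Return the required labels that do not appear in the header lines."""
    seen = set()
    for line in header_lines:
        fields = line.split()
        # A header record is an H followed by at least a field label.
        if len(fields) >= 2 and fields[0] == "H":
            seen.add(fields[1])
    return [label for label in REQUIRED_LABELS if label not in seen]
```

An empty result means every required label was found at least once.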

The following field labels are optional, and may appear in the header:

Optional header field labels
Field Label   Description
Comment       Remarks. Multiple comment lines are allowed
DBSeqLength   Number of amino acids in the sequence database
DBLocusCount  Number of proteins in the sequence database
DBMD5Sum      MD5 checksum of the sequence database
SortedBy      Name of the field used to sort spectra
Alg-          Field names beginning with Alg- are algorithm specific
*             Other field names are allowed, but may not contain white space

Match details

Each spectrum match begins with a spectrum (S) record. Each S record is followed by one or more match (M) records, and each M record is followed by one or more locus (L) records. The S record contains the scan number and other details for the spectrum, the M records contain the highest-scoring peptide matches for the parent spectrum, and the L records give the names of the proteins containing the parent match peptide.

S	45894	45894	2	1	maccoss007	2038.59	9199.5	147.1	153628
M	  1	 27	2040.244	0.0000	 1.5881	 245.6	 11	34	    V.YKCAADKQDATVVELTNL.T	U
M	  2	 68	2038.265	0.0116	 1.5698	 208.4	 11	36	    S.TQSGIVAEQALLHSLNENL.S	U
M	  3	 34	2039.247	0.1582	 1.3369	 239.3	 11	36	    I.NEKTSPALVIPTPDAENEI.S	U
M	  4	322	2040.365	0.1699	 1.3183	 160.0	  9	36	    I.LKESKSVQPGKAIPDIIES.P	U
M	  5	 74	2039.453	0.2288	 1.2248	 203.6	 10	32	    D.MISVDLKTPLVIFKCHH.G	U
M	 65	  1	2041.246	0.4179	 0.9245	 370.2	 13	32	    S.CCGLSLPGLHDLLRHYE.E	U
S	45904	45904	3	1	maccoss007	2834.54	10103.7	246.4	152668
M	  1	237	2833.059	0.0000	 1.9175	 273.1	 20	108	        N.NSGSDTVDPLAGLNNLRNSIKSAGNGME.N	U
M	  2	223	2834.278	0.1390	 1.6510	 274.8	 18	96	        G.HLSRISNIDDILISMRMDAFDSLIG.Y	U
M	  3	 52	2835.100	0.1503	 1.6292	 324.1	 20	96	  S.KSTTEPIQLNNKHDLHLGQELTEST.V	U
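The nesting of S, M, and L records can be recovered by attaching each record to the most recent record of the level above it. The sketch below illustrates the idea under simple assumptions: the function name group_records and the dictionary layout are hypothetical, fields are kept as strings, and header lines are skipped.

```python
def group_records(lines):
    """Group S, M and L records into a list of spectra with nested matches."""
    spectra = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "H":
            continue  # skip blank lines and header records
        kind = fields[0]
        if kind == "S":
            spectra.append({"fields": fields[1:], "matches": []})
        elif kind == "M":
            if not spectra:
                raise ValueError("M record before any S record")
            spectra[-1]["matches"].append({"fields": fields[1:], "loci": []})
        elif kind == "L":
            if not spectra or not spectra[-1]["matches"]:
                raise ValueError("L record before any M record")
            # Keep the optional description as one string.
            parts = line.split(None, 2)
            spectra[-1]["matches"][-1]["loci"].append(parts[1:])
    return spectra
```

Each entry in the returned list holds the S record fields and its M records, and each M record holds its own list of L records.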

Record types for scans
Symbol  Description  Generic form  Required
S       Spectrum     S [low scan] [high scan] [charge] [process time] [server] [experimental neutral mass] [total ion intensity] [lowest Sp] [# of seq. matched]
M       Match        M [rank by Xcorr] [rank by Sp] [calculated mass] [DeltaCN] [Xcorr] [Sp] [matched ions] [expected ions] [sequence matched] [validation status U = unknown, Y = yes, N = no, M = Maybe]
L       Locus        L [locus name] [description if available]  yes
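The generic forms above map naturally onto named fields. The following sketch converts S and M records into typed tuples; the tuple names, the field names, and the choice of int or float for each column are assumptions made for illustration, based on the sample records shown earlier.

```python
from collections import namedtuple

# Field names are hypothetical; their order follows the generic forms above.
Spectrum = namedtuple(
    "Spectrum",
    "low_scan high_scan charge process_time server "
    "neutral_mass total_intensity lowest_sp num_matched",
)
Match = namedtuple(
    "Match",
    "xcorr_rank sp_rank calc_mass delta_cn xcorr sp "
    "matched_ions expected_ions sequence validation",
)


def parse_s_record(line):
    """Convert one S record into a Spectrum tuple."""
    f = line.split()
    if len(f) != 10 or f[0] != "S":
        raise ValueError("not an S record: %r" % (line,))
    return Spectrum(int(f[1]), int(f[2]), int(f[3]), float(f[4]), f[5],
                    float(f[6]), float(f[7]), float(f[8]), int(f[9]))


def parse_m_record(line):
    """Convert one M record into a Match tuple."""
    f = line.split()
    if len(f) != 11 or f[0] != "M":
        raise ValueError("not an M record: %r" % (line,))
    return Match(int(f[1]), int(f[2]), float(f[3]), float(f[4]), float(f[5]),
                 float(f[6]), int(f[7]), int(f[8]), f[9], f[10])
```

Splitting on any white space lets the same code handle both the tab separators and the padding spaces seen in the sample records.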

Differences between Crux's SQT format and the published version