SQT file format
The SQT file format is used to record the matches between MS/MS spectra and a sequence database. A full description of the SQT file format may be found in:
McDonald, W.H. et al. "MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications." Rapid Communications in Mass Spectrometry. 18, 2162-2168 (2004).
An SQT file consists of a header followed by one or more spectrum matches. The header and match data are broken into records, one record per line. Fields within a record are separated by white space.
A sample SQT file may be found here.
Header details
Each line in the header must begin with an H.
This is followed by a field label and then a field value,
all separated by white space.
The field label must be one of the labels listed below,
while the field value can be an arbitrary string.
A typical header is shown below.
H SQTGenerator Crux
H SQTGeneratorVersion 1.0
H Comment Crux was written by...
H Comment ref...
H StartTime Tue Dec 2 15:22:50 2008
H EndTime
H Database test_crux_index/test-binary-fasta
H DBSeqLength ?
H DBLocusCount 4
H PrecursorMasses average
H FragmentMasses mono
H Alg-PreMasTol 3.0
H Alg-FragMassTol 0.50
H Alg-XCorrMode 0
H Comment preliminary algorithm sp
H Comment final algorithm xcorr
H StaticMod C=160.139
H Alg-DisplayTop 5
H EnzymeSpec tryptic
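The header layout described above (an H, a field label, then a free-form value) can be read with a few lines of Python. This is only a sketch, not part of Crux: the function names `parse_header_line` and `parse_header` are hypothetical, and it assumes that a label may repeat (as Comment does) and that a value may be empty (as EndTime is in the sample).

```python
def parse_header_line(line):
    """Split one header record into (label, value); the value may be empty."""
    # Split on white space at most twice: 'H', the label, and the rest of the line.
    parts = line.split(None, 2)
    if not parts or parts[0] != "H":
        raise ValueError(f"not a header record: {line!r}")
    if len(parts) < 2:
        raise ValueError(f"header record has no field label: {line!r}")
    label = parts[1]
    value = parts[2].strip() if len(parts) > 2 else ""
    return label, value


def parse_header(lines):
    """Collect header records into a dict mapping label -> list of values."""
    header = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        label, value = parse_header_line(line)
        # Labels such as Comment may repeat, so every label keeps a list.
        header.setdefault(label, []).append(value)
    return header
```

Keeping a list per label is a design choice made here so that repeated records (several Comment or StaticMod lines) are not silently overwritten.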
The following field labels must appear in the header:
Field Label | Description |
---|---|
SQTGenerator | The name of the software used to create the SQT file |
SQTGeneratorVersion | The version number of the SQTGenerator software |
Database | Path to sequence database used to generate the SQT file |
FragmentMasses | Whether average or monoisotopic residue masses were used to predict the fragment ion masses |
PrecursorMasses | Whether average or monoisotopic residue masses were used to predict the precursor ion mass |
StartTime | Time when SQT file was created |
StaticMod | Non-standard amino-acid masses used in identification (repeat this record if there are multiple non-standard masses) |
DynamicMod | List of dynamic modifications used in identification |
The following field labels are optional, and may appear in the header:
Field Label | Description |
---|---|
Comment | Remarks. Multiple comment lines are allowed |
DBSeqLength | Number of amino acids in sequence database |
DBLocusCount | Number of proteins in sequence database |
DBMD5Sum | MD5 checksum of sequence database |
SortedBy | Name of field used to sort spectra |
Alg- | Field names beginning with Alg- are algorithm specific |
* | Other field names are allowed, but may not contain white space |
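The two tables above can be turned into a small label check. The sketch below is illustrative only: `classify_label` and `missing_required_labels` are hypothetical helper names, and the label sets are copied from the tables on this page rather than from the Crux source code.

```python
# Labels copied from the required and optional tables above.
REQUIRED_LABELS = (
    "SQTGenerator", "SQTGeneratorVersion", "Database", "FragmentMasses",
    "PrecursorMasses", "StartTime", "StaticMod", "DynamicMod",
)
OPTIONAL_LABELS = (
    "Comment", "DBSeqLength", "DBLocusCount", "DBMD5Sum", "SortedBy",
)


def classify_label(label):
    """Return 'required', 'optional', 'algorithm', 'other', or 'invalid'."""
    # A label may not be empty or contain white space.
    if not label or any(ch.isspace() for ch in label):
        return "invalid"
    if label in REQUIRED_LABELS:
        return "required"
    if label in OPTIONAL_LABELS:
        return "optional"
    if label.startswith("Alg-"):
        return "algorithm"
    return "other"


def missing_required_labels(labels):
    """List the required labels that do not occur in `labels`, in table order."""
    present = set(labels)
    return [label for label in REQUIRED_LABELS if label not in present]
```

For example, `classify_label("Alg-XCorrMode")` yields `"algorithm"`, while a label such as EnzymeSpec falls into the `"other"` group.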
Match details
Each spectrum match begins with a spectrum (S) record. Each S record is followed by one or more match (M) records, and each M record is followed by one or more locus (L) records. The S record contains the scan number and other details for the spectrum, the M records contain the highest scoring peptide matches for the parent spectrum, and the L records give the names of the proteins containing the matched peptide of the parent M record.
S 45894 45894 2 1 maccoss007 2038.59 9199.5 147.1 153628
M 1 27 2040.244 0.0000 1.5881 245.6 11 34 V.YKCAADKQDATVVELTNL.T U
L YCR102C
M 2 68 2038.265 0.0116 1.5698 208.4 11 36 S.TQSGIVAEQALLHSLNENL.S U
L YGR080W
M 3 34 2039.247 0.1582 1.3369 239.3 11 36 I.NEKTSPALVIPTPDAENEI.S U
L YLR035C
M 4 322 2040.365 0.1699 1.3183 160.0 9 36 I.LKESKSVQPGKAIPDIIES.P U
L YJL126W
M 5 74 2039.453 0.2288 1.2248 203.6 10 32 D.MISVDLKTPLVIFKCHH.G U
L YAL002W
M 65 1 2041.246 0.4179 0.9245 370.2 13 32 S.CCGLSLPGLHDLLRHYE.E U
L YLR403W
S 45904 45904 3 1 maccoss007 2834.54 10103.7 246.4 152668
M 1 237 2833.059 0.0000 1.9175 273.1 20 108 N.NSGSDTVDPLAGLNNLRNSIKSAGNGME.N U
L YDR505C
M 2 223 2834.278 0.1390 1.6510 274.8 18 96 G.HLSRISNIDDILISMRMDAFDSLIG.Y U
L YLR247C
M 3 52 2835.100 0.1503 1.6292 324.1 20 96 S.KSTTEPIQLNNKHDLHLGQELTEST.V U
L YDR098C-B
L YDR365W-B
L YER138C
L YGR027W-B
Symbol | Description | Generic form | Required |
---|---|---|---|
S | Spectrum | S [low scan] [high scan] [charge] [process time] | yes |
M | Match | M [rank by Xcorr] [rank by Sp] [calculated mass] | yes |
L | Locus | L [locus name] [description if available] | yes |
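The S, M, L nesting described in this section can be rebuilt while reading the file line by line. The following is a minimal sketch, not the Crux implementation: `parse_matches` and the dictionary keys are hypothetical names, and only the leading fields named in the generic forms on this page are converted to numbers. All remaining fields are kept as unparsed strings under `"fields"`.

```python
def parse_matches(lines):
    """Group S, M and L records into a nested list of spectra."""
    spectra = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "H":
            continue  # skip blank lines and header records
        kind, values = fields[0], fields[1:]
        if kind == "S":
            spectra.append({
                "low_scan": int(values[0]),
                "high_scan": int(values[1]),
                "charge": int(values[2]),
                "fields": values,  # every S field, unparsed
                "matches": [],
            })
        elif kind == "M":
            if not spectra:
                raise ValueError("M record found before any S record")
            spectra[-1]["matches"].append({
                "xcorr_rank": int(values[0]),
                "sp_rank": int(values[1]),
                "calculated_mass": float(values[2]),
                "fields": values,  # every M field, unparsed
                "loci": [],
            })
        elif kind == "L":
            if not spectra or not spectra[-1]["matches"]:
                raise ValueError("L record found before any M record")
            # The first L field is the locus name; a description may follow.
            spectra[-1]["matches"][-1]["loci"].append(values[0])
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return spectra
```

Each L record is attached to the most recent M record, and each M record to the most recent S record, so a peptide found in several proteins ends up with several entries in its `"loci"` list.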
Differences between Crux's SQT format and the published version
- Relative to the original specification of SQT, Crux expects an additional field [total ion intensity] in the "S" lines.
- In the original specification, the FragmentMasses and PrecursorMasses header lines used the values "AVG" or "MONO", but Crux uses "average" and "mono".
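A reader that has to accept files written under either convention could map both spellings of the mass type onto one. The sketch below is an assumption about how that might be done, not something Crux provides: `normalize_mass_type` is a hypothetical helper, and matching the values case-insensitively is a choice made here.

```python
# Map the published values (AVG, MONO) and the Crux values (average, mono)
# onto the Crux spelling.
_MASS_TYPE_ALIASES = {
    "avg": "average",
    "average": "average",
    "mono": "mono",
}


def normalize_mass_type(value):
    """Return 'average' or 'mono' for a FragmentMasses/PrecursorMasses value."""
    key = value.strip().lower()
    if key not in _MASS_TYPE_ALIASES:
        raise ValueError(f"unrecognized mass type: {value!r}")
    return _MASS_TYPE_ALIASES[key]
```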