Crux Template

barista.xml

barista.xml is an XML file with four main sections:

`<proteins> ... </proteins>`
`<subset_proteins> ... </subset_proteins>`
`<peptides> ... </peptides>`
`<psms> ... </psms>`
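This Python sketch shows one way to locate the four sections programmatically, using the standard-library `xml.etree.ElementTree` module. This page documents only the section names. The miniature document, its root tag `barista_output`, and the child tags inside `peptides` and `psms` are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature barista.xml. The root tag "barista_output" and the
# child tags under <peptides> and <psms> are assumptions for illustration.
SAMPLE = """<barista_output>
  <proteins><protein_group group_id="1"/></proteins>
  <subset_proteins/>
  <peptides><peptide/><peptide/></peptides>
  <psms><psm/><psm/><psm/></psms>
</barista_output>"""

SECTIONS = ("proteins", "subset_proteins", "peptides", "psms")


def count_entries(xml_text: str) -> dict:
    """Return the number of direct child entries in each of the four sections."""
    root = ET.fromstring(xml_text)
    counts = {}
    for name in SECTIONS:
        # ".//name" searches all descendants, so the root tag name does not matter.
        section = root.find(f".//{name}")
        counts[name] = len(section) if section is not None else 0
    return counts


if __name__ == "__main__":
    print(count_entries(SAMPLE))
```

To read a real file, replace `ET.fromstring(xml_text)` with `ET.parse(path).getroot()`.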

Proteins: contains the ranked list of groups of indistinguishable target proteins. Each protein group entry includes the following fields:
1. protein group: a number (the group_id attribute) that identifies the group.
2. q-value: The minimal protein-level false discovery rate at which this protein is deemed significant. This q-value is computed based on the ranking of the proteins induced by the Barista score.
3. score: The score assigned to the protein group by Barista. Higher values correspond to more confident identifications.
4. protein_ids: the proteins in the protein group.
5. alternative_peptide_id: Peptides are considered indistinguishable if they have identical amino acid sequences or differ only by I/L or T/S at the same position in the peptide. If the peptides shared by the group are not identical, each protein's version of the peptide is listed immediately after the protein it belongs to.
6. peptide_ids: the peptides that belong to each of the proteins in the group.
For example, suppose that

protein_a has peptide KLEAEVEALKK ('L' in the second position) and peptide VLGAK, and

protein_b has peptide KIEAEVEALKK ('I' in the second position) and peptide VLGAK.

Then the XML entry could look like this:
```xml
<proteins>
  <protein_group group_id="1">
    <q_value>0</q_value>
    <score>8.9</score>
    <protein_ids>
      <protein_id>protein_a</protein_id>
      <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
      <protein_id>protein_b</protein_id>
      <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
    </protein_ids>
    <peptide_ids>
      <peptide_id>VLGAK</peptide_id>
    </peptide_ids>
  </protein_group>
</proteins>
```
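This Python sketch with `xml.etree.ElementTree` illustrates two rules: the ordering rule for alternative_peptide_id (each one follows the protein it belongs to) and the I/L, T/S equivalence. The XML reuses the example's protein_ids and peptide_ids. q_value and score are omitted because the sketch does not read them. The helper names `alternative_peptides` and `normalize` are hypothetical. `normalize` maps I to L and T to S so that sequences differing only at those positions compare equal.

```python
import xml.etree.ElementTree as ET

# The protein group from the example above (q_value and score omitted).
GROUP_XML = """<protein_group group_id="1">
  <protein_ids>
    <protein_id>protein_a</protein_id>
    <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
    <protein_id>protein_b</protein_id>
    <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
  </protein_ids>
  <peptide_ids>
    <peptide_id>VLGAK</peptide_id>
  </peptide_ids>
</protein_group>"""


def alternative_peptides(group: ET.Element) -> dict:
    """Map each protein_id to the alternative_peptide_id elements listed right after it."""
    mapping = {}
    current = None
    for child in group.find("protein_ids"):
        text = (child.text or "").strip()
        if child.tag == "protein_id":
            current = text
            mapping[current] = []
        elif child.tag == "alternative_peptide_id" and current is not None:
            mapping[current].append(text)
    return mapping


def normalize(peptide: str) -> str:
    """Collapse the I/L and T/S ambiguities (hypothetical helper)."""
    return peptide.replace("I", "L").replace("T", "S")


group = ET.fromstring(GROUP_XML)
alts = alternative_peptides(group)
print(alts)
# Both alternative peptides collapse to the same normalized sequence.
print(normalize(alts["protein_a"][0]) == normalize(alts["protein_b"][0]))
```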

Subset proteins: contains groups of indistinguishable proteins whose identified peptides form a subset of the peptides of some group in the proteins section. Each entry includes:
1. group id and parent group id: the identifier of the group, and the identifier of the protein group whose peptide set is a superset of the current group's peptides.
2. protein_ids: the proteins that belong to the group.
3. peptide_ids: peptides that belong to the proteins in the group.
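The parent relation can be sketched as a set-inclusion test. This sketch assumes that a subset group's peptides must be a proper subset of its parent group's peptides, because equal peptide sets would make the proteins indistinguishable and place them in the same group. The group IDs and peptide sets below are invented for illustration, and `find_parents` is a hypothetical helper. It returns every qualifying group because this page does not say how a single parent is chosen when several supersets exist.

```python
# Invented peptide sets per group, for illustration only.
protein_groups = {
    1: {"VLGAK", "KLEAEVEALKK", "QTALVELLK"},
    2: {"LVNELTEFAK", "YLYEIAR"},
}
candidate_groups = {
    3: {"VLGAK"},
    4: {"YLYEIAR"},
    5: {"VLGAK", "YLYEIAR"},  # spans two groups, so no single superset exists
}


def find_parents(candidates, groups):
    """Return {candidate_id: [group ids]} for groups whose peptide set is a
    proper superset of the candidate's peptide set."""
    parents = {}
    for cid, peptides in candidates.items():
        # "<" on Python sets is the proper-subset test.
        parents[cid] = [gid for gid, gpeps in groups.items() if peptides < gpeps]
    return parents


print(find_parents(candidate_groups, protein_groups))
```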

Peptides: contains a ranked list of target peptides. Each peptide entry includes:
1. peptide: Peptide amino acid sequence.
2. q-value: The minimal peptide-level false discovery rate at which this peptide is deemed significant. This q-value is computed based on the ranking of the peptides induced by the Barista score.
3. score: The score assigned to the peptide by Barista. Higher values correspond to more confident identifications.
4. main_psm_id: The identifier of the PSM from which the peptide received its score. A peptide's score is the maximum score over all the PSMs that contain this peptide.
5. psm_ids: The identifiers of all the PSMs that contain this peptide.
6. protein_ids: All the proteins that contain this peptide and were inferred from PSMs in the database search.
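This sketch shows how main_psm_id relates to the peptide score, which is the maximum over the PSMs containing the peptide. The PSM records and the `aggregate_peptides` helper are hypothetical. Only the max-over-PSMs rule comes from the description above.

```python
from collections import defaultdict

# Hypothetical PSM records: (psm_id, peptide, Barista score). Values are invented.
psms = [
    ("psm_1", "VLGAK", 1.2),
    ("psm_2", "VLGAK", 2.7),
    ("psm_3", "KLEAEVEALKK", 0.4),
    ("psm_4", "KLEAEVEALKK", -0.3),
]


def aggregate_peptides(records):
    """Score each peptide by its best PSM and record which PSM that was."""
    by_peptide = defaultdict(list)
    for psm_id, peptide, score in records:
        by_peptide[peptide].append((score, psm_id))
    result = {}
    for peptide, scored in by_peptide.items():
        # max() on (score, psm_id) tuples picks the highest score.
        best_score, main_psm_id = max(scored)
        result[peptide] = {
            "score": best_score,
            "main_psm_id": main_psm_id,
            "psm_ids": sorted(psm_id for _, psm_id in scored),
        }
    return result


# Rank peptides by score, highest (most confident) first.
peptides = aggregate_peptides(psms)
for peptide, info in sorted(peptides.items(), key=lambda kv: kv[1]["score"], reverse=True):
    print(peptide, info)
```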

PSMs: contains a ranked list of target peptide-spectrum matches (PSMs). Each PSM entry includes the following fields:
1. psm_id: PSM identifier.
2. q-value: The minimal PSM-level false discovery rate at which this PSM is deemed significant. This q-value is computed based on the ranking of the PSMs induced by the Barista score.
3. score: The score assigned to the PSM by Barista. Higher values correspond to more confident identifications.
4. scan: the scan number
5. charge: the inferred charge state
6. precursor_mass: the precursor mass as recorded during the MS1 scan
7. peptide: the peptide sequence
8. filename: name of the file in which the PSM appears
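The protein, peptide, and PSM q-values share one definition: the minimal FDR at which an entry is deemed significant, given the ranking by Barista score. The sketch below computes that definition as a running minimum over a ranked list. It assumes a target-decoy setup in which the FDR at each score threshold is estimated as decoys / targets at or above that threshold. That estimator, the `is_decoy` flag, and the example scores are assumptions for illustration, not a description of Barista's exact procedure. Tied scores are not treated specially.

```python
def q_values(scored):
    """scored: list of (score, is_decoy) pairs.
    Returns (score, q_value) for target entries, highest score first."""
    ranked = sorted(scored, key=lambda x: x[0], reverse=True)
    # Assumed FDR estimate at each rank: decoys / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: minimal FDR over all thresholds that still include this entry,
    # i.e. a running minimum taken from the bottom of the ranking upward.
    qs = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return [(score, q) for (score, is_decoy), q in zip(ranked, qs) if not is_decoy]


example = [(9.0, False), (8.0, False), (7.5, True), (7.0, False), (6.0, False), (5.0, True)]
print(q_values(example))
```

The running minimum ensures that q-values never decrease as the score decreases, which matches the "minimal FDR at which this entry is deemed significant" wording.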