barista.xml
barista.xml is an XML file that records four main parts:
  <proteins> ... </proteins>
  <subset_proteins> ... </subset_proteins>
  <peptides> ... </peptides>
  <psms> ... </psms>
The proteins section contains the ranked list of groups of indistinguishable target proteins. Each protein entry includes the following fields:
- protein group: A number that identifies the group; in the XML it appears as the group_id attribute of protein_group.
- q-value: The minimal protein-level false discovery rate at which this protein is deemed significant. This q-value is computed based on the ranking of the proteins induced by the Barista score.
- score: The score assigned to the protein group by Barista. Higher values correspond to more confident identifications.
- protein_ids: The proteins in the protein group.
- alternative_peptide_id: Peptides are considered indistinguishable if they have identical amino acid sequences or differ only by I/L or T/S substitutions at the same positions. If the indistinguishable peptides shared by the group are not identical, each protein's version is listed immediately after that protein.
- peptide_ids: The peptides that belong to each of the proteins in the group.
For example, suppose that:
- protein_a has peptides KLEAEVEALKK ('L' in the second position) and VLGAK
- protein_b has peptides KIEAEVEALKK ('I' in the second position) and VLGAK
Then the XML entry could look like this:
  <proteins>
    <q_value>0</q_value>
    <score>8.9</score>
    <protein_group group_id="1">
      <protein_ids>
        <protein_id>protein_a</protein_id>
        <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
        <protein_id>protein_b</protein_id>
        <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
      </protein_ids>
      <peptide_ids>
        <peptide_id>VLGAK</peptide_id>
      </peptide_ids>
    </protein_group>
  </proteins>
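The indistinguishability rule and the protein_group layout from the example above can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal sketch, not Barista's own code: it assumes each alternative_peptide_id belongs to the protein_id immediately preceding it (as the field description states), and the names normalize, indistinguishable and read_protein_groups are hypothetical.

```python
import xml.etree.ElementTree as ET

# A cleaned-up copy of the protein example from this section.
EXAMPLE = """
<proteins>
  <q_value>0</q_value>
  <score>8.9</score>
  <protein_group group_id="1">
    <protein_ids>
      <protein_id>protein_a</protein_id>
      <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
      <protein_id>protein_b</protein_id>
      <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
    </protein_ids>
    <peptide_ids>
      <peptide_id>VLGAK</peptide_id>
    </peptide_ids>
  </protein_group>
</proteins>
"""

_EQUIV = str.maketrans("IT", "LS")  # map I -> L and T -> S


def normalize(peptide):
    """Collapse I/L and T/S so that indistinguishable peptides compare equal."""
    return peptide.translate(_EQUIV)


def indistinguishable(a, b):
    # translate() preserves length, so equal results mean the sequences are
    # identical or differ only by I/L or T/S at the same positions.
    return normalize(a) == normalize(b)


def read_protein_groups(xml_text):
    """Return one dict per protein_group element (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    groups = []
    for group in root.iter("protein_group"):
        proteins = []
        alternatives = {}  # protein_id -> list of its alternative peptides
        current = None
        for child in group.find("protein_ids"):
            if child.tag == "protein_id":
                current = child.text
                proteins.append(current)
            elif child.tag == "alternative_peptide_id":
                # Attach the alternative peptide to the preceding protein.
                alternatives.setdefault(current, []).append(child.text)
        groups.append({
            "group_id": group.get("group_id"),
            "proteins": proteins,
            "alternatives": alternatives,
            "shared_peptides": [p.text for p in group.iter("peptide_id")],
        })
    return groups
```

For the example, read_protein_groups(EXAMPLE) yields a single group with proteins ['protein_a', 'protein_b'] and shared peptide VLGAK, and indistinguishable("KLEAEVEALKK", "KIEAEVEALKK") is True because both normalize to KLEAEVEALKK.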
The subset_proteins section contains groups of indistinguishable proteins whose identified peptides are a subset of the peptides of some group in the proteins section. Each entry includes:
- group id and parent group id: The identifier of the group, and the identifier of the protein group whose peptide set is a superset of the current group's peptide set.
- protein_ids: The proteins that belong to the group.
- peptide_ids: The peptides that belong to the proteins in the group.
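The parent relationship can be illustrated with Python set comparisons. This is a minimal sketch under stated assumptions: groups are given as hypothetical dicts mapping a group id to a set of peptide sequences, a parent must strictly contain the subset group's peptides (equal sets would make the proteins indistinguishable, i.e. the same group), and when several groups qualify the sketch picks the lowest group id. That tie-break is an assumption; the format description does not say how Barista chooses.

```python
def assign_parent_groups(main_groups, subset_groups):
    """Map each subset group id to a parent group id (hypothetical helper).

    main_groups, subset_groups: dict of group id -> set of peptide sequences.
    """
    parents = {}
    for gid, peptides in subset_groups.items():
        # `<` on sets is a proper-subset test.
        candidates = [pid for pid, pset in main_groups.items() if peptides < pset]
        if candidates:
            # Assumed tie-break: lowest (best-ranked) parent group id.
            parents[gid] = min(candidates)
    return parents


main = {
    1: {"VLGAK", "KLEAEVEALKK", "AGFAGDDAPR"},
    2: {"LVNELTEFAK"},
}
subset = {
    3: {"VLGAK"},
    4: {"AGFAGDDAPR", "VLGAK"},
    5: {"YLYEIAR"},  # not contained in any main group
}
parents = assign_parent_groups(main, subset)
```

Here groups 3 and 4 both get parent group 1, while group 5 has no parent and would not belong in the subset_proteins section at all.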
The peptides section contains a ranked list of target peptides. Each peptide entry includes:
- peptide: Peptide amino acid sequence.
- q-value: The minimal peptide-level false discovery rate at which this peptide is deemed significant. This q-value is computed based on the ranking of the peptides induced by the Barista score.
- score: The score assigned to the peptide by Barista. Higher values correspond to more confident identifications.
- main_psm_id: The identifier of the PSM from which the peptide received its score. A peptide's score is the maximum score over all PSMs that contain this peptide, so main_psm_id identifies the highest-scoring of those PSMs.
- psm_ids: The identifiers of all the PSMs that contain this peptide.
- protein_ids: All the proteins that contain this peptide and were inferred from PSMs in the database search.
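The max-over-PSMs rule for peptide scores, and how main_psm_id and psm_ids follow from it, can be sketched as below. This is a minimal sketch assuming the PSMs are already scored and supplied as hypothetical (psm_id, peptide, score) tuples; the function name and input shape are illustrative, and ties keep the first PSM seen, which may differ from Barista's behavior.

```python
from collections import defaultdict


def score_peptides(psms):
    """Aggregate PSM scores into ranked peptide entries.

    psms: iterable of (psm_id, peptide, score) tuples.
    Returns a list of dicts sorted by descending peptide score.
    """
    psm_ids = defaultdict(list)  # peptide -> every PSM that contains it
    best = {}  # peptide -> (score, psm_id) of its highest-scoring PSM
    for psm_id, peptide, score in psms:
        psm_ids[peptide].append(psm_id)
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, psm_id)
    entries = [
        {"peptide": pep, "score": score, "main_psm_id": main, "psm_ids": psm_ids[pep]}
        for pep, (score, main) in best.items()
    ]
    entries.sort(key=lambda e: e["score"], reverse=True)  # higher = more confident
    return entries


ranked = score_peptides([
    ("psm1", "VLGAK", 1.2),
    ("psm2", "KLEAEVEALKK", 3.4),
    ("psm3", "VLGAK", 2.5),
])
```

VLGAK is supported by psm1 and psm3, so its score is 2.5 and its main_psm_id is psm3; KLEAEVEALKK (3.4) ranks above it.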
The psms section contains a ranked list of target peptide-spectrum matches (PSMs). Each PSM entry includes the following fields:
- psm_id: PSM identifier.
- q-value: The minimal PSM-level false discovery rate at which this PSM is deemed significant. This q-value is computed based on the ranking of the PSMs induced by the Barista score.
- score: The score assigned to the PSM by Barista. Higher values correspond to more confident identifications.
- scan: The scan number.
- charge: The inferred charge state.
- precursor_mass: The precursor mass as recorded during the MS1 scan.
- peptide: The peptide sequence.
- filename: The name of the file in which the PSM appears.
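At every level the q-value is defined as the minimal FDR at which an entry is deemed significant, computed from the ranking induced by the Barista score. Because the ranked lists contain only targets, the sketch below assumes a target-decoy setup and shows the generic calculation: walk down the ranking, estimate the FDR at each threshold as decoys/targets, then take a running minimum from the bottom up. It is an illustration, not necessarily the estimator Barista uses, and it does not treat tied scores specially.

```python
def q_values(scores, is_decoy):
    """Target-decoy q-values, returned in input order (generic sketch).

    scores: scores where higher means more confident.
    is_decoy: parallel list of booleans marking decoy entries.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * n
    targets = decoys = 0
    for i in order:  # one score threshold per entry, best first
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # q-value: smallest FDR at any threshold that still accepts this entry.
    q = [0.0] * n
    best = float("inf")
    for i in reversed(order):
        best = min(best, fdr[i])
        q[i] = best
    return q


scores = [9.0, 8.0, 7.0, 6.0, 5.0]
decoy = [False, False, True, False, True]
q = q_values(scores, decoy)
# Keep only target rows, as in the ranked lists described above.
targets_only = [(s, qv) for s, qv, d in zip(scores, q, decoy) if not d]
```

The target scored 6.0 gets q-value 1/3 even though the decoy at 7.0 raises the FDR to 1/2 just above it; the running minimum makes q-values non-decreasing down the ranking.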