barista.xml

The barista.xml file is an XML file that records four main sections:

<proteins> ... </proteins>
<subset_proteins> ... </subset_proteins>
<peptides> ... </peptides>
<psms> ... </psms>
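
As a rough illustration of how the four sections might be located with Python's standard library, here is a minimal sketch. The root element name `barista_output` and the miniature sample document are assumptions made for this example only; the description above does not say what the enclosing element is called.

```python
import xml.etree.ElementTree as ET

SECTION_TAGS = ("proteins", "subset_proteins", "peptides", "psms")

# Hypothetical miniature document. The root name "barista_output" is an
# assumption for this sketch and is not taken from the format description.
SAMPLE = """<barista_output>
  <proteins><protein_group group_id="1"/></proteins>
  <subset_proteins/>
  <peptides/>
  <psms/>
</barista_output>"""


def find_sections(xml_text):
    """Map each of the four section names to its element, or None if absent."""
    root = ET.fromstring(xml_text)
    sections = {}
    for tag in SECTION_TAGS:
        # iter() walks the whole tree, so the root's name does not matter.
        sections[tag] = next(root.iter(tag), None)
    return sections


sections = find_sections(SAMPLE)
for tag, element in sections.items():
    # Compare against None explicitly: an Element without children is falsy.
    count = 0 if element is None else len(element)
    print(tag, count)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.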
		

  1. Proteins: contains a ranked list of groups of indistinguishable target proteins. Each protein entry includes the following fields:
    1. protein group: a number that identifies a group.
    2. q-value: The minimal protein-level false discovery rate at which this protein is deemed significant. This q-value is computed based on the ranking of the proteins induced by the Barista score.
    3. score: The score assigned to the proteins by Barista. Higher values correspond to more confident identifications.
    4. protein_ids: proteins in the protein group.
    5. alternative_peptide_id: peptides are considered indistinguishable if they have identical amino acid sequences or they differ only by I/L or T/S in the same position in the peptide. If the peptides shared by the group are not identical, they are listed immediately after the proteins they belong to.
    6. peptide_ids: peptides that belong to each of the proteins in the group.

    For example, suppose that
    protein_a has peptide KLEAEVEALKK       // 'L' in second position
    and peptide VLGAK
    protein_b has peptide KIEAEVEALKK       // 'I' in second position
    and peptide VLGAK

    Then the XML entry could look like this:
    <proteins>
      <q_value>0</q_value>
      <score>8.9</score>
      <protein_group group_id="1">
        <protein_ids>
          <protein_id>protein_a</protein_id>
          <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
          <protein_id>protein_b</protein_id>
          <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
        </protein_ids>
        <peptide_ids>
          <peptide_id>VLGAK</peptide_id>
        </peptide_ids>
      </protein_group>
    </proteins>
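
The indistinguishability rule for peptides (identical sequences, or sequences that differ only by I/L or T/S at the same position) can be sketched as a comparison of canonical forms. This is only an illustration of the rule as stated above, not Barista's actual implementation; the function names are hypothetical.

```python
# Map each residue to a canonical representative: L -> I and S -> T.
# Two peptides then match exactly when, position by position, their residues
# are equal or form an I/L or T/S pair.
_CANONICAL = str.maketrans({"L": "I", "S": "T"})


def canonical(peptide):
    """Return the peptide with L replaced by I and S replaced by T."""
    return peptide.translate(_CANONICAL)


def indistinguishable(peptide_a, peptide_b):
    """True if the peptides are identical up to I/L and T/S substitutions."""
    return canonical(peptide_a) == canonical(peptide_b)


print(indistinguishable("KLEAEVEALKK", "KIEAEVEALKK"))  # True
print(indistinguishable("VLGAK", "VLGAR"))              # False
```

Comparing canonical strings also handles length differences, since peptides of different lengths can never have equal canonical forms.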
    		
  2. Subset proteins: contains groups of indistinguishable proteins whose identified peptides form a subset of the peptides of some group in the proteins section. Each entry includes:

    1. group id and parent group id: the identifier of the group and the identifier of the protein group whose peptide set is a superset of the peptide set belonging to the current group.
    2. protein_ids: proteins that belong to the group.
    3. peptide_ids: peptides that belong to the proteins in the group.
  3. Peptides: contains a ranked list of target peptides. Each peptide entry includes:

    1. peptide: Peptide amino acid sequence.
    2. q-value: The minimal peptide-level false discovery rate at which this peptide is deemed significant. This q-value is computed based on the ranking of the peptides induced by the Barista score.
    3. score: The score assigned to the peptide by Barista. Higher values correspond to more confident identifications.
    4. main_psm_id: The identifier of the PSM from which the peptide received its score. A peptide's score is the maximum score over all the PSMs that contain this peptide.
    5. psm_ids: The identifiers of all the PSMs that contain this peptide.
    6. protein_ids: All the proteins that contain this peptide and were inferred based on some PSMs from the database search.
  4. PSMs: contains a ranked list of target peptide-spectrum matches. Each PSM entry includes the following fields:

    1. psm_id: PSM identifier.
    2. q-value: The minimal PSM-level false discovery rate at which this PSM is deemed significant. This q-value is computed based on the ranking of the PSMs induced by the Barista score.
    3. score: The score assigned to the PSM by Barista. Higher values correspond to more confident identifications.
    4. scan: the scan number
    5. charge: the inferred charge state
    6. precursor_mass: precursor mass as recorded during the MS1 scan
    7. peptide: the peptide sequence
    8. filename: name of the file in which the PSM appears
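
The relationship between the psms and peptides sections (a peptide's score is the maximum over its PSMs, and main_psm_id names the PSM that supplied it) can be sketched as follows. The input tuples and their values are hypothetical, and the sketch does not reproduce how Barista computes PSM scores or q-values; it only shows the max-over-PSMs aggregation and the ranking by score.

```python
def score_peptides(psms):
    """Aggregate (psm_id, peptide, score) tuples into per-peptide entries.

    Each entry holds the best score, the PSM that produced it (main_psm_id),
    and the identifiers of all PSMs that contain the peptide (psm_ids).
    """
    peptides = {}
    for psm_id, peptide, score in psms:
        entry = peptides.setdefault(
            peptide, {"main_psm_id": psm_id, "score": score, "psm_ids": []}
        )
        entry["psm_ids"].append(psm_id)
        if score > entry["score"]:
            entry["main_psm_id"] = psm_id
            entry["score"] = score
    return peptides


# Hypothetical PSMs: identifiers and scores are made up for illustration.
example_psms = [
    (1, "VLGAK", 2.5),
    (2, "KLEAEVEALKK", 8.9),
    (3, "VLGAK", 4.1),
]

peptides = score_peptides(example_psms)

# Higher scores correspond to more confident identifications, so rank descending.
ranked = sorted(peptides, key=lambda p: peptides[p]["score"], reverse=True)
print(ranked)
print(peptides["VLGAK"])
```

Here the peptide VLGAK appears in PSMs 1 and 3, so its entry lists both identifiers while taking its score and main_psm_id from PSM 3.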