Getting started with Crux

This page covers setting up your environment and choosing your input files. Before you begin, make sure you have completed the software installation.

Your environment

For this tutorial, we'll assume you are working in a Linux/Unix-type shell (Windows Subsystem for Linux or Cygwin is a good choice for Windows users) and already know basic commands for changing directories, listing files, and other simple tasks. To run the sample commands successfully, you'll need to work in a directory where you have write permission; anywhere in your home directory should work. Create a new directory and navigate to it.

$ mkdir crux-demo
$ cd crux-demo

We will refer to this directory, 'crux-demo', as the working directory.

Next, make sure your shell knows where to find the crux program. Try this command:

$ which crux

If it returns a single line with a path ending in crux, then you are set. If not, review the installation instructions on setting your $PATH environment variable.

Input file: mass spectra

The doc/example-files directory of the crux distribution includes some sample files containing mass spectra. This tutorial uses demo.ms2. Locate the file and copy it to your working directory.
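If you are not sure where the distribution was unpacked, searching with find is one way to locate the file. The sketch below wraps that search in a small shell function. The name fetch_example and its default search root (your home directory) are illustrative assumptions, not part of Crux.

```shell
# fetch_example NAME [ROOT]
# Hypothetical helper (not part of Crux): search ROOT (default: $HOME) for a
# file called NAME inside a doc/example-files directory and copy the first
# match into the current directory.
fetch_example() {
  local name="$1"
  local root="${2:-$HOME}"
  local src

  # In find's -path pattern, "*" also matches across "/" separators.
  src=$(find "$root" -path '*doc/example-files/*' -name "$name" -type f 2>/dev/null | head -n 1)

  if [ -z "$src" ]; then
    echo "could not find $name under $root" >&2
    return 1
  fi
  cp "$src" .
}
```

From inside crux-demo, `fetch_example demo.ms2` would copy the spectra; if Crux lives outside your home directory, pass its location as the second argument. Once you know the full path, a plain cp works just as well.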

The beginning of the file looks like this.

H   CreationDate Thu May 23 19:24:34 2019
H   Extractor   ProteoWizard
H   Extractor version   Xcalibur
H   Source file demo.raw
S   18  18  558.8295
I   RTime   0.1156636
I   BPI 61744.15
I   BPM 178.2829
I   TIC 261041.2
Z   2   1116.652
134.1264 7025.501
145.5224 8164.68
156.1784 7902.604
		
NOTE: Each spectrum may include two kinds of optional lines. Lines beginning with I contain information that is independent of the charge state. A Z line may be followed by a line beginning with D, which contains information specific to that charge state.

The lines beginning with H are header lines and contain information such as the program that generated the file and the date it was created. The line starting with S begins the first spectrum. The S is followed by the first and last scan numbers (identical here) and the m/z of the precursor ion. Each line beginning with Z gives one possible charge state of the precursor (in this case 2) and the corresponding singly protonated precursor mass, (M+H)+. After the Z lines comes the spectrum's peak list, one peak per line, given as an m/z value followed by an intensity. Every subsequent spectrum in the file repeats this pattern of S line, Z line(s), and peak list. demo.ms2 contains 71,631 spectra.
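This structure can be checked directly from the shell. The sketch below is illustrative only: the function names count_spectra and mh_from_mz are hypothetical, and the conversion assumes, as in the MS2 format, that the mass on a Z line is the singly protonated mass (M+H)+. For a precursor at a given m/z and charge z, that mass is z * (m/z) - (z - 1) * 1.007276, where 1.007276 is the proton mass in Da.

```shell
# Count the spectra in an MS2 file: each spectrum starts with exactly one S line.
count_spectra() {
  grep -c '^S[[:space:]]' "$1"
}

# Convert a precursor m/z and charge z to the singly protonated mass (M+H)+
# reported on Z lines: (M+H)+ = z * m/z - (z - 1) * proton mass.
mh_from_mz() {
  awk -v mz="$1" -v z="$2" 'BEGIN { printf "%.3f\n", z * mz - (z - 1) * 1.007276 }'
}
```

Running `count_spectra demo.ms2` should report the 71,631 spectra mentioned above, and `mh_from_mz 558.8295 2` prints 1116.652, matching the Z line of the first spectrum (scan 18).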

Input file: protein database (fasta file)

The second input file you will need is a protein database. This file is in fasta format and contains the proteins you expect to find in your sample, along with their sequences. A sample fasta file also comes with the distribution in doc/example-files. We will use small-yeast.fasta. Copy it to your working directory.

The beginning of the file looks like this.

>YBL030C PET9 SGDID:S000000126, Chr II from 164000-163044, reverse complement, Verified ORF, "Major ADP/ATP carrier of the mitochondrial inner membrane, exchanges cytosolic ADP for mitochondrially synthesized ATP; required for viability in many common lab strains carrying a mutation in the polymorphic SAL1 gene"
MSSNAQVKTPLPPAPAPKKESNFLIDFLMGGVSAAVAKTAASPIERVKLLIQNQDEML
KQGTLDRKYAGILDCFKRTATQEGVISFWRGNTANVIRYFPTQALNFAFKDKIKAMFG
FKKEEGYAKWFAGNLASGGAAGALSLLFVYSLDYARTRLAADSKSSKKGGARQFNGLI
DVYKKTLKSDGVAGLYRGFLPSVVGIVVYRGLYFGMYDSLKPLLLTGSLEGSFLASFL
LGWVVTTGASTCSYPLDTVRRRMMMTSGQAVKYDGAFDCLRKIVAAEGVGSLFKGCGA
NILRGVAGAGVISMYDQLQMILFGKKFK 
		

Each line beginning with > starts a new protein. The first word after the > is the protein name (YBL030C above), followed by an optional description of any length. The following lines contain the protein sequence. Proteins may or may not be separated by blank lines. small-yeast.fasta contains 56 proteins.
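A few shell functions make the fasta structure concrete. In this sketch, count_proteins, protein_names, and protein_lengths are hypothetical helper names; they rely only on the conventions described above: a > line starts each protein, its first word is the name, and the sequence may span several lines, possibly followed by blank lines.

```shell
# Count proteins: each entry starts with a line beginning with ">".
count_proteins() {
  grep -c '^>' "$1"
}

# Print each protein name: the first word of its ">" line, without the ">".
protein_names() {
  awk '/^>/ { print substr($1, 2) }' "$1"
}

# Print each protein name with its sequence length, joining the sequence
# lines and ignoring spaces, tabs, carriage returns, and blank lines.
protein_lengths() {
  awk '
    /^>/ {
      if (name != "") print name, len
      name = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (name != "") print name, len }
  ' "$1"
}
```

On the tutorial file, `count_proteins small-yeast.fasta` should print 56, and `protein_names small-yeast.fasta | head -n 1` should print YBL030C.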