Tutorial: Creating neighbor-joining trees with jackknife analysis in PAUP

NOTE: I HAVE ONLY TESTED THESE PROGRAMS ON A FEW ALIGNMENTS. PLEASE LET ME KNOW IF THERE ARE ANY PROBLEMS! THANKS FOR HELPING ME WORK THE BUGS OUT OF THESE PROGRAMS.

PAUP is a fast and powerful phylogenetics program by David Swofford (http://paup.csit.fsu.edu/). We have access to a command-line version of this program in our nun accounts. GCG also has a driver program for PAUP called PAUPSEARCH (see the GCG manual).

PAUP can do a lot of things, but we usually just want to make the same one or two types of tree. To make things easier, I've written a program that converts FASTA-format sequence alignments to NEXUS format (the format PAUP requires) and appends some PAUP commands to create a PAUP execution script. Once you've created the execution scripts, take a look at them to get an idea of what the PAUP commands look like.

Here are instructions for creating trees in PAUP. A bug in PAUP prevents node labels on bootstrap trees from being written properly, so I've written the program to perform a jackknife analysis instead. Jackknifing is another data-resampling method, and it gives results similar to bootstrapping. paupboot.py is set to make neighbor-joining (NJ) trees under the GTR (General Time-Reversible) model of evolution with gamma-distributed rates, aka REV-gamma.

UNIX commands are shown with a $ prompt; for example, $ ls means type 'ls' at the prompt (don't type '$ ls'). Output from commands is marked with '-->'. Type each command on a single line.

You will find the following programs useful (*see below). Instructions are available by typing -h after the program name. Command-line syntax is very similar to GCG (see the examples).

  paupboot.py     writes PAUP execution scripts
  fastaview.py    views FASTA-format alignments as interleaved and numbered
  trim.py         removes sub-regions of an alignment
  treeparser.py   modifies existing trees
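To give an idea of what the FASTA-to-NEXUS conversion involves, here is a minimal Python sketch that reads a FASTA alignment and writes a NEXUS file with a DATA block. This is not the code inside paupboot.py: the names read_fasta, to_nexus and VALID_NAME are hypothetical, and the sketch leaves out the block of PAUP commands that paupboot.py appends. It assumes DNA data with '-' for gaps and enforces the naming rule from step 1.

```python
import re

# Sequence names limited to letters, digits and underscore (the rule in step 1).
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def read_fasta(text):
    """Parse FASTA text into (name, sequence) pairs, joining wrapped sequence lines."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            # The name is the first word of the header line ("" if the header is empty).
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_nexus(records):
    """Return a NEXUS file containing a DATA block for an aligned DNA matrix."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("expected at least one sequence, all of the same length")
    for name, _ in records:
        if not VALID_NAME.fullmatch(name):
            raise ValueError(f"invalid sequence name: {name!r}")
    nchar = lengths.pop()
    width = max(len(name) for name, _ in records) + 2  # pad names into a column
    lines = [
        "#NEXUS",
        "",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in records:
        lines.append(f"  {name.ljust(width)}{seq.upper()}")
    lines += ["  ;", "end;", ""]
    return "\n".join(lines)
```

For example, running print(to_nexus(read_fasta(open("test.fasta").read()))) on the test alignment below should produce a matrix declared as ntax=5 nchar=99. Checking that every sequence has the same length is a cheap way to catch a file that was never aligned.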
Examples use the alignment below. You can copy and paste it into a new file called test.fasta like this:

$ cat > test.fasta
[paste the alignment at the prompt, press Return, then press Ctrl+D]

or like this:

$ pico test.fasta
[paste the alignment, press Ctrl+X, and follow the prompts to save]

>s1
ATGAGAGTGAGGGAGATCAAGAGGAATTATCAGCTCCTATGGAGATGGGGCATCATGCTC
CTTGGGATATTAATGATCTGTAAT---GAACAATTATGG
>s2
ATGAGAGTGAAGGGGATCAGGAGGAATTGTCAGCACTGGTGGAAATGGGGCATCATGCTC
CTTGGGATATTAATGATCTGTAATGCTGAACAATTGTGG
>s3
ATGAGAGTGAAGGGGATCAGGAGGAATTGTCAGCGCTGGTGGAAATGGGGCATCATGCTC
CTTGGGATATTAATGATCTGTAATGCTGAACAATTGTGG
>s4
ATGAGAGTGAAGGGGATCAGGAGGAATTATCAGCACTGGTGGAAATGGGGCATCATGCTC
CTTGGGATATTAATGATCTGTAATGCTGAACAATTGTGG
>s5
ATGAAAGTGAAGGAGACCAAGAGGAATTGGCAGCGCTTGTGGAGATGGGGCATCATGCTC
CTTGGGATGTTGATGATCTGTAGTGCAGAAAAATTGTGG

1. Align your sequences and put them into FASTA format. Make sure your sequence names contain no characters other than letters, digits, and '_' (underscore). (This is a good general rule to follow when naming sequences.) Use trim.py to remove poorly aligned regions with lots of gaps. For example,

$ trim.py buh.fasta -r=1-100 -out=buh_trimmed.fasta

will write the first 100 characters (alignment columns 1-100) of each sequence in buh.fasta to buh_trimmed.fasta.

**************************

2. Create the PAUP execution script: the following command creates an execution script for a neighbor-joining tree.

$ paupboot.py test.fasta -m=nj
--> Writing test.njscript

Or you can do a jackknife analysis (100 replicates):

$ paupboot.py test.fasta -m=jack
--> Writing test.jackscript

This second analysis labels each node with the number of jackknife replicates that contained the same set of sequences branching from that node. This number is a measure of our confidence in that branch.

**************************

3. Run PAUP. You may first have to add PAUP to your account by typing

$ ipm paup

Once you have done this, execute the PAUP scripts. The jackknife analysis can take a while for large alignments.
$ paup test.njscript
...
--> 1 tree saved to file "~/020607_testpaup/test_nj.pauptrees"

$ paup test.jackscript
...
--> 1 tree saved to file "~/020607_testpaup/test_jack.pauptrees"

**************************

4. (Optional) Combine the results from these two analyses: PAUP has no way to show jackknife results and branch lengths on the same tree, but you can combine the two analyses with treeparser.py. This command transfers the node labels from test_jack.pauptrees to the tree in test_nj.pauptrees and writes the result to a new file:

$ treeparser.py -intree=test_nj.pauptrees -nodelab=test_jack.pauptrees -out=test_combo.pauptrees
--> Writing test_combo.pauptrees
...

**************************

5. View and save the trees. TreeView opens the output from PAUP and treeparser.py directly. TreeView is available for both Mac OS 9.x and OS X (maybe also for PC?); the OS 9.x version has more options. To show the node labels, choose "Tree" > "Show Internal Node Labels" from the program menu. Trees exported from TreeView for OS 9.x can be opened and edited in Illustrator.

6. Recap: all of the above can be distilled into the following lines:

$ paupboot.py test.fasta -m=nj
$ paupboot.py test.fasta -m=jack
$ paup test.njscript
$ paup test.jackscript
$ treeparser.py -intree=test_nj.pauptrees -nodelab=test_jack.pauptrees

(You can skip the two paup commands if you add -e to the two paupboot.py commands.)

That's it!

**************************

* You can get access to the programs I have written with this command:

$ /afs/isis.unc.edu/home/n/g/nghoffma/public/add_nh_profile

This command adds my directories to the search path in your account; you must log out and log back in for the changes to take effect. Don't do this if you have modified the file ~/public/.profile.personal to customize your environment. Instead, look at /afs/isis.unc.edu/home/n/g/nghoffma/public/.profile.personal to see what to add to your path.

A note on these programs: the output of one program can be piped into another.
For example, to trim buh.fasta, translate the output, and view the translation in interleaved format, use this command:

$ trim.py buh.fasta -r=1-100 | translate.py | fastaview.py
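Piping works because each program reads FASTA on standard input and writes FASTA on standard output. The following minimal Python sketch shows that pattern for a trim-style filter that keeps a 1-based, inclusive column range. It is not trim.py itself: the function names, the script name trim_sketch.py, and its START END arguments are hypothetical stand-ins for trim.py's -r option.

```python
import sys


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []  # keep the full header line, '>' included
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_columns(records, start, end):
    """Keep alignment columns start..end (1-based, inclusive) of every sequence."""
    if start < 1 or end < start:
        raise ValueError(f"bad column range {start}-{end}")
    return [(header, seq[start - 1:end]) for header, seq in records]


def format_fasta(records, width=60):
    """Return FASTA text with sequences wrapped at `width` characters."""
    out = []
    for header, seq in records:
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 trim_sketch.py START END < in.fasta > out.fasta
    first, last = int(sys.argv[1]), int(sys.argv[2])
    sys.stdout.write(format_fasta(trim_columns(parse_fasta(sys.stdin), first, last)))
```

Used this way, $ python3 trim_sketch.py 1 100 < buh.fasta | fastaview.py would keep columns 1-100 and hand the result straight to the viewer. Because the filter neither opens nor names files itself, it can sit anywhere in a pipeline.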