EESI Laboratory: ARB Tree Generation Tutorial
To investigate gene evolution, gene sequences from various organisms are commonly aligned and used to build a phylogenetic tree. Besides viewing taxonomic information on the tree, a user may want to visually inspect how the gene product, and the KEGG pathway it is associated with, evolved, which gives greater power to evolutionary hypothesis testing. Software packages such as ARB can pool this information from GenBank files, but ARB performs the alignment using local computer resources. A user may therefore prefer to perform the alignment with external resources (such as the CIPRES portal on TeraGrid) and then import the results back into ARB, linking them to the relevant metadata so that this visualization power is retained.
To accomplish this, we have created a pipeline that combines ARB with outside resources to analyze protein families containing over 10,000 sequences, for which de novo trees must be constructed. We have developed custom Python scripts and an ARB import filter to extract metadata from GenBank records and import it along with an externally built alignment and phylogenetic tree. Using our scripts, a custom database containing all of the sequences and associated metadata in the study is imported into an ARB database, with records linked by unique IDs. The user can then use the ARB suite of tools to manipulate the phylogenetic tree and display the associated metadata.
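The metadata-extraction and ID-linking steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the toolkit's actual scripts or import-filter format: the extracted fields (accession, organism, first `/product` qualifier), the sequential `uid00001`-style ID scheme, the tab-separated layout, the helper names `parse_genbank`, `assign_ids`, `to_tsv` and `relabel_fasta`, and the sample records are all hypothetical. Only the standard library is used; a production pipeline would more likely rely on a full GenBank parser such as Biopython's `SeqIO`.

```python
"""Hypothetical sketch of the metadata step: parse GenBank flat-file text,
give each record a unique ID, write a tab-separated metadata table, and
relabel an aligned FASTA so alignment and metadata share the same key.
Field choices and the ID scheme are assumptions, not the toolkit's own."""
import csv
import io
import re

FIELDS = ["unique_id", "accession", "organism", "product"]


def parse_genbank(text):
    """Return one dict per record; each record ends with a '//' line."""
    records = []
    for chunk in text.split("\n//"):
        if not chunk.strip():
            continue  # whitespace after the last terminator
        acc = re.search(r"^ACCESSION[ \t]+(\S+)", chunk, re.MULTILINE)
        org = re.search(r"^[ \t]+ORGANISM[ \t]+(.+)$", chunk, re.MULTILINE)
        prod = re.search(r'/product="([^"]*)"', chunk)  # first /product only
        records.append({
            "accession": acc.group(1) if acc else "",
            "organism": org.group(1).strip() if org else "",
            # Qualifier values may wrap across lines; collapse the whitespace.
            "product": " ".join(prod.group(1).split()) if prod else "",
        })
    return records


def assign_ids(records, prefix="uid"):
    """Attach a hypothetical sequential unique ID (uid00001, uid00002, ...)."""
    return [{"unique_id": f"{prefix}{n:05d}", **rec}
            for n, rec in enumerate(records, start=1)]


def to_tsv(rows):
    """Serialize rows as a tab-separated table with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def relabel_fasta(fasta_text, acc_to_uid):
    """Replace each FASTA header (assumed to start with the accession) with
    its unique ID; accessions missing from the mapping are left unchanged."""
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            acc = parts[0] if parts else ""
            line = ">" + acc_to_uid.get(acc, acc)
        lines.append(line)
    return "\n".join(lines) + "\n"


# Made-up two-record input, trimmed to the lines the parser reads.
SAMPLE = """LOCUS       ACC0001
ACCESSION   ACC0001
  ORGANISM  Escherichia coli
FEATURES             Location/Qualifiers
     CDS             1..30
                     /product="hypothetical
                     protein"
//
LOCUS       ACC0002
ACCESSION   ACC0002
  ORGANISM  Azotobacter vinelandii
//
"""

if __name__ == "__main__":
    rows = assign_ids(parse_genbank(SAMPLE))
    print(to_tsv(rows), end="")
    mapping = {r["accession"]: r["unique_id"] for r in rows}
    print(relabel_fasta(">ACC0001 desc\nMK--V\n>ACC0002\nMKL-V\n", mapping), end="")
```

In this sketch the same accession-to-ID mapping would also be applied to the leaf labels of the externally built tree, so that alignment, tree and metadata table all refer to each sequence by one key; how the real toolkit performs that step is not shown here.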
You can download all of the code and files* needed to accomplish the same task, given a set of alignments/trees and GenBank files. You can learn more about the process in the accompanying tutorial.
*Note: These files accompany the manuscript entitled "A Toolkit for ARB to Integrate Custom Databases and Externally Built Phylogenies." A tutorial PDF also accompanies the manuscript.