Quick start run

For a basic PHLAWD run you need two things: a file of sequences in FASTA format that identify the gene region of interest (let's call it rbcL.keep for now) and a configuration file. At a minimum, the configuration file needs the following information:

# should match an ncbi taxon name
clade = campanulas
# searches the sequence descriptions; separate multiple terms with spaces or commas
search = rbcL
# the directory and prefix to all output file names 
gene = rbcL2
# the threshold for breaking alignments, higher = less broken up 
mad = 0.01 
# the stringency of the coverage required for homology
coverage = 0.4 
# same as above but with identity
identity = 0.2 
# location of sqlite genbank database
db = /media/data/pln.db
# file with fasta sequences identifying gene region of interest 
knownfile = rbcL.keep
# number of threads for pthreads 
numthreads = 4 
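
Before starting a long run it can help to confirm that none of the settings above is missing. The following is a minimal Python sketch, not part of PHLAWD itself: it assumes the file consists only of "key = value" lines, blank lines, and lines starting with "#", as in the example above. The function names parse_config and missing_keys are hypothetical, and PHLAWD's own parser may accept or reject other input.

```python
# Keys taken from the minimal configuration shown above.
REQUIRED_KEYS = [
    "clade", "search", "gene", "mad", "coverage",
    "identity", "db", "knownfile", "numthreads",
]

# The example configuration from this page, with comments shortened.
EXAMPLE = """\
# should match an ncbi taxon name
clade = campanulas
search = rbcL
gene = rbcL2
mad = 0.01
coverage = 0.4
identity = 0.2
db = /media/data/pln.db
knownfile = rbcL.keep
numthreads = 4
"""


def parse_config(text):
    """Collect 'key = value' pairs, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required keys that are absent, in their listed order."""
    return [key for key in REQUIRED_KEYS if key not in settings]


if __name__ == "__main__":
    settings = parse_config(EXAMPLE)
    print("missing keys:", missing_keys(settings))
```

To check a real file, read its contents and pass them to parse_config in place of EXAMPLE. All values are kept as strings, so numeric settings such as mad or numthreads would still need converting before use.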

With the configuration file in place, you can then run:

PHLAWD assemble configfile

The output is very verbose. Once everything has finished, you will be left with one folder, in this case called rbcL2_TEMPFILES, and a few files. rbcL2.db holds all the alignment and sequence information. Two additional files, here rbcL2.FINAL.aln and rbcL2.FINAL.aln.rn, contain the alignments. In the .rn file the sequences have been renamed with human-readable names instead of NCBI numbers, so rbcL2.FINAL.aln.rn is probably the file you want. There is also a file called rbcL2.gi, which lists the GenBank taxon ids, the GenBank ids of the sequences that were used, and a human-readable taxon name for each.
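
The output names described above all derive from the gene setting, so a finished run can be checked with a short script. This is a hedged Python sketch rather than a PHLAWD feature: the helper names expected_outputs and missing_outputs and the dictionary labels are hypothetical, and the list covers only the folder and files mentioned on this page.

```python
from pathlib import Path


def expected_outputs(gene, workdir="."):
    """Map a descriptive label to each output path named on this page."""
    base = Path(workdir)
    return {
        "temp_folder": base / f"{gene}_TEMPFILES",
        "database": base / f"{gene}.db",
        "alignment": base / f"{gene}.FINAL.aln",
        "alignment_renamed": base / f"{gene}.FINAL.aln.rn",
        "id_table": base / f"{gene}.gi",
    }


def missing_outputs(gene, workdir="."):
    """Return the labels of expected outputs that do not exist yet."""
    outputs = expected_outputs(gene, workdir)
    return [label for label, path in outputs.items() if not path.exists()]


if __name__ == "__main__":
    # "rbcL2" is the gene value from the example configuration.
    for label, path in expected_outputs("rbcL2").items():
        state = "found" if path.exists() else "missing"
        print(f"{label}: {path} ({state})")
```

Run it from the directory where PHLAWD was started, or pass that directory as workdir.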

For more run options please see the run options page.
