Quick start run

To do a basic run in PHLAWD, you need two things: a file of sequences in FASTA format that identify the gene region of interest (let's call it rbcL.keep for now) and a configuration file. At a minimum, the configuration file needs the following information:

clade = campanulids # should match an NCBI taxon name
search = rbcL # searched against the sequence descriptions; separate multiple terms with spaces or commas
gene = rbcL2 # the directory and prefix for all output file names
mad = 0.01 # the threshold for breaking up alignments; higher = less broken up
coverage = 0.4 # the stringency of the coverage for homology
identity = 0.2 # same as above, but for identity
db = /media/data/pln.db # location of the SQLite GenBank database
knownfile = rbcL.keep # file with FASTA sequences identifying the gene region of interest
numthreads = 4 # number of threads for pthreads
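
Because a run can take a while, it may be worth checking the configuration file before starting. The following is a minimal sketch and not part of PHLAWD: it assumes every line has the form `key = value` with an optional `#` comment, and the helper names `parse_config` and `missing_keys` are hypothetical. The list of required keys simply mirrors the nine settings shown above.

```python
# Hypothetical pre-flight check for a PHLAWD-style configuration file.
# Assumption: lines look like "key = value # comment"; this is not PHLAWD's own parser.

REQUIRED_KEYS = [
    "clade", "search", "gene", "mad", "coverage",
    "identity", "db", "knownfile", "numthreads",
]


def parse_config(text):
    """Return a dict mapping each key to its value (as a string)."""
    settings = {}
    for raw_line in text.splitlines():
        # Drop everything after the first '#'. A value that itself
        # contains '#' would be truncated by this simple approach.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue  # skip blank lines and lines without an assignment
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required keys that are absent from the parsed settings."""
    return [key for key in REQUIRED_KEYS if key not in settings]


EXAMPLE = """\
clade = campanulids # should match an ncbi taxon name
search = rbcL
gene = rbcL2
mad = 0.01
coverage = 0.4
identity = 0.2
db = /media/data/pln.db
knownfile = rbcL.keep
numthreads = 4
"""

example_settings = parse_config(EXAMPLE)
example_missing = missing_keys(example_settings)
```

For the example text, `example_missing` is an empty list; removing a line such as `db = ...` would make `missing_keys` report `db`.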

With these in place, you can then run the following, where configfile is the name of your configuration file:

PHLAWD assemble configfile
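
If you want to launch the run from a script, the command above can be wrapped with Python's standard `subprocess` module. This is only a sketch: it assumes the `PHLAWD` executable is on your `PATH`, and the function names `build_command` and `run_assemble` are hypothetical. Whether PHLAWD reports failures through its exit status is not stated here, so treat the `check=True` behaviour as an assumption.

```python
import shutil
import subprocess


def build_command(config_path, executable="PHLAWD"):
    """Build the argument list for 'PHLAWD assemble <config>'."""
    return [executable, "assemble", config_path]


def run_assemble(config_path, executable="PHLAWD"):
    """Run the assemble step and return the CompletedProcess.

    Assumption: the executable can be found on PATH under the given name.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    # check=True raises CalledProcessError if the exit status is non-zero.
    return subprocess.run(build_command(config_path, executable), check=True)
```

Calling `run_assemble("configfile")` is then equivalent to typing the command by hand; the verbose output goes to the terminal as usual.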

The output will be very verbose. After everything has finished, you will be left with one folder, in this case called rbcL2_TEMPFILES, and a few files. One file, rbcL2.db, holds all the alignment and sequence information. Two additional files, here called rbcL2.FINAL.aln and rbcL2.FINAL.aln.rn, contain the alignments. In the .rn file, the NCBI numbers have been replaced with human-readable names, so rbcL2.FINAL.aln.rn is probably the file you want. There is also a file called rbcL2.gi, which lists the GenBank taxon IDs, the GenBank IDs of the sequences that were used, and a human-readable taxon name.
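
The output names described above all derive from the `gene` setting, so a short script can confirm that a run produced everything. This is a hedged sketch: the helper names `expected_outputs` and `report_missing` are hypothetical, and it assumes the outputs sit directly in the directory you pass in, which may differ on your setup.

```python
from pathlib import Path


def expected_outputs(gene):
    """Map a short label to each output name derived from the gene prefix."""
    return {
        "temp_dir": f"{gene}_TEMPFILES",             # folder of temporary files
        "database": f"{gene}.db",                    # alignment and sequence information
        "alignment": f"{gene}.FINAL.aln",            # alignment with NCBI numbers
        "alignment_renamed": f"{gene}.FINAL.aln.rn", # alignment with readable names
        "gi_table": f"{gene}.gi",                    # taxon IDs, GenBank IDs, taxon names
    }


def report_missing(gene, directory="."):
    """Return the expected output names that do not exist in 'directory'."""
    base = Path(directory)
    return [
        name
        for name in expected_outputs(gene).values()
        if not (base / name).exists()
    ]
```

After a run with `gene = rbcL2`, `report_missing("rbcL2")` returning an empty list would suggest that all of the files described above are present.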

For more run options, please see the run options page.
