Quick start run
To do a basic run in PHLAWD you need two things: a file of sequences in FASTA format that identifies the gene region of interest (let's call it rbcL.keep for now) and a configuration file. At a minimum, the configuration file needs the following information:
# should match an ncbi taxon name
clade = campanulas
# searches the descriptions, for multiple separate with spaces or commas
search = rbcL
# the directory and prefix to all output file names
gene = rbcL2
# the threshold for breaking alignments, higher = less broken up
mad = 0.01
# this is the stringency of the coverage for homology
coverage = 0.4
# same as above but with identity
identity = 0.2
# location of sqlite genbank database
db = /media/data/pln.db
# file with fasta sequences identifying gene region of interest
knownfile = rbcL.keep
# number of threads for pthreads
numthreads = 4
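Before starting a long run it can help to check that the configuration file contains every key listed above. The following is a minimal Python sketch, not part of PHLAWD: the helper names (parse_config, missing_keys) are hypothetical, the parser assumes the simple "key = value" layout with "#" comment lines shown here, and treating all nine keys as required is an assumption based on this page rather than on PHLAWD's own parser.

```python
# Keys taken from the minimal configuration shown on this page.
# Assumption: all of them are needed; PHLAWD itself may be more lenient.
REQUIRED_KEYS = [
    "clade", "search", "gene", "mad", "coverage",
    "identity", "db", "knownfile", "numthreads",
]


def parse_config(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required keys that are absent, in the order listed above."""
    return [key for key in REQUIRED_KEYS if key not in settings]


# The example configuration from this page, used here as test input.
EXAMPLE = """\
# should match an ncbi taxon name
clade = campanulas
# searches the descriptions, for multiple separate with spaces or commas
search = rbcL
# the directory and prefix to all output file names
gene = rbcL2
mad = 0.01
coverage = 0.4
identity = 0.2
db = /media/data/pln.db
knownfile = rbcL.keep
numthreads = 4
"""

settings = parse_config(EXAMPLE)
print("missing keys:", missing_keys(settings))
```

To check a real file, read it first, for example with open("configfile").read(), and pass the text to parse_config.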
With the configuration file in place, you can then run:
PHLAWD assemble configfile
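If you want to launch the run from a script, for example to process several genes in a row, the command above can be wrapped in Python. This is only a sketch: build_command and run_assemble are hypothetical helper names, and it assumes the PHLAWD executable is on your PATH under exactly that name. It uses only the "assemble" subcommand shown above.

```python
import shutil
import subprocess


def build_command(config_path, executable="PHLAWD"):
    """Build the argument list for 'PHLAWD assemble <configfile>'."""
    return [executable, "assemble", str(config_path)]


def run_assemble(config_path, executable="PHLAWD"):
    """Run the assemble step and raise if PHLAWD exits with an error."""
    # Assumption: the executable is installed and reachable through PATH.
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(build_command(config_path, executable), check=True)


# Show the command that would be executed, without starting a run.
print(" ".join(build_command("configfile")))
```

Calling run_assemble("configfile") starts the actual run; PHLAWD's verbose output goes straight to the terminal because the output is not captured.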
The output is very verbose. When the run has finished you will be left with one folder, in this case called rbcL2_TEMPFILES, and a few files. rbcL2.db holds all the alignment and sequence information. Two additional files, here called rbcL2.FINAL.aln and rbcL2.FINAL.aln.rn, hold the alignments. In the .rn file the NCBI numbers have been replaced with human-readable names, so rbcL2.FINAL.aln.rn is probably the file you want. There is also a file called rbcL2.gi, which lists the GenBank taxon IDs, the GenBank IDs of the sequences that were used, and a human-readable taxon name.
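Because all output names are derived from the gene setting, a finished run can be checked by looking for the files described above. The sketch below is a hypothetical helper, not part of PHLAWD; it assumes the folder and files are written to the directory the run was started from, which may differ on your setup.

```python
from pathlib import Path


def expected_outputs(gene, workdir="."):
    """Map a short label to each output path described on this page."""
    base = Path(workdir)
    return {
        "tempdir": base / f"{gene}_TEMPFILES",
        "database": base / f"{gene}.db",
        "alignment": base / f"{gene}.FINAL.aln",
        "renamed_alignment": base / f"{gene}.FINAL.aln.rn",
        "gi_table": base / f"{gene}.gi",
    }


def missing_outputs(gene, workdir="."):
    """Return the labels of expected outputs that do not exist yet."""
    outputs = expected_outputs(gene, workdir)
    return [label for label, path in outputs.items() if not path.exists()]


# List the paths a run with 'gene = rbcL2' is expected to produce.
for label, path in expected_outputs("rbcL2").items():
    print(f"{label}: {path}")
```

After a run, an empty list from missing_outputs("rbcL2") suggests that everything described above was written.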
For more run options please see the run options page.