Quick start run
To run a basic analysis in PHLAWD, you need two things: a file of sequences in FASTA format that identify the gene region of interest (let's call it rbcL.keep for now) and a configuration file. At minimum, the configuration file needs the following information:
clade = campanulids # should match an ncbi taxon name
search = rbcL # searches the descriptions, for multiple separate with spaces or commas
gene = rbcL2 # the directory and prefix to all output file names
mad = 0.01 # the threshold for breaking alignments, higher = less broken up
coverage = 0.4 # this is the stringency of the coverage for homology
identity = 0.2 # same as above but with identity
db = /media/data/pln.db # location of sqlite genbank database
knownfile = rbcL.keep # file with fasta sequences identifying gene region of interest
numthreads = 4 # number of threads for pthreads
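Before starting a long run, it can help to confirm that the configuration file defines every setting listed above. The following is a minimal sketch in Python, not part of PHLAWD: the helper names parse_config and missing_keys are hypothetical, and the parsing rules (one "key = value" per line, with "#" starting a comment) are assumed from the example rather than taken from PHLAWD's own parser.

```python
# Settings named in the minimal example configuration above.
REQUIRED_KEYS = {
    "clade", "search", "gene", "mad", "coverage",
    "identity", "db", "knownfile", "numthreads",
}


def parse_config(text):
    """Parse 'key = value # comment' lines into a dict of strings.

    Assumption: '#' always starts a comment, as in the example file.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue  # skip blank and comment-only lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings):
    """Return the required settings that are absent, in sorted order."""
    return sorted(REQUIRED_KEYS - set(settings))
```

For example, missing_keys(parse_config(open("configfile").read())) returns an empty list when all nine settings are present.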
With this in place, you can run:
PHLAWD assemble configfile
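If you launch runs from a script, the same command can be wrapped in Python. This is only a sketch: build_command and run_assemble are hypothetical helper names, and it assumes the PHLAWD executable is on your PATH. The only PHLAWD invocation used is the "assemble" command shown above.

```python
import shutil
import subprocess


def build_command(config_path, executable="PHLAWD"):
    """Return the argument list for 'PHLAWD assemble <configfile>'."""
    return [executable, "assemble", config_path]


def run_assemble(config_path, executable="PHLAWD"):
    """Run the assemble step, failing early if the executable is missing."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    # check=True raises CalledProcessError if PHLAWD exits with an error code.
    return subprocess.run(build_command(config_path, executable), check=True)
```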
The output is very verbose. Once the run finishes, you will be left with one folder, in this case called rbcL2_TEMPFILES, and a few files. rbcL2.db holds all the alignment and sequence information. Two additional files, here called rbcL2.FINAL.aln and rbcL2.FINAL.aln.rn, contain the alignments. In the .rn file the sequences have been renamed with human-readable names instead of NCBI numbers, so rbcL2.FINAL.aln.rn is probably the file you want. There is also a file called rbcL2.gi, which lists the GenBank taxon IDs, the GenBank IDs of the sequences that were used, and a human-readable taxon name for each.
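Because every output name is derived from the gene setting, a short script can check that a run left behind everything described above. The sketch below is illustrative only: expected_outputs and missing_outputs are hypothetical helpers, and it assumes the outputs are written to the directory passed as workdir (the current directory by default), which you may need to adjust.

```python
from pathlib import Path


def expected_outputs(gene, workdir="."):
    """Map a short label to each output path described above."""
    base = Path(workdir)  # assumption: outputs land in this directory
    return {
        "tempdir": base / f"{gene}_TEMPFILES",
        "database": base / f"{gene}.db",
        "alignment": base / f"{gene}.FINAL.aln",
        "alignment_renamed": base / f"{gene}.FINAL.aln.rn",
        "gi_table": base / f"{gene}.gi",
    }


def missing_outputs(gene, workdir="."):
    """Return the labels of expected outputs that do not exist yet."""
    return [
        label
        for label, path in expected_outputs(gene, workdir).items()
        if not path.exists()
    ]
```

With gene = rbcL2, missing_outputs("rbcL2") returns an empty list once all five outputs are present.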
For more run options, please see the run options page.