New feature: advanced queries with SQL
PHLAWD now has the ability to accept raw SQL query strings to use when searching the source database for potential sequences of interest. This provides a great deal more power than the default search functionality that simply matches against a set of terms. Now one can use exclusion, be explicit about the treatment of flanking whitespace, use all available SQL wildcards, and use boolean operators such as AND and OR, among other things.
Of course, with great power comes great responsibility…
For more information, see the manual page for SQL queries.
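To show what these operators add over plain term matching, here is a minimal sketch using Python's built-in sqlite3 module on a throwaway in-memory table. The table name `sequences`, its `description` column, and the helper `run` are hypothetical stand-ins, not PHLAWD's actual schema or query format. The manual page describes the real fields.

```python
import sqlite3

# Hypothetical stand-in for the source database; PHLAWD's real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO sequences (description) VALUES (?)",
    [
        ("Quercus alba rbcL gene, partial cds",),
        ("Quercus rubra rbcLa gene, partial cds",),
        ("Pinus taeda matK gene, complete cds",),
        ("Pinus taeda rbcL pseudogene",),
    ],
)

def run(where: str) -> list[str]:
    """Return matching descriptions, in insertion order, for a WHERE clause.

    Only meant for trusted, hard-coded clauses like the ones below.
    """
    sql = f"SELECT description FROM sequences WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

# Plain substring match, similar in spirit to term matching: 'rbcLa' slips in.
loose = run("description LIKE '%rbcL%'")

# Explicit flanking spaces drop 'rbcLa'; NOT LIKE excludes pseudogenes.
strict = run("description LIKE '% rbcL %' AND description NOT LIKE '%pseudogene%'")

# '_' matches exactly one character, so this finds 'rbcLa' but not 'rbcL'.
variant = run("description LIKE '% rbcL_ %'")

# OR combines searches for two markers.
either = run("description LIKE '% rbcL %' OR description LIKE '% matK %'")

for name, rows in [("loose", loose), ("strict", strict),
                   ("variant", variant), ("either", either)]:
    print(name, rows)
```

Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so '%rbcl%' would match the same rows as '%rbcL%'.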
Outlier detection
PHLAWD now has some procedures for outlier detection. The information can be found here. As always, run git pull, make, and sudo make install.
No more phyutility dependency
The most recent commit removes the dependency on phyutility, which should make installation that much easier. I am still testing to make sure the functionality is unchanged, but so far so good. Run git pull and make, and try it out.
New division codes
It can be helpful to combine divisions when creating the database. I have added two codes to do this:
met – all metazoan sequences, divisions 1–5:
1. pri – primate sequences
2. rod – rodent sequences
3. mam – other mammalian sequences
4. vrt – other vertebrate sequences
5. inv – invertebrate sequences
all – all divisions from 1–7 (be careful, this is more than 30 GB):
1. pri – primate sequences
2. rod – rodent sequences
3. mam – other mammalian sequences
4. vrt – other vertebrate sequences
5. inv – invertebrate sequences
6. pln – plant, fungal, and algal sequences
7. bct – bacterial sequences
To use this functionality, you need to git pull the latest version.
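The combined codes are shorthands for fixed lists of divisions. A minimal Python sketch of the expansion follows; the names `SINGLE_DIVISIONS`, `COMBINED_DIVISIONS`, and `expand_division` are hypothetical and this is not PHLAWD's own implementation. Note that, despite its name, all covers only divisions 1–7, not every GenBank division.

```python
# GenBank division codes in the numbered order used on this site.
SINGLE_DIVISIONS = (
    "pri", "rod", "mam", "vrt", "inv", "pln", "bct", "vrl", "phg",
    "syn", "una", "est", "pat", "sts", "gss", "htg", "htc", "env",
)

# Combined codes: met = divisions 1-5, all = divisions 1-7.
COMBINED_DIVISIONS = {
    "met": SINGLE_DIVISIONS[:5],
    "all": SINGLE_DIVISIONS[:7],
}

def expand_division(code: str) -> list[str]:
    """Expand a (possibly combined) division code into GenBank divisions."""
    if code in COMBINED_DIVISIONS:
        return list(COMBINED_DIVISIONS[code])
    if code in SINGLE_DIVISIONS:
        return [code]
    raise ValueError(f"unknown division code: {code!r}")

print(expand_division("met"))  # ['pri', 'rod', 'mam', 'vrt', 'inv']
print(expand_division("all"))
```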
Uploaded the link to the inv database
I uploaded the inv database for the most recent release. Find it here.
Updated the link to the pln database
I updated the pln database to the most recent release. Find it here.
Building Genbank database
I just committed to git the ability to create the GenBank database with the PHLAWD program itself instead of the Python scripts. This is much faster, thanks to a homemade GenBank flat-file parser. To use it, pull the most recent source (you can see how to install from source here). Then, after compilation and installation, create a simple setup file, let's call it db.setup, containing:
db = pln.db
division = pln
download
Here, db = pln.db is the output file. division = pln is the GenBank division that you want to build (see all the divisions below). download means that you want the files downloaded because you have not already done so. Then run
PHLAWD setupdb db.setup
It will download the necessary files and load them into the database named in db.setup. The only additional functionality planned is support for downloads more general than whole divisions.
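The setup file is a short list of settings. As a rough illustration of how such a file can be read, here is a Python sketch assuming one setting per line, either key = value or a bare flag such as download. The function `parse_setup` is hypothetical; PHLAWD's own parser may accept other keys or syntax.

```python
def parse_setup(text: str) -> dict[str, object]:
    """Parse a db.setup-style file: 'key = value' lines and bare flags."""
    settings: dict[str, object] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
        else:
            # A bare word such as 'download' is treated as a boolean flag.
            settings[line] = True
    return settings

print(parse_setup("db = pln.db\ndivision = pln\ndownload\n"))
# {'db': 'pln.db', 'division': 'pln', 'download': True}
```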
The divisions are:
1. pri – primate sequences
2. rod – rodent sequences
3. mam – other mammalian sequences
4. vrt – other vertebrate sequences
5. inv – invertebrate sequences
6. pln – plant, fungal, and algal sequences
7. bct – bacterial sequences
8. vrl – viral sequences
9. phg – bacteriophage sequences
10. syn – synthetic sequences
11. una – unannotated sequences
12. est – EST sequences (expressed sequence tags)
13. pat – patent sequences
14. sts – STS sequences (sequence tagged sites)
15. gss – GSS sequences (genome survey sequences)
16. htg – HTG sequences (high-throughput genomic sequences)
17. htc – unfinished high-throughput cDNA sequencing
18. env – environmental sampling sequences
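Each record in a GenBank flat file names its division in upper case on its LOCUS line (for example PLN), just before the date. As a rough illustration of the kind of parsing involved, and not PHLAWD's own parser, here is a Python sketch that splits a flat file into records at the // terminator and reads each record's division code. It assumes the division is the second-to-last whitespace-separated token of the LOCUS line; the functions and the sample records are made up for illustration.

```python
from typing import Iterator

# GenBank division codes (lower case), as listed above.
DIVISIONS = {
    "pri", "rod", "mam", "vrt", "inv", "pln", "bct", "vrl", "phg",
    "syn", "una", "est", "pat", "sts", "gss", "htg", "htc", "env",
}

def iter_records(text: str) -> Iterator[str]:
    """Yield each record from its LOCUS line up to (not including) '//'.

    Lines before the first LOCUS line, such as a release-file header, are
    skipped, and a trailing record with no '//' terminator is dropped.
    """
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            current = [line]
        elif line.startswith("//"):
            if current:
                yield "\n".join(current)
            current = []
        elif current:
            current.append(line)

def record_division(record: str) -> str:
    """Return the lower-case division code from a record's LOCUS line."""
    locus = record.splitlines()[0]
    # Assumption: the division is the token just before the trailing date.
    code = locus.split()[-2].lower()
    if code not in DIVISIONS:
        raise ValueError(f"unrecognized division {code!r} in: {locus}")
    return code

# Made-up two-record example, not real GenBank entries.
sample = """\
LOCUS       EXAMPLE1                1000 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Example plant record.
//
LOCUS       EXAMPLE2                 500 bp    mRNA    linear   INV 02-JAN-2000
DEFINITION  Example invertebrate record.
//
"""

print([record_division(r) for r in iter_records(sample)])  # ['pln', 'inv']
```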
New home
PHLAWD has a new permanent home that is independent of the code repository.
Major update
PHLAWD has just undergone a nearly complete rewrite. Instead of files, it uses SQLite databases to store the information. It also removes the dependencies on quicktree and phyutility, for an easier install. I have also switched to the autoconf/automake system for easier compilation and installation. I am adding the relevant information to the other pages.

