Sequence similarity
To choose good values for coverage and identity, it helps to look at how the BLAST hits are actually distributed. One useful command for this is
PHLAWD seqquery runfile.phlawd
This runs the first steps of calculating the best hits from the BLAST search and reports their distributions so you can inspect them. When it finishes, look in the genename.seqquery file, which has two columns: 1) the best identity score and 2) the coverage for that identity.
You can plot these in any software, but something like R works well.
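If you would rather inspect the numbers before opening a plotting tool, the file can also be summarized with a few lines of Python. This is only a minimal sketch: it assumes the two columns are whitespace-separated and that both values are fractions between 0 and 1, neither of which is stated above. The function names read_seqquery and bin_counts are made up for this example and are not part of PHLAWD.

```python
def read_seqquery(path):
    """Return a list of (identity, coverage) pairs from a two-column text file.

    Assumption: columns are separated by spaces or tabs.
    """
    pairs = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or incomplete lines
            try:
                pairs.append((float(fields[0]), float(fields[1])))
            except ValueError:
                continue  # skip lines that are not numeric, e.g. a header
    return pairs


def bin_counts(values, nbins=10):
    """Count how many values fall into each of nbins equal bins over [0, 1].

    Assumption: values are fractions; anything outside [0, 1] is clamped
    into the first or last bin.
    """
    counts = [0] * nbins
    for value in values:
        index = int(value * nbins)
        index = max(0, min(index, nbins - 1))
        counts[index] += 1
    return counts


def print_histogram(values, nbins=10):
    """Print a rough text histogram, one row per bin."""
    for i, count in enumerate(bin_counts(values, nbins)):
        low = i / nbins
        print(f"{low:4.2f}+ {'#' * count} ({count})")
```

For example, pairs = read_seqquery("genename.seqquery") followed by print_histogram([p[0] for p in pairs]) prints the identity distribution, and using p[1] instead prints the coverage distribution. A scatter plot of the two columns is still the better way to see clusters.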

You can see a cluster at the bottom with low identity and low coverage, and then hits with variable identity and high coverage at the top. You would then set your cutoffs to something like
coverage = 0.2
identity = 0.2
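To get a feel for how much a given pair of cutoffs would keep, you can count the hits that pass both. The sketch below is an illustration only: the function name fraction_passing is hypothetical, the example values are invented rather than real PHLAWD output, and treating a hit exactly at the cutoff as passing is an assumption about how PHLAWD applies the thresholds.

```python
def fraction_passing(pairs, identity_cutoff, coverage_cutoff):
    """Fraction of (identity, coverage) pairs at or above both cutoffs.

    Assumption: a value equal to the cutoff counts as passing.
    """
    if not pairs:
        return 0.0
    kept = sum(
        1
        for identity, coverage in pairs
        if identity >= identity_cutoff and coverage >= coverage_cutoff
    )
    return kept / len(pairs)


# Invented values for illustration; in practice use the two columns
# read from the genename.seqquery file.
example = [(0.05, 0.10), (0.10, 0.15), (0.45, 0.90), (0.80, 0.95)]

for cutoff in (0.1, 0.2, 0.3):
    share = fraction_passing(example, cutoff, cutoff)
    print(f"coverage = {cutoff} identity = {cutoff} keeps {share:.0%} of hits")
```

If the fraction kept barely changes across a range of cutoffs, the cutoffs are probably sitting in the gap between the low cluster and the rest, which is where you want them.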
