Quick Standalone BLAST Setup for Ubuntu Linux

Some people run into trouble trying to set up a standalone version of BLAST using the NCBI instructions. Here is a streamlined process, targeted at Ubuntu.

I assume that you are familiar with the basics of BLAST, meaning that there are several executables for searching nucleic acids or proteins and different databases you can search against. If not, you should read up on the available search tools and databases before you attempt to install BLAST. NB: throughout this document I am using protein BLAST and protein input. Changing to nucleotide sequences is trivial: you just change blastp to blastn and ‘prot’ to ‘nucl’ in the obvious places (and of course you use different queries and target databases, e.g. nt instead of nr).
Without further ado, Blast setup for UNIX.
There are two components for the installation:
  1. Executables (blastn, blastp etc.)
  2. Databases (nr, nt etc.)
Both are described below with follow-up examples of usage.
Ad.1 The executables can be downloaded and compiled from here (download the source, then run ./configure, make and finally make install in the directory of the untarred file). However, a much easier way to do it under Ubuntu is:
sudo apt-get install ncbi-blast+
This automatically installs everything. In both cases to check if all went ok, type:
which blastp
If you get a path such as /usr/bin/blastp (or /usr/local/bin/blastp if you compiled from source) then all went well and that’s where your executables are.
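If you want to check more than one executable at once, a small loop does the job. This is only a sketch: check_tools is a helper name made up for this post, and the list of tools assumes the package installs them under exactly these names.

```shell
# Print where each tool lives on the PATH, or flag it as missing.
# Returns non-zero if at least one tool could not be found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf '%-16s %s\n' "$tool" "$location"
        else
            printf '%-16s MISSING\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools blastp blastn makeblastdb update_blastdb \
    || echo "Some tools are missing; re-check the install step."
```

Anything reported as MISSING means the install did not put that executable on your PATH.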
Ad.2 First, you need to decide where to store the databases. Do this by setting the environment variable:
export BLASTDB=/path/to/blastdbs/of/your/choosing
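Note that export only lasts for the current shell session. A minimal sketch for making the setting stick is shown here; set_blastdb is a made-up helper name, and ~/.bashrc as the default rc file is an assumption (it is what a stock Ubuntu bash setup reads).

```shell
# Create the database directory, export BLASTDB for this shell, and
# record the export in an rc file so that new shells pick it up too.
set_blastdb() {
    dir=$1
    rcfile=${2:-$HOME/.bashrc}   # assumption: bash is your login shell
    mkdir -p "$dir" || return 1
    export BLASTDB="$dir"
    line="export BLASTDB=\"$dir\""
    # Append only once, so repeated runs do not duplicate the line.
    grep -qxF -- "$line" "$rcfile" 2>/dev/null \
        || printf '%s\n' "$line" >> "$rcfile"
}
```

You would call it as set_blastdb /path/to/blastdbs/of/your/choosing, using the same directory as above.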
Now, we can either use one of the ncbi-curated databases or create our own. We will do both.
A) Downloading and using an ncbi-curated database.
The databases can be downloaded using the update_blastdb script. As an example I will download the non-redundant protein database, which is referred to as ‘nr’:
cd $BLASTDB
update_blastdb --passive --timeout 300 --force --verbose nr
ls *.gz |xargs -n1 tar -xzvf
rm -f *.tar.gz *.tar.gz.md5

The penultimate command extracts all the files you have downloaded and the last one removes the downloaded archives.
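If you would rather not delete an archive that failed to unpack (nr is a big download), a slightly more defensive variant of these two steps could look like the sketch below. extract_archives is a made-up name, and the function assumes the downloaded files end in .tar.gz with an optional .md5 file next to each.

```shell
# Unpack every .tar.gz in the current directory and delete an archive
# (plus its .md5 file, if present) only when extraction succeeded.
extract_archives() {
    for archive in *.tar.gz; do
        [ -e "$archive" ] || continue   # the glob matched nothing
        if tar -xzf "$archive"; then
            rm -f "$archive" "$archive.md5"
        else
            echo "extraction failed, keeping $archive" >&2
        fi
    done
}
```

Run it from inside $BLASTDB once update_blastdb has finished.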

Now you should be able to use your new database by executing (where somesequence.fasta is your sample query):

blastp -db nr -query somesequence.fasta

Done.
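By default the results go to the screen in the long pairwise format. For anything you want to process afterwards, a few extra options help. The wrapper below is only a sketch: run_blastp and results.tsv are made-up names, and the e-value cut-off and thread count are arbitrary choices that you should adjust.

```shell
# Run blastp and write tabular output (-outfmt 6) to a file.
run_blastp() {
    query=$1
    db=$2
    out=${3:-results.tsv}
    blastp -db "$db" -query "$query" \
           -evalue 1e-5 \
           -outfmt 6 \
           -num_threads 2 \
           -out "$out"
}

# Only run if blastp and the sample query are actually there.
if command -v blastp >/dev/null 2>&1 && [ -f somesequence.fasta ]; then
    run_blastp somesequence.fasta nr
else
    echo "blastp or somesequence.fasta not found; nothing was run." >&2
fi
```

The tabular format gives one hit per line with the columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore.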

B) Creating your own database.

Firstly, put a bunch of fasta protein sequences into a file called sample.fa
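Before building the database it does no harm to check that the file really is FASTA, i.e. that every sequence starts with a header line beginning with ‘>’. A quick sketch (count_fasta_records is a made-up helper name):

```shell
# Count the FASTA records in a file by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

if [ -f sample.fa ]; then
    echo "sample.fa contains $(count_fasta_records sample.fa) sequence(s)"
else
    echo "sample.fa not found in $(pwd)" >&2
fi
```

If the count is 0, the file has no header lines and is not in FASTA format.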

Next, execute the following

makeblastdb -in sample.fa -dbtype 'prot' -out NewDb
mv NewDb* $BLASTDB/

We have now created a BLAST protein database, called NewDb, from your fasta file. The last line simply moves all the BLAST files to the database directory.

Now you should be able to use your new database by executing (where somesequence.fasta is your sample query):

blastp -db NewDb -query somesequence.fasta

Done.

Afterword

These instructions are the shortest way I could find to get a working stand-alone BLAST application. If you require more info, you can look here.

 


2 thoughts on “Quick Standalone BLAST Setup for Ubuntu Linux”

  1. Andrej Pangerčič

    Hey,

    I have a small comment regarding this segment:

    A) Downloading and using an ncbi-curated database.
    The databases can be downloaded using the update_blastdb script. As an example I will download a non redundant protein database which is referred to as ‘nr’:
    cd $BLASTDB
    sudo update_blastdb --passive --timeout 300 --force --verbose nr
    Here you are not running the script that you mentioned above, but calling the installed program.

    Secondly, please remove sudo, because you do not need root access to load stuff from FTP to your local PC! If you want to run the script that you downloaded, you need to add execute privilege to the “update_blastdb.pl” file with the command “chmod u+x update_blastdb.pl” and run it with the command:
    ./update_blastdb.pl --passive --timeout 300 --force --verbose nr

    Also, one more question. Is it possible to run BLAST with just nr.00 and nr.01, without having the whole database downloaded? I tried to run it, but I got an error that nr.02 is missing. Is there a way to tell it that my database is just two nr archives long?

    Thanks for sharing this blog and hoping to get reply soon.
