A tool kit for accessing NCBI's GenBank
NCBI Tool Kit
A tool kit for downloading and curating collections of genomes retrieved from the National Center for Biotechnology Information’s public database, GenBank.
- Automatically synchronize your local collection with the latest assembly versions.
- Give FASTAs useful names based on information available in the assembly summary file and the taxonomy dump file.
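To illustrate the renaming idea, here is a minimal sketch of how a descriptive FASTA name could be assembled from assembly summary fields. The field names (`assembly_accession`, `organism_name`, `infraspecific_name`, `assembly_level`) match column headers in NCBI's assembly summary file, but the helper names `clean` and `fasta_name` and the naming scheme itself are assumptions made for illustration, not necessarily what NCBITK does.

```python
import re


def clean(field):
    """Replace runs of characters that are awkward in file names with '_'."""
    return re.sub(r"[^A-Za-z0-9.-]+", "_", field).strip("_")


def fasta_name(record):
    """Build a file name from one assembly summary row (hypothetical scheme)."""
    strain = record.get("infraspecific_name", "")
    # The summary file stores strains as "strain=K-12"; keep only the value.
    if strain.startswith("strain="):
        strain = strain[len("strain="):]
    parts = [
        record["assembly_accession"],
        record["organism_name"],
        strain,
        record.get("assembly_level", ""),
    ]
    cleaned = [clean(part) for part in parts]
    # Drop fields that are empty after cleaning so no double underscores appear.
    return "_".join(part for part in cleaned if part) + ".fasta"


if __name__ == "__main__":
    row = {
        "assembly_accession": "GCA_000005845.2",
        "organism_name": "Escherichia coli",
        "infraspecific_name": "strain=K-12",
        "assembly_level": "Complete Genome",
    }
    print(fasta_name(row))
```

Keeping the accession first means the renamed files still sort and match against the assembly summary.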
Requires rsync. Tested only with rsync version 3.1.2 (protocol version 31).
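If you are unsure which rsync you have, a small check like the following can help. This is a sketch that is not part of NCBITK; the function names are hypothetical, and it assumes `rsync --version` prints a first line of the form `rsync  version 3.1.2  protocol version 31`.

```python
import re
import shutil
import subprocess


def parse_rsync_version(text):
    """Return (version, protocol) parsed from `rsync --version` output, or None."""
    match = re.search(
        r"version\s+v?(\d+(?:\.\d+)*)\s+protocol version\s+(\d+)", text
    )
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def installed_rsync_version():
    """Return the parsed version of the rsync found on PATH, or None."""
    if shutil.which("rsync") is None:
        return None
    result = subprocess.run(
        ["rsync", "--version"], capture_output=True, text=True, check=False
    )
    return parse_rsync_version(result.stdout)


if __name__ == "__main__":
    found = installed_rsync_version()
    if found is None:
        print("rsync not found, or its version output was not recognized")
    else:
        print("rsync version %s, protocol version %d" % found)
```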
pip install ncbitk
git clone https://github.com/andrewsanchez/NCBITK.git
cd NCBITK
python setup.py install
Regardless of which installation method you choose, I recommend using a virtual environment.
Download all GenBank bacteria:
ncbitk [directory] --update
If you have already run NCBITK, the same command also updates your local collection: it removes old genomes that are no longer in the assembly summary and downloads the latest assembly versions.
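The update logic described above amounts to comparing two sets of assembly accessions. The sketch below shows that comparison under stated assumptions: the assembly summary is tab-separated with `#`-prefixed header lines and the accession in the first column, and the function names `read_accessions` and `plan_update` are hypothetical rather than NCBITK's actual API. The accessions in the example are made up.

```python
def read_accessions(summary_lines):
    """Collect accessions from the first tab-separated column, skipping '#' headers."""
    accessions = set()
    for line in summary_lines:
        if line.startswith("#") or not line.strip():
            continue
        accessions.add(line.split("\t", 1)[0].strip())
    return accessions


def plan_update(local, latest):
    """Decide what to remove, download, and keep given two sets of accessions."""
    return {
        "remove": sorted(local - latest),    # present locally, gone from the summary
        "download": sorted(latest - local),  # in the summary, missing locally
        "keep": sorted(local & latest),      # already up to date
    }


if __name__ == "__main__":
    summary = [
        "# assembly_accession\tbioproject\n",
        "GCA_000000001.2\tPRJNA1\n",
        "GCA_000000002.1\tPRJNA2\n",
    ]
    local = {"GCA_000000001.1", "GCA_000000002.1"}
    print(plan_update(local, read_accessions(summary)))
```

Because the version suffix is part of the accession, a superseded assembly such as `GCA_000000001.1` lands in `remove` while its successor `GCA_000000001.2` lands in `download`.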
Get the status of your collection:
ncbitk [directory] --status
This will tell you how many genomes you have, what is missing from your collection, and how many deprecated genomes are present.