# bioscraping

Scrape biological data from websites.
Python web scrapers for interacting with remote biological databases programmatically. Results are cached locally with sqlite3 to prevent excessive web traffic.
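The caching idea can be sketched as follows. This is a minimal illustration of the pattern, not the package's actual implementation; the table layout and the `fetch_with_cache` helper are hypothetical:

```python
import sqlite3

def fetch_with_cache(db_path, key, fetch_remote):
    """Return data for `key`, consulting a local sqlite3 cache first.

    `fetch_remote` is a callable performing the actual web request;
    it is only invoked on a cache miss.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:  # cache hit: no web traffic
        conn.close()
        return row[0]
    value = fetch_remote(key)  # cache miss: hit the network once
    conn.execute("INSERT INTO cache (key, value) VALUES (?, ?)", (key, value))
    conn.commit()
    conn.close()
    return value
```

Repeated queries for the same ID then read from the local database instead of re-fetching from the web.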
So far, implemented:

- [UniProt](http://uniprot.org), queried by UniProt protein ID (e.g. `Q8BP71`)
- [PubMed](http://www.ncbi.nlm.nih.gov/pubmed/), queried by PMID (e.g. `24213538`)
## Installation

Supports Python 2.7.x and 3.x. Install with `pip install .`.
## Testing

Real unit tests are absent, but you can check basic functionality with `python test/not_a_real_test.py`.
## Usage

### PubMed

    from bioscraping import PubMedClient
    pubmed = PubMedClient()

By default this writes a cache file called `.bioscraping.pubmed.sqlite.db`. Queries return text with the authors and abstract for a given PMID.

### UniProt

    from bioscraping import UniprotClient
    uniprot = UniprotClient()

By default this writes a cache file called `.bioscraping.uniprot.sqlite.db`. Queries return a dictionary of data parsed from the UniProt XML.
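The XML-to-dictionary step might look like the following minimal sketch using the standard library's `xml.etree.ElementTree`. The sample fragment and resulting keys are purely illustrative; real UniProt XML is namespaced and much richer, and the client's actual output format may differ:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only -- not the real UniProt schema.
SAMPLE = """
<entry>
  <accession>Q8BP71</accession>
  <name>EXAMPLE_MOUSE</name>
</entry>
"""

def parse_entry(xml_text):
    """Parse an entry element into a plain dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}
```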
## Known issues

`UniprotClient` has a potential race condition: `tempfile` support needs to be implemented before it is safe to use from concurrent processes (see TODO).
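A common fix for this kind of race is to write to a temporary file in the same directory and then atomically rename it into place. A sketch of that pattern (not the package's code; `os.replace` requires Python 3.3+, while on 2.7 `os.rename` is atomic on POSIX only):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write `data` to `path` so concurrent readers never see a partial file.

    The content goes to a temporary file in the same directory, then is
    moved into place with os.replace(), which is an atomic operation.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic swap into the final name
    except Exception:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```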