Skip to main content
Warning: You are using the test version of PyPI. This is a pre-production deployment of Warehouse. Changes made here affect the production instance of TestPyPI (testpypi.python.org).
Help us improve Python packaging - Donate today!

Cuckoo Filter implementation in Python

Project Description

Cuckoo Filter, like Bloom Filter, is a probabilistic data structure for fast, approximate set membership queries, with some small false positive probability. While Bloom Filters are space efficient and are widely used, they do not support deletion of items from the set without rebuilding the entire filter. This can be overcome with several extensions to Bloom Filters such as Counting Bloom Filters, but with significant space overhead.

Cuckoo Filters support adding and removing items dynamically while achieving higher performance than Bloom filters. A Cuckoo Filter is based on partial-key cuckoo hashing that stores only fingerprint of each item inserted. Cuckoo Filters provide higher lookup performance than Bloom Filters and uses less space than Bloom Filters if the target false positive rate is < 3%.

The original research paper Cuckoo Filter: Practically Better Than Bloom by Bin Fan, David G. Andersen, Michael Kaminsky and Michael D. Mitzenmacher describes the data structure in more detail.

Installation

Make sure you have Python (3.5+) installed on your system. If you don’t have it, follow these instructions to install it.

Install cuckoopy using:

$ pip install cuckoopy

Usage

>>> from cuckoopy import CuckooFilter
# Initialize a cuckoo filter with 10000 buckets with bucket size 4 and fingerprint size of 1 byte
>>> cf = CuckooFilter(capacity=10000, bucket_size=4, fingerprint_size=1)

Insert an item into the filter:

>>> cf.insert('Hello!')
True

Lookup an item in the filter:

>>> cf.contains('Hello!')
True
>>> 'Hello!' in cf
True

Delete an item from the filter:

>>> cf.delete('Hello!')
True

Get the size (number of items present) of the filter:

>>> cf.size
4
>>> len(cf)
4

Running tests locally

This project uses pytest for tests. Make sure you have tox installed on your local machine and from the root directory of the project, run:

$ tox

This command runs unit tests in python 3.5 and python 3.6 environments with code coverage details. It also runs pep8 (flake8) checks. To run tox against a specific environment (py35, py36 or pep8), use the -e option.

License

MIT License

Release History

Release History

This version
History Node

0.1.0a0

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting