Graphic and text stem-and-leaf plots
John Tukey’s stem-and-leaf plot first appeared in 1970. Although very useful back then, it cannot handle more than 300 data points and is completely text-based. Stemgraphic is a very easy to use python package providing a solution to these limitations.
A typical stem_graphic output:
For an in depth look at
Stemgraphic requires docopt, matplotlib and pandas. Optionally, having Scipy installed will give you secondary plots and Dask (see requirements_dev.txt for all needed to run all the functional tests) will allow for out of core, big data visualization.
Installation is simple:
pip3 install -U stemgraphic
or from this cloned repository, in the package root:
python3 setup.py install
Persist sample from command line tool (-k filename.pkl or -k filename.csv).
Windows compatible bat file wrapper (stem.bat).
Added full command line access to dask distributed server (-d, -s, use file in ‘’ when using glob / wildcard).
For operations with dask, performance has been increased by 25% in this latest release, by doing a compute once of min, max and count all at once. Count replaces len(x).
Added the companion PDF as it will be presented at PyData Carolinas 2016.
Plenty… but to start:
- back to back and scale calculation
- multivariate support
- provide support for secondary plots with dask
- automatic dense layout
- add a way to provide an alternate function to the sampling
- support for spark rdds and/or sparkling pandas
- create a bokeh version. Ideally rbokeh too.
- interactive version based on the above
- add unit tests
- add feather, hdf5 etc support, particularly on sample persistence