Skip to main content
Warning: You are using the test version of PyPI. This is a pre-production deployment of Warehouse. Changes made here affect the production instance of TestPyPI (
Help us improve Python packaging - Donate today!

SentencePiece python wrapper

Project Description

# SentencePiece Python Wrapper

Python wrapper for SentencePiece with SWIG. This module wrapps sentencepiece::SentencePieceProcessor class with the following modifications: * Encode and Decode methods are re-defined as EncodeAsIds, EncodeAsPieces, DecodeIds and DecodePieces respectevely. * SentencePieceText proto is not supported. * Added __len__ and __getitem__ methods. len(obj) and obj[key] returns vocab size and vocab id respectively.

## Build and Install SentencePiece You need to install SentencePiece before before installing this python wrapper.

Please use pip comand to install SentencePiece python module.

` % pip install sentencepiece `

You can install SentencePiece python module manually as follows: ` % python build % sudo python install `

If you don’t have write permission to the global site-packages directory or don’t want to install into it, please try: ` % python install --user `

## Usage

` % python >>> import sentencepiece as spm >>> sp = spm.SentencePieceProcessor() >>> sp.Load("test/test_model.model") True >>> sp.EncodeAsPieces("This is a test") ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'] >>> sp.EncodeAsIds("This is a test") [284, 47, 11, 4, 15, 400] >>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']) 'This is a test' >>> sp.DecodeIds([284, 47, 11, 4, 15, 400]) 'This is a test' >>> sp.GetPieceSize() 1000 >>> sp.IdToPiece(2) '</s>' >>> sp.PieceToId('</s>') 2 >>> len(sp) 1000 >>> sp['</s>'] 2 `

Release History

Release History

This version
History Node


Download Files

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
sentencepiece-0.0.0-py2.7-linux-x86_64.egg (99.0 kB) Copy SHA256 Checksum SHA256 2.7 Egg Aug 28, 2017
sentencepiece-0.0.0.tar.gz (182.3 kB) Copy SHA256 Checksum SHA256 Source Aug 28, 2017

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting