Generic package to run SQL/Scala/Python queries/jobs on Databricks using the Databricks REST API
Databricks REST API
This package requires a Databricks account. To sign up, visit https://databricks.com/try-databricks
This package allows the user to perform the following actions on Databricks using the REST API:
- Run commands remotely using Scala/Python/Spark SQL
- Create and run jobs remotely
The following classes are defined:
- DatabricksQuerySet: a generic class to run SQL/Scala/Python queries on Databricks using the Databricks REST API (currently supports REST API version 1.2).
- RunJob: creates a job using REST API version 2.0, with an option to run the job.
Authentication and Account
The package requires authentication. Setting the environment variables below is required to run the tests and is the recommended way of using this package.
Store password and username as environment variables:
- export DATABRICKS_PASSWORD="mypassword"
- export DATABRICKS_USERNAME="myemail"
Required Databricks account details:
- export DATABRICKS_DOMAIN="url-to-databricks-account"
- export ADMIN_EMAIL="admin-email" (the email address of the admin who receives notifications about the job)
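As a minimal sketch of how the four variables above could be read and checked from Python before use: the helper name load_databricks_settings and the REQUIRED_VARS tuple are hypothetical and are not part of this package; only the variable names come from this README.

```python
import os

# Variable names taken from this README; the helper itself is illustrative.
REQUIRED_VARS = (
    "DATABRICKS_USERNAME",
    "DATABRICKS_PASSWORD",
    "DATABRICKS_DOMAIN",
    "ADMIN_EMAIL",
)


def load_databricks_settings(environ=None):
    """Return the required settings as a dict, or raise if any are missing."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling load_databricks_settings() with no arguments reads from the current process environment; passing a plain dict is convenient for testing without touching real credentials.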
To use DatabricksQuerySet:
- Create an instance of DatabricksQuerySet and provide username and password, e.g. dq = DatabricksQuerySet(username, password)
- Call the run_databricks_command method to submit a query.
To use RunJob:
- Create an instance of RunJob and provide username, password and job name, e.g. job = RunJob(username, password, "Test")
- Call the run_job_now method to execute the job.
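The RunJob steps above might be wired to the environment variables roughly as follows. This is a sketch under assumptions: the helper run_job_from_env is hypothetical, the class is passed in as an argument because this README does not state the package's import path, and run_job_now is assumed to take no arguments since none are documented here.

```python
import os


def run_job_from_env(run_job_cls, job_name, environ=None):
    """Build a RunJob-style object from environment credentials and run it.

    run_job_cls is expected to accept (username, password, job_name), as in
    the README example; whatever run_job_now returns is passed through as-is.
    """
    environ = os.environ if environ is None else environ
    username = environ["DATABRICKS_USERNAME"]
    password = environ["DATABRICKS_PASSWORD"]
    job = run_job_cls(username, password, job_name)
    # Assumption: run_job_now needs no arguments.
    return job.run_job_now()
```

Once RunJob has been imported from this package, run_job_from_env(RunJob, "Test") would mirror the two steps listed above.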
Note: authentication credentials must be configured as environment variables.
The tests cover basic functionality of the DatabricksQuerySet and RunJob classes:
- To run the tests: $ python src/tests/test_simple.py