Connect to Query Engines
Overview
Querybook supports all SQLAlchemy-compatible query engines by default. Basic functionality such as query execution, table metadata, and auto-completion is provided out of the box. However, more advanced integrations require custom code. Overall, the query engines fall into three tiers:
Tier | Tier 3 | Tier 2 | Tier 1 |
---|---|---|---|
Summary | Not tested | Tested w/ DB | Used in Production |
Library | SQLAlchemy | Custom/SQLAlchemy | Custom |
Run Queries | ✓ | ✓ | ✓ |
Paginated Result Fetch | ✓ | ✓ | ✓ |
Syntax highlight & Autocomplete | ✓ | ✓ | ✓ |
Query Progress | x | ? | ✓ |
Query Logs | x | ? | ✓ |
Query Metadata | x | ? | ✓ |
Cancel Query | x | ? | ✓ |
User Authentication | x | x | ✓ |
Syntax Error Parsing | x | ? | ✓ |
Service discovery | x | x | ✓ |
Language Specific Autocomplete | x | x | ✓ |
Tier 1 does not mean that an engine is production-ready everywhere, since different companies and organizations require different kinds of integrations. However, Tier 1 databases provide an excellent foundation for adding functionality. Use them as a reference or subclass them via the query engine plugin.
If you have tried any of the Tier 3 databases and confirmed that it works, please update this doc to let others know.
Query Engine Support
Querybook only supports a few of the Tier 1 & 2 databases by default. When Querybook is launched, it checks with SQLAlchemy to see whether any of the databases below are available. If so, the query engine is automatically available to set up in the Admin UI. See the step-by-step guide below for a working example.
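As a rough illustration of the availability check described above, the sketch below tests whether the driver modules for an engine can be found in the current Python environment. It is not Querybook's actual startup code: the `ENGINE_MODULES` mapping, the function names, and the importable module names are assumptions made for this example.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def available_engines(engine_modules):
    """Return the engines whose required modules are all importable."""
    return [
        engine
        for engine, modules in engine_modules.items()
        if all(is_importable(name) for name in modules)
    ]


# Hypothetical mapping from an engine name to the modules its packages
# are assumed to install; adjust the names to match your environment.
ENGINE_MODULES = {
    "redshift": ["sqlalchemy_redshift", "redshift_connector"],
    "hive": ["pyhive"],
}

if __name__ == "__main__":
    print(available_engines(ENGINE_MODULES))
```

Running this inside the Querybook container after installing new packages is one way to confirm they were picked up before looking for the engine in the Admin UI.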
Step by step guide
In this guide, we will go through adding an Amazon Redshift query engine to Querybook. The same steps serve as an example for adding any SQLAlchemy-compatible query engine.
- Clone and download the repo
git clone git@github.com:pinterest/querybook.git
cd querybook
- Create a local.txt file under the requirements/ folder in the project's root directory
touch requirements/local.txt
- Add the required packages
echo -e "sqlalchemy-redshift\nredshift_connector" > requirements/local.txt
- Start the container
make
- Register as a new user and use the demo setup.
- Visit https://localhost:10001/admin/query_engine/ and create a new query engine. Put redshift as the language and generic-sqlalchemy as the executor. In the Executor Params, put the connection string (as specified by SQLAlchemy) in the Connection_string field.
- Go to https://localhost:10001/admin/environment/1/ and add the Redshift engine under the demo_environment.
- Now you can run queries against the new Redshift engine in https://localhost:10001/demo_environment/adhoc/.
- To include table metadata and autocompletion, you need to add a metastore. Visit https://localhost:10001/admin/metastore/ and create a new metastore. Use SqlAlchemyMetastoreLoader with the exact connection string used for the query engine. Click on Save -> CREATE SCHEDULE -> Create Task. Now click on Run Task to sync. You can view the progress in the History tab. Wait until it is completed (this should take only seconds if the number of tables is small).
- Go to your query engine page on https://localhost:10001/admin/query_engine/, choose the metastore you just created in the Metastore field, and click Save.
- Visit https://localhost:10001/demo_environment/adhoc/ again and the autocomplete feature should be available. You can also view all tables by clicking on the Tables button on the left sidebar and selecting the specific metastore.
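The connection string entered in the Executor Params follows SQLAlchemy's URL layout, `dialect+driver://user:password@host:port/database`. The sketch below assembles one for Redshift, assuming the `redshift+redshift_connector` scheme provided by the sqlalchemy-redshift package. The function name, host, credentials, and database are placeholders invented for illustration.

```python
from urllib.parse import quote


def build_connection_string(user, password, host, port, database,
                            scheme="redshift+redshift_connector"):
    """Assemble a SQLAlchemy-style URL, percent-encoding the credentials."""
    # Characters such as "@" or "/" in credentials would otherwise be
    # mistaken for URL delimiters, so encode everything that is not safe.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Placeholder values; substitute your own cluster details.
    print(build_connection_string(
        "analyst", "p@ss/word", "example-cluster.example.com", 5439, "dev"
    ))
    # prints redshift+redshift_connector://analyst:p%40ss%2Fword@example-cluster.example.com:5439/dev
```

The same string can then be reused for the metastore, since the guide asks for the exact connection string used by the query engine.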
All Query Engines
Note: If the query engine is not included below but has a SQLAlchemy integration, you can still use it in Querybook. Follow the step-by-step guide with one additional step before step 4 (starting the container): open <project_root>/querybook/server/lib/query_executor/sqlalchemy.py and add the query engine to the list variable SQLALCHEMY_SUPPORTED_DIALECTS, then continue with step 4. If it works, please contribute to Querybook by submitting a PR with your changes.
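The extra step in the note might look roughly like the following. This is only a sketch: the real contents of SQLALCHEMY_SUPPORTED_DIALECTS are not shown in this doc, so the existing entry is a placeholder, and it is an assumption that entries are dialect-name strings (here `clickhouse`, the name assumed for the clickhouse-sqlalchemy package).

```python
# Illustrative stand-in for the list in
# <project_root>/querybook/server/lib/query_executor/sqlalchemy.py.
# The first entry is a placeholder, not the file's real contents.
SQLALCHEMY_SUPPORTED_DIALECTS = [
    "example_existing_dialect",  # placeholder for entries already in the file
    "clickhouse",  # newly added; assumed dialect name for the new engine
]
```

Check the actual file for the format of the existing entries and follow it when adding yours.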
Query Engine | Tier | Package |
---|---|---|
Apache Drill | 3 | sqlalchemy-drill |
Apache Hive | 1 | pyhive OR -r engines/hive.txt |
Apache Kylin | 3 | kylinpy |
Apache Solr | 3 | sqlalchemy-solr |
Amazon Athena | 3 | pyathena |
Amazon Redshift | 2 | sqlalchemy-redshift redshift_connector OR -r engines/redshift.txt |
BigQuery | 2 | google-cloud-bigquery OR -r engines/bigquery.txt |
ClickHouse | 3 | clickhouse-sqlalchemy clickhouse-driver |
CockroachDB | 3 | sqlalchemy-cockroachdb psycopg2 |
CrateDB | 3 | crate |
Dremio | 3 | sqlalchemy-dremio |
Druid | 2 | pydruid OR -r engines/druid.txt |
ElasticSearch | 3 | elasticsearch-dbapi |
EXASolution | 3 | sqlalchemy-exasol |
Firebird | 3 | sqlalchemy-firebird |
Google Spreadsheets | 3 | gsheetsdb |
IBM DB2 | 3 | ibm-db-sa |
Microsoft Access | 3 | sqlalchemy-access |
Microsoft SQL Server | 3 | Included by default |
MySQL | 1 | Included by default |
MonetDB | 3 | sqlalchemy_monetdb |
Oracle | 3 | Included by default |
PostgreSQL | 2 | Included by default |
Presto | 1 | pyhive OR -r engines/presto.txt |
SAP Hana | 3 | sqlalchemy-hana |
Snowflake | 2 | snowflake-sqlalchemy OR -r engines/snowflake.txt |
SQLite | 2 | Included by default |
Teradata Vantage | 3 | teradatasqlalchemy |
Trino | 2 | trino OR -r engines/trino.txt |
Vertica | 3 | sqlalchemy-vertica-python |
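The Package column lists what belongs in requirements/local.txt for each engine. As a small sketch of that step, the helper below writes the file for a chosen set of engines. The `ENGINE_PACKAGES` dictionary copies a few rows from the table above; the function name and the idea of generating the file from Python are assumptions for illustration, and editing the file by hand works just as well.

```python
from pathlib import Path

# A few rows copied from the table above (engine -> packages).
ENGINE_PACKAGES = {
    "Amazon Redshift": ["sqlalchemy-redshift", "redshift_connector"],
    "ClickHouse": ["clickhouse-sqlalchemy", "clickhouse-driver"],
    "Dremio": ["sqlalchemy-dremio"],
}


def write_local_requirements(path, engines):
    """Write one package per line for the given engines and return the lines."""
    lines = []
    for engine in engines:
        for package in ENGINE_PACKAGES[engine]:
            if package not in lines:  # skip packages already listed
                lines.append(package)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")
    return lines
```

For example, `write_local_requirements("requirements/local.txt", ["Amazon Redshift"])` run from the project root produces the same file as the `echo` command in the step-by-step guide. Note that it overwrites any existing local.txt.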