Connect to Query Engines

Overview

Querybook supports all SQLAlchemy-compatible query engines by default. Basic functionality such as query execution, table metadata, and autocompletion is provided out of the box. More advanced integrations, however, require custom code. Overall, the query engines fall into three tiers:

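| Tier | Tier 3 | Tier 2 | Tier 1 |
|---|---|---|---|
| Summary | Not tested | Tested w/ DB | Used in Production |
| Library | SQLAlchemy | Custom/SQLAlchemy | Custom |
| Run Queries | ✓ | ✓ | ✓ |
| Paginated Result Fetch | ✓ | ✓ | ✓ |
| Syntax highlight & Autocomplete | ✓ | ✓ | ✓ |
| Query Progress | x | ? | ✓ |
| Query Logs | x | ? | ✓ |
| Query Metadata | x | ? | ✓ |
| Cancel Query | x | ? | ✓ |
| User Authentication | x | x | ✓ |
| Syntax Error Parsing | x | ? | ✓ |
| Service discovery | x | x | ✓ |
| Language Specific Autocomplete | x | x | ✓ |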

Tier 1 does not mean an engine is production-ready everywhere, since different companies and organizations require different kinds of integrations. However, Tier 1 databases provide an excellent foundation for adding further functionality. Use them as a reference or subclass them via the query engine plugin.

If you have tried any of the Tier 3 databases and confirmed that it works, please update this doc to let others know.

Query Engine Support

By default, Querybook supports only a few of the Tier 1 and Tier 2 databases. When Querybook is launched, it checks with SQLAlchemy whether any of the databases below are available. If one is, the corresponding query engine automatically becomes available to set up in the Admin UI. See the step-by-step guide below for a working example.
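As a rough illustration of this kind of availability check, the sketch below tests whether the Python module behind a driver can be found in the current environment. It is not Querybook's actual implementation: the `DRIVER_MODULES` mapping and the `available_engines` helper are hypothetical names, and the module names are assumptions about what each package installs.

```python
from importlib.util import find_spec

# Hypothetical mapping: engine language -> top-level module its driver installs.
DRIVER_MODULES = {
    "redshift": "sqlalchemy_redshift",
    "presto": "pyhive",
    "trino": "trino",
}


def available_engines(driver_modules):
    """Return the sorted engine names whose driver module can be found."""
    return sorted(
        name
        for name, module in driver_modules.items()
        if find_spec(module) is not None
    )


if __name__ == "__main__":
    # Prints only the engines whose drivers are installed locally.
    print(available_engines(DRIVER_MODULES))
```

An engine missing from the result usually means its package has not been added to the requirements yet.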

Step by step guide

In this guide, we will go through adding an Amazon Redshift query engine to Querybook. This serves as an example for adding any SQLAlchemy-compatible query engine.

  1. Clone and download the repo
     git clone git@github.com:pinterest/querybook.git
     cd querybook
  2. Create a local.txt under the requirements/ folder in the project's root directory
     touch requirements/local.txt
  3. Add the required packages
     echo -e "sqlalchemy-redshift\nredshift_connector" > requirements/local.txt
  4. Start the container
     make
  5. Register as a new user and use the demo setup.
  6. Visit https://localhost:10001/admin/query_engine/ and create a new query engine. Put redshift as the language and generic-sqlalchemy as the executor. In the Executor Params, put the connection string (as specified by SQLAlchemy) in the Connection_string field.
  7. Go to https://localhost:10001/admin/environment/1/ and add the Redshift engine under the demo_environment.
  8. Now you can run queries against the new Redshift engine at https://localhost:10001/demo_environment/adhoc/.
  9. To include table metadata and autocompletion, you need to add a metastore. Visit https://localhost:10001/admin/metastore/ and create a new metastore. Use SqlAlchemyMetastoreLoader with the exact connection string used for the query engine. Click on Save -> CREATE SCHEDULE -> Create Task. Now click on Run Task to sync. You can view the progress in the History tab. Wait until it is completed (this should take seconds if the number of tables is small).
  10. Go to your query engine page at https://localhost:10001/admin/query_engine/. In the Metastore field, choose the metastore you just created and click Save.
  11. Visit https://localhost:10001/demo_environment/adhoc/ again and the autocomplete feature should be available. You can also view all tables by clicking on the Tables button on the left sidebar and selecting the specific metastore.
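The connection string entered in the Executor Params follows SQLAlchemy's URL format. The sketch below shows one way to assemble such a string for the `redshift_connector` driver, percent-encoding the credentials so that special characters do not break the URL. The helper name `redshift_connection_string` and the example host, user, and database are made up for illustration, and 5439 is assumed as the port because it is Redshift's default.

```python
from urllib.parse import quote_plus


def redshift_connection_string(user, password, host, database, port=5439):
    """Build a SQLAlchemy URL for the redshift_connector driver."""
    # Percent-encode credentials so characters such as '@' or '/' stay inside
    # the user/password part instead of being read as URL delimiters.
    return (
        f"redshift+redshift_connector://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Example values only; replace them with your own cluster details.
    print(
        redshift_connection_string(
            "awsuser", "p@ss/word", "example-cluster.example.com", "dev"
        )
    )
```

The same string can be reused for the SqlAlchemyMetastoreLoader, since the guide asks for the exact connection string in both places.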

All Query Engines

Note: If a query engine is not included below but has a SQLAlchemy integration, you can still use it in Querybook. Follow the step-by-step guide with one additional step before step 4: open <project_root>/querybook/server/lib/query_executor/sqlalchemy.py and add the query engine to the list variable SQLALCHEMY_SUPPORTED_DIALECTS, then continue with step 4. If it works, please contribute to Querybook by submitting a PR with your changes.
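The change described in this note might look roughly like the following. The existing entries shown here are placeholders rather than the real contents of the file, and `drill` is only an example dialect name, so check the actual list in sqlalchemy.py before editing.

```python
# Sketch of the list in querybook/server/lib/query_executor/sqlalchemy.py.
# The first two entries are assumed placeholders, not the file's real contents.
SQLALCHEMY_SUPPORTED_DIALECTS = [
    "redshift",   # placeholder for an entry already in the file
    "snowflake",  # placeholder for an entry already in the file
    "drill",      # new entry: the dialect you want to enable (example name)
]
```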

| Query Engine | Tier | Package |
|---|---|---|
| Apache Drill | 3 | sqlalchemy-drill |
| Apache Hive | 1 | pyhive OR -r engines/hive.txt |
| Apache Kylin | 3 | kylinpy |
| Apache Solr | 3 | sqlalchemy-solr |
| Amazon Athena | 3 | pyathena |
| Amazon Redshift | 2 | sqlalchemy-redshift and redshift_connector, OR -r engines/redshift.txt |
| BigQuery | 2 | google-cloud-bigquery OR -r engines/bigquery.txt |
| ClickHouse | 3 | clickhouse-sqlalchemy and clickhouse-driver |
| CockroachDB | 3 | sqlalchemy-cockroachdb and psycopg2 |
| CrateDB | 3 | crate |
| Dremio | 3 | sqlalchemy-dremio |
| Druid | 2 | pydruid OR -r engines/druid.txt |
| ElasticSearch | 3 | elasticsearch-dbapi |
| EXASolution | 3 | sqlalchemy-exasol |
| Firebird | 3 | sqlalchemy-firebird |
| Google Spreadsheets | 3 | gsheetsdb |
| IBM DB2 | 3 | ibm-db-sa |
| Microsoft Access | 3 | sqlalchemy-access |
| Microsoft SQL Server | 3 | Included by default |
| MySQL | 1 | Included by default |
| MonetDB | 3 | sqlalchemy_monetdb |
| Oracle | 3 | Included by default |
| PostgreSQL | 2 | Included by default |
| Presto | 1 | pyhive OR -r engines/presto.txt |
| SAP Hana | 3 | sqlalchemy-hana |
| Snowflake | 2 | snowflake-sqlalchemy OR -r engines/snowflake.txt |
| SQLite | 2 | Included by default |
| Teradata Vantage | 3 | teradatasqlalchemy |
| Trino | 2 | trino OR -r engines/trino.txt |
| Vertica | 3 | sqlalchemy-vertica-python |
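When enabling several engines at once, the contents of requirements/local.txt can be generated from the package names in the table above. The following is a minimal sketch: `ENGINE_PACKAGES` copies only a few rows of the table, and the `local_requirements` helper is a hypothetical name, not part of Querybook.

```python
# Packages copied from a few rows of the table above; extend as needed.
ENGINE_PACKAGES = {
    "Amazon Redshift": ["sqlalchemy-redshift", "redshift_connector"],
    "ClickHouse": ["clickhouse-sqlalchemy", "clickhouse-driver"],
    "Trino": ["trino"],
}


def local_requirements(engines):
    """Return text for requirements/local.txt: one package per line, no duplicates."""
    seen = []
    for engine in engines:
        for package in ENGINE_PACKAGES[engine]:
            if package not in seen:
                seen.append(package)
    return "\n".join(seen) + "\n"


if __name__ == "__main__":
    # Print the file contents; redirect the output to requirements/local.txt.
    print(local_requirements(["Amazon Redshift", "Trino"]), end="")
```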