Skip to main content

Connect to a Query Engine

Prerequisites

  • Have the Querybook repository cloned. See Quick Setup.
  • Have a PostgreSQL database ready to connect. It could be either on your localhost or on a remote server.

General Process

  1. Create a query engine for query execution.
  2. Add the query engine to an environment. Create one first if needed.
  3. [Optional but highly recommended] Create a new metastore to associate with the query engine.
info

If you dont have an idea of above concepts of query engine, environment and metastore, please refer to here

Step by Step

Here we'll guide you through the process of adding a query engine for PostgreSQL as an example.

  1. Create a local.txt file under the requirements/ folder in the project's root directory.
touch requirements/local.txt
  1. Check the engine list and find the package it depends on.
  2. If the required package is not included by default, add it to the local.txt file. For PostgreSQL, no additional package is needed. Here is an example for Amazon Redshift:
echo -e "sqlalchemy-redshift\nredshift_connector" > requirements/local.txt
  1. Start the container:
make
  1. Open http://localhost:10001

  2. Sign up as a new user and use the demo setup. The first signed up user will be added as the admin.

  3. Open the admin tool http://localhost:10001/admin

  4. Click Query Engine to add a new query engine

    • Provide a name for the query engine.
    • Select Postgresql as the language.
    • Select sqlalchemy as the executor.
    • Input the connection string, which should look like
      postgresql://<username>:<password>@<server-host>:<port>/<database>
      Please refer to the SqlAlchemy documentation for the connection string format.
    • Select SelectOneChecker as the status checker
    About localhost

    If Querybook and PostgresSQL are both running on the same machine, you'll need some extra change.

    Mac

    Please use host.docker.internal instead of localhost as the server address. e.g. postgresql://<username>:<password>@host.docker.internal:5432/<database>

    Linux

    Before step 4 make

    • update docker-compose.yml to add network_mode=host for below services

      • web
      • worker
      • scheduler
    • update containers/bundled_querybook_config.yaml to use localhost instead of service names

    DATABASE_CONN: mysql+pymysql://test:passw0rd@localhost:3306/querybook2?charset=utf8mb4
    REDIS_URL: redis://localhost:6379/0
    ELASTICSEARCH_HOST: localhost:9200

    Then keep using localhost as the server host in the connection string

  5. Click Test Connection to see if it can connect to the database correctly. If it fails, go back and check the settings above and ensure that the database server is ready for connection. You can use command-line tools like psql to try to connect with the same connection settings.

  6. Click Save to create the engine.

  7. Go to the Environment tab and select demo_environment. You can also create a new environment if you like.

  8. For Query Engines, select postgresql from the dropdown list, and click Add Query Engine.

  9. Open http://localhost:10001/demo_environment/adhoc/. Switch to the new environment if you created a new one in step 11.

  10. Try to write a test query, select postgresql, and run it.

That's it 🎉. Keep reading if you'd like to know how to add a metastore.

  1. Open http://localhost:10001/admin/metastore/
  2. Create a new metastore.
    • Provide a name for the metastore.
    • Select SqlAlchemyMetastoreLoader as the loader.
    • Input the same connection string as the query engine.
      Connection String

      For PostgreSQL, the metastore is the same as the database, so we're using the same connection string for both the metastore and query engine. However, this may not be the case for other engines, such as Hive Metastore + Presto.

  3. Click Create to create the metastore.
  4. On the same page, you will see a section called Update Schedule. Click the button Create Task. This scheduled task is used for syncing the metadata from the metastore to Querybook periodically.
  5. Click Run Task and wait until it completes.
  6. Go to the Query Engine tab and select the new query engine postgresql.
  7. Select postgres_metastore from the dropdown list for the Metastore field and click Save.
  8. Go to the Tables tab on page http://localhost:10001/demo_environment/. Select postgres_metastore from the dropdown list. You'll see the tables synced from the metastore. If you don't see any tables, go back to step 17 and check if the connection string is correct.

Congratulations! You have successfully set up Querybook with a query engine and a metastore. You can now start exploring and analyzing your data with ease.