Table Upload
This is an experimental feature. You can use the plugins for customization but they may break in future updates.
Overview
Users can now upload local CSV files or use a Querybook query execution results to create a SQL table. This is supported by loading the data into a Pandas DataFrame and then exporting it to the desired query engine.
Supported Upload Types
Currently, there are 2 ways to upload the data:
- Via file upload. Supports CSV/TSV and similar variations
- Via reading from query execution results
The upload process currently does not support custom plugins. If you do want to add additional ways to upload tables, please let us know either on Github or Slack.
Supported Ingestion Types
When ingesting data to SQL DB, 3 different types are supported:
Generic SQLAlchemy exporter
This uses Pandas' to_sql feature to convert the data to an insert
statement. This exporter only works with query engines that are based on SQLAlchemy.
Included by default: Yes
Available options: None
S3 CSV exporter
This would upload the Pandas DataFrame as CSV to S3, and create an external table on top of it. Dependencies such as boto3 must be installed.
Included by default: No
Available options:
- s3_path (str, optional): if supplied, will use it as the root path for upload. Must be the full s3 path like s3://bucket/key, the trailing / is optional.
- use_schema_location (boolean, optional): if true, the upload root path is inferred by locationUri specified by the schema/database in HMS. To use this option, the engine must be connected to a metastore that uses HMSMetastoreLoader (or its derived class). if false, it will be created as managed table, whose location will be determined automatically by the query engine.
- table_properties (List[str]): list of table properties passed, this must be query engine specific. Checkout here for examples in SparkSQL: https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html#examples For Trino/Presto, it would be the WITH statement: https://trino.io/docs/current/sql/create-table.html
If neither s3_path nor use_schema_location is supplied, it will be treated same as use_schema_location=False
, and it will be created as managed table.
S3 Parquet exporter
This would upload a Parquet file instead of a CSV file. In addition to dependencies such as boto3, pyarrow
must also be installed.
Included by default: No
Available options: See S3 CSV exporter.
How to use
Step 1: Installation and configuration
Before starting, please make sure the following is prepared:
- Dependencies such as boto3 and pyarrow are installed (if using Parquet exporter). Check out Infra Installation on what version to use.
- A query engine with a query metastore configured.
Now add the table upload exporter you would need to ALL_PLUGIN_TABLE_UPLOAD_EXPORTERS under table_uploader_plugin
. To learn how to use plugins, checkout the plugin guide.
Note the SQLAlchemy exporter is included by default, and S3 exporters need to be added via plugins.
Step 2: Add table uploader to the query engine
Once Querybook is ready, go to the admin Query Engine page (/admin/query_engine/
). Go to the Additional Features
section and choose an appropriate table upload exporter.
Step 3: Upload tables
Once the set up is complete, users who refresh the page should now see a +
button on the top of Tables sidebar. Clicking on it would start the table creation flow.
Alternatively, you can view any query execution and hover over the export
button. If there is any query engine in the environment that supports table upload, there should be an option shown to
let you export the query results directly to a table.