Prerequisites

  • If your ClickHouse security posture requires IP whitelisting, have our data syncing service’s static IP available during the following steps. It will be required in Step 1.

Step 1: Allow access

Create a rule in a security group or firewall settings to whitelist:

  1. incoming connections to your host and port (usually 9440) from the static IP.
  2. outgoing connections from ports 1024 to 65535 to the static IP.

Step 2: Create writer user

Create a database user to perform the writing of the data.

  1. Open a connection to your ClickHouse database.
  2. Create a user for the data transfer by executing the following SQL command.
CREATE USER <username>@'%' IDENTIFIED BY '<some-password>';
  1. Grant user required privileges on the database.
GRANT CREATE, INSERT, DROP, ALTER, OPTIMIZE, SHOW ON <database.*> TO <username>@'%';
grant CREATE TEMPORARY TABLE, S3 on *.* to <username>@'%';

Understanding the CREATE TEMPORARY TABLE, S3 permissions

The CREATE TEMPORARY TABLE and S3 permissions are required to efficiently transfer data to ClickHouse. Under the hood, these permissions are used to stage data in object storage as compressed files, COPY INTO temporary tables, and finally merge into the target tables. By definition, the temporary table will not exist outside of the session.

Step 3: Setup staging bucket

ClickHouse sources require a staging bucket to efficiently transfer data. Configure your staging bucket using one of the following types of ClickHouse supported object storage:

  • S3
  • GCS
  • Implicit

Using the implicit bucket option

ClickHouse supports the ability to configure staging resources with environment credentials. If this setting is enabled on your ClickHouse cluster, you may choose to use the configured implicit staging resources using the implicit option for the staging bucket selection.

Step 4: Add your destination

Securely share your host name, port, cluster, database name, schema name, username, password, and staging bucket details with us to complete the connection.

Understanding the database vs. schema fields (connection database vs. write database)

Depending on the version of your integration, you may be asked for both a database and schema, or a connection database and write database.

  • database (also referred to as connection_database): is the database used to establish the connection with ClickHouse.
  • schema (also referred to as write_database): is the database/schema within which data will be written

These can be (and often are) the same values, but do not need to be.

Using the ClickHouse data

Querying ClickHouse data without duplicates

The resulting ClickHouse tables use the ReplacingMergeTree table engine in order to efficiently upsert changes. To properly query this data, the FINAL keyword must be used when selecting from these tables guarantee duplicates are removed. For example:

SELECT
  *
FROM
  schema.table FINAL
WHERE
  foo = bar
ORDER BY foo
LIMIT 10;