ClickHouse
Configuring your ClickHouse destination.
Prerequisites
- If your ClickHouse security posture requires IP whitelisting, have our data syncing service’s static IP available during the following steps. It will be required in Step 1.
Step 1: Allow access
Create a rule in a security group or firewall settings to whitelist:
- incoming connections to your host and port (usually 9440) from the static IP.
- outgoing connections from ports 1024 to 65535 to the static IP.
Step 2: Create writer user
Create a database user that will write the data.
- Open a connection to your ClickHouse database.
- Create a user for the data transfer by executing the following SQL command.
- Grant the user the required privileges on the database.
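The two statements in this step might look like the following sketch. The user name `data_writer`, the database name `analytics`, and the password placeholder are hypothetical. The table-level privilege list is an assumption about what a writer typically needs; only CREATE TEMPORARY TABLE and S3 are named on this page, so adjust the rest to your own policy.

```sql
-- Hypothetical user name and password placeholder; replace both.
CREATE USER IF NOT EXISTS data_writer
    IDENTIFIED WITH sha256_password BY '<strong-password>';

-- Assumed table-level privileges on the hypothetical write database.
GRANT SELECT, INSERT, ALTER, CREATE TABLE, DROP TABLE, TRUNCATE
    ON analytics.* TO data_writer;

-- Privileges named in this guide; both are granted at the global level.
GRANT CREATE TEMPORARY TABLE, S3 ON *.* TO data_writer;
```

Running `SHOW GRANTS FOR data_writer;` afterwards is one way to confirm what was actually granted.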
Understanding the CREATE TEMPORARY TABLE and S3 permissions
The CREATE TEMPORARY TABLE and S3 permissions are required to efficiently transfer data to ClickHouse. Under the hood, these permissions are used to stage data in object storage as compressed files, load those files into temporary tables, and finally merge them into the target tables. By definition, a temporary table does not exist outside of the session that created it.
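The staging flow described above can be sketched roughly as follows. This is an illustration, not the exact statements the service issues: the bucket URL, credentials placeholders, table names, and column layout are all hypothetical.

```sql
-- 1. Session-scoped staging table (needs CREATE TEMPORARY TABLE).
CREATE TEMPORARY TABLE staging_events
(
    id UInt64,
    name String,
    updated_at DateTime
);

-- 2. Load the compressed staged files from object storage (needs S3).
--    Compression is inferred from the .gz extension.
INSERT INTO staging_events
SELECT id, name, updated_at
FROM s3(
    'https://example-staging-bucket.s3.amazonaws.com/events/*.csv.gz',
    '<access_key_id>',
    '<secret_access_key>',
    'CSVWithNames',
    'id UInt64, name String, updated_at DateTime'
);

-- 3. Merge the staged rows into the (hypothetical) target table.
INSERT INTO analytics.events
SELECT id, name, updated_at
FROM staging_events;
```

All three statements must run in the same session, since the temporary table disappears when the session ends.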
Step 3: Set up staging bucket
ClickHouse destinations require a staging bucket to efficiently transfer data. Configure your staging bucket using one of the following types of ClickHouse-supported object storage:
- S3
- GCS
- Implicit
Using the implicit bucket option
ClickHouse supports the ability to configure staging resources with environment credentials. If this setting is enabled on your ClickHouse cluster, you may choose to use the configured implicit staging resources by selecting the implicit option for the staging bucket.
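One way to check whether environment credentials are usable is to read from the bucket without passing any keys. The sketch below assumes the cluster is configured to pick up credentials from its environment (in a self-managed server this is typically the `use_environment_credentials` S3 setting), and the bucket URL is hypothetical.

```sql
-- No access key or secret is passed; the server is assumed to
-- resolve credentials from its own environment.
SELECT count()
FROM s3(
    'https://example-staging-bucket.s3.amazonaws.com/events/*.csv.gz',
    'CSVWithNames'
);
```

If this query fails with an access error, the implicit option is probably not available on that cluster and an explicit S3 or GCS bucket configuration is the safer choice.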
Step 4: Add your destination
Securely share your host name, port, cluster, database name, schema name, username, password, and staging bucket details with us to complete the connection.
Understanding the database vs. schema fields (connection database vs. write database)
Depending on the version of your integration, you may be asked for both a database and schema, or a connection database and write database.
- database (also referred to as connection_database): the database used to establish the connection with ClickHouse.
- schema (also referred to as write_database): the database/schema within which data will be written.
These can be (and often are) the same values, but do not need to be.
Using the ClickHouse data
Querying ClickHouse data without duplicates
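The resulting ClickHouse tables use the ReplacingMergeTree table engine in order to efficiently upsert changes. ReplacingMergeTree removes superseded rows only during background merges, so a plain SELECT can return more than one version of the same row. To properly query this data, the FINAL keyword must be used when selecting from these tables to guarantee duplicates are removed. For example: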
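A minimal sketch of such a query, assuming a hypothetical table `analytics.events` with columns `id`, `name`, and `updated_at`:

```sql
-- Without FINAL: may return several versions of the same row
-- if background merges have not yet collapsed them.
SELECT id, name, updated_at
FROM analytics.events
WHERE id = 42;

-- With FINAL: rows are deduplicated at query time.
SELECT id, name, updated_at
FROM analytics.events FINAL
WHERE id = 42;
```

FINAL adds work at query time, so it is worth applying filters that narrow the scanned range where possible.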