Prerequisites
- If your ClickHouse security posture requires IP whitelisting, have our data syncing service’s static IP available during the following steps. It will be required in Step 1.
Step 1: Allow access
Create a rule in a security group or firewall settings to whitelist:
- incoming connections to your host and port (usually 9440) from the static IP.
- outgoing connections from ports 1024 to 65535 to the static IP.
Step 2: Create writer user
Create a database user to write the data.
- Open a connection to your ClickHouse database.
- Create a user for the data transfer by executing the following SQL command.
- Grant the user the required privileges on the database.
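The statements for these steps might look like the sketch below. The angle-bracket placeholders are hypothetical and must be replaced with your own values. Only CREATE TEMPORARY TABLE and S3 are named as required by this guide; the table-level privileges shown are an assumption about what a writer user typically needs, and exact privilege names can vary by ClickHouse version.

```sql
-- Placeholders in angle brackets are hypothetical; substitute your own values.
CREATE USER <username> IDENTIFIED BY '<strong-password>';

-- Assumed table-level privileges for a user that creates and writes tables.
GRANT SELECT, INSERT, ALTER, CREATE TABLE, DROP TABLE
    ON <database>.* TO <username>;

-- CREATE TEMPORARY TABLE and S3 are global-level privileges in ClickHouse,
-- so they are granted ON *.* rather than on a single database.
GRANT CREATE TEMPORARY TABLE, S3 ON *.* TO <username>;
```

Running `SHOW GRANTS FOR <username>;` afterwards is one way to confirm which privileges were applied.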
Understanding the CREATE TEMPORARY TABLE, S3 permissions
The CREATE TEMPORARY TABLE and S3 permissions are required to efficiently transfer data to ClickHouse. Under the hood, these permissions are used to stage data in object storage as compressed files, load those files into temporary tables, and finally merge them into the target tables. By definition, a temporary table does not exist outside of the session that created it.
Step 3: Set up staging bucket
ClickHouse destinations require a staging bucket to efficiently transfer data. Configure your staging bucket using one of the following types of ClickHouse-supported object storage:
- S3
- GCS
- Implicit
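To illustrate how a staging bucket is typically used, here is a rough sketch of a stage, load, and merge flow for an S3 bucket. It is not the exact set of statements the sync service runs. The bucket URL, credentials, file format, and the `analytics.orders` table with its columns are all hypothetical.

```sql
-- 1. A session-scoped temporary table receives the staged rows.
CREATE TEMPORARY TABLE orders_staging
(
    id         UInt64,
    status     String,
    updated_at DateTime
);

-- 2. Load compressed files from the staging bucket via the s3 table function
--    (this is what the S3 privilege governs). URL and keys are placeholders.
INSERT INTO orders_staging
SELECT id, status, updated_at
FROM s3(
    'https://<bucket>.s3.amazonaws.com/<prefix>/*.csv.gz',
    '<access-key-id>',
    '<secret-access-key>',
    'CSVWithNames',
    'id UInt64, status String, updated_at DateTime'
);

-- 3. Merge the staged rows into the target table.
INSERT INTO analytics.orders
SELECT id, status, updated_at
FROM orders_staging;
```

The temporary table disappears when the session ends, so no cleanup of staged rows inside ClickHouse is needed in this sketch.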
Using the implicit bucket option
ClickHouse supports the ability to configure staging resources with environment credentials. If this setting is enabled on your ClickHouse cluster, you may choose to use the configured implicit staging resources by selecting the implicit option for the staging bucket.
Step 4: Add your destination
Securely share your host name, port, cluster, database name, schema name, username, password, and staging bucket details with us to complete the connection.
Understanding the database vs. schema fields (connection database vs. write database)
Depending on the version of your integration, you may be asked for both a database and schema, or a connection database and write database.
- database (also referred to as connection_database): the database used to establish the connection with ClickHouse.
- schema (also referred to as write_database): the database/schema within which data will be written.
Using the ClickHouse data
Querying ClickHouse data without duplicates
The resulting ClickHouse tables use the ReplacingMergeTree table engine in order to efficiently upsert changes. To properly query this data, the FINAL keyword must be used when selecting from these tables to guarantee duplicates are removed. For example:
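A minimal sketch of such a query, assuming a hypothetical write database `analytics` and table `orders` (neither name comes from this guide):

```sql
-- FINAL follows the table name and collapses rows that ReplacingMergeTree
-- has not yet merged, so each key appears once in the result.
SELECT *
FROM analytics.orders FINAL
WHERE status = 'shipped';
```

Without FINAL, the same query may return several versions of a row until background merges have run.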