From 5fa4a29c05ad2ca7858d7d3dbcfae8b0db28b928 Mon Sep 17 00:00:00 2001
From: SamyOubouaziz
Date: Mon, 18 Aug 2025 16:17:02 +0200
Subject: [PATCH 1/6] docs(dwh): add bigquery migration guide

MTA-6297
---
 .../how-to/migrate-from-bigquery.mdx | 90 +++++++++++++++++++
 1 file changed, 90 insertions(+)
 create mode 100644 pages/data-warehouse/how-to/migrate-from-bigquery.mdx

diff --git a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx
new file mode 100644
index 0000000000..a550368938
--- /dev/null
+++ b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx
@@ -0,0 +1,90 @@
+---
+title: How to migrate data from Google BigQuery
+description: Learn how to migrate data from Google BigQuery to your Scaleway Data Warehouse for ClickHouse® deployment.
+tags: connect migration transfer copy data alternative migrate ClickHouse® integrate integration
+dates:
+  validation: 2025-08-18
+  posted: 2025-08-18
+---
+import Requirements from '@macros/iam/requirements.mdx'
+
+This page explains how to migrate analytical datasets from Google BigQuery to a Scaleway Data Warehouse for ClickHouse® deployment. The instructions are based on the [official ClickHouse® guide](https://clickhouse.com/docs/migrations/bigquery/migrating-to-clickhouse-cloud) to migrate from Google BigQuery.
+
+This documentation exemplifies the migration procedure using the [New York Taxi Data](https://clickhouse.com/docs/getting-started/example-datasets/nyc-taxi) provided by ClickHouse®.
+
+
+- A Scaleway account logged into the [console](https://console.scaleway.com)
+- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
+- A working Google Cloud Provider account with access to BigQuery.
+- [Created a Data Warehouse for ClickHouse® deployment](/data-warehouse/how-to/create-deployment/).
+
+## How to export data from Google BigQuery
+
+Google BigQuery can only export data to Google CLoud Storage (GCS), so you must copy your data to GCS first, then transfer it from GCS to Scaleway Object Storage before ingesting it to your Data Warehouse for ClickHouse® deployment.
+
+### Exporting BigQuery data to GCS
+
+1. Log in to your Google Cloud account, then open BigQuery.
+
+2. Use the `EXPORT DATA` statement to export tables to GCS in the `Parquet` format. Make sure to replace `your-bucket-name` with your GCS bucket name:
+
+    ```sql
+    EXPORT DATA OPTIONS (
+      uri='gs://your-bucket-name/nyc_taxi_data/*.parquet',
+      format='PARQUET',
+      overwrite=true
+    ) AS
+
+    SELECT * FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2016`;
+    ```
+
+
+- The `*` in the bucket URI allows Google BigQuery to shard the export into multiple parts if necessary.
+- You must have write access to the specified GCS bucket to perform this action.
+
+
+### Transfering data to Scaleway Object Storage
+
+To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we recommend using [Rclone](https://rclone.org/), as it is compatible with both Google Cloud Storage and Scaleway Object storage, and allows you to easily copy data from a cloud provider to another.
+
+1. Run the command below to install Rclone, or refer to the [official documentation](https://rclone.org/downloads/) for alternative methods:
+
+    ```sh
+    curl https://rclone.org/install.sh | sudo bash
+    ```
+
+2. Run the command below to start configuring your GCS remote:
+    ```sh
+    rclone config
+    ```
+
+3. 
Enter the following parameters when prompted: + - Enter `n` to create a new remote. + - Name: `gcs` + - Storage type: `Google Cloud Storage` + - ID and secret (service account JSON recommended) + + Your GCS remote for Rclone is now configured. + +4. Run the command below to start configuring your Scaleway Object Storage remote: + ```sh + rclone config + ``` + +5. Enter the following parameters when prompted: + - Enter `n` to create a new remote. + - Name: `scw` + - Storage type: `s3` + - Provider: `Scaleway` + - Endpoint: `s3.fr-par.scw.cloud` (update according to the selected region) + - API access key and secret key + +6. Run the command below to copy the cotntent of your GCS bucket to your Scaleway Object Storage bucket. Make sure to replace the placeholders with the correct values: + ```sh + rclone copy gcs:your-gcs-bucket scw:your-scw-bucket --progress + ``` + +Your Scaleway Object Storage now contains data exported from Google BigQuery in Parquet format, which can now be ingested into your Data Warehouse for ClickHouse® deployment. + + From 04981f7571ec7ecada27e7b5fd2aa9f5ffd2fdc9 Mon Sep 17 00:00:00 2001 From: SamyOubouaziz Date: Tue, 19 Aug 2025 09:57:02 +0200 Subject: [PATCH 2/6] docs(dwh): update --- .../how-to/migrate-from-bigquery.mdx | 61 +++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx index a550368938..a033175ee1 100644 --- a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx +++ b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx @@ -87,4 +87,65 @@ To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we reco Your Scaleway Object Storage now contains data exported from Google BigQuery in Parquet format, which can now be ingested into your Data Warehouse for ClickHouse® deployment. +## Ingesting data into your Data Warehouse for ClickHouse® deployment +1. Connect to your deployment by following the [dedicated documentation](/data-warehouse/how-to/connect-applications/). Alternatively, you can use the ClickHouse® Console from your deployment's **Overview** page. + +2. Run the command below to create a database and a table to store your new data: + + ```sql + CREATE DATABASE IF NOT EXISTS nyc_taxi; + + CREATE TABLE nyc_taxi.trips_small + ( + pickup_datetime DateTime, + dropoff_datetime DateTime, + pickup_ntaname String + -- Add other relevant columns + ) + ENGINE = MergeTree() + ORDER BY pickup_datetime; + ``` + +2. Run the command below Import data from your Scaleway Object Storage bucket. + + ```sql + INSERT INTO nyc_taxi.trips_small + SELECT + trip_id, + pickup_datetime, + dropoff_datetime, + pickup_longitude, + pickup_latitude, + dropoff_longitude, + dropoff_latitude, + passenger_count, + trip_distance, + fare_amount, + extra, + tip_amount, + tolls_amount, + total_amount, + payment_type, + pickup_ntaname, + dropoff_ntaname + FROM s3( + 'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{0..2}.gz', + 'TabSeparatedWithNames' + ); + ``` + +3. Run the sample query below to make sure your data was properly ingested: + + ```sql + SELECT + pickup_ntaname, + count(*) AS count + FROM nyc_taxi.trips_small + WHERE pickup_ntaname != '' + GROUP BY pickup_ntaname + ORDER BY count DESC + LIMIT 10; + ``` + +Your data is now imported into your Data Warehouse for ClickHouse® deployment. 
\ No newline at end of file From a2af3e4348c9ae3b15c06343939bb9bc4756b12b Mon Sep 17 00:00:00 2001 From: Samy OUBOUAZIZ Date: Wed, 5 Nov 2025 11:41:23 +0100 Subject: [PATCH 3/6] docs(dwh): update --- pages/data-warehouse/how-to/migrate-from-bigquery.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx index a033175ee1..48e131b891 100644 --- a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx +++ b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx @@ -16,7 +16,7 @@ This documentation exemplifies the migration procedure using the [New York Taxi - A Scaleway account logged into the [console](https://console.scaleway.com) - [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization -- A working Google Cloud Provider account with access to BigQuery. +- A working Google Cloud Provider account with access to BigQuery and Google Cloud Storage. - [Created a Data Warehouse for ClickHouse® deployment](/data-warehouse/how-to/create-deployment/). ## How to export data from Google BigQuery @@ -63,7 +63,7 @@ To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we reco - Enter `n` to create a new remote. - Name: `gcs` - Storage type: `Google Cloud Storage` - - ID and secret (service account JSON recommended) + - ID and secret (service account JSON file recommended) Your GCS remote for Rclone is now configured. @@ -77,7 +77,7 @@ To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we reco - Name: `scw` - Storage type: `s3` - Provider: `Scaleway` - - Endpoint: `s3.fr-par.scw.cloud` (update according to the selected region) + - Endpoint: `s3.fr-par.scw.cloud` (update according to your preferred region) - API access key and secret key 6. Run the command below to copy the cotntent of your GCS bucket to your Scaleway Object Storage bucket. Make sure to replace the placeholders with the correct values: @@ -130,7 +130,7 @@ Your Scaleway Object Storage now contains data exported from Google BigQuery in pickup_ntaname, dropoff_ntaname FROM s3( - 'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{0..2}.gz', + '/nyc-taxi/trips_{0..2}.gz', 'TabSeparatedWithNames' ); ``` From 200f4047721f0f2b9d35aef7969c970243c5421d Mon Sep 17 00:00:00 2001 From: Samy OUBOUAZIZ Date: Wed, 5 Nov 2025 13:50:44 +0100 Subject: [PATCH 4/6] docs(dwh): update --- pages/data-warehouse/how-to/migrate-from-bigquery.mdx | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx index 48e131b891..3a24be9f30 100644 --- a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx +++ b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx @@ -58,9 +58,8 @@ To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we reco ```sh rclone config ``` - -3. Enter the following parameters when prompted: - - Enter `n` to create a new remote. + +3. Create a new remote, then enter the following parameters when prompted: - Name: `gcs` - Storage type: `Google Cloud Storage` - ID and secret (service account JSON file recommended) @@ -72,8 +71,7 @@ To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we reco rclone config ``` -5. Enter the following parameters when prompted: - - Enter `n` to create a new remote. +5. 
Create a new remote, then enter the following parameters when prompted: - Name: `scw` - Storage type: `s3` - Provider: `Scaleway` @@ -148,4 +146,4 @@ Your Scaleway Object Storage now contains data exported from Google BigQuery in LIMIT 10; ``` -Your data is now imported into your Data Warehouse for ClickHouse® deployment. \ No newline at end of file +Your data is now imported into your Data Warehouse for ClickHouse® deployment. From 001b26aeabb1c7a35a9c3578085eaa0d758f7357 Mon Sep 17 00:00:00 2001 From: Samy OUBOUAZIZ Date: Wed, 5 Nov 2025 13:51:07 +0100 Subject: [PATCH 5/6] docs(dwh): update --- pages/data-warehouse/how-to/migrate-from-bigquery.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx index 3a24be9f30..61511d1dcc 100644 --- a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx +++ b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx @@ -3,8 +3,8 @@ title: How to migrate data from Google BigQuery description: Learn how to migrate data from Google BigQuery to your Scaleway Data Warehouse for ClickHouse® deployment. tags: connect migration transfer copy data alternative migrate ClickHouse® integrate integration dates: - validation: 2025-08-18 - posted: 2025-08-18 + validation: 2025-11-05 + posted: 2025-11-05 --- import Requirements from '@macros/iam/requirements.mdx' From 61b12d0f5c840cc9c1234e6beb773c5e917317db Mon Sep 17 00:00:00 2001 From: SamyOubouaziz Date: Tue, 18 Nov 2025 11:46:49 +0100 Subject: [PATCH 6/6] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Néda <87707325+nerda-codes@users.noreply.github.com> --- .../how-to/migrate-from-bigquery.mdx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx index 61511d1dcc..7a2ebb849d 100644 --- a/pages/data-warehouse/how-to/migrate-from-bigquery.mdx +++ b/pages/data-warehouse/how-to/migrate-from-bigquery.mdx @@ -16,12 +16,12 @@ This documentation exemplifies the migration procedure using the [New York Taxi - A Scaleway account logged into the [console](https://console.scaleway.com) - [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization -- A working Google Cloud Provider account with access to BigQuery and Google Cloud Storage. -- [Created a Data Warehouse for ClickHouse® deployment](/data-warehouse/how-to/create-deployment/). +- A working Google Cloud Provider account with access to BigQuery and Google Cloud Storage +- [Created a Data Warehouse for ClickHouse® deployment](/data-warehouse/how-to/create-deployment/) ## How to export data from Google BigQuery -Google BigQuery can only export data to Google CLoud Storage (GCS), so you must copy your data to GCS first, then transfer it from GCS to Scaleway Object Storage before ingesting it to your Data Warehouse for ClickHouse® deployment. +Google BigQuery can only export data to Google Cloud Storage (GCS), so you must copy your data to GCS first, then transfer it from GCS to Scaleway Object Storage before ingesting it to your Data Warehouse for ClickHouse® deployment. 
### Exporting BigQuery data to GCS

1. Log in to your Google Cloud account, then open BigQuery.

@@ -44,9 +44,9 @@ Google BigQuery can only export data to Google CLoud Storage (GCS), so you must
 - You must have write access to the specified GCS bucket to perform this action.


-### Transfering data to Scaleway Object Storage
+### Transferring data to Scaleway Object Storage

-To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we recommend using [Rclone](https://rclone.org/), as it is compatible with both Google Cloud Storage and Scaleway Object storage, and allows you to easily copy data from a cloud provider to another.
+To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we recommend using [Rclone](https://rclone.org/), as it is compatible with both Google Cloud Storage and Scaleway Object Storage, and allows you to easily copy data from one cloud provider to another.

 1. Run the command below to install Rclone, or refer to the [official documentation](https://rclone.org/downloads/) for alternative methods:

@@ -78,7 +78,7 @@ To copy data from Google Cloud Storage (GCS) to Scaleway Object Storage, we reco
    - Endpoint: `s3.fr-par.scw.cloud` (update according to your preferred region)
    - API access key and secret key

-6. Run the command below to copy the cotntent of your GCS bucket to your Scaleway Object Storage bucket. Make sure to replace the placeholders with the correct values:
+6. Run the command below to copy the content of your GCS bucket to your Scaleway Object Storage bucket. Make sure to replace the placeholders with the correct values:
    ```sh
    rclone copy gcs:your-gcs-bucket scw:your-scw-bucket --progress
    ```
@@ -105,7 +105,7 @@ Your Scaleway Object Storage now contains data exported from Google BigQuery in
    ORDER BY pickup_datetime;
    ```

-2. Run the command below Import data from your Scaleway Object Storage bucket.
+2. Run the command below to import data from your Scaleway Object Storage bucket.

    ```sql
    INSERT INTO nyc_taxi.trips_small