Azure Data Factory (ADF) can connect to a wide variety of data sources across different vendors, both on-premises and in the cloud. This makes it an attractive option for enterprises that want flexible orchestration of ETL/ELT pipelines without being locked into a single vendor ecosystem. Databricks, built on Apache Spark, provides significant performance advantages for large-scale data processing through advanced parallelism. Its scalability can lead to cost efficiencies when handling ingestion and transformation of big data.
This Azure Data Factory pipeline integrates Azure Databricks with Azure Data Lake Storage Gen2 (ADLS Gen2) to provide a best-practice quick start for ingesting raw CSV files and transforming them into Delta tables.
The pipeline implements the Landing → Bronze phase of the Medallion architecture (Landing → Bronze → Silver → Gold), establishing a foundation for scalable and structured data processing.
The pipeline consists of two key stages:

- Source to Landing
  - Retrieves a CSV file from a Microsoft Learn GitHub repository.
  - Copies it into the Landing zone of ADLS Gen2 with a unique file name.
- Landing to Bronze
  - Dynamically creates a Bronze Delta table in Unity Catalog with tracking metadata columns (a notebook sketch follows this list).
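To make the Landing → Bronze stage concrete, below is a minimal PySpark sketch of the kind of logic such a notebook performs; it is not the repository's actual notebook code. The path, table name, and metadata column names are illustrative assumptions, and `spark` is the session predefined in a Databricks notebook.

```python
from pyspark.sql import functions as F

# Illustrative locations; replace with your storage account, container, catalog, and schema.
landing_path = "abfss://<container>@<storage_account>.dfs.core.windows.net/00_landing/products/"
bronze_table = "main.bronze.products"  # hypothetical catalog.schema.table

# Read the raw CSV files dropped into the Landing zone.
raw = (
    spark.read
    .option("header", "true")
    .csv(landing_path)
)

# Add tracking metadata columns: ingestion timestamp and source file path
# (the hidden _metadata column is available for file-based sources on recent Databricks runtimes).
bronze = (
    raw
    .withColumn("_ingested_at", F.current_timestamp())
    .withColumn("_source_file", F.col("_metadata.file_path"))
)

# Create the Bronze Delta table in Unity Catalog (created on first run, appended to afterwards).
(
    bronze.write
    .format("delta")
    .mode("append")
    .saveAsTable(bronze_table)
)
```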
Below are the steps to prepare your Azure environment so that the ADF pipeline + Databricks notebooks will work correctly.
They follow the pattern:
create ADLS → create Databricks & access connector → grant identities access → import notebooks → import ADF template → wire up connections
Create an ADLS Gen2 storage account:
- Enable Hierarchical Namespace when creating it.
- Note the Storage Account name, Resource Group, etc.
Within the storage account’s container/file system, create the following directories (a scripted alternative is sketched below):
- 00_landing/products/
- 01_bronze/products/
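The directories can also be created with the Azure SDK for Python instead of the Portal or Storage Explorer. A minimal sketch, assuming the container already exists, placeholder account and container names, and an identity that already has data-plane access (e.g. Storage Blob Data Contributor) on the account:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

account_name = "<storage_account>"   # placeholder
container_name = "<container>"       # placeholder; must already exist

# Authenticate with whatever identity DefaultAzureCredential resolves
# (Azure CLI login, managed identity, environment variables, ...).
service = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client(container_name)

# Create the Landing and Bronze zone directories.
for path in ("00_landing/products", "01_bronze/products"):
    fs.create_directory(path)
```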
Create the Azure Databricks workspace and the Access Connector for Azure Databricks:
- In the Azure Portal (or via CLI/ARM).
- The Access Connector provides a managed identity used for secure access to storage.
- Reference: Access Storage with Managed Identities
Grant the managed identity access to the storage account:
- In the Azure Storage account → IAM → Add a role assignment.
- Assign Storage Blob Data Contributor as a minimum (see the reference below for recommended additional access).
- Reference: Access control model for MI
Configure the Data Factory's managed identity:
- Turn on the System-Assigned Managed Identity in the ADF resource.
- Grant it permissions on ADLS (read/write as needed).
- Grant it permissions on the Databricks workspace.
- The Access Connector / Service Principal / Managed Identity should have rights to:
  - ADLS
  - Unity Catalog (catalog permissions; see the sketch after this list)
  - Compute (to run jobs)
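If you prefer to manage the Unity Catalog permissions in code, the grants can be issued from a Databricks notebook or SQL warehouse. A minimal sketch with a hypothetical `main` catalog, `main.bronze` schema, and a placeholder principal; run it as a user allowed to manage grants and substitute your real securables and principal:

```python
# Hypothetical catalog/schema names and a placeholder principal; adjust to your environment.
catalog = "main"
schema = "main.bronze"
principal = "<service-principal-application-id-or-group-name>"

# Minimum grants for creating Bronze tables in the target schema.
for stmt in (
    f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA {schema} TO `{principal}`",
    f"GRANT CREATE TABLE ON SCHEMA {schema} TO `{principal}`",
):
    spark.sql(stmt)  # spark is predefined in a Databricks notebook
```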
- Download and import the notebooks (Process Data, Landing to Bronze) from this repository into your Databricks workspace.
- Verify:
  - The full notebook path (exactly as shown in Databricks); you will reference this in ADF.
  - That each notebook runs end-to-end when executed manually in Databricks.
- In Azure Data Factory:
- Go to Manage → Templates → Import template / Gallery to import the pipeline.
- Databricks Linked Service
- Create a Linked Service to your Databricks workspace.
- Authentication options: Databricks access token or Managed Identity.
- Ensure the linked service points to the correct workspace URL and has the chosen auth method configured.
- Define pipeline parameters.
- Under the Settings of each Notebook activity, reference the notebooks from Databricks using the full path (or use the ADF UI to browse and select from the saved location).
- Confirm that pipeline parameter names map correctly to the notebooks' base parameters (see the notebook-side sketch below).
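On the notebook side, the base parameters passed by the ADF Notebook activity arrive as Databricks widgets. A minimal sketch with hypothetical parameter names (`landing_path`, `bronze_table`) that may not match the repository's notebooks; the widget names must match the names configured under the activity's base parameters:

```python
# Declare the widgets (with empty defaults) and read the values supplied by ADF.
dbutils.widgets.text("landing_path", "")
dbutils.widgets.text("bronze_table", "")

landing_path = dbutils.widgets.get("landing_path")
bronze_table = dbutils.widgets.get("bronze_table")

print(f"Landing path: {landing_path}")
print(f"Bronze table: {bronze_table}")
```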
You now have a scalable ADF pipeline that can:
- Save CSVs to ADLS Gen2 (Landing).
- Create a Bronze Delta table for further processing.
