36 changes: 19 additions & 17 deletions docs/collect/configure.md
@@ -15,11 +15,11 @@ Tailpipe [plugins](/docs/collect/plugins) define tables for common log sources a

If your logs are not in a standard format or are not currently supported by a plugin, you can create [custom tables](/docs/collect/custom-tables) to collect data from arbitrary log files and other sources.

Tables are implemented as DuckDB views over the Parquet files. Tailpipe creates tables (that is, creates views in the `tailpipe.db` database) based on the data and metadata that it discovers in the [workspace](#workspaces), along with the filter rules.
Tailpipe creates DuckLake tables based on the data and metadata that it discovers in the [workspace](#workspaces), along with the filter rules.

When you run `tailpipe query` or `tailpipe connect`, Tailpipe finds all the tables in the workspace according to the [hive directory layout](/docs/collect/configure#hive-partitioning) and adds a view for the table. The view definitions will include qualifiers that implement any filter arguments that you specify (`--from`,`--to`,`--index`,`--partition`).
When you run `tailpipe query` or `tailpipe connect` with any filter arguments (`--from`, `--to`, `--index`, `--partition`), Tailpipe finds all the tables in the workspace according to the [hive directory layout](/docs/collect/configure#hive-partitioning) and filters each table's view accordingly.

You can see what tables are available with the `tailpipe plugin list` command.
You can see what tables are available with the `tailpipe table list` command.
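
For example, a typical sequence might look like this (the partition name and date are illustrative):

```bash
# see which tables have been collected into the current workspace
tailpipe table list

# open the interactive shell, filtered to one partition and a start date
tailpipe query --partition aws_cloudtrail_log.prod --from 2025-01-01
```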

## Partitions
A partition represents data gathered from a [source](/docs/collect/configure#sources). Partitions are defined [in HCL](/docs/reference/config-files/partition) and are required for [collection](/docs/collect/collect).
@@ -61,20 +61,22 @@ The standard partitioning/hive structure enables efficient queries that only nee
tp_table=aws_cloudtrail_log
└── tp_partition=prod
└── tp_index=default
├── tp_date=2024-12-31
│   └── data_20250106140713_740378_0.parquet
├── tp_date=2025-01-01
│   └── data_20250106140713_740378_0.parquet
├── tp_date=2025-01-02
│   └── snap_20250106140823_952067.parquet
├── tp_date=2025-01-03
│   └── snap_20250106140824_011599.parquet
├── tp_date=2025-01-04
│   └── data_20250106140752_829722_0.parquet
├── tp_date=2025-01-05
│   └── snap_20250106140824_073116.parquet
└── tp_date=2025-01-06
└── snap_20250106140824_131637.parquet
└── year=2024
    ├── month=7
    │   ├── ducklake-01995d38-7f1e-7867-b7f1-8f523d546353.parquet
    │   ├── ducklake-01995d38-7f75-77ce-a0ec-5972d4d6c7ae.parquet
    │   ├── ducklake-01995d38-7fd2-7365-997d-65a6ad005e83.parquet
    │   └── ducklake-01995d38-80e5-7185-b15e-5ee808222b73.parquet
    ├── month=8
    │   ├── ducklake-01995d38-7f1e-7867-b7f1-8f523d546353.parquet
    │   ├── ducklake-01995d38-7f75-77ce-a0ec-5972d4d6c7ae.parquet
    │   ├── ducklake-01995d38-7fd2-7365-997d-65a6ad005e83.parquet
    │   └── ducklake-01995d38-80e5-7185-b15e-5ee808222b73.parquet
    └── month=9
        ├── ducklake-01995d38-7f1e-7867-b7f1-8f523d546353.parquet
        ├── ducklake-01995d38-7f75-77ce-a0ec-5972d4d6c7ae.parquet
        ├── ducklake-01995d38-7fd2-7365-997d-65a6ad005e83.parquet
        └── ducklake-01995d38-80e5-7185-b15e-5ee808222b73.parquet
```
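
Because the partition keys are encoded in the directory paths, a query that filters on them only needs to read the matching files. For example (the values here are illustrative):

```sql
select count(*)
from aws_cloudtrail_log
where tp_partition = 'prod'
  and tp_timestamp >= date '2024-07-01';
```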


6 changes: 3 additions & 3 deletions docs/collect/manage-data.md
@@ -292,16 +292,16 @@ Plugin: hub.tailpipe.io/plugins/turbot/aws@latest


## Connecting from Other Tools
You can connect to your Tailpipe database with the native DuckDB client or other tools and libraries that can connect to DuckDB. To do so, you can generate a new db file for the connection using `tailpipe connect`:
You can connect to your Tailpipe database with the native DuckDB client or other tools and libraries that can connect to DuckDB. To do so, use `tailpipe connect` to generate a SQL script that initializes DuckDB to use the Tailpipe database:

```bash
tailpipe connect
```

A new DB file will be generated and returned:
The path to the generated SQL script is returned:
```bash
$ tailpipe connect
/Users/jsmyth/.tailpipe/data/default/tailpipe_20250409151453.db
/Users/pskrbasu/.tailpipe/data/default/tailpipe_init_20250918210704.sql
```
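
You can then launch DuckDB with that script (using the path returned above):

```bash
duckdb -init /Users/pskrbasu/.tailpipe/data/default/tailpipe_init_20250918210704.sql
```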

If you've collected a lot of data and want to optimize your queries for a subset of it, you can pre-filter the database. You can restrict to the most recent 45 days:
11 changes: 9 additions & 2 deletions docs/query/index.md
@@ -2,9 +2,16 @@
title: Query Tailpipe
---

# Powered by DuckDB!
# Powered by DuckDB + DuckLake!

Tailpipe [collects](/docs/collect/collect) logs into a [DuckDB](https://duckdb.org/) database that uses [standard SQL syntax](https://duckdb.org/docs/sql/introduction.html) to query. It's easy to [get started writing queries](/docs/sql), and the [Tailpipe Hub](https://hub.tailpipe.io) provides ***hundreds of example queries*** that you can use or modify for your purposes. There are [example queries for each table](https://hub.tailpipe.io/plugins/turbot/aws/tables/aws_cloudtrail_log) in every plugin, and you can also [browse, search, and view the queries](https://hub.tailpipe.io/mods/turbot/tailpipe-mod-aws-dections/queries) in every published mod!
Tailpipe [collects](/docs/collect/collect) logs into open Parquet files and catalogs them with [DuckLake](https://ducklake.select/), so you can query everything with [standard SQL syntax](https://duckdb.org/docs/sql/introduction.html). This brings a simple "lakehouse" model: open data files, a lightweight metadata catalog, and fast local analytics.

- Open formats: data is stored as Parquet on disk.
- Cataloged: DuckLake tracks tables/columns/partitions for efficient queries.
- Fast by design: partition pruning and vectorized execution via DuckDB.
- SQL-first: use familiar DuckDB syntax, functions, and tooling.
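
For instance, a first query against the AWS plugin's CloudTrail table might look like this (table and column names as used elsewhere in these docs):

```sql
-- ten most recent CloudTrail events, newest first
select
  tp_timestamp,
  event_name,
  aws_region
from
  aws_cloudtrail_log
order by
  tp_timestamp desc
limit 10;
```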

It's easy to [get started writing queries](/docs/sql), and the [Tailpipe Hub](https://hub.tailpipe.io) provides ***hundreds of example queries*** that you can use or modify for your purposes. There are [example queries for each table](https://hub.tailpipe.io/plugins/turbot/aws/tables/aws_cloudtrail_log) in every plugin, and you can also [browse, search, and view the queries](https://hub.tailpipe.io/mods/turbot/tailpipe-mod-aws-dections/queries) in every published mod!


## Interactive Query Shell
6 changes: 3 additions & 3 deletions docs/query/snapshots.md
@@ -16,7 +16,7 @@ To upload snapshots to Turbot Pipes, you must either [log in via the `powerpipe
To take a snapshot and save it to [Turbot Pipes](https://turbot.com/pipes/docs), simply add the `--snapshot` flag to your command.

```bash
powerpipe query run "select * from aws_cloudtrail_log order by tp_date desc limit 1000" --snapshot
powerpipe query run "select * from aws_cloudtrail_log order by tp_timestamp desc limit 1000" --snapshot
```

```bash
@@ -34,13 +34,13 @@ powerpipe benchmark run cloudtrail_log_detections --share
You can set a snapshot title in Turbot Pipes with the `--snapshot-title` argument.

```bash
powerpipe query run "select * from aws_cloudtrail_log order by tp_date desc limit 1000" --share --snapshot-title "Recent Cloudtrail log lines"
powerpipe query run "select * from aws_cloudtrail_log order by tp_timestamp desc limit 1000" --share --snapshot-title "Recent Cloudtrail log lines"
```

If you wish to save the snapshot to a different workspace, such as an org workspace, you can use the `--snapshot-location` argument with `--share` or `--snapshot`:

```bash
powerpipe query run "select * from aws_cloudtrail_log order by tp_date desc limit 1000" --share --snapshot-location my-org/my-workspace
powerpipe query run "select * from aws_cloudtrail_log order by tp_timestamp desc limit 1000" --share --snapshot-location my-org/my-workspace

```

17 changes: 11 additions & 6 deletions docs/reference/cli/connect.md
@@ -4,7 +4,12 @@ title: tailpipe connect

# tailpipe connect

Return a connection string for a database with a schema determined by the provided parameters.
Return the path of a SQL script that initializes DuckDB to use the Tailpipe database.

The generated SQL script contains:
- DuckDB extension installations (sqlite, ducklake)
- Database attachment configuration
- View definitions with optional filters
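
For illustration, a generated script might look roughly like the following; this is only a sketch, and the exact statements, catalog filename, and attach options that Tailpipe emits may differ:

```sql
-- install and load the extensions needed to read the DuckLake catalog
INSTALL sqlite;
INSTALL ducklake;
LOAD sqlite;
LOAD ducklake;

-- attach the catalog (metadata.sqlite) that references the workspace's Parquet files
ATTACH 'ducklake:sqlite:/Users/pskrbasu/.tailpipe/data/default/metadata.sqlite' AS tailpipe;
USE tailpipe;

-- per-table views created here would apply any --from/--to/--index/--partition filters
```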

## Usage
```bash
@@ -32,15 +37,15 @@ tailpipe connect --from 2025-01-01
```

```bash
/home/jon/.tailpipe/data/default/tailpipe_20250115140447.db
/Users/pskrbasu/.tailpipe/data/default/tailpipe_init_20250918204456.sql
```

> [!NOTE]
> You can use this connection string with DuckDB to directly query the Tailpipe database.
To ensure compatibility with tables that include JSON columns, make sure you’re using DuckDB version 1.1.3 or later.
> You can use this SQL script with DuckDB to directly query the Tailpipe database.
> To ensure compatibility with DuckLake features, make sure you’re using DuckDB version 1.4.0 or later.
>
> ```bash
> duckdb /home/jon/.tailpipe/data/default/tailpipe_20241212134120.db
> duckdb -init /Users/pskrbasu/.tailpipe/data/default/tailpipe_init_20250918204456.sql
> ```

Connect with no filter, show output as json:
@@ -50,6 +55,6 @@ tailpipe connect --output json
```

```bash
{"database_filepath":"/Users/jonudell/.tailpipe/data/default/tailpipe_20250129204416.db"}
{"init_script_path":"/Users/pskrbasu/.tailpipe/data/default/tailpipe_init_20250918204828.sql"}
```

4 changes: 2 additions & 2 deletions docs/reference/glossary.md
@@ -30,7 +30,7 @@ A detection is a Tailpipe query, optionally bundled into a benchmark, that runs

## DuckDB

Tailpipe uses DuckDB, an embeddable column-oriented database. DuckDB reads the Parquet files created by `tailpipe collect` and enables queries against that data.
Tailpipe uses DuckDB for fast local analytics over Parquet data. DuckLake maintains a lightweight metadata catalog (`metadata.sqlite`) that references the Parquet files collected by Tailpipe, so you query with standard DuckDB SQL while benefiting from partition pruning and a lakehouse-style layout.

## Format
A [format](/docs/reference/config-files/format) describes the layout of the source data so that it can be collected into a table.
@@ -40,7 +40,7 @@ A [format type](/docs/reference/config-files/format#format-types) defines the pa

## Hive

A tree of Parquet files in the Tailpipe workspace (by default,`~/.tailpipe/data/default`). The `tailpipe.db` in `~/.tailpipe/data/default` (and derivatives created by `tailpipe connect`, e.g. `tailpipe_20241212152506.db`) are thin wrappers that materialize views over the Parquet data.
A tree of Parquet files in the Tailpipe workspace (by default, `~/.tailpipe/data/default`), organized with hive-style partition keys (for example, `tp_table=.../tp_partition=.../tp_index=.../year=YYYY/month=mm`). DuckLake’s catalog (`metadata.sqlite`) points to these files to enable efficient SQL queries.

## Index

18 changes: 9 additions & 9 deletions docs/sql/index.md
@@ -27,7 +27,7 @@ You can **filter** rows where columns only have a specific value:
```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type
from
@@ -41,7 +41,7 @@ or a **range** of values:
```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type
from
@@ -55,7 +55,7 @@ or match a **pattern**:
```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type,
event_name
@@ -70,23 +70,23 @@ You can **filter on multiple columns**, joined by `and` or `or`:
```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type,
event_name
from
aws_cloudtrail_log
where
event_name = 'UpdateTrail'
and tp_date > date '2024-11-06';
and tp_timestamp > date '2024-11-06';
```

You can **sort** your results:

```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type,
event_name
@@ -101,15 +101,15 @@ You can **sort on multiple columns, ascending or descending**:
```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type,
event_name
from
aws_cloudtrail_log
order by
aws_region asc,
tp_date desc;
tp_timestamp desc;
```

You can group and use standard aggregate functions. You can **count** results:
@@ -147,7 +147,7 @@ or exclude **all but one matching row**:
```sql
select distinct on (event_type)
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type,
event_name
2 changes: 1 addition & 1 deletion docs/sql/querying-ips.md
@@ -12,7 +12,7 @@ You can find requests **from a specific IP address**:
```sql
select
tp_partition,
tp_date,
tp_timestamp,
aws_region,
event_type
from
4 changes: 2 additions & 2 deletions docs/sql/tips.md
@@ -20,10 +20,10 @@ select count(*) from aws_cloudtrail_log where partition = 'prod'
select count(*) from aws_cloudtrail_log where partition = 'prod' and index = 123456789
```

*Date*. Each file contains log data for one day. You can filter to include only files for that day.
*Timestamp*. Filter by timestamp to efficiently read only the matching files.

```sql
select count(*) from aws_cloudtrail_log where partition = 'prod' and index = 123456789 and tp_date = '2024-12-01'
select count(*) from aws_cloudtrail_log where partition = 'prod' and index = 123456789 and tp_timestamp > date '2024-12-01'
```

The [hive directory structure](/docs/collect/configure#hive-partitioning) enables you to exclude large numbers of Parquet files.