Commit 185ed85

Polish guards of transform. (#2645)
1 parent fa52d4c commit 185ed85

14 files changed: 224 additions, 266 deletions

docs/cn/guides/40-load-data/04-transform/04-querying-metadata.md

Lines changed: 8 additions & 8 deletions
@@ -17,11 +17,11 @@ sidebar_label: 元数据

 ## 查询元数据详细指南

-| 文件格式 | 指南 |
-| ----------- | ------------------------------------------------------------------------------------ |
-| Parquet | [使用元数据查询 Parquet 文件](./00-querying-parquet.md#query-with-metadata) |
-| CSV | [使用元数据查询 CSV 文件](./01-querying-csv.md#query-with-metadata) |
-| TSV | [使用元数据查询 TSV 文件](./02-querying-tsv.md#query-with-metadata) |
-| NDJSON | [使用元数据查询 NDJSON 文件](./03-querying-ndjson.md#query-with-metadata) |
-| ORC | [使用元数据查询 ORC 文件](./03-querying-orc.md#query-with-metadata) |
-| Avro | [使用元数据查询 Avro 文件](./04-querying-avro.md#query-with-metadata) |
+| 文件格式 | 指南 |
+| ----------- |--------------------------------------------------------------------|
+| Parquet | [使用元数据查询 Parquet 文件](./00-querying-parquet.md#query-with-metadata) |
+| CSV | [使用元数据查询 CSV 文件](./01-querying-csv.md#query-with-metadata) |
+| TSV | [使用元数据查询 TSV 文件](./02-querying-tsv.md#query-with-metadata) |
+| NDJSON | [使用元数据查询 NDJSON 文件](./03-querying-ndjson.md#query-with-metadata) |
+| ORC | [使用元数据查询 ORC 文件](./05-querying-orc.md#query-with-metadata) |
+| Avro | [使用元数据查询 Avro 文件](./04-querying-avro.md#query-with-metadata) |

docs/cn/guides/40-load-data/04-transform/index.md

Lines changed: 7 additions & 7 deletions
@@ -36,11 +36,11 @@ FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}

 ## 支持的文件格式

-| 文件格式 | 返回格式 | 访问方法 | 示例 | 指南 |
-| ----------- | ------------ | ------------- | ------- | ----- |
+| 文件格式 | 返回格式 | 访问方法 | 示例 | 指南 |
+| ----------- | ------------ | ------------- | ------- |-------------------------------------------|
 | Parquet | 原生数据类型 | 直接列名 | `SELECT id, name FROM` | [查询 Parquet 文件](./00-querying-parquet.md) |
-| ORC | 原生数据类型 | 直接列名 | `SELECT id, name FROM` | [查询 ORC 文件](./03-querying-orc.md) |
-| CSV | 字符串值 | 位置引用 `$<position>` | `SELECT $1, $2 FROM` | [查询 CSV 文件](./01-querying-csv.md) |
-| TSV | 字符串值 | 位置引用 `$<position>` | `SELECT $1, $2 FROM` | [查询 TSV 文件](./02-querying-tsv.md) |
-| NDJSON | Variant 对象 | 路径表达式 `$1:<field>` | `SELECT $1:id, $1:name FROM` | [查询 NDJSON 文件](./03-querying-ndjson.md) |
-| Avro | Variant 对象 | 路径表达式 `$1:<field>` | `SELECT $1:id, $1:name FROM` | [查询 Avro 文件](./04-querying-avro.md) |
+| ORC | 原生数据类型 | 直接列名 | `SELECT id, name FROM` | [查询 ORC 文件](./05-querying-orc.md) |
+| CSV | 字符串值 | 位置引用 `$<position>` | `SELECT $1, $2 FROM` | [查询 CSV 文件](./01-querying-csv.md) |
+| TSV | 字符串值 | 位置引用 `$<position>` | `SELECT $1, $2 FROM` | [查询 TSV 文件](./02-querying-tsv.md) |
+| NDJSON | Variant 对象 | 路径表达式 `$1:<field>` | `SELECT $1:id, $1:name FROM` | [查询 NDJSON 文件](./03-querying-ndjson.md) |
+| Avro | Variant 对象 | 路径表达式 `$1:<field>` | `SELECT $1:id, $1:name FROM` | [查询 Avro 文件](./04-querying-avro.md) |

docs/cn/guides/40-load-data/index.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ Databend 强大的 ETL 能力支持从多种数据源和格式高效加载数据
 <summary> ORC </summary>

 - [将 ORC 数据导入表](./03-load-semistructured/04-load-orc.md)
-- [直接查询 ORC 文件](./04-transform/03-querying-orc.md)
+- [直接查询 ORC 文件](./04-transform/05-querying-orc.md)

 </details>

docs/en/guides/40-load-data/04-transform/00-querying-parquet.md

Lines changed: 22 additions & 28 deletions
@@ -3,32 +3,12 @@ title: Querying Parquet Files in Stage
 sidebar_label: Parquet
 ---

-## Query Parquet Files in Stage

-Syntax:
-```sql
-SELECT [<alias>.]<column> [, <column> ...]
-FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
-[(
-[<connection_parameters>],
-[ PATTERN => '<regex_pattern>'],
-[ FILE_FORMAT => 'PARQUET | <custom_format_name>'],
-[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ],
-[ CASE_SENSITIVE => true | false ]
-)]
-```
-
-:::info Tips
-**Query Return Content Explanation:**
+## Syntax:

-* **Return Format**: Column values in their native data types (not variants)
-* **Access Method**: Directly use column names `column_name`
-* **Example**: `SELECT id, name, age FROM @stage_name`
-* **Key Features**:
-* No need for path expressions (like `$1:name`)
-* No type casting required
-* Parquet files contain embedded schema information
-:::
+- [Query rows as Variants](./index.md#query-rows-as-variants)
+- [Query columns by name](./index.md#query-columns-by-name)
+- [Query Metadata](./index.md#query-metadata)

 ## Tutorial

@@ -47,14 +27,14 @@ CONNECTION = (
 ### Step 2. Create Custom Parquet File Format

 ```sql
-CREATE FILE FORMAT parquet_query_format
-TYPE = PARQUET
-;
+CREATE FILE FORMAT parquet_query_format TYPE = PARQUET;
 ```
 - More Parquet file format options refer to [Parquet File Format Options](/sql/sql-reference/file-format-options#parquet-options)

 ### Step 3. Query Parquet Files

+query with colum names:
+
 ```sql
 SELECT *
 FROM @parquet_query_stage
@@ -63,6 +43,20 @@ FROM @parquet_query_stage
 PATTERN => '.*[.]parquet'
 );
 ```
+
+query with path expressions:
+
+
+```sql
+SELECT $1
+FROM @parquet_query_stage
+(
+FILE_FORMAT => 'parquet_query_format',
+PATTERN => '.*[.]parquet'
+);
+```
+
+
 ### Query with Metadata

 Query Parquet files directly from a stage, including metadata columns like `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER`:
@@ -77,4 +71,4 @@ FROM @parquet_query_stage
 FILE_FORMAT => 'parquet_query_format',
 PATTERN => '.*[.]parquet'
 );
-```
+```

docs/en/guides/40-load-data/04-transform/01-querying-csv.md

Lines changed: 3 additions & 26 deletions
@@ -3,33 +3,10 @@ title: Querying CSV Files in Stage
 sidebar_label: CSV
 ---

-## Query CSV Files in Stage
+## Syntax:

-Syntax:
-```sql
-SELECT [<alias>.]$<col_position> [, $<col_position> ...]
-FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
-[(
-[<connection_parameters>],
-[ PATTERN => '<regex_pattern>'],
-[ FILE_FORMAT => 'CSV| <custom_format_name>'],
-[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
-)]
-```
-
-
-:::info Tips
-**Query Return Content Explanation:**
-
-* **Return Format**: Individual column values as strings by default
-* **Access Method**: Use positional references `$<col_position>` (e.g., `$1`, `$2`, `$3`)
-* **Example**: `SELECT $1, $2, $3 FROM @stage_name`
-* **Key Features**:
-* Columns accessed by position, not by name
-* Each `$<col_position>` refers to a single column, not the whole row
-* Type casting required for non-string operations (e.g., `CAST($1 AS INT)`)
-* No embedded schema information in CSV files
-:::
+- [Query columns by position](./index.md#query-columns-by-position)
+- [Query Metadata](./index.md#query-metadata)

 ## Tutorial

docs/en/guides/40-load-data/04-transform/02-querying-tsv.md

Lines changed: 3 additions & 25 deletions
@@ -3,33 +3,11 @@ title: Querying TSV Files in Stage
 sidebar_label: TSV
 ---

-## Query TSV Files in Stage
+## Syntax:

-Syntax:
-```sql
-SELECT [<alias>.]$<col_position> [, $<col_position> ...]
-FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
-[(
-[<connection_parameters>],
-[ PATTERN => '<regex_pattern>'],
-[ FILE_FORMAT => 'TSV| <custom_format_name>'],
-[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
-)]
-```
-
-
-:::info Tips
-**Query Return Content Explanation:**
+- [Query columns by position](./index.md#query-columns-by-position)
+- [Query Metadata](./index.md#query-metadata)

-* **Return Format**: Individual column values as strings by default
-* **Access Method**: Use positional references `$<col_position>` (e.g., `$1`, `$2`, `$3`)
-* **Example**: `SELECT $1, $2, $3 FROM @stage_name`
-* **Key Features**:
-* Columns accessed by position, not by name
-* Each `$<col_position>` refers to a single column, not the whole row
-* Type casting required for non-string operations (e.g., `CAST($1 AS INT)`)
-* No embedded schema information in TSV files
-:::

 ## Tutorial

docs/en/guides/40-load-data/04-transform/03-querying-ndjson.md

Lines changed: 4 additions & 52 deletions
@@ -21,33 +21,10 @@ NDJSON (Newline Delimited JSON) is a JSON-based file format where each line cont
 - **Big data compatible**: Widely used in log files, data exports, and ETL pipelines
 - **Easy to process**: Each line is an independent JSON object, enabling parallel processing

-## Query NDJSON Files in Stage
+## Syntax

-Syntax:
-```sql
-SELECT [<alias>.]$1:<column> [, $1:<column> ...]
-FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
-[(
-[<connection_parameters>],
-[ PATTERN => '<regex_pattern>'],
-[ FILE_FORMAT => 'NDJSON| <custom_format_name>'],
-[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
-)]
-```
-
-
-:::info Tips
-**Query Return Content Explanation:**
-
-* **Return Format**: Each row as a single variant object (referenced as `$1`)
-* **Access Method**: Use path expressions `$1:column_name`
-* **Example**: `SELECT $1:title, $1:author FROM @stage_name`
-* **Key Features**:
-* Must use path notation to access specific fields
-* Type casting required for type-specific operations (e.g., `CAST($1:id AS INT)`)
-* Each NDJSON line is parsed as a complete JSON object
-* Whole row is represented as a single variant object
-:::
+- [Query rows as Variants](./index.md#query-rows-as-variants)
+- [Query Metadata](./index.md#query-metadata)

 ## Tutorial

@@ -106,34 +83,9 @@ FROM @ndjson_query_stage
 ```

 **Key difference:** The pattern `.*[.]ndjson[.]gz` matches files ending with `.ndjson.gz`. Databend automatically decompresses gzip files during query execution thanks to the `COMPRESSION = AUTO` setting in the file format.
-### Query with Metadata
-
-You can also include file metadata in your queries, which is useful for tracking data lineage and debugging:
-
-```sql
-SELECT
-METADATA$FILENAME,
-METADATA$FILE_ROW_NUMBER,
-$1:title, $1:author
-FROM @ndjson_query_stage
-(
-FILE_FORMAT => 'ndjson_query_format',
-PATTERN => '.*[.]ndjson'
-);
-```
-
-**Metadata columns explained:**
-- `METADATA$FILENAME`: Shows which file each row came from - helpful when querying multiple files
-- `METADATA$FILE_ROW_NUMBER`: Shows the line number within the source file - useful for tracking specific records
-
-**Use cases:**
-- **Data lineage**: Track which source file contributed each record
-- **Debugging**: Identify problematic records by file and line number
-- **Incremental processing**: Process only specific files or ranges within files

 ## Related Documentation

 - [Loading NDJSON Files](../03-load-semistructured/03-load-ndjson.md) - How to load NDJSON data into tables
 - [NDJSON File Format Options](/sql/sql-reference/file-format-options#ndjson-options) - Complete NDJSON format configuration
-- [CREATE STAGE](/sql/sql-commands/ddl/stage/ddl-create-stage) - Managing external and internal stages
-- [Querying Metadata](./04-querying-metadata.md) - More details about metadata columns
+- [CREATE STAGE](/sql/sql-commands/ddl/stage/ddl-create-stage) - Managing external and internal stages

docs/en/guides/40-load-data/04-transform/04-querying-avro.md

Lines changed: 3 additions & 25 deletions
@@ -3,32 +3,10 @@ title: Querying Avro Files in Stage
 sidebar_label: Avro
 ---

-## Query Avro Files in Stage
+## Syntax:

-Syntax:
-```sql
-SELECT [<alias>.]$1:<column> [, $1:<column> ...]
-FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
-[(
-[<connection_parameters>],
-[ PATTERN => '<regex_pattern>'],
-[ FILE_FORMAT => 'AVRO'],
-[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
-)]
-```
-
-:::info Tips
-**Query Return Content Explanation:**
-
-* **Return Format**: Each row as a single variant object (referenced as `$1`)
-* **Access Method**: Use path expressions `$1:column_name`
-* **Example**: `SELECT $1:id, $1:name FROM @stage_name`
-* **Key Features**:
-* Must use path notation to access specific fields
-* Type casting required for type-specific operations (e.g., `CAST($1:id AS INT)`)
-* Avro schema is mapped to variant structure
-* Whole row is represented as a single variant object
-:::
+- [Query rows as Variants](./index.md#query-rows-as-variants)
+- [Query Metadata](./index.md#query-metadata)

 ## Avro Querying Features Overview

docs/en/guides/40-load-data/04-transform/04-querying-metadata.md

Lines changed: 0 additions & 27 deletions
This file was deleted.
