Commit db7a588

docs: refresh vector and geo demos (#2980)
* docs: refresh vector and geo demos
* docs: polish query demos
1 parent fca1bf7 commit db7a588

12 files changed, +892 -128 lines changed

docs/cn/guides/54-query/00-sql-analytics.md

Lines changed: 204 additions & 19 deletions
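@@ -2,12 +2,12 @@
 title: SQL Analytics
 ---
 
-> **Scenario:** CityDrive writes every dashcam video into shared relational tables, so analysts can filter, join, and aggregate on the same set of `video_id` / `frame_id` values that the later JSON, vector, geo, and ETL workloads also use.
+> **Scenario:** CityDrive stages all dashcam records in unified relational tables. This relational data (such as video metadata and event tags) is extracted by a background pipeline from keyframes of the raw dashcam video. Analysts can then filter, join, and aggregate over the same `video_id` / `frame_id` data, and every downstream workload reuses it.
 
-This walkthrough models the relational layer of the CityDrive catalog and strings together common SQL building blocks. The sample IDs shown here are used again in the remaining guides.
+This guide models the relational part of that catalog and highlights practical SQL building blocks. The sample IDs used here recur throughout the later JSON, vector, geospatial, and ETL guides.
 
 ## 1. Create the Base Tables
-`citydrive_videos` holds video-level metadata, and `frame_events` records the keyframes extracted from each video.
+`citydrive_videos` stores metadata for each video clip, while `frame_events` records the keyframes (Interesting Frames) extracted from each clip.
 
 ```sql
 CREATE OR REPLACE TABLE citydrive_videos (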
@@ -41,21 +41,39 @@ INSERT INTO frame_events VALUES
   ('FRAME-0102', 'VID-20250101-001', 416, '2025-01-01 08:33:54', 'pedestrian', 0.67, 24.8),
   ('FRAME-0201', 'VID-20250101-002', 298, '2025-01-01 11:12:02', 'lane_merge', 0.74, 48.1),
   ('FRAME-0301', 'VID-20250102-001', 188, '2025-01-02 09:44:18', 'hard_brake', 0.59, 52.6),
-  ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9);
+  ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9),
+  -- An orphan event is kept on purpose to demonstrate NOT EXISTS
+  ('FRAME-0501', 'VID-MISSING-001', 10, '2025-01-04 10:00:00', 'sensor_fault', 0.25, 15.0);
+
+-- The JOIN patterns below need this table; its schema matches the one in the "JSON and Search" guide.
+CREATE OR REPLACE TABLE frame_metadata_catalog (
+  doc_id STRING,
+  meta_json VARIANT,
+  captured_at TIMESTAMP,
+  INVERTED INDEX idx_meta_json (meta_json)
+);
+
+INSERT INTO frame_metadata_catalog VALUES
+  ('FRAME-0101', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":32.4},"detections":{"objects":[{"type":"vehicle","confidence":0.88},{"type":"brake_light","confidence":0.64}]},"media_meta":{"tagging":{"labels":["hard_brake","rain","downtown_loop"]}}}'), '2025-01-01 08:15:21'),
+  ('FRAME-0102', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":24.8},"detections":{"objects":[{"type":"pedestrian","confidence":0.92},{"type":"bike","confidence":0.35}]},"media_meta":{"tagging":{"labels":["pedestrian","swerve","crosswalk"]}}}'), '2025-01-01 08:33:54'),
+  ('FRAME-0201', PARSE_JSON('{"scene":{"weather_code":"overcast","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":48.1},"detections":{"objects":[{"type":"lane_merge","confidence":0.74},{"type":"vehicle","confidence":0.41}]},"media_meta":{"tagging":{"labels":["lane_merge","urban"]}}}'), '2025-01-01 11:12:02'),
+  ('FRAME-0301', PARSE_JSON('{"scene":{"weather_code":"clear","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":52.6},"detections":{"objects":[{"type":"vehicle","confidence":0.82},{"type":"hard_brake","confidence":0.59}]},"media_meta":{"tagging":{"labels":["hard_brake","highway"]}}}'), '2025-01-02 09:44:18'),
+  ('FRAME-0401', PARSE_JSON('{"scene":{"weather_code":"lightfog","lighting":"night"},"camera":{"sensor_view":"rear"},"vehicle":{"speed_kmh":38.9},"detections":{"objects":[{"type":"traffic_light","confidence":0.78},{"type":"vehicle","confidence":0.36}]},"media_meta":{"tagging":{"labels":["night_lowlight","traffic_light"]}}}'), '2025-01-03 21:18:07');
 ```
 
 Docs: [CREATE TABLE](/sql/sql-commands/ddl/table/ddl-create-table), [INSERT](/sql/sql-commands/dml/dml-insert)
 
 ---
 
-## 2. Look Only at the Latest Trips
-Limit the investigation to routes driven in the last 3 days.
+## 2. Filter the Working Set
+Restrict the query to the January 1 to 3 snapshot in the seed data so that the demo queries always return results.
 
 ```sql
 WITH recent_videos AS (
     SELECT *
     FROM citydrive_videos
-    WHERE capture_date >= DATEADD('day', -3, TODAY())
+    WHERE capture_date >= '2025-01-01'
+      AND capture_date < '2025-01-04'
 )
 SELECT v.video_id,
        v.route_name,
@@ -69,10 +87,20 @@ ORDER BY flagged_frames DESC;
 
 Docs: [DATEADD](/sql/sql-functions/datetime-functions/date-add), [GROUP BY](/sql/sql-commands/query-syntax/query-select#group-by-clause)
 
+Sample output:
+
+```
+video_id         | route_name        | weather  | flagged_frames
+VID-20250101-001 | Downtown Loop     | Rain     | 2
+VID-20250101-002 | Port Perimeter    | Overcast | 1
+VID-20250102-001 | Airport Connector | Clear    | 1
+VID-20250103-001 | CBD Night Sweep   | LightFog | 1
+```
+
 ---
 
-## 3. Common JOIN Patterns
-### INNER JOIN: Fetch Frame Context
+## 3. JOIN Patterns
+### INNER JOIN: Get Frame Context
 ```sql
 SELECT f.frame_id,
        f.event_tag,
@@ -84,7 +112,18 @@ JOIN citydrive_videos AS v USING (video_id)
 ORDER BY f.collected_at;
 ```
 
-### NOT EXISTS: QA Checks
+Sample output:
+
+```
+frame_id   | event_tag      | risk_score | route_name        | camera_source
+FRAME-0101 | hard_brake     | 0.81       | Downtown Loop     | roof_cam
+FRAME-0102 | pedestrian     | 0.67       | Downtown Loop     | roof_cam
+FRAME-0201 | lane_merge     | 0.74       | Port Perimeter    | front_cam
+FRAME-0301 | hard_brake     | 0.59       | Airport Connector | front_cam
+FRAME-0401 | night_lowlight | 0.63       | CBD Night Sweep   | rear_cam
+```
+
+### Anti Join: Quality Checks (QA)
 ```sql
 SELECT frame_id
 FROM frame_events f
@@ -95,7 +134,14 @@ WHERE NOT EXISTS (
 );
 ```
 
-### LATERAL FLATTEN: Expand JSON Detections
+Sample output:
+
+```
+frame_id
+FRAME-0501
+```
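The body of the `NOT EXISTS` subquery falls outside this hunk. As a minimal sketch of the anti-join it describes, the Python snippet below replays the check in an in-memory SQLite database. The correlated subquery on `video_id`, the trimmed two-column tables, and the use of SQLite in place of the target warehouse are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; only the anti-join logic matters here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citydrive_videos (video_id TEXT);
CREATE TABLE frame_events (frame_id TEXT, video_id TEXT);
INSERT INTO citydrive_videos VALUES
  ('VID-20250101-001'), ('VID-20250101-002'),
  ('VID-20250102-001'), ('VID-20250103-001');
INSERT INTO frame_events VALUES
  ('FRAME-0101', 'VID-20250101-001'),
  ('FRAME-0102', 'VID-20250101-001'),
  ('FRAME-0201', 'VID-20250101-002'),
  ('FRAME-0301', 'VID-20250102-001'),
  ('FRAME-0401', 'VID-20250103-001'),
  ('FRAME-0501', 'VID-MISSING-001');
""")

# Assumed shape of the elided subquery: keep frames that have no matching video row.
orphans = [row[0] for row in conn.execute("""
    SELECT f.frame_id
    FROM frame_events AS f
    WHERE NOT EXISTS (
        SELECT 1 FROM citydrive_videos AS v WHERE v.video_id = f.video_id
    )
    ORDER BY f.frame_id
""")]
print(orphans)
```

Only `FRAME-0501` is returned, because `VID-MISSING-001` has no row in `citydrive_videos`.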
+
+### LATERAL FLATTEN: Expand Nested Detection Results
 ```sql
 SELECT f.frame_id,
        obj.value['type']::STRING AS detected_type,
@@ -107,12 +153,20 @@ WHERE f.event_tag = 'pedestrian'
 ORDER BY confidence DESC;
 ```
 
+Sample output:
+
+```
+frame_id   | detected_type | confidence
+FRAME-0102 | pedestrian    | 0.92
+FRAME-0102 | bike          | 0.35
+```
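To make the row expansion concrete, here is a rough Python analogue of what flattening does to one document: each element of a nested array becomes its own output row. The path `detections.objects` is an assumption (the flattened expression is not visible in this hunk), and `flatten_detections` is a hypothetical helper, not a database function.

```python
import json

# meta_json for FRAME-0102, trimmed to the array being expanded.
meta_json = json.loads(
    '{"detections":{"objects":['
    '{"type":"pedestrian","confidence":0.92},'
    '{"type":"bike","confidence":0.35}]}}'
)

def flatten_detections(frame_id, doc):
    """Yield one (frame_id, detected_type, confidence) row per array element."""
    for obj in doc["detections"]["objects"]:
        yield (frame_id, str(obj["type"]), float(obj["confidence"]))

# Mirror the query's ORDER BY confidence DESC.
rows = sorted(
    flatten_detections("FRAME-0102", meta_json),
    key=lambda r: r[2],
    reverse=True,
)
for row in rows:
    print(row)
```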
+
 Docs: [JOIN](/sql/sql-commands/query-syntax/query-join), [FLATTEN](/sql/sql-functions/table-functions/flatten)
 
 ---
 
 ## 4. Fleet KPI Aggregations
-### Behavior Statistics by Route
+### Driving Behavior by Route
 ```sql
 SELECT v.route_name,
        f.event_tag,
@@ -124,7 +178,18 @@ GROUP BY v.route_name, f.event_tag
 ORDER BY avg_risk DESC, occurrences DESC;
 ```
 
-### ROLLUP Totals
+Sample output:
+
+```
+route_name        | event_tag      | occurrences | avg_risk
+Downtown Loop     | hard_brake     | 1           | 0.81
+Port Perimeter    | lane_merge     | 1           | 0.74
+Downtown Loop     | pedestrian     | 1           | 0.67
+CBD Night Sweep   | night_lowlight | 1           | 0.63
+Airport Connector | hard_brake     | 1           | 0.59
+```
+
+### ROLLUP: Compute Totals
 ```sql
 SELECT v.route_name,
        f.event_tag,
@@ -135,7 +200,20 @@ GROUP BY ROLLUP(v.route_name, f.event_tag)
 ORDER BY v.route_name NULLS LAST, f.event_tag;
 ```
 
-### CUBE: Route × Weather Coverage
+Sample output (first 6 rows):
+
+```
+route_name        | event_tag      | occurrences
+Airport Connector | hard_brake     | 1
+Airport Connector | NULL           | 1
+CBD Night Sweep   | night_lowlight | 1
+CBD Night Sweep   | NULL           | 1
+Downtown Loop     | hard_brake     | 1
+Downtown Loop     | pedestrian     | 1
+... (total rows: 10)
+```
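The "total rows: 10" figure can be cross-checked by hand: `ROLLUP(a, b)` produces the grouping sets `(a, b)`, `(a)`, and `()`. The sketch below emulates that in plain Python. It assumes the query uses an inner join, so the orphan `FRAME-0501` contributes nothing; `rollup_counts` is a hypothetical helper.

```python
from collections import Counter

# (route_name, event_tag) pairs after joining frame_events to citydrive_videos.
# Assumption: inner join, so FRAME-0501 (no matching video) is absent.
joined = [
    ("Downtown Loop", "hard_brake"),
    ("Downtown Loop", "pedestrian"),
    ("Port Perimeter", "lane_merge"),
    ("Airport Connector", "hard_brake"),
    ("CBD Night Sweep", "night_lowlight"),
]

def rollup_counts(rows):
    """Emulate COUNT(*) ... GROUP BY ROLLUP(a, b); None plays the role of NULL."""
    counts = Counter()
    for route, tag in rows:
        counts[(route, tag)] += 1   # grouping set (a, b): detail rows
        counts[(route, None)] += 1  # grouping set (a): per-route subtotal
        counts[(None, None)] += 1   # grouping set (): grand total
    return counts

rollup = rollup_counts(joined)
print(len(rollup))  # 5 detail rows + 4 route subtotals + 1 grand total
```

Under these assumptions the result has 10 rows, with a subtotal of 2 for Downtown Loop and a grand total of 5.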
+
+### CUBE: Route × Weather Coverage Rate
 ```sql
 SELECT v.route_name,
        v.weather,
@@ -145,10 +223,23 @@ GROUP BY CUBE(v.route_name, v.weather)
 ORDER BY v.route_name NULLS LAST, v.weather NULLS LAST;
 ```
 
+Sample output (first 6 rows):
+
+```
+route_name        | weather  | videos
+Airport Connector | Clear    | 1
+Airport Connector | NULL     | 1
+CBD Night Sweep   | LightFog | 1
+CBD Night Sweep   | NULL     | 1
+Downtown Loop     | Rain     | 1
+Downtown Loop     | NULL     | 1
+... (total rows: 13)
+```
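`CUBE` differs from `ROLLUP` in that it groups by every subset of the listed columns, which is where the extra per-weather rows come from. The following sketch enumerates those subsets with `itertools.combinations`. It assumes `citydrive_videos` holds only the four videos that appear in the sample outputs; `cube_counts` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# (route_name, weather) per video, taken from the sample output in section 2.
videos = [
    ("Downtown Loop", "Rain"),
    ("Port Perimeter", "Overcast"),
    ("Airport Connector", "Clear"),
    ("CBD Night Sweep", "LightFog"),
]

def cube_counts(rows, n_cols=2):
    """Emulate COUNT(*) ... GROUP BY CUBE(...): one grouping set per column subset."""
    subsets = [
        set(cols)
        for size in range(n_cols + 1)
        for cols in combinations(range(n_cols), size)
    ]
    counts = Counter()
    for row in rows:
        for keep in subsets:
            # Columns outside the grouping set collapse to None (SQL NULL).
            key = tuple(v if i in keep else None for i, v in enumerate(row))
            counts[key] += 1
    return counts

cube = cube_counts(videos)
print(len(cube))  # 4 detail + 4 per-route + 4 per-weather + 1 grand total
```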
+
 ---
 
 ## 5. Window Functions
-### Cumulative Risk for a Single Video Run
+### Cumulative Risk per Video
 ```sql
 WITH ordered_events AS (
     SELECT video_id, collected_at, risk_score
@@ -166,7 +257,19 @@ FROM ordered_events
 ORDER BY video_id, collected_at;
 ```
 
-### Frame-Level Moving Average
+Sample output (first 6 rows):
+
+```
+video_id         | collected_at        | risk_score | cumulative_risk
+VID-20250101-001 | 2025-01-01 08:15:21 | 0.81       | 0.81
+VID-20250101-001 | 2025-01-01 08:33:54 | 0.67       | 1.48
+VID-20250101-002 | 2025-01-01 11:12:02 | 0.74       | 0.74
+VID-20250102-001 | 2025-01-02 09:44:18 | 0.59       | 0.59
+VID-20250103-001 | 2025-01-03 21:18:07 | 0.63       | 0.63
+VID-MISSING-001  | 2025-01-04 10:00:00 | 0.25       | 0.25
+```
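The `cumulative_risk` column behaves like a running sum that restarts for each `video_id`. A small Python sketch of that behaviour follows. It assumes the window is partitioned by `video_id` and ordered by `collected_at` (the window clause itself is not visible in this hunk), and it rounds to two decimals to sidestep floating-point noise.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (video_id, collected_at, risk_score), as in the sample output above.
events = [
    ("VID-20250101-001", "2025-01-01 08:15:21", 0.81),
    ("VID-20250101-001", "2025-01-01 08:33:54", 0.67),
    ("VID-20250101-002", "2025-01-01 11:12:02", 0.74),
    ("VID-20250102-001", "2025-01-02 09:44:18", 0.59),
    ("VID-20250103-001", "2025-01-03 21:18:07", 0.63),
    ("VID-MISSING-001", "2025-01-04 10:00:00", 0.25),
]

def cumulative_risk(rows):
    """Running sum of risk_score per video_id, ordered by collected_at."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # partition key, then order key
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        totals = accumulate(score for _, _, score in group)
        for (vid, ts, score), total in zip(group, totals):
            out.append((vid, ts, score, round(total, 2)))
    return out

result = cumulative_risk(events)
for row in result:
    print(row)
```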
+
+### Moving Average over Recent Frames
 ```sql
 SELECT video_id,
        frame_id,
@@ -181,12 +284,24 @@ FROM frame_events
 ORDER BY video_id, frame_index;
 ```
 
+Sample output (first 6 rows):
+
+```
+video_id         | frame_id   | frame_index | risk_score | rolling_avg_risk
+VID-20250101-001 | FRAME-0101 | 125         | 0.81       | 0.81
+VID-20250101-001 | FRAME-0102 | 416         | 0.67       | 0.74
+VID-20250101-002 | FRAME-0201 | 298         | 0.74       | 0.74
+VID-20250102-001 | FRAME-0301 | 188         | 0.59       | 0.59
+VID-20250103-001 | FRAME-0401 | 522         | 0.63       | 0.63
+VID-MISSING-001  | FRAME-0501 | 10          | 0.25       | 0.25
+```
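A moving average over recent frames can be pictured as a bounded queue per video. The sketch below is one possible reading, not the query's actual definition: the window frame is not visible in this hunk, so `WINDOW_ROWS = 3` (the current row plus two preceding rows) is a purely hypothetical choice. With at most two frames per video in the seed data, any window of two rows or more gives the same numbers.

```python
from collections import defaultdict, deque

# (video_id, frame_id, frame_index, risk_score), as in the sample output above.
frames = [
    ("VID-20250101-001", "FRAME-0101", 125, 0.81),
    ("VID-20250101-001", "FRAME-0102", 416, 0.67),
    ("VID-20250101-002", "FRAME-0201", 298, 0.74),
    ("VID-20250102-001", "FRAME-0301", 188, 0.59),
    ("VID-20250103-001", "FRAME-0401", 522, 0.63),
    ("VID-MISSING-001", "FRAME-0501", 10, 0.25),
]

WINDOW_ROWS = 3  # hypothetical frame size: current row plus 2 preceding rows

def rolling_avg(rows, window=WINDOW_ROWS):
    """Average risk_score over the last `window` frames of each video."""
    recent = defaultdict(lambda: deque(maxlen=window))  # one bounded queue per video
    out = []
    for vid, frame_id, idx, score in sorted(rows, key=lambda r: (r[0], r[2])):
        recent[vid].append(score)
        avg = sum(recent[vid]) / len(recent[vid])
        out.append((vid, frame_id, idx, score, round(avg, 2)))
    return out

rolling = rolling_avg(frames)
for row in rolling:
    print(row)
```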
+
 Window functions express rolling sums and moving averages directly in SQL. For the full list, see [Window Functions](/sql/sql-functions/window-functions).
 
 ---
 
-## 6. Speed Up with Aggregating Indexes
-Use an [Aggregating Index](/guides/performance/aggregating-index) to cache frequently requested summaries so that dashboard queries avoid full table scans.
+## 6. Accelerate with Aggregating Indexes
+Persist frequently used dashboard summaries.
 
 ```sql
 CREATE OR REPLACE AGGREGATING INDEX idx_video_event_summary
@@ -199,4 +314,74 @@ FROM frame_events
 GROUP BY video_id, event_tag;
 ```
 
-When you rerun the same summary (for example, the event distribution by route), `EXPLAIN` shows an `AggregatingIndex` node, indicating that the query hit the summary copy above. The index refreshes automatically after new frames are written, so responses stay within seconds without any extra ETL.
+When analysts query the same KPIs again, the optimizer can read the data directly from the index:
+
+```sql
+SELECT v.route_name,
+       e.event_tag,
+       COUNT(*) AS event_count,
+       AVG(e.risk_score) AS avg_risk
+FROM frame_events e
+JOIN citydrive_videos v USING (video_id)
+WHERE v.capture_date >= '2025-01-01'
+GROUP BY v.route_name, e.event_tag
+ORDER BY avg_risk DESC;
+```
+
+Sample output:
+
+```
+route_name        | event_tag      | event_count | avg_risk
+Downtown Loop     | hard_brake     | 1           | 0.81
+Port Perimeter    | lane_merge     | 1           | 0.74
+Downtown Loop     | pedestrian     | 1           | 0.67
+CBD Night Sweep   | night_lowlight | 1           | 0.63
+Airport Connector | hard_brake     | 1           | 0.59
+```
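One way to see why a summary grouped by `(video_id, event_tag)` could serve a route-level KPI is that counts and sums can be re-aggregated, whereas averages cannot. The Python sketch below is a conceptual illustration only. It does not describe how the engine stores or matches the index, and it assumes the index keeps a count and a sum of `risk_score` per group (the index's SELECT list is not visible in this hunk).

```python
from collections import defaultdict

# (video_id, event_tag, risk_score) from the seed data.
frame_events = [
    ("VID-20250101-001", "hard_brake", 0.81),
    ("VID-20250101-001", "pedestrian", 0.67),
    ("VID-20250101-002", "lane_merge", 0.74),
    ("VID-20250102-001", "hard_brake", 0.59),
    ("VID-20250103-001", "night_lowlight", 0.63),
    ("VID-MISSING-001", "sensor_fault", 0.25),
]
route_of = {
    "VID-20250101-001": "Downtown Loop",
    "VID-20250101-002": "Port Perimeter",
    "VID-20250102-001": "Airport Connector",
    "VID-20250103-001": "CBD Night Sweep",
}

# "Index build": partial aggregates keyed by (video_id, event_tag).
summary = defaultdict(lambda: [0, 0.0])  # [count, sum of risk_score]
for vid, tag, risk in frame_events:
    summary[(vid, tag)][0] += 1
    summary[(vid, tag)][1] += risk

# "Query": join the summary to videos and re-aggregate by (route_name, event_tag).
kpi = defaultdict(lambda: [0, 0.0])
for (vid, tag), (n, total) in summary.items():
    if vid in route_of:  # inner join drops the orphan video
        kpi[(route_of[vid], tag)][0] += n
        kpi[(route_of[vid], tag)][1] += total

report = sorted(
    ((route, tag, n, round(total / n, 2)) for (route, tag), (n, total) in kpi.items()),
    key=lambda r: -r[3],  # ORDER BY avg_risk DESC
)
for row in report:
    print(row)
```

The final pass touches one summary entry per group rather than every frame, which is the saving an aggregating index aims for on large tables.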
+
+Docs: [Aggregating Index](/guides/performance/aggregating-index), [EXPLAIN](/sql/sql-commands/explain-cmds/explain)
+
+---
+
+## 7. Automate with Stored Procedures
+Encapsulate the logic so that scheduled jobs always produce a consistent report.
+
+```sql
+CREATE OR REPLACE PROCEDURE citydrive_route_report(days_back UINT8)
+RETURNS TABLE(route_name STRING, event_tag STRING, event_count BIGINT, avg_risk DOUBLE)
+LANGUAGE SQL
+AS
+$$
+BEGIN
+    RETURN TABLE (
+        SELECT v.route_name,
+               e.event_tag,
+               COUNT(*) AS event_count,
+               AVG(e.risk_score) AS avg_risk
+        FROM frame_events e
+        JOIN citydrive_videos v USING (video_id)
+        WHERE v.capture_date >= DATEADD('day', -:days_back, DATE '2025-01-04')
+        GROUP BY v.route_name, e.event_tag
+    );
+END;
+$$;
+
+CALL PROCEDURE citydrive_route_report(30);
+```
+
+Sample output:
+
+```
+route_name        | event_tag      | event_count | avg_risk
+Downtown Loop     | hard_brake     | 1           | 0.81
+CBD Night Sweep   | night_lowlight | 1           | 0.63
+Downtown Loop     | pedestrian     | 1           | 0.67
+Airport Connector | hard_brake     | 1           | 0.59
+Port Perimeter    | lane_merge     | 1           | 0.74
+```
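The procedure anchors its look-back window to the fixed date `2025-01-04` rather than to the current date, so the report stays reproducible against the seed data. A quick Python sketch of that cutoff arithmetic follows. `cutoff` is a hypothetical helper, and the seed capture dates are assumed to be January 1 to 3, 2025, as the section 2 description states.

```python
from datetime import date, timedelta

ANCHOR = date(2025, 1, 4)  # fixed reference date used in the procedure body

def cutoff(days_back: int) -> date:
    """Earliest capture_date admitted for a given days_back argument."""
    return ANCHOR - timedelta(days=days_back)

# Assumed seed capture dates (January 1 to 3, 2025).
seed_dates = [date(2025, 1, 1), date(2025, 1, 2), date(2025, 1, 3)]

print(cutoff(30))                                 # 2024-12-05
print(all(d >= cutoff(30) for d in seed_dates))   # every seed video qualifies
print([d for d in seed_dates if d >= cutoff(2)])  # a 2-day window keeps Jan 2 and Jan 3
```

Because `days_back` is declared as `UINT8`, the look-back is capped at 255 days.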
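+
+Stored procedures can be triggered manually, through [TASKS](/guides/load-data/continuous-data-pipelines/task), or by an orchestration tool.
+
+---
+
+With these tables and patterns in place, the rest of the CityDrive guides can reference the exact same `video_id` keys, whether for `frame_metadata_catalog` in JSON search, frame embeddings in similarity analysis, GPS positions in geo queries, or the single ETL pipeline that keeps them all in sync.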
