---
title: SQL Analytics
---

> **Scenario:** CityDrive stages all of its driving records in a shared set of relational tables. This relational data (video metadata, event tags, and so on) is extracted by a background pipeline from key frames of the raw dash-cam videos. Analysts can then filter, join, and aggregate on the same `video_id` / `frame_id` keys, and every downstream workload reuses the results.

This guide models the relational layer of the catalog and highlights practical SQL building blocks. The sample IDs used here reappear in the later JSON, vector, geospatial, and ETL guides.

## 1. Create the Base Tables
`citydrive_videos` stores clip-level metadata, while `frame_events` records the key frames (Interesting Frames) extracted from each clip.

```sql
CREATE OR REPLACE TABLE citydrive_videos (
  -- ... (omitted in this excerpt; the rows below continue INSERT INTO frame_events VALUES) ...
  ('FRAME-0102', 'VID-20250101-001', 416, '2025-01-01 08:33:54', 'pedestrian', 0.67, 24.8),
  ('FRAME-0201', 'VID-20250101-002', 298, '2025-01-01 11:12:02', 'lane_merge', 0.74, 48.1),
  ('FRAME-0301', 'VID-20250102-001', 188, '2025-01-02 09:44:18', 'hard_brake', 0.59, 52.6),
  ('FRAME-0401', 'VID-20250103-001', 522, '2025-01-03 21:18:07', 'night_lowlight', 0.63, 38.9),
  -- Deliberately orphaned event (its video_id has no row in citydrive_videos), used to demonstrate NOT EXISTS
  ('FRAME-0501', 'VID-MISSING-001', 10, '2025-01-04 10:00:00', 'sensor_fault', 0.25, 15.0);

-- The JOIN patterns below need this table; its schema matches the one in the "JSON & Search" guide.
CREATE OR REPLACE TABLE frame_metadata_catalog (
  doc_id STRING,
  meta_json VARIANT,
  captured_at TIMESTAMP,
  INVERTED INDEX idx_meta_json (meta_json)
);

INSERT INTO frame_metadata_catalog VALUES
  ('FRAME-0101', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":32.4},"detections":{"objects":[{"type":"vehicle","confidence":0.88},{"type":"brake_light","confidence":0.64}]},"media_meta":{"tagging":{"labels":["hard_brake","rain","downtown_loop"]}}}'), '2025-01-01 08:15:21'),
  ('FRAME-0102', PARSE_JSON('{"scene":{"weather_code":"rain","lighting":"day"},"camera":{"sensor_view":"roof"},"vehicle":{"speed_kmh":24.8},"detections":{"objects":[{"type":"pedestrian","confidence":0.92},{"type":"bike","confidence":0.35}]},"media_meta":{"tagging":{"labels":["pedestrian","swerve","crosswalk"]}}}'), '2025-01-01 08:33:54'),
  ('FRAME-0201', PARSE_JSON('{"scene":{"weather_code":"overcast","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":48.1},"detections":{"objects":[{"type":"lane_merge","confidence":0.74},{"type":"vehicle","confidence":0.41}]},"media_meta":{"tagging":{"labels":["lane_merge","urban"]}}}'), '2025-01-01 11:12:02'),
  ('FRAME-0301', PARSE_JSON('{"scene":{"weather_code":"clear","lighting":"day"},"camera":{"sensor_view":"front"},"vehicle":{"speed_kmh":52.6},"detections":{"objects":[{"type":"vehicle","confidence":0.82},{"type":"hard_brake","confidence":0.59}]},"media_meta":{"tagging":{"labels":["hard_brake","highway"]}}}'), '2025-01-02 09:44:18'),
  ('FRAME-0401', PARSE_JSON('{"scene":{"weather_code":"lightfog","lighting":"night"},"camera":{"sensor_view":"rear"},"vehicle":{"speed_kmh":38.9},"detections":{"objects":[{"type":"traffic_light","confidence":0.78},{"type":"vehicle","confidence":0.36}]},"media_meta":{"tagging":{"labels":["night_lowlight","traffic_light"]}}}'), '2025-01-03 21:18:07');
```

Docs: [CREATE TABLE](/sql/sql-commands/ddl/table/ddl-create-table), [INSERT](/sql/sql-commands/dml/dml-insert).

---

## 2. Filter the Working Set
Limit the queries to the January 1 to 3 snapshot in the seed data so that the demo queries always return results.

```sql
WITH recent_videos AS (
    SELECT *
    FROM citydrive_videos
    WHERE capture_date >= '2025-01-01'
      AND capture_date < '2025-01-04'
)
SELECT v.video_id,
       v.route_name,
       -- ... (omitted in this excerpt) ...
ORDER BY flagged_frames DESC;
```

Docs: [DATEADD](/sql/sql-functions/datetime-functions/date-add), [GROUP BY](/sql/sql-commands/query-syntax/query-select#group-by-clause).

Sample output:

```
video_id         | route_name        | weather  | flagged_frames
VID-20250101-001 | Downtown Loop     | Rain     | 2
VID-20250101-002 | Port Perimeter    | Overcast | 1
VID-20250102-001 | Airport Connector | Clear    | 1
VID-20250103-001 | CBD Night Sweep   | LightFog | 1
```
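
The filter above uses a half-open range (`>= '2025-01-01'` and `< '2025-01-04'`) instead of an inclusive upper bound. The difference matters on `TIMESTAMP` columns such as `frame_events.collected_at`, where a bare date literal means midnight. Here is a minimal sketch that assumes a hypothetical scratch table, `demo_window_ts`, which copies three `collected_at` values from the seed data and is not part of the CityDrive schema:

```sql
-- Hypothetical scratch table: three collected_at values copied from frame_events.
DROP TABLE IF EXISTS demo_window_ts;
CREATE TABLE demo_window_ts (frame_id VARCHAR, collected_at TIMESTAMP);
INSERT INTO demo_window_ts VALUES
  ('FRAME-0301', '2025-01-02 09:44:18'),
  ('FRAME-0401', '2025-01-03 21:18:07'),
  ('FRAME-0501', '2025-01-04 10:00:00');

-- Inclusive upper bound: '2025-01-03' is compared as 2025-01-03 00:00:00,
-- so the 21:18 frame on January 3 is silently dropped (returns 1).
SELECT COUNT(*) AS closed_range
FROM demo_window_ts
WHERE collected_at >= '2025-01-01'
  AND collected_at <= '2025-01-03';

-- Half-open range: keeps all of January 3 and nothing from January 4 (returns 2).
SELECT COUNT(*) AS half_open_range
FROM demo_window_ts
WHERE collected_at >= '2025-01-01'
  AND collected_at < '2025-01-04';
```

The same half-open pattern stays correct for `capture_date` whether that column is a `DATE` or a `TIMESTAMP`.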
99+
72100---
73101
## 3. JOIN Patterns
### INNER JOIN: Get Frame Context
```sql
SELECT f.frame_id,
       f.event_tag,
       -- ... (omitted in this excerpt) ...
JOIN citydrive_videos AS v USING (video_id)
ORDER BY f.collected_at;
```

Sample output:

```
frame_id   | event_tag      | risk_score | route_name        | camera_source
FRAME-0101 | hard_brake     | 0.81       | Downtown Loop     | roof_cam
FRAME-0102 | pedestrian     | 0.67       | Downtown Loop     | roof_cam
FRAME-0201 | lane_merge     | 0.74       | Port Perimeter    | front_cam
FRAME-0301 | hard_brake     | 0.59       | Airport Connector | front_cam
FRAME-0401 | night_lowlight | 0.63       | CBD Night Sweep   | rear_cam
```
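
Because the query uses an INNER JOIN, `FRAME-0501` does not appear above: its `VID-MISSING-001` has no row in `citydrive_videos`. The sketch below contrasts this with a LEFT JOIN. `demo_videos` and `demo_events` are hypothetical two-column stand-ins for the real tables and are not part of the CityDrive schema:

```sql
-- Hypothetical scratch tables mirroring citydrive_videos / frame_events.
DROP TABLE IF EXISTS demo_videos;
DROP TABLE IF EXISTS demo_events;
CREATE TABLE demo_videos (video_id VARCHAR, route_name VARCHAR);
CREATE TABLE demo_events (frame_id VARCHAR, video_id VARCHAR);
INSERT INTO demo_videos VALUES ('VID-20250101-001', 'Downtown Loop');
INSERT INTO demo_events VALUES
  ('FRAME-0101', 'VID-20250101-001'),
  ('FRAME-0501', 'VID-MISSING-001');   -- no matching video

-- INNER JOIN: only FRAME-0101 survives.
SELECT e.frame_id, v.route_name
FROM demo_events e
JOIN demo_videos v ON v.video_id = e.video_id;

-- LEFT JOIN: every event is kept; FRAME-0501 gets NULL in the video columns.
SELECT e.frame_id, v.route_name
FROM demo_events e
LEFT JOIN demo_videos v ON v.video_id = e.video_id
ORDER BY e.frame_id;
```

Use the LEFT JOIN form when a report must list every event. A NULL `route_name` then flags an event whose video is missing.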

### Anti Join: Quality Check (QA)
```sql
SELECT frame_id
FROM frame_events f
WHERE NOT EXISTS (
    -- ... (subquery omitted in this excerpt) ...
);
```

Sample output:

```
frame_id
FRAME-0501
```
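
The subquery of the QA query is omitted in this excerpt. The sketch below assumes it is the usual correlated check (`SELECT 1 ... WHERE v.video_id = f.video_id`). It places that check next to an equivalent anti join written as `LEFT JOIN ... IS NULL`. `qa_videos` and `qa_events` are hypothetical scratch tables, and both queries should return only `FRAME-0501`:

```sql
-- Hypothetical scratch tables; only the join key matters here.
DROP TABLE IF EXISTS qa_videos;
DROP TABLE IF EXISTS qa_events;
CREATE TABLE qa_videos (video_id VARCHAR);
CREATE TABLE qa_events (frame_id VARCHAR, video_id VARCHAR);
INSERT INTO qa_videos VALUES ('VID-20250101-001'), ('VID-20250101-002');
INSERT INTO qa_events VALUES
  ('FRAME-0101', 'VID-20250101-001'),
  ('FRAME-0201', 'VID-20250101-002'),
  ('FRAME-0501', 'VID-MISSING-001');

-- Form 1: NOT EXISTS with an assumed correlated subquery.
SELECT frame_id
FROM qa_events f
WHERE NOT EXISTS (
    SELECT 1 FROM qa_videos v WHERE v.video_id = f.video_id
);

-- Form 2: LEFT JOIN, then keep the rows that found no match.
SELECT f.frame_id
FROM qa_events f
LEFT JOIN qa_videos v ON v.video_id = f.video_id
WHERE v.video_id IS NULL;
```

Prefer either form over `NOT IN (SELECT video_id ...)`. If that subquery returns even one NULL, `NOT IN` evaluates to unknown for every row, and the QA query silently returns nothing.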

### LATERAL FLATTEN: Expand Nested Detection Results
```sql
SELECT f.frame_id,
       obj.value['type']::STRING AS detected_type,
       -- ... (omitted in this excerpt) ...
WHERE f.event_tag = 'pedestrian'
ORDER BY confidence DESC;
```

Sample output:

```
frame_id   | detected_type | confidence
FRAME-0102 | pedestrian    | 0.92
FRAME-0102 | bike          | 0.35
```

Docs: [JOIN](/sql/sql-commands/query-syntax/query-join), [FLATTEN](/sql/sql-functions/table-functions/flatten).

---

## 4. Fleet KPI Aggregation
### Driving Behavior by Route
```sql
SELECT v.route_name,
       f.event_tag,
       -- ... (omitted in this excerpt) ...
GROUP BY v.route_name, f.event_tag
ORDER BY avg_risk DESC, occurrences DESC;
```

Sample output:

```
route_name        | event_tag      | occurrences | avg_risk
Downtown Loop     | hard_brake     | 1           | 0.81
Port Perimeter    | lane_merge     | 1           | 0.74
Downtown Loop     | pedestrian     | 1           | 0.67
CBD Night Sweep   | night_lowlight | 1           | 0.63
Airport Connector | hard_brake     | 1           | 0.59
```

### ROLLUP: Compute Totals
```sql
SELECT v.route_name,
       f.event_tag,
       -- ... (omitted in this excerpt) ...
GROUP BY ROLLUP(v.route_name, f.event_tag)
ORDER BY v.route_name NULLS LAST, f.event_tag;
```

Sample output (first 6 rows):

```
route_name        | event_tag      | occurrences
Airport Connector | hard_brake     | 1
Airport Connector | NULL           | 1
CBD Night Sweep   | night_lowlight | 1
CBD Night Sweep   | NULL           | 1
Downtown Loop     | hard_brake     | 1
Downtown Loop     | pedestrian     | 1
... (total rows: 10)
```
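
`ROLLUP(a, b)` is shorthand for three grouping levels: `(a, b)`, `(a)`, and the grand total `()`. That explains why the full result has 10 rows: 5 route/tag rows, 4 route subtotals, and 1 grand total. The sketch below spells out the same levels with `UNION ALL` on a hypothetical three-row table, `demo_route_events`. It is meant to explain the semantics, not to replace `ROLLUP`:

```sql
-- Hypothetical scratch table: one row per event (route, tag).
DROP TABLE IF EXISTS demo_route_events;
CREATE TABLE demo_route_events (route_name VARCHAR, event_tag VARCHAR);
INSERT INTO demo_route_events VALUES
  ('Downtown Loop', 'hard_brake'),
  ('Downtown Loop', 'pedestrian'),
  ('Port Perimeter', 'lane_merge');

-- Same rows as GROUP BY ROLLUP(route_name, event_tag): 3 + 2 + 1 = 6 rows here.
SELECT route_name, event_tag, COUNT(*) AS occurrences
FROM demo_route_events GROUP BY route_name, event_tag   -- level (route_name, event_tag)
UNION ALL
SELECT route_name, NULL, COUNT(*)
FROM demo_route_events GROUP BY route_name              -- level (route_name): subtotals
UNION ALL
SELECT NULL, NULL, COUNT(*)
FROM demo_route_events;                                 -- level (): grand total
```

The NULLs in the subtotal and total rows are the same placeholders that appear in the `ROLLUP` sample output above.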

### CUBE: Route × Weather Coverage
```sql
SELECT v.route_name,
       v.weather,
       -- ... (omitted in this excerpt) ...
GROUP BY CUBE(v.route_name, v.weather)
ORDER BY v.route_name NULLS LAST, v.weather NULLS LAST;
```

Sample output (first 6 rows):

```
route_name        | weather  | videos
Airport Connector | Clear    | 1
Airport Connector | NULL     | 1
CBD Night Sweep   | LightFog | 1
CBD Night Sweep   | NULL     | 1
Downtown Loop     | Rain     | 1
Downtown Loop     | NULL     | 1
... (total rows: 13)
```
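
`CUBE(a, b)` emits every combination of its columns: `(a, b)`, `(a)`, `(b)`, and `()`. In general, n columns give 2^n grouping levels. The sample output implies four distinct route/weather pairs, four routes, and four weather values, which gives 4 + 4 + 4 + 1 = 13 rows. Below is a `UNION ALL` sketch over a hypothetical table, `demo_route_weather`, in which one route has two weather conditions:

```sql
-- Hypothetical scratch table: one row per video (route, weather).
DROP TABLE IF EXISTS demo_route_weather;
CREATE TABLE demo_route_weather (route_name VARCHAR, weather VARCHAR);
INSERT INTO demo_route_weather VALUES
  ('Downtown Loop', 'Rain'),
  ('Downtown Loop', 'Clear'),
  ('Port Perimeter', 'Overcast'),
  ('Airport Connector', 'Clear');

-- Same rows as GROUP BY CUBE(route_name, weather): 4 + 3 + 3 + 1 = 11 rows here.
SELECT route_name, weather, COUNT(*) AS videos
FROM demo_route_weather GROUP BY route_name, weather   -- 4 route/weather rows
UNION ALL
SELECT route_name, NULL, COUNT(*)
FROM demo_route_weather GROUP BY route_name            -- 3 route subtotals
UNION ALL
SELECT NULL, weather, COUNT(*)
FROM demo_route_weather GROUP BY weather               -- 3 weather subtotals (Clear = 2)
UNION ALL
SELECT NULL, NULL, COUNT(*)
FROM demo_route_weather;                               -- 1 grand total
```

Compared with `ROLLUP`, the only extra level is `(weather)`. That level is what gives per-weather coverage across all routes.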

---

## 5. Window Functions
### Cumulative Risk per Video
```sql
WITH ordered_events AS (
    SELECT video_id, collected_at, risk_score
    -- ... (omitted in this excerpt) ...
FROM ordered_events
ORDER BY video_id, collected_at;
```

Sample output (first 6 rows):

```
video_id         | collected_at        | risk_score | cumulative_risk
VID-20250101-001 | 2025-01-01 08:15:21 | 0.81       | 0.81
VID-20250101-001 | 2025-01-01 08:33:54 | 0.67       | 1.48
VID-20250101-002 | 2025-01-01 11:12:02 | 0.74       | 0.74
VID-20250102-001 | 2025-01-02 09:44:18 | 0.59       | 0.59
VID-20250103-001 | 2025-01-03 21:18:07 | 0.63       | 0.63
VID-MISSING-001  | 2025-01-04 10:00:00 | 0.25       | 0.25
```
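
The window clause of the cumulative query is omitted in this excerpt. A typical way to write it is `SUM(risk_score) OVER (PARTITION BY video_id ORDER BY collected_at ...)`; this form is assumed here rather than taken from the guide. Spelling out `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` matters when two events share a `collected_at`. With only `ORDER BY`, standard SQL defaults to a `RANGE` frame, which adds tied rows in a single step. The sketch runs on a hypothetical table, `demo_risk`, that holds the first three rows of the sample:

```sql
-- Hypothetical scratch table holding three events from the sample data.
DROP TABLE IF EXISTS demo_risk;
CREATE TABLE demo_risk (video_id VARCHAR, collected_at TIMESTAMP, risk_score DOUBLE);
INSERT INTO demo_risk VALUES
  ('VID-20250101-001', '2025-01-01 08:15:21', 0.81),
  ('VID-20250101-001', '2025-01-01 08:33:54', 0.67),
  ('VID-20250101-002', '2025-01-01 11:12:02', 0.74);

SELECT video_id,
       collected_at,
       risk_score,
       SUM(risk_score) OVER (
           PARTITION BY video_id                              -- restart the total for every video
           ORDER BY collected_at                              -- accumulate in capture order
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW   -- one row at a time, even on ties
       ) AS cumulative_risk
FROM demo_risk
ORDER BY video_id, collected_at;
```

The result should match the first three rows of the sample output (0.81, 1.48, and 0.74), up to floating-point rounding.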

### Rolling Average over Recent Frames
```sql
SELECT video_id,
       frame_id,
       -- ... (omitted in this excerpt) ...
FROM frame_events
ORDER BY video_id, frame_index;
```

Sample output (first 6 rows):

```
video_id         | frame_id   | frame_index | risk_score | rolling_avg_risk
VID-20250101-001 | FRAME-0101 | 125         | 0.81       | 0.81
VID-20250101-001 | FRAME-0102 | 416         | 0.67       | 0.74
VID-20250101-002 | FRAME-0201 | 298         | 0.74       | 0.74
VID-20250102-001 | FRAME-0301 | 188         | 0.59       | 0.59
VID-20250103-001 | FRAME-0401 | 522         | 0.63       | 0.63
VID-MISSING-001  | FRAME-0501 | 10          | 0.25       | 0.25
```

Window functions express running totals and rolling averages directly in SQL. For the full list, see [Window Functions](/sql/sql-functions/window-functions).
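
A rolling average, for instance, is `AVG` over a sliding `ROWS` frame. This excerpt does not show the window size of the query above. The three-frame window below (`ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`) is therefore an assumption, as are the hypothetical table `demo_frames` and its `VID-DEMO` rows:

```sql
-- Hypothetical scratch table; VID-DEMO is not part of the CityDrive seed data.
DROP TABLE IF EXISTS demo_frames;
CREATE TABLE demo_frames (video_id VARCHAR, frame_index INT, risk_score DOUBLE);
INSERT INTO demo_frames VALUES
  ('VID-DEMO', 1, 0.90),
  ('VID-DEMO', 2, 0.60),
  ('VID-DEMO', 3, 0.30),
  ('VID-DEMO', 4, 0.60);

SELECT video_id,
       frame_index,
       risk_score,
       AVG(risk_score) OVER (
           PARTITION BY video_id                       -- never mix frames from different videos
           ORDER BY frame_index                        -- slide forward frame by frame
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW    -- assumed 3-frame window
       ) AS rolling_avg_risk
FROM demo_frames
ORDER BY video_id, frame_index;
```

`rolling_avg_risk` should come out as 0.90, 0.75, 0.60, and 0.50. The first two rows average over a partial window, and from the third frame on, each value covers exactly three frames.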

---

## 6. Accelerate with an Aggregating Index
Persist frequently used dashboard summaries so that dashboard queries do not have to scan the full table every time.

```sql
CREATE OR REPLACE AGGREGATING INDEX idx_video_event_summary
-- ... (omitted in this excerpt) ...
FROM frame_events
GROUP BY video_id, event_tag;
```

When analysts run the same KPI again, the optimizer can read the pre-aggregated data from the index instead of rescanning `frame_events`. Prefix the query with `EXPLAIN` to confirm that the plan actually uses the index:

```sql
SELECT v.route_name,
       e.event_tag,
       COUNT(*) AS event_count,
       AVG(e.risk_score) AS avg_risk
FROM frame_events e
JOIN citydrive_videos v USING (video_id)
WHERE v.capture_date >= '2025-01-01'
GROUP BY v.route_name, e.event_tag
ORDER BY avg_risk DESC;
```

Sample output:

```
route_name        | event_tag      | event_count | avg_risk
Downtown Loop     | hard_brake     | 1           | 0.81
Port Perimeter    | lane_merge     | 1           | 0.74
Downtown Loop     | pedestrian     | 1           | 0.67
CBD Night Sweep   | night_lowlight | 1           | 0.63
Airport Connector | hard_brake     | 1           | 0.59
```
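
The index is grouped by `video_id, event_tag`, while the KPI groups by `route_name, event_tag`. Serving the KPI from per-video summaries therefore means merging several summary rows per route. The sketch below does that merge by hand on hypothetical tables (`demo_summary`, `demo_video_routes`). It shows why a correct route-level average needs summed risk and event counts rather than an average of per-video averages. It illustrates the arithmetic only and does not describe how Databend rewrites queries against an aggregating index.

```sql
-- Hypothetical per-(video_id, event_tag) summary: event count and summed risk.
DROP TABLE IF EXISTS demo_summary;
DROP TABLE IF EXISTS demo_video_routes;
CREATE TABLE demo_summary (video_id VARCHAR, event_tag VARCHAR, event_count INT, risk_sum DOUBLE);
CREATE TABLE demo_video_routes (video_id VARCHAR, route_name VARCHAR);
INSERT INTO demo_summary VALUES
  ('VID-1', 'hard_brake', 1, 0.80),   -- per-video average 0.80
  ('VID-2', 'hard_brake', 3, 1.20);   -- per-video average 0.40
INSERT INTO demo_video_routes VALUES
  ('VID-1', 'Loop'),
  ('VID-2', 'Loop');

-- Merge by adding counts and sums, then divide once:
-- event_count = 4, avg_risk = 2.00 / 4 = 0.50.
-- Averaging the per-video averages would wrongly give (0.80 + 0.40) / 2 = 0.60.
SELECT r.route_name,
       s.event_tag,
       SUM(s.event_count) AS event_count,
       SUM(s.risk_sum) / SUM(s.event_count) AS avg_risk
FROM demo_summary s
JOIN demo_video_routes r ON r.video_id = s.video_id
GROUP BY r.route_name, s.event_tag;
```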

Docs: [Aggregating Index](/guides/performance/aggregating-index) and [EXPLAIN](/sql/sql-commands/explain-cmds/explain).

---

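## 7. Automate with Stored Procedures
Wrap the report logic in a stored procedure so that scheduled jobs always produce a consistent report. The lookback window below is anchored to `DATE '2025-01-04'` so that the demo matches the seed data.
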
```sql
CREATE OR REPLACE PROCEDURE citydrive_route_report(days_back UINT8)
RETURNS TABLE(route_name STRING, event_tag STRING, event_count BIGINT, avg_risk DOUBLE)
LANGUAGE SQL
AS
$$
BEGIN
    RETURN TABLE (
        SELECT v.route_name,
               e.event_tag,
               COUNT(*) AS event_count,
               AVG(e.risk_score) AS avg_risk
        FROM frame_events e
        JOIN citydrive_videos v USING (video_id)
        WHERE v.capture_date >= DATEADD('day', -:days_back, DATE '2025-01-04')
        GROUP BY v.route_name, e.event_tag
    );
END;
$$;

CALL PROCEDURE citydrive_route_report(30);
```

Sample output (row order may vary because the procedure's query has no ORDER BY):

```
route_name        | event_tag      | event_count | avg_risk
Downtown Loop     | hard_brake     | 1           | 0.81
CBD Night Sweep   | night_lowlight | 1           | 0.63
Downtown Loop     | pedestrian     | 1           | 0.67
Airport Connector | hard_brake     | 1           | 0.59
Port Perimeter    | lane_merge     | 1           | 0.74
```

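You can run the procedure manually, or trigger it from [TASKS](/guides/load-data/continuous-data-pipelines/task) or an external orchestration tool.

---

With these tables and patterns in place, the rest of the CityDrive guides can reference exactly the same `video_id` keys. They appear in `frame_metadata_catalog` for JSON search, in frame embeddings for similarity analysis, and in GPS locations for geospatial queries. A single ETL pipeline keeps them all in sync.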