Pick cloud2 #2

oppenheimer01 · 2025-04-29T04:42:51Z

Fixes #ISSUE_Number

What does this PR do?

Type of Change

Bug fix (non-breaking change)
New feature (non-breaking change)
Breaking change (fix or feature with breaking changes)
Documentation update

Breaking Changes

Test Plan

Unit tests added/updated
Integration tests added/updated
Passed make installcheck
Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Followed contribution guide
Added/updated documentation
Reviewed code for security implications
Requested review from cloudberry committers

Additional Context

CI Skip Instructions

core changes: 1. add macro serverless, configure with --enable-serverless 2. add hooks to get control in transaction/dispatch management 3. add transaction processing framework 4. add session state dispatch framework

…ue in plugin. 2. add SimpleLruReadPage_hook for plugin to read SLRU page. 3. add StartChildProcess_hook for plugin to get control in child process startup.

…n catalog is permitted on single master without warehouse.

Currently, we use random distribution for hashdata tables, and the number of segments is set to 0. When we query a hashdata table, the distribution policy's segment number is set to the number of segments of the current warehouse.

…Request

This commit adds an extensible smgr slot for other extension storage formats. When we create a storage format extension, we add the relevant smgr slot to the smgrsw array. Moreover, add smgropen and smgrclose in RelationDropStorage. authored-by: Zhang Wenchao [email protected]

1. Two hook functions, ext_dml_init_hook and ext_dml_finish_hook, have been added. These functions perform resource initialization and cleanup at the start and end of data modification operations (such as modifyTable, CopyFrom, CreateAs, Matview, etc.)

If we drop a hashdata table, we cannot delete the record in main_manifest. So we make main_manifest a catalog table and add a dependency in the pg_depend table. When we drop a table, the dependency is deleted too.

…tplan.

This commit mainly implements an extensible libpq protocol. Moreover, it introduces an extensible ExecStatusType and DispatcherAsyncFuncs, so these modules can be extended in an extension as needed. authored-by: Zhang Wenchao [email protected]

1. In serverless architecture, we do not need to dispatch the vacuum command. 2. Add T_ExtensibleNode to the CMD_TAG list, which is needed by CreateCommandTag in utility.c. We cannot hook it because it executes before the standard_ProcessUtility function. Co-authored-by: roseduan <[email protected]>

apache#210) Change storage_am related catalog table main_manifest field type from uint32 to uint64, and change the name from relid to relnode Co-authored-by: xiaosongwang <[email protected]>

…rror. (apache#211)

RetrieveRelStorageType adds a magic number, 7015. This is the am_id we assigned to the custom table AM; it lets the Orca optimizer treat this columnar storage format as AOCS when generating an execution plan.

use new struct AnalyzeContext instead of gp_acquire_sample_rows_context to pass analyze context in table_beginscan_analyze

Support an ALTER TABLE rewrite dispatch hook: dispatch for every rewritten table, and remove the original dispatch routine that ran after all the work was done on the QD. Co-authored-by: xiaosongwang <[email protected]>

Co-authored-by: leo <[email protected]>

1. Hook 'SearchCatCache_hook' for plugins to get control in SearchCatCache. 2. Hook 'ReleaseCatCache_hook' for plugins to get control in ReleaseCatCache.
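The hook entries in this log appear to follow the usual PostgreSQL function-pointer convention: a global pointer that defaults to NULL, which a plugin saves and overrides at load time. The following is a minimal standalone sketch of that convention only. It is not the actual Cloudberry declaration of SearchCatCache_hook; every name here (`lookup_hook`, `standard_lookup`, `plugin_lookup`, `plugin_init`) and the string-based signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a catalog lookup; NOT the real SearchCatCache signature. */
typedef const char *(*lookup_hook_type)(const char *key);

/* NULL means "no plugin installed", which is how PostgreSQL-style hooks default. */
static lookup_hook_type lookup_hook = NULL;

/* Built-in behaviour used when no plugin has taken control. */
static const char *standard_lookup(const char *key)
{
    return strcmp(key, "pg_class") == 0 ? "local-catalog" : NULL;
}

/* Core entry point: hand control to the hook if one is set. */
const char *lookup(const char *key)
{
    if (lookup_hook != NULL)
        return lookup_hook(key);
    return standard_lookup(key);
}

/* ---- plugin side (hypothetical) ---- */
static lookup_hook_type prev_lookup_hook = NULL;

static const char *plugin_lookup(const char *key)
{
    if (strcmp(key, "main_manifest") == 0)
        return "remote-catalog";
    /* Chain to whatever was installed before us, else the built-in path. */
    return prev_lookup_hook ? prev_lookup_hook(key) : standard_lookup(key);
}

void plugin_init(void)
{
    assert(lookup_hook != plugin_lookup);   /* guard against double install */
    prev_lookup_hook = lookup_hook;
    lookup_hook = plugin_lookup;
}
```

Saving the previous pointer and chaining to it keeps several plugins composable, which is presumably why the convention is used for the hooks listed here.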

Hook 'RelationValidation_hook' for plugins to validate the relation in relcache.

New hook 'getgpsegmentCount_hook' for plugins to get control in getgpsegmentCount.

It's reasonable that main_manifest is not shared like pg_class.

In serverless architecture, it is more efficient to implement triggers the same way as for foreign tables, which use a tuplestore to store the tuple, because fetching a tuple through its ctid is inefficient. Besides, concurrent update or delete is not supported in serverless architecture, so we can fetch the tuple directly without locking it in GetTupleForTrigger.

This routine collects the catalog for the AM; it can also be used for other purposes. It is called before scan_begin.

We use insert and delete to perform updates, so this function is not used.

For an append agg inside a subquery, we use its target list to match the materialized view. However, Postgres removes unused subquery columns that do not exist in the upper query in a hacky way: it makes the target entries NULL. That makes us fail to match the view and rewrite, as we only support exact match for now. Work around this with a GUC if we are allowed to attempt to answer the query.

1. Support alter warehouse name suspend/resume. 2. Support alter warehouse name options/replace options 3. Change FTS ignore dbid and contentid check when serverless is defined.

…lity 1. remove some useless code and hooks 2. remove MACRO SERVERLESS from common logical code 3. add MACRO SERVERLESS to cloud service related code

In serverless architecture, change the way of using tablespace.

Auto-analyzing an inherited table causes a core dump because it enters acquire_sample_rows_dispatcher: if (Gp_role == GP_ROLE_DISPATCH && ENABLE_DISPATCH()) { return acquire_sample_rows_dispatcher(onerel, true, /* inherited stats */ elevel, rows, targrows, totalrows, totaldeadrows); } When we analyze the table manually, we hold dispatch in hashdata_ProcessUtility. So we follow the manual analyze logic and simply ignore this routine in serverless mode.

CBDB has exposed aqumv_adjust_simple_query to adjust the parse tree, so remove the duplicated code. Also ignore am_by_tablespace files.

CBDB has added MatviewUsableForAppendAgg() to identify whether the data status is up to date or available for an Append Agg Plan. Remove pg_class.relinsertonly. Authored-by: Zhang Mingli [[email protected]](mailto:[email protected])

The upstream code that was merged into hashdata cloud does not handle the serverless FTS. Since there is no mirror db info in cloud, we skip the log detail; otherwise it causes a core dump.

IVM has enabled the min and max functions with partial agg results. They are no different from others like count and sum, and should be usable in an Append Agg Plan. Add cases for that. Authored-by: Zhang Mingli [[email protected]](mailto:[email protected])

IVM with partial agg does not use a vectorization plan.

IVM with partial agg fallback to normal plan. Fix corner case of version 0. Fix tablespace.

Due to the unstable DNS service in the cloud environment, retry for 10 seconds to get the hostip from DNS.
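The retry described above can be sketched as a deadline-bounded loop around a resolver call. This is a minimal illustration, not the code from the commit: the resolver is passed in as a function pointer so the loop can be exercised without a network, and `resolve_with_retry`, `resolve_fn`, `flaky_stub`, and the polling interval are all hypothetical. In real code the resolver would presumably wrap something like `getaddrinfo`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical resolver signature: returns 0 and fills ip_out on success. */
typedef int (*resolve_fn)(const char *host, char *ip_out, size_t ip_len);

/*
 * Call `resolve` until it succeeds or `deadline_sec` seconds have elapsed,
 * sleeping `interval_ms` between attempts. Returns 0 on success, -1 on timeout.
 * time() has one-second granularity, so the deadline is approximate.
 */
int resolve_with_retry(resolve_fn resolve, const char *host,
                       char *ip_out, size_t ip_len,
                       int deadline_sec, int interval_ms)
{
    time_t start = time(NULL);

    for (;;)
    {
        struct timespec pause;

        if (resolve(host, ip_out, ip_len) == 0)
            return 0;
        if (difftime(time(NULL), start) >= (double) deadline_sec)
            return -1;

        pause.tv_sec = interval_ms / 1000;
        pause.tv_nsec = (long) (interval_ms % 1000) * 1000000L;
        nanosleep(&pause, NULL);
    }
}

/* Stub standing in for an unstable DNS service: fails twice, then succeeds. */
static int stub_calls = 0;

static int flaky_stub(const char *host, char *ip_out, size_t ip_len)
{
    (void) host;
    if (++stub_calls < 3)
        return -1;
    snprintf(ip_out, ip_len, "%s", "192.0.2.1");  /* documentation-range address */
    return 0;
}
```

With the stub, a call such as `resolve_with_retry(flaky_stub, "seg0.example", ip, sizeof ip, 10, 1)` succeeds on the third attempt, well inside the 10-second window mentioned in the commit message.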

When ereport is used, it can print a stack trace to the log. However, it sometimes fails to print some symbols, and some functions, especially external ones (for example C++), are not displayed correctly. So add a hook for finding the correct function names.

disable autovacuum temporarily because of the flaky regression test 'vacuum.sql'

Support count(n) where n is a const value; users' SQL contains queries like that. select count(1) is no different from select count(*). Authored-by: Zhang Mingli [email protected]

For CREATE..AS, the target table's distribution policy could be derived from the Query of the AS part. create materialized view cloud_ctas_mv as select a, count(b) from cloud_ctas_t0 group by a with no data; The locus of cloud_ctas_mv may be Hashed by column a, as it is an agg with group by. That is fine for CBDB, but in serverless mode the underlying data is random, so we can't store a distribution policy with distkeys or numsegments for it. Otherwise, we will get an error when we switch to a cluster with more or fewer segments. We have done something in extensions, but it didn't take effect due to the architecture and process of utility hooks. This is the last resort in core code. Authored-by: Zhang Mingli [email protected]
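The policy override described above can be illustrated with a small standalone sketch: whatever policy is derived from the AS query, serverless mode stores a random policy with no distribution keys and a segment count of 0 (the value an earlier entry in this log mentions for hashdata tables). The types and the function `policy_for_ctas` are hypothetical and are not the CBDB GpPolicy API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified policy record; NOT the real CBDB GpPolicy struct. */
typedef enum { POLICY_HASHED, POLICY_RANDOM } PolicyType;

typedef struct DistPolicy
{
    PolicyType type;
    int        nkeys;        /* number of distribution key columns */
    int        numsegments;  /* 0 means "decided by the current warehouse" */
} DistPolicy;

/*
 * Choose the policy to store for a CREATE..AS target table.
 * Outside serverless mode, keep what the query implied; in serverless
 * mode, drop distkeys and numsegments so the table stays valid when the
 * cluster size changes.
 */
DistPolicy policy_for_ctas(DistPolicy derived, bool serverless)
{
    assert(derived.nkeys >= 0);

    if (!serverless)
        return derived;

    DistPolicy random_policy = { POLICY_RANDOM, 0, 0 };
    return random_policy;
}
```

For the cloud_ctas_mv example, the derived policy would be hashed on one key (column a); under this sketch the serverless path discards that and stores the random policy instead.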

This is part of commit "[CLOUD] Enable start QD in utility" but changes the CBDB.

Co-authored-by: Wei Shaolun [email protected]

…ews. * offload two Guc to ivm modules. * Clean up task dependencies.

Const expressions like where 1 = 1 and a > 1 will be reduced to where a > 1 by the planner. A qual like 1 = 1 is always TRUE, so inside an AND expression it is useless. But as we store the MV's view query as it was originally written, the parse tree processed by the planner may not match the MV's exactly. Process those quals during Append AGG to fix this. Authored-by: Zhang Mingli [email protected]
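The mismatch described above can be shown with a toy model: if always-true conjuncts are dropped from the stored view query before comparison, `1 = 1 and a > 1` matches the planner's `a > 1`. This is a sketch under heavy simplification, not the Append AGG code. Quals are modelled as strings with a precomputed `always_true` flag, and `drop_true_conjuncts` and `quals_match` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy conjunct of an AND list; real code would inspect expression nodes. */
typedef struct Qual
{
    const char *text;
    bool        always_true;  /* e.g. "1 = 1" after constant folding */
} Qual;

/* Compact the AND list in place, dropping always-true conjuncts. Returns the new length. */
size_t drop_true_conjuncts(Qual *quals, size_t n)
{
    size_t kept = 0;

    assert(quals != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (!quals[i].always_true)
            quals[kept++] = quals[i];
    }
    return kept;
}

/* Exact, order-sensitive match of two AND lists by text. */
bool quals_match(const Qual *a, size_t na, const Qual *b, size_t nb)
{
    if (na != nb)
        return false;
    for (size_t i = 0; i < na; i++)
    {
        if (strcmp(a[i].text, b[i].text) != 0)
            return false;
    }
    return true;
}
```

Pruning only the stored view side is enough here because, per the commit message, the planner has already removed such quals from the incoming query.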

Support compile and deploy db without cloud extension source code. Add a ci for compilation check as preparation for regression and isolation2 test ci

husen and others added 30 commits April 23, 2025 15:23

Support separation of catalog and compute.

27e4fa7

core changes: 1. add macro serverless, configure with --enable-serverless 2. add hooks to get control in transaction/dispatch management 3. add transaction processing framework 4. add session state dispatch framework

1. add global variable enable_serverless, default to false, set to tr…

5c872ed

…ue in plugin. 2. add SimpleLruReadPage_hook for plugin to read SLRU page. 3. add StartChildProcess_hook for plugin to get control in child process startup.

disable WAL-log information required only for Hot Standby in serverless

7e7c992

Add support for creating cluster with single master, and only query o…

1c71dfd

…n catalog is permitted on single master without warehouse.

1. set distributedXid to LocalTransactionId 2. do not send FTS Probe …

84f4898

…Request

Feature: support subtransaction and savepoint

5f26818

Fix: Only master can set transaction status

9539d27

Feature: add dml hook

bc464ec

1. Two hook functions, ext_dml_init_hook and ext_dml_finish_hook, have been added. These functions perform resource initialization and cleanup at the start and end of data modification operations (such as modifyTable, CopyFrom, CreateAs, Matview, etc.)

Add main_manifest catalog table

89daa53

If we drop a hashdata table, we cannot delete the record in main_manifest. So we make main_manifest a catalog table and add a dependency in the pg_depend table. When we drop a table, the dependency is deleted too.

Add hooks for plugins to get control in transientrel_init/intorel_ini…

4f16fc0

…tplan.

change storage_am related catalog table main_manifest field type from… (

41659a1

apache#210) Change storage_am related catalog table main_manifest field type from uint32 to uint64, and change the name from relid to relnode Co-authored-by: xiaosongwang <[email protected]>

Add regress pipeline for branch union_store_catalog and fix compile e…

8f4a8df

…rror. (apache#211)

bugfix: support hashdata tableam in Orca (apache#222)

295eb56

RetrieveRelStorageType adds a magic number, 7015. This is the am_id we assigned to the custom table AM; it lets the Orca optimizer treat this columnar storage format as AOCS when generating an execution plan.

support analyze for unionstore table in cloudberry (apache#207)

d4e979f

use new struct AnalyzeContext instead of gp_acquire_sample_rows_context to pass analyze context in table_beginscan_analyze

New altertable rewrite dispatch policy (apache#223)

1a22c6d

Support an ALTER TABLE rewrite dispatch hook: dispatch for every rewritten table, and remove the original dispatch routine that ran after all the work was done on the QD. Co-authored-by: xiaosongwang <[email protected]>

Fix: do not commit subtransaction through DTX protocol. (apache#226)

76aa413

Co-authored-by: leo <[email protected]>

Add: new hooks for plugins to get control in syscache.

ca08709

1. Hook 'SearchCatCache_hook' for plugins to get control in SearchCatCache. 2. Hook 'ReleaseCatCache_hook' for plugins to get control in ReleaseCatCache.

Change myTempNamespace from static variable to extern variable.

68a1212

Implement drop warehouse

3137ea5

Add: new hook for plugins to validate the relation

08b65a7

Hook 'RelationValidation_hook' for plugins to validate the relation in relcache.

fix triggers

6414895

Add: some interfaces to get transaction state and xids

6b42b22

Fix copy from freeze will check subtransaction id in QEs.

f81b4d9

Add: new hook to get control in getgpsegmentCount. (apache#277)

bea9c79

New hook 'getgpsegmentCount_hook' for plugins to get control in getgpsegmentCount.

make main_manifest table not shared (apache#311)

f660c74

It's reasonable that main_manifest is not shared like pg_class.

oppenheimer01 and others added 30 commits April 23, 2025 16:17

Add a new table am routine ScanCatalogPrepare

4b091a6

This routine collects the catalog for the AM; it can also be used for other purposes. It is called before scan_begin.

Enable autovacuum process

f6a03fe

Remove useless function UpdateManifestRecord.

4c98882

We use insert and delete to perform updates, so this function is not used.

Fix: use MACRO SERVERLESS instead of GUC enable_serverless

ed1fdc4

Enhance: Support for alter warehouse name suspend/resume/options

4d1fc06

1. Support alter warehouse name suspend/resume. 2. Support alter warehouse name options/replace options 3. Change FTS ignore dbid and contentid check when serverless is defined.

Refactor: use MACRO SERVERLESS to improve readability and maintainabi…

068ea37

…lity 1. remove some useless code and hooks 2. remove MACRO SERVERLESS from common logical code 3. add MACRO SERVERLESS to cloud service related code

Add SERVERLESS macro on the function load/write rel cache function

5cd6a0b

Enhancement: adjust tablespace for support serverless architecture

56d1c77

In serverless architecture, change the way of using tablespace.

Refactor: add MACRO SERVERLESS to catalog dispatching related code

f3c161b

Use aqumv public function in grouping paths.

96fa89f

CBDB has exposed aqumv_adjust_simple_query to adjust the parse tree, so remove the duplicated code. Also ignore am_by_tablespace files.

Remove regular lock on segment

570dae9

Fix work around of materialized view data status.

cd075ee

CBDB has added MatviewUsableForAppendAgg() to identify whether the data status is up to date or available for an Append Agg Plan. Remove pg_class.relinsertonly. Authored-by: Zhang Mingli [[email protected]](mailto:[email protected])

Fix: fts coredump

2327c30

The upstream code that was merged into hashdata cloud does not handle the serverless FTS. Since there is no mirror db info in cloud, we skip the log detail; otherwise it causes a core dump.

Support min, max in Append Agg Plan.

5f31cab

IVM has enabled the min and max functions with partial agg results. They are no different from others like count and sum, and should be usable in an Append Agg Plan. Add cases for that. Authored-by: Zhang Mingli [[email protected]](mailto:[email protected])

Add create_plan_hook and cursor option

d865aa0

IVM with partial agg does not use a vectorization plan.

[CLOUD] Enable delta scan vectorization

ab95372

IVM with partial agg fallback to normal plan. Fix corner case of version 0. Fix tablespace.

Fix: Retry get hostip from DNS if failed

6c33730

Due to the unstable DNS service in the cloud environment, retry for 10 seconds to get the hostip from DNS.

[CLOUD] Remove unionstore extention from segment

64ccba0

disable autovacuum temporarily

bef94eb

disable autovacuum temporarily because of the flaky regression test 'vacuum.sql'

Add MACRO 'FAULT_INJECTOR'/'SERVERLESS' to related code

eaeb741

Support COUNT with const value in Append AGG plan.

3afd6ce

Support count(n) where n is a const value; users' SQL contains queries like that. select count(1) is no different from select count(*). Authored-by: Zhang Mingli [email protected]

Forbid connecting to QE in utility mode

88c92ac

This is part of commit "[CLOUD] Enable start QD in utility" but changes the CBDB.

Make column case-sensitivity only for output column.

ecbb91a

Co-authored-by: Wei Shaolun [email protected]

Clean pg_cron guc optimizer,enable_answer_query_using_materialized_vi…

779bb8c

…ews. * offload two Guc to ivm modules. * Clean up task dependencies.

Build db without cloud extension

2226561

Support compile and deploy db without cloud extension source code. Add a ci for compilation check as preparation for regression and isolation2 test ci


10 participants
