

@oppenheimer01
Owner

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


husen and others added 30 commits April 23, 2025 15:23
core changes:
1. add the SERVERLESS macro, enabled by configuring with --enable-serverless
2. add hooks to take control in transaction and dispatch management
3. add a transaction processing framework
4. add a session state dispatch framework
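The hooks listed above presumably follow the usual PostgreSQL hook idiom: the core exports a NULL-initialized function pointer, and a plugin saves the previous value before installing its own function, so several plugins can chain. A minimal sketch of that idiom; begin_xact_hook, standard_begin_xact and plugin_init are hypothetical names, not hooks from this PR.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical hook type; the PR does not show the real hook signatures. */
typedef void (*begin_xact_hook_type)(int xid, char *buf, size_t bufsz);

/* Core side: a global hook pointer that stays NULL unless a plugin sets it. */
begin_xact_hook_type begin_xact_hook = NULL;

/* Core's default behaviour: append a marker to buf. */
void standard_begin_xact(int xid, char *buf, size_t bufsz)
{
    size_t used = strlen(buf);
    snprintf(buf + used, bufsz - used, "[core %d]", xid);
}

/* Core entry point: hand control to the hook if one is installed. */
void begin_xact(int xid, char *buf, size_t bufsz)
{
    if (begin_xact_hook)
        begin_xact_hook(xid, buf, bufsz);
    else
        standard_begin_xact(xid, buf, bufsz);
}

/* Plugin side: remember the previous hook so several plugins can chain. */
static begin_xact_hook_type prev_begin_xact_hook = NULL;

static void plugin_begin_xact(int xid, char *buf, size_t bufsz)
{
    size_t used = strlen(buf);
    snprintf(buf + used, bufsz - used, "[plugin %d]", xid);

    /* Chain to the previously installed hook, or fall back to the core. */
    if (prev_begin_xact_hook)
        prev_begin_xact_hook(xid, buf, bufsz);
    else
        standard_begin_xact(xid, buf, bufsz);
}

/* Equivalent of a plugin's _PG_init(): install the hook. */
void plugin_init(void)
{
    prev_begin_xact_hook = begin_xact_hook;
    begin_xact_hook = plugin_begin_xact;
}
```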
…ue in plugin.

2. add SimpleLruReadPage_hook for plugins to read SLRU pages.
3. add StartChildProcess_hook for plugins to take control during child process startup.
…n catalog is permitted on single master without warehouse.
Currently, we use random distribution for hashdata tables, and the
number of segments is set to 0. When we query a hashdata table, the
distribution policy's segment number is set to the number of segments
of the current warehouse.
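A minimal sketch of how a numsegments of 0 can be resolved at query time, assuming a simplified policy struct; DistPolicy and effective_numsegments are hypothetical names, not CBDB's actual GpPolicy code.

```c
#include <stdbool.h>

/* Hypothetical, simplified distribution policy. */
typedef struct DistPolicy
{
    bool random;       /* randomly distributed, no distribution keys */
    int  numsegments;  /* 0 means "not bound to a fixed cluster size" */
} DistPolicy;

/* Resolve the segment count to use for one query. */
int effective_numsegments(const DistPolicy *policy, int warehouse_segments)
{
    if (policy->numsegments == 0)
        return warehouse_segments;   /* follow the current warehouse */
    return policy->numsegments;      /* fixed policy: keep its own count */
}
```

Because the stored value is 0, the same table can be queried from warehouses of different sizes without rewriting its catalog entry.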
This commit mainly adds an extensible smgr slot for extension storage
formats. When a storage format extension is created, it adds its own
smgr slot to the smgrsw array. Moreover, add smgropen and smgrclose
calls in RelationDropStorage.

authored-by: Zhang Wenchao [email protected]
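The smgrsw array is a table of function pointers indexed by slot. A minimal sketch of making it extensible, assuming a reduced method table with only open and close; SmgrMethods, smgr_register, smgr_lookup and the columnar functions are hypothetical and much simpler than PostgreSQL's f_smgr.

```c
#include <string.h>

#define MAX_SMGR_SLOTS 4

/* Hypothetical, heavily reduced storage-manager method table. */
typedef struct SmgrMethods
{
    const char *name;
    int  (*smgr_open)(int relnode);
    void (*smgr_close)(int relnode);
} SmgrMethods;

/* Built-in slot 0, a stand-in for the core "md" storage manager. */
static int  md_open(int relnode)  { return relnode; }
static void md_close(int relnode) { (void) relnode; }

static SmgrMethods smgrsw[MAX_SMGR_SLOTS] = {
    { "md", md_open, md_close },
};
static int nsmgr = 1;

/* Called from an extension's init function; returns the new slot or -1. */
int smgr_register(const SmgrMethods *methods)
{
    if (nsmgr >= MAX_SMGR_SLOTS)
        return -1;
    smgrsw[nsmgr] = *methods;
    return nsmgr++;
}

/* Find the slot for a storage format by name; -1 if none is registered. */
int smgr_lookup(const char *name)
{
    for (int i = 0; i < nsmgr; i++)
        if (strcmp(smgrsw[i].name, name) == 0)
            return i;
    return -1;
}

/* A hypothetical extension storage format. */
static int  columnar_open(int relnode)  { return relnode + 1000; }
static void columnar_close(int relnode) { (void) relnode; }
```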
1. Two hook functions, ext_dml_init_hook and ext_dml_finish_hook, have been added. These functions perform resource initialization and cleanup at the start and end of data modification operations (such as ModifyTable, CopyFrom, CreateAs, Matview, etc.).
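A minimal sketch of how the two hooks could bracket a data modification operation; the hook names come from the commit, but their signatures, run_dml and the demo counters are assumptions for illustration.

```c
#include <stddef.h>

/* Hook names from the commit; the signatures here are assumptions. */
typedef void (*ext_dml_init_hook_type)(unsigned int relid);
typedef void (*ext_dml_finish_hook_type)(unsigned int relid);

ext_dml_init_hook_type   ext_dml_init_hook = NULL;
ext_dml_finish_hook_type ext_dml_finish_hook = NULL;

/* Hypothetical DML entry point standing in for ModifyTable, CopyFrom, ... */
int run_dml(unsigned int relid, int nrows)
{
    int processed = 0;

    if (ext_dml_init_hook)
        ext_dml_init_hook(relid);       /* e.g. set up extension-side writers */

    for (int i = 0; i < nrows; i++)
        processed++;                    /* placeholder for the per-row work */

    if (ext_dml_finish_hook)
        ext_dml_finish_hook(relid);     /* e.g. flush and release resources */

    return processed;
}

/* A demo extension that just counts how often each hook fires. */
static int init_calls = 0;
static int finish_calls = 0;
static void demo_dml_init(unsigned int relid)   { (void) relid; init_calls++; }
static void demo_dml_finish(unsigned int relid) { (void) relid; finish_calls++; }
```

This sketch ignores error paths: in the backend an ERROR longjmps past the finish call, so real cleanup would also need to be tied to transaction abort or a PG_TRY block.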
If we drop a hashdata table, we cannot delete its record in main_manifest.
So we make main_manifest a catalog table and add a dependency in the pg_depend table.
When we drop a table, the dependency is deleted too, along with the main_manifest record.
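The effect of the pg_depend entry can be sketched with in-memory stand-ins: each main_manifest row records a dependency on its relation, and dropping the relation removes every dependent row. All names here are hypothetical; in PostgreSQL such entries are normally created with recordDependencyOn(), and whether this PR does so is not shown.

```c
#include <stdbool.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-ins for main_manifest rows and pg_depend entries. */
typedef struct ManifestRow { unsigned int relid; bool live; } ManifestRow;
typedef struct DependEdge  { int objid; unsigned int refobjid; bool live; } DependEdge;

static ManifestRow manifest[MAX_ENTRIES];
static DependEdge  depend[MAX_ENTRIES];
static int nmanifest = 0;
static int ndepend = 0;

/* Insert a manifest row and record that it depends on relation relid. */
int add_manifest_row(unsigned int relid)
{
    if (nmanifest >= MAX_ENTRIES || ndepend >= MAX_ENTRIES)
        return -1;
    int row = nmanifest++;
    manifest[row] = (ManifestRow) { relid, true };
    depend[ndepend++] = (DependEdge) { row, relid, true };
    return row;
}

/* Dropping a relation also drops every object recorded as depending on it. */
void drop_relation(unsigned int relid)
{
    for (int i = 0; i < ndepend; i++)
    {
        if (depend[i].live && depend[i].refobjid == relid)
        {
            manifest[depend[i].objid].live = false;
            depend[i].live = false;     /* the edge is removed as well */
        }
    }
}

/* Count manifest rows that are still present. */
int live_manifest_rows(void)
{
    int n = 0;
    for (int i = 0; i < nmanifest; i++)
        n += manifest[i].live;
    return n;
}
```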
This commit mainly implements an extensible libpq protocol. Moreover, it
introduces an extensible ExecStatusType and DispatcherAsyncFuncs, which can
be extended by extensions. This way, extensions can extend these modules as needed.

authored-by: Zhang Wenchao [email protected]
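One way to make a status enum extensible, sketched under assumptions: the core reserves a value range for extensions and asks an extension callback to describe values it does not know. DemoExecStatus, status_name and my_extension_init are hypothetical; libpq's real ExecStatusType and the PR's DispatcherAsyncFuncs are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced status enum; values from DEMO_STATUS_EXT_BASE up
 * are reserved for extensions. */
typedef enum DemoExecStatus
{
    DEMO_COMMAND_OK = 0,
    DEMO_TUPLES_OK,
    DEMO_FATAL_ERROR,
    DEMO_STATUS_EXT_BASE = 100
} DemoExecStatus;

/* Extension-provided callback that names statuses the core does not know. */
typedef const char *(*ext_status_name_fn)(int status);
static ext_status_name_fn ext_status_name = NULL;

const char *status_name(int status)
{
    switch (status)
    {
        case DEMO_COMMAND_OK:  return "COMMAND_OK";
        case DEMO_TUPLES_OK:   return "TUPLES_OK";
        case DEMO_FATAL_ERROR: return "FATAL_ERROR";
        default:               break;
    }
    /* Delegate the reserved range to the extension, if one is loaded. */
    if (status >= DEMO_STATUS_EXT_BASE && ext_status_name != NULL)
        return ext_status_name(status);
    return "UNKNOWN";
}

/* A hypothetical extension adding one status of its own. */
enum { MY_MANIFEST_SYNCED = DEMO_STATUS_EXT_BASE };

static const char *my_status_name(int status)
{
    return status == MY_MANIFEST_SYNCED ? "MANIFEST_SYNCED" : "UNKNOWN";
}

void my_extension_init(void)
{
    ext_status_name = my_status_name;
}
```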
1. In the serverless architecture, we do not need to dispatch the VACUUM command.
2. Add T_ExtensibleNode to the CMD_TAG list, which is needed by CreateCommandTag in utility.c.
We cannot hook it because it executes before the standard_ProcessUtility function.

Co-authored-by: roseduan <[email protected]>
apache#210)

Change the field type of the storage_am-related catalog table main_manifest
from uint32 to uint64, and rename the field from relid to relnode.


Co-authored-by: xiaosongwang <[email protected]>
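The commit does not state why the field was widened, but a quick illustration of what a 32-bit column risks: a relnode above UINT32_MAX silently wraps when narrowed. The helper names are hypothetical.

```c
#include <stdint.h>

/* Value of a relnode after being stored in a 32-bit field (wraps mod 2^32). */
uint32_t stored_in_uint32(uint64_t relnode) { return (uint32_t) relnode; }

/* Value of a relnode after being stored in a 64-bit field (unchanged). */
uint64_t stored_in_uint64(uint64_t relnode) { return relnode; }
```

For example, UINT32_MAX + 5 (4294967300) comes back as 4 from the 32-bit field.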
RetrieveRelStorageType adds a magic number, 7015: we use the am_id (7015)
assigned to the custom table AM, and let the ORCA optimizer treat this
columnar storage format as AOCS when generating an execution plan.
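A minimal sketch of the special case, assuming a simplified storage-kind enum; only the am_id 7015 comes from the commit, and RelStorageKind and retrieve_rel_storage_kind are hypothetical stand-ins for the real RetrieveRelStorageType.

```c
/* Hypothetical storage-kind enum; ORCA's real representation differs. */
typedef enum RelStorageKind
{
    STORAGE_HEAP,
    STORAGE_AO_ROW,
    STORAGE_AOCS,
    STORAGE_UNKNOWN
} RelStorageKind;

/* The am_id the commit assigns to the custom columnar table AM. */
#define CUSTOM_COLUMNAR_AM_ID 7015u

/*
 * Map a table AM id to the storage kind the optimizer plans for.
 * core_kind is whatever the existing lookup already decided for built-in
 * AMs; only the custom AM id is special-cased.
 */
RelStorageKind retrieve_rel_storage_kind(unsigned int am_id, RelStorageKind core_kind)
{
    if (am_id == CUSTOM_COLUMNAR_AM_ID)
        return STORAGE_AOCS;    /* plan the custom columnar AM like AOCS */
    return core_kind;
}
```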
Use the new struct AnalyzeContext instead of gp_acquire_sample_rows_context to pass the analyze context in table_beginscan_analyze.
Support an ALTER TABLE dispatch rewrite hook, which dispatches for every rewritten table after all the work is done on the QD, and remove the original dispatch routine.

Co-authored-by: xiaosongwang <[email protected]>
1. Hook 'SearchCatCache_hook' for plugins to get control in SearchCatCache.
2. Hook 'ReleaseCatCache_hook' for plugins to get control in ReleaseCatCache.
Hook 'RelationValidation_hook' for plugins to validate the relation in
relcache.
New hook 'getgpsegmentCount_hook' for plugins to get control in getgpsegmentCount.
It's reasonable that main_manifest is not shared like pg_class.
In the serverless architecture, implementing triggers the same way as for
foreign tables, which use a tuplestore to store the tuples, is more efficient,
because fetching a tuple through its ctid is inefficient. Besides, in the
serverless architecture, concurrent update or delete is not supported, so we
can fetch the tuple directly without locking it in GetTupleForTrigger.
oppenheimer01 and others added 30 commits April 23, 2025 16:17
This routine is for collecting the catalog for the AM; it can also be used for other purposes.
It is called before scan_begin.
We implement update as insert plus delete, so it is not used.
For an Append Agg inside a subquery, we use its target list to match
the materialized view.
However, Postgres removes subquery columns that are not used in the
upper query in a hacky way: it replaces their target entries with NULL.
This makes us fail to match and rewrite the view, as we only support
exact matches for now.

Work around this with a GUC that controls whether we are allowed to attempt to answer the query.
1. Support alter warehouse name suspend/resume.
2. Support alter warehouse name options/replace options
3. Make FTS ignore the dbid and contentid checks when SERVERLESS is defined.
…lity

1. remove some useless code and hooks
1. remove MACRO SERVERLESS from common logical code
1. add MACRO SERVERLESS to cloud service related code
In the serverless architecture, change how tablespaces are used.
Auto-analyze of an inherited table dumps core, because it enters
the acquire_sample_rows_dispatcher method:

if (Gp_role == GP_ROLE_DISPATCH && ENABLE_DISPATCH())
{
        return acquire_sample_rows_dispatcher(onerel,
                                            true, /* inherited stats */
                                            elevel,
                                            rows,
                                            targrows,
                                            totalrows,
                                            totaldeadrows);
}

But when we analyze the table manually, we hold dispatch in hashdata_ProcessUtility,
so we follow the manual analyze logic and just ignore this routine in serverless mode.
CBDB has exposed aqumv_adjust_simple_query to adjust the parse tree; remove the duplicated code.
Ignore am_by_tablespace files by the way.
CBDB has added MatviewUsableForAppendAgg() to identify whether the data is up to date or
available for an Append Agg plan.

Remove pg_class.relinsertonly.

Authored-by: Zhang Mingli [email protected]
The upstream code merged into hashdata cloud missed handling the
serverless FTS.

Since there is no mirror DB info in the cloud, we ignore the log
detail; otherwise it causes a coredump.
IVM has enabled the min and max functions with partial agg results.
They are no different from others like count and sum, and should be usable for the Append Agg plan.
Add test cases for that.

Authored-by: Zhang Mingli [email protected]
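A minimal sketch of why min and max combine like count and sum, assuming the Append Agg plan merges the partial results stored in the materialized view with partial results computed over newly appended rows; PartialAgg, partial_from and combine are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical partial-aggregate state for count, sum, min and max. */
typedef struct PartialAgg
{
    long count;
    long sum;
    int  min;
    int  max;
    bool has_rows;      /* min and max are undefined for an empty input */
} PartialAgg;

/* Build a partial state from one batch of values. */
PartialAgg partial_from(const int *vals, size_t n)
{
    PartialAgg p = { 0, 0, 0, 0, false };
    for (size_t i = 0; i < n; i++)
    {
        if (!p.has_rows || vals[i] < p.min) p.min = vals[i];
        if (!p.has_rows || vals[i] > p.max) p.max = vals[i];
        p.count++;
        p.sum += vals[i];
        p.has_rows = true;
    }
    return p;
}

/* Combine two partial states: count and sum add up, min and max keep the
 * extreme value, so all four merge the same way across batches. */
PartialAgg combine(PartialAgg a, PartialAgg b)
{
    if (!a.has_rows) return b;
    if (!b.has_rows) return a;
    PartialAgg r = {
        .count = a.count + b.count,
        .sum = a.sum + b.sum,
        .min = a.min < b.min ? a.min : b.min,
        .max = a.max > b.max ? a.max : b.max,
        .has_rows = true,
    };
    return r;
}
```

This only holds for appended rows: a delete could remove the current min or max, which cannot be recovered from the partial state alone.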
IVM with partial aggs does not use the vectorization plan.
IVM with partial aggs falls back to the normal plan.
Fix corner case of version 0.
Fix tablespace.
Due to the unstable DNS service in the cloud environment, retry for up
to 10 seconds to get the host IP from DNS.
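A minimal sketch of the retry loop using POSIX getaddrinfo(); the function name, the IPv4-only lookup and the 1-second sleep between attempts are assumptions, and only the 10-second budget comes from the commit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Resolve host to a dotted IPv4 string, retrying any lookup failure until
 * timeout_secs have elapsed (sleeping 1 s between attempts).
 * Returns 0 on success, otherwise the last getaddrinfo() error code
 * (EAI_OVERFLOW if out is too small).
 */
int resolve_host_with_retry(const char *host, int timeout_secs,
                            char *out, size_t outlen)
{
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* IPv4 only, for brevity */
    hints.ai_socktype = SOCK_STREAM;    /* avoid one result per socket type */

    time_t deadline = time(NULL) + timeout_secs;

    for (;;)
    {
        struct addrinfo *res = NULL;
        int rc = getaddrinfo(host, NULL, &hints, &res);

        if (rc == 0)
        {
            const struct sockaddr_in *sin =
                (const struct sockaddr_in *) res->ai_addr;
            const char *ok = inet_ntop(AF_INET, &sin->sin_addr, out,
                                       (socklen_t) outlen);
            freeaddrinfo(res);
            return ok != NULL ? 0 : EAI_OVERFLOW;
        }

        /* Give up once the time budget is spent; otherwise wait and retry. */
        if (time(NULL) >= deadline)
            return rc;
        sleep(1);
    }
}
```

Retrying every error keeps the sketch close to the commit's description; a stricter variant could retry only EAI_AGAIN (temporary failure in name resolution).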
When ereport is used, it can print the stack in the log system, but it
sometimes cannot print some symbols. Some functions, especially external
ones (for example, C++ functions), cannot be displayed correctly, which
should be resolved. So add a hook for finding the correct function names.
Disable autovacuum temporarily because of the flaky regression test 'vacuum.sql'.
Support count(n) where n is a constant value; users' SQL
contains things like that.
select count(1) is no different from select count(*).

Authored-by: Zhang Mingli [email protected]
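A minimal sketch of the normalization, assuming a reduced representation of an aggregate call; AggCall and normalize_count_const are hypothetical. Only non-null constants qualify: count(NULL) always returns 0, so it must not be treated as count(*).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced view of an aggregate call in a parse tree. */
typedef enum ArgKind { ARG_STAR, ARG_CONST, ARG_COLUMN } ArgKind;

typedef struct AggCall
{
    const char *fname;        /* e.g. "count" */
    ArgKind     argkind;
    bool        constisnull;  /* only meaningful for ARG_CONST */
} AggCall;

/*
 * count(<non-null constant>) counts every row, exactly like count(*),
 * so rewrite it to count(*) before matching against materialized views.
 */
void normalize_count_const(AggCall *agg)
{
    if (strcmp(agg->fname, "count") == 0 &&
        agg->argkind == ARG_CONST && !agg->constisnull)
        agg->argkind = ARG_STAR;
}
```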
For CREATE ... AS, the target table's distribution policy could be derived
from the Query of the AS part.

create materialized view cloud_ctas_mv as select a, count(b) from
cloud_ctas_t0 group by a with no data;

The locus of cloud_ctas_mv may be Hashed by column a, as it's an agg
with group by.
That's fine for CBDB, but in serverless mode the underlying data is
random, so we can't store a distribution policy with distkeys or
numsegments for it. Otherwise, we will get an error when we switch to a
cluster with more or fewer segments.

We handled this in extensions, but it didn't take effect due
to the architecture and flow of utility hooks.

This is the last resort in core code.

Authored-by: Zhang Mingli [email protected]
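A minimal sketch of the rule described above, assuming a simplified policy struct: in serverless mode the policy derived from the AS query is discarded in favour of a random policy with no distribution keys and numsegments set to 0. Policy and ctas_policy are hypothetical names.

```c
#include <stdbool.h>

#define MAX_DISTKEYS 4

/* Hypothetical, reduced distribution policy. */
typedef struct Policy
{
    bool random;
    int  nkeys;
    int  keys[MAX_DISTKEYS];  /* attribute numbers of the distribution keys */
    int  numsegments;
} Policy;

/*
 * Choose the policy stored for CREATE ... AS. In serverless mode the
 * underlying data is random, so keys and segment counts derived from the
 * query's locus must not be persisted.
 */
Policy ctas_policy(const Policy *derived_from_query, bool serverless)
{
    if (serverless)
    {
        Policy p = { .random = true, .nkeys = 0, .numsegments = 0 };
        return p;
    }
    return *derived_from_query;
}
```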
This is part of commit "[CLOUD] Enable start QD in utility"
but changes the CBDB.
…ews.

* Offload two GUCs to IVM modules.
* Clean up task dependencies.
Constant expressions like where 1 = 1 and a > 1 are simplified
to where a > 1 by the planner.
Quals like 1 = 1 are always TRUE, so they are useless in an AND
expression.

But as we store the MV's view query as originally written, the parse
tree processed by the planner may not match the MV's exactly.
Process those quals during Append Agg to fix this.

Authored-by: Zhang Mingli [email protected]
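A minimal sketch of dropping always-true quals from an implicit-AND list, assuming 1 = 1 has already been constant-folded to TRUE; Qual and drop_true_quals are hypothetical. A constant FALSE is kept, since it makes the whole conjunction false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical qual: either a folded boolean constant or an opaque
 * non-constant expression such as "a > 1". */
typedef struct Qual
{
    bool        is_const;
    bool        constval;   /* valid only when is_const */
    const char *text;       /* for display only */
} Qual;

/*
 * Remove constant TRUE quals from an implicit-AND list in place
 * (TRUE AND x == x). Returns the new number of quals.
 */
size_t drop_true_quals(Qual *quals, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
    {
        if (quals[i].is_const && quals[i].constval)
            continue;               /* always TRUE: contributes nothing */
        quals[out++] = quals[i];
    }
    return out;
}
```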
Support compiling and deploying the DB without the cloud extension source code.

Add a CI job for compilation checks, as preparation for regression and
isolation2 test CI.

10 participants