Key Default Type Description
table.optimizer.agg-phase-strategy

Batch Streaming
AUTO

Enum

Strategy for the aggregate phase. Only AUTO, TWO_PHASE, or ONE_PHASE can be set. AUTO: No strategy is enforced; the optimizer chooses between two-stage and one-stage aggregation based on cost. TWO_PHASE: Enforces two-stage aggregation, which consists of a localAggregate and a globalAggregate. Note that if an aggregate call cannot be split into two phases, one-stage aggregation is still used. ONE_PHASE: Enforces one-stage aggregation, which has only a CompleteGlobalAggregate.

Possible values:
  • "AUTO"
  • "ONE_PHASE"
  • "TWO_PHASE"
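The difference between the two strategies can be illustrated with a minimal sketch in plain Python. It does not use any Flink API; the function names `local_aggregate`, `global_aggregate`, `one_phase_sum`, and `two_phase_sum` are hypothetical and only model a keyed SUM over a few input partitions.

```python
from collections import defaultdict


def local_aggregate(partition):
    """Pre-aggregate one input partition into partial sums per key."""
    partial = defaultdict(int)
    for key, value in partition:
        partial[key] += value
    return dict(partial)


def global_aggregate(partials):
    """Merge the partial sums produced by every local aggregate."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def one_phase_sum(partitions):
    """Single aggregate that consumes every raw record directly."""
    total = defaultdict(int)
    for partition in partitions:
        for key, value in partition:
            total[key] += value
    return dict(total)


def two_phase_sum(partitions):
    """Local aggregate per partition, then one global merge."""
    return global_aggregate(local_aggregate(p) for p in partitions)


# Hypothetical sample data: two partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5)],
]
print(one_phase_sum(partitions))
print(two_phase_sum(partitions))
```

Both functions produce the same totals. In the two-phase form, only one partial result per key and partition is passed to the global step, which is the reason the two-stage plan can reduce shuffled data.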
table.optimizer.bushy-join-reorder-threshold

Batch Streaming
12 Integer The maximum number of joined nodes allowed in the bushy join reorder algorithm. If the number of joined nodes exceeds this value, the left-deep join reorder algorithm is used instead. The search space of the bushy join reorder algorithm grows with this threshold, so setting it too large is not recommended. The default value is 12.
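To give a rough sense of why a large threshold is discouraged, the following sketch compares the number of possible left-deep join orders with the number of possible bushy join trees for n relations. These are textbook combinatorial counts that ignore join-graph connectivity, so they are an upper bound and not a description of what the optimizer actually enumerates; the function names are hypothetical.

```python
from math import factorial


def left_deep_plans(n):
    """Number of left-deep join orders over n relations: n!."""
    return factorial(n)


def bushy_plans(n):
    """Number of ordered binary join trees over n relations: (2n-2)! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 6, 12):
    print(n, left_deep_plans(n), bushy_plans(n))
```

The bushy count grows far faster than the left-deep count, which matches the recommendation to keep the threshold moderate.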
table.optimizer.distinct-agg.split.bucket-num

Streaming
1024 Integer Configures the number of buckets used when splitting a distinct aggregation. The number is used in the first-level aggregation to calculate a bucket key 'hash_code(distinct_key) % BUCKET_NUM', which is used as an additional group key after splitting.
table.optimizer.distinct-agg.split.enabled

Streaming
false Boolean Tells the optimizer whether to split distinct aggregation (e.g. COUNT(DISTINCT col), SUM(DISTINCT col)) into two levels. The first-level aggregation is shuffled by an additional key that is calculated from the hash code of distinct_key and the number of buckets. This optimization is very useful when a distinct aggregation suffers from data skew, and it gives the job the ability to scale up. Default is false.
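The split described above can be sketched in plain Python for COUNT(DISTINCT col). This is an illustration only, not Flink code: Python's built-in `hash()` stands in for `hash_code`, and the names `bucket_of`, `count_distinct_split`, and `count_distinct_direct` are hypothetical.

```python
from collections import defaultdict

BUCKET_NUM = 1024  # mirrors the default of table.optimizer.distinct-agg.split.bucket-num


def bucket_of(distinct_key, bucket_num=BUCKET_NUM):
    """Bucket key: hash_code(distinct_key) % BUCKET_NUM (hash() is a stand-in)."""
    return hash(distinct_key) % bucket_num


def count_distinct_split(rows, bucket_num=BUCKET_NUM):
    """Two-level COUNT(DISTINCT): group by (group_key, bucket), then sum per group_key."""
    # First level: the bucket key acts as an additional group key.
    first_level = defaultdict(set)
    for group_key, distinct_key in rows:
        first_level[(group_key, bucket_of(distinct_key, bucket_num))].add(distinct_key)
    # Second level: each distinct value lives in exactly one bucket,
    # so per-bucket distinct counts can simply be added up.
    result = defaultdict(int)
    for (group_key, _bucket), values in first_level.items():
        result[group_key] += len(values)
    return dict(result)


def count_distinct_direct(rows):
    """Single-level COUNT(DISTINCT) for comparison."""
    seen = defaultdict(set)
    for group_key, distinct_key in rows:
        seen[group_key].add(distinct_key)
    return {group_key: len(values) for group_key, values in seen.items()}


# Hypothetical sample data: (day, user_id) records.
rows = [("day1", 101), ("day1", 102), ("day1", 101), ("day2", 101), ("day1", 2000)]
print(count_distinct_split(rows))
print(count_distinct_direct(rows))
```

Because a hot group key is spread across up to BUCKET_NUM first-level groups, the work for that key no longer lands on a single parallel task.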
table.optimizer.dynamic-filtering.enabled

Batch Streaming
true Boolean When it is true, the optimizer will try to push dynamic filtering into the scan table source, so that irrelevant partitions or input data are filtered out at runtime to reduce scan I/O.
table.optimizer.incremental-agg-enabled

Streaming
true Boolean When both local aggregation and distinct aggregation splitting are enabled, a distinct aggregation will be optimized into four aggregations, i.e., local-agg1, global-agg1, local-agg2, and global-agg2. We can combine global-agg1 and local-agg2 into a single operator (we call it incremental agg because it receives incremental accumulators and outputs incremental results). In this way, we can reduce some state overhead and resources. Default is enabled.
table.optimizer.join-reorder-enabled

Batch Streaming
false Boolean Enables join reorder in optimizer. Default is disabled.
table.optimizer.join.broadcast-threshold

Batch
1048576 Long Configures the maximum size in bytes for a table that will be broadcast to all worker nodes when performing a join. Set this value to -1 to disable broadcasting.
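As a minimal sketch of how such a threshold might be applied, the hypothetical helper below decides whether a table of a given estimated size qualifies for broadcasting. The inclusive comparison is an assumption; the text above does not specify how the boundary value is treated.

```python
DEFAULT_BROADCAST_THRESHOLD = 1048576  # bytes, the default listed above


def can_broadcast(estimated_size_bytes, threshold_bytes=DEFAULT_BROADCAST_THRESHOLD):
    """Return True if a table of this estimated size may be broadcast."""
    if threshold_bytes < 0:
        # A value of -1 disables broadcasting entirely.
        return False
    # Assumption: a table exactly at the threshold still qualifies.
    return estimated_size_bytes <= threshold_bytes


print(can_broadcast(512 * 1024))
print(can_broadcast(5 * 1024 * 1024))
print(can_broadcast(512 * 1024, threshold_bytes=-1))
```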
table.optimizer.multiple-input-enabled

Batch
true Boolean When it is true, the optimizer will merge the operators with pipelined shuffling into a multiple input operator to reduce shuffling and improve performance. Default value is true.
table.optimizer.non-deterministic-update.strategy

Streaming
IGNORE

Enum

When it is `TRY_RESOLVE`, the optimizer tries to resolve the correctness issue caused by 'Non-Deterministic Updates' (NDU) in a changelog pipeline. A changelog may contain the following message types: Insert (I), Delete (D), Update_Before (UB), Update_After (UA). There is no NDU problem in an insert-only changelog pipeline. For updates, there are three main NDU problems:
1. Non-deterministic functions, including scalar, table, and aggregate functions, both built-in and custom ones.
2. LookupJoin on an evolving source.
3. A CDC source that carries metadata fields, which are system columns and do not belong to the entity data itself.

As a first step, the optimizer automatically enables materialization for No. 2 (LookupJoin) if needed, and gives a detailed error message for No. 1 (non-deterministic functions) and No. 3 (CDC source with metadata), which are relatively easy to solve by changing the SQL.
Default value is `IGNORE`, in which case the optimizer makes no changes.

Possible values:
  • "TRY_RESOLVE"
  • "IGNORE"
table.optimizer.reuse-optimize-block-with-digest-enabled

Batch Streaming
false Boolean When true, the optimizer will try to find out duplicated sub-plans by digest to build optimize blocks (a.k.a. common sub-graphs). Each optimize block will be optimized independently.
table.optimizer.reuse-source-enabled

Batch Streaming
true Boolean When it is true, the optimizer will try to find out duplicated table sources and reuse them. This works only when table.optimizer.reuse-sub-plan-enabled is true.
table.optimizer.reuse-sub-plan-enabled

Batch Streaming
true Boolean When it is true, the optimizer will try to find out duplicated sub-plans and reuse them.
table.optimizer.runtime-filter.enabled

Batch
false Boolean A flag to enable or disable the runtime filter. When it is true, the optimizer will try to inject a runtime filter for eligible joins.
table.optimizer.runtime-filter.max-build-data-size

Batch
150 mb MemorySize Max data volume threshold of the runtime filter build side. Estimated data volume needs to be under this value to try to inject runtime filter.
table.optimizer.runtime-filter.min-filter-ratio

Batch
0.5 Double Min filter ratio threshold of the runtime filter. Estimated filter ratio needs to be over this value to try to inject runtime filter.
table.optimizer.runtime-filter.min-probe-data-size

Batch
10 gb MemorySize Min data volume threshold of the runtime filter probe side. Estimated data volume needs to be over this value to try to inject runtime filter. This value should be larger than table.optimizer.runtime-filter.max-build-data-size.
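The four runtime-filter options can be read together as one eligibility check. The sketch below is a hypothetical plain-Python model of that check, not the optimizer's actual logic. It assumes that "mb" and "gb" are binary units (1024-based) and that the comparisons are strict, following the wording "under" and "over".

```python
MB = 1024 ** 2  # assumption: binary units
GB = 1024 ** 3


def should_inject_runtime_filter(
    build_bytes,
    probe_bytes,
    filter_ratio,
    enabled=False,              # table.optimizer.runtime-filter.enabled
    max_build_size=150 * MB,    # ...runtime-filter.max-build-data-size
    min_probe_size=10 * GB,     # ...runtime-filter.min-probe-data-size
    min_filter_ratio=0.5,       # ...runtime-filter.min-filter-ratio
):
    """Return True if the estimates satisfy every threshold described above."""
    if not enabled:
        return False
    return (
        build_bytes < max_build_size
        and probe_bytes > min_probe_size
        and filter_ratio > min_filter_ratio
    )


# Small build side, large probe side, selective filter: eligible.
print(should_inject_runtime_filter(100 * MB, 20 * GB, 0.8, enabled=True))
# Build side too large: not eligible.
print(should_inject_runtime_filter(200 * MB, 20 * GB, 0.8, enabled=True))
```

Keeping min-probe-data-size larger than max-build-data-size ensures the filter is only built from a side that is small relative to the data it prunes.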
table.optimizer.source.report-statistics-enabled

Batch Streaming
true Boolean When it is true, the optimizer will collect and use the statistics from source connectors if the source implements SupportsStatisticReport and the statistics from the catalog are UNKNOWN. Default value is true.
table.optimizer.sql2rel.project-merge.enabled

Batch Streaming
false Boolean If set to true, it will merge projects when converting SqlNode to RelNode.
Note: it is not recommended to turn this on unless you are aware of possible side effects, such as causing the output of certain non-deterministic expressions to not meet expectations (see FLINK-20887).
table.optimizer.union-all-as-breakpoint-enabled

Batch Streaming
true Boolean When true, the optimizer will break up the graph at a union-all node when it is a breakpoint. When false, the optimizer will skip the union-all node even if it is a breakpoint, and will try to find the breakpoint in its inputs.