Draft
8 changes: 4 additions & 4 deletions docs/content.zh/docs/connectors/table/formats/debezium.md
@@ -52,7 +52,7 @@ Flink 还支持将 Flink SQL 中的 INSERT / UPDATE / DELETE 消息编码为 Deb

{{< sql_download_table "debezium-json" >}}

-*注意: 请参考 [Debezium 文档](https://debezium.io/documentation/reference/1.3/index.html),了解如何设置 Debezium Kafka Connect 用来将变更日志同步到 Kafka 主题。*
+*注意: 请参考 [Debezium 文档](https://debezium.io/documentation/reference/stable/index.html),了解如何设置 Debezium Kafka Connect 用来将变更日志同步到 Kafka 主题。*


如何使用 Debezium Format
@@ -82,7 +82,7 @@ Debezium 为变更日志提供了统一的格式,这是一个 JSON 格式的
}
```

-*注意: 请参考 [Debezium 文档](https://debezium.io/documentation/reference/1.3/connectors/mysql.html#mysql-connector-events_debezium),了解每个字段的含义。*
+*注意: 请参考 [Debezium 文档](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-events),了解每个字段的含义。*

MySQL 产品表有4列(`id`、`name`、`description`、`weight`)。上面的 JSON 消息是 `products` 表上的一条更新事件,其中 `id = 111` 的行的 `weight` 值从 `5.18` 更改为 `5.15`。假设此消息已同步到 Kafka 主题 `products_binlog`,则可以使用以下 DDL(用于 Debezium JSON 和 Debezium Confluent Avro)来消费此主题并解析更改事件。

@@ -476,12 +476,12 @@ Flink 提供了 `debezium-avro-confluent` 和 `debezium-json` 两种 format 来

### 消费 Debezium Postgres Connector 产生的数据

-如果你正在使用 [Debezium PostgreSQL Connector](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html) 捕获变更到 Kafka,请确保被监控表的 [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY) 已经被配置成 `FULL` 了,默认值是 `DEFAULT`。
+如果你正在使用 [Debezium PostgreSQL Connector](https://debezium.io/documentation/reference/stable/connectors/postgresql.html) 捕获变更到 Kafka,请确保被监控表的 [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY) 已经被配置成 `FULL` 了,默认值是 `DEFAULT`。
否则,Flink SQL 将无法正确解析 Debezium 数据。

当配置为 `FULL` 时,更新和删除事件将完整包含所有列的之前的值。当为其他配置时,更新和删除事件的 "before" 字段将只包含 primary key 字段的值,或者为 null(没有 primary key)。
你可以通过运行 `ALTER TABLE <your-table-name> REPLICA IDENTITY FULL` 来更改 `REPLICA IDENTITY` 的配置。
-请阅读 [Debezium 关于 PostgreSQL REPLICA IDENTITY 的文档](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html#postgresql-replica-identity) 了解更多。
+请阅读 [Debezium 关于 PostgreSQL REPLICA IDENTITY 的文档](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-replica-identity) 了解更多。

数据类型映射
----------------
2 changes: 1 addition & 1 deletion docs/content.zh/docs/connectors/table/formats/ogg.md
@@ -84,7 +84,7 @@ Ogg 为变更日志提供了统一的格式, 这是一个 JSON 格式的从 Orac
}
```

-*注意:请参考 [Debezium documentation](https://debezium.io/documentation/reference/1.3/connectors/oracle.html#oracle-events)
+*注意:请参考 [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/oracle.html#oracle-events)
了解每个字段的含义.*

Oracle `PRODUCTS` 表 有 4 列 (`id`, `name`, `description` and `weight`). 上面的 JSON 消息是 `PRODUCTS` 表上的一条更新事件,其中 `id = 111` 的行的
2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/adaptive_batch.md
@@ -167,6 +167,6 @@ Adaptive Batch Scheduler 默认启用 Skewed Join 优化,你可以通过配
- **只支持 AdaptiveBatchScheduler**: 不过由于 Adaptive Batch Scheduler 是 Flink 默认的批作业调度器,无需额外配置。除非用户显式的配置了使用其他调度器,例如 [`jobmanager.scheduler: default`]。
- **只支持所有数据交换都为 BLOCKING 或 HYBRID 模式的作业**: 目前 Adaptive Batch Scheduler 只支持 [shuffle mode]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) 为 ALL_EXCHANGES_BLOCKING 或 ALL_EXCHANGES_HYBRID_FULL 或 ALL_EXCHANGES_HYBRID_SELECTIVE 的作业。请注意,使用 DataSet API 的作业无法识别上述 shuffle 模式,需要将 ExecutionMode 设置为 BATCH_FORCED 才能强制启用 BLOCKING shuffle。
- **不支持 FileInputFormat 类型的 source**: 不支持 FileInputFormat 类型的 source, 包括 `StreamExecutionEnvironment#readFile(...)` 和 `StreamExecutionEnvironment#createInput(FileInputFormat, ...)`。 当使用 Adaptive Batch Scheduler 时,用户应该使用新版的 Source API ([FileSystem DataStream Connector]({{< ref "docs/connectors/datastream/filesystem.md" >}}) 或 [FileSystem SQL Connector]({{< ref "docs/connectors/table/filesystem.md" >}})) 来读取文件.
-- **Web UI 上展示的上游输出的数据量和下游收到的数据量可能不一致**: 在使用 Adaptive Batch Scheduler 自动推导并行度时,对于 broadcast 边,上游发送的数据量是基于下游最大并行度估算的结果,与下游算子实际接收的数据量可能会不相等,这在 Web UI 的显示上可能会困扰用户。细节详见 [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler)。
+- **Web UI 上展示的上游输出的数据量和下游收到的数据量可能不一致**: 在使用 Adaptive Batch Scheduler 自动推导并行度时,对于 broadcast 边,上游发送的数据量是基于下游最大并行度估算的结果,与下游算子实际接收的数据量可能会不相等,这在 Web UI 的显示上可能会困扰用户。细节详见 [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Scheduler)。

{{< top >}}
2 changes: 1 addition & 1 deletion docs/content.zh/docs/deployment/ha/zookeeper_ha.md
@@ -122,7 +122,7 @@ This behaviour might be too disruptive in some cases (e.g., unstable network env
If you are willing to take a more aggressive approach, then you can tolerate suspended ZooKeeper connections and only treat lost connections as an error via [high-availability.zookeeper.client.tolerate-suspended-connections]({{< ref "docs/deployment/config" >}}#high-availability-zookeeper-client-tolerate-suspended-connection).
Enabling this feature will make Flink more resilient against temporary connection problems but also increase the risk of running into ZooKeeper timing problems.

-For more information take a look at [Curator's error handling](https://curator.apache.org/errors.html).
+For more information take a look at [Curator's error handling](https://curator.apache.org/docs/errors).

<a name="bootstrap-zookeeper" />

6 changes: 3 additions & 3 deletions docs/content.zh/docs/deployment/speculative_execution.md
@@ -78,7 +78,7 @@ public interface SupportsHandleExecutionAttemptSourceEvent {
这意味着 SplitEnumerator 需要知道是哪个执行实例发出了这个事件。否则,JobManager 会在收到 SourceEvent 的时候报错从而导致作业失败。

除此之外的 Source 不需要额外的改动就可以进行预测执行,包括
-{{< gh_link file="/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java" name="SourceFunction Source" >}},
+{{< gh_link file="/flink-runtime/src/main/java/org/apache/flink/streaming/api/functions/source/legacy/SourceFunction.java" name="SourceFunction Source" >}},
{{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/common/io/InputFormat.java" name="InputFormat Source" >}},
和 {{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/connector/source/Source.java" name="新版 Source" >}}.
Apache Flink 官方提供的 Source 都支持预测执行。
@@ -91,7 +91,7 @@ public interface SupportsConcurrentExecutionAttempts {}
```
接口 {{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/common/SupportsConcurrentExecutionAttempts.java" name="SupportsConcurrentExecutionAttempts" >}}
适用于 {{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java" name="Sink" >}}
-,{{< gh_link file="/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.java" name="SinkFunction" >}}
+,{{< gh_link file="/flink-runtime/src/main/java/org/apache/flink/streaming/api/functions/sink/legacy/SinkFunction.java" name="SinkFunction" >}}
以及 {{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/common/io/OutputFormat.java" name="OutputFormat" >}}。

{{< hint info >}}
@@ -101,7 +101,7 @@ public interface SupportsConcurrentExecutionAttempts {}
{{< hint info >}}
对于 {{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Sink.java" name="Sink" >}} 实现,
Flink 会关闭 {{< gh_link file="/flink-core/src/main/java/org/apache/flink/api/connector/sink2/Committer.java" name="Committer" >}} 的预测执行,
-(包括被 {{< gh_link file="/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/connector/sink2/WithPreCommitTopology.java" name="WithPreCommitTopology" >}} 和 {{< gh_link file="/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/connector/sink2/WithPostCommitTopology.java" name="WithPostCommitTopology" >}} 扩展的算子)。
+(包括被 {{< gh_link file="/flink-runtime/src/main/java/org/apache/flink/streaming/api/connector/sink2/SupportsPreCommitTopology.java" name="SupportsPreCommitTopology" >}} 和 {{< gh_link file="/flink-runtime/src/main/java/org/apache/flink/streaming/api/connector/sink2/SupportsPostCommitTopology.java" name="SupportsPostCommitTopology" >}} 扩展的算子)。
因为如果用户对并行提交理解不深的话,这里可能会引起意料之外的问题。另外一个原因是提交的部分往往不是批作业的瓶颈所在。
{{< /hint >}}

2 changes: 1 addition & 1 deletion docs/content.zh/docs/dev/configuration/overview.md
@@ -123,7 +123,7 @@ repositories {
}
// NOTE: We cannot use "compileOnly" or "shadow" configurations since then we could not run code
// in the IDE or with "gradle run". We also cannot exclude transitive dependencies from the
-// shadowJar yet (see https://github.com/johnrengelman/shadow/issues/159).
+// shadowJar yet (see https://github.com/GradleUp/shadow/issues/159).
// -> Explicitly define the // libraries we want to be included in the "flinkShadowJar" configuration!
configurations {
flinkShadowJar // dependencies which go into the shadowJar
@@ -93,7 +93,7 @@ Flink 基于下面的规则来支持 [POJO 类型]({{< ref "docs/dev/datastream/
### Avro 类型

Flink 完全支持 Avro 状态类型的升级,只要数据结构的修改是被
-[Avro 的数据结构解析规则](http://avro.apache.org/docs/current/spec.html#Schema+Resolution)认为兼容的即可。
+[Avro 的数据结构解析规则](https://avro.apache.org/docs/current/specification/)认为兼容的即可。

一个例外是如果新的 Avro 数据 schema 生成的类无法被重定位或者使用了不同的命名空间,在作业恢复时状态数据会被认为是不兼容的。

2 changes: 1 addition & 1 deletion docs/content.zh/docs/dev/datastream/operators/windows.md
@@ -1107,7 +1107,7 @@ Flink 包含一些内置 trigger。
* `PurgingTrigger` 接收另一个 trigger 并将它转换成一个会清理数据的 trigger。

如果你需要实现自定义的 trigger,你应该看看这个抽象类
-{{< gh_link file="/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/Trigger.java" name="Trigger" >}} 。
+{{< gh_link file="/flink-runtime/src/main/java/org/apache/flink/streaming/api/windowing/triggers/Trigger.java" name="Trigger" >}} 。
请注意,这个 API 仍在发展,所以在之后的 Flink 版本中可能会发生变化。

## Evictors
2 changes: 1 addition & 1 deletion docs/content.zh/docs/dev/table/table_environment.md
@@ -138,7 +138,7 @@ These APIs are used to create/remove Table API/SQL Tables and write queries:
| `executeSql(stmt)` | `execute_sql(stmt)` | Executes the given SQL statement and returns the execution result. Supports DDL/DML/DQL/SHOW/DESCRIBE/EXPLAIN/USE statements. |
| `sqlQuery(query)` | `sql_query(query)` | Evaluates a SQL query and retrieves the result as a Table. |

-See the [Java API docs](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableEnvironment.html) and [Python API docs](https://nightlies.apache.org/flink/flink-docs-master/api/python/reference/pyflink.table/api/pyflink.table.TableEnvironment.html) for complete API reference.
+See the [Java API docs](https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/TableEnvironment.html) and [Python API docs](https://nightlies.apache.org/flink/flink-docs-master/api/python/reference/pyflink.table/table_environment.html) for complete API reference.

<big><strong>Deprecated APIs</strong></big>

2 changes: 1 addition & 1 deletion docs/content.zh/docs/sql/hive-compatibility/hiveserver2.md
@@ -279,7 +279,7 @@ After the setup, you can explore Flink with DBeaver.
### Apache Superset

Apache Superset is a powerful data exploration and visualization platform. With the API compatibility, you can connect
-to the Flink SQL Gateway like Hive. Please refer to the [guidance](https://superset.apache.org/docs/databases/hive) for more details.
+to the Flink SQL Gateway like Hive. Please refer to the [guidance](https://superset.apache.org/user-docs/databases/supported/apache-hive/) for more details.

{{< img width="80%" src="/fig/apache_superset.png" alt="Apache Superset" >}}

8 changes: 4 additions & 4 deletions docs/content/docs/connectors/table/formats/debezium.md
@@ -53,7 +53,7 @@ Dependencies
{{< sql_download_table "debezium-json" >}}


-*Note: please refer to [Debezium documentation](https://debezium.io/documentation/reference/1.3/index.html) about how to setup a Debezium Kafka Connect to synchronize changelog to Kafka topics.*
+*Note: please refer to [Debezium documentation](https://debezium.io/documentation/reference/stable/index.html) about how to setup a Debezium Kafka Connect to synchronize changelog to Kafka topics.*


How to use Debezium format
@@ -82,7 +82,7 @@ Debezium provides a unified format for changelog, here is a simple example for a
}
```

-*Note: please refer to [Debezium documentation](https://debezium.io/documentation/reference/1.3/connectors/mysql.html#mysql-connector-events_debezium) about the meaning of each fields.*
+*Note: please refer to [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-events) about the meaning of each fields.*

The MySQL `products` table has 4 columns (`id`, `name`, `description` and `weight`). The above JSON message is an update change event on the `products` table where the `weight` value of the row with `id = 111` is changed from `5.18` to `5.15`.
Assuming this messages is synchronized to Kafka topic `products_binlog`, then we can use the following DDLs (for Debezium JSON and Debezium Confluent Avro) to consume this topic and interpret the change events.
@@ -471,12 +471,12 @@ Framework will generate an additional stateful operator, and use the primary key

### Consuming data produced by Debezium Postgres Connector

-If you are using [Debezium Connector for PostgreSQL](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html) to capture the changes to Kafka, please make sure the [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY) configuration of the monitored PostgreSQL table has been set to `FULL` which is by default `DEFAULT`.
+If you are using [Debezium Connector for PostgreSQL](https://debezium.io/documentation/reference/stable/connectors/postgresql.html) to capture the changes to Kafka, please make sure the [REPLICA IDENTITY](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-CREATETABLE-REPLICA-IDENTITY) configuration of the monitored PostgreSQL table has been set to `FULL` which is by default `DEFAULT`.
Otherwise, Flink SQL currently will fail to interpret the Debezium data.

In `FULL` strategy, the UPDATE and DELETE events will contain the previous values of all the table’s columns. In other strategies, the "before" field of UPDATE and DELETE events will only contain primary key columns or null if no primary key.
You can change the `REPLICA IDENTITY` by running `ALTER TABLE <your-table-name> REPLICA IDENTITY FULL`.
-See more details in [Debezium Documentation for PostgreSQL REPLICA IDENTITY](https://debezium.io/documentation/reference/1.2/connectors/postgresql.html#postgresql-replica-identity).
+See more details in [Debezium Documentation for PostgreSQL REPLICA IDENTITY](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-replica-identity).

Data Type Mapping
----------------
2 changes: 1 addition & 1 deletion docs/content/docs/connectors/table/formats/ogg.md
@@ -92,7 +92,7 @@ captured from an Oracle `PRODUCTS` table in JSON format:
```

*Note: please refer
-to [Debezium documentation](https://debezium.io/documentation/reference/1.3/connectors/oracle.html#oracle-events)
+to [Debezium documentation](https://debezium.io/documentation/reference/stable/connectors/oracle.html#oracle-events)
about the meaning of each field.*

The Oracle `PRODUCTS` table has 4 columns (`id`, `name`, `description` and `weight`). The above JSON
2 changes: 1 addition & 1 deletion docs/content/docs/deployment/adaptive_batch.md
@@ -171,6 +171,6 @@ Additionally, you can adjust the following configurations based on the character
- **AdaptiveBatchScheduler only**: It only takes effect when using the AdaptiveBatchScheduler. And since the Adaptive Batch Scheduler is the default batch job scheduler in Flink, no additional configuration is required unless the user explicitly configures to use another scheduler, such as [`jobmanager.scheduler: default`].
- **BLOCKING or HYBRID jobs only**: At the moment, Adaptive Batch Scheduler only supports jobs whose [shuffle mode]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) is `ALL_EXCHANGES_BLOCKING / ALL_EXCHANGES_HYBRID_FULL / ALL_EXCHANGES_HYBRID_SELECTIVE`.
- **FileInputFormat sources are not supported**: FileInputFormat sources are not supported, including `StreamExecutionEnvironment#readFile(...)` and `StreamExecutionEnvironment#createInput(FileInputFormat, ...)`. Users should use the new sources([FileSystem DataStream Connector]({{< ref "docs/connectors/datastream/filesystem.md" >}}) or [FileSystem SQL Connector]({{< ref "docs/connectors/table/filesystem.md" >}})) to read files when using the Adaptive Batch Scheduler.
-- **Inconsistent broadcast results metrics on WebUI**: When use Adaptive Batch Scheduler to automatically decide parallelisms for operators, for broadcast results, the number of bytes/records sent by the upstream task counted by metric is not equal to the number of bytes/records received by the downstream task, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.
+- **Inconsistent broadcast results metrics on WebUI**: When use Adaptive Batch Scheduler to automatically decide parallelisms for operators, for broadcast results, the number of bytes/records sent by the upstream task counted by metric is not equal to the number of bytes/records received by the downstream task, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Scheduler) for details.

{{< top >}}
2 changes: 1 addition & 1 deletion docs/content/docs/deployment/ha/zookeeper_ha.md
@@ -129,7 +129,7 @@ This behaviour might be too disruptive in some cases (e.g., unstable network env
If you are willing to take a more aggressive approach, then you can tolerate suspended ZooKeeper connections and only treat lost connections as an error via [high-availability.zookeeper.client.tolerate-suspended-connections]({{< ref "docs/deployment/config" >}}#high-availability-zookeeper-client-tolerate-suspended-connection).
Enabling this feature will make Flink more resilient against temporary connection problems but also increase the risk of running into ZooKeeper timing problems.

-For more information take a look at [Curator's error handling](https://curator.apache.org/errors.html).
+For more information take a look at [Curator's error handling](https://curator.apache.org/docs/errors).

## Bootstrap ZooKeeper
