YARN-11960. Fix NodeManager OOM caused by large client.json#8473
YARN-11960. Fix NodeManager OOM caused by large client.json#8473magnuma3 wants to merge 2 commits into
Conversation
5c7caee to
9004952
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Pull request overview
This PR addresses a NodeManager OOM risk when a large Docker config.json (referenced via YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG) is read and parsed by adding a configurable maximum file size limit and covering the behavior with a unit test.
Changes:
- Add a new YARN configuration key to cap the Docker client config file size (default 1024 bytes).
- Enforce the size limit when reading Docker client credentials from
config.json, rejecting oversized files. - Add a NodeManager Docker runtime unit test that verifies oversized configs are rejected.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/.../TestDockerContainerRuntime.java |
Adds a test validating rejection of oversized Docker client config files. |
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/.../DockerLinuxContainerRuntime.java |
Adjusts error reporting when additional Docker client config credentials cannot be read. |
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml |
Documents the new max-size configuration property and its default. |
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/.../DockerClientConfigHandler.java |
Implements the file-size check before reading/parsing config.json. |
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/.../YarnConfiguration.java |
Defines the new configuration constants for the max Docker client config size. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| int configFileMaxSizeByte = conf.getInt(YarnConfiguration.NM_DOCKER_CLIENT_CONFIG_MAX_SIZE_BYTE, | ||
| YarnConfiguration.DEFAULT_NM_DOCKER_CLIENT_CONFIG_MAX_SIZE_BYTE); | ||
| if (configFileMaxSizeByte <= 0){ |
| FileStatus filestatus = fs.getFileStatus(configFile); | ||
| if(filestatus.getLen() > configFileMaxSizeByte){ | ||
| exceptionMessage = "configFile (" + filestatus.getLen() | ||
| + " bytes) is larger than max limitation (" | ||
| + configFileMaxSizeByte + " bytes): "; | ||
| } else { |
| FSDataInputStream fileHandle = null; | ||
| try { | ||
| fileHandle = fs.open(configFile); | ||
| if (fileHandle != null) { | ||
| contents = IOUtils.toString(fileHandle, Charset.forName("UTF-8")); | ||
| } | ||
| } catch (IOException e) { | ||
| throw new IOException("Failed to read Docker client configuration: " | ||
| + configFile + ": " + e); | ||
| } finally { | ||
| if (fileHandle != null) { | ||
| fileHandle.close(); | ||
| } | ||
| } |
| } catch (IOException e) { | ||
| throw new RuntimeException( | ||
| "Fail to read additional docker client config file from " + clientConfig); | ||
| "Fail to read additional docker client config file: " + e.getMessage()); |
| BufferedWriter bw = new BufferedWriter(new FileWriter(file)); | ||
| bw.write("x".repeat(1025)); | ||
| bw.close(); |
| initHttps(pHttps); | ||
| try { | ||
| testLaunchContainer(null, file); | ||
| fail(); | ||
| } catch (RuntimeException e) { | ||
| assertThat(e.getMessage()) | ||
| .contains("configFile (1025 bytes) is larger than max limitation (1024 bytes)"); | ||
| } |
| public static final String NM_DOCKER_CLIENT_CONFIG_MAX_SIZE_BYTE = | ||
| DOCKER_CONTAINER_RUNTIME_PREFIX + "client-config-file-max-size-bytes"; | ||
| public static final int DEFAULT_NM_DOCKER_CLIENT_CONFIG_MAX_SIZE_BYTE = 1024; | ||
|
|
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
74630e7 to
5d08782
Compare
|
💔 -1 overall
This message was automatically generated. |
5d08782 to
477e923
Compare
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
Description of PR
In Spark or MapReduce jobs, when accessing private Docker registry, Docker config.json path can be set by below environment variable:
NodeManager reads this file and writes to local filesystem as config.json. But current implementation reads whole file to memory and do JSON parsing, then writes it.
If large file is configured, NodeManager can occur OOM.
So file size limit should be added.
How was this patch tested?
unit test
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?AI Tooling
If an AI tool was used:
where is the name of the AI tool used.
https://www.apache.org/legal/generative-tooling.html