
feat(aws-cdk): surface custom resource Lambda logs on deployment failure #1645

Draft
iankhou wants to merge 7 commits into main from iankhou-custom-resource-logs

@iankhou iankhou commented Jun 18, 2026

When a custom resource (Custom::* or AWS::CloudFormation::CustomResource) fails to deploy, fetch the backing Lambda's CloudWatch logs and surface them as additional diagnostic context, mirroring the ECS service investigation.

  • Resolve the backing Lambda by reading the resource's ServiceToken from the stack's original template (literal ARN, Fn::GetAtt, or Ref via describeStackResources).
  • Derive the log group from the /aws/lambda/<fn> convention; only call getFunctionConfiguration to read a custom LoggingConfig.LogGroup if the convention group has no events. (cfn-response usage does not imply the default group; advanced logging controls can override it.)
  • Target the exact failing invocation by extracting the log stream name from the cfn-response status reason ("See the details in CloudWatch Log Stream: <name>").
  • Bound the query to a window around the failure event timestamp so update and rollback failures resolve the right invocation, not the latest stream. Adds an optional ResourceError.timestamp for this.
  • Add getFunctionConfiguration to the Lambda SDK client.
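
The three ServiceToken shapes listed above can be told apart with a small pure function before any AWS call is made. The following is a minimal sketch, not the PR's actual code: the type `TokenResolution` and the function `classifyServiceToken` are hypothetical names, and only the array form of Fn::GetAtt is handled.

```typescript
// Hypothetical helper for illustration; names are not taken from the PR.
type TokenResolution =
  | { kind: 'literal'; arn: string }          // ServiceToken is a plain ARN string
  | { kind: 'logicalId'; logicalId: string }  // needs a physical ID lookup in the stack
  | { kind: 'unresolved' };                   // any other intrinsic: give up quietly

function classifyServiceToken(token: unknown): TokenResolution {
  if (typeof token === 'string') {
    return { kind: 'literal', arn: token };
  }
  if (token !== null && typeof token === 'object') {
    const obj = token as Record<string, unknown>;
    const getAtt = obj['Fn::GetAtt'];
    // { "Fn::GetAtt": ["LogicalId", "Arn"] }
    if (Array.isArray(getAtt) && typeof getAtt[0] === 'string') {
      return { kind: 'logicalId', logicalId: getAtt[0] };
    }
    // { "Ref": "LogicalId" }
    if (typeof obj.Ref === 'string') {
      return { kind: 'logicalId', logicalId: obj.Ref };
    }
  }
  return { kind: 'unresolved' };
}
```

In this sketch a `logicalId` result would then be passed to describeStackResources to obtain the function's physical ID, while `unresolved` simply ends the investigation without an error.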

All exploratory calls are best-effort: failures are logged at debug level and never break diagnosis. Verified end-to-end against a live cfn-response custom resource deployment failure.
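
As an illustration of the log group convention described above, the default group can be derived from either a function ARN or a bare function name. This is a sketch under assumptions: `lambdaNameFromArnOrName` and `defaultLogGroupName` are hypothetical names, and the parsing relies only on the standard Lambda ARN layout (`arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]`).

```typescript
// Hypothetical helpers for illustration; the PR's own helper is not shown in full.
function lambdaNameFromArnOrName(arnOrName: string): string {
  const parts = arnOrName.split(':');
  // arn:<partition>:lambda:<region>:<account>:function:<name>[:<qualifier>]
  if (parts[0] === 'arn' && parts[5] === 'function' && parts[6]) {
    return parts[6];
  }
  // Anything else is treated as a bare function name.
  return arnOrName;
}

function defaultLogGroupName(arnOrName: string): string {
  return `/aws/lambda/${lambdaNameFromArnOrName(arnOrName)}`;
}
```

A function configured with advanced logging controls may write elsewhere, which is why the description treats this group only as the first candidate.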

Fixes #

Checklist

  • This change contains a major version upgrade for a dependency and I confirm all breaking changes are addressed
    • Release notes for the new version:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

…failure

When a custom resource (Custom::* or AWS::CloudFormation::CustomResource) fails
to deploy, fetch the backing Lambda's CloudWatch logs and surface them as
additional diagnostic context, mirroring the ECS service investigation.

- Resolve the backing Lambda by reading the resource's ServiceToken from the
  stack's original template (literal ARN, Fn::GetAtt, or Ref via
  describeStackResources).
- Derive the log group from the /aws/lambda/<fn> convention; only call
  getFunctionConfiguration to read a custom LoggingConfig.LogGroup if the
  convention group has no events. (cfn-response usage does not imply the default
  group; advanced logging controls can override it.)
- Target the exact failing invocation by extracting the log stream name from the
  cfn-response status reason ("See the details in CloudWatch Log Stream: <name>").
- Bound the query to a window around the failure event timestamp so update and
  rollback failures resolve the right invocation, not the latest stream. Adds an
  optional ResourceError.timestamp for this.
- Add getFunctionConfiguration to the Lambda SDK client.

All exploratory calls are best-effort: failures are logged at debug level and
never break diagnosis. Verified end-to-end against a live cfn-response custom
resource deployment failure.
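
The stream targeting and time bounding described in this commit message could look roughly like the sketch below. The names `extractLogStreamName` and `failureWindow` are hypothetical, and the window sizes (15 minutes before, 1 minute after the failure event) are assumptions chosen for illustration, not values taken from the PR.

```typescript
// Marker text written by the cfn-response module into the status reason.
const LOG_STREAM_MARKER = /See the details in CloudWatch Log Stream:\s*(\S+)/;

// Returns the log stream name embedded in a status reason, if any.
function extractLogStreamName(statusReason: string | undefined): string | undefined {
  const match = statusReason?.match(LOG_STREAM_MARKER);
  return match?.[1];
}

// Builds an epoch-millisecond window around the failure event timestamp.
// Default sizes are illustrative assumptions.
function failureWindow(
  failureTime: Date,
  beforeMs: number = 15 * 60_000,
  afterMs: number = 60_000,
): { startTime: number; endTime: number } {
  const t = failureTime.getTime();
  return { startTime: t - beforeMs, endTime: t + afterMs };
}
```

Both results map naturally onto the logStreamNames, startTime, and endTime parameters of a CloudWatch Logs FilterLogEvents request. When no stream name is found, the time window alone still narrows the query to the failing invocation.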
@github-actions github-actions Bot added the p2 label Jun 18, 2026
@iankhou iankhou changed the title feat(toolkit-lib): surface custom resource Lambda logs on deployment failure feat(aws-cdk): surface custom resource Lambda logs on deployment failure Jun 18, 2026
github-actions Bot commented Jun 18, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@aws-cdk-automation aws-cdk-automation requested a review from a team June 18, 2026 21:28
@iankhou iankhou requested a review from Copilot June 18, 2026 21:28
@iankhou iankhou requested a review from Copilot June 19, 2026 17:34

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

…ive function

A first-deploy custom resource failure rolls back and deletes the backing
Lambda, so getFunctionConfiguration is unavailable exactly when we need the
configured (advanced-logging) log group. Read LoggingConfig.LogGroup from the
stack template instead — the template survives rollback — handling both a
literal value and the common CDK shape where it is a Ref to an
AWS::Logs::LogGroup resource. Fall back to the live function configuration only
when the template value is an unresolvable intrinsic or the function is defined
outside this stack.
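
The template-first lookup described in this commit message can be sketched as a pure function over the parsed template. This is an illustration under assumptions: `CfnTemplate`, `LogGroupLookup`, and `logGroupFromTemplate` are hypothetical names, and the sketch only distinguishes the cases the message mentions (literal value, Ref to an AWS::Logs::LogGroup, anything else).

```typescript
// Hypothetical types and helper for illustration; not the PR's actual code.
type CfnTemplate = {
  Resources?: Record<string, { Type?: string; Properties?: Record<string, unknown> }>;
};

type LogGroupLookup =
  | { kind: 'literal'; logGroupName: string } // name is written in the template
  | { kind: 'ref'; logicalId: string }        // Ref to an AWS::Logs::LogGroup resource
  | { kind: 'default' }                       // no LoggingConfig.LogGroup set
  | { kind: 'unknown' };                      // unresolvable intrinsic or function not in this stack

function logGroupFromTemplate(template: CfnTemplate, functionLogicalId: string): LogGroupLookup {
  const fn = template.Resources?.[functionLogicalId];
  if (!fn || fn.Type !== 'AWS::Lambda::Function') {
    return { kind: 'unknown' };
  }
  const loggingConfig = fn.Properties?.LoggingConfig as Record<string, unknown> | undefined;
  const logGroup = loggingConfig?.LogGroup;
  if (logGroup === undefined) {
    return { kind: 'default' };
  }
  if (typeof logGroup === 'string') {
    return { kind: 'literal', logGroupName: logGroup };
  }
  if (logGroup !== null && typeof logGroup === 'object') {
    const ref = (logGroup as Record<string, unknown>).Ref;
    if (typeof ref === 'string' && template.Resources?.[ref]?.Type === 'AWS::Logs::LogGroup') {
      return { kind: 'ref', logicalId: ref };
    }
  }
  return { kind: 'unknown' };
}
```

Only the `unknown` case would need the live function configuration, which matches the fallback order the commit message describes.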

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +541 to +543
const resp = await cfn.describeStackResources({ StackName: stackName, LogicalResourceId: referencedLogicalId });
const physicalId = resp.StackResources?.[0]?.PhysicalResourceId;
return physicalId ? functionNameFromArnOrName(physicalId) : undefined;
Comment on lines +492 to +494
try {
const resp = await cfn.getTemplate({ StackName: stackName });
if (!resp.TemplateBody) {
iankhou added 2 commits June 19, 2026 15:00
…g group names

The template-sourced log group only handled a literal LogGroupName. The common
CDK case is an AWS::Logs::LogGroup with no explicit name, where CloudFormation
generates the physical name — absent from the template. Resolve that name via
describeStackResources (which still returns RETAINed/orphaned resources after a
rollback), so advanced-logging functions whose backing Lambda was deleted by
rollback still surface their logs. Extract a shared resolvePhysicalId helper used
by both the ServiceToken and log-group resolution paths.

Verified end-to-end against a live advanced-logging custom resource deployment.

Parse Lambda CloudWatch log events into readable lines before rendering, on the
custom-resource path only (ECS logs are arbitrary container output and are left
as-is). Handles both Lambda log formats:

- Text: strip the per-line timestamp/requestId prefix, render aligned
  'LEVEL message'.
- JSON: render the level + message (and error envelopes' errorMessage +
  stackTrace) instead of dumping raw JSON objects.

Drops only Lambda platform boilerplate (INIT_START/START/END/REPORT and JSON
platform.* events). Application output is never dropped by level — failure
detail often rides in INFO lines (e.g. the cfn-response 'Response body'). Lines
we don't recognize pass through verbatim, and full logs remain available via the
console link.
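
A formatter along the lines described in this commit message might be sketched as follows. The names `formatLambdaLogLine` and `renderMessage` are hypothetical. The text pattern assumes the Node.js runtime layout (`<ISO timestamp>\t<requestId>\t<LEVEL>\t<message>`), and the JSON branch assumes `level` and `message` fields, with an error envelope carried as an object in `message`; other runtimes may lay these out differently.

```typescript
// Lambda platform boilerplate in text format.
const PLATFORM_LINE = /^(INIT_START |START RequestId: |END RequestId: |REPORT RequestId: )/;
// Assumed Node.js text layout: "<ISO timestamp>\t<requestId>\t<LEVEL>\t<message>".
const NODE_TEXT_LINE = /^\d{4}-\d{2}-\d{2}T[\d:.]+Z\t[0-9a-f-]+\t([A-Z]+)\t([\s\S]*)$/;

// Renders a JSON "message" field: plain strings as-is, error envelopes as
// errorMessage followed by the stack trace, anything else as compact JSON.
function renderMessage(message: unknown): string {
  if (typeof message === 'string') return message;
  if (message !== null && typeof message === 'object') {
    const err = message as { errorMessage?: unknown; stackTrace?: unknown };
    if (typeof err.errorMessage === 'string') {
      const stack = Array.isArray(err.stackTrace) ? err.stackTrace.map(String) : [];
      return [err.errorMessage, ...stack].join('\n');
    }
  }
  return JSON.stringify(message);
}

// Returns a readable line, or undefined for platform boilerplate.
function formatLambdaLogLine(raw: string): string | undefined {
  const line = raw.trimEnd();
  if (PLATFORM_LINE.test(line)) return undefined;

  if (line.startsWith('{')) {
    try {
      const event = JSON.parse(line) as Record<string, unknown>;
      if (typeof event.type === 'string' && event.type.startsWith('platform.')) return undefined;
      if (typeof event.level === 'string' && event.message !== undefined) {
        return `${event.level.padEnd(5)} ${renderMessage(event.message)}`;
      }
    } catch {
      // Not valid JSON: fall through to the text handling below.
    }
  }

  const text = NODE_TEXT_LINE.exec(line);
  if (text) {
    const [, level = '', message = ''] = text;
    return `${level.padEnd(5)} ${message}`;
  }
  // Unrecognized lines pass through verbatim.
  return line;
}
```

Note that nothing here filters by level: an INFO line survives exactly like an ERROR line, and only platform records are dropped.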

3 participants