
@azure-tools/typespec-ts emitter generates incorrect base64 decoding for bytes responses with non-binary content types (e.g. application/xml) #3761

@chahibi

Description


Copilot-generated bug report


  • Package Name: @azure/planetarycomputer
  • Package Version: 1.0.0-beta.1
  • Operating system: Windows 11
  • nodejs
    • version: 22.x
  • typescript
    • version: 5.x

Describe the bug

When a TypeSpec operation declares a response body as bytes with a non-binary content type like application/xml, the @azure-tools/typespec-ts v0.49.0 modular emitter generates a deserializer that incorrectly treats the response body as base64-encoded data. The generated code calls stringToUint8Array(result.body, "base64") on what is actually plain UTF-8 text (XML), producing garbled binary output.

This affects the Planetary Computer SDK's WMTS capabilities endpoints, whose TypeSpec definition is:

// From models.tiler.common.tsp in azure-rest-api-specs
model WmtsCapabilitiesXmlResponse {
  @statusCode statusCode: 200;
  @body body: bytes;
  @header contentType: "application/xml";
}

These endpoints return XML text (a WMTS capabilities document), not base64-encoded binary data.

To Reproduce

  1. Generate the @azure/planetarycomputer SDK from the TypeSpec spec at specification/orbital/Microsoft.PlanetaryComputer (commit e19c31f75d537aa3fb4bd926e02d4968ee83910b)
  2. Call getMosaicsWmtsCapabilities(...) or getWmtsCapabilities(...) — any operation returning WmtsCapabilitiesXmlResponse
  3. Observe that the returned Uint8Array contains garbled data instead of valid XML

Expected behavior

The returned Uint8Array should contain the UTF-8 bytes of the XML capabilities document. Decoding it with new TextDecoder().decode(result) should produce a valid XML string starting with <?xml version="1.0" or <Capabilities.
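A returned payload can be checked against this expectation by decoding it and inspecting how it starts. The helper below is a hypothetical sketch (looksLikeWmtsCapabilities is not part of @azure/planetarycomputer). It only looks at the opening of the document and does not validate the XML.

```typescript
// Hypothetical helper (not part of @azure/planetarycomputer): checks whether a
// bytes payload decodes to text that starts like a WMTS capabilities document.
// It inspects only the opening of the document; it does not validate the XML.
function looksLikeWmtsCapabilities(payload: Uint8Array): boolean {
  let text: string;
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of substituting U+FFFD.
    // A leading UTF-8 byte-order mark is dropped by default (ignoreBOM is false).
    text = new TextDecoder("utf-8", { fatal: true }).decode(payload);
  } catch {
    return false; // not valid UTF-8, so not the expected XML text
  }
  const start = text.trimStart();
  return start.startsWith('<?xml version="1.0"') || start.startsWith("<Capabilities");
}
```

With the current generated code, passing the operation's return value to this check would be expected to fail, because the base64 path does not yield the UTF-8 bytes of the XML text.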

Screenshots

N/A

Additional context

Generated code (incorrect)

The emitter generates:

// In src/api/data/operations.ts
export async function _getMosaicsWmtsCapabilitiesDeserialize(
  result: PathUncheckedResponse,
): Promise<Uint8Array> {
  const expectedStatuses = ["200"];
  if (!expectedStatuses.includes(result.status)) {
    throw createRestError(result);
  }

  return typeof result.body === "string"
    ? stringToUint8Array(result.body, "base64")  // ❌ WRONG: treats XML text as base64
    : result.body;
}

Expected generated code

return typeof result.body === "string"
  ? new TextEncoder().encode(result.body)  // ✅ CORRECT: encodes XML text as UTF-8 bytes
  : result.body;

Root cause analysis

The bug is in the deserializeResponseValue function in packages/typespec-ts/src/modular/helpers/operationHelpers.ts:

case "bytes":
  if (format !== "binary" && format !== "bytes") {
    return `${nullOrUndefinedPrefix}typeof ${restValue} === 'string'
      ? ${stringToUint8ArrayReference}(${restValue}, "${format ?? "base64"}")
      : ${restValue}`;
  }
  return restValue;

When the format is neither "binary" nor "bytes", the code defaults to base64 decoding (format ?? "base64"). The format is determined upstream by isBinaryPayload(), which returns true only for content types classified as KnownMediaType.Binary (audio, image, video, octet-stream). Since application/xml is classified as KnownMediaType.Xml rather than Binary, the emitter passes format = undefined to deserializeResponseValue, which then falls back to base64.

The logic chain is:

  1. getDeserializePrivateFunction (line ~362) checks isBinaryPayload(context, response.type!.__raw!, contentTypes) to determine the format
  2. isBinaryPayload (in operationUtil.ts) only returns true for KnownMediaType.Binary content types
  3. application/xml → KnownMediaType.Xml → not binary → format is undefined
  4. deserializeResponseValue gets format = undefined → falls into the bytes case → uses format ?? "base64" → generates stringToUint8Array(body, "base64")
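The chain above can be modeled in a few lines to check why application/xml lands on the base64 path. This is a simplified, hypothetical model: classifyMediaType, pickFormat, and bytesExpression are illustrative names, and the media-type rules only approximate the KnownMediaType classification described in this report. They are not the emitter's code.

```typescript
// Simplified, hypothetical model of the chain; not the emitter's real code.
type MediaKind = "Binary" | "Xml" | "Json" | "Text";

// Steps 2-3: classify the declared content type (parameters like charset are ignored).
function classifyMediaType(contentType: string): MediaKind {
  const t = contentType.split(";")[0].trim().toLowerCase();
  if (/^(audio|image|video)\//.test(t) || t === "application/octet-stream") return "Binary";
  if (t.endsWith("/xml") || t.endsWith("+xml")) return "Xml";
  if (t.endsWith("/json") || t.endsWith("+json")) return "Json";
  return "Text";
}

// Steps 1-3: only Binary media types yield a "binary" format; everything else yields none.
function pickFormat(contentTypes: string[]): string | undefined {
  return contentTypes.some((ct) => classifyMediaType(ct) === "Binary") ? "binary" : undefined;
}

// Step 4: the quoted `case "bytes"` branch (null prefix omitted, helper name inlined).
function bytesExpression(restValue: string, format: string | undefined): string {
  if (format !== "binary" && format !== "bytes") {
    return `typeof ${restValue} === 'string' ? stringToUint8Array(${restValue}, "${format ?? "base64"}") : ${restValue}`;
  }
  return restValue;
}

console.log(bytesExpression("result.body", pickFormat(["application/xml"])));
// typeof result.body === 'string' ? stringToUint8Array(result.body, "base64") : result.body
```

In the model, undefined is the only value application/xml can produce, so the `?? "base64"` fallback is always taken for XML and text content types.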

However, when the response is received over HTTP with Content-Type: application/xml, the body is plain text (not base64-encoded). The correct behavior would be to encode the string as UTF-8 bytes using new TextEncoder().encode().

Scope of impact

This affects any TypeSpec API that declares @body body: bytes with a non-binary content type such as application/xml, text/plain, text/html, etc. The generated deserializer will corrupt the response data by attempting to base64-decode plain text.

Binary content types (like image/png, application/octet-stream) are handled correctly because they go through the getBinaryResponse streaming path, bypassing this deserializer entirely.
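To see concretely how base64-decoding plain text corrupts it, the sketch below runs a short XML string through both paths. The lenient decoder is a hypothetical stand-in and an assumption: it discards characters outside the base64 alphabet, roughly what a tolerant decoder does. A strict decoder such as atob would instead throw on the "<" character. Neither outcome gives back the XML.

```typescript
// Standard base64 alphabet; "=" padding is not kept by this sketch.
const BASE64_CHAR = /[A-Za-z0-9+\/]/;

// Hypothetical lenient decoder, standing in for a runtime that skips invalid
// input: keep only base64-alphabet characters, truncate to whole 4-character
// groups, then decode with atob.
function lenientBase64Decode(text: string): Uint8Array {
  const kept = [...text].filter((c) => BASE64_CHAR.test(c)).join("");
  const usable = kept.slice(0, kept.length - (kept.length % 4));
  return Uint8Array.from(atob(usable), (ch) => ch.charCodeAt(0));
}

const xml = '<?xml version="1.0"?><Capabilities/>';

const correct = new TextEncoder().encode(xml); // what the deserializer should return
const garbled = lenientBase64Decode(xml);      // what the base64 path produces

console.log(correct.length, garbled.length); // 36 18
console.log(new TextDecoder().decode(correct) === xml); // true
```

Only the letters, digits, and "/" of the markup survive the filter, and each 4 of them are packed into 3 unrelated bytes, which is why the output is both shorter and unreadable.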

Current workaround

Manual post-generation patch replacing stringToUint8Array(result.body, "base64") with new TextEncoder().encode(result.body) in the generated operations file. This patch is fragile and will be overwritten on the next TypeSpec regeneration.

Both getMosaicsWmtsCapabilities and getWmtsCapabilities deserializers require this patch.
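Until the emitter is fixed, the manual patch can at least be scripted so it is reapplied after each regeneration. This is a minimal sketch, assuming the generated text matches the snippet quoted above exactly. patchDeserializer is a hypothetical helper, and reading and writing src/api/data/operations.ts is left to the caller.

```typescript
const BASE64_CALL = 'stringToUint8Array(result.body, "base64")';
const UTF8_CALL = "new TextEncoder().encode(result.body)";

// Hypothetical helper: swaps the base64 call for a UTF-8 encode, but only inside
// the named deserializer, so other bytes deserializers are left untouched.
function patchDeserializer(source: string, fnName: string): string {
  const start = source.indexOf(`function ${fnName}(`);
  if (start === -1) {
    throw new Error(`Deserializer ${fnName} not found; the generated code may have changed.`);
  }
  // The patch target must sit before the next function declaration.
  const nextFn = source.indexOf("function ", start + "function ".length);
  const end = nextFn === -1 ? source.length : nextFn;
  const callIdx = source.indexOf(BASE64_CALL, start);
  if (callIdx === -1 || callIdx >= end) {
    return source; // already patched, or the emitter output changed shape
  }
  return source.slice(0, callIdx) + UTF8_CALL + source.slice(callIdx + BASE64_CALL.length);
}
```

For example, patchDeserializer(source, "_getMosaicsWmtsCapabilitiesDeserialize") patches the function quoted above. The name of the getWmtsCapabilities deserializer is assumed to follow the same pattern (_getWmtsCapabilitiesDeserialize). Because an already-patched function is returned unchanged, the script is safe to run more than once, and the thrown error surfaces any regeneration that renames the function.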

Prior fix attempt in the API spec

A previous attempt was made to fix this at the spec level (in azure-rest-api-specs), but it was reverted on Jan 26, 2025 with the message "Revert 'Fix WMTS Capabilities Response Type for JavaScript SDK'". This suggests the fix should be in the emitter, not the spec.

Comparison with http-client-js emitter

The newer http-client-js emitter (in the typespec repo) handles this case correctly by setting encoding to "none" for non-JSON content types:

// From typespec repo: packages/http-client-js/src/common/serialization/encode.ts
if (isNonJsonTextualFormat(contentType)) {
  return "none"; // No encoding needed for text content types
}

This suggests the fix for @azure-tools/typespec-ts would be similar: when the content type is a text-based format (XML, plain text, etc.) and the body type is bytes, the deserializer should use TextEncoder instead of base64 decoding.

Suggested fix

In deserializeResponseValue, the bytes case should account for the response content type. When the content type is a text-based format (XML, text, etc.), the body string should be encoded as UTF-8 bytes rather than base64-decoded. For example:

case "bytes":
  if (format === "binary" || format === "bytes") {
    return restValue;
  }
  if (format === "text" || format === "xml") {
    // Text-based content: encode the string as UTF-8 bytes
    return `${nullOrUndefinedPrefix}typeof ${restValue} === 'string'
      ? new TextEncoder().encode(${restValue})
      : ${restValue}`;
  }
  // Default: base64 decode (for JSON-embedded bytes, etc.)
  return `${nullOrUndefinedPrefix}typeof ${restValue} === 'string'
    ? ${stringToUint8ArrayReference}(${restValue}, "${format ?? "base64"}")
    : ${restValue}`;

Alternatively, the isBinaryPayload check upstream could be extended to also recognize XML/text content types as requiring special handling when paired with bytes body type.
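Combining the two ideas, the upstream check could resolve a format that the suggested case "bytes" branch understands, instead of returning a binary yes/no answer. This is a hypothetical sketch: resolveBytesFormat and bytesExpression are illustrative names, and "xml" and "text" are assumed to be format values the emitter would need to introduce, not existing emitter options.

```typescript
// Hypothetical format values; "xml" and "text" are not existing emitter formats.
type BytesFormat = "binary" | "xml" | "text" | undefined;

// Strip parameters such as "; charset=utf-8" and normalize case.
function mediaType(contentType: string): string {
  return contentType.split(";")[0].trim().toLowerCase();
}

// Upstream step: resolve a format from the declared content types.
function resolveBytesFormat(contentTypes: string[]): BytesFormat {
  const types = contentTypes.map(mediaType);
  if (types.some((t) => /^(audio|image|video)\//.test(t) || t === "application/octet-stream")) {
    return "binary";
  }
  if (types.some((t) => t.endsWith("/xml") || t.endsWith("+xml"))) return "xml";
  if (types.some((t) => t.startsWith("text/"))) return "text";
  return undefined; // JSON and anything else keep the base64 default
}

// Downstream step: the suggested `case "bytes"` branch (null prefix omitted,
// helper reference inlined as stringToUint8Array).
function bytesExpression(restValue: string, format: string | undefined): string {
  if (format === "binary" || format === "bytes") return restValue;
  if (format === "text" || format === "xml") {
    return `typeof ${restValue} === 'string' ? new TextEncoder().encode(${restValue}) : ${restValue}`;
  }
  return `typeof ${restValue} === 'string' ? stringToUint8Array(${restValue}, "${format ?? "base64"}") : ${restValue}`;
}

console.log(bytesExpression("result.body", resolveBytesFormat(["application/xml"])));
// typeof result.body === 'string' ? new TextEncoder().encode(result.body) : result.body
```

Leaving JSON on the undefined path keeps the existing behavior for bytes embedded in JSON, which are conventionally base64 strings.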

Labels

  • needs-author-feedback (Workflow: More information is needed from author to address the issue.)
  • no-recent-activity (There has been no recent activity on this issue.)
