Skip to content

Conversation

@YoelDruxman
Copy link

@YoelDruxman YoelDruxman commented Jan 14, 2026

Respect .editorconfig end_of_line and charset in generated code

Description

Generated mapper files (.g.cs) now respect .editorconfig settings for line endings and character encoding.

Problem

Generated code always used CRLF (\r\n) line endings and UTF-8 without BOM, regardless of the project's .editorconfig configuration. This caused issues for projects configured with different line endings.

Solution

Read end_of_line and charset settings from the source file's .editorconfig via AnalyzerConfigOptionsProvider and apply them to the generated output.

Supported settings:

  • end_of_line: lf, crlf, cr (defaults to crlf)
  • charset: utf-8, utf-8-bom, utf-16be, utf-16le (defaults to utf-8 without BOM)

Known Limitation

Due to a Roslyn limitation, editorconfig settings are read from the source file containing the mapper declaration, not from generated .g.cs files. This means [*.g.cs] patterns in .editorconfig will not be applied. Users should configure settings in [*.cs] or [*] sections.

Files changed:

  • LineEndingTextWriter.cs - Streaming line ending replacement
  • IncrementalValuesProviderExtensions.cs - Read editorconfig settings and apply encoding
  • MapperNode.cs - Store EndOfLine and Charset properties
  • MapperGenerator.cs - Pass settings from source file to output

Addresses this discussion

Checklist

  • The existing code style is followed
  • The commit message follows our guidelines
  • Performed a self-review of my code
  • Hard-to-understand areas of my code are commented
  • The documentation is updated (as applicable)
  • Unit tests are added/updated
  • Integration tests are added/updated (as applicable)

@YoelDruxman YoelDruxman force-pushed the feat/editorconfig-settings branch 6 times, most recently from 4199944 to 3194a3e Compare January 14, 2026 15:30
@latonz
Copy link
Contributor

latonz commented Jan 14, 2026

Thank you for this contribution. Before I start the review, I’d like to clarify a few open questions:

  • How do other source generators handle this? It seems like a challenge all source generators face.
  • What is the performance impact?
  • Would it make more sense to emit the correct line feeds directly in the syntax emitters instead of replacing them afterward? This would probably reduce the performance impact.

@YoelDruxman
Copy link
Author

YoelDruxman commented Jan 14, 2026

Thank you @latonz! I am still working on this. Let me consider your feedback and update the PR or at least report back.

@YoelDruxman YoelDruxman changed the title Respect .editorconfig end_of_line and charset in generated code Respect .editorconfig end_of_line and charset in generated code Jan 15, 2026
@YoelDruxman
Copy link
Author

@latonz I think this is ready for your review. Your thoughtful questions led to changing the implementation for the better:

  1. It seems that this is an unsolved problem.
  2. The performance has been improved to what I believe is almost transparent.
  3. This seems like more effort and difficulty than it is worth.

Here is a report I created with Claude that delves a little deeper.

Please let me know what you think. Any feedback is welcome! Thank you for this wonderful library!

YoelDruxman and others added 6 commits January 19, 2026 19:34
Generated mapper files (.g.cs) now honor .editorconfig settings:
- end_of_line: supports lf, crlf, cr (defaults to crlf)
- charset: supports utf-8, utf-8-bom, utf-16be, utf-16le (defaults to utf-8 without BOM)

This allows generated code to match the project's line ending and encoding
preferences, avoiding git diffs and post-processing requirements.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use the mapper source file path (.cs) instead of trying to create a
synthetic tree with the generated file path (.g.cs). This provides
reliable editorconfig support with a documented limitation.

Note: [*.g.cs] patterns are not supported due to a Roslyn limitation.
Users should configure settings in [*.cs] or [*] sections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of GetText().ToString().Replace() which creates 2 string
allocations for non-CRLF line endings, use SyntaxNode.WriteTo()
with a custom LineEndingTextWriter that replaces CRLF on-the-fly.

This reduces allocations from 2 to 1 for LF/CR configurations while
maintaining the fast path (no replacement) for CRLF (default).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clarifies that settings are read from the source file containing
the mapper declaration, not from generated files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@YoelDruxman YoelDruxman force-pushed the feat/editorconfig-settings branch from 72bed81 to 76d8d14 Compare January 20, 2026 00:34
@YoelDruxman YoelDruxman marked this pull request as ready for review January 20, 2026 00:34
// Respect .editorconfig charset setting
// Default to UTF-8 without BOM (most common for source files)
var encoding = _utf8NoBom;
if (!string.IsNullOrEmpty(mapper.Charset))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if even needed? If the charset is null/empty/unknown the default arm of the switch is used anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 75f1755. Removed the unnecessary if check - the switch default arm handles null/empty/unknown values.

);
}

private static string GetSourceText(Microsoft.CodeAnalysis.CSharp.Syntax.CompilationUnitSyntax body, string? endOfLine)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is there a reason for using a FQN here instead of a using?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 58af35e. Replaced FQN with using directive for CompilationUnitSyntax.

Comment on lines 103 to 106
var text = GetSourceText(mapper.Body, mapper.EndOfLine);

// Always use SourceText.From to ensure BOM is included when specified
spc.AddSource(mapper.FileName, SourceText.From(text, encoding));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a performance penalty even when the feature is not used. Could you measure the performance impact compared to main using Riok.Mapperly.Benchmarks, both with and without custom EOL or encoding settings?

Copy link
Author

@YoelDruxman YoelDruxman Jan 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code to have the fast-track as requested, when measuring performance, there did not seem to be any impact in any cases (although I am not 100% confident).

/// A TextWriter that replaces CRLF (\r\n) with a target line ending on-the-fly.
/// This avoids intermediate string allocations when converting line endings.
/// </summary>
internal sealed class LineEndingTextWriter : TextWriter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests to test this class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d8fe736. Added comprehensive unit tests for LineEndingTextWriter covering CRLF conversion, edge cases, and buffer boundary handling. Made the class public to allow direct testing, following the pattern of other tested helper classes.

preamble.Length.ShouldBe(expectBom ? 3 : 0, $"charset={charsetSetting ?? "(default)"} should {(expectBom ? "" : "not ")}have BOM");
}

private static string GenerateSource(string? endOfLine = null, string? charset = null) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the remaining code in this file should probably be integrated into or merged with TestHelper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 40b791d. Integrated editorconfig test infrastructure with TestHelper. Added SourceTextConfig parameter to TestHelperOptions and moved TestAnalyzerConfigOptionsProvider to a shared location.

@latonz latonz added the enhancement New feature or request label Jan 20, 2026
YoelDruxman and others added 8 commits January 20, 2026 14:25
Address PR review comment riok#2: Remove redundant comments that don't
add value to the self-explanatory switch expression.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#3: Remove redundant null/empty check
since the switch expression's default arm handles these cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#6: Add using directive for
Microsoft.CodeAnalysis.CSharp.Syntax and use the simple type name.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ions

Address PR review comment riok#9: Extract EndOfLine and Charset properties
from MapperNode into a separate SourceTextConfig record for better
organization of source text formatting configuration.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#10: Move editorconfig testing infrastructure
from EndOfLineTest into shared TestHelper utilities.

- Add EditorConfigEndOfLine and EditorConfigCharset to TestHelperOptions
- Create TestAnalyzerConfigOptionsProvider for shared editorconfig testing
- Add TestHelper.GenerateSourceText() for encoding/line ending tests
- Simplify EndOfLineTest to use shared infrastructure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#8: Add comprehensive unit tests for the
LineEndingTextWriter class to verify CRLF to LF/CR conversion behavior.

Made LineEndingTextWriter public to allow direct testing, following the
pattern of other tested helper classes in the project.

Tests cover:
- CRLF to LF conversion
- CRLF to CR conversion
- Text with no line endings
- Standalone CR and LF preservation
- Mixed line endings
- Empty and null strings
- Consecutive CRLF sequences
- Trailing CR at end of text
- Char array writes
- Large text handling
- CRLF split across writes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comments riok#4 and riok#5: Add RMG095 and RMG096 diagnostics
to warn users when unknown charset or end_of_line values are specified
in .editorconfig.

- RMG095: Unknown charset value (valid: utf-8, utf-8-bom, utf-16be, utf-16le)
- RMG096: Unknown end_of_line value (valid: lf, crlf, cr)

Validation is performed during mapper generation and diagnostics are
reported at the mapper declaration location.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#1: Add a reference link in the EditorConfig
support section pointing to the analyzer diagnostics page for
configuring diagnostic severities via .editorconfig.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Contributor

@latonz latonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates. I am deferring my full review until the performance benchmarks are done, as those metrics may necessitate adjustments to the current design (by using the correct newline when building the syntaxes).


### New Rules

Rule ID | Category | Severity | Notes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont use the unshipped file, we track added/removed diagnostics via git. Please move it to the shipped file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 902cfb5


// Validate editorconfig settings and add diagnostics for unknown values
var location = mapperDeclaration.Syntax.GetLocation();
var allDiagnostics = diagnostics.ToList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the ToList? Couldn't you report it directly via DiagnosticCollection?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 11a2002

}
}

private static readonly HashSet<string> _validCharsets = new(StringComparer.OrdinalIgnoreCase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codebase keeps the fields on the top of the class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 51ed304

Comment on lines 145 to 151
"utf-8",
"utf-8-bom",
"utf-16be",
"utf-16le",
};

private static readonly HashSet<string> _validEndOfLineValues = new(StringComparer.OrdinalIgnoreCase) { "lf", "crlf", "cr" };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have these constants at multiple locations... Wouldn't it make sense to map to the actual value here now? Move the switch here, the default arm reports the diagnostic, SourceTextConfig.EndOfLine is \n/\r\n/... and SourceTextConfig.Charset would get SourceTextConfig.Encoding.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 11a2002

);
}

// UTF-8 encoding without BOM (default for 'charset = utf-8' in .editorconfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codebase keeps fields at the top of a class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in e4c4746

Comment on lines +21 to +22
string? EditorConfigEndOfLine = null,
string? EditorConfigCharset = null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense using the SourceTextConfig here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test options use raw editorconfig strings ("lf", "utf-8", etc.) while SourceTextConfig holds actual parsed values ("\n", Encoding.UTF8, etc.).

The current approach feeds raw strings to TestAnalyzerConfigOptionsProvider which simulates the editorconfig file, then the generator parses them into SourceTextConfig. This tests the full editorconfig parsing path.

If we use SourceTextConfig directly, we'd either need to convert actual values back to raw strings (seems backwards), or bypass the editorconfig parsing layer entirely and inject SourceTextConfig directly into the generator (significant infrastructure change).

Could you clarify what you had in mind?

// Respect .editorconfig end_of_line setting
// The syntax tree uses CRLF by default (ElasticCarriageReturnLineFeed)
// For non-CRLF, use streaming replacement to avoid intermediate string allocation
var text = GetSourceText(mapper.Body, mapper.SourceTextConfig.EndOfLine);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should add a fast path using the GetText directly if no option or only the encoding is set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! Done in 11a2002 - fast path now uses body.GetText(config.Encoding) directly when no line ending conversion is needed (EndOfLine is null or "\r\n").

Move new editorconfig diagnostics from Unshipped.md to Shipped.md
as this project tracks added/removed diagnostics via git history.
YoelDruxman and others added 3 commits January 21, 2026 09:08
Follow codebase convention of keeping fields at the top of the class.
Follow codebase convention of keeping fields at the top of the class.
…lues

SourceTextConfig now holds actual values (Encoding, line ending strings)
instead of raw editorconfig strings. Parsing and validation moved to
MapperGenerator.BuildSourceTextConfig. Fast path uses GetText directly
when no line ending conversion is needed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@YoelDruxman
Copy link
Author

YoelDruxman commented Jan 22, 2026

@latonz Thank you for the review! I think I addressed all your comments. Please let me know if there is more to do. As I replied above, I was not able to detect any meaningful performance difference.

@YoelDruxman YoelDruxman force-pushed the feat/editorconfig-settings branch from ccbc82b to 11a2002 Compare January 22, 2026 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants