-
-
Notifications
You must be signed in to change notification settings - Fork 204
Respect .editorconfig end_of_line and charset in generated code
#2114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4199944 to
3194a3e
Compare
|
Thank you for this contribution. Before I start the review, I’d like to clarify a few open questions:
|
|
Thank you @latonz! I am still working on this. Let me consider your feedback and update the PR or at least report back. |
end_of_line and charset in generated code
|
@latonz I think this is ready for your review. Your thoughtful questions led to changing the implementation for the better:
Here is a report I created with Claude that delves a little deeper. Please let me know what you think. Any feedback is welcome! Thank you for this wonderful library! |
Generated mapper files (.g.cs) now honor .editorconfig settings: - end_of_line: supports lf, crlf, cr (defaults to crlf) - charset: supports utf-8, utf-8-bom, utf-16be, utf-16le (defaults to utf-8 without BOM) This allows generated code to match the project's line ending and encoding preferences, avoiding git diffs and post-processing requirements. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use the mapper source file path (.cs) instead of trying to create a synthetic tree with the generated file path (.g.cs). This provides reliable editorconfig support with a documented limitation. Note: [*.g.cs] patterns are not supported due to a Roslyn limitation. Users should configure settings in [*.cs] or [*] sections. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Instead of GetText().ToString().Replace() which creates 2 string allocations for non-CRLF line endings, use SyntaxNode.WriteTo() with a custom LineEndingTextWriter that replaces CRLF on-the-fly. This reduces allocations from 2 to 1 for LF/CR configurations while maintaining the fast path (no replacement) for CRLF (default). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Clarifies that settings are read from the source file containing the mapper declaration, not from generated files. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
72bed81 to
76d8d14
Compare
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
| // Respect .editorconfig charset setting | ||
| // Default to UTF-8 without BOM (most common for source files) | ||
| var encoding = _utf8NoBom; | ||
| if (!string.IsNullOrEmpty(mapper.Charset)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this if even needed? If the charset is null/empty/unknown the default arm of the switch is used anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 75f1755. Removed the unnecessary if check - the switch default arm handles null/empty/unknown values.
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
src/Riok.Mapperly/Helpers/IncrementalValuesProviderExtensions.cs
Outdated
Show resolved
Hide resolved
| ); | ||
| } | ||
|
|
||
| private static string GetSourceText(Microsoft.CodeAnalysis.CSharp.Syntax.CompilationUnitSyntax body, string? endOfLine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is there a reason for using a FQN here instead of a using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 58af35e. Replaced FQN with using directive for CompilationUnitSyntax.
| var text = GetSourceText(mapper.Body, mapper.EndOfLine); | ||
|
|
||
| // Always use SourceText.From to ensure BOM is included when specified | ||
| spc.AddSource(mapper.FileName, SourceText.From(text, encoding)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces a performance penalty even when the feature is not used. Could you measure the performance impact compared to main using Riok.Mapperly.Benchmarks, both with and without custom EOL or encoding settings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the code to have the fast-track as requested, when measuring performance, there did not seem to be any impact in any cases (although I am not 100% confident).
| /// A TextWriter that replaces CRLF (\r\n) with a target line ending on-the-fly. | ||
| /// This avoids intermediate string allocations when converting line endings. | ||
| /// </summary> | ||
| internal sealed class LineEndingTextWriter : TextWriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit tests to test this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d8fe736. Added comprehensive unit tests for LineEndingTextWriter covering CRLF conversion, edge cases, and buffer boundary handling. Made the class public to allow direct testing, following the pattern of other tested helper classes.
| preamble.Length.ShouldBe(expectBom ? 3 : 0, $"charset={charsetSetting ?? "(default)"} should {(expectBom ? "" : "not ")}have BOM"); | ||
| } | ||
|
|
||
| private static string GenerateSource(string? endOfLine = null, string? charset = null) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the remaining code in this file should probably be integrated into or merged with TestHelper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 40b791d. Integrated editorconfig test infrastructure with TestHelper. Added SourceTextConfig parameter to TestHelperOptions and moved TestAnalyzerConfigOptionsProvider to a shared location.
Address PR review comment riok#2: Remove redundant comments that don't add value to the self-explanatory switch expression. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#3: Remove redundant null/empty check since the switch expression's default arm handles these cases. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#6: Add using directive for Microsoft.CodeAnalysis.CSharp.Syntax and use the simple type name. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ions Address PR review comment riok#9: Extract EndOfLine and Charset properties from MapperNode into a separate SourceTextConfig record for better organization of source text formatting configuration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#10: Move editorconfig testing infrastructure from EndOfLineTest into shared TestHelper utilities. - Add EditorConfigEndOfLine and EditorConfigCharset to TestHelperOptions - Create TestAnalyzerConfigOptionsProvider for shared editorconfig testing - Add TestHelper.GenerateSourceText() for encoding/line ending tests - Simplify EndOfLineTest to use shared infrastructure Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#8: Add comprehensive unit tests for the LineEndingTextWriter class to verify CRLF to LF/CR conversion behavior. Made LineEndingTextWriter public to allow direct testing, following the pattern of other tested helper classes in the project. Tests cover: - CRLF to LF conversion - CRLF to CR conversion - Text with no line endings - Standalone CR and LF preservation - Mixed line endings - Empty and null strings - Consecutive CRLF sequences - Trailing CR at end of text - Char array writes - Large text handling - CRLF split across writes Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comments riok#4 and riok#5: Add RMG095 and RMG096 diagnostics to warn users when unknown charset or end_of_line values are specified in .editorconfig. - RMG095: Unknown charset value (valid: utf-8, utf-8-bom, utf-16be, utf-16le) - RMG096: Unknown end_of_line value (valid: lf, crlf, cr) Validation is performed during mapper generation and diagnostics are reported at the mapper declaration location. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Address PR review comment riok#1: Add a reference link in the EditorConfig support section pointing to the analyzer diagnostics page for configuring diagnostic severities via .editorconfig. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
latonz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates. I am deferring my full review until the performance benchmarks are done, as those metrics may necessitate adjustments to the current design (by using the correct newline when building the syntaxes).
|
|
||
| ### New Rules | ||
|
|
||
| Rule ID | Category | Severity | Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We dont use the unshipped file, we track added/removed diagnostics via git. Please move it to the shipped file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 902cfb5
src/Riok.Mapperly/MapperGenerator.cs
Outdated
|
|
||
| // Validate editorconfig settings and add diagnostics for unknown values | ||
| var location = mapperDeclaration.Syntax.GetLocation(); | ||
| var allDiagnostics = diagnostics.ToList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the ToList? Couldn't you report it directly via DiagnosticCollection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 11a2002
src/Riok.Mapperly/MapperGenerator.cs
Outdated
| } | ||
| } | ||
|
|
||
| private static readonly HashSet<string> _validCharsets = new(StringComparer.OrdinalIgnoreCase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This codebase keeps the fields on the top of the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 51ed304
src/Riok.Mapperly/MapperGenerator.cs
Outdated
| "utf-8", | ||
| "utf-8-bom", | ||
| "utf-16be", | ||
| "utf-16le", | ||
| }; | ||
|
|
||
| private static readonly HashSet<string> _validEndOfLineValues = new(StringComparer.OrdinalIgnoreCase) { "lf", "crlf", "cr" }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we have these constants at multiple locations... Wouldn't it make sense to map to the actual value here now? Move the switch here, the default arm reports the diagnostic, SourceTextConfig.EndOfLine is \n/\r\n/... and SourceTextConfig.Charset would get SourceTextConfig.Encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 11a2002
| ); | ||
| } | ||
|
|
||
| // UTF-8 encoding without BOM (default for 'charset = utf-8' in .editorconfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This codebase keeps fields at the top of a class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in e4c4746
| string? EditorConfigEndOfLine = null, | ||
| string? EditorConfigCharset = null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense using the SourceTextConfig here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test options use raw editorconfig strings ("lf", "utf-8", etc.) while SourceTextConfig holds actual parsed values ("\n", Encoding.UTF8, etc.).
The current approach feeds raw strings to TestAnalyzerConfigOptionsProvider which simulates the editorconfig file, then the generator parses them into SourceTextConfig. This tests the full editorconfig parsing path.
If we use SourceTextConfig directly, we'd either need to convert actual values back to raw strings (seems backwards), or bypass the editorconfig parsing layer entirely and inject SourceTextConfig directly into the generator (significant infrastructure change).
Could you clarify what you had in mind?
| // Respect .editorconfig end_of_line setting | ||
| // The syntax tree uses CRLF by default (ElasticCarriageReturnLineFeed) | ||
| // For non-CRLF, use streaming replacement to avoid intermediate string allocation | ||
| var text = GetSourceText(mapper.Body, mapper.SourceTextConfig.EndOfLine); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should add a fast path using the GetText directly if no option or only the encoding is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! Done in 11a2002 - fast path now uses body.GetText(config.Encoding) directly when no line ending conversion is needed (EndOfLine is null or "\r\n").
Move new editorconfig diagnostics from Unshipped.md to Shipped.md as this project tracks added/removed diagnostics via git history.
Follow codebase convention of keeping fields at the top of the class.
Follow codebase convention of keeping fields at the top of the class.
…lues SourceTextConfig now holds actual values (Encoding, line ending strings) instead of raw editorconfig strings. Parsing and validation moved to MapperGenerator.BuildSourceTextConfig. Fast path uses GetText directly when no line ending conversion is needed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
@latonz Thank you for the review! I think I addressed all your comments. Please let me know if there is more to do. As I replied above, I was not able to detect any meaningful performance difference. |
ccbc82b to
11a2002
Compare
Respect .editorconfig
end_of_lineandcharsetin generated codeDescription
Generated mapper files (
.g.cs) now respect.editorconfigsettings for line endings and character encoding.Problem
Generated code always used CRLF (
\r\n) line endings and UTF-8 without BOM, regardless of the project's.editorconfigconfiguration. This caused issues for projects configured with different line endings.Solution
Read
end_of_lineandcharsetsettings from the source file's.editorconfigviaAnalyzerConfigOptionsProviderand apply them to the generated output.Supported settings:
end_of_line:lf,crlf,cr(defaults tocrlf)charset:utf-8,utf-8-bom,utf-16be,utf-16le(defaults toutf-8without BOM)Known Limitation
Due to a Roslyn limitation, editorconfig settings are read from the source file containing the mapper declaration, not from generated
.g.csfiles. This means[*.g.cs]patterns in.editorconfigwill not be applied. Users should configure settings in[*.cs]or[*]sections.Files changed:
LineEndingTextWriter.cs- Streaming line ending replacementIncrementalValuesProviderExtensions.cs- Read editorconfig settings and apply encodingMapperNode.cs- StoreEndOfLineandCharsetpropertiesMapperGenerator.cs- Pass settings from source file to outputAddresses this discussion
Checklist