-
Notifications
You must be signed in to change notification settings - Fork 1
init read pdf #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
init read pdf #14
Conversation
WalkthroughAdds a new Frends.PDF.Read .NET 8 task that reads PDFs: extracts metadata, per‑page text and optional images; includes DTOs, error handling, NUnit tests, solution/project files, CI workflows, documentation, changelog, migration manifest, and Apache‑2.0 licensing. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Caller
participant Task as Frends.PDF.Read.PDF.Read
participant PdfLib as PdfPig
participant Builder as DTO Builder
participant Serializer as JSON Serializer
participant Handler as ErrorHandler
Client->>Task: Read(Input, Options, CancellationToken)
Task->>PdfLib: Open document / get metadata & pages
PdfLib-->>Task: Document, metadata, pages
Task->>Builder: Extract text for each page
alt Options.IncludeImages == true
Task->>PdfLib: Extract images (PNG preferred, RAW fallback)
PdfLib-->>Task: Image bytes, format, dimensions
Task->>Builder: Attach images to page DTOs
end
Task->>Builder: Build PdfReadDocument (Metadata + Pages)
Builder-->>Task: DTO
Task->>Serializer: Serialize DTO (indented JSON)
Serializer-->>Task: JSON
Task-->>Client: Result(success=true, json=...)
opt exception
Task->>Handler: Handle(exception, ThrowErrorOnFailure, ErrorMessageOnFailure)
Handler-->>Task: throws OR returns Result(false, null, Error)
Task-->>Client: Result(false, null, Error) OR exception propagated
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In `@Frends.PDF.Read/Frends.PDF.Read.Tests/Frends.PDF.Read.Tests.csproj`:
- Around line 12-14: Update the PackageReference for StyleCop.Analyzers (the
element with Include="StyleCop.Analyzers") so it has the same asset restrictions
as the main project: add PrivateAssets="all" and an explicit IncludeAssets
attribute (e.g. include runtime; build; native; contentfiles; analyzers) to
prevent the analyzer package from flowing to runtime dependencies; modify the
PackageReference element in Frends.PDF.Read.Tests.csproj that references
StyleCop.Analyzers to include these attributes.
In `@Frends.PDF.Read/Frends.PDF.Read.Tests/GlobalSuppressions.cs`:
- Line 3: The assembly suppression attribute in GlobalSuppressions.cs uses an
incorrect double-colon in the rule identifier; update the assembly attribute
[assembly: SuppressMessage("StyleCop.CSharp.SpecialRules",
"SA0001::XmlCommentAnalysisDisabled", Justification = "...")] to use a single
colon so the identifier reads "SA0001:XmlCommentAnalysisDisabled" to match the
other suppressions and ensure the rule is properly recognized.
In `@Frends.PDF.Read/Frends.PDF.Read.Tests/UnitTests.cs`:
- Around line 68-85: The test ReadPdf_IncludeImages is missing a success check
before deserializing the JSON result from PDF.Read; add an assertion like
Assert.That(result.Success, Is.True) immediately after obtaining result (before
calling JsonConvert.DeserializeObject and using PdfReadDocument/ result.Json) to
ensure the operation succeeded prior to further assertions.
- Around line 49-66: Add an assertion that the PDF.Read call succeeded before
deserializing: in the ReadPdf_SinglePagePdf_ReturnsOnePageWithMetadata test,
after calling PDF.Read and storing result, assert result.Success is true (and
optionally assert result.Json is not null/empty) before calling
JsonConvert.DeserializeObject on result.Json so deserialization and subsequent
assertions won’t produce misleading null-reference failures.
In `@Frends.PDF.Read/Frends.PDF.Read/Definitions/Result.cs`:
- Line 14: The XML doc for the parameter "json" is incorrect; update the XML
<param name="json"> on the Result class/constructor in Result.cs to accurately
describe that it contains the JSON content produced by the PDF extraction (e.g.,
"The JSON content returned from the PDF extraction"), replacing the current "The
path to the signed file" copy-paste text.
- Line 1: Remove the unused using directive `using static
System.Runtime.InteropServices.JavaScript.JSType;` from the Result.cs file;
locate the top of the file where `using` statements are declared (in Result.cs)
and delete that line so only necessary namespaces remain, then rebuild to
confirm no references to JSType exist.
In `@Frends.PDF.Read/Frends.PDF.Read/DTOs/PdfReadImage.cs`:
- Around line 22-26: The XML summary for the Width property in class
PdfReadImage has an extra leading space; open the PdfReadImage class and edit
the XML comment above the Width property (the "Width" auto-property) to remove
the leading space so the <summary> text begins immediately with "Width of image
in points." ensuring consistent XML doc formatting.
In `@Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.cs`:
- Line 23: Replace the placeholder XML doc for the parameter named "options"
with a clear description of what the parameter represents and how it affects
behavior: state that it is the options/configuration object passed into the
Frends.PDF.Read method (or constructor) that controls things like input source,
parsing settings, and any optional flags/defaults; mention expected properties
or types and whether null/omitted is allowed and what the default behavior is.
Update the <param name="options"> tag text accordingly so the documentation
explains the purpose and expected contents of the options parameter.
In `@Frends.PDF.Read/Frends.PDF.Read/LICENSE`:
- Around line 1-255: The package currently declares MIT but depends on PdfPig
v0.1.13 (Apache-2.0) and the Frends.PDF.Read/LICENSE file (containing Apache
text and third‑party notices) is not being packed; update the .csproj (project
file) to either set PackageLicenseExpression to "Apache-2.0" (or add clear
license justification if you intend to remain MIT) and explicitly include the
LICENSE (and any NOTICE) in the package by adding a Pack Include for
Frends.PDF.Read/LICENSE (e.g., via a None/Content Include with Pack="true" and
appropriate PackagePath) so the third‑party attribution required by Apache 2.0
§4(d) is distributed.
In `@Frends.PDF.Read/LICENSE`:
- Around line 1-21: The nested Frends.PDF.Read/Frends.PDF.Read/LICENSE contains
Apache-2.0 terms for an external PdfPig dependency but the package declares MIT;
rename or move that nested file (e.g., to
Frends.PDF.Read/Frends.PDF.Read/DEPENDENCIES.md or LICENSE.EXTERNAL) and update
its header to state it documents third-party licenses (Apache-2.0 for PdfPig),
then consolidate or clarify the top-level Frends.PDF.Read/LICENSE to state the
project is MIT while pointing to the new DEPENDENCIES.md for external
components; also update any packaging/metadata in the .csproj that references
the old nested LICENSE so the package license remains MIT and the external
license file is included as documentation rather than the package license.
In `@Frends.PDF.Read/README.md`:
- Line 5: The README badge links to a workflow file named
Read_build_and_test_on_main.yml which may not match the actual GitHub Actions
workflow name; verify the real workflow filename in .github/workflows and update
the badge URL target (the href and the image path) to match the actual workflow
filename (e.g., Replace Read_build_and_test_on_main.yml with the correct
workflow filename) so the badge points to the existing workflow file.
🧹 Nitpick comments (7)
Frends.PDF.Read/Frends.PDF.Read/migration.json (1)
6-8: Add a brief migration description.
An empty description makes the manifest harder to interpret during audits and tooling output.♻️ Suggested update
- "Description": "", + "Description": "Initial migration for Frends.PDF.Read task.",Frends.PDF.Read/README.md (1)
13-33: Add explicit developer setup prerequisites (SDK/tooling).The guidelines require developer setup instructions. Please add a short “Prerequisites” section (e.g., .NET 8 SDK, any required workloads/tools).
As per coding guidelines.
Frends.PDF.Read/CHANGELOG.md (1)
5-7: Consider more descriptive changelog entries.Per the coding guidelines, the changelog should include all functional changes. "Initial implementation" is vague and doesn't help users understand what features are available in version 1.0.0.
Consider expanding to describe the actual functionality:
### Added -- Initial implementation +- Initial implementation of PDF.Read task +- Extract document metadata (Title, Author, PageCount) +- Extract page text content +- Optional image extraction with format detection (PNG/RAW)Based on coding guidelines for
CHANGELOG.mdfiles.Frends.PDF.Read/Frends.PDF.Read/Definitions/Error.cs (1)
16-20: Fix misleading XML example for AdditionalInfo.The example shows
object { Exception AdditionalInfo }which doesn't represent an actual Exception value. Consider a more accurate example.Suggested fix
/// <summary> /// Additional information about the error. /// </summary> - /// <example>object { Exception AdditionalInfo }</example> + /// <example>System.IO.FileNotFoundException</example> public Exception AdditionalInfo { get; set; }Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.csproj (1)
26-26: Consider pinning analyzer version for reproducible builds.Using
Version="1.*"forFrendsTaskAnalyzersmay cause non-deterministic builds. Consider pinning to a specific version for consistency across environments.Frends.PDF.Read/Frends.PDF.Read.Tests/ErrorHandlerTest.cs (2)
13-19: Consider removing redundant assertion.
Assert.Throws<Exception>will fail the test if no exception is thrown or return the caught exception. The subsequentAssert.That(ex, Is.Not.Null)is redundant sinceAssert.Throwsguarantees a non-null return when it succeeds.Proposed simplification
[Test] public void Should_Throw_Error_When_ThrowErrorOnFailure_Is_True() { - var ex = Assert.Throws<Exception>(() => + Assert.Throws<Exception>(() => PDF.Read(DefaultInput(), DefaultOptions(), CancellationToken.None)); - Assert.That(ex, Is.Not.Null); }
30-39: Consider removing redundant null check.Similar to the first test,
Assert.That(ex, Is.Not.Null)on line 37 is redundant. The important assertion on line 38 validating the custom error message is the valuable one.Proposed simplification
[Test] public void Should_Use_Custom_ErrorMessageOnFailure() { var options = DefaultOptions(); options.ErrorMessageOnFailure = CustomErrorMessage; var ex = Assert.Throws<Exception>(() => PDF.Read(DefaultInput(), options, CancellationToken.None)); - Assert.That(ex, Is.Not.Null); Assert.That(ex.Message, Contains.Substring(CustomErrorMessage)); }
|
add new task info to main README in repo root |
| @@ -0,0 +1,255 @@ | |||
| Apache License | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to include this into package (same as we include e.g. migration.json) You can also specify why this License is included (which package requires it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@Frends.PDF.Read/Frends.PDF.Read.Tests/GlobalSuppressions.cs`:
- Around line 1-8: Add a coverage threshold and/or align the test framework:
either configure coverlet (add coverlet.msbuild package and set a minimum
coverage threshold of 80% via a project/Directory.Build.props entry like
Threshold=80) so builds fail when coverage for tests (ErrorHandlerTest.cs,
UnitTests.cs) drops below 80%, or update the repository guidance to accept NUnit
(tests use [TestFixture]/[Test]) and ensure CI/test runners use NUnit rather
than MSTest; pick one approach and make the corresponding change (add coverlet
threshold configuration or update docs/CI to reflect NUnit) and ensure test
project references and CI tasks are updated accordingly.
In `@Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.cs`:
- Around line 26-33: Add a null-guard for options (e.g. if (options == null)
throw new ArgumentNullException(nameof(options));) before the try in the Read
method so the catch/ErrorHandler path cannot NRE on options, but keep any input
validation inside the try block so ErrorHandler still applies; apply the same
pattern to the other method(s) that perform error handling (the other occurrence
around lines 90-93) to prevent dereferencing options/input in the catch.
In `@README.md`:
- Around line 7-9: The README contains a case-mismatch in the links: update the
link label/path that currently points to "Frends.Pdf.Read/README.md" to match
the actual folder casing "Frends.PDF.Read/README.md" so the link resolves on
case-sensitive filesystems; ensure the other entries use exact folder names
(e.g., "Frends.PDF.Create", "Frends.PDF.MergeDocuments", "Frends.PDF.Read") to
avoid broken links.
🧹 Nitpick comments (2)
Frends.PDF.Read/Frends.PDF.Read/DTOs/PdfReadImage.cs (1)
14-20: Consider clarifying the XML example forBytes.The example says "base64-encoded byte array" but this property holds raw
byte[]data, not a base64 string. The base64 encoding only occurs during JSON serialization. This could confuse users about what to expect from this property directly.Suggested fix
/// <summary> /// Raw image data as byte array. /// </summary> - /// <example> - /// "base64-encoded byte array" - /// </example> + /// <example>[0x89, 0x50, 0x4E, 0x47, ...]</example> public byte[] Bytes { get; set; }Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.cs (1)
13-17: RenameLine 16 uses
XmlWriter,HtmlTag). This aligns with the coding guidelines for proper abbreviation naming (Csv, Url, Api). Renaming now ensures consistency with .NET conventions and avoids long-term API inconsistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@Frends.PDF.Read/README.md`:
- Line 7: The README badge currently shows MIT; update the Markdown license
badge line to Apache-2.0 by replacing the badge image and link (the existing
"[](...)" markup) with an Apache‑2.0 badge and link to the
Apache‑2.0 license (e.g. using the Shields.io Apache-2.0 badge and linking to
https://opensource.org/licenses/Apache-2.0) so the visible badge and hyperlink
match the repository's Apache-2.0 license.
🧹 Nitpick comments (1)
Frends.PDF.Read/README.md (1)
42-42: Avoid bare URL in Markdown.
Wrap the PdfPig project URL in a Markdown link to satisfy MD034 and improve readability.
|
|
||
| [](https://github.com/FrendsPlatform/Frends.PDF/actions/workflows/Read_test_on_main.yml) | ||
|  | ||
| [](https://opensource.org/licenses/MIT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
License badge appears inconsistent with repo license.
The badge shows MIT, but the PR summary indicates Apache‑2.0 licensing. This is misleading for users and compliance. Please update the badge to reflect the actual license.
🤖 Prompt for AI Agents
In `@Frends.PDF.Read/README.md` at line 7, The README badge currently shows MIT;
update the Markdown license badge line to Apache-2.0 by replacing the badge
image and link (the existing "[](...)" markup) with an
Apache‑2.0 badge and link to the Apache‑2.0 license (e.g. using the Shields.io
Apache-2.0 badge and linking to https://opensource.org/licenses/Apache-2.0) so
the visible badge and hyperlink match the repository's Apache-2.0 license.
Please review my changes :)
Summary by CodeRabbit
New Features
Tests
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.