Skip to content

Conversation

@MichalFrends1
Copy link
Contributor

@MichalFrends1 MichalFrends1 commented Jan 21, 2026

Please review my changes :)

Summary by CodeRabbit

  • New Features

    • Added a task that extracts PDF document metadata, per-page text and optional images; returns structured JSON results.
  • Tests

    • Added unit tests covering content extraction and error-handling behavior.
  • Documentation

    • Added README, CHANGELOG and license information.
  • Chores

    • Added CI workflows, solution/project scaffolding, packaging and migration manifests, code-analysis suppressions and test project configuration.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 21, 2026

Walkthrough

Adds a new Frends.PDF.Read .NET 8 task that reads PDFs: extracts metadata, per‑page text and optional images; includes DTOs, error handling, NUnit tests, solution/project files, CI workflows, documentation, changelog, migration manifest, and Apache‑2.0 licensing.

Changes

Cohort / File(s) Summary
GitHub Actions Workflows
\.github/workflows/Read_release.yml, \.github/workflows/Read_test_on_main.yml, \.github/workflows/Read_test_on_push.yml
New workflows that reuse shared templates; set workdir=Frends.PDF.Read, dotnet_version=8.0.x, enable analyzers, and forward required secrets; triggered on push/main or workflow_dispatch.
Solution & Projects
Frends.PDF.Read.sln, Frends.PDF.Read/Frends.PDF.Read.csproj, Frends.PDF.Read.Tests/Frends.PDF.Read.Tests.csproj
New solution and project files: library targeting net8.0 with dependencies (Newtonsoft.Json, PdfPig, analyzers) and a NUnit test project with coverage tooling.
Core Task Implementation
Frends.PDF.Read/Frends.PDF.Read.cs, Frends.PDF.Read/Frends.PDF.Read/Helpers/ErrorHandler.cs
New static PDF.Read(Input, Options, CancellationToken) method that opens PDFs, extracts metadata, iterates pages to collect text and optionally images (PNG preferred, RAW fallback), serializes to indented JSON, supports cancellation, and funnels errors via ErrorHandler.
Data Models (DTOs)
Frends.PDF.Read/Frends.PDF.Read/DTOs/*
\.../PdfMetadata.cs`, `.../PdfReadDocument.cs`, `.../PdfReadPage.cs`, `.../PdfReadImage.cs``
Added DTOs for document metadata, pages, and images (Title/Author/PageCount; PdfReadDocument; PdfReadPage with Number/Text/Images; PdfReadImage with Format/Bytes/Width/Height).
Task Definitions
Frends.PDF.Read/Frends.PDF.Read/Definitions/*
\.../Input.cs`, `.../Options.cs`, `.../Result.cs`, `.../Error.cs``
New input/options/result/error types: FilePath input, options IncludeImages/ThrowErrorOnFailure/ErrorMessageOnFailure, Result wrapper (Success/Json/Error), and Error details (Message/AdditionalInfo).
Tests & Suppressions
Frends.PDF.Read.Tests/UnitTests.cs, Frends.PDF.Read.Tests/ErrorHandlerTest.cs, Frends.PDF.Read.Tests/GlobalSuppressions.cs
NUnit tests covering multi/single‑page reads, image inclusion, and error handling; test project-level StyleCop suppressions.
Analysis & Task Metadata
Frends.PDF.Read/Frends.PDF.Read/GlobalSuppressions.cs, Frends.PDF.Read/FrendsTaskMetadata.json
Assembly-level StyleCop suppressions for the library; FrendsTaskMetadata.json registering the task entry point.
Docs, Changelog & Licensing
Frends.PDF.Read/README.md, Frends.PDF.Read/CHANGELOG.md, Frends.PDF.Read/Frends.PDF.Read/LICENSE, Frends.PDF.Read/migration.json, README.md
Added task README, CHANGELOG (1.0.0), Apache‑2.0 LICENSE with third‑party notices, empty migration manifest, and root README link update.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Caller
  participant Task as Frends.PDF.Read.PDF.Read
  participant PdfLib as PdfPig
  participant Builder as DTO Builder
  participant Serializer as JSON Serializer
  participant Handler as ErrorHandler

  Client->>Task: Read(Input, Options, CancellationToken)
  Task->>PdfLib: Open document / get metadata & pages
  PdfLib-->>Task: Document, metadata, pages
  Task->>Builder: Extract text for each page
  alt Options.IncludeImages == true
    Task->>PdfLib: Extract images (PNG preferred, RAW fallback)
    PdfLib-->>Task: Image bytes, format, dimensions
    Task->>Builder: Attach images to page DTOs
  end
  Task->>Builder: Build PdfReadDocument (Metadata + Pages)
  Builder-->>Task: DTO
  Task->>Serializer: Serialize DTO (indented JSON)
  Serializer-->>Task: JSON
  Task-->>Client: Result(success=true, json=...)
  opt exception
    Task->>Handler: Handle(exception, ThrowErrorOnFailure, ErrorMessageOnFailure)
    Handler-->>Task: throws OR returns Result(false, null, Error)
    Task-->>Client: Result(false, null, Error) OR exception propagated
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble bytes from pages wide and deep,
I gather text and images you may keep,
Metadata snug in JSON's cozy lair,
Errors handled gently — or thrown with care,
Hop, read, and tuck your PDFs to sleep.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'init read pdf' is vague and generic, using non-descriptive terms that don't clearly convey what was implemented or changed in the changeset. Consider using a more specific title like 'Add PDF.Read task with document extraction and metadata parsing' or 'Implement Frends.PDF.Read task for PDF content extraction' to better describe the main changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🤖 Fix all issues with AI agents
In `@Frends.PDF.Read/Frends.PDF.Read.Tests/Frends.PDF.Read.Tests.csproj`:
- Around line 12-14: Update the PackageReference for StyleCop.Analyzers (the
element with Include="StyleCop.Analyzers") so it has the same asset restrictions
as the main project: add PrivateAssets="all" and an explicit IncludeAssets
attribute (e.g. include runtime; build; native; contentfiles; analyzers) to
prevent the analyzer package from flowing to runtime dependencies; modify the
PackageReference element in Frends.PDF.Read.Tests.csproj that references
StyleCop.Analyzers to include these attributes.

In `@Frends.PDF.Read/Frends.PDF.Read.Tests/GlobalSuppressions.cs`:
- Line 3: The assembly suppression attribute in GlobalSuppressions.cs uses an
incorrect double-colon in the rule identifier; update the assembly attribute
[assembly: SuppressMessage("StyleCop.CSharp.SpecialRules",
"SA0001::XmlCommentAnalysisDisabled", Justification = "...")] to use a single
colon so the identifier reads "SA0001:XmlCommentAnalysisDisabled" to match the
other suppressions and ensure the rule is properly recognized.

In `@Frends.PDF.Read/Frends.PDF.Read.Tests/UnitTests.cs`:
- Around line 68-85: The test ReadPdf_IncludeImages is missing a success check
before deserializing the JSON result from PDF.Read; add an assertion like
Assert.That(result.Success, Is.True) immediately after obtaining result (before
calling JsonConvert.DeserializeObject and using PdfReadDocument/ result.Json) to
ensure the operation succeeded prior to further assertions.
- Around line 49-66: Add an assertion that the PDF.Read call succeeded before
deserializing: in the ReadPdf_SinglePagePdf_ReturnsOnePageWithMetadata test,
after calling PDF.Read and storing result, assert result.Success is true (and
optionally assert result.Json is not null/empty) before calling
JsonConvert.DeserializeObject on result.Json so deserialization and subsequent
assertions won’t produce misleading null-reference failures.

In `@Frends.PDF.Read/Frends.PDF.Read/Definitions/Result.cs`:
- Line 14: The XML doc for the parameter "json" is incorrect; update the XML
<param name="json"> on the Result class/constructor in Result.cs to accurately
describe that it contains the JSON content produced by the PDF extraction (e.g.,
"The JSON content returned from the PDF extraction"), replacing the current "The
path to the signed file" copy-paste text.
- Line 1: Remove the unused using directive `using static
System.Runtime.InteropServices.JavaScript.JSType;` from the Result.cs file;
locate the top of the file where `using` statements are declared (in Result.cs)
and delete that line so only necessary namespaces remain, then rebuild to
confirm no references to JSType exist.

In `@Frends.PDF.Read/Frends.PDF.Read/DTOs/PdfReadImage.cs`:
- Around line 22-26: The XML summary for the Width property in class
PdfReadImage has an extra leading space; open the PdfReadImage class and edit
the XML comment above the Width property (the "Width" auto-property) to remove
the leading space so the <summary> text begins immediately with "Width of image
in points." ensuring consistent XML doc formatting.

In `@Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.cs`:
- Line 23: Replace the placeholder XML doc for the parameter named "options"
with a clear description of what the parameter represents and how it affects
behavior: state that it is the options/configuration object passed into the
Frends.PDF.Read method (or constructor) that controls things like input source,
parsing settings, and any optional flags/defaults; mention expected properties
or types and whether null/omitted is allowed and what the default behavior is.
Update the <param name="options"> tag text accordingly so the documentation
explains the purpose and expected contents of the options parameter.

In `@Frends.PDF.Read/Frends.PDF.Read/LICENSE`:
- Around line 1-255: The package currently declares MIT but depends on PdfPig
v0.1.13 (Apache-2.0) and the Frends.PDF.Read/LICENSE file (containing Apache
text and third‑party notices) is not being packed; update the .csproj (project
file) to either set PackageLicenseExpression to "Apache-2.0" (or add clear
license justification if you intend to remain MIT) and explicitly include the
LICENSE (and any NOTICE) in the package by adding a Pack Include for
Frends.PDF.Read/LICENSE (e.g., via a None/Content Include with Pack="true" and
appropriate PackagePath) so the third‑party attribution required by Apache 2.0
§4(d) is distributed.

In `@Frends.PDF.Read/LICENSE`:
- Around line 1-21: The nested Frends.PDF.Read/Frends.PDF.Read/LICENSE contains
Apache-2.0 terms for an external PdfPig dependency but the package declares MIT;
rename or move that nested file (e.g., to
Frends.PDF.Read/Frends.PDF.Read/DEPENDENCIES.md or LICENSE.EXTERNAL) and update
its header to state it documents third-party licenses (Apache-2.0 for PdfPig),
then consolidate or clarify the top-level Frends.PDF.Read/LICENSE to state the
project is MIT while pointing to the new DEPENDENCIES.md for external
components; also update any packaging/metadata in the .csproj that references
the old nested LICENSE so the package license remains MIT and the external
license file is included as documentation rather than the package license.

In `@Frends.PDF.Read/README.md`:
- Line 5: The README badge links to a workflow file named
Read_build_and_test_on_main.yml which may not match the actual GitHub Actions
workflow name; verify the real workflow filename in .github/workflows and update
the badge URL target (the href and the image path) to match the actual workflow
filename (e.g., Replace Read_build_and_test_on_main.yml with the correct
workflow filename) so the badge points to the existing workflow file.
🧹 Nitpick comments (7)
Frends.PDF.Read/Frends.PDF.Read/migration.json (1)

6-8: Add a brief migration description.
An empty description makes the manifest harder to interpret during audits and tooling output.

♻️ Suggested update
-        "Description": "",
+        "Description": "Initial migration for Frends.PDF.Read task.",
Frends.PDF.Read/README.md (1)

13-33: Add explicit developer setup prerequisites (SDK/tooling).

The guidelines require developer setup instructions. Please add a short “Prerequisites” section (e.g., .NET 8 SDK, any required workloads/tools).

As per coding guidelines.

Frends.PDF.Read/CHANGELOG.md (1)

5-7: Consider more descriptive changelog entries.

Per the coding guidelines, the changelog should include all functional changes. "Initial implementation" is vague and doesn't help users understand what features are available in version 1.0.0.

Consider expanding to describe the actual functionality:

 ### Added
 
-- Initial implementation
+- Initial implementation of PDF.Read task
+- Extract document metadata (Title, Author, PageCount)
+- Extract page text content
+- Optional image extraction with format detection (PNG/RAW)

Based on coding guidelines for CHANGELOG.md files.

Frends.PDF.Read/Frends.PDF.Read/Definitions/Error.cs (1)

16-20: Fix misleading XML example for AdditionalInfo.

The example shows object { Exception AdditionalInfo } which doesn't represent an actual Exception value. Consider a more accurate example.

Suggested fix
     /// <summary>
     /// Additional information about the error.
     /// </summary>
-    /// <example>object { Exception AdditionalInfo }</example>
+    /// <example>System.IO.FileNotFoundException</example>
     public Exception AdditionalInfo { get; set; }
Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.csproj (1)

26-26: Consider pinning analyzer version for reproducible builds.

Using Version="1.*" for FrendsTaskAnalyzers may cause non-deterministic builds. Consider pinning to a specific version for consistency across environments.

Frends.PDF.Read/Frends.PDF.Read.Tests/ErrorHandlerTest.cs (2)

13-19: Consider removing redundant assertion.

Assert.Throws<Exception> will fail the test if no exception is thrown or return the caught exception. The subsequent Assert.That(ex, Is.Not.Null) is redundant since Assert.Throws guarantees a non-null return when it succeeds.

Proposed simplification
     [Test]
     public void Should_Throw_Error_When_ThrowErrorOnFailure_Is_True()
     {
-        var ex = Assert.Throws<Exception>(() =>
+        Assert.Throws<Exception>(() =>
            PDF.Read(DefaultInput(), DefaultOptions(), CancellationToken.None));
-        Assert.That(ex, Is.Not.Null);
     }

30-39: Consider removing redundant null check.

Similar to the first test, Assert.That(ex, Is.Not.Null) on line 37 is redundant. The important assertion on line 38 validating the custom error message is the valuable one.

Proposed simplification
     [Test]
     public void Should_Use_Custom_ErrorMessageOnFailure()
     {
         var options = DefaultOptions();
         options.ErrorMessageOnFailure = CustomErrorMessage;
         var ex = Assert.Throws<Exception>(() =>
             PDF.Read(DefaultInput(), options, CancellationToken.None));
-        Assert.That(ex, Is.Not.Null);
         Assert.That(ex.Message, Contains.Substring(CustomErrorMessage));
     }

@MatteoDelOmbra
Copy link
Contributor

add new task info to main README in repo root

@@ -0,0 +1,255 @@
Apache License
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to include this into package (same as we include e.g. migration.json) You can also specify why this License is included (which package requires it)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@Frends.PDF.Read/Frends.PDF.Read.Tests/GlobalSuppressions.cs`:
- Around line 1-8: Add a coverage threshold and/or align the test framework:
either configure coverlet (add coverlet.msbuild package and set a minimum
coverage threshold of 80% via a project/Directory.Build.props entry like
Threshold=80) so builds fail when coverage for tests (ErrorHandlerTest.cs,
UnitTests.cs) drops below 80%, or update the repository guidance to accept NUnit
(tests use [TestFixture]/[Test]) and ensure CI/test runners use NUnit rather
than MSTest; pick one approach and make the corresponding change (add coverlet
threshold configuration or update docs/CI to reflect NUnit) and ensure test
project references and CI tasks are updated accordingly.

In `@Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.cs`:
- Around line 26-33: Add a null-guard for options (e.g. if (options == null)
throw new ArgumentNullException(nameof(options));) before the try in the Read
method so the catch/ErrorHandler path cannot NRE on options, but keep any input
validation inside the try block so ErrorHandler still applies; apply the same
pattern to the other method(s) that perform error handling (the other occurrence
around lines 90-93) to prevent dereferencing options/input in the catch.

In `@README.md`:
- Around line 7-9: The README contains a case-mismatch in the links: update the
link label/path that currently points to "Frends.Pdf.Read/README.md" to match
the actual folder casing "Frends.PDF.Read/README.md" so the link resolves on
case-sensitive filesystems; ensure the other entries use exact folder names
(e.g., "Frends.PDF.Create", "Frends.PDF.MergeDocuments", "Frends.PDF.Read") to
avoid broken links.
🧹 Nitpick comments (2)
Frends.PDF.Read/Frends.PDF.Read/DTOs/PdfReadImage.cs (1)

14-20: Consider clarifying the XML example for Bytes.

The example says "base64-encoded byte array" but this property holds raw byte[] data, not a base64 string. The base64 encoding only occurs during JSON serialization. This could confuse users about what to expect from this property directly.

Suggested fix
         /// <summary>
         /// Raw image data as byte array.
         /// </summary>
-        /// <example>
-        /// "base64-encoded byte array"
-        /// </example>
+        /// <example>[0x89, 0x50, 0x4E, 0x47, ...]</example>
         public byte[] Bytes { get; set; }
Frends.PDF.Read/Frends.PDF.Read/Frends.PDF.Read.cs (1)

13-17: Rename PDFPdf for .NET acronym casing.

Line 16 uses PDF, but .NET naming guidelines prefer PascalCase acronyms for three or more letters (e.g., Pdf, XmlWriter, HtmlTag). This aligns with the coding guidelines for proper abbreviation naming (Csv, Url, Api). Renaming now ensures consistency with .NET conventions and avoids long-term API inconsistency.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@Frends.PDF.Read/README.md`:
- Line 7: The README badge currently shows MIT; update the Markdown license
badge line to Apache-2.0 by replacing the badge image and link (the existing
"[![License: MIT](...)](...)" markup) with an Apache‑2.0 badge and link to the
Apache‑2.0 license (e.g. using the Shields.io Apache-2.0 badge and linking to
https://opensource.org/licenses/Apache-2.0) so the visible badge and hyperlink
match the repository's Apache-2.0 license.
🧹 Nitpick comments (1)
Frends.PDF.Read/README.md (1)

42-42: Avoid bare URL in Markdown.
Wrap the PdfPig project URL in a Markdown link to satisfy MD034 and improve readability.


[![Read_build](https://github.com/FrendsPlatform/Frends.PDF/actions/workflows/Read_test_on_main.yml/badge.svg)](https://github.com/FrendsPlatform/Frends.PDF/actions/workflows/Read_test_on_main.yml)
![Coverage](https://app-github-custom-badges.azurewebsites.net/Badge?key=FrendsPlatform/Frends.PDF/Frends.PDF.Read|main)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

License badge appears inconsistent with repo license.
The badge shows MIT, but the PR summary indicates Apache‑2.0 licensing. This is misleading for users and compliance. Please update the badge to reflect the actual license.

🤖 Prompt for AI Agents
In `@Frends.PDF.Read/README.md` at line 7, The README badge currently shows MIT;
update the Markdown license badge line to Apache-2.0 by replacing the badge
image and link (the existing "[![License: MIT](...)](...)" markup) with an
Apache‑2.0 badge and link to the Apache‑2.0 license (e.g. using the Shields.io
Apache-2.0 badge and linking to https://opensource.org/licenses/Apache-2.0) so
the visible badge and hyperlink match the repository's Apache-2.0 license.

@MatteoDelOmbra MatteoDelOmbra merged commit 2b3b7a9 into main Jan 23, 2026
5 checks passed
@MatteoDelOmbra MatteoDelOmbra deleted the FSPW-670-PDF.Read branch January 23, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants