Skip to content

Conversation

@paxcalpt
Copy link
Contributor

Issue

The DOCX converter was not indicating corresponding authors, even though the metadata contained the field and email information.

Changes

  • ✅ Add asterisk (*) markers next to corresponding authors in the author list
  • ✅ Add 'Correspondence:' section after affiliations with contact information
  • ✅ Support both plain text and base64-encoded fields
  • ✅ Handle multiple corresponding authors with semicolon separation

Example Output

Before this fix, the author line would show:

Jane Smith¹, John Doe², Alice Johnson¹,²

After this fix, it shows:

Jane Smith¹*, John Doe², Alice Johnson¹,²*

* Correspondence: Jane Smith (jane.smith@example.com); Alice Johnson (alice@example.com)

Testing

  • ✅ All existing DOCX tests pass (37 passed, 9 skipped)
  • ✅ Manual verification with test script confirms:
    • Corresponding author markers appear correctly
    • Correspondence section is added after affiliations
    • Emails are properly decoded from base64 when needed

Related

This brings DOCX export in line with PDF/LaTeX generation, which already indicates corresponding authors using the \Letter symbol.

- Add asterisk (*) markers next to corresponding authors in author list
- Add 'Correspondence:' section after affiliations with contact info
- Support both plain text email and base64-encoded email64 fields
- Handle multiple corresponding authors with semicolon separation

Fixes issue where DOCX export was not indicating corresponding authors
even though the metadata contained the information.
Copilot AI review requested due to automatic review settings December 23, 2025 09:06
@github-actions
Copy link

Code Review for PR #272: Add corresponding author indication to DOCX export

Thank you for this PR! I've reviewed the changes according to the project's coding standards and best practices outlined in CLAUDE.md. Overall, this is a solid implementation that properly addresses the feature gap. Here's my detailed feedback:


Strengths

1. Code Quality

  • Clean implementation: The code follows the existing patterns in docx_writer.py and integrates seamlessly with the _add_title_page method
  • Proper placement: The correspondence section is logically placed after affiliations, matching the layout pattern from PDF export
  • Good code structure: The logic is clear and easy to follow with appropriate comments

2. Security & Error Handling

  • Safe base64 decoding: Properly wrapped in try-except to handle invalid encoded data gracefully (lines 279-282)
  • Silent failure is appropriate: Catches generic Exception and sets empty email on failure, which is correct for this use case (no sensitive data exposure)
  • No injection risks: All data is properly handled through python-docx's API with no raw string interpolation

3. Consistency

  • Matches PDF output: Brings DOCX export in line with LaTeX/PDF generation which uses \Letter symbol
  • Follows existing patterns: Uses the same author.get() pattern for safe dictionary access throughout
  • Proper formatting: Uses superscript asterisk marker and appropriate font sizing (Pt(10))

🔍 Observations & Minor Suggestions

1. Exception Handling (Line 281)

Current code:

except Exception:
    email = ""

Consideration: While catching generic Exception is acceptable here since the fallback behavior is safe, you could be slightly more specific for better debugging:

except (ValueError, binascii.Error, UnicodeDecodeError):
    email = ""

This would catch the expected exceptions from base64 decoding while allowing truly unexpected errors to surface in logs. However, this is a minor point and the current implementation is acceptable.

2. Font Size Application (Line 290)

Current code:

corr_para.paragraph_format.space_after = Pt(12)
corr_para.runs[-1].font.size = Pt(10)

Observation: The font size is only applied to the last run. If there are multiple corresponding authors, only the last author's email will have the 10pt font size applied.

Suggested improvement:

corr_para.paragraph_format.space_after = Pt(12)
# Apply consistent font size to all runs in the correspondence paragraph
for run in corr_para.runs:
    run.font.size = Pt(10)

This ensures consistent formatting across all correspondence information.

3. Test Coverage

Good: The PR description mentions manual testing and that existing tests pass.

Suggestion: Consider adding a unit test specifically for the corresponding author functionality to prevent regressions:

def test_corresponding_author_markers_in_docx(tmp_path):
    """Test that corresponding authors get asterisk markers and correspondence section."""
    test_dir = tmp_path / "corr_author_test"
    test_dir.mkdir()
    
    main_file = test_dir / "01_MAIN.md"
    main_file.write_text("""---
title: "Test Manuscript"
authors:
  - name: "Jane Smith"
    affiliations: ["Uni A"]
    corresponding_author: true
    email: "jane@example.com"
  - name: "John Doe"
    affiliations: ["Uni B"]
affiliations:
  - shortname: "Uni A"
    full_name: "University A"
  - shortname: "Uni B"
    full_name: "University B"
---

# Test
Content here.
""")
    
    bib_file = test_dir / "03_REFERENCES.bib"
    bib_file.write_text("@article{test2023,author={Test},title={Test},year={2023}}")
    
    exporter = DocxExporter(manuscript_path=str(test_dir))
    result_path = exporter.export()
    doc = Document(str(result_path))
    all_text = "\n".join([p.text for p in doc.paragraphs])
    
    # Check for correspondence section
    assert "* Correspondence:" in all_text
    assert "Jane Smith" in all_text
    assert "jane@example.com" in all_text
    
    # Check for asterisk marker in author line
    # (This requires checking runs with superscript formatting)
    has_superscript_asterisk = False
    for para in doc.paragraphs:
        if "Jane Smith" in para.text:
            for run in para.runs:
                if run.text == "*" and run.font.superscript:
                    has_superscript_asterisk = True
                    break
    assert has_superscript_asterisk

This would belong in tests/integration/test_docx_export_integration.py.

4. Base64 Decoding Consistency

The project has a dedicated email_encoder.py utility (src/rxiv_maker/utils/email_encoder.py) with decode_email() function that includes validation. Consider using that instead of inline decoding:

from ..utils.email_encoder import decode_email

# Later in the code:
if not email:
    email64 = author.get("email64", "")
    if email64:
        try:
            email = decode_email(email64)
        except ValueError:
            email = ""

This would provide:

  • Better validation (email format checking)
  • Code reuse (DRY principle)
  • Consistent behavior with other parts of the codebase

🚀 Performance Considerations

  • Efficient: The code only processes corresponding authors once during title page generation
  • No unnecessary loops: Direct list comprehension for filtering corresponding authors is optimal
  • Memory usage: Minimal additional memory footprint

📋 Testing Checklist

According to CLAUDE.md testing requirements:

  • ✅ Existing tests pass (37 passed, 9 skipped as mentioned)
  • ⚠️ Consider adding: Specific unit/integration test for corresponding author feature
  • ⚠️ Manual verification recommended: Test with multiple corresponding authors to verify font sizing
  • ✅ Manual testing completed per PR description

📝 Documentation

  • ✅ PR description is clear and thorough
  • ✅ Code comments are appropriate
  • ✅ Inline documentation matches code behavior

🎯 Final Recommendation

Status: APPROVE with minor suggestions

The implementation is solid and ready to merge. The suggestions above are optional improvements that could be addressed either before merge or in a follow-up PR:

  1. Critical: None
  2. Recommended: Apply font size to all runs in correspondence paragraph (not just last run)
  3. Optional: Add dedicated test case, use existing decode_email() utility

Great work on bringing DOCX export to feature parity with PDF export! The code is clean, secure, and follows the project's conventions.


Review conducted according to rxiv-maker CLAUDE.md guidelines, checking: code quality, security, performance, test coverage, and consistency with existing patterns.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds corresponding author indication to DOCX exports, bringing it in line with PDF/LaTeX generation which already includes this functionality. The implementation adds asterisk markers next to corresponding authors in the author list and includes a "Correspondence:" section with contact information after the affiliations.

Key changes:

  • Adds asterisk (*) superscript markers next to corresponding authors in the author list
  • Adds a "Correspondence:" section after affiliations with author names and email addresses
  • Supports both plain text email and base64-encoded email64 fields for contact information

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

corr_para.add_run(name)

corr_para.paragraph_format.space_after = Pt(12)
corr_para.runs[-1].font.size = Pt(10)
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The font size is being set only on the last run of the paragraph. This means the font size (10pt) will only apply to the last author's name/email text, while the asterisk, "Correspondence:" label, and other author entries will remain at the default size. For consistent styling, the font size should be set on all runs in the correspondence paragraph, similar to how it's done for affiliations (line 255) where each run's font size is set individually.

Suggested change
corr_para.runs[-1].font.size = Pt(10)
for run in corr_para.runs:
run.font.size = Pt(10)

Copilot uses AI. Check for mistakes.
Comment on lines +279 to +282
try:
email = base64.b64decode(email64).decode("utf-8")
except Exception:
email = ""
Copy link

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bare except clause catches all exceptions, including system exceptions like KeyboardInterrupt and SystemExit. This should be more specific by catching only the expected exception types (e.g., binascii.Error, ValueError) that could occur during base64 decoding. Using a bare except is an anti-pattern that can hide bugs and make debugging more difficult.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

Code Review: PR #272 - Corresponding Author Indication

Summary

This PR adds corresponding author indication to DOCX exports, bringing it in line with PDF/LaTeX generation. The implementation properly handles the asterisk markers and correspondence section with email decoding.

✅ Strengths

1. Feature Completeness

  • Adds asterisk markers next to corresponding authors (lines 235-238)
  • Creates a dedicated correspondence section with contact info (lines 260-289)
  • Handles multiple corresponding authors with semicolon separation
  • Supports both plain email and base64-encoded email64 fields

2. Code Quality

  • Clean, readable implementation
  • Proper error handling for base64 decoding (try/except with fallback)
  • Consistent formatting with existing code style
  • Good use of list comprehension for filtering corresponding authors

3. User Experience

  • Professional output format matches academic standards
  • Proper spacing and typography (Pt(10), Pt(12) spacing)
  • Superscript markers for visual consistency

4. Testing

  • All 37 existing DOCX tests pass
  • Manual verification confirms correct behavior

🔍 Observations & Suggestions

1. Merge Conflict with PR #273
⚠️ IMPORTANT: This PR conflicts with PR #273, which modifies the exact same method (_add_title_page). Both PRs:

  • Add import base64 at the top
  • Add corresponding author handling in _add_title_page

Recommendation:

2. Test Coverage Enhancement
Consider adding a dedicated test for corresponding author indication:

def test_corresponding_author_indication(sample_manuscript_path, tmp_path):
    """Test that corresponding authors are marked with asterisks."""
    # Ensure test fixture has corresponding_author metadata
    exporter = DocxExporter(manuscript_path=sample_manuscript_path)
    result_path = exporter.export()
    doc = Document(str(result_path))
    
    all_text = "\n".join([p.text for p in doc.paragraphs])
    
    # Check for asterisk markers in author line
    author_para_text = doc.paragraphs[1].text  # Adjust index as needed
    assert "*" in author_para_text, "Expected asterisk marker for corresponding author"
    
    # Check for correspondence section
    assert "Correspondence:" in all_text
    assert "(" in all_text and ")" in all_text  # Check for email format

3. Edge Case: No Email Available
The code handles missing emails gracefully (lines 283-285), but consider logging a warning:

if email:
    corr_para.add_run(f"{name} ({email})")
else:
    corr_para.add_run(name)
    logger.warning(f"Corresponding author {name} has no email address")

4. Base64 Decoding Error Handling
The bare except Exception: at line 282 is appropriate for robustness, but consider being more specific:

try:
    email = base64.b64decode(email64).decode("utf-8")
except (ValueError, UnicodeDecodeError, Exception) as e:
    logger.warning(f"Failed to decode email64 for {name}: {e}")
    email = ""

5. Font Size Inconsistency
Line 290 sets font size on the last run only:

corr_para.runs[-1].font.size = Pt(10)

This should probably apply to all runs in the correspondence paragraph for consistency. Consider:

# Set font size for entire correspondence section
for run in corr_para.runs:
    run.font.size = Pt(10)

6. Documentation
The PR description mentions this brings DOCX in line with PDF/LaTeX, which uses \Letter symbol. Consider:

  • Adding a comment in the code referencing this parity
  • Updating documentation to mention this feature
  • Adding to the changelog

🛡️ Security Considerations

✅ Good security practices

  • Base64 decoding is properly error-handled
  • No user input is directly processed
  • Email addresses are from trusted metadata files

Minor suggestion: The bare Exception catch is fine for robustness, but being more specific would help catch unexpected issues in development.

🚀 Performance Considerations

✅ No performance concerns

  • Simple list comprehension for filtering
  • Minimal overhead (one additional paragraph)
  • No expensive operations

🐛 Potential Issues

1. Index-based Font Size Assignment (Line 290)

corr_para.runs[-1].font.size = Pt(10)

This only sets the font size on the last run. If the intention is to make the entire correspondence section smaller, this should loop through all runs:

for run in corr_para.runs:
    run.font.size = Pt(10)

2. Multiple Corresponding Authors Formatting
The semicolon separation is good, but consider adding a space after the semicolon for better readability:

if i > 0:
    corr_para.add_run("; ")  # Already has space - good!

✅ This is already correct in the code.

📊 Code Metrics

  • Lines changed: +40/-1 (net +39)
  • Complexity: Low (straightforward logic)
  • Maintainability: High (clear and readable)
  • Test coverage: Good (all tests pass, could add specific test)

🎯 Final Recommendation

APPROVE with merge coordination

This is a well-implemented bug fix that:

  • Addresses a real gap in DOCX export functionality
  • Uses clean, maintainable code
  • Handles edge cases appropriately
  • Brings DOCX exports to parity with PDF/LaTeX output

Required Actions:

  1. ⚠️ Coordinate merge with PR feat: DOCX export enhancements - configuration options, formatting fixes, and code centralization #273 to avoid conflicts
  2. Consider merging this PR first (it's a bug fix)

Suggested Follow-ups (not blocking):

  1. Fix font size application to all runs (line 290)
  2. Add specific unit test for corresponding author feature
  3. Enhance error logging for base64 decoding
  4. Update user documentation
  5. Add entry to CHANGELOG.md

📋 Checklist

Excellent work on bringing DOCX export to feature parity with PDF/LaTeX output!


Review conducted following rxiv-maker CLAUDE.md guidelines

@github-actions
Copy link

Code Review: PR 272 - Add corresponding author indication to DOCX export

Overview: This PR adds corresponding author markers (asterisks) and a Correspondence section to DOCX exports, bringing it in line with PDF/LaTeX generation. This is a focused, well-scoped bug fix.

STRENGTHS:

  • Clear problem statement: The issue is well-defined (missing corresponding author indication)
  • Minimal, focused changes: Only modifies what is necessary to fix the issue
  • Good documentation: PR description includes before/after examples
  • Proper formatting: Uses superscript asterisks and adds dedicated Correspondence section
  • Email handling: Supports both plain text email and base64-encoded email64 fields
  • Multiple authors: Handles multiple corresponding authors with semicolon separation
  • All existing tests pass (37 passed, 9 skipped)

ISSUES AND CONCERNS:

  1. MERGE CONFLICT WITH PR 273
    This PR modifies the same section of docx_writer.py as PR 273. PR 273 actually includes ALL changes from this PR plus additional features.
    Impact: If PR 273 merges first, this PR becomes redundant. If this PR merges first, PR 273 will have conflicts.
    Recommendation: Coordinate merge order - suggest merging this PR first (smaller, focused fix), then have PR 273 rebase.

  2. MISSING TEST COVERAGE
    No new tests were added to verify the corresponding author functionality.
    Missing test cases needed in tests/integration/test_docx_export_integration.py:

  • test_export_with_corresponding_author() - Verify asterisk markers appear
  • test_export_with_multiple_corresponding_authors() - Verify semicolon separation
  • test_export_with_corresponding_author_email64() - Verify base64 decoding
  • test_export_without_corresponding_author() - Verify graceful handling when none present

According to CLAUDE.md, the project aims for more than 80 percent test coverage.

  1. SECURITY: BROAD EXCEPTION HANDLING
    Lines 303-306 (in the PR diff) catch Exception which is too broad:
    try: email = base64.b64decode(email64).decode("utf-8")
    except Exception: email = ""

Recommendation: Catch specific exceptions like (ValueError, UnicodeDecodeError, binascii.Error) to avoid hiding unexpected errors.

  1. INCONSISTENT FONT SIZING
    The code sets font size to 8pt for most runs (marker, label, separator, info) which is good for consistency. However, verify there is no conflicting font size assignment elsewhere that could override this.

  2. MINOR: IMPORT STATEMENT
    The base64 module is imported but only used in one place. This is fine, but consider adding a comment or docstring noting that it is for email64 decoding.

RECOMMENDATIONS:

High Priority:

  1. Coordinate with PR 273 to avoid merge conflicts
  2. Add test coverage for corresponding author functionality
  3. Improve exception handling to catch specific exceptions

Medium Priority:
4. Verify font sizing consistency throughout the correspondence section
5. Add CHANGELOG.md entry documenting this bug fix

TESTING CHECKLIST:

  • All existing tests pass (37 passed, 9 skipped)
  • New tests added for corresponding author markers
  • New tests added for multiple corresponding authors
  • New tests added for email64 decoding
  • Manual verification with test manuscripts
  • CHANGELOG.md updated

CODE STYLE COMPLIANCE (per CLAUDE.md):

  • PEP 8 compliance: Code appears properly formatted
  • Self-documenting: Variable names are clear (corresponding_authors, corr_para, etc)
  • Security: Exception handling needs improvement
  • Test coverage: Missing tests for new functionality
  • Backward compatibility: Maintains existing behavior when no corresponding authors present

OVERALL ASSESSMENT:

This is a well-scoped bug fix that addresses a real gap in DOCX export functionality. The implementation is clean and follows existing patterns in the codebase.

However, the PR needs:

  1. Critical: Coordination with PR 273 to avoid merge conflicts
  2. Important: Test coverage for the new functionality
  3. Nice-to-have: Improved exception handling

Recommendation: Add tests and coordinate merge order with PR 273, then this will be ready to merge.

Great work identifying and fixing this gap! The implementation is clean and follows the existing code patterns well.

@github-actions
Copy link

Code Review: Corresponding Author Indication in DOCX Export

Thank you for this well-implemented fix! The PR successfully addresses the missing corresponding author indication in DOCX exports.

Strengths

Code Quality & Consistency

  • Excellent reuse of existing infrastructure: Leverages AuthorAffiliationProcessor to extract corresponding authors, maintaining consistency with co-first author handling
  • Clean implementation: Follows the same pattern as co-first author markers, making it easy to understand and maintain
  • Proper formatting: Consistent use of font sizing and spacing matching existing affiliation formatting

Security & Robustness

  • Safe base64 handling with try-except block to gracefully handle malformed data
  • Input validation: Checks for empty email strings before attempting decode
  • Fallback behavior: Falls back to name-only display if email is unavailable

Feature Completeness

  • Dual indication: Adds both inline asterisk markers and detailed correspondence section
  • Multi-author support: Properly handles multiple corresponding authors with semicolon separation
  • Format parity: Brings DOCX export in line with PDF/LaTeX generation

Minor Observations & Suggestions

1. Overly Broad Exception Handling (Low Priority)
Location: docx_writer.py:345-348 - Consider catching specific exceptions (ValueError, UnicodeDecodeError, binascii.Error) and adding logger.warning for better debugging.

2. Missing Direct Unit Tests (Medium Priority)
While integration tests pass, add specific unit tests for: single corresponding author with email, multiple corresponding authors, missing email field fallback, and malformed base64 email handling.

3. DRY Principle Opportunity (Low Priority)
The base64 email decoding logic could be extracted to email_encoder.py as a decode_email64() helper if this pattern repeats.

Security Analysis

No security concerns identified: Base64 decoding is memory-safe, no SQL injection vectors, no XSS risk, no command injection, input validation present.

Performance

Efficient implementation: AuthorAffiliationProcessor extracts in single pass, no redundant loops, base64 decode is O(n) negligible.

Overall Assessment

Quality Score: 4.5/5
Recommendation: Approve with minor suggestions

This PR is production-ready. The suggestions above are optional improvements for follow-up PRs.

Checklist Against CLAUDE.md

  • ✅ Follows PEP 8
  • ✅ Uses type hints
  • ✅ Validates inputs
  • ✅ No over-engineering
  • ✅ Security best practices
  • ⚠️ Test coverage: Integration yes / Unit tests recommended
  • ✅ Performance: No concerns

Great work on maintaining feature parity across export formats!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants