fix: Add corresponding author indication to DOCX export #272

paxcalpt · 2025-12-23T09:06:47Z

Issue

The DOCX converter was not indicating corresponding authors, even though the metadata contained the field and email information.

Changes

✅ Add asterisk (*) markers next to corresponding authors in the author list
✅ Add 'Correspondence:' section after affiliations with contact information
✅ Support both plain text and base64-encoded fields
✅ Handle multiple corresponding authors with semicolon separation

Example Output

Before this fix, the author line would show:

Jane Smith¹, John Doe², Alice Johnson¹,²

After this fix, it shows:

Jane Smith¹*, John Doe², Alice Johnson¹,²*

* Correspondence: Jane Smith (jane.smith@example.com); Alice Johnson (alice@example.com)

Testing

✅ All existing DOCX tests pass (37 passed, 9 skipped)
✅ Manual verification with test script confirms:
- Corresponding author markers appear correctly
- Correspondence section is added after affiliations
- Emails are properly decoded from base64 when needed

- Add asterisk (*) markers next to corresponding authors in author list - Add 'Correspondence:' section after affiliations with contact info - Support both plain text email and base64-encoded email64 fields - Handle multiple corresponding authors with semicolon separation Fixes issue where DOCX export was not indicating corresponding authors even though the metadata contained the information.

github-actions · 2025-12-23T09:09:05Z

Code Review for PR #272: Add corresponding author indication to DOCX export

Thank you for this PR! I've reviewed the changes according to the project's coding standards and best practices outlined in CLAUDE.md. Overall, this is a solid implementation that properly addresses the feature gap. Here's my detailed feedback:

✅ Strengths

1. Code Quality

Clean implementation: The code follows the existing patterns in docx_writer.py and integrates seamlessly with the _add_title_page method
Proper placement: The correspondence section is logically placed after affiliations, matching the layout pattern from PDF export
Good code structure: The logic is clear and easy to follow with appropriate comments

2. Security & Error Handling

Safe base64 decoding: Properly wrapped in try-except to handle invalid encoded data gracefully (lines 279-282)
Silent failure is appropriate: Catches generic Exception and sets empty email on failure, which is correct for this use case (no sensitive data exposure)
No injection risks: All data is properly handled through python-docx's API with no raw string interpolation

3. Consistency

Matches PDF output: Brings DOCX export in line with LaTeX/PDF generation which uses \Letter symbol
Follows existing patterns: Uses the same author.get() pattern for safe dictionary access throughout
Proper formatting: Uses superscript asterisk marker and appropriate font sizing (Pt(10))

🔍 Observations & Minor Suggestions

1. Exception Handling (Line 281)

Current code:

except Exception:
    email = ""

Consideration: While catching generic Exception is acceptable here since the fallback behavior is safe, you could be slightly more specific for better debugging:

except (ValueError, binascii.Error, UnicodeDecodeError):
    email = ""

This would catch the expected exceptions from base64 decoding while allowing truly unexpected errors to surface in logs. However, this is a minor point and the current implementation is acceptable.

2. Font Size Application (Line 290)

Current code:

corr_para.paragraph_format.space_after = Pt(12)
corr_para.runs[-1].font.size = Pt(10)

Observation: The font size is only applied to the last run. If there are multiple corresponding authors, only the last author's email will have the 10pt font size applied.

Suggested improvement:

corr_para.paragraph_format.space_after = Pt(12)
# Apply consistent font size to all runs in the correspondence paragraph
for run in corr_para.runs:
    run.font.size = Pt(10)

This ensures consistent formatting across all correspondence information.

3. Test Coverage

Good: The PR description mentions manual testing and that existing tests pass.

Suggestion: Consider adding a unit test specifically for the corresponding author functionality to prevent regressions:

def test_corresponding_author_markers_in_docx(tmp_path):
    """Test that corresponding authors get asterisk markers and correspondence section."""
    test_dir = tmp_path / "corr_author_test"
    test_dir.mkdir()
    
    main_file = test_dir / "01_MAIN.md"
    main_file.write_text("""---
title: "Test Manuscript"
authors:
  - name: "Jane Smith"
    affiliations: ["Uni A"]
    corresponding_author: true
    email: "jane@example.com"
  - name: "John Doe"
    affiliations: ["Uni B"]
affiliations:
  - shortname: "Uni A"
    full_name: "University A"
  - shortname: "Uni B"
    full_name: "University B"
---

# Test
Content here.
""")
    
    bib_file = test_dir / "03_REFERENCES.bib"
    bib_file.write_text("@article{test2023,author={Test},title={Test},year={2023}}")
    
    exporter = DocxExporter(manuscript_path=str(test_dir))
    result_path = exporter.export()
    doc = Document(str(result_path))
    all_text = "\n".join([p.text for p in doc.paragraphs])
    
    # Check for correspondence section
    assert "* Correspondence:" in all_text
    assert "Jane Smith" in all_text
    assert "jane@example.com" in all_text
    
    # Check for asterisk marker in author line
    # (This requires checking runs with superscript formatting)
    has_superscript_asterisk = False
    for para in doc.paragraphs:
        if "Jane Smith" in para.text:
            for run in para.runs:
                if run.text == "*" and run.font.superscript:
                    has_superscript_asterisk = True
                    break
    assert has_superscript_asterisk

This would belong in tests/integration/test_docx_export_integration.py.

4. Base64 Decoding Consistency

The project has a dedicated email_encoder.py utility (src/rxiv_maker/utils/email_encoder.py) with decode_email() function that includes validation. Consider using that instead of inline decoding:

from ..utils.email_encoder import decode_email

# Later in the code:
if not email:
    email64 = author.get("email64", "")
    if email64:
        try:
            email = decode_email(email64)
        except ValueError:
            email = ""

This would provide:

Better validation (email format checking)
Code reuse (DRY principle)
Consistent behavior with other parts of the codebase

🚀 Performance Considerations

Efficient: The code only processes corresponding authors once during title page generation
No unnecessary loops: Direct list comprehension for filtering corresponding authors is optimal
Memory usage: Minimal additional memory footprint

📋 Testing Checklist

According to CLAUDE.md testing requirements:

✅ Existing tests pass (37 passed, 9 skipped as mentioned)
⚠️ Consider adding: Specific unit/integration test for corresponding author feature
⚠️ Manual verification recommended: Test with multiple corresponding authors to verify font sizing
✅ Manual testing completed per PR description

📝 Documentation

✅ PR description is clear and thorough
✅ Code comments are appropriate
✅ Inline documentation matches code behavior

🎯 Final Recommendation

Status: APPROVE with minor suggestions

The implementation is solid and ready to merge. The suggestions above are optional improvements that could be addressed either before merge or in a follow-up PR:

Critical: None
Recommended: Apply font size to all runs in correspondence paragraph (not just last run)
Optional: Add dedicated test case, use existing decode_email() utility

Great work on bringing DOCX export to feature parity with PDF export! The code is clean, secure, and follows the project's conventions.

Review conducted according to rxiv-maker CLAUDE.md guidelines, checking: code quality, security, performance, test coverage, and consistency with existing patterns.

Copilot

Pull request overview

This PR adds corresponding author indication to DOCX exports, bringing it in line with PDF/LaTeX generation which already includes this functionality. The implementation adds asterisk markers next to corresponding authors in the author list and includes a "Correspondence:" section with contact information after the affiliations.

Key changes:

Adds asterisk (*) superscript markers next to corresponding authors in the author list
Adds a "Correspondence:" section after affiliations with author names and email addresses
Supports both plain text email and base64-encoded email64 fields for contact information

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-23T09:09:58Z

src/rxiv_maker/exporters/docx_writer.py

+                    corr_para.add_run(name)
+
+            corr_para.paragraph_format.space_after = Pt(12)
+            corr_para.runs[-1].font.size = Pt(10)


The font size is being set only on the last run of the paragraph. This means the font size (10pt) will only apply to the last author's name/email text, while the asterisk, "Correspondence:" label, and other author entries will remain at the default size. For consistent styling, the font size should be set on all runs in the correspondence paragraph, similar to how it's done for affiliations (line 255) where each run's font size is set individually.

Suggested change

corr_para.runs[-1].font.size = Pt(10)

for run in corr_para.runs:

run.font.size = Pt(10)

Copilot · 2025-12-23T09:09:59Z

src/rxiv_maker/exporters/docx_writer.py

+                        try:
+                            email = base64.b64decode(email64).decode("utf-8")
+                        except Exception:
+                            email = ""


The bare except clause catches all exceptions, including system exceptions like KeyboardInterrupt and SystemExit. This should be more specific by catching only the expected exception types (e.g., binascii.Error, ValueError) that could occur during base64 decoding. Using a bare except is an anti-pattern that can hide bugs and make debugging more difficult.

github-actions · 2025-12-23T09:18:00Z

Code Review: PR #272 - Corresponding Author Indication

Summary

This PR adds corresponding author indication to DOCX exports, bringing it in line with PDF/LaTeX generation. The implementation properly handles the asterisk markers and correspondence section with email decoding.

✅ Strengths

1. Feature Completeness

Adds asterisk markers next to corresponding authors (lines 235-238)
Creates a dedicated correspondence section with contact info (lines 260-289)
Handles multiple corresponding authors with semicolon separation
Supports both plain email and base64-encoded email64 fields

2. Code Quality

Clean, readable implementation
Proper error handling for base64 decoding (try/except with fallback)
Consistent formatting with existing code style
Good use of list comprehension for filtering corresponding authors

3. User Experience

Professional output format matches academic standards
Proper spacing and typography (Pt(10), Pt(12) spacing)
Superscript markers for visual consistency

4. Testing

All 37 existing DOCX tests pass
Manual verification confirms correct behavior

🔍 Observations & Suggestions

1. Merge Conflict with PR #273
⚠️ IMPORTANT: This PR conflicts with PR #273, which modifies the exact same method (_add_title_page). Both PRs:

Add import base64 at the top
Add corresponding author handling in _add_title_page

Recommendation:

Coordinate the merge order with PR feat: DOCX export enhancements - configuration options, formatting fixes, and code centralization #273
The second PR to merge will need to resolve conflicts
Consider merging fix: Add corresponding author indication to DOCX export #272 first since it's a bug fix, then feat: DOCX export enhancements - configuration options, formatting fixes, and code centralization #273 as an enhancement

2. Test Coverage Enhancement
Consider adding a dedicated test for corresponding author indication:

def test_corresponding_author_indication(sample_manuscript_path, tmp_path):
    """Test that corresponding authors are marked with asterisks."""
    # Ensure test fixture has corresponding_author metadata
    exporter = DocxExporter(manuscript_path=sample_manuscript_path)
    result_path = exporter.export()
    doc = Document(str(result_path))
    
    all_text = "\n".join([p.text for p in doc.paragraphs])
    
    # Check for asterisk markers in author line
    author_para_text = doc.paragraphs[1].text  # Adjust index as needed
    assert "*" in author_para_text, "Expected asterisk marker for corresponding author"
    
    # Check for correspondence section
    assert "Correspondence:" in all_text
    assert "(" in all_text and ")" in all_text  # Check for email format

3. Edge Case: No Email Available
The code handles missing emails gracefully (lines 283-285), but consider logging a warning:

if email:
    corr_para.add_run(f"{name} ({email})")
else:
    corr_para.add_run(name)
    logger.warning(f"Corresponding author {name} has no email address")

4. Base64 Decoding Error Handling
The bare except Exception: at line 282 is appropriate for robustness, but consider being more specific:

try:
    email = base64.b64decode(email64).decode("utf-8")
except (ValueError, UnicodeDecodeError, Exception) as e:
    logger.warning(f"Failed to decode email64 for {name}: {e}")
    email = ""

5. Font Size Inconsistency
Line 290 sets font size on the last run only:

corr_para.runs[-1].font.size = Pt(10)

This should probably apply to all runs in the correspondence paragraph for consistency. Consider:

# Set font size for entire correspondence section
for run in corr_para.runs:
    run.font.size = Pt(10)

6. Documentation
The PR description mentions this brings DOCX in line with PDF/LaTeX, which uses \Letter symbol. Consider:

Adding a comment in the code referencing this parity
Updating documentation to mention this feature
Adding to the changelog

🛡️ Security Considerations

✅ Good security practices

Base64 decoding is properly error-handled
No user input is directly processed
Email addresses are from trusted metadata files

Minor suggestion: The bare Exception catch is fine for robustness, but being more specific would help catch unexpected issues in development.

🚀 Performance Considerations

✅ No performance concerns

Simple list comprehension for filtering
Minimal overhead (one additional paragraph)
No expensive operations

🐛 Potential Issues

1. Index-based Font Size Assignment (Line 290)

corr_para.runs[-1].font.size = Pt(10)

This only sets the font size on the last run. If the intention is to make the entire correspondence section smaller, this should loop through all runs:

for run in corr_para.runs:
    run.font.size = Pt(10)

2. Multiple Corresponding Authors Formatting
The semicolon separation is good, but consider adding a space after the semicolon for better readability:

if i > 0:
    corr_para.add_run("; ")  # Already has space - good!

✅ This is already correct in the code.

📊 Code Metrics

Lines changed: +40/-1 (net +39)
Complexity: Low (straightforward logic)
Maintainability: High (clear and readable)
Test coverage: Good (all tests pass, could add specific test)

🎯 Final Recommendation

APPROVE with merge coordination

This is a well-implemented bug fix that:

Addresses a real gap in DOCX export functionality
Uses clean, maintainable code
Handles edge cases appropriately
Brings DOCX exports to parity with PDF/LaTeX output

Required Actions:

⚠️ Coordinate merge with PR feat: DOCX export enhancements - configuration options, formatting fixes, and code centralization #273 to avoid conflicts
Consider merging this PR first (it's a bug fix)

Suggested Follow-ups (not blocking):

Fix font size application to all runs (line 290)
Add specific unit test for corresponding author feature
Enhance error logging for base64 decoding
Update user documentation
Add entry to CHANGELOG.md

📋 Checklist

✅ Code quality and best practices
⚠️ Minor issue with font size application (line 290)
✅ Good performance characteristics
✅ Good security practices
⚠️ Merge conflict with PR feat: DOCX export enhancements - configuration options, formatting fixes, and code centralization #273 needs coordination
⚠️ Test coverage could be enhanced with specific test

Excellent work on bringing DOCX export to feature parity with PDF/LaTeX output!

Review conducted following rxiv-maker CLAUDE.md guidelines

github-actions · 2025-12-23T09:29:00Z

Code Review: PR 272 - Add corresponding author indication to DOCX export

Overview: This PR adds corresponding author markers (asterisks) and a Correspondence section to DOCX exports, bringing it in line with PDF/LaTeX generation. This is a focused, well-scoped bug fix.

STRENGTHS:

Clear problem statement: The issue is well-defined (missing corresponding author indication)
Minimal, focused changes: Only modifies what is necessary to fix the issue
Good documentation: PR description includes before/after examples
Proper formatting: Uses superscript asterisks and adds dedicated Correspondence section
Email handling: Supports both plain text email and base64-encoded email64 fields
Multiple authors: Handles multiple corresponding authors with semicolon separation
All existing tests pass (37 passed, 9 skipped)

ISSUES AND CONCERNS:

MERGE CONFLICT WITH PR 273
This PR modifies the same section of docx_writer.py as PR 273. PR 273 actually includes ALL changes from this PR plus additional features.
Impact: If PR 273 merges first, this PR becomes redundant. If this PR merges first, PR 273 will have conflicts.
Recommendation: Coordinate merge order - suggest merging this PR first (smaller, focused fix), then have PR 273 rebase.
MISSING TEST COVERAGE
No new tests were added to verify the corresponding author functionality.
Missing test cases needed in tests/integration/test_docx_export_integration.py:

test_export_with_corresponding_author() - Verify asterisk markers appear
test_export_with_multiple_corresponding_authors() - Verify semicolon separation
test_export_with_corresponding_author_email64() - Verify base64 decoding
test_export_without_corresponding_author() - Verify graceful handling when none present

According to CLAUDE.md, the project aims for more than 80 percent test coverage.

SECURITY: BROAD EXCEPTION HANDLING
Lines 303-306 (in the PR diff) catch Exception which is too broad:
try: email = base64.b64decode(email64).decode("utf-8")
except Exception: email = ""

Recommendation: Catch specific exceptions like (ValueError, UnicodeDecodeError, binascii.Error) to avoid hiding unexpected errors.

INCONSISTENT FONT SIZING
The code sets font size to 8pt for most runs (marker, label, separator, info) which is good for consistency. However, verify there is no conflicting font size assignment elsewhere that could override this.
MINOR: IMPORT STATEMENT
The base64 module is imported but only used in one place. This is fine, but consider adding a comment or docstring noting that it is for email64 decoding.

RECOMMENDATIONS:

High Priority:

Coordinate with PR 273 to avoid merge conflicts
Add test coverage for corresponding author functionality
Improve exception handling to catch specific exceptions

Medium Priority:
4. Verify font sizing consistency throughout the correspondence section
5. Add CHANGELOG.md entry documenting this bug fix

TESTING CHECKLIST:

All existing tests pass (37 passed, 9 skipped)
New tests added for corresponding author markers
New tests added for multiple corresponding authors
New tests added for email64 decoding
Manual verification with test manuscripts
CHANGELOG.md updated

CODE STYLE COMPLIANCE (per CLAUDE.md):

PEP 8 compliance: Code appears properly formatted
Self-documenting: Variable names are clear (corresponding_authors, corr_para, etc)
Security: Exception handling needs improvement
Test coverage: Missing tests for new functionality
Backward compatibility: Maintains existing behavior when no corresponding authors present

OVERALL ASSESSMENT:

This is a well-scoped bug fix that addresses a real gap in DOCX export functionality. The implementation is clean and follows existing patterns in the codebase.

However, the PR needs:

Critical: Coordination with PR 273 to avoid merge conflicts
Important: Test coverage for the new functionality
Nice-to-have: Improved exception handling

Recommendation: Add tests and coordinate merge order with PR 273, then this will be ready to merge.

Great work identifying and fixing this gap! The implementation is clean and follows the existing code patterns well.

github-actions · 2025-12-24T11:31:50Z

Code Review: Corresponding Author Indication in DOCX Export

Thank you for this well-implemented fix! The PR successfully addresses the missing corresponding author indication in DOCX exports.

Strengths

Code Quality & Consistency

Excellent reuse of existing infrastructure: Leverages AuthorAffiliationProcessor to extract corresponding authors, maintaining consistency with co-first author handling
Clean implementation: Follows the same pattern as co-first author markers, making it easy to understand and maintain
Proper formatting: Consistent use of font sizing and spacing matching existing affiliation formatting

Security & Robustness

Safe base64 handling with try-except block to gracefully handle malformed data
Input validation: Checks for empty email strings before attempting decode
Fallback behavior: Falls back to name-only display if email is unavailable

Feature Completeness

Dual indication: Adds both inline asterisk markers and detailed correspondence section
Multi-author support: Properly handles multiple corresponding authors with semicolon separation
Format parity: Brings DOCX export in line with PDF/LaTeX generation

Minor Observations & Suggestions

1. Overly Broad Exception Handling (Low Priority)
Location: docx_writer.py:345-348 - Consider catching specific exceptions (ValueError, UnicodeDecodeError, binascii.Error) and adding logger.warning for better debugging.

2. Missing Direct Unit Tests (Medium Priority)
While integration tests pass, add specific unit tests for: single corresponding author with email, multiple corresponding authors, missing email field fallback, and malformed base64 email handling.

3. DRY Principle Opportunity (Low Priority)
The base64 email decoding logic could be extracted to email_encoder.py as a decode_email64() helper if this pattern repeats.

Security Analysis

No security concerns identified: Base64 decoding is memory-safe, no SQL injection vectors, no XSS risk, no command injection, input validation present.

Performance

Efficient implementation: AuthorAffiliationProcessor extracts in single pass, no redundant loops, base64 decode is O(n) negligible.

Overall Assessment

Quality Score: 4.5/5
Recommendation: Approve with minor suggestions

This PR is production-ready. The suggestions above are optional improvements for follow-up PRs.

Checklist Against CLAUDE.md

✅ Follows PEP 8
✅ Uses type hints
✅ Validates inputs
✅ No over-engineering
✅ Security best practices
⚠️ Test coverage: Integration yes / Unit tests recommended
✅ Performance: No concerns

Great work on maintaining feature parity across export formats!

Copilot AI review requested due to automatic review settings December 23, 2025 09:06

Copilot started reviewing on behalf of paxcalpt December 23, 2025 09:07 View session

Copilot AI reviewed Dec 23, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Add corresponding author indication to DOCX export #272

fix: Add corresponding author indication to DOCX export #272

Uh oh!

paxcalpt commented Dec 23, 2025

Uh oh!

github-actions bot commented Dec 23, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

github-actions bot commented Dec 23, 2025

Uh oh!

github-actions bot commented Dec 23, 2025

Uh oh!

github-actions bot commented Dec 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	corr_para.runs[-1].font.size = Pt(10)
	for run in corr_para.runs:
	run.font.size = Pt(10)

fix: Add corresponding author indication to DOCX export #272

Are you sure you want to change the base?

fix: Add corresponding author indication to DOCX export #272

Uh oh!

Conversation

paxcalpt commented Dec 23, 2025

Issue

Changes

Example Output

Testing

Related

Uh oh!

github-actions bot commented Dec 23, 2025

Code Review for PR #272: Add corresponding author indication to DOCX export

✅ Strengths

1. Code Quality

2. Security & Error Handling

3. Consistency

🔍 Observations & Minor Suggestions

1. Exception Handling (Line 281)

2. Font Size Application (Line 290)

3. Test Coverage

4. Base64 Decoding Consistency

🚀 Performance Considerations

📋 Testing Checklist

📝 Documentation

🎯 Final Recommendation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 23, 2025

Code Review: PR #272 - Corresponding Author Indication

Summary

✅ Strengths

🔍 Observations & Suggestions

🛡️ Security Considerations

🚀 Performance Considerations

🐛 Potential Issues

📊 Code Metrics

🎯 Final Recommendation

📋 Checklist

Uh oh!

github-actions bot commented Dec 23, 2025

Uh oh!

github-actions bot commented Dec 24, 2025

Code Review: Corresponding Author Indication in DOCX Export

Strengths

Minor Observations & Suggestions

Security Analysis

Performance

Overall Assessment

Checklist Against CLAUDE.md

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants