Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms #32

peter-lawrey · 2025-11-20T18:20:16Z

This PR adds contributor guidance (for humans and AI tools), replaces the Markdown README with a richer AsciiDoc version, bumps the shared BOM, and refactors core hashing and test code to be more robust and maintainable while preserving behaviour.

Functional changes

Hashing implementations

CityHash 1.1 (CityHash_1_1)
- Replaces the static NATIVE_CITY singleton with a nativeCity() helper that selects the appropriate implementation based on endianness, improving clarity while keeping behaviour the same.
- Refactors the cityHash64 long-path logic to:
  - Remove unused intermediate variables.
  - Make the data flow around vFirst/vSecond, wFirst/wSecond, x, y, and z clearer and more obviously aligned with the reference algorithm.
- AsLongHashFunctionSeeded.voidHash is now a non-transient field, so the precomputed hash for empty input is preserved across serialisation. Previously, it was recomputed or defaulted after deserialisation; now serialised instances retain their original voidHash value.
MurmurHash3 (MurmurHash_3)
- Replaces the static NATIVE_MURMUR with a nativeMurmur() helper, mirroring the CityHash change and centralising native-endianness selection.
- Simplifies the main hash implementation:
  - Uses a blockOffset variable inside the 16-byte loop for clearer offset management.
  - Extracts tail handling into dedicated methods tailK1(...) and tailK2(...), each with a compact switch that is explicitly marked with @SuppressWarnings("fallthrough").
  - This cleans up the large tail switch and reduces duplication while keeping the hash output stable.
- As with CityHash, AsLongHashFunctionSeeded.voidHash is no longer transient, so empty-input hash precomputation survives serialisation.
xxHash (XxHash_r39)
- Replaces static NATIVE_XX with a nativeXx() helper to choose the correct endianness implementation.
- Refactors hashLong, hashInt, and hashShort to:
  - Call nativeXx().toLittleEndian(...) explicitly.
  - Make the mixing order in hashLong clearer (compute mixed value, then initialise hash, then fold).
- These changes are intended to maintain bit-for-bit compatibility while improving readability and making the endianness boundary more explicit.

All three hashers are still expected to be bit-for-bit compatible with the original implementations on all supported platforms. The new test fixtures (see below) provide reference data to guard against this.

Locking behaviour test

LockingStrategyTest
- In testUpdateLockUpgradeToWriteLock, the write-lock downgrade on the first executor is now awaited via .get():
```
e1.submit(() -> rwuls().downgradeWriteToUpdateLock()).get();
```
- This ensures the downgrade completes before subsequent assertions, reducing the chance of flaky behaviour due to out-of-order task completion in multi-threaded tests.

Non-functional changes

New contributor guidance

AGENTS.md
- New, company-wide guidance for AI agents, bots and humans contributing to Chronicle / OpenHFT projects:
  - Language and character-set rules:
    - British English spelling (with reasonable exceptions for technical US spellings).
    - ISO-8859-1 (0–255) outside string literals; no Unicode “smart” punctuation.
  - Javadoc guidelines emphasising:
    - Behavioural contracts, edge cases, thread-safety and performance details.
    - Avoiding boilerplate comments that simply restate the method signature.
  - Build and test requirements:
    - mvn -q verify must pass before raising a PR.
  - Commit/PR etiquette and what to ask reviewers (traceability to decisions, requirements, docs).
  - Real-time documentation practices (keeping AsciiDoc, tests and code in sync).
  - AI-agent-specific rules (avoid redundancy, always review AI output, keep to ISO-8859-1, etc.).
  - Company-wide tag taxonomy (FN, NF-P, NF-S, NF-O, TEST, DOC, OPS, UX, RISK, ALL-*) and a standard decision-record template.
CLAUDE.md
- Repository-local guide for Claude Code:
  - Overview of Chronicle-Algorithms (hashing, bit sets, bytes access, off-heap locks).
  - Build, test and coverage targets (JaCoCo line/branch coverage).
  - Architecture summary:
    - Hashing via LongHashFunction.
    - Access/Accessor abstraction for raw bytes.
    - BitSet / frames / reusable wrappers.
    - Off-heap locking strategies and state interfaces.
  - Dependency overview and test stack (JUnit 4/5, Mockito).

These documents are documentation only and do not affect runtime behaviour, but they should materially improve the quality and consistency of future changes.

Documentation overhaul

README.md → README.adoc
- Replace the short Markdown README with a comprehensive AsciiDoc README:
  - Adds project description, features, and usage examples for:
    - Hashing (LongHashFunction usage on Strings, primitives, arrays, and off-heap memory).
    - Bit sets (BitSetFrame, ReusableBitSet, SingleThreadedFlatBitSetFrame, ConcurrentFlatBitSetFrame).
    - Bytes access (Access/Accessor pattern and supported sources).
    - Locking strategies (read/update/write semantics).
  - Documents Maven coordinates and points to Maven Central for current version.
  - Adds standard badges for Maven Central, Javadoc and Apache 2.0 licence.
  - Includes build instructions and links to related Chronicle projects.
- This aligns the project’s entry-point documentation with other Chronicle modules and better reflects current capabilities.

Dependency alignment

pom.xml
- Bump net.openhft:third-party-bom from 3.27ea5 to 3.27ea7.
- This keeps third-party dependencies in sync with the broader OpenHFT stack. No direct code is changed in this module to rely on new versions, but transitive versions (e.g., test and logging libraries) may move forward.

Code quality and style

General clean-ups
- MemoryUnit, SingleThreadedFlatBitSetFrame: comment alignment and section markers for readability only.
- HotSpotStringAccessor.handle:
  - Removes an unnecessary (T) cast:
```
return MEMORY.getObject(source, valueOffset);
```
  - Behaviour is unchanged; the method still returns the underlying backing array for the String on supported HotSpot versions.
- LongHashFunction: adjust annotation placement on array parameters (e.g. boolean @NotNull [] input) to match modern style and static-analysis expectations.
- TryAcquireOperations: replace anonymous inner classes with concise method references (e.g. LockingStrategy::tryLock), without changing semantics.

Test and tooling improvements

Hashing reference data
- Add reference Java files under src/test/resources/net/openhft/chronicle/algo/hashing/reference:
  - City64_1_1_Test.java
  - XxHashTest.java
- These contain known-good output arrays for:
  - CityHash64 with/without seeds.
  - xxHash with/without seed.
- They serve as reference fixtures for validating our implementations against upstream C reference code (and help justify the refactors to tail and endianness handling).
JUnit modernisation
- Migrate older JUnit 3 style tests to JUnit 5 Jupiter:
  - Remove extends TestCase.
  - Use @BeforeEach / @Test from Jupiter.
  - Replace assertEquals("msg", expected, actual) and instanceof checks with:
    - assertTrue, assertFalse, assertEquals from org.junit.jupiter.api.Assertions.
    - assertInstanceOf where appropriate.
- Updated classes include (non-exhaustive list):
  - BitSetTest
  - ConcurrentFlatBitSetFrameTest
  - DirectBitSetTest
  - FlatBitSetAlgorithmTest
  - ReusableBitSetTest
  - AccessTest
  - AccessUtilitiesTest
  - ByteBufferAccessTest
  - CharSequenceAccessTest
  - RandomDataInputAccessTest
  - RandomDataOutputAccessTest
  - WriteAccessTest (assumptions now use org.junit.jupiter.api.Assumptions.assumeFalse)
Stronger test assertions
- ReusableBitSetTest:
  - When iterating BitSet.Bits, now asserts that the number of visited bits equals bs.cardinality(), not just that the iterator runs. This gives stronger coverage of the iterator implementation.
- AccessUtilitiesTest:
  - Simplifies casting of BytesStore in Full access tests.
  - Uses assertInstanceOf instead of instanceof checks for HotSpotStringAccessor handles.
- LockingStrategyTest:
  - As mentioned above, the write-lock downgrade is now awaited, tightening the concurrency semantics of the test.
Randomness and performance in tests
- City64MoreTest, MurmurHash3MoreTest, MurmurHash3Test:
  - Switch from SecureRandom/Random to ThreadLocalRandom for generating test data, reducing overhead in long-running randomness tests while maintaining sufficient randomness for statistical checks.
  - Average score and timing prints now use doubles (scoreSum / 500.0, (time / (double) timeCount) / 1e3) for improved accuracy.
Misc test code hygiene
- Make inner classes such as RandomDataOutputAccessTest.RandomDataOutputImpl static to avoid accidental implicit references.
- Remove unused fields and parameters in fixture classes (e.g. unused offset in BitSetFixture).
- Minor comment and generics clean-ups (List<Object[]> instead of raw ArrayList in parameterised tests, escape #include <stdlib.h> as <...> in Javadoc).

Risk & compatibility

Core hashing algorithms (CityHash, MurmurHash3, xxHash) have been refactored but are expected to remain bit-for-bit compatible. The new reference data and existing tests should catch deviations.
Making voidHash non-transient in seeded hash implementations is a behaviourally observable change for serialised instances:
- New serialisations will preserve voidHash across serialise/deserialise cycles.
- Old serialised instances that assumed re-computation at deserialisation time may now behave more predictably, but this is a niche edge case.
The BOM bump may change transitive dependency versions, but there are no intentional API-level changes in this module to depend on new behaviour.
All other changes are documentation, test, or style-only and should not affect production runtime behaviour.

…nts, method references, and variable declarations

…tion

…detailed method descriptions and clarifying purpose

…w, and testing strategy documentation

…arifying purpose and usage

…s, outlining project overview, build commands, code architecture, and development conventions

…umentation for assertion utilities and benchmarks

…eadability

… cleaner code

peter-lawrey added 3 commits November 19, 2025 10:08

Refactor code for clarity and consistency, including updates to comme…

c702535

…nts, method references, and variable declarations

Add documentation files for AI agents and project overview

552335a

Add parameterized tests for XxHash functionality

43208dd

peter-lawrey changed the title ~~Refactor code for clarity and consistency, including updates to comme…~~ Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms Nov 20, 2025

peter-lawrey requested review from james-mcsherry and tgd November 20, 2025 19:58

peter-lawrey added 2 commits November 20, 2025 20:14

Add architectural decision records and project requirements documenta…

9536fcd

…tion

Update .gitignore to include additional IDE and temporary files

cdda006

james-mcsherry approved these changes Nov 25, 2025

View reviewed changes

peter-lawrey added 8 commits November 28, 2025 09:40

Code Analysis fixes

e3b9066

Enhance documentation for AddressWrapper and related classes, adding …

8496347

…detailed method descriptions and clarifying purpose

Add architecture overview, decision log, requirements, security revie…

0f94d37

…w, and testing strategy documentation

Improve documentation for BitSet algorithms and access strategies, cl…

3a888fe

…arifying purpose and usage

Add initial documentation for AGENTS, CLAUDE, GEMINI, and README file…

75dfc2c

…s, outlining project overview, build commands, code architecture, and development conventions

Refactor MemoryUnit and HashTestSupport classes; add package-info doc…

b59e95f

…umentation for assertion utilities and benchmarks

Remove unnecessary blank lines in various classes for improved code r…

acf4dd7

…eadability

Remove commented-out import statement in ChronicleStampedLockTest for…

d0df9ce

… cleaner code

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms #32

Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms #32

Uh oh!

peter-lawrey commented Nov 20, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms #32

Are you sure you want to change the base?

Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms #32

Uh oh!

Conversation

peter-lawrey commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Functional changes

Hashing implementations

Locking behaviour test

Non-functional changes

New contributor guidance

Documentation overhaul

Dependency alignment

Code quality and style

Test and tooling improvements

Risk & compatibility

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

peter-lawrey commented Nov 20, 2025 •

edited

Loading