Skip to content

Conversation

@peter-lawrey
Copy link
Member

@peter-lawrey peter-lawrey commented Nov 20, 2025

This PR adds contributor guidance (for humans and AI tools), replaces the Markdown README with a richer AsciiDoc version, bumps the shared BOM, and refactors core hashing and test code to be more robust and maintainable while preserving behaviour.


Functional changes

Hashing implementations

  • CityHash 1.1 (CityHash_1_1)

    • Replaces the static NATIVE_CITY singleton with a nativeCity() helper that selects the appropriate implementation based on endianness, improving clarity while keeping behaviour the same.

    • Refactors the cityHash64 long-path logic to:

      • Remove unused intermediate variables.
      • Make the data flow around vFirst/vSecond, wFirst/wSecond, x, y, and z clearer and more obviously aligned with the reference algorithm.
    • AsLongHashFunctionSeeded.voidHash is now a non-transient field, so the precomputed hash for empty input is preserved across serialisation. Previously, it was recomputed or defaulted after deserialisation; now serialised instances retain their original voidHash value.

  • MurmurHash3 (MurmurHash_3)

    • Replaces the static NATIVE_MURMUR with a nativeMurmur() helper, mirroring the CityHash change and centralising native-endianness selection.

    • Simplifies the main hash implementation:

      • Uses a blockOffset variable inside the 16-byte loop for clearer offset management.
      • Extracts tail handling into dedicated methods tailK1(...) and tailK2(...), each with a compact switch that is explicitly marked with @SuppressWarnings("fallthrough").
      • This cleans up the large tail switch and reduces duplication while keeping the hash output stable.
    • As with CityHash, AsLongHashFunctionSeeded.voidHash is no longer transient, so empty-input hash precomputation survives serialisation.

  • xxHash (XxHash_r39)

    • Replaces static NATIVE_XX with a nativeXx() helper to choose the correct endianness implementation.

    • Refactors hashLong, hashInt, and hashShort to:

      • Call nativeXx().toLittleEndian(...) explicitly.
      • Make the mixing order in hashLong clearer (compute mixed value, then initialise hash, then fold).
    • These changes are intended to maintain bit-for-bit compatibility while improving readability and making the endianness boundary more explicit.

All three hashers are still expected to be bit-for-bit compatible with the original implementations on all supported platforms. The new test fixtures (see below) provide reference data to guard against this.

Locking behaviour test

  • LockingStrategyTest

    • In testUpdateLockUpgradeToWriteLock, the write-lock downgrade on the first executor is now awaited via .get():

      e1.submit(() -> rwuls().downgradeWriteToUpdateLock()).get();
    • This ensures the downgrade completes before subsequent assertions, reducing the chance of flaky behaviour due to out-of-order task completion in multi-threaded tests.


Non-functional changes

New contributor guidance

  • AGENTS.md

    • New, company-wide guidance for AI agents, bots and humans contributing to Chronicle / OpenHFT projects:

      • Language and character-set rules:

        • British English spelling (with reasonable exceptions for technical US spellings).
        • ISO-8859-1 (0–255) outside string literals; no Unicode “smart” punctuation.
      • Javadoc guidelines emphasising:

        • Behavioural contracts, edge cases, thread-safety and performance details.
        • Avoiding boilerplate comments that simply restate the method signature.
      • Build and test requirements:

        • mvn -q verify must pass before raising a PR.
      • Commit/PR etiquette and what to ask reviewers (traceability to decisions, requirements, docs).

      • Real-time documentation practices (keeping AsciiDoc, tests and code in sync).

      • AI-agent-specific rules (avoid redundancy, always review AI output, keep to ISO-8859-1, etc.).

      • Company-wide tag taxonomy (FN, NF-P, NF-S, NF-O, TEST, DOC, OPS, UX, RISK, ALL-*) and a standard decision-record template.

  • CLAUDE.md

    • Repository-local guide for Claude Code:

      • Overview of Chronicle-Algorithms (hashing, bit sets, bytes access, off-heap locks).

      • Build, test and coverage targets (JaCoCo line/branch coverage).

      • Architecture summary:

        • Hashing via LongHashFunction.
        • Access/Accessor abstraction for raw bytes.
        • BitSet / frames / reusable wrappers.
        • Off-heap locking strategies and state interfaces.
      • Dependency overview and test stack (JUnit 4/5, Mockito).

These documents are documentation only and do not affect runtime behaviour, but they should materially improve the quality and consistency of future changes.

Documentation overhaul

  • README.mdREADME.adoc

    • Replace the short Markdown README with a comprehensive AsciiDoc README:

      • Adds project description, features, and usage examples for:

        • Hashing (LongHashFunction usage on Strings, primitives, arrays, and off-heap memory).
        • Bit sets (BitSetFrame, ReusableBitSet, SingleThreadedFlatBitSetFrame, ConcurrentFlatBitSetFrame).
        • Bytes access (Access/Accessor pattern and supported sources).
        • Locking strategies (read/update/write semantics).
      • Documents Maven coordinates and points to Maven Central for current version.

      • Adds standard badges for Maven Central, Javadoc and Apache 2.0 licence.

      • Includes build instructions and links to related Chronicle projects.

    • This aligns the project’s entry-point documentation with other Chronicle modules and better reflects current capabilities.

Dependency alignment

  • pom.xml

    • Bump net.openhft:third-party-bom from 3.27ea5 to 3.27ea7.
    • This keeps third-party dependencies in sync with the broader OpenHFT stack. No direct code is changed in this module to rely on new versions, but transitive versions (e.g., test and logging libraries) may move forward.

Code quality and style

  • General clean-ups

    • MemoryUnit, SingleThreadedFlatBitSetFrame: comment alignment and section markers for readability only.

    • HotSpotStringAccessor.handle:

      • Removes an unnecessary (T) cast:

        return MEMORY.getObject(source, valueOffset);
      • Behaviour is unchanged; the method still returns the underlying backing array for the String on supported HotSpot versions.

    • LongHashFunction: adjust annotation placement on array parameters (e.g. boolean @NotNull [] input) to match modern style and static-analysis expectations.

    • TryAcquireOperations: replace anonymous inner classes with concise method references (e.g. LockingStrategy::tryLock), without changing semantics.


Test and tooling improvements

  • Hashing reference data

    • Add reference Java files under src/test/resources/net/openhft/chronicle/algo/hashing/reference:

      • City64_1_1_Test.java
      • XxHashTest.java
    • These contain known-good output arrays for:

      • CityHash64 with/without seeds.
      • xxHash with/without seed.
    • They serve as reference fixtures for validating our implementations against upstream C reference code (and help justify the refactors to tail and endianness handling).

  • JUnit modernisation

    • Migrate older JUnit 3 style tests to JUnit 5 Jupiter:

      • Remove extends TestCase.

      • Use @BeforeEach / @Test from Jupiter.

      • Replace assertEquals("msg", expected, actual) and instanceof checks with:

        • assertTrue, assertFalse, assertEquals from org.junit.jupiter.api.Assertions.
        • assertInstanceOf where appropriate.
    • Updated classes include (non-exhaustive list):

      • BitSetTest
      • ConcurrentFlatBitSetFrameTest
      • DirectBitSetTest
      • FlatBitSetAlgorithmTest
      • ReusableBitSetTest
      • AccessTest
      • AccessUtilitiesTest
      • ByteBufferAccessTest
      • CharSequenceAccessTest
      • RandomDataInputAccessTest
      • RandomDataOutputAccessTest
      • WriteAccessTest (assumptions now use org.junit.jupiter.api.Assumptions.assumeFalse)
  • Stronger test assertions

    • ReusableBitSetTest:

      • When iterating BitSet.Bits, now asserts that the number of visited bits equals bs.cardinality(), not just that the iterator runs. This gives stronger coverage of the iterator implementation.
    • AccessUtilitiesTest:

      • Simplifies casting of BytesStore in Full access tests.
      • Uses assertInstanceOf instead of instanceof checks for HotSpotStringAccessor handles.
    • LockingStrategyTest:

      • As mentioned above, the write-lock downgrade is now awaited, tightening the concurrency semantics of the test.
  • Randomness and performance in tests

    • City64MoreTest, MurmurHash3MoreTest, MurmurHash3Test:

      • Switch from SecureRandom/Random to ThreadLocalRandom for generating test data, reducing overhead in long-running randomness tests while maintaining sufficient randomness for statistical checks.
      • Average score and timing prints now use doubles (scoreSum / 500.0, (time / (double) timeCount) / 1e3) for improved accuracy.
  • Misc test code hygiene

    • Make inner classes such as RandomDataOutputAccessTest.RandomDataOutputImpl static to avoid accidental implicit references.
    • Remove unused fields and parameters in fixture classes (e.g. unused offset in BitSetFixture).
    • Minor comment and generics clean-ups (List<Object[]> instead of raw ArrayList in parameterised tests, escape #include <stdlib.h> as &lt;...&gt; in Javadoc).

Risk & compatibility

  • Core hashing algorithms (CityHash, MurmurHash3, xxHash) have been refactored but are expected to remain bit-for-bit compatible. The new reference data and existing tests should catch deviations.

  • Making voidHash non-transient in seeded hash implementations is a behaviourally observable change for serialised instances:

    • New serialisations will preserve voidHash across serialise/deserialise cycles.
    • Old serialised instances that assumed re-computation at deserialisation time may now behave more predictably, but this is a niche edge case.
  • The BOM bump may change transitive dependency versions, but there are no intentional API-level changes in this module to depend on new behaviour.

  • All other changes are documentation, test, or style-only and should not affect production runtime behaviour.

@peter-lawrey peter-lawrey changed the title Refactor code for clarity and consistency, including updates to comme… Document AI contribution rules and modernise hashing + test suite in Chronicle-Algorithms Nov 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants