
Add streaming interface to era transition for initial funds injection #5572

Merged: f-f merged 2 commits into master from f-f/streaming-genesis-data1 on Apr 16, 2026

f-f (Contributor) commented Feb 11, 2026

Description

This patch implements the groundwork for #5549, to support data injection from genesis files using a streaming framework rather than relying on lazy IO.

The spec is the one that @lehins laid out in the issue, but this PR only implements streaming for initialFunds; the remaining fields can easily be added once the approach is solid.
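To make the streaming-vs-lazy-IO distinction concrete, here is a minimal sketch of the constant-memory pattern, assuming nothing beyond base and bytestring. foldFileChunks is a hypothetical helper, not the API added in this PR (which goes through a streaming framework): it reads one fixed-size chunk at a time and keeps a strict accumulator, so only the current chunk is resident no matter how large the file is.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as BS
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper (not the PR's API): fold a step function over a file's
-- bytes, reading at most chunkSize bytes per call. Only the current chunk is
-- held in memory, and the bang pattern keeps the accumulator evaluated (to
-- WHNF) on every step, so memory use does not grow with the file size.
foldFileChunks :: Int -> FilePath -> (a -> BS.ByteString -> a) -> a -> IO a
foldFileChunks chunkSize path step z =
  withBinaryFile path ReadMode $ \h ->
    let go !acc = do
          chunk <- BS.hGetSome h chunkSize
          if BS.null chunk
            then pure acc -- hGetSome returns an empty chunk only at end of file
            else go (step acc chunk)
     in go z
```

For example, foldFileChunks 65536 path (\n c -> n + BS.length c) 0 counts the bytes of a file in 64 KiB chunks. With lazy IO, by contrast, when reads happen and when the handle is closed depend on how the result is consumed, which is hard to reason about for large genesis files.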

I have a few code-specific notes that I will add as comments on the diff.

Checklist

  • Commits in meaningful sequence and with useful messages.
  • Tests added or updated when needed.
  • CHANGELOG.md files updated for packages with externally visible changes.
    NOTE: New section is never added with the code changes. (See RELEASING.md).
  • Versions updated in .cabal and CHANGELOG.md files when necessary, according to the
    versioning process.
  • Version bounds in .cabal files updated when necessary.
    NOTE: If bounds change in a cabal file, that package itself must have a version increase. (See RELEASING.md).
  • Code formatted (use scripts/fourmolize.sh).
  • Cabal files formatted (use scripts/cabal-format.sh).
  • CDDL files are up to date (use scripts/gen-cddl.sh).
  • hie.yaml updated (use scripts/gen-hie.sh).
  • Self-reviewed the diff.

f-f requested a review from a team as a code owner on February 11, 2026 14:03
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Transition.hs Outdated
f-f force-pushed the f-f/streaming-genesis-data1 branch from 3e9ddc0 to a949d55 on February 11, 2026 14:11
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread libs/cardano-ledger-core/cardano-ledger-core.cabal Outdated
f-f (Contributor, Author) commented Feb 11, 2026

The elephant in the room is "tests": currently there are no tests, and I'm not sure what kind of tests we would like for this.

I have a few scripts that I used to verify that:

  1. the incremental hashing is correct against the standard hashing
  2. the streaming code is properly streaming

The tests could be some variation of these scripts; feedback welcome:

Hash check: foldInjectionSource fails if the hash doesn't match. (The tests for this should eventually go in base, since that is where the hashing code will live, but if we merge it here first, it makes sense to have some tests for it here.)

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}

module Main where

import Cardano.Crypto.Hash (Blake2b_256)
import qualified Cardano.Crypto.Hash.Class as Hash
import Cardano.Crypto.Libsodium (sodiumInit)
import Cardano.Ledger.Shelley.Genesis (InjectionSource (..), foldInjectionSource)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import Data.Text (Text)
import System.Directory (removeFile)
import System.IO (BufferMode (..), IOMode (..), hSetBuffering, withFile)

main :: IO ()
main = do
  sodiumInit
  let path = "/tmp/hash-check.json"
      n = 1000 :: Int

  generateJsonFile path n

  fileHash <- Hash.hashWith @Blake2b_256 id <$> BS.readFile path

  let source = FromFile path fileHash :: InjectionSource Text Text
      wfh fp k = withFile fp ReadMode k
  count <- foldInjectionSource wfh source (\(!acc) _ -> acc + 1 :: Int) 0

  removeFile path
  putStrLn $ "Entries: " <> show count <> " (expected " <> show n <> ")"

generateJsonFile :: FilePath -> Int -> IO ()
generateJsonFile path n =
  withFile path WriteMode $ \h -> do
    hSetBuffering h (BlockBuffering (Just (64 * 1024)))
    BS8.hPut h "{"
    mapM_
      ( \i -> do
          let comma = if i == (1 :: Int) then "" else ","
          BS8.hPut h $ comma <> "\"key_" <> BS8.pack (show i) <> "\": \"val_" <> BS8.pack (show i) <> "\""
      )
      [1 .. n]
    BS8.hPut h "}"
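The property the hash check relies on is that hashing chunk by chunk gives the same digest as hashing the whole file in one go, so the digest can be verified at the end of the same pass that streams the entries. A minimal sketch of that shape, using 64-bit FNV-1a as a hypothetical stand-in for Blake2b-256 so that it needs only base and bytestring; fnvInit, fnvUpdate, fnvFinalize and foldVerified are illustrative names, not the PR's API:

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (xor)
import qualified Data.ByteString as BS
import Data.Word (Word64)

-- 64-bit FNV-1a, a hypothetical stand-in for Blake2b-256. Like Blake2b it can
-- be fed input piece by piece: init once, update per chunk, finalize at the end.
newtype Ctx = Ctx Word64

fnvInit :: Ctx
fnvInit = Ctx 0xcbf29ce484222325 -- FNV-1a 64-bit offset basis

fnvUpdate :: Ctx -> BS.ByteString -> Ctx
fnvUpdate (Ctx h0) = Ctx . BS.foldl' step h0
  where
    step h byte = (h `xor` fromIntegral byte) * 0x100000001b3 -- FNV prime

fnvFinalize :: Ctx -> Word64
fnvFinalize (Ctx h) = h

-- Standard (one-shot) hash of the whole input, for comparison.
fnvOneShot :: BS.ByteString -> Word64
fnvOneShot = fnvFinalize . fnvUpdate fnvInit

-- Fold over chunks while hashing them in the same pass; the result is only
-- returned if the final digest matches the expected one.
foldVerified :: Word64 -> (a -> BS.ByteString -> a) -> a -> [BS.ByteString] -> Either String a
foldVerified expected step = go fnvInit
  where
    go !ctx !acc [] =
      let actual = fnvFinalize ctx
       in if actual == expected
            then Right acc
            else Left ("hash mismatch: got " ++ show actual ++ ", expected " ++ show expected)
    go !ctx !acc (c : cs) = go (fnvUpdate ctx c) (step acc c) cs
```

In this sketch the fold result is only returned once the digest has been checked, so a caller never observes a result computed from a file with the wrong hash. Swapping in Blake2b-256 would only touch the fnv* functions and the digest type.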

Streaming check: this creates a file with about 3GB of JSON, then reads it back with streaming.
Limit the heap when running it, e.g. to 300MB, so that the program fails if it tries to read the whole file at once: cabal run streaming-check -- +RTS -s -M300M

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE NumericUnderscores #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}

module Main where

import Cardano.Crypto.Hash (Blake2b_256)
import qualified Cardano.Crypto.Hash.Class as Hash
import Cardano.Crypto.Libsodium (sodiumInit)
import qualified Cardano.Ledger.Crypto.Blake2b.Incremental as Blake2b
import Cardano.Ledger.Shelley.Genesis (InjectionSource (..), foldInjectionSource)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import Data.Text (Text)
import System.Directory (removeFile)
import System.IO (BufferMode (..), IOMode (..), hSetBuffering, withFile)

main :: IO ()
main = do
  sodiumInit
  let path = "/tmp/streaming-check.json"
      n = 100_000_000 :: Int

  putStrLn $ "Generating " <> show n <> " entries..."
  fileHash <- generateAndHash path n

  putStrLn "Streaming fold..."
  let source = FromFile path fileHash :: InjectionSource Text Text
      wfh fp k = withFile fp ReadMode k
  count <- foldInjectionSource wfh source (\(!acc) _ -> acc + 1 :: Int) 0

  removeFile path
  putStrLn $ "Entries: " <> show count <> " (expected " <> show n <> ")"

generateAndHash :: FilePath -> Int -> IO (Hash.Hash Blake2b_256 BS.ByteString)
generateAndHash path n = do
  ctx <- Blake2b.blake2bInit @32
  withFile path WriteMode $ \h -> do
    hSetBuffering h (BlockBuffering (Just (64 * 1024)))
    let w bs = BS8.hPut h bs >> Blake2b.blake2bUpdate ctx bs
    w "{"
    mapM_
      ( \i -> do
          let comma = if i == (1 :: Int) then "" else ","
          w $ comma <> "\"key_" <> BS8.pack (show i) <> "\": \"val_" <> BS8.pack (show i) <> "\""
      )
      [1 .. n]
    w "}"
  Blake2b.blake2b256Finalize ctx
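As a sanity check on the 3GB figure for the streaming check: each entry "key_i": "val_i" is 14 fixed bytes plus twice the number of decimal digits of i, and the file adds two braces and n - 1 commas. The sketch below (totalDigits, expectedSize and renderJson are hypothetical helper names) computes that size in closed form and cross-checks it against the bytes actually produced for small n. For n = 100_000_000 it gives 3,077,777,797 bytes, about 2.87 GiB, roughly ten times the -M300M heap limit.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as BS8

-- Total number of decimal digits in 1, 2, ..., n, summed per digit length d:
-- the numbers with d digits are 10^(d-1) .. min n (10^d - 1).
totalDigits :: Integer -> Integer
totalDigits n =
  sum [d * (min n (10 ^ d - 1) - 10 ^ (d - 1) + 1) | d <- [1 .. numDigits n]]
  where
    numDigits m = fromIntegral (length (show m))

-- Size in bytes of the JSON file for n >= 1 entries: two braces, n - 1 commas,
-- and 14 + 2 * digits(i) bytes for entry i.
expectedSize :: Integer -> Integer
expectedSize n = 2 + (n - 1) + 14 * n + 2 * totalDigits n

-- The same bytes that generateJsonFile / generateAndHash write, built in
-- memory; only meant for cross-checking expectedSize on small n.
renderJson :: Int -> BS8.ByteString
renderJson n = "{" <> BS8.intercalate "," (map entry [1 .. n]) <> "}"
  where
    entry i =
      let s = BS8.pack (show i)
       in "\"key_" <> s <> "\": \"val_" <> s <> "\""
```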

Cabal stanzas (to go in the shelley cabal file):

executable hash-check
  main-is: HashCheck.hs
  hs-source-dirs: app
  default-language: Haskell2010
  ghc-options:
    -Wall
    -Wcompat
    -Wincomplete-record-updates
    -Wincomplete-uni-patterns
    -Wredundant-constraints
    -Wpartial-fields
    -Wunused-packages
    -threaded
    -rtsopts
    -with-rtsopts=-N

  build-depends:
    base >=4.18 && <5,
    bytestring,
    cardano-crypto-class,
    cardano-ledger-shelley,
    directory,
    text,

executable streaming-check
  main-is: StreamingCheck.hs
  hs-source-dirs: app
  default-language: Haskell2010
  ghc-options:
    -Wall
    -Wcompat
    -Wincomplete-record-updates
    -Wincomplete-uni-patterns
    -Wredundant-constraints
    -Wpartial-fields
    -Wunused-packages
    -threaded
    -rtsopts
    -with-rtsopts=-N

  build-depends:
    base >=4.18 && <5,
    bytestring,
    cardano-crypto-class,
    cardano-ledger-core,
    cardano-ledger-shelley,
    directory,
    text,

Comment thread libs/cardano-ledger-core/src/Cardano/Ledger/Crypto/Blake2b/Incremental.hs Outdated
Comment thread libs/cardano-ledger-core/src/Cardano/Ledger/Crypto/Blake2b/Incremental.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
lehins (Collaborator) left a comment

This is some preliminary feedback.
So far this is looking really good, despite all my pivoting suggestions 👍

Comment thread libs/cardano-ledger-core/src/Cardano/Ledger/Crypto/Blake2b/Incremental.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Transition.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
f-f force-pushed the f-f/streaming-genesis-data1 branch 3 times, most recently from 254646e to bbc0ada on March 10, 2026 19:18
f-f (Contributor, Author) commented Mar 10, 2026

I think I have resolved all the review comments, so this should be good for another look!

Moreover: we discussed some time ago how to test this, so I have added a test that uses weigh to check that memory usage does not blow up when streaming data with foldInjectionData.

lehins (Collaborator) left a comment

Beautiful!

lehins (Collaborator) left a comment

This is an awesome piece of work! Thank you!
Looking forward to reviewing follow-up PRs.

Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Genesis.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Transition.hs Outdated
Comment thread eras/shelley/impl/src/Cardano/Ledger/Shelley/Transition.hs Outdated
Comment thread eras/shelley/impl/test/memory/Main.hs Outdated
Comment thread eras/shelley/impl/test/memory/Main.hs Outdated
Comment thread eras/shelley/impl/test/memory/Main.hs
Comment thread eras/shelley/impl/test/memory/Main.hs Outdated
Comment thread eras/shelley/impl/test/memory/Main.hs Outdated
Comment thread eras/shelley/impl/test/memory/Main.hs Outdated
f-f force-pushed the f-f/streaming-genesis-data1 branch 2 times, most recently from 09844e1 to ff215f8 on April 15, 2026 21:13
f-f force-pushed the f-f/streaming-genesis-data1 branch 4 times, most recently from 0c61c64 to 8cbe918 on April 16, 2026 13:28
lehins force-pushed the f-f/streaming-genesis-data1 branch from 8cbe918 to 6145cc0 on April 16, 2026 16:08
f-f merged commit 8399814 into master on Apr 16, 2026
51 of 53 checks passed
f-f deleted the f-f/streaming-genesis-data1 branch on April 16, 2026 18:27
3 participants