Skip to content

fix(provider): eliminate data race on concurrent init failure in Provider.Get()#222

Open
MayankSharmaCSE wants to merge 2 commits into
hyperledger:mainfrom
MayankSharmaCSE:fix/provider-data-race
Open

fix(provider): eliminate data race on concurrent init failure in Provider.Get()#222
MayankSharmaCSE wants to merge 2 commits into
hyperledger:mainfrom
MayankSharmaCSE:fix/provider-data-race

Conversation

@MayankSharmaCSE

Copy link
Copy Markdown
Contributor

Type of change

  • Bug fix
  • Test update

Description

Provider.Get() wrote to p.instance and p.err inside sync.Once.Do but read them outside the closure. A goroutine that lost the initialization race could read stale zero values before the writing goroutine's stores were visible , a data race on the failure path.

Fixed by consolidating p.instance and p.err into a single initResult[T] struct field. A single struct assignment inside Do is safe because sync.Once guarantees a happens-before edge between the closure completing and all subsequent returns from Do.

Additional details

Added TestProvider_Get_ThreadSafety_InitFailure , 20 goroutines racing on a failing init, all asserting they observe the error (not nil). All tests pass clean under -race.

Related issues

Closes #221

The notification dispatcher in NotificationClient.listen used a bare
non-blocking send with a  comment, silently losing
status events whenever the subscriber buffer was full or the receiver
had given up. Because the txID is removed from the subscribers map
before delivery is attempted, a dropped event is unrecoverable — and
indistinguishable from a genuine committer timeout from the caller's
perspective.

Signed-off-by: mayanksharmaCSE <mayanksharmacse1@gmail.com>
…ider.Get()

Signed-off-by: mayanksharmaCSE <mayanksharmacse1@gmail.com>
Copilot AI review requested due to automatic review settings May 1, 2026 15:56

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to address a reported thread-safety issue in Provider.Get() during concurrent initialization failures, and adds/updates tests around concurrent behavior. It also refactors notification dispatching in the notification client and adds drop observability.

Changes:

  • Consolidate Provider’s cached init outputs into a single initResult[T] field written within sync.Once.Do.
  • Add a concurrent init-failure test for Provider.Get() to ensure all racing goroutines observe the validation error.
  • Refactor NotificationClient dispatch into helper methods and add dropped-notification counting + tests.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
tools/fxconfig/internal/provider/provider.go Stores init results in a single struct and returns from it after sync.Once.Do.
tools/fxconfig/internal/provider/provider_test.go Adds a concurrency test covering init/validation failure under race.
tools/fxconfig/internal/client/notifications.go Extracts dispatch/collection helpers; adds dropped-notification counter, logging, and accessor.
tools/fxconfig/internal/client/notifications_test.go Adds unit tests for dispatch/collection behavior and drop counting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +41 to 46
// droppedNotifications counts status events that the dispatcher could not
// deliver because the subscriber's channel buffer was full or the receiver
// had already given up. Exposed via DroppedNotifications for diagnostics
// and tests.
droppedNotifications atomic.Uint64
}
Comment on lines 47 to +56
func (p *Provider[T, K]) Get() (T, error) {
p.once.Do(func() {
if err := p.cfg.Validate(p.validationContext); err != nil {
p.err = err
p.result = initResult[T]{err: err}
return
}
p.instance, p.err = p.factory(p.cfg)
inst, err := p.factory(p.cfg)
p.result = initResult[T]{instance: inst, err: err}
})
return p.instance, p.err
return p.result.instance, p.result.err
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

data race: unsynced read of p.err outside sync.Once in Provider.Get()

2 participants