mlarribe commented Oct 15, 2025

Context:

I was testing Mongo.replace_one with the options upsert: true and retryable_writes: true when I found that none of my documents were being updated after the very first replace_one attempt, even though the documents changed between one extract and the next.

The digging

After some investigation I found that the issue was linked to the retryable_writes behaviour, which was reusing the same session lsid and txnNumber.
According to the documentation, reusing the same lsid is supposed to be fine, but reusing the txnNumber is not. I was surprised to discover that the lsid and txnNumber stayed exactly the same across multiple rounds of find_one, a minor change to the document, and another Mongo.replace_one attempt (the server_session was reused without increasing the txnNumber).

Thanks

Big thanks for the log feature (log: &IO.inspect/1), which helped me start diagnosing this even as a novice with the driver itself.

I ran some simple tests on my own, and the change solved my issue.
I may still have missed edge cases, so peer review would be much appreciated; I am still early on the learning curve with this driver.

…ion to prevent silent no-write operation

Fix faulty retryable_writes behaviour: fix the issue that prevents replace_one from taking effect on consecutive updates when a session is checked in and then reused (whether the same document or a different one is targeted does not matter; writes stay silently blocked until the session expires)
zookzook (Owner) commented Nov 16, 2025

Thank you for providing a detailed description of your issue. Let's check the documentation that I found for the lsid and txnNumber:

The server requires each operation executed within a transaction to provide an lsid and txnNumber in its command
document. Each field is obtained from the ClientSession object passed to the operation from the application. Drivers
will be responsible for maintaining a monotonically increasing transaction number for each ServerSession used by a
ClientSession object. The txnNumber is incremented by the call to startTransaction and remains the same for all
commands in the transaction.

Drivers that pool ServerSessions MUST preserve the transaction number when reusing a server session from the pool with a
new ClientSession (this can be tracked as another property on the driver's object for the server session).

Drivers MUST ensure that each transaction specifies a transaction number larger than any previously used transaction
number for its session ID.

The driver increments the txnNumber when startTransaction is called:

  def handle_call_event(:start_transaction, transaction, %Session{server_session: session} = data) when transaction in [:no_transaction, :transaction_aborted, :transaction_committed] do
    {:next_state, :starting_transaction, %Session{data | recovery_token: nil, server_session: ServerSession.next_txn_num(session)}, :ok}
  end

Your merge request would increment the txnNumber twice in the case of startTransaction. Could you provide an example and explain what you expect and what the driver does? It would be easier for me to understand the issue and to reproduce it, which is very important, so that I can add a unit test for this issue.

mlarribe (Author) commented Dec 2, 2025

I managed to reproduce the edge case in a simple example without my fix:

##Context: Cluster with Replication set to 3 
conn = :poll_test_mongo

Mongo.start_link([
  name: conn,
  url: url,
  pool_size: 100,
  socket_options: [:inet6]
])

id = 42

##Run example
my-app(78)> Mongo.insert_one(conn, "test", %{_id: id, val: 42, toto: "azerty"}) #insert document first
{:ok, %Mongo.InsertOneResult{acknowledged: true, inserted_id: 42}}
my-app(79)>
nil
my-app(80)> obj = Mongo.find_one(conn, "test", %{_id: id}, retryable_reads: true) #get the document
%{"_id" => 42, "toto" => "azerty", "val" => 42}
my-app(81)> transformed_object = obj |> Map.update("fieldA", 0, fn val -> val + 1 end) #prepare a modification of the document
%{"_id" => 42, "fieldA" => 0, "toto" => "azerty", "val" => 42}
my-app(82)> Mongo.replace_one(conn, "test", %{_id: id}, transformed_object, [upsert: true, retryable_writes: true]) #replace the doc
{:ok,
 %Mongo.UpdateResult{
   acknowledged: true,
   matched_count: 1,
   modified_count: 1,
   upserted_ids: []
 }}
my-app(83)>
nil
my-app(84)> obj = Mongo.find_one(conn, "test", %{_id: id}, retryable_reads: true) #[BREAK] turn out the doc didn't changed
%{"_id" => 42, "toto" => "azerty", "val" => 42}
my-app(85)>
nil
my-app(86)>
nil
my-app(87)>
nil
my-app(88)> obj = Mongo.find_one(conn, "test", %{_id: id}, retryable_reads: true)
%{"_id" => 42, "toto" => "azerty", "val" => 42}
my-app(89)> transformed_object = obj |> Map.update("fieldA", 0, fn val -> val + 1 end)
%{"_id" => 42, "fieldA" => 0, "toto" => "azerty", "val" => 42}
my-app(90)> Mongo.replace_one(conn, "test", %{_id: id}, transformed_object, [upsert: true, retryable_writes: true])
{:ok,
 %Mongo.UpdateResult{
   acknowledged: true,
   matched_count: 1,
   modified_count: 1,
   upserted_ids: []
 }}
my-app(91)>
nil
my-app(92)> obj = Mongo.find_one(conn, "test", %{_id: id}, retryable_reads: true) #[BREAK] still broken
%{"_id" => 42, "toto" => "azerty", "val" => 42}

In this test we can see that the document cannot be updated at all until the session expires: whenever replace_one is run several times with retryable_writes: true after a first attempt, nothing is written to the DB.
From my investigation, this is due to the session being reused without the counter (txnNumber) being increased on the driver side.

So when a new query shows up, the driver tries to use an active session, but since the session was not updated properly (I am not sure where the best spot for the increment is, so as to avoid the double increase that you mention), it serves the same session again, that is, the same lsid with the same txnNumber. MongoDB then behaves as expected: it already knows the result for that lsid/txnNumber pair, so it returns that result without performing the write.
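
To make the suspected mechanism concrete, here is a minimal sketch of a toy "server" that caches the result of each write per {lsid, txnNumber} pair. It is only an illustration under that assumption: the module RetryableWriteSketch and its functions are hypothetical, and it is neither the driver's code nor MongoDB's actual implementation.

```elixir
defmodule RetryableWriteSketch do
  # Hypothetical toy model: stored documents plus a cache of already
  # executed writes, keyed by {lsid, txn_number}.
  def new, do: %{docs: %{}, executed: %{}}

  # Replaces the document unless this {lsid, txn_number} pair was already
  # executed; in that case the cached reply is returned and nothing is written.
  def replace_one(state, lsid, txn_number, id, doc) do
    key = {lsid, txn_number}

    case Map.fetch(state.executed, key) do
      {:ok, cached_reply} ->
        {cached_reply, state}

      :error ->
        reply = %{matched_count: 1, modified_count: 1}

        new_state = %{
          state
          | docs: Map.put(state.docs, id, doc),
            executed: Map.put(state.executed, key, reply)
        }

        {reply, new_state}
    end
  end
end

alias RetryableWriteSketch, as: Sketch

lsid = "session-1"

# First write with txnNumber 1 is applied.
{_reply, state} = Sketch.replace_one(Sketch.new(), lsid, 1, 42, %{"val" => 42})

# Second write reuses txnNumber 1: the reply looks successful,
# but the stored document is still the old one.
{reply, state} = Sketch.replace_one(state, lsid, 1, 42, %{"val" => 42, "fieldA" => 0})
IO.inspect(reply, label: "reply with reused txnNumber")
IO.inspect(state.docs[42], label: "stored document")

# With an incremented txnNumber the write is applied.
{_reply, state} = Sketch.replace_one(state, lsid, 2, 42, %{"val" => 42, "fieldA" => 0})
IO.inspect(state.docs[42], label: "stored document after increment")
```

In this model the second call returns the same successful-looking reply as the first one while leaving the document untouched, which matches what the session above shows; only the call with a larger txnNumber changes the stored document.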
