Add retry logic to SendTCP in e2e framework #4465

Open
markmandel wants to merge 3 commits into agones-dev:main from markmandel:e2e/tcp-redundancy

@markmandel
Collaborator

What type of PR is this?

/kind cleanup

What this PR does / Why we need it:

Not sure if this will solve the issues with Autopilot, but this at least gives us a step in the right direction.

Which issue(s) this PR fixes:

Work on #4464

Special notes for your reviewer:

N/A

@markmandel added the area/tests label (Unit tests, e2e tests, anything to make sure things don't break) on Feb 28, 2026
@github-actions bot added the kind/cleanup (Refactoring code, fixing up documentation, etc) and size/S labels on Feb 28, 2026
// make sure the container value points to a valid container
if !gss.HasContainer(gss.Container, false) {
-	allErrs = append(allErrs, field.Invalid(fldPath.Child("container"), gss.Container, "Could not find a container named " + gss.Container))
+	allErrs = append(allErrs, field.Invalid(fldPath.Child("container"), gss.Container, "Could not find a container named "+gss.Container))
Collaborator Author

My updated linter was failing on this one, so this got blended in.

@agones-bot
Collaborator

Build Failed 😭

Build Id: 01a4b98e-524c-43fb-99c4-376c85bea92a

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Collaborator Author

Yeah, there is definitely something wrong with the Autopilot clusters, even with the retry logic.

=== RUN   TestGameServerTcpUdpProtocol
=== PAUSE TestGameServerTcpUdpProtocol
=== CONT  TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:17.220" level=info msg="GameServer created, waiting for Ready" gs=game-servermssq6 test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:17.369" level=info msg="Waiting for states to match" awaitingState=Ready currentState=Creating gs=game-servermssq6 test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:18.353" level=info msg="Waiting for states to match" awaitingState=Ready currentState=Starting gs=game-servermssq6 test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:19.355" level=info msg="Waiting for states to match" awaitingState=Ready currentState=Starting gs=game-servermssq6 test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:20.354" level=info msg="Waiting for states to match" awaitingState=Ready currentState=RequestReady gs=game-servermssq6 test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:21.355" level=info msg="GameServer states match" awaitingState=Ready currentState=Ready gs=game-servermssq6 test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:21.356" level=info msg="GameServer Ready" gs=game-servermssq6
time="2026-02-28 18:00:21.356" level=info msg="GameServer created, sending UDP ping" name=game-servermssq6
time="2026-02-28 18:00:21.477" level=info msg="UDP ping passed, sending TCP ping" name=game-servermssq6
time="2026-02-28 18:00:21.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:23.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:25.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:27.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:29.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:31.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:33.598" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:35.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:37.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:39.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:41.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:43.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:45.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:47.598" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:49.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:51.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:53.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:55.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:57.598" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:00:59.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:01.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:03.598" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:05.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:07.598" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:09.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:11.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:13.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:15.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:17.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:19.597" level=info msg="could not dial address" address="35.229.225.100:7824" error="dial tcp 35.229.225.100:7824: connect: connection refused" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:21.478" level=info msg="Failed to send TCP packet to GameServer. Dumping Events!" gs=game-servermssq6 status="{State:Ready Ports:[{Name:gameserver-tcp Port:7824} {Name:gameserver-udp Port:7824}] Address:35.229.225.100 Addresses:[{Type:InternalIP Address:10.140.0.27} {Type:ExternalIP Address:35.229.225.100} {Type:Hostname Address:gk3-gke-autopilot-e2e-test-clu-pool-1-12d37f0a-8s2r} {Type:PodIP Address:10.32.130.86}] NodeName:gk3-gke-autopilot-e2e-test-clu-pool-1-12d37f0a-8s2r ReservedUntil:<nil> Players:0xc000746750 Counters:map[] Lists:map[] Eviction:0xc000744490}" test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:21.478" level=info msg="Dumping Events:" kind= test=TestGameServerTcpUdpProtocol
time="2026-02-28 18:01:21.638" level=info msg="Event!" lastTimestamp="2026-02-28 18:00:17 +0000 UTC" message="Port allocated" reason=PortAllocation test=TestGameServerTcpUdpProtocol type=Normal
time="2026-02-28 18:01:21.638" level=info msg="Event!" lastTimestamp="2026-02-28 18:00:17 +0000 UTC" message="Pod game-servermssq6 created" reason=Creating test=TestGameServerTcpUdpProtocol type=Normal
time="2026-02-28 18:01:21.638" level=info msg="Event!" lastTimestamp="2026-02-28 18:00:19 +0000 UTC" message="SDK state change" reason=RequestReady test=TestGameServerTcpUdpProtocol type=Normal
time="2026-02-28 18:01:21.638" level=info msg="Event!" lastTimestamp="2026-02-28 18:00:20 +0000 UTC" message="Address and port populated" reason=Ready test=TestGameServerTcpUdpProtocol type=Normal
time="2026-02-28 18:01:21.638" level=info msg="Event!" lastTimestamp="2026-02-28 18:00:20 +0000 UTC" message="SDK.Ready() complete" reason=Ready test=TestGameServerTcpUdpProtocol type=Normal
    gameserver_test.go:1093: Could not ping TCP GameServer: timed out attempting to send TCP message to address: context deadline exceeded
--- FAIL: TestGameServerTcpUdpProtocol (64.63s)
FAIL test/e2e.TestGameServerTcpUdpProtocol (re-run 1) (64.63s)
time="2026-02-28 18:01:21.953" level=info msg="Namespace 1772301615 is deleted"
FAIL test/e2e

I remember us having to recreate those clusters occasionally? Is that issue still a thing?

@markmandel
Collaborator Author

@igooch @peterzhongyi - check above please.

@markmandel
Collaborator Author

/gcbrun

@agones-bot
Collaborator

Build Failed 😭

Build Id: a111e873-f944-40ff-9f85-1a00120301b4

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Failed 😭

Build Id: 5ea64a27-5f5f-48c7-bb81-6ae13defbd5a

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@Sivasankaran25
Collaborator

/gcbrun

@agones-bot
Collaborator

Build Failed 😭

Build Id: faa02cc3-656d-419d-8e85-c674c6259e30

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@Sivasankaran25
Collaborator

/gcbrun

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 4b4f53b3-587b-4067-826c-7018fd5ed68b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4465/head:pr_4465 && git checkout pr_4465
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.57.0-dev-dd54f32

@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 1d12e91e-3fe6-4ccd-b511-4d286a6a3cc3

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4465/head:pr_4465 && git checkout pr_4465
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.57.0-dev-3f57ad8

Labels

area/tests (Unit tests, e2e tests, anything to make sure things don't break)
kind/cleanup (Refactoring code, fixing up documentation, etc)
size/S

3 participants