Bug: Inventory updates should tolerate drift (and overwrite it) #559

@karlkfi

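Right now, inventory updates can fail with a conflict error from Kubernetes when the inventory object has changed on the server since it was last read. The inventory client should detect this (apierrors.IsConflict(err)) and retry with a fresh Get (to pick up the latest ResourceVersion) followed by an Update, overwriting the drift.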

Example retry code:

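import (
	"context"
	"fmt"
	"time"
)

// retriable is one attempt; it reports whether the caller should retry.
type retriable func(ctx context.Context) (retry bool, err error)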

func retryWithBackoff(ctx context.Context, timeout time.Duration, fn retriable) error {
	var err error
	var retry bool
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	delay := 1 * time.Second
	for {
		// attempt the operation
		retry, err = fn(ctx)
		if !retry {
			return err
		}

		// wait until delay or timeout
		timer := time.NewTimer(delay)
		select {
		case <-ctx.Done():
			timer.Stop()
			return fmt.Errorf("timed out after retrying for %v: %w", timeout, err)
		case <-timer.C:
			// continue
		}
		// retry backoff
		delay = delay * 2
	}
}

Example usage:

	// attempt to update status until timeout
	ctx := context.TODO()
	timeout := 1 * time.Minute
	return retryWithBackoff(ctx, timeout, func(ctx context.Context) (retry bool, err error) {
		// Get the object to get the latest ResourceVersion.
		latestObj, err := resource.Get(ctx, obj.GetName(), metav1.GetOptions{TypeMeta: meta})
		if err != nil {
			return false, fmt.Errorf("failed to get inventory status from cluster: %w", err)
		}
		// Ignore any status changes made remotely.
		// This update will replace them.
		obj.SetResourceVersion(latestObj.GetResourceVersion())

		_, err = resource.UpdateStatus(ctx, obj, metav1.UpdateOptions{TypeMeta: meta})
		if err != nil {
			// retry if conflict
			return apierrors.IsConflict(err), fmt.Errorf("failed to write updated inventory status to cluster: %w", err)
		}
		return false, nil
	})

Another option is to use https://github.com/flowchartsman/retry, which is nice and generic. gcloud and client-go also have retry libraries; for example, client-go's k8s.io/client-go/util/retry package provides RetryOnConflict.

Labels: lifecycle/frozen (indicates that an issue or PR should not be auto-closed due to staleness)