2 changes: 1 addition & 1 deletion aiopslab-applications
Submodule aiopslab-applications updated 94 files
+42 −0 BlueprintHotelReservation/kubernetes/frontend/frontend-service-deployment.yaml
+14 −0 BlueprintHotelReservation/kubernetes/frontend/frontend-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/frontend/frontend-service-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/geo/geo-db-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/geo/geo-db-service.yaml
+42 −0 BlueprintHotelReservation/kubernetes/geo/geo-service-deployment.yaml
+10 −0 BlueprintHotelReservation/kubernetes/geo/geo-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/geo/geo-service-service.yaml
+38 −0 BlueprintHotelReservation/kubernetes/jaeger/jaeger-deployment.yaml
+28 −0 BlueprintHotelReservation/kubernetes/jaeger/jaeger-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/profile/profile-cache-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/profile/profile-cache-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/profile/profile-db-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/profile/profile-db-service.yaml
+42 −0 BlueprintHotelReservation/kubernetes/profile/profile-service-deployment.yaml
+11 −0 BlueprintHotelReservation/kubernetes/profile/profile-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/profile/profile-service-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/rate/rate-cache-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/rate/rate-cache-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/rate/rate-db-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/rate/rate-db-service.yaml
+42 −0 BlueprintHotelReservation/kubernetes/rate/rate-service-deployment.yaml
+11 −0 BlueprintHotelReservation/kubernetes/rate/rate-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/rate/rate-service-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/recommend/recomd-db-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/recommend/recomd-db-service.yaml
+42 −0 BlueprintHotelReservation/kubernetes/recommend/recomd-service-deployment.yaml
+10 −0 BlueprintHotelReservation/kubernetes/recommend/recomd-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/recommend/recomd-service-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-cache-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-cache-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-db-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-db-service.yaml
+42 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-service-deployment.yaml
+11 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/reservation/reserv-service-service.yaml
+10 −0 BlueprintHotelReservation/kubernetes/rpc-configmap.yaml
+42 −0 BlueprintHotelReservation/kubernetes/search/search-service-deployment.yaml
+11 −0 BlueprintHotelReservation/kubernetes/search/search-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/search/search-service-service.yaml
+36 −0 BlueprintHotelReservation/kubernetes/user/user-db-deployment.yaml
+21 −0 BlueprintHotelReservation/kubernetes/user/user-db-service.yaml
+42 −0 BlueprintHotelReservation/kubernetes/user/user-service-deployment.yaml
+10 −0 BlueprintHotelReservation/kubernetes/user/user-service-env-configmap.yaml
+21 −0 BlueprintHotelReservation/kubernetes/user/user-service-service.yaml
+17 −0 BlueprintHotelReservation/wlgen/wlgen_proc-configmap.yaml
+26 −0 BlueprintHotelReservation/wlgen/wlgen_proc-job.yaml
+1 −1 FleetCast
+1 −1 astronomy-shop
+1 −1 flight-ticket
+1 −0 flower/kubernetes/clientapp/clientapp-1-deployment.yaml
+1 −0 flower/kubernetes/clientapp/clientapp-2-deployment.yaml
+1 −0 flower/kubernetes/serverapp/serverapp-pod.yaml
+1 −0 flower/kubernetes/superlink/superlink-deployment.yaml
+1 −0 flower/kubernetes/supernode/supernode-1-deployment.yaml
+1 −0 flower/kubernetes/supernode/supernode-2-deployment.yaml
+2 −2 flower/train/pyproject.toml
+1 −0 hotelReservation/kubernetes/consul/consul-deployment.yaml
+1 −0 hotelReservation/kubernetes/frontend/frontend-deployment.yaml
+1 −0 hotelReservation/kubernetes/geo/geo-deployment.yaml
+0 −14 hotelReservation/kubernetes/geo/geo-persistent-volume.yaml
+0 −1 hotelReservation/kubernetes/geo/geo-pvc.yaml
+1 −0 hotelReservation/kubernetes/geo/mongodb-geo-deployment.yaml
+1 −0 hotelReservation/kubernetes/jaeger/jaeger-deployment.yaml
+1 −0 hotelReservation/kubernetes/profile/memcached-profile-deployment.yaml
+1 −0 hotelReservation/kubernetes/profile/mongodb-profile-deployment.yaml
+1 −0 hotelReservation/kubernetes/profile/profile-deployment.yaml
+0 −17 hotelReservation/kubernetes/profile/profile-persistent-volume.yaml
+0 −1 hotelReservation/kubernetes/profile/profile-pvc.yaml
+1 −0 hotelReservation/kubernetes/rate/memcached-rate-deployment.yaml
+1 −0 hotelReservation/kubernetes/rate/mongodb-rate-deployment.yaml
+1 −0 hotelReservation/kubernetes/rate/rate-deployment.yaml
+0 −14 hotelReservation/kubernetes/rate/rate-persistent-volume.yaml
+0 −1 hotelReservation/kubernetes/rate/rate-pvc.yaml
+1 −0 hotelReservation/kubernetes/reccomend/mongodb-recommendation-deployment.yaml
+1 −0 hotelReservation/kubernetes/reccomend/recommendation-deployment.yaml
+0 −14 hotelReservation/kubernetes/reccomend/recommendation-persistent-volume.yaml
+0 −1 hotelReservation/kubernetes/reccomend/recommendation-pvc.yaml
+1 −0 hotelReservation/kubernetes/reserve/memcached-reservation-deployment.yaml
+1 −0 hotelReservation/kubernetes/reserve/mongodb-reservation-deployment.yaml
+1 −0 hotelReservation/kubernetes/reserve/reservation-deployment.yaml
+0 −14 hotelReservation/kubernetes/reserve/reservation-persistent-volume.yaml
+0 −1 hotelReservation/kubernetes/reserve/reservation-pvc.yaml
+1 −0 hotelReservation/kubernetes/search/search-deployment.yaml
+1 −0 hotelReservation/kubernetes/user/mongodb-user-deployment.yaml
+1 −0 hotelReservation/kubernetes/user/user-deployment.yaml
+0 −14 hotelReservation/kubernetes/user/user-persistent-volume.yaml
+0 −1 hotelReservation/kubernetes/user/user-pvc.yaml
+ socialNetwork/helm-chart/socialnetwork/tmpcharts-67806/mcrouter-0.3.0.tgz
+ socialNetwork/helm-chart/socialnetwork/tmpcharts-67806/mongodb-sharded-6.0.1.tgz
+ socialNetwork/helm-chart/socialnetwork/tmpcharts-76120/mcrouter-0.3.0.tgz
+ socialNetwork/helm-chart/socialnetwork/tmpcharts-76120/mongodb-sharded-6.0.1.tgz
+1 −1 socialNetwork/helm-chart/socialnetwork/values.yaml
+1 −1 train-ticket
6 changes: 6 additions & 0 deletions aiopslab/generators/fault/inject_symp.py
@@ -28,6 +28,12 @@ def __init__(self, namespace: str):

         container_runtime = self.kubectl.get_container_runtime()

+        if container_runtime is None:
+            raise ValueError(
+                "Could not detect container runtime. "
+                "Ensure the cluster is running and at least one node is Ready."
+            )
+
         if "docker" in container_runtime:
             pass
         elif "containerd" in container_runtime:
22 changes: 17 additions & 5 deletions aiopslab/service/kubectl.py
@@ -54,15 +54,27 @@ def get_cluster_ip(self, service_name, namespace):
         service_info = self.core_v1_api.read_namespaced_service(service_name, namespace)
         return service_info.spec.cluster_ip  # type: ignore

-    def get_container_runtime(self):
+    def get_container_runtime(self, max_wait: int = 60, poll_interval: int = 2):
         """
         Retrieve the container runtime used by the cluster.
         If the cluster uses multiple container runtimes, the first one found will be returned.
+
+        Args:
+            max_wait: Maximum seconds to wait for a Ready node (default: 60)
+            poll_interval: Seconds between checks (default: 2)
+
+        Returns:
+            Container runtime version string, or None if no Ready node found within max_wait.
         """
-        for node in self.core_v1_api.list_node().items:
-            for status in node.status.conditions:
-                if status.type == "Ready" and status.status == "True":
-                    return node.status.node_info.container_runtime_version
+        elapsed = 0
+        while elapsed < max_wait:
+            for node in self.core_v1_api.list_node().items:
+                for status in node.status.conditions:
+                    if status.type == "Ready" and status.status == "True":
+                        return node.status.node_info.container_runtime_version
+            time.sleep(poll_interval)
+            elapsed += poll_interval
Comment on lines +69 to +76

Copilot AI Apr 4, 2026


The retry loop can exceed max_wait when poll_interval is larger than the remaining time (e.g., max_wait=1, poll_interval=10 will still sleep 10s). Consider using a monotonic deadline and sleeping min(poll_interval, remaining), or validating/clamping poll_interval, so that max_wait is an actual upper bound.

Suggested change
-        elapsed = 0
-        while elapsed < max_wait:
-            for node in self.core_v1_api.list_node().items:
-                for status in node.status.conditions:
-                    if status.type == "Ready" and status.status == "True":
-                        return node.status.node_info.container_runtime_version
-            time.sleep(poll_interval)
-            elapsed += poll_interval
+        deadline = time.monotonic() + max_wait
+        while time.monotonic() < deadline:
+            for node in self.core_v1_api.list_node().items:
+                for status in node.status.conditions:
+                    if status.type == "Ready" and status.status == "True":
+                        return node.status.node_info.container_runtime_version
+            remaining = deadline - time.monotonic()
+            if remaining <= 0:
+                break
+            time.sleep(min(poll_interval, remaining))

Collaborator


Might be an extreme case, but it is good practice.
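
The overshoot discussed in this thread can be checked without a cluster. The sketch below is not code from the PR: it strips the node check out of both loop shapes (as if no node ever becomes Ready) and drives them with a hypothetical `FakeClock`, so the total time slept can be compared directly. The names `elapsed_counter_wait`, `deadline_wait`, and `FakeClock` are made up for illustration.

```python
class FakeClock:
    """Hypothetical test clock: sleeping just advances a counter."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


def elapsed_counter_wait(max_wait, poll_interval, sleep):
    """Same loop shape as the PR, with the node check assumed to never succeed."""
    elapsed = 0
    while elapsed < max_wait:
        sleep(poll_interval)  # always sleeps the full interval
        elapsed += poll_interval


def deadline_wait(max_wait, poll_interval, sleep, now):
    """Same loop shape as the suggested change, with injectable clock functions."""
    deadline = now() + max_wait
    while now() < deadline:
        remaining = deadline - now()
        if remaining <= 0:
            break
        sleep(min(poll_interval, remaining))  # never sleeps past the deadline


# The example from the review comment: max_wait=1, poll_interval=10.
clock_a = FakeClock()
elapsed_counter_wait(1, 10, clock_a.sleep)
overshoot_total = clock_a.t

clock_b = FakeClock()
deadline_wait(1, 10, clock_b.sleep, clock_b.now)
bounded_total = clock_b.t

print(overshoot_total, bounded_total)
```

With these inputs the counter-based loop sleeps for the full `poll_interval`, while the deadline-based loop stops at `max_wait`.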

Comment on lines +70 to +76

Copilot AI Apr 4, 2026


The docstring says this returns None when no Ready node is found, but self.core_v1_api.list_node() can raise (e.g., on transient API, auth, or connectivity errors), and any such exception currently propagates out of the retry loop immediately. Either update the docstring to document the possible exception(s), or catch and log exceptions during polling and continue until the deadline (similar to wait_for_ready).

Collaborator


Good point. We should catch exceptions and update the docstring.
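
One possible shape for the follow-up, as a minimal sketch rather than the change that was actually merged: the polling logic is pulled into a hypothetical standalone function `poll_container_runtime` that takes a `list_nodes` callable, so it can be exercised without a cluster. In the real method that callable would presumably wrap `self.core_v1_api.list_node().items`. Catching a broad `Exception`, the `or []` guard on conditions, and the runtime string in the usage example are all assumptions made for illustration.

```python
import logging
import time
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def poll_container_runtime(list_nodes, max_wait=60, poll_interval=2):
    """Return the runtime version of the first Ready node, or None on timeout.

    Errors raised by list_nodes() are logged and retried until the deadline
    instead of propagating to the caller (an assumption of this sketch).
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        try:
            for node in list_nodes():
                for cond in node.status.conditions or []:
                    if cond.type == "Ready" and cond.status == "True":
                        return node.status.node_info.container_runtime_version
        except Exception as exc:  # deliberately broad in this sketch
            logger.warning("Listing nodes failed, will retry: %s", exc)
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(min(poll_interval, remaining))
    return None


def make_node(ready, runtime):
    """Build a stand-in object with the attributes the loop reads."""
    return SimpleNamespace(
        status=SimpleNamespace(
            conditions=[SimpleNamespace(type="Ready", status="True" if ready else "False")],
            node_info=SimpleNamespace(container_runtime_version=runtime),
        )
    )


calls = {"n": 0}


def flaky_list_nodes():
    """Fail once with a transient error, then report one Ready node."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient API error")
    return [make_node(True, "containerd://1.7.0")]  # illustrative value


runtime = poll_container_runtime(flaky_list_nodes, max_wait=5, poll_interval=0)
timed_out = poll_container_runtime(lambda: [], max_wait=0)
```

Here the first call to `flaky_list_nodes` raises, the loop logs a warning and retries, and the second call yields the runtime string; with no nodes and `max_wait=0` the function returns None, which matches the documented return value.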

+        return None

     def get_pod_name(self, namespace, label_selector):
         """Get the name of the first pod in a namespace that matches a given label selector."""