diff --git a/docs/user-guide/troubleshooting/edgecore-logs.md b/docs/user-guide/troubleshooting/edgecore-logs.md new file mode 100644 index 000000000..c84f897de --- /dev/null +++ b/docs/user-guide/troubleshooting/edgecore-logs.md @@ -0,0 +1,270 @@ +--- +title: Checking EdgeCore Logs +sidebar_position: 13 +--- + +> **Applies to:** KubeEdge v1.13+ +> **Related issue:** [kubeedge#6704](https://github.com/kubeedge/kubeedge/issues/6704) + +When an edge node fails to join the cluster or pods fail to start on the edge, inspecting EdgeCore logs is the first and most effective diagnostic step. This guide explains **where logs are stored**, **which commands to run**, and **what log entries to look for** depending on how EdgeCore is deployed. + +--- + +## Deployment Modes Overview + +EdgeCore can be deployed in two fundamentally different ways, each with a different log location: + +| Deployment Mode | How Installed | Log Location | +|---|---|---| +| **Host binary / systemd** | `keadm join` (default) or manual binary | `journalctl` or `/var/log/kubeedge/` | +| **Containerized** | Container runtime | `docker logs` or `crictl logs` | + +Identifying your deployment mode before proceeding will save time. + +```bash +# Check if EdgeCore is running as a systemd service (most common) +systemctl is-active edgecore.service + +# Check if EdgeCore is running as a container via docker +docker ps | grep edgecore + +# Check if EdgeCore is running as a container via containerd or CRI-O +crictl ps | grep edgecore +``` + +If the first command returns `active`, proceed to [Mode 1](#mode-1-edgecore-as-a-host-binary--systemd-service). If EdgeCore is containerized, proceed to [Mode 2](#mode-2-edgecore-as-a-container). + +--- + +## Mode 1: EdgeCore as a Host Binary / systemd Service + +When EdgeCore is installed directly on the host using `keadm join` in binary mode (the default), logs are written to **two possible locations**: + +1. `journald` (captured by `systemd`) — the primary source. +2. 
`/var/log/kubeedge/edgecore.log` — if the `--log-file` flag is set. + +### Using `journalctl` + +`journalctl` is the primary tool for reading systemd-managed service logs: + +```bash +# View all logs for the EdgeCore service +journalctl -u edgecore.service + +# Follow live logs (equivalent to tail -f) +journalctl -u edgecore.service -f + +# View the last 200 lines +journalctl -u edgecore.service -n 200 + +# View logs since last boot only +journalctl -u edgecore.service -b + +# Filter to show only error-level and above +journalctl -u edgecore.service -p err + +# Combine: tail and follow only ERRORs +journalctl -u edgecore.service -f -p err +``` + +### Checking `/var/log/kubeedge/` + +If EdgeCore was started with `--logtostderr=false --log-file=/var/log/kubeedge/edgecore.log` (common in older deployments or manual setups), logs are written to a file: + +```bash +# View the log file +cat /var/log/kubeedge/edgecore.log + +# Tail the log file in real time +tail -f /var/log/kubeedge/edgecore.log + +# View only the last 50 lines +tail -n 50 /var/log/kubeedge/edgecore.log + +# Filter for errors and fatals +grep -E "^[EF][0-9]{4}" /var/log/kubeedge/edgecore.log +``` + +:::note No Log File Found? +If `/var/log/kubeedge/` is empty or does not exist, EdgeCore is most likely writing to `journald` instead. Use `journalctl -u edgecore.service` as described above. +::: + +--- + +## Mode 2: EdgeCore as a Container + +When EdgeCore is deployed inside a container (e.g., using `keadm join --edgecore-image`), logs are captured by the container runtime. 
+ +### Using Docker + +If your container runtime is Docker: + +```bash +# Find the EdgeCore container ID or name +docker ps | grep edgecore + +# View logs for the EdgeCore container +docker logs <container-id> + +# Tail live logs +docker logs -f <container-id> + +# Filter for errors +docker logs <container-id> 2>&1 | grep -E "E[0-9]{4}|F[0-9]{4}" +``` + +### Using crictl (containerd / CRI-O) + +If your container runtime is containerd or CRI-O, use `crictl`: + +```bash +# Find the EdgeCore container ID +crictl ps | grep edgecore + +# View logs for the EdgeCore container +crictl logs <container-id> + +# Tail live logs +crictl logs -f <container-id> + +# Filter for errors +crictl logs <container-id> 2>&1 | grep -E "E[0-9]{4}|F[0-9]{4}" +``` + +--- + +## Diagnosing Common Failures + +The following sections describe specific log patterns to search for during common failure scenarios at the edge. The `grep` commands below work against systemd logs, but you can adapt them for container logs or log files. + +### Failures Connecting to CloudCore + +Connection issues are the most frequent cause of EdgeCore startup failure. These include WebSocket drops and certificate mismatches. 
+ +```bash +journalctl -u edgecore.service | grep -i "cloud\|websocket\|cert\|tls" +``` + +**What to look for:** + +```text +# Certificate mismatch or expired certificate +E0115 10:24:15.654321 1 hub.go:112] x509: certificate has expired or is not yet valid + +# Unknown Authority (CA mismatch) +E0115 10:25:00.111111 1 client.go:87] tls: failed to verify server's certificate: x509: certificate signed by unknown authority + +# WebSocket connection failure (CloudCore unreachable) +E0115 10:30:05.555555 1 websocket.go:234] failed to connect to cloudcore: dial tcp 192.168.1.100:10000: connect: connection refused + +# Token-based bootstrapping failure (keadm join) +E0115 10:26:30.222222 1 bootstrap.go:78] failed to get edge certificate: rpc error: code = Unauthenticated desc = Invalid token +``` + +**Resolution steps:** +- Verify CloudCore is running and accessible from the edge node on port 10000 (CloudHub) and 10002 (HTTPS): `nc -zv <cloudcore-ip> 10000` and `nc -zv <cloudcore-ip> 10002`. +- If certificates are expired or mismatched, regenerate them or re-run `keadm join` with a valid token. +- Check edge node time synchronization, as skewed clocks can cause `x509` certificate validation failures. + +--- + +### Internal MQTT Broker Issues + +EdgeCore uses an MQTT broker (internal or external) for device twin communication. EventBus errors usually point to MQTT connectivity problems. 
+ +```bash +journalctl -u edgecore.service | grep -i "mqtt\|eventbus" +``` + +**What to look for:** + +```text +# EventBus failing to connect to local MQTT +E0115 10:00:01.123456 1 eventbus.go:89] Failed to connect to MQTT broker at tcp://127.0.0.1:1883: network Error : dial tcp 127.0.0.1:1883: connect: connection refused + +# Authentication failure with external MQTT broker +E0115 10:00:01.234567 1 client.go:112] MQTT connection error: Connection Refused: not authorised +``` + +**Resolution steps:** +- If using the built-in MQTT broker, ensure `mqttMode: 0` (internal) is set in `/etc/kubeedge/config/edgecore.yaml`. +- If using an external Mosquitto broker, verify it is running (`systemctl status mosquitto`) and listening on the configured port (usually `1883`). +- Check for firewall rules blocking local traffic on port `1883`. + +--- + +### Container Runtime Errors + +If EdgeCore connects to CloudCore successfully but pods fail to deploy or start on the edge, the issue usually lies with the container runtime (CRI integration). 
+ +```bash +journalctl -u edgecore.service | grep -i "cri\|sandbox\|runtime\|container" +``` + +**What to look for:** + +```text +# Failed to start a pod sandbox (usually CNI or CRI configuration issue) +E0115 10:45:00.111111 1 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "nginx-edge-1" failed: rpc error: code = Unknown desc = failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/sys" to rootfs at "/sys" caused: mount through procfd: operation not permitted: unknown + +# Docker/containerd socket not found +E0115 10:45:05.222222 1 remote_runtime.go:102] dial unix /run/containerd/containerd.sock: connect: no such file or directory + +# Image pull failure +E0115 10:46:00.333333 1 image_manager.go:88] Failed to pull image "nginx:latest": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp: lookup registry-1.docker.io on 8.8.8.8:53: read udp 192.168.1.50:45123->8.8.8.8:53: i/o timeout +``` + +**Resolution steps:** +- Verify the container runtime endpoint is correctly configured in `/etc/kubeedge/config/edgecore.yaml` (`remoteRuntimeEndpoint` and `remoteImageEndpoint`). +- Check if the container runtime service is active (`systemctl status containerd` or `systemctl status docker`). +- If pods remain in `ContainerCreating`, inspect the runtime logs directly (e.g., `journalctl -u containerd.service`). 
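The severity `grep` used in the sections above can be wrapped once and reused for journald, container, and file sources. The following is a minimal sketch, not part of KubeEdge: the function name `klog_errors` is hypothetical, and it assumes log lines begin with the severity letter followed by a four-digit date, as in the sample entries above. Because that severity letter lives in the message text, it may not be reflected in the journald priority that `-p err` filters on, so a prefix filter can be a useful complement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of KubeEdge): keep only ERROR/FATAL lines
# read from stdin. Assumes each line starts with the severity letter and a
# four-digit month/day, e.g. "E0115 10:24:15.654321 ...".
klog_errors() {
  grep -E '^[EF][0-9]{4} ' || true   # "|| true": finding no matches is not a failure
}

# Example calls (commented out; they need a running EdgeCore):
#   journalctl -u edgecore.service -o cat | klog_errors   # -o cat prints only the message text
#   docker logs <container-id> 2>&1 | klog_errors
#   klog_errors < /var/log/kubeedge/edgecore.log
```

Note that default `journalctl` output prepends a timestamp and unit name to every line, so an anchored pattern such as `^[EF][0-9]{4}` only matches when the message is printed on its own, for example with `-o cat`.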
+ +--- + +## Quick-Reference Command Cheatsheet + +```bash +# ── systemd Mode (Default) ─────────────────────────────────────────────────── +# Live tail all logs +journalctl -u edgecore.service -f + +# Errors and above only, follow +journalctl -u edgecore.service -f -p err + +# Last 200 lines +journalctl -u edgecore.service -n 200 + +# Time-bounded window +journalctl -u edgecore.service --since "10 minutes ago" + +# ── File Mode ───────────────────────────────────────────────────────────────── +# Live tail log file +tail -f /var/log/kubeedge/edgecore.log + +# Errors and fatals only +grep -E "^[EF][0-9]{4}" /var/log/kubeedge/edgecore.log + +# Cloud connection issues +grep -i "cloud\|websocket" /var/log/kubeedge/edgecore.log + +# ── Container Mode ──────────────────────────────────────────────────────────── +# Docker: Live tail logs +docker logs -f <container-id> + +# containerd/CRI-O: Live tail logs +crictl logs -f <container-id> + +# ── Service Status ──────────────────────────────────────────────────────────── +# Overview of service health + last log lines +systemctl status edgecore.service +``` + +--- + +## Related Documentation + +- [Checking CloudCore Logs](../../advanced/cloudcore-logs.md) +- [Enable kubectl logs/exec to debug pods on the edge](../../advanced/debug.md) +- [KubeEdge Installation with keadm](../../setup/install-with-keadm.md) +- [EdgeCore Configuration Reference](../../setup/config.md) +- [KubeEdge Troubleshooting FAQ](../../faq/setup.md) diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/advanced/cloudcore-logs.md b/i18n/zh/docusaurus-plugin-content-docs/current/advanced/cloudcore-logs.md new file mode 100644 index 000000000..c8e890782 --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/advanced/cloudcore-logs.md @@ -0,0 +1,354 @@ +--- +title: Checking CloudCore Logs +sidebar_position: 12 +--- + +> **Applies to:** KubeEdge v1.13+ +> **Related issue:** [kubeedge#6703](https://github.com/kubeedge/kubeedge/issues/6703) + +When a KubeEdge installation 
fails or CloudCore behaves unexpectedly at runtime, inspecting CloudCore logs is the first and most effective diagnostic step. This guide explains **where logs are stored**, **which commands to run**, and **what log entries to look for** depending on how CloudCore is deployed. + +--- + +## Deployment Modes Overview + +CloudCore can be deployed in two fundamentally different ways, each with a different log location: + +| Deployment Mode | How Installed | Log Location | +|---|---|---| +| **Kubernetes Pod** | Helm chart or `keadm init` | `kubectl logs` (stdout) | +| **Host binary / systemd** | `keadm init --with-edge-node=false` or manual binary | `journalctl` or `/var/log/kubeedge/` | + +Identifying your deployment mode before proceeding will save time. + +```bash +# Check if CloudCore is running as a Pod +kubectl get pods -n kubeedge -l kubeedge=cloudcore + +# Check if CloudCore is running as a systemd service +systemctl is-active cloudcore.service +``` + +If the first command returns a running pod, proceed to [Mode 1](#mode-1-cloudcore-as-a-kubernetes-pod). If the second command returns `active`, proceed to [Mode 2](#mode-2-cloudcore-as-a-host-binary--systemd-service). + +--- + +## Mode 1: CloudCore as a Kubernetes Pod + +When CloudCore is deployed as a pod (the default mode for `keadm init` and Helm installations), all logs are written to **stdout/stderr** and captured by the Kubernetes logging subsystem. + +### Basic Log Retrieval + +```bash +# Find the CloudCore pod name +kubectl get pods -n kubeedge + +# View logs for the CloudCore pod (replace <pod-name> with the actual name) +kubectl logs -n kubeedge <pod-name> + +# Shorthand using a label selector; no need to look up the pod name +kubectl logs -n kubeedge -l kubeedge=cloudcore +``` + +### Tailing Live Logs + +Use the `-f` flag to follow logs in real time. 
This is the most useful mode when testing a new edge node connection: + +```bash +kubectl logs -n kubeedge -l kubeedge=cloudcore -f +``` + +### Viewing Logs from a Previous (Crashed) Container + +If CloudCore crashed and restarted, the current container's logs will not contain the failure. Use `--previous` to retrieve logs from the terminated container: + +```bash +kubectl logs -n kubeedge -l kubeedge=cloudcore --previous +``` + +### Filtering by Severity + +CloudCore uses structured Go logging. Pipe the output through `grep` to filter noise: + +```bash +# Show only ERROR and FATAL lines +kubectl logs -n kubeedge -l kubeedge=cloudcore | grep -E "E[0-9]{4}|F[0-9]{4}" + +# Show the last 100 lines and then follow +kubectl logs -n kubeedge -l kubeedge=cloudcore --tail=100 -f + +# Show logs since a specific time +kubectl logs -n kubeedge -l kubeedge=cloudcore --since=30m +kubectl logs -n kubeedge -l kubeedge=cloudcore --since-time="2026-01-15T10:00:00Z" +``` + +:::tip Go Log Format +CloudCore uses `klog` (the Kubernetes logging library). Log lines begin with a severity prefix: +- `I` = INFO +- `W` = WARNING +- `E` = ERROR +- `F` = FATAL (process will exit immediately after a FATAL log) +::: + +--- + +## Mode 2: CloudCore as a Host Binary / systemd Service + +When CloudCore is installed directly on the host using `keadm` in binary mode, or run via a systemd unit file, logs are written to **two possible locations**: + +1. `journald` (captured by `systemd`) — the primary source +2. 
`/var/log/kubeedge/cloudcore.log` — if the `--log-file` flag is set + +### Using `journalctl` + +`journalctl` is the primary tool for reading systemd-managed service logs: + +```bash +# View all logs for the CloudCore service +journalctl -u cloudcore.service + +# Follow live logs (equivalent to tail -f) +journalctl -u cloudcore.service -f + +# View the last 200 lines +journalctl -u cloudcore.service -n 200 + +# View logs since last boot only +journalctl -u cloudcore.service -b + +# View logs from a specific time window +journalctl -u cloudcore.service --since "2026-01-15 10:00:00" --until "2026-01-15 10:30:00" + +# Filter to show only error-level and above +journalctl -u cloudcore.service -p err + +# Combine: tail and follow only ERRORs +journalctl -u cloudcore.service -f -p err +``` + +### Checking `/var/log/kubeedge/` + +If CloudCore was started with `--logtostderr=false --log-file=/var/log/kubeedge/cloudcore.log` (common in older deployments or manual binary setups), logs are written to a file: + +```bash +# View the log file +cat /var/log/kubeedge/cloudcore.log + +# Tail the log file in real time +tail -f /var/log/kubeedge/cloudcore.log + +# View only the last 50 lines +tail -n 50 /var/log/kubeedge/cloudcore.log + +# Filter for errors and fatals +grep -E "^[EF][0-9]{4}" /var/log/kubeedge/cloudcore.log + +# Confirm where logs are actually being written +ls -lh /var/log/kubeedge/ +``` + +:::note No Log File Found? +If `/var/log/kubeedge/` is empty or does not exist, CloudCore is most likely writing to `journald` instead. Use `journalctl -u cloudcore.service` as described above. 
+::: + +### Checking the systemd Unit File + +To understand exactly how CloudCore was started (and which log flags are in use), inspect the unit file: + +```bash +# View the active unit file +systemctl cat cloudcore.service + +# Check the service status and last few log lines together +systemctl status cloudcore.service +``` + +--- + +## Diagnosing Common Failures + +The following sections describe specific log patterns to search for during common failure scenarios. In each case, the `grep` commands work against both `kubectl logs` output (pipe into grep) and log files (`grep` the file directly). + +### Certificate Errors + +Certificate issues are the most common cause of CloudCore startup failure and EdgeCore connection drops. Look for these patterns: + +```bash +# When inspecting a pod: +kubectl logs -n kubeedge -l kubeedge=cloudcore | grep -i "cert\|tls\|x509\|ca" + +# When inspecting a log file: +grep -i "cert\|tls\|x509\|ca" /var/log/kubeedge/cloudcore.log +``` + +**What to look for:** + +``` +# Certificate not yet generated or path wrong +E0115 10:23:01.123456 1 server.go:45] Failed to load TLS credentials: open /etc/kubeedge/certs/server.crt: no such file or directory + +# Edge node presents an expired or untrusted certificate +E0115 10:24:15.654321 1 handler.go:112] x509: certificate has expired or is not yet valid + +# CA mismatch between CloudCore and EdgeCore +E0115 10:25:00.111111 1 handler.go:87] tls: failed to verify client's certificate: x509: certificate signed by unknown authority + +# Token-based bootstrapping failure (keadm join) +E0115 10:26:30.222222 1 certmanager.go:78] failed to get edge certificate from cloudcore: rpc error: code = Unauthenticated desc = Invalid token +``` + +**Resolution steps:** +- Verify `/etc/kubeedge/certs/` exists and is populated: `ls -la /etc/kubeedge/certs/` +- Check certificate expiry: `openssl x509 -in /etc/kubeedge/certs/server.crt -noout -dates` +- Re-run `keadm init` with `--force` to regenerate certificates + 
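The expiry check in the resolution steps above can be run across the whole certificate directory rather than one file at a time. The following is a minimal sketch, assuming the `/etc/kubeedge/certs/` path and `.crt` extension from those steps; the function name `check_certs` and the 30-day window are illustrative assumptions, not KubeEdge defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of KubeEdge): report certificates that need
# attention. Directory and window defaults are assumptions for illustration.
check_certs() {
  local dir="${1:-/etc/kubeedge/certs}"
  local window="${2:-2592000}"           # 30 days, in seconds
  local crt
  for crt in "$dir"/*.crt; do
    [ -e "$crt" ] || continue            # glob matched nothing: no .crt files
    # -checkend N exits 0 only if the certificate is still valid N seconds from now
    if openssl x509 -in "$crt" -noout -checkend "$window" >/dev/null 2>&1; then
      echo "OK $crt"
    else
      echo "CHECK $crt"                  # expired, expiring soon, or unreadable
    fi
  done
}

# Example call (commented out; needs a CloudCore host):
#   check_certs /etc/kubeedge/certs 604800   # flag anything expiring within 7 days
```

A `CHECK` line only says the file did not pass; running `openssl x509 -in <file> -noout -dates` on it, as in the steps above, shows the actual validity window.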
+--- + +### EdgeCore Connection Drops + +After initial setup, if edge nodes repeatedly disconnect and reconnect, look for these patterns in CloudCore logs: + +```bash +kubectl logs -n kubeedge -l kubeedge=cloudcore | grep -i "edge\|disconnect\|reconnect\|websocket\|tunnel" +``` + +**What to look for:** + +``` +# Edge node disconnected (normal during restart, abnormal if repeated) +I0115 10:30:00.000000 1 nodeconnection.go:201] edge node edge-node-01 disconnected + +# WebSocket/tunnel handshake failure, usually a firewall or port issue +E0115 10:30:05.555555 1 handler.go:234] failed to upgrade websocket connection from edge-node-01: websocket: the client is not using the websocket protocol + +# EdgeCore is connecting to the wrong address or port +W0115 10:31:00.777777 1 server.go:156] no active connection for node edge-node-01 + +# Hub controller cannot list nodes: cloud API server connectivity issue +E0115 10:32:00.999999 1 hubcontroller.go:88] failed to list edge nodes: Get "https://10.96.0.1:443/api/v1/nodes": dial tcp 10.96.0.1:443: connect: connection refused +``` + +**Resolution steps:** +- Verify port 10000 (CloudHub) and 10002 (HTTPS) are open between edge and cloud: `nc -zv <cloudcore-ip> 10000` and `nc -zv <cloudcore-ip> 10002` +- Check if a firewall or security group is blocking traffic +- Confirm the `advertiseAddress` in CloudCore config matches the IP EdgeCore is using to reach it + +--- + +### Port Conflicts at Startup + +If CloudCore fails to bind to its required ports (typically when another process is already listening), you will see FATAL log entries immediately at startup: + +```bash +kubectl logs -n kubeedge -l kubeedge=cloudcore --previous | grep -E "bind|address already in use|listen" +``` + +**What to look for:** + +``` +# Port already in use: fatal at startup +F0115 10:00:01.123456 1 server.go:89] Failed to listen on :10000: listen tcp :10000: bind: address already in use + +# Similar failure for the HTTPS port +F0115 10:00:01.234567 1 server.go:112] Failed to listen on :10002: listen 
tcp :10002: bind: address already in use + +# CloudStream port conflict +F0115 10:00:01.345678 1 server.go:134] Failed to listen on :10003: listen tcp :10003: bind: address already in use +``` + +**Resolution steps:** +- Identify the conflicting process: + ```bash + # Find which process is using port 10000 + sudo ss -tlnp | grep 10000 + # or + sudo lsof -i :10000 + ``` +- Stop the conflicting service or reconfigure CloudCore to use a different port in `/etc/kubeedge/config/cloudcore.yaml` + +--- + +### API Server Connectivity Failures + +CloudCore must be able to reach the Kubernetes API server. If it cannot, it fails to start or loses cluster state: + +```bash +kubectl logs -n kubeedge -l kubeedge=cloudcore | grep -i "apiserver\|kubeconfig\|dial tcp\|connection refused" +``` + +**What to look for:** + +``` +# Kubeconfig missing or malformed +F0115 09:59:00.111111 1 app.go:67] failed to build kubeconfig: stat /etc/kubeedge/config/cloudcore.yaml: no such file or directory + +# API server not reachable +E0115 10:00:05.222222 1 reflector.go:138] Failed to watch *v1.Node: failed to list *v1.Node: Get "https://10.96.0.1:443/api/v1/nodes": dial tcp 10.96.0.1:443: connect: connection refused + +# RBAC permission denied +E0115 10:01:00.333333 1 controller.go:55] Failed to list pods: pods is forbidden: User "system:serviceaccount:kubeedge:cloudcore" cannot list resource "pods" in API group "" at the cluster scope +``` + +--- + +## Quick-Reference Command Cheatsheet + +```bash +# ── Pod Mode ───────────────────────────────────────────────────────────────── +# Live tail all logs +kubectl logs -n kubeedge -l kubeedge=cloudcore -f + +# Last 100 lines, then follow +kubectl logs -n kubeedge -l kubeedge=cloudcore --tail=100 -f + +# Errors only +kubectl logs -n kubeedge -l kubeedge=cloudcore | grep -E "^[EF][0-9]{4}" + +# Logs from crashed previous container +kubectl logs -n kubeedge -l kubeedge=cloudcore --previous + +# Logs from last 30 minutes +kubectl logs -n kubeedge -l 
kubeedge=cloudcore --since=30m + +# ── systemd Mode ───────────────────────────────────────────────────────────── +# Live tail +journalctl -u cloudcore.service -f + +# Errors and above only, follow +journalctl -u cloudcore.service -f -p err + +# Last 200 lines +journalctl -u cloudcore.service -n 200 + +# Time-bounded window +journalctl -u cloudcore.service --since "10 minutes ago" + +# ── File Mode ───────────────────────────────────────────────────────────────── +# Live tail log file +tail -f /var/log/kubeedge/cloudcore.log + +# Errors and fatals only +grep -E "^[EF][0-9]{4}" /var/log/kubeedge/cloudcore.log + +# Certificate-related lines +grep -i "cert\|x509\|tls" /var/log/kubeedge/cloudcore.log + +# Port binding failures +grep -i "bind\|address already in use\|listen" /var/log/kubeedge/cloudcore.log + +# ── Service Status ──────────────────────────────────────────────────────────── +# Overview of service health + last log lines +systemctl status cloudcore.service + +# Inspect how the service was started +systemctl cat cloudcore.service +``` + +--- + +## Related Documentation + +- [Enable kubectl logs/exec to debug pods on the edge](./debug.md) +- [KubeEdge Installation with keadm](../setup/install-with-keadm.md) +- [CloudCore Configuration Reference](../setup/config.md) +- [KubeEdge Troubleshooting FAQ](../faq/setup.md) diff --git a/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/troubleshooting/edgecore-logs.md b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/troubleshooting/edgecore-logs.md new file mode 100644 index 000000000..c84f897de --- /dev/null +++ b/i18n/zh/docusaurus-plugin-content-docs/current/user-guide/troubleshooting/edgecore-logs.md @@ -0,0 +1,270 @@ +--- +title: Checking EdgeCore Logs +sidebar_position: 13 +--- + +> **Applies to:** KubeEdge v1.13+ +> **Related issue:** [kubeedge#6704](https://github.com/kubeedge/kubeedge/issues/6704) + +When an edge node fails to join the cluster or pods fail to start on the edge, inspecting 
EdgeCore logs is the first and most effective diagnostic step. This guide explains **where logs are stored**, **which commands to run**, and **what log entries to look for** depending on how EdgeCore is deployed. + +--- + +## Deployment Modes Overview + +EdgeCore can be deployed in two fundamentally different ways, each with a different log location: + +| Deployment Mode | How Installed | Log Location | +|---|---|---| +| **Host binary / systemd** | `keadm join` (default) or manual binary | `journalctl` or `/var/log/kubeedge/` | +| **Containerized** | Container runtime | `docker logs` or `crictl logs` | + +Identifying your deployment mode before proceeding will save time. + +```bash +# Check if EdgeCore is running as a systemd service (most common) +systemctl is-active edgecore.service + +# Check if EdgeCore is running as a container via docker +docker ps | grep edgecore + +# Check if EdgeCore is running as a container via containerd or CRI-O +crictl ps | grep edgecore +``` + +If the first command returns `active`, proceed to [Mode 1](#mode-1-edgecore-as-a-host-binary--systemd-service). If EdgeCore is containerized, proceed to [Mode 2](#mode-2-edgecore-as-a-container). + +--- + +## Mode 1: EdgeCore as a Host Binary / systemd Service + +When EdgeCore is installed directly on the host using `keadm join` in binary mode (the default), logs are written to **two possible locations**: + +1. `journald` (captured by `systemd`) — the primary source. +2. `/var/log/kubeedge/edgecore.log` — if the `--log-file` flag is set. 
+ +### Using `journalctl` + +`journalctl` is the primary tool for reading systemd-managed service logs: + +```bash +# View all logs for the EdgeCore service +journalctl -u edgecore.service + +# Follow live logs (equivalent to tail -f) +journalctl -u edgecore.service -f + +# View the last 200 lines +journalctl -u edgecore.service -n 200 + +# View logs since last boot only +journalctl -u edgecore.service -b + +# Filter to show only error-level and above +journalctl -u edgecore.service -p err + +# Combine: tail and follow only ERRORs +journalctl -u edgecore.service -f -p err +``` + +### Checking `/var/log/kubeedge/` + +If EdgeCore was started with `--logtostderr=false --log-file=/var/log/kubeedge/edgecore.log` (common in older deployments or manual setups), logs are written to a file: + +```bash +# View the log file +cat /var/log/kubeedge/edgecore.log + +# Tail the log file in real time +tail -f /var/log/kubeedge/edgecore.log + +# View only the last 50 lines +tail -n 50 /var/log/kubeedge/edgecore.log + +# Filter for errors and fatals +grep -E "^[EF][0-9]{4}" /var/log/kubeedge/edgecore.log +``` + +:::note No Log File Found? +If `/var/log/kubeedge/` is empty or does not exist, EdgeCore is most likely writing to `journald` instead. Use `journalctl -u edgecore.service` as described above. +::: + +--- + +## Mode 2: EdgeCore as a Container + +When EdgeCore is deployed inside a container (e.g., using `keadm join --edgecore-image`), logs are captured by the container runtime. 
+ +### Using Docker + +If your container runtime is Docker: + +```bash +# Find the EdgeCore container ID or name +docker ps | grep edgecore + +# View logs for the EdgeCore container +docker logs <container-id> + +# Tail live logs +docker logs -f <container-id> + +# Filter for errors +docker logs <container-id> 2>&1 | grep -E "E[0-9]{4}|F[0-9]{4}" +``` + +### Using crictl (containerd / CRI-O) + +If your container runtime is containerd or CRI-O, use `crictl`: + +```bash +# Find the EdgeCore container ID +crictl ps | grep edgecore + +# View logs for the EdgeCore container +crictl logs <container-id> + +# Tail live logs +crictl logs -f <container-id> + +# Filter for errors +crictl logs <container-id> 2>&1 | grep -E "E[0-9]{4}|F[0-9]{4}" +``` + +--- + +## Diagnosing Common Failures + +The following sections describe specific log patterns to search for during common failure scenarios at the edge. The `grep` commands below work against systemd logs, but you can adapt them for container logs or log files. + +### Failures Connecting to CloudCore + +Connection issues are the most frequent cause of EdgeCore startup failure. These include WebSocket drops and certificate mismatches. 
+ +```bash +journalctl -u edgecore.service | grep -i "cloud\|websocket\|cert\|tls" +``` + +**What to look for:** + +```text +# Certificate mismatch or expired certificate +E0115 10:24:15.654321 1 hub.go:112] x509: certificate has expired or is not yet valid + +# Unknown Authority (CA mismatch) +E0115 10:25:00.111111 1 client.go:87] tls: failed to verify server's certificate: x509: certificate signed by unknown authority + +# WebSocket connection failure (CloudCore unreachable) +E0115 10:30:05.555555 1 websocket.go:234] failed to connect to cloudcore: dial tcp 192.168.1.100:10000: connect: connection refused + +# Token-based bootstrapping failure (keadm join) +E0115 10:26:30.222222 1 bootstrap.go:78] failed to get edge certificate: rpc error: code = Unauthenticated desc = Invalid token +``` + +**Resolution steps:** +- Verify CloudCore is running and accessible from the edge node on port 10000 (CloudHub) and 10002 (HTTPS): `nc -zv <cloudcore-ip> 10000` and `nc -zv <cloudcore-ip> 10002`. +- If certificates are expired or mismatched, regenerate them or re-run `keadm join` with a valid token. +- Check edge node time synchronization, as skewed clocks can cause `x509` certificate validation failures. + +--- + +### Internal MQTT Broker Issues + +EdgeCore uses an MQTT broker (internal or external) for device twin communication. EventBus errors usually point to MQTT connectivity problems. 
+ +```bash +journalctl -u edgecore.service | grep -i "mqtt\|eventbus" +``` + +**What to look for:** + +```text +# EventBus failing to connect to local MQTT +E0115 10:00:01.123456 1 eventbus.go:89] Failed to connect to MQTT broker at tcp://127.0.0.1:1883: network Error : dial tcp 127.0.0.1:1883: connect: connection refused + +# Authentication failure with external MQTT broker +E0115 10:00:01.234567 1 client.go:112] MQTT connection error: Connection Refused: not authorised +``` + +**Resolution steps:** +- If using the built-in MQTT broker, ensure `mqttMode: 0` (internal) is set in `/etc/kubeedge/config/edgecore.yaml`. +- If using an external Mosquitto broker, verify it is running (`systemctl status mosquitto`) and listening on the configured port (usually `1883`). +- Check for firewall rules blocking local traffic on port `1883`. + +--- + +### Container Runtime Errors + +If EdgeCore connects to CloudCore successfully but pods fail to deploy or start on the edge, the issue usually lies with the container runtime (CRI integration). 
+ +```bash +journalctl -u edgecore.service | grep -i "cri\|sandbox\|runtime\|container" +``` + +**What to look for:** + +```text +# Failed to start a pod sandbox (usually CNI or CRI configuration issue) +E0115 10:45:00.111111 1 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "nginx-edge-1" failed: rpc error: code = Unknown desc = failed to create containerd task: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/sys" to rootfs at "/sys" caused: mount through procfd: operation not permitted: unknown + +# Docker/containerd socket not found +E0115 10:45:05.222222 1 remote_runtime.go:102] dial unix /run/containerd/containerd.sock: connect: no such file or directory + +# Image pull failure +E0115 10:46:00.333333 1 image_manager.go:88] Failed to pull image "nginx:latest": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to resolve reference "docker.io/library/nginx:latest": failed to do request: Head "https://registry-1.docker.io/v2/library/nginx/manifests/latest": dial tcp: lookup registry-1.docker.io on 8.8.8.8:53: read udp 192.168.1.50:45123->8.8.8.8:53: i/o timeout +``` + +**Resolution steps:** +- Verify the container runtime endpoint is correctly configured in `/etc/kubeedge/config/edgecore.yaml` (`remoteRuntimeEndpoint` and `remoteImageEndpoint`). +- Check if the container runtime service is active (`systemctl status containerd` or `systemctl status docker`). +- If pods remain in `ContainerCreating`, inspect the runtime logs directly (e.g., `journalctl -u containerd.service`). 
+ +--- + +## Quick-Reference Command Cheatsheet + +```bash +# ── systemd Mode (Default) ─────────────────────────────────────────────────── +# Live tail all logs +journalctl -u edgecore.service -f + +# Errors and above only, follow +journalctl -u edgecore.service -f -p err + +# Last 200 lines +journalctl -u edgecore.service -n 200 + +# Time-bounded window +journalctl -u edgecore.service --since "10 minutes ago" + +# ── File Mode ───────────────────────────────────────────────────────────────── +# Live tail log file +tail -f /var/log/kubeedge/edgecore.log + +# Errors and fatals only +grep -E "^[EF][0-9]{4}" /var/log/kubeedge/edgecore.log + +# Cloud connection issues +grep -i "cloud\|websocket" /var/log/kubeedge/edgecore.log + +# ── Container Mode ──────────────────────────────────────────────────────────── +# Docker: Live tail logs +docker logs -f <container-id> + +# containerd/CRI-O: Live tail logs +crictl logs -f <container-id> + +# ── Service Status ──────────────────────────────────────────────────────────── +# Overview of service health + last log lines +systemctl status edgecore.service +``` + +--- + +## Related Documentation + +- [Checking CloudCore Logs](../../advanced/cloudcore-logs.md) +- [Enable kubectl logs/exec to debug pods on the edge](../../advanced/debug.md) +- [KubeEdge Installation with keadm](../../setup/install-with-keadm.md) +- [EdgeCore Configuration Reference](../../setup/config.md) +- [KubeEdge Troubleshooting FAQ](../../faq/setup.md)