55 changes: 55 additions & 0 deletions README_SERVER_HEALTH.md
@@ -0,0 +1,55 @@
# Server Health Monitoring Script

## Overview
This script monitors the health of a Linux server by checking disk space, CPU usage, and memory usage against a 60% threshold.

## Usage

### Basic Usage
```bash
./server_health_check.sh
```
Returns either `healthy` or `unhealthy` based on current system metrics.

### Detailed Explanation
```bash
./server_health_check.sh explain
```
Returns the health status along with:
- Current metrics for disk, CPU, and memory usage
- Threshold value (60%)
- Detailed reasons if the server is unhealthy

## Health Criteria
- **Healthy**: All metrics (disk, CPU, memory) are below 60%
- **Unhealthy**: One or more metrics are at or above 60% (the script compares with `-ge`, so a value of exactly 60% counts as unhealthy)
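
The comparison behind these criteria can be sketched as a small Bash function. This is an illustrative sketch, not the script's actual code: `is_unhealthy` and its argument order are hypothetical, it assumes integer percentages, and it mirrors the `-ge` test in `server_health_check.sh`, so a metric equal to the threshold is treated as unhealthy.

```bash
#!/bin/bash
# Hypothetical helper (not part of server_health_check.sh).
# Returns 0 if any metric is at or above the threshold, 1 otherwise.
# Usage: is_unhealthy THRESHOLD METRIC...
is_unhealthy() {
    local threshold=$1
    shift
    local metric
    for metric in "$@"; do
        # Integer comparison; -ge means a value equal to the threshold is unhealthy.
        if [ "$metric" -ge "$threshold" ]; then
            return 0
        fi
    done
    return 1
}

# Values taken from this README's example output: disk 76, CPU 15, memory 45.
if is_unhealthy 60 76 15 45; then
    echo "unhealthy"
else
    echo "healthy"
fi
```

Passing the threshold as the first argument keeps the sketch independent of the hard-coded `THRESHOLD=60` in the script.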

## Exit Codes
- `0`: Server is healthy
- `1`: Server is unhealthy
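
Because the status is also exposed through the exit code, callers such as cron jobs or deployment scripts can branch on it without parsing the output. The following is a minimal sketch under that assumption; `report_health` and its messages are hypothetical and not part of this PR.

```bash
#!/bin/bash
# Hypothetical wrapper (not part of this PR): run a health-check command
# and report based on its exit code. The command is passed as an argument,
# so the sketch does not assume where the script is installed.
report_health() {
    local check_cmd=$1
    if "$check_cmd" > /dev/null; then
        echo "OK: server is healthy"
    else
        # Any non-zero exit code is treated as unhealthy here, including
        # the case where the command could not be executed at all.
        echo "ALERT: server is unhealthy" >&2
        return 1
    fi
}

# Example invocation (assumes the script is in the current directory):
# report_health ./server_health_check.sh
```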

## Example Output

### Without explain argument:
```
healthy
```

### With explain argument:
```
Server Status: unhealthy

Current Metrics:
- Disk usage: 76%
- CPU usage: 15%
- Memory usage: 45%

Threshold: 60%

Reasons for unhealthy status:
- Disk usage is 76% (threshold: 60%)
```

## Requirements
- Linux operating system
- Bash (the script uses arrays)
- Standard utilities: `df`, `free`, `top`, `awk`, `grep`, `sed`, `sort`, `head`, `tail`
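
A quick way to confirm these requirements before running the health check is to look each utility up with `command -v`. The sketch below is illustrative only: `missing_tools` is a hypothetical helper, and the list passed to it reflects the commands that `server_health_check.sh` invokes.

```bash
#!/bin/bash
# Hypothetical preflight check (not part of this PR).
# Prints each command that is not found on PATH; returns 1 if any is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Check the utilities invoked by server_health_check.sh.
if missing=$(missing_tools df free top awk grep sed sort head tail); then
    echo "All required utilities are available."
else
    # Join the newline-separated names with spaces for a one-line message.
    echo "Missing utilities: ${missing//$'\n'/ }" >&2
fi
```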
83 changes: 83 additions & 0 deletions server_health_check.sh
@@ -0,0 +1,83 @@
#!/bin/bash

# Server Health Monitoring Script
# Checks disk space, CPU usage, and memory usage
# Returns "healthy" if all metrics are below 60%, "unhealthy" otherwise
# Usage: ./server_health_check.sh [explain]

THRESHOLD=60

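# Function to get disk usage percentage (highest usage across mounted filesystems)
get_disk_usage() {
    # -P keeps each filesystem on a single line, so the use percentage is always field 5
    df -P | grep -vE '^Filesystem|tmpfs|cdrom|loop' | awk '{ print $5 }' | sed 's/%//g' | sort -rn | head -1
}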

# Function to get CPU usage percentage
get_cpu_usage() {
    # Using top to get CPU usage (100 - idle percentage)
    # top runs two iterations one second apart; the first is discarded because
    # it does not reflect the sampling interval, so only the last Cpu(s) line is parsed
    LC_ALL=C top -bn2 -d 1 | grep "Cpu(s)" | tail -1 | sed 's/.*, *\([0-9.]*\)[% ]*id.*/\1/' | awk '{printf "%.0f", 100 - $1}'
}

# Function to get memory usage percentage
get_memory_usage() {
    free | grep Mem | awk '{printf "%.0f", ($3/$2) * 100.0}'
}

# Get current metrics
DISK_USAGE=$(get_disk_usage)
CPU_USAGE=$(get_cpu_usage)
MEMORY_USAGE=$(get_memory_usage)

# Check if any metric meets or exceeds the threshold
UNHEALTHY=0
REASONS=()

if [ "$DISK_USAGE" -ge "$THRESHOLD" ]; then
    UNHEALTHY=1
    REASONS+=("Disk usage is ${DISK_USAGE}% (threshold: ${THRESHOLD}%)")
fi

if [ "$CPU_USAGE" -ge "$THRESHOLD" ]; then
    UNHEALTHY=1
    REASONS+=("CPU usage is ${CPU_USAGE}% (threshold: ${THRESHOLD}%)")
fi

if [ "$MEMORY_USAGE" -ge "$THRESHOLD" ]; then
    UNHEALTHY=1
    REASONS+=("Memory usage is ${MEMORY_USAGE}% (threshold: ${THRESHOLD}%)")
fi

# Determine health status
if [ "$UNHEALTHY" -eq 1 ]; then
    STATUS="unhealthy"
else
    STATUS="healthy"
fi

# Output based on argument
if [ "$1" == "explain" ]; then
    echo "Server Status: $STATUS"
    echo ""
    echo "Current Metrics:"
    echo " - Disk usage: ${DISK_USAGE}%"
    echo " - CPU usage: ${CPU_USAGE}%"
    echo " - Memory usage: ${MEMORY_USAGE}%"
    echo ""
    echo "Threshold: ${THRESHOLD}%"
    echo ""

    if [ "$UNHEALTHY" -eq 1 ]; then
        echo "Reasons for unhealthy status:"
        for reason in "${REASONS[@]}"; do
            echo " - $reason"
        done
    else
        echo "All metrics are below the ${THRESHOLD}% threshold."
    fi
else
    echo "$STATUS"
fi

# Exit with appropriate code
exit $UNHEALTHY