Fat Komodo

Pretty sure PID exhaustion is not a good thing, and it leads to mental exhaustion too.

So I'm nearing completion on a big fat migration of all my services. Consolidating services from five servers onto one is no small task, especially when I can't make up my mind on how things should be organized, but we're here to talk about the server's challenges, not mine. Though I suppose they might as well be one and the same. The point is, when you suddenly grow to 35+ containers on a system, you start to run into issues here and there simply due to the scale.

I already wrote a post covering the issue I had with the default docker address pools, so now I will break down a more frustrating and confusing problem I had with stability. For reference, my primary server has 8 threads and 100GB of RAM more or less dedicated to services, and is running Unraid. I'm using Komodo to manage everything docker related and some more things besides. Shoutout to Komodo.

The Problem

I have my Komodo stack running in Docker Compose, everything is healthy, and I have just finished configuring Traefik for everything. I'm excited to switch my wildcard DNS entry to point at Traefik. Once I've flushed DNS on my laptop everything works great, but only for a moment: then I'm hit with a flood of notifications for services going down, mostly with 504 errors. Docker shows a bunch of containers as unhealthy and Traefik is shitting itself, yet everything still seems to work just fine when I connect directly. It's already late in the evening and I'm pulling my hair out, so I just revert back to NPM, which is somehow still working fine.

After a good night's rest I can really dig into it, and here's a quick overview of the symptoms:

  • Anything with a health check shows 'unhealthy' with an error like OCI runtime exec failed: open /run/user/0/runc-process980265188: no such file or directory: unknown
  • Unable to get a terminal in the container within Komodo, but I can with docker exec in a shell
  • The services themselves seem to be working fine, and I can reach their web pages directly, but Traefik fails with a gateway timeout
  • Restarting docker helps for a while, but eventually it happens again
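
In hindsight, there's a quick way I could have checked for PID pressure while the symptoms were live. A hedged sketch: the cgroup path and container scope name below are made up, the layout assumes cgroup v2, and the pid_headroom helper is my own illustration, not part of Docker or Komodo.

```shell
# Hypothetical helper: compare a pids cgroup's current count to its limit.
pid_headroom() {
    # $1 = pids.current value, $2 = pids.max value ("max" means unlimited)
    local current="$1" limit="$2"
    if [ "$limit" = "max" ]; then
        echo "unlimited"
    elif [ "$current" -ge "$limit" ]; then
        echo "exhausted"
    else
        echo "$((limit - current)) left"
    fi
}

# Example against a container's pids cgroup (path and ID are assumptions;
# find the real one under /sys/fs/cgroup on your system):
cg="/sys/fs/cgroup/system.slice/docker-abc123.scope"
if [ -d "$cg" ]; then
    pid_headroom "$(cat "$cg/pids.current")" "$(cat "$cg/pids.max")"
fi
```

When a container is at its limit, every fork inside it fails, which lines up neatly with health checks and exec sessions dying while long-lived processes keep running.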

A Journey of Discovery

With only a small amount of frustration mixed in for good measure. Given these strange symptoms, my first thought is that my very liberal use of /mnt/user/appdata (which is on the array, because I'm a fool) is causing stability issues. I knew an SSD-based cache pool would be much better for performance and that I needed to fix this anyway, so I moved all my data mounts to /mnt/cache/appdata and made sure the Docker vdisk and everything else lived entirely on the cache disk. This did not help.

Or, it did, just not with the present dilemma. Next I started looking into potential issues with user namespaces. Namespaces are fundamental to containerization; I recommend reading about them if you aren't familiar. My system already had a functionally unlimited maximum of 514194 for every namespace type:

# sysctl -a | grep namespace
user.max_cgroup_namespaces = 514194
user.max_ipc_namespaces = 514194
user.max_mnt_namespaces = 514194
user.max_net_namespaces = 514194
user.max_pid_namespaces = 514194
user.max_time_namespaces = 514194
user.max_user_namespaces = 514194
user.max_uts_namespaces = 514194

Since I can pretty conclusively say it's not a namespace issue, I'm running out of ideas. I mess around with Docker a bit more, and after some time of waffling around not really knowing what to do, I notice a setting in Unraid: 'Docker PID limit'. This gets me thinking about how Komodo periphery works when run as a container. I knew a systemd service was generally the more reliable option, but I hadn't realized the differences might extend beyond things like mount paths.
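
For context, Docker applies this limit per container via the pids cgroup controller, and you can inspect or change it from the CLI. The docker commands below are real, but the container name periphery and the interpret_pids_limit helper are my own assumptions for illustration.

```shell
# Hypothetical helper for reading the value that
# `docker inspect --format '{{.HostConfig.PidsLimit}}' <name>` reports;
# 0 (and on some versions -1) means no limit is set.
interpret_pids_limit() {
    case "$1" in
        0|-1|"") echo "unlimited" ;;
        *)       echo "limited to $1 PIDs" ;;
    esac
}

# Only query docker if it's available; 'periphery' is a made-up name.
if command -v docker >/dev/null 2>&1 &&
   limit=$(docker inspect --format '{{.HostConfig.PidsLimit}}' periphery 2>/dev/null); then
    interpret_pids_limit "$limit"
fi

# Raising or removing the limit on a live container, without recreating it:
#   docker update --pids-limit -1 periphery
```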

A Solution in Sight

Taking a fresh look at my symptoms, it seems pretty obvious that the issue lies with Komodo itself, what with it being unable to start a terminal and so on. My next step was equally obvious: run periphery directly on the host instead of in a container. One more small hiccup though: Unraid doesn't use systemd, so I can't just run the handy install script and be done.

So I download the binary and create a simple init script for it, which is triggered by a user script in Unraid at array start...
#!/usr/bin/env bash
# Safe init-style script for Komodo Periphery (Unraid)

set -o errexit
set -o nounset
set -o pipefail

# Configuration (edit as needed)
PERIPHERY_BIN="/mnt/cache/appdata/komodo/periphery/periphery"
CONFIG_DIR="/mnt/cache/appdata/komodo/periphery"
LOG_FILE="/var/log/komodo-periphery.log"
PID_FILE="/var/run/komodo-periphery.pid"
APPDATA="/mnt/cache/appdata"
export APPDATA # used in container configs

log() { printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"; }

ensure_dirs() {
    mkdir -p "$(dirname "$LOG_FILE")" "$(dirname "$PID_FILE")"
    touch "$LOG_FILE" || true
}

is_running() {
    local pid
    if [ -f "$PID_FILE" ]; then
        pid=$(<"$PID_FILE")
        if [ -n "$pid" ] && ps -p "$pid" > /dev/null 2>&1; then
            printf '%s' "$pid"
            return 0
        fi
    fi
    return 1
}

start() {
    log "Starting Komodo Periphery..."

    local pid
    if pid=$(is_running); then
        log "Periphery is already running (PID: $pid)"
        return 0
    fi

    if [ ! -x "$PERIPHERY_BIN" ]; then
        log "ERROR: executable not found or not executable: $PERIPHERY_BIN"
        return 2
    fi

    ensure_dirs

    # use setsid to avoid job-control issues and capture PID reliably
    setsid "$PERIPHERY_BIN" --config-path "$CONFIG_DIR" >> "$LOG_FILE" 2>&1 &
    echo "$!" > "$PID_FILE"

    sleep 1
    if pid=$(is_running); then
        log "Periphery started (PID: $pid)"
        return 0
    else
        log "Failed to start Periphery; check $LOG_FILE"
        [ -f "$PID_FILE" ] && rm -f "$PID_FILE"
        return 3
    fi
}

stop() {
    local pid
    if pid=$(is_running); then
        log "Stopping Periphery (PID: $pid)..."
        if kill "$pid" > /dev/null 2>&1; then
            # Wait up to 60s for graceful shutdown
            for _ in {1..60}; do
                if ! ps -p "$pid" > /dev/null 2>&1; then
                    rm -f "$PID_FILE"
                    log "Periphery stopped successfully (PID: $pid)"
                    return 0
                fi
                sleep 1
            done
            log "Graceful stop timeout; forcing kill (PID: $pid)"
            kill -9 "$pid" > /dev/null 2>&1 || true
            rm -f "$PID_FILE"
            return 0
        else
            log "Failed to send SIGTERM to PID $pid; cleaning up PID file"
            rm -f "$PID_FILE"
            return 4
        fi
    else
        if [ -f "$PID_FILE" ]; then
            rm -f "$PID_FILE"
            log "Removed stale PID file"
        fi
        log "Periphery is not running"
        return 0
    fi
}

status() {
    local pid
    if pid=$(is_running); then
        echo "Periphery running (PID: $pid)"
        return 0
    else
        echo "Periphery not running"
        return 1
    fi
}

restart() {
    stop
    sleep 1
    start
}

case "${1:-}" in
    start)
        start
        exit $?
        ;;
    stop)
        stop
        exit $?
        ;;
    restart)
        restart
        exit $?
        ;;
    status)
        status
        exit $?
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
        ;;
esac
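
The User Scripts entry that actually triggers this at array start can be a thin wrapper; the path below matches the config directory from my script but the filename is an assumption, so adjust it to wherever you keep the init script.

```shell
#!/bin/bash
# Unraid User Scripts entry, scheduled "At Startup of Array".
# Delegates to the init script above; adjust the path as needed.
INIT="/mnt/cache/appdata/komodo/periphery/periphery-init.sh"
if [ -x "$INIT" ]; then
    "$INIT" start
fi
```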

I then had to redeploy everything in Komodo, but with this my containers are once again stable as they should be. In hindsight it seems like such a simple thing, but it took me a couple days of frustrated confusion to get to the solution.

The Root

To be honest, my understanding of the underlying cause is still only a hunch, albeit a pretty strong one. If I really wanted to confirm it, I would switch back to the containerized periphery agent, reproduce the issue, then disable the PID limit (2048 by default) for the container and see if the problems go away. Unfortunately I don't really feel like doing that, so here's my logic (take it with a grain of salt):

  • The issue only happens some time after Docker is running, suggesting there is a threshold for the issues to start (i.e. running out of PIDs)
  • Services are still running just fine, i.e. existing processes work as expected
  • The error is a bit misleading, but it happens when attempting to run a process like a healthcheck command
  • The Komodo container cannot start a terminal in a container, but docker exec directly on the host can
  • Traefik fails while NPM works, because Traefik is running under the same exhausted PID limit and cannot spawn the new tasks it needs to handle incoming connections (the pids cgroup counts threads as well as processes)
  • Restarting Docker helps temporarily, until we once again run out of PIDs

There are still a couple of things that confuse me though, because this reasoning assumes the PIDs of containers started by the periphery container are charged to the periphery container itself, when in theory the Docker daemon should own them. I didn't dig far enough into how the cgroups were allocated either, so again, take this with a grain of salt.
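
If I ever do revisit this, one way to settle the cgroup question would be to check which cgroup the processes are actually charged to. A sketch, assuming cgroup v2; the cgroup_of helper and its one-line parsing are my own.

```shell
# /proc/<pid>/cgroup shows which cgroup a process is charged to.
# Under cgroup v2 each line looks like "0::/some/cgroup/path"; this
# helper (my own) pulls out the path portion.
cgroup_of() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Example against PID 1, which exists on any Linux host:
if [ -r /proc/1/cgroup ]; then
    echo "PID 1 lives in: $(cgroup_of "$(head -n1 /proc/1/cgroup)")"
fi
```

Running this against a process inside a misbehaving container would show whether its PIDs land in the periphery container's cgroup or in one managed by the Docker daemon.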

Summary Execution

Lesson learned: just use the binary periphery agent instead of the containerized one. It makes everything easier, with the sole exception of deploying updates to the agent itself, and even that is easy to automate. It's a bit more challenging and convoluted on Unraid given the lack of a persistent root filesystem, but still not too hard to work around.

Sometimes it still surprises me, the type of issues you come across once you get past a certain scale. Hopefully this story of my pain is helpful to you, and maybe one day I'll be able to commit to the debugging commandments and actually verify the root cause and solution :)