Testing the whole stack | Valentin Vivier

In the previous parts I covered the Go internals of stamusctl: the template system, Docker plumbing, the daemon, and unit testing with a fake filesystem. This part is about testing the templates themselves, the thing stamusctl actually deploys.

Unit tests on the Go side tell me the CLI works. They don’t tell me that a new template version actually produces a healthy stack, that PCAP replay indexes data into OpenSearch, or that an upgrade from the published version to a development build doesn’t lose documents. For that I need integration tests that deploy real containers and assert against real services.

The dev shell

The repo has five development dependencies: Go, GNU Make, hadolint, just, and bats. A Nix flake pins them all:

buildInputs = with pkgs; [
  go
  gnumake
  hadolint
  just
  bats
];

nix develop and you’re ready. No version mismatches, no “works on my machine.” The shell works on Linux and macOS, x86 and ARM.

Justfile as orchestrator

I use just instead of Make because the tasks aren’t builds: there are no source/target dependencies to track. just is a command runner, which is all these tasks need. The top-level justfile wraps builds, linting, and testing:

set shell := ["bash", "-euo", "pipefail", "-c"]

# Lint Dockerfile with hadolint
lint-dockerfile:
    hadolint Dockerfile

# Validate YAML files with yamllint (skip Go template files)
lint-yaml:
    @find data/ \( -name '*.yaml' -o -name '*.yml' \) \
        ! -name '*.compose.yaml' ! -name 'compose.yml' \
        -exec grep -rL '{''{'  {} + | xargs yamllint

# Run integration tests (delegates to tests/justfile)
test *args:
    just -f tests/justfile {{args}}

The YAML linting has a wrinkle. Template files contain Go syntax ({{ .Values.x }}) that isn’t valid YAML. grep -L prints only the names of files that do not contain {{, so templated files never reach yamllint. Compose files are excluded by name because they consist almost entirely of template directives.
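
To see the filter in isolation, here is a minimal sketch that builds a throwaway directory with one plain YAML file and one templated file, then lists only the files without a template delimiter. The helper name list_plain_yaml and the file names are made up for illustration; the real recipe also excludes compose files and pipes the result to yamllint. Outside a justfile the pattern can be written as a plain '{{'. Inside a recipe, just would read it as interpolation, which is presumably why the recipe splits it into '{''{'.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the repo): list YAML files under a
# directory that contain no Go template delimiter.
# grep -L prints the names of files with NO matching line.
list_plain_yaml() {
    local dir="$1"
    # grep's exit status with -L differs between GNU grep versions,
    # so it is deliberately ignored here.
    { find "$dir" \( -name '*.yaml' -o -name '*.yml' \) \
        -exec grep -L -F '{{' {} + || true; } | sort
}

# Demo with made-up file names in a temporary directory
demo_dir="$(mktemp -d)"
printf 'key: value\n' > "$demo_dir/plain.yaml"
printf 'key: {{ .Values.x }}\n' > "$demo_dir/templated.yaml"

# Only plain.yaml is listed
list_plain_yaml "$demo_dir"
```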

The test recipes delegate to a nested tests/justfile that handles the full lifecycle: deploy, test, teardown. This separation keeps the top-level justfile focused on build tasks while the test justfile manages a complex stateful workflow.

Test structure

Tests are written in Bats (Bash Automated Testing System). Each .bats file covers a functional area:

tests/
├── justfile              # Deploy → test → teardown orchestration
├── helpers/
│   └── common.bash       # Shared assertions and helpers
├── fixtures/
│   ├── eternalblue.pcap
│   └── eternalromance-meterpreter.pcap
└── integration/
    ├── setup.bats        # Wait for all services to be healthy
    ├── services.bats     # Container naming, volumes, network
    ├── opensearch.bats   # Cluster health, version, MCP plugin
    ├── data-pipeline.bats # PCAP → Suricata → Fluentd → OpenSearch
    ├── ism-policy.bats   # Index lifecycle management
    ├── scirius.bats      # Web UI, auth, REST API
    ├── upgrade.bats      # Data survival across template versions
    └── aichat.bats       # LibreChat deployment

The test justfile runs them in order: deploy a stack from the local template, run the core test groups, then the upgrade tests, then the aichat tests (which need a separate deployment with aichat.enabled=true), then teardown.

# Run full test suite: deploy → test → teardown
test: deploy test-core test-upgrade test-aichat teardown

# Run core tests only (no upgrade, no aichat)
test-core: test-services test-opensearch test-pipeline test-ism test-scirius

PCAP fixtures and the data pipeline

The test fixtures are PCAPs from the testmynids.org project: an EternalBlue exploit and an EternalRomance-to-Meterpreter full kill chain. Both trigger well-known Suricata ET Open rules, so the alerts are deterministic.

The data pipeline tests exercise the full path from PCAP file to searchable data:

@test "readpcap injects data without error" {
    run stamusctl_compose readpcap "$PCAP_FILE"
    [[ "$status" -eq 0 ]]
}

@test "suricata wrote events to eve.json" {
    local eve
    eve="$(eve_json_path)"
    [[ -f "$eve" ]] && [[ -s "$eve" ]]
}

@test "opensearch has indexed documents after injection" {
    wait_for_documents 1 60
    local count
    count="$(opensearch_count)"
    save_state "baseline_count" "$count"
}

@test "suricata generated alerts from PCAP" {
    # Query OpenSearch for alert events (single request, no polling)
    local result
    result="$(opensearch_query '/_search?q=event_type:alert&size=1')"
    local hit_count
    hit_count="$(echo "$result" | grep -oP '"total"\s*:\s*\{"value"\s*:\s*\K\d+')"
    [[ "$hit_count" -gt 0 ]]
}

Each test isolates a different failure mode. If readpcap succeeds but eve.json is empty, Suricata didn’t process the capture. If eve.json has events but OpenSearch has zero documents, Fluentd isn’t forwarding. If documents exist but no alerts, the ruleset isn’t loaded. When a test fails, the diagnostics tell you where the chain broke.
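
The hit-count extraction in the alert test relies on grep -P, which not every grep build supports. As a sketch of the same extraction with plain sed, assuming the response carries the usual hits.total.value field, something like the following should work. The function name total_hits and the response body are made up for illustration.

```shell
#!/usr/bin/env bash

# Made-up response body in the shape a _search request returns
response='{"took":3,"timed_out":false,"hits":{"total":{"value":42,"relation":"eq"},"hits":[]}}'

# Hypothetical helper: print the number stored in hits.total.value.
# Prints nothing if the field is missing.
total_hits() {
    printf '%s\n' "$1" \
        | sed -n 's/.*"total"[[:space:]]*:[[:space:]]*{"value"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

total_hits "$response"   # prints 42
```

An empty result is worth treating as a failure in its own right, since comparing an empty string with -gt raises an error rather than a clean false.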

The helpers poll instead of sleeping. wait_for_documents checks the OpenSearch document count every 5 seconds until a threshold is met or a timeout is reached. wait_for_healthy does the same for service health. This matters in CI where startup time varies.
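
The helper bodies aren’t shown here, but the poll-until-timeout pattern behind them can be sketched generically. wait_until is a hypothetical name, and the interval is a parameter here rather than the fixed 5 seconds mentioned above; the repo’s helpers may be structured differently.

```shell
#!/usr/bin/env bash

# Hypothetical polling helper: run a command repeatedly until it
# succeeds, or give up once the timeout (in seconds) has elapsed.
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline
    deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1   # timed out
        fi
        sleep "$interval"
    done
    return 0
}

# Demo: the condition becomes true after about one second
demo_dir="$(mktemp -d)"
marker="$demo_dir/ready"
( sleep 1; : > "$marker" ) &

if wait_until 10 1 test -e "$marker"; then
    echo "condition met"
fi
wait
rm -rf "$demo_dir"
```

Returning a non-zero status on timeout lets a Bats test fail with a plain call, the same way wait_for_documents 1 60 is used in the tests above.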

Upgrade tests

The upgrade suite tests the most common deployment scenario: someone running the published release upgrades to the latest template.

setup_file() {
    # Deploy from the registry (released version)
    "$STAMUSCTL_BIN" compose init --default -c "$TEST_CONFIG" \
        "suricata.interfaces=${iface}"
    stamusctl_compose up -d
    wait_for_healthy "$STARTUP_TIMEOUT"

    # Inject data
    stamusctl_compose readpcap "$pcap"
    wait_for_documents 1 60
}

@test "baseline: capture document count" {
    local count
    count="$(opensearch_count)"
    save_state "pre_upgrade_count" "$count"
}

@test "upgrade to local template succeeds" {
    run stamusctl_compose update --template "$TEMPLATE_PATH"
    [[ "$status" -eq 0 ]]
}

@test "data survives upgrade" {
    local pre_count post_count
    pre_count="$(load_state "pre_upgrade_count")"
    post_count="$(opensearch_count)"
    local threshold=$(( pre_count * 95 / 100 ))
    [[ "$post_count" -ge "$threshold" ]]
}

The flow: deploy the released version, inject data via PCAP, record the document count, run compose update to switch to the local template, bring the stack back up, and verify that at least 95% of documents survived. The 5% margin accounts for background processes that might add or remove a few documents during the transition.
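
The threshold uses shell integer arithmetic, which truncates. A small worked sketch shows what that means for different baselines. The function names survival_threshold and survived are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: 95% of the baseline, rounded down, using the
# same expression as the upgrade test.
survival_threshold() {
    local pre_count="$1"
    echo $(( pre_count * 95 / 100 ))
}

# Succeeds when the post-upgrade count meets the threshold
survived() {
    local pre_count="$1" post_count="$2"
    [ "$post_count" -ge "$(survival_threshold "$pre_count")" ]
}

survival_threshold 1000   # prints 950
survival_threshold 10     # prints 9
survival_threshold 1      # prints 0

if survived 1000 960; then
    echo "ok"
fi
```

With a baseline of one document the threshold truncates to 0, so the check passes even if everything is lost. The assertion only means something when the PCAP injection has produced a reasonably large document count.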

Cross-test state (save_state / load_state) writes to files in a temp directory. Bats runs each test in its own subshell, so variables set in one test are not visible in the next. A state file for pre_upgrade_count carries the pre-upgrade document count to the assertion test.
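
A minimal sketch of file-backed state helpers along these lines, assuming one file per key in a directory named by a STATE_DIR variable. Both the variable name and the function bodies are assumptions; the repo’s helpers in common.bash may differ.

```shell
#!/usr/bin/env bash

# Assumed location for state files; falls back to a fresh temp dir
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"

# Write a value to a file named after the key
save_state() {
    local key="$1" value="$2"
    printf '%s' "$value" > "$STATE_DIR/$key"
}

# Print the value stored under the key; fails if the key was never saved
load_state() {
    local key="$1"
    cat "$STATE_DIR/$key"
}

# A value written in one subshell is readable from another
( save_state "pre_upgrade_count" "1234" )
( load_state "pre_upgrade_count" )   # prints 1234
```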

Scirius auth testing

The Scirius tests verify the full authentication flow, not just “is the service up”:

@test "scirius login with default credentials succeeds" {
    # Fetch CSRF token from login page
    docker exec "$scirius" curl -sf -c /tmp/test-cookies \
        "http://localhost:8000/accounts/login/"

    local csrf
    csrf="$(docker exec "$scirius" cat /tmp/test-cookies \
        | grep csrftoken | awk '{print $NF}')"

    # POST login: 302 redirect means success
    local http_code
    http_code="$(docker exec "$scirius" curl -sf -o /dev/null -w '%{http_code}' \
        -b /tmp/test-cookies -c /tmp/test-cookies \
        -X POST "http://localhost:8000/accounts/login/" \
        -H "Referer: http://localhost:8000/accounts/login/" \
        -d "csrfmiddlewaretoken=${csrf}&username=clearndr&password=clearndr")"
    [[ "$http_code" == "302" ]]
}

This catches two classes of bugs: Django configuration issues (CSRF misconfigured, auth backends not loaded) and default user provisioning failures. Both happened in past releases. The test does a real Django login with CSRF token extraction, cookie handling, and redirect verification.
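
The token extraction step can be tried without a running container. This sketch writes a one-cookie jar in the tab-separated Netscape format that curl -c produces and runs the same grep and awk pipeline over it. The token value and expiry are made up, and csrf_from_jar is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash

# Sample cookie jar; fields are domain, subdomain flag, path,
# secure flag, expiry, name, value. Values are made up.
jar="$(mktemp)"
printf '# Netscape HTTP Cookie File\n' > "$jar"
printf 'localhost\tFALSE\t/\tFALSE\t1893456000\tcsrftoken\tabc123\n' >> "$jar"

# Same pipeline as the test: last field of the csrftoken line
csrf_from_jar() {
    grep csrftoken "$1" | awk '{print $NF}'
}

csrf_from_jar "$jar"   # prints abc123
```

Because the cookie value is the last field on its line, $NF picks it up without having to count columns.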

The REST API test goes further: it generates a DRF auth token via manage.py shell, then makes an authenticated API request. If the REST framework package isn’t installed or the token model isn’t migrated, this catches it.

CI pipeline

GitHub Actions runs the full suite on every PR:

integration:
    needs: [lint, build]
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
        - uses: actions/checkout@v4
        - name: Install stamusctl
          run: |
              LATEST=$(curl -s https://api.github.com/repos/.../releases/latest \
                  | grep tag_name | cut -d '"' -f 4)
              curl -sL ".../stamusctl-linux-amd64" -o /usr/local/bin/stamusctl
        - name: Run integration tests
          run: just test
        - name: Collect logs on failure
          if: failure()
          run: just test-status || true && just test-logs || true

The workflow installs the latest released stamusctl binary, not a dev build. This is intentional: the tests verify that the templates work with the tool users actually have. If a template change requires a CLI change, this catches the incompatibility before the template ships.

Lint and build jobs run in parallel. Integration runs after both pass. On failure, it dumps the service status and logs so you can diagnose without re-running locally. The 30-minute timeout is generous but necessary: pulling 10+ Docker images on a cold CI runner takes time.

The workflow accepts an optional pcap_url input for manual runs, so you can test with a specific capture file. The fixture PCAPs are committed to the repo for the default case.

The readpcap path bug

There’s a workaround in the test justfile that’s worth mentioning because it’s the kind of bug integration tests exist to catch:

# Workaround: when -c is an absolute path, readpcap constructs
# a broken double-absolute mount path for containers-data.
_fix-readpcap-paths:
    #!/usr/bin/env bash
    config="{{TEST_CONFIG}}"
    if [[ "$config" == /* ]]; then
        broken_base="$(pwd)/${config#/}/containers-data"
        clean="$config/containers-data"
        if [[ -d "$broken_base" ]] && [[ ! -e "$clean" ]]; then
            ln -sf "$broken_base" "$clean"
        fi
    fi

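When you pass an absolute config path (-c /full/path), stamusctl’s readpcap command concatenates the current directory with the absolute path, producing a path like /workdir//full/path/containers-data. Repeated slashes resolve like a single one, so this is the same directory as /workdir/full/path/containers-data, which is what the workaround computes as broken_base. Docker creates the directory at that broken path. The symlink bridges the two locations so later commands find the data.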
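
The two spellings of the broken path can be reproduced with plain string handling. In this sketch join_naive stands in for the concatenation described above (it is not stamusctl’s code), and join_stripped mirrors the ${config#/} expansion used in the workaround.

```shell
#!/usr/bin/env bash

# Stand-in for the buggy join: current directory + "/" + config path
join_naive() {
    local cwd="$1" config="$2"
    printf '%s/%s/containers-data\n' "$cwd" "$config"
}

# What the workaround computes: drop the leading slash first
join_stripped() {
    local cwd="$1" config="$2"
    printf '%s/%s/containers-data\n' "$cwd" "${config#/}"
}

join_naive /workdir /full/path      # prints /workdir//full/path/containers-data
join_stripped /workdir /full/path   # prints /workdir/full/path/containers-data

# With a relative config path both joins agree, which is why manual
# usage never hit the bug.
join_naive /workdir cfg             # prints /workdir/cfg/containers-data
```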

I found this because the tests use absolute paths (the justfile resolves them) while manual usage typically uses relative paths. The unit tests with the fake filesystem didn’t catch it because they test the path logic in isolation, not the actual mount behavior. The integration tests caught it on the first run.

The template repo is at github.com/StamusNetworks/stamusctl-public-templates.