Logging infrastructure
The ToolHive Kubernetes operator provides comprehensive logging capabilities for monitoring, auditing, and troubleshooting MCP servers in production environments. This guide covers the logging infrastructure, configuration options, and integration with logging tools.
Overview
The ToolHive operator provides two types of logs:
- Standard application logs - Structured operational logs from the ToolHive operator and proxy components
- Audit logs - Security and compliance logs tracking all MCP operations
Log formats and content specifications
Standard application logs
ToolHive uses structured JSON logging. All logs are output in a consistent JSON format for easy parsing and analysis.
Log format specification
{
  "level": "info",
  "ts": 1704067200.123456,
  "caller": "controllers/mcpserver_controller.go:123",
  "msg": "Starting MCP server",
  "server": "github",
  "transport": "sse",
  "container": "thv-github-abc123",
  "namespace": "default",
  "version": "0.1.0"
}
Field definitions
| Field | Type | Description |
|---|---|---|
| level | string | Log level: debug, info, warn, error |
| ts | float | Unix timestamp with microseconds |
| caller | string | Source code location |
| msg | string | Log message |
| server | string | MCP server name |
| transport | string | Transport type: stdio, sse, streamable-http |
| container | string | Container name |
| namespace | string | Kubernetes namespace |
| version | string | Component version |
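For example, to surface only warnings and errors for a particular server using the fields above (a quick sketch; substitute the deployment name and namespace from your own installation):

# Show warn/error entries for the "github" server
kubectl logs -n default deployment/github-server-proxy | \
jq 'select((.level=="warn" or .level=="error") and .server=="github") | {ts, level, msg, caller}'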
Audit logs
Audit logs provide detailed, structured records of all MCP operations for security and compliance purposes. When audit is enabled, the ToolHive proxy generates structured audit events for every MCP operation.
Audit log format specification
{
  "time": "2024-01-01T12:00:00.123456789Z",
  "level": "INFO+2",
  "msg": "audit_event",
  "audit_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "mcp_tool_call",
  "logged_at": "2024-01-01T12:00:00.123456Z",
  "outcome": "success",
  "component": "github-server",
  "source": {
    "type": "network",
    "value": "10.0.1.5",
    "extra": {
      "user_agent": "node"
    }
  },
  "subjects": {
    "user": "john.doe@example.com",
    "user_id": "user-123"
  },
  "target": {
    "endpoint": "/messages",
    "method": "tools/call",
    "name": "search_issues",
    "type": "tool"
  },
  "metadata": {
    "extra": {
      "duration_ms": 245,
      "transport": "http"
    }
  }
}
Audit field definitions
| Field | Type | Description |
|---|---|---|
| time | string | Timestamp when the log was generated |
| level | string | Log level (INFO+2 for audit events) |
| msg | string | Always "audit_event" for audit logs |
| audit_id | string | Unique identifier for the audit event |
| type | string | Type of MCP operation (see event types below) |
| logged_at | string | UTC timestamp of the event |
| outcome | string | Result of the operation: success or failure |
| component | string | Name of the MCP server |
| source | object | Request source information |
| source.type | string | Source type (e.g., "network") |
| source.value | string | Source identifier (e.g., IP address) |
| source.extra | object | Additional source metadata |
| subjects | object | User/identity information |
| subjects.user | string | User display name (from JWT claims: name, preferred_username, or email) |
| subjects.user_id | string | User identifier (from JWT sub claim) |
| subjects.client_name | string | Optional: Client application name (if present in JWT claims) |
| subjects.client_version | string | Optional: Client version (if present in JWT claims) |
| target | object | Target resource information |
| target.endpoint | string | API endpoint path |
| target.method | string | MCP method called |
| target.name | string | Tool/resource name |
| target.type | string | Target type (e.g., "tool") |
| metadata | object | Additional metadata |
| metadata.extra.duration_ms | number | Operation duration in milliseconds |
| metadata.extra.transport | string | Transport protocol used |
Audit event types
| Event Type | Description |
|---|---|
| mcp_initialize | MCP server initialization |
| mcp_tool_call | Tool execution request |
| mcp_tools_list | List available tools |
| mcp_resource_read | Resource access |
| mcp_resources_list | List available resources |
| mcp_prompt_get | Prompt retrieval |
| mcp_prompts_list | List available prompts |
| mcp_notification | MCP notifications |
| mcp_ping | Health check pings |
| mcp_completion | Request completion |
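To get a quick sense of which operations dominate the audit trail, count events by the type field (a sketch using the example deployment name from later sections):

# Count audit events per type over the last 24 hours
kubectl logs --since=24h deployment/github-server-proxy | \
jq -r 'select(.audit_id) | .type' | sort | uniq -c | sort -rn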
Configuration
Operator-level logging
Configure logging for the ToolHive operator in the Helm values:
# values.yaml
operator:
  # Log level is controlled by the debug flag
  debug: false # Production: use info level (set to true for debug level)
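If the operator was installed with Helm, the same flag can be toggled on a running release. The release and chart references below are placeholders, so substitute the ones from your installation:

# Switch the operator to debug-level logging (placeholder release/chart names)
helm upgrade toolhive-operator <your-chart-reference> \
-n toolhive-system --reuse-values --set operator.debug=true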
MCPServer logging configuration
Configure audit logging for individual MCP servers in the MCPServer resource:
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: github-server
spec:
  image: ghcr.io/stacklok/toolhive/servers/github:latest
  # Audit logging configuration
  audit:
    enabled: true # Audit logs are output to stdout alongside standard logs
:::info
Audit logs are output to stdout alongside standard application logs. Log collectors can differentiate between standard and audit logs by checking for the presence of the audit_id field.

Note: User information in the subjects field is populated from JWT claims when OIDC authentication is configured. The system uses the name, preferred_username, or email claim (in that order) for the user display name. Without authentication middleware, the user appears as "anonymous".
:::
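Apply the manifest and confirm the audit block was accepted. A minimal check, assuming the resource above was saved as github-server.yaml:

kubectl apply -f github-server.yaml   # hypothetical file name
kubectl get mcpserver github-server -o jsonpath='{.spec.audit}'
# Output should include "enabled":true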
Log collection strategies
Using Kubernetes native logging
Kubernetes automatically collects container stdout/stderr logs. Access them using:
# View operator logs
kubectl logs -n toolhive-system deployment/toolhive-operator
# View MCP server logs with JSON formatting
kubectl logs -n default deployment/github-server-proxy | jq '.'
# Follow logs in real-time
kubectl logs -f -n default deployment/github-server-proxy
# View logs from the last hour
kubectl logs --since=1h -n default deployment/github-server-proxy
Log collection with Fluent Bit
Since both standard and audit logs are output to stdout, configure Fluent Bit to parse and route them appropriately:
# fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: toolhive-system
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Daemon            off
        Log_Level         info
        Parsers_File      parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/*toolhive*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     5MB

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

    [FILTER]
        Name          parser
        Match         kube.*
        Key_Name      log
        Parser        json
        Reserve_Data  On

    [FILTER]
        Name          rewrite_tag
        Match         kube.*
        Rule          $audit_id ^.+$ audit.$TAG false
        Emitter_Name  re_emitted

    [OUTPUT]
        Name   es
        Match  kube.*
        Host   elasticsearch.logging.svc.cluster.local
        Port   9200
        Index  toolhive-logs
        Type   _doc

    [OUTPUT]
        Name   es
        Match  audit.*
        Host   elasticsearch.logging.svc.cluster.local
        Port   9200
        Index  toolhive-audit
        Type   _doc

  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name    json
        Format  json
Deploy Fluent Bit as a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: toolhive-system
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
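The DaemonSet references a fluent-bit service account, and the kubernetes filter needs read access to pod metadata. A minimal RBAC sketch (names chosen to match the DaemonSet above):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: toolhive-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: ['']
    resources: ['pods', 'namespaces']
    verbs: ['get', 'list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: toolhive-system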
Parsing standard logs
Using jq for JSON log analysis
Parse structured logs efficiently with jq:
# Extract error messages with timestamps
kubectl logs deployment/toolhive-operator -n toolhive-system | \
jq -r 'select(.level=="error") | "\(.ts | floor | todate) [\(.level)] \(.msg)"'
# Count log entries by level
kubectl logs deployment/github-server-proxy | \
jq -r '.level' | sort | uniq -c
# Find slow operations (>1000ms) using the duration recorded on audit events
kubectl logs deployment/github-server-proxy | \
jq 'select(.metadata.extra.duration_ms > 1000) | {time: .logged_at, tool: .target.name, duration_ms: .metadata.extra.duration_ms}'
# Extract logs for specific MCP server
kubectl logs deployment/github-server-proxy | \
jq 'select(.server=="github")'
# Group errors by source
kubectl logs deployment/toolhive-operator -n toolhive-system | \
jq -r 'select(.level=="error") | .caller' | sort | uniq -c | sort -rn
Log streaming and filtering
Stream and filter logs in real-time:
# Stream logs and filter for errors
kubectl logs -f deployment/toolhive-operator | \
jq 'select(.level=="error" or .level=="warn")'
# Monitor specific MCP operations
kubectl logs -f deployment/github-server-proxy | \
jq 'select(.msg | contains("tool_call"))'
# Watch for authentication issues
kubectl logs -f deployment/github-server-proxy | \
jq 'select(.msg | contains("auth") or .msg | contains("unauthorized"))'
Parsing audit logs
Security analysis queries
Extract security-relevant information from audit logs (which are mixed with standard logs in stdout):
# Extract only audit logs (they have audit_id field)
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id)'
# Find all tool calls by a specific user
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .subjects.user_id=="user-123" and .type=="mcp_tool_call")'
# Generate audit report CSV
kubectl logs deployment/github-server-proxy | \
jq -r 'select(.audit_id) | [.logged_at, .subjects.user_id, .type, .outcome, .metadata.extra.duration_ms] | @csv' > audit_report.csv
# Detect suspicious activity (multiple failures)
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .outcome=="failure")' | \
jq -s 'group_by(.subjects.user_id) | map({user: .[0].subjects.user_id, failures: length}) | sort_by(.failures) | reverse'
# Track resource access patterns
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .type=="mcp_resource_read")' | \
jq -r '[.logged_at, .subjects.user_id, .target.name] | @csv'
# Analyze tool usage frequency
kubectl logs deployment/github-server-proxy | \
jq -r 'select(.audit_id and .type=="mcp_tool_call") | .target.name' | \
sort | uniq -c | sort -rn
Compliance reporting
Generate compliance reports from audit logs:
# Daily activity summary
kubectl logs --since=24h deployment/github-server-proxy | \
jq 'select(.audit_id)' | \
jq -s '{
total_operations: length,
unique_users: [.[].subjects.user_id] | unique | length,
tool_calls: [.[] | select(.type=="mcp_tool_call")] | length,
failures: [.[] | select(.outcome=="failure")] | length
}'
# User activity timeline
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .subjects.user_id=="user-123")' | \
jq -r '[.logged_at, .type, .outcome] | @tsv' | \
sort
# Export audit logs for a specific time period (kubectl has no --until-time flag,
# so bound the end of the window with jq on logged_at)
kubectl logs --since-time="2024-01-01T00:00:00Z" deployment/github-server-proxy | \
jq 'select(.audit_id and .logged_at < "2024-01-02T00:00:00Z")' > audit_export_20240101.json
Integration with logging tools
ELK Stack (Elasticsearch, Logstash, Kibana)
Configure Filebeat to collect ToolHive structured logs:
# filebeat-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: logging
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*toolhive*.log
        json.keys_under_root: true
        json.add_error_key: true
        json.message_key: msg
        exclude_lines: ['audit_id'] # audit events are collected by the input below
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"
          - drop_event:
              when:
                equals:
                  level: "debug" # Don't forward debug logs to save space
      # Separate audit logs from standard logs
      - type: container
        paths:
          - /var/log/containers/*toolhive*.log
        json.keys_under_root: true
        include_lines: ['audit_id']
        fields:
          log_type: audit
        fields_under_root: true

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      indices:
        - index: "toolhive-audit-%{+yyyy.MM.dd}"
          when.equals:
            log_type: "audit"
        - index: "toolhive-%{+yyyy.MM.dd}"
Logstash pipeline for processing ToolHive logs:
# logstash-pipeline.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # ToolHive records carry a server field (standard logs) or an audit_id (audit logs)
  if [server] or [audit_id] {
    # Parse the standard-log timestamp when present
    if [ts] {
      date {
        match => [ "ts", "UNIX" ]
        target => "@timestamp"
      }
    }

    # Add severity field for Kibana
    mutate {
      add_field => {
        "severity" => "%{level}"
      }
    }

    # Identify and tag audit logs
    if [audit_id] {
      mutate {
        add_tag => [ "audit" ]
        add_field => { "log_type" => "audit" }
      }
    }
  }
}

output {
  if "audit" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "toolhive-audit-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "toolhive-%{+YYYY.MM.dd}"
    }
  }
}
Splunk
Configure Splunk Universal Forwarder for structured logs:
# inputs.conf
[monitor:///var/log/containers/*toolhive*]
sourcetype = _json
index = toolhive
# Since audit logs are in stdout, use props.conf to route them
Configure Splunk props.conf to identify and route audit logs:
# props.conf
[_json]
SHOULD_LINEMERGE = false
KV_MODE = json
TRUNCATE = 0
# Route audit logs to separate index
TRANSFORMS-route_audit = route_audit_logs
Configure Splunk transforms.conf:
# transforms.conf
[route_audit_logs]
REGEX = "audit_id":\s*"[^"]+"
DEST_KEY = _MetaData:Index
FORMAT = toolhive_audit
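The transform routes audit events to a toolhive_audit index, which must exist on the indexers (as must the toolhive index referenced in inputs.conf). A minimal indexes.conf sketch with illustrative paths and retention:

# indexes.conf
[toolhive_audit]
homePath = $SPLUNK_DB/toolhive_audit/db
coldPath = $SPLUNK_DB/toolhive_audit/colddb
thawedPath = $SPLUNK_DB/toolhive_audit/thaweddb
# Retain roughly one year of audit data for compliance
frozenTimePeriodInSecs = 31536000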
Splunk search queries for ToolHive:
# MCP tool usage dashboard
index=toolhive_audit type="mcp_tool_call"
| stats count by component, target.name
| sort -count
# Error rate by MCP server
index=toolhive level="error"
| timechart span=5m count by server
# Audit trail for compliance
index=toolhive_audit
| table logged_at, subjects.user_id, type, outcome, metadata.extra.duration_ms
| sort logged_at
Datadog
Configure Datadog Agent for structured log collection:
# datadog-values.yaml
datadog:
  logs:
    enabled: true
    containerCollectAll: true
    containerCollectUsingFiles: true
  # Process JSON logs
  confd:
    toolhive.yaml: |-
      logs:
        - type: file
          path: /var/log/containers/*toolhive*.log
          service: toolhive
          source: kubernetes
          sourcecategory: kubernetes
          processing_rules:
            - type: multi_line
              name: json_logs
              pattern: ^\{
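Once logs are flowing, audit events can be isolated in the Datadog Logs Explorer by filtering on the audit_id attribute (you may need to create facets for these attributes first). Example queries, assuming the service name configured above:

service:toolhive @audit_id:*
service:toolhive @type:mcp_tool_call @outcome:failure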
Best practices
Security considerations
- Encrypt audit logs: Ensure the audit log storage is encrypted
- Implement RBAC: Restrict access to log data using Kubernetes RBAC
# RBAC for log access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: log-reader
  namespace: toolhive-system
rules:
  - apiGroups: ['']
    resources: ['pods/log']
    verbs: ['get', 'list']
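A Role grants nothing on its own; bind it to the group or users who should be able to read logs. A sketch with a placeholder group name:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-reader-binding
  namespace: toolhive-system
subjects:
  - kind: Group
    name: sre-team # placeholder: your log readers' group or individual users
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: log-reader
  apiGroup: rbac.authorization.k8s.io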
Performance optimization
- Set appropriate log levels: Use the info level in production
- Exclude high-frequency events: Don't audit ping/health checks (see the Fluent Bit sketch after this list)
- Use log sampling: For high-volume environments, sample non-critical logs
- Implement log rotation: Prevent disk exhaustion
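If your audit configuration doesn't offer event filtering, high-frequency events such as mcp_ping can be dropped in the log pipeline instead. A Fluent Bit sketch that drops any record whose type is mcp_ping, placed alongside the filters in the collection config above:

[FILTER]
    Name     grep
    Match    *
    Exclude  type mcp_ping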
Troubleshooting
Common issues
Missing audit events
Verify audit configuration and check logs:
# Check audit config
kubectl get mcpserver github-server -o jsonpath='{.spec.audit}'
# Check if audit logs are being generated (look for audit_id field)
kubectl logs deployment/github-server-proxy | jq 'select(.audit_id)' | head -5
# Count audit events in the last hour
kubectl logs --since=1h deployment/github-server-proxy | \
jq 'select(.audit_id)' | wc -l
Example configurations
Basic production setup
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: github-server
  namespace: production
spec:
  image: ghcr.io/stacklok/toolhive/servers/github:latest
  # Enable audit logging (outputs to stdout)
  audit:
    enabled: true
High-compliance environment
For environments requiring audit log separation and long-term retention:
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: secure-server
  namespace: secure
spec:
  image: ghcr.io/stacklok/toolhive/servers/secure:latest
  # Enable comprehensive audit logging
  audit:
    enabled: true
---
# Deploy Fluent Bit to separate and forward audit logs
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-audit-config
  namespace: secure
data:
  fluent-bit.conf: |
    [INPUT]
        Name    tail
        Path    /var/log/containers/*secure-server*.log
        Parser  json
        Tag     audit.*

    [FILTER]
        Name   grep
        Match  audit.*
        Regex  audit_id .+

    [OUTPUT]
        Name            s3
        Match           audit.*
        bucket          audit-logs-compliance
        region          us-east-1
        use_put_object  On
        s3_key_format   /logs/%Y/%m/%d/$TAG-%H%M%S-$UUID.json
High log volume
Reduce logging verbosity:
# Disable debug logs
kubectl patch deployment toolhive-operator -n toolhive-system --type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--debug=false"]}]'
Log parsing errors
Ensure logs are valid JSON:
# Validate JSON format
kubectl logs deployment/github-server-proxy | head -1 | jq '.'
# If parsing fails, check for non-JSON output
kubectl logs deployment/github-server-proxy | grep -v '^{'
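To keep jq from aborting on stray non-JSON lines (startup banners, stack traces), filter to JSON lines first:

# Process only well-formed JSON lines
kubectl logs deployment/github-server-proxy | grep '^{' | jq 'select(.level=="error")'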