# Logging infrastructure

The ToolHive Kubernetes operator provides comprehensive logging capabilities for monitoring, auditing, and troubleshooting MCP servers in production environments. This guide covers the logging infrastructure, configuration options, and integration with logging tools.

## Overview

The ToolHive operator produces two types of logs:

  1. Standard application logs - Structured operational logs from the ToolHive operator and proxy components
  2. Audit logs - Security and compliance logs tracking all MCP operations

## Log formats and content specifications

### Standard application logs

ToolHive uses structured JSON logging. All logs are output in a consistent JSON format for easy parsing and analysis.

#### Log format specification

```json
{
  "level": "info",
  "ts": 1704067200.123456,
  "caller": "controllers/mcpserver_controller.go:123",
  "msg": "Starting MCP server",
  "server": "github",
  "transport": "sse",
  "container": "thv-github-abc123",
  "namespace": "default",
  "version": "0.1.0"
}
```

#### Field definitions

| Field | Type | Description |
| --- | --- | --- |
| `level` | string | Log level: `debug`, `info`, `warn`, `error` |
| `ts` | float | Unix timestamp with microseconds |
| `caller` | string | Source code location |
| `msg` | string | Log message |
| `server` | string | MCP server name |
| `transport` | string | Transport type: `stdio`, `sse`, `streamable-http` |
| `container` | string | Container name |
| `namespace` | string | Kubernetes namespace |
| `version` | string | Component version |

### Audit logs

Audit logs provide detailed, structured records of all MCP operations for security and compliance purposes. When audit is enabled, the ToolHive proxy generates structured audit events for every MCP operation.

#### Audit log format specification

```json
{
  "time": "2024-01-01T12:00:00.123456789Z",
  "level": "INFO+2",
  "msg": "audit_event",
  "audit_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "mcp_tool_call",
  "logged_at": "2024-01-01T12:00:00.123456Z",
  "outcome": "success",
  "component": "github-server",
  "source": {
    "type": "network",
    "value": "10.0.1.5",
    "extra": {
      "user_agent": "node"
    }
  },
  "subjects": {
    "user": "john.doe@example.com",
    "user_id": "user-123"
  },
  "target": {
    "endpoint": "/messages",
    "method": "tools/call",
    "name": "search_issues",
    "type": "tool"
  },
  "metadata": {
    "extra": {
      "duration_ms": 245,
      "transport": "http"
    }
  }
}
```

#### Audit field definitions

| Field | Type | Description |
| --- | --- | --- |
| `time` | string | Timestamp when the log was generated |
| `level` | string | Log level (`INFO+2` for audit events) |
| `msg` | string | Always `audit_event` for audit logs |
| `audit_id` | string | Unique identifier for the audit event |
| `type` | string | Type of MCP operation (see event types below) |
| `logged_at` | string | UTC timestamp of the event |
| `outcome` | string | Result of the operation: `success` or `failure` |
| `component` | string | Name of the MCP server |
| `source` | object | Request source information |
| `source.type` | string | Source type (e.g., `network`) |
| `source.value` | string | Source identifier (e.g., IP address) |
| `source.extra` | object | Additional source metadata |
| `subjects` | object | User/identity information |
| `subjects.user` | string | User display name (from JWT claims: `name`, `preferred_username`, or `email`) |
| `subjects.user_id` | string | User identifier (from the JWT `sub` claim) |
| `subjects.client_name` | string | Optional: client application name (if present in JWT claims) |
| `subjects.client_version` | string | Optional: client version (if present in JWT claims) |
| `target` | object | Target resource information |
| `target.endpoint` | string | API endpoint path |
| `target.method` | string | MCP method called |
| `target.name` | string | Tool/resource name |
| `target.type` | string | Target type (e.g., `tool`) |
| `metadata` | object | Additional metadata |
| `metadata.extra.duration_ms` | number | Operation duration in milliseconds |
| `metadata.extra.transport` | string | Transport protocol used |

#### Audit event types

| Event Type | Description |
| --- | --- |
| `mcp_initialize` | MCP server initialization |
| `mcp_tool_call` | Tool execution request |
| `mcp_tools_list` | List available tools |
| `mcp_resource_read` | Resource access |
| `mcp_resources_list` | List available resources |
| `mcp_prompt_get` | Prompt retrieval |
| `mcp_prompts_list` | List available prompts |
| `mcp_notification` | MCP notifications |
| `mcp_ping` | Health check pings |
| `mcp_completion` | Request completion |

## Configuration

### Operator-level logging

Configure logging for the ToolHive operator in the Helm values:

```yaml
# values.yaml
operator:
  # Log level is controlled by the debug flag
  debug: false # Production: use info level (set to true for debug level)
```

### MCPServer logging configuration

Configure audit logging for individual MCP servers in the MCPServer resource:

```yaml
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: github-server
spec:
  image: ghcr.io/stacklok/toolhive/servers/github:latest

  # Audit logging configuration
  audit:
    enabled: true # Audit logs are output to stdout alongside standard logs
```

:::info
Audit logs are written to stdout alongside standard application logs. Log collectors can differentiate between standard and audit logs by checking for the presence of the `audit_id` field.

User information in the `subjects` field is populated from JWT claims when OIDC authentication is configured. The system uses the `name`, `preferred_username`, or `email` claim (in that order) for the user display name. Without authentication middleware, the user appears as "anonymous".
:::
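Because both log types share stdout, collectors route on record shape rather than origin. A minimal Python sketch of that routing rule, assuming only what the formats above specify (the `route_log_line` helper is illustrative, not part of ToolHive):

```python
import json


def route_log_line(line: str) -> str:
    """Classify one log line as 'audit', 'standard', or 'unparsed'.

    Audit events are identified solely by the presence of the
    audit_id field, as described in the note above.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return "unparsed"  # e.g. a non-JSON startup banner
    return "audit" if "audit_id" in record else "standard"


# Example: one standard log line and one audit event
standard = '{"level": "info", "msg": "Starting MCP server"}'
audit = '{"msg": "audit_event", "audit_id": "550e8400", "type": "mcp_tool_call"}'
print(route_log_line(standard))  # standard
print(route_log_line(audit))     # audit
```

The same presence check (`.audit_id`) drives the jq, Fluent Bit, Filebeat, and Splunk routing examples later in this guide.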

## Log collection strategies

### Using Kubernetes native logging

Kubernetes automatically collects container stdout/stderr logs. Access them with `kubectl logs`:

```bash
# View operator logs
kubectl logs -n toolhive-system deployment/toolhive-operator

# View MCP server logs with JSON formatting
kubectl logs -n default deployment/github-server-proxy | jq '.'

# Follow logs in real time
kubectl logs -f -n default deployment/github-server-proxy

# View logs from the last hour
kubectl logs --since=1h -n default deployment/github-server-proxy
```

### Log collection with Fluent Bit

Since both standard and audit logs are written to stdout, configure Fluent Bit to parse and route them appropriately:

```yaml
# fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: toolhive-system
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Daemon            off
        Log_Level         info
        Parsers_File      parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/*toolhive*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     5MB

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

    [FILTER]
        Name          parser
        Match         kube.*
        Key_Name      log
        Parser        json
        Reserve_Data  On

    [FILTER]
        Name          rewrite_tag
        Match         kube.*
        Rule          $audit_id ^.+$ audit.${TAG} false
        Emitter_Name  re_emitted

    [OUTPUT]
        Name   es
        Match  kube.*
        Host   elasticsearch.logging.svc.cluster.local
        Port   9200
        Index  toolhive-logs
        Type   _doc

    [OUTPUT]
        Name   es
        Match  audit.*
        Host   elasticsearch.logging.svc.cluster.local
        Port   9200
        Index  toolhive-audit
        Type   _doc

  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name    json
        Format  json
```

Deploy Fluent Bit as a DaemonSet:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: toolhive-system
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
```

## Parsing standard logs

### Using jq for JSON log analysis

Parse structured logs efficiently with jq (`ts` is a float, so truncate it with `floor` before `todate`):

```bash
# Extract error messages with timestamps
kubectl logs deployment/toolhive-operator -n toolhive-system | \
  jq -r 'select(.level=="error") | "\(.ts | floor | todate) [\(.level)] \(.msg)"'

# Count log entries by level
kubectl logs deployment/github-server-proxy | \
  jq -r '.level' | sort | uniq -c

# Find slow operations (>1000ms)
kubectl logs deployment/github-server-proxy | \
  jq 'select(.duration_ms > 1000) | {time: (.ts | floor | todate), msg: .msg, duration: .duration_ms}'

# Extract logs for a specific MCP server
kubectl logs deployment/github-server-proxy | \
  jq 'select(.server=="github")'

# Group errors by source
kubectl logs deployment/toolhive-operator -n toolhive-system | \
  jq -r 'select(.level=="error") | .caller' | sort | uniq -c | sort -rn
```

### Log streaming and filtering

Stream and filter logs in real time:

```bash
# Stream logs and filter for errors and warnings
kubectl logs -f deployment/toolhive-operator | \
  jq 'select(.level=="error" or .level=="warn")'

# Monitor specific MCP operations
kubectl logs -f deployment/github-server-proxy | \
  jq 'select(.msg | contains("tool_call"))'

# Watch for authentication issues
kubectl logs -f deployment/github-server-proxy | \
  jq 'select((.msg | contains("auth")) or (.msg | contains("unauthorized")))'
```

## Parsing audit logs

### Security analysis queries

Extract security-relevant information from audit logs (which share stdout with standard logs). These queries use the documented audit fields: the event type is `.type` and the duration is `.metadata.extra.duration_ms`:

```bash
# Extract only audit logs (they have an audit_id field)
kubectl logs deployment/github-server-proxy | \
  jq 'select(.audit_id)'

# Find all tool calls by a specific user
kubectl logs deployment/github-server-proxy | \
  jq 'select(.audit_id and .subjects.user_id=="user-123" and .type=="mcp_tool_call")'

# Generate an audit report CSV
kubectl logs deployment/github-server-proxy | \
  jq -r 'select(.audit_id) | [.logged_at, .subjects.user_id, .type, .outcome, .metadata.extra.duration_ms] | @csv' > audit_report.csv

# Detect suspicious activity (multiple failures)
kubectl logs deployment/github-server-proxy | \
  jq 'select(.audit_id and .outcome=="failure")' | \
  jq -s 'group_by(.subjects.user_id) | map({user: .[0].subjects.user_id, failures: length}) | sort_by(.failures) | reverse'

# Track resource access patterns
kubectl logs deployment/github-server-proxy | \
  jq 'select(.audit_id and .type=="mcp_resource_read")' | \
  jq -r '[.logged_at, .subjects.user_id, .target.name] | @csv'

# Analyze tool usage frequency
kubectl logs deployment/github-server-proxy | \
  jq 'select(.audit_id and .type=="mcp_tool_call") | .target.name' | \
  sort | uniq -c | sort -rn
```
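When jq pipelines grow unwieldy, the same analysis can be scripted. A hedged Python sketch of the failure-grouping query above (`count_failures_by_user` is an illustrative helper; field names follow the audit format in this guide):

```python
import json
from collections import Counter


def count_failures_by_user(log_lines):
    """Count failed audit events per user, most failures first."""
    failures = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines mixed into the stream
        if "audit_id" in event and event.get("outcome") == "failure":
            user = event.get("subjects", {}).get("user_id", "anonymous")
            failures[user] += 1
    return failures.most_common()


logs = [
    '{"audit_id":"a1","outcome":"failure","subjects":{"user_id":"user-123"}}',
    '{"audit_id":"a2","outcome":"success","subjects":{"user_id":"user-123"}}',
    '{"audit_id":"a3","outcome":"failure","subjects":{"user_id":"user-123"}}',
    '{"level":"info","msg":"not an audit event"}',
]
print(count_failures_by_user(logs))  # [('user-123', 2)]
```

Pipe `kubectl logs` output into such a script line by line; unlike `jq -s`, it never needs the whole log in memory at once.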

### Compliance reporting

Generate compliance reports from audit logs:

```bash
# Daily activity summary
kubectl logs --since=24h deployment/github-server-proxy | \
  jq 'select(.audit_id)' | \
  jq -s '{
    total_operations: length,
    unique_users: ([.[].subjects.user_id] | unique | length),
    tool_calls: ([.[] | select(.type=="mcp_tool_call")] | length),
    failures: ([.[] | select(.outcome=="failure")] | length)
  }'

# User activity timeline
kubectl logs deployment/github-server-proxy | \
  jq 'select(.audit_id and .subjects.user_id=="user-123")' | \
  jq -r '[.logged_at, .type, .outcome] | @tsv' | \
  sort

# Export audit logs for a specific time period (kubectl logs has no
# --until-time flag, so filter the upper bound with jq)
kubectl logs --since-time="2024-01-01T00:00:00Z" deployment/github-server-proxy | \
  jq 'select(.audit_id and .logged_at < "2024-01-02T00:00:00Z")' > audit_export_20240101.json
```
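The daily summary can also be produced programmatically, which is convenient for scheduled compliance jobs. A minimal Python sketch mirroring the jq report above (`summarize_audit_events` is an illustrative name; field names follow the audit format in this guide):

```python
import json


def summarize_audit_events(log_lines):
    """Build a daily activity summary from mixed stdout log lines."""
    events = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        if "audit_id" in record:  # audit events only
            events.append(record)
    return {
        "total_operations": len(events),
        "unique_users": len({e.get("subjects", {}).get("user_id") for e in events}),
        "tool_calls": sum(1 for e in events if e.get("type") == "mcp_tool_call"),
        "failures": sum(1 for e in events if e.get("outcome") == "failure"),
    }
```

Feed it the output of `kubectl logs --since=24h ...` and serialize the resulting dict to wherever your compliance reports live.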

## Integration with logging tools

### ELK Stack (Elasticsearch, Logstash, Kibana)

Configure Filebeat to collect ToolHive structured logs:

```yaml
# filebeat-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: logging
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*toolhive*.log
        json.keys_under_root: true
        json.add_error_key: true
        json.message_key: msg
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: '/var/log/containers/'
          - drop_event:
              when:
                equals:
                  level: 'debug' # Don't forward debug logs to save space

      # Separate audit logs from standard logs
      - type: container
        paths:
          - /var/log/containers/*toolhive*.log
        json.keys_under_root: true
        include_lines: ['audit_id']
        fields:
          log_type: audit
        fields_under_root: true

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      indices:
        - index: 'toolhive-audit-%{+yyyy.MM.dd}'
          when.equals:
            log_type: 'audit'
        - index: 'toolhive-%{+yyyy.MM.dd}'
```

Logstash pipeline for processing ToolHive logs:

```
# logstash-pipeline.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [component] == "toolhive" {
    # Parse timestamp
    date {
      match => [ "ts", "UNIX" ]
      target => "@timestamp"
    }

    # Add severity field for Kibana
    mutate {
      add_field => {
        "severity" => "%{level}"
      }
    }

    # Identify and tag audit logs
    if [audit_id] {
      mutate {
        add_tag => [ "audit" ]
        add_field => { "log_type" => "audit" }
      }
    }
  }
}

output {
  if "audit" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "toolhive-audit-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "toolhive-%{+YYYY.MM.dd}"
    }
  }
}
```

### Splunk

Configure the Splunk Universal Forwarder for structured logs:

```ini
# inputs.conf
[monitor:///var/log/containers/*toolhive*]
sourcetype = _json
index = toolhive
```

Since audit logs share stdout with standard logs, use `props.conf` to identify and route them:

```ini
# props.conf
[_json]
SHOULD_LINEMERGE = false
KV_MODE = json
TRUNCATE = 0

# Route audit logs to a separate index
TRANSFORMS-route_audit = route_audit_logs
```

And the matching `transforms.conf`:

```ini
# transforms.conf
[route_audit_logs]
REGEX = "audit_id":\s*"[^"]+"
DEST_KEY = _MetaData:Index
FORMAT = toolhive_audit
```

Splunk search queries for ToolHive (using the documented audit fields `type` and `metadata.extra.duration_ms`):

```
# MCP tool usage dashboard
index=toolhive type="mcp_tool_call"
| stats count by component, target.name
| sort -count

# Error rate by component
index=toolhive level="error"
| timechart span=5m count by component

# Audit trail for compliance
index=toolhive_audit
| table logged_at, subjects.user_id, type, outcome, metadata.extra.duration_ms
| sort logged_at
```

### Datadog

Configure the Datadog Agent for structured log collection:

```yaml
# datadog-values.yaml
datadog:
  logs:
    enabled: true
    containerCollectAll: true
    containerCollectUsingFiles: true

  # Process JSON logs
  confd:
    toolhive.yaml: |-
      logs:
        - type: file
          path: /var/log/containers/*toolhive*.log
          service: toolhive
          source: kubernetes
          sourcecategory: kubernetes
          log_processing_rules:
            - type: multi_line
              name: json_logs
              pattern: ^\{
```

## Best practices

### Security considerations

  1. Encrypt audit logs - Ensure the audit log storage is encrypted
  2. Implement RBAC - Restrict access to log data using Kubernetes RBAC:

```yaml
# RBAC for log access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: log-reader
  namespace: toolhive-system
rules:
  - apiGroups: ['']
    resources: ['pods/log']
    verbs: ['get', 'list']
```

### Performance optimization

  1. Set appropriate log levels: Use info level in production
  2. Exclude high-frequency events: Don't audit ping/health checks
  3. Use log sampling: For high-volume environments, sample non-critical logs
  4. Implement log rotation: Prevent disk exhaustion
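The sampling idea in item 3 can be sketched as a filter that never drops audit events or high-severity logs and keeps only every Nth informational log. This is an illustrative Python sketch (the `LogSampler` class is hypothetical); in practice you would implement sampling in your log collector, for example with a Fluent Bit filter:

```python
import json


class LogSampler:
    """Keep every warn/error and audit event; keep every Nth other log.

    Deterministic counter-based sampling: no randomness, so the kept
    fraction is exactly 1/keep_every for sampled levels.
    """

    def __init__(self, keep_every: int = 10):
        self.keep_every = keep_every
        self.seen = 0  # count of sampleable (non-audit, low-severity) lines

    def should_keep(self, line: str) -> bool:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return True  # never drop lines we cannot classify
        if "audit_id" in record or record.get("level") in ("warn", "error"):
            return True  # audit and high-severity logs are never sampled
        self.seen += 1
        return self.seen % self.keep_every == 1


# Keeps 2 of 20 info lines (1 in 10), but every error and audit event
sampler = LogSampler(keep_every=10)
kept = sum(sampler.should_keep('{"level": "info"}') for _ in range(20))
print(kept)  # 2
```

The key design choice is the asymmetry: audit logs are compliance records and must never be sampled, so the filter checks `audit_id` before applying any rate limit.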

## Troubleshooting

### Common issues

#### Missing audit events

Verify the audit configuration and check the logs (`jq -c` keeps one event per line so `head` counts events, not lines of pretty-printed JSON):

```bash
# Check the audit config
kubectl get mcpserver github-server -o jsonpath='{.spec.audit}'

# Check whether audit logs are being generated (look for the audit_id field)
kubectl logs deployment/github-server-proxy | jq -c 'select(.audit_id)' | head -5

# Count audit events in the last hour
kubectl logs --since=1h deployment/github-server-proxy | \
  jq -c 'select(.audit_id)' | wc -l
```

## Example configurations

### Basic production setup

```yaml
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: github-server
  namespace: production
spec:
  image: ghcr.io/stacklok/toolhive/servers/github:latest

  # Enable audit logging (outputs to stdout)
  audit:
    enabled: true
```

### High-compliance environment

For environments requiring audit log separation and long-term retention:

```yaml
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: secure-server
  namespace: secure
spec:
  image: ghcr.io/stacklok/toolhive/servers/secure:latest

  # Enable comprehensive audit logging
  audit:
    enabled: true
---
# Deploy Fluent Bit to separate and forward audit logs
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-audit-config
  namespace: secure
data:
  fluent-bit.conf: |
    [INPUT]
        Name    tail
        Path    /var/log/containers/*secure-server*.log
        Parser  json
        Tag     audit.*

    [FILTER]
        Name   grep
        Match  audit.*
        Regex  audit_id .+

    [OUTPUT]
        Name           s3
        Match          audit.*
        bucket         audit-logs-compliance
        region         us-east-1
        use_put_object On
        s3_key_format  /logs/%Y/%m/%d/${TAG}_%{time:yyyyMMdd-HHmmss}.json
```

### High log volume

Reduce logging verbosity:

```bash
# Disable debug logs
kubectl patch deployment toolhive-operator -n toolhive-system --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--debug=false"]}]'
```

### Log parsing errors

Ensure logs are valid JSON:

```bash
# Validate the JSON format of the first log line
kubectl logs deployment/github-server-proxy | head -1 | jq '.'

# If parsing fails, check for non-JSON output
kubectl logs deployment/github-server-proxy | grep -v '^{'
```
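To pinpoint exactly which lines break a JSON parser, a small Python sketch that reports every invalid line with its line number (`find_invalid_json_lines` is an illustrative helper, not a ToolHive tool):

```python
import json


def find_invalid_json_lines(lines):
    """Return (line_number, snippet) for every line that is not valid JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        try:
            json.loads(stripped)
        except json.JSONDecodeError:
            bad.append((number, stripped[:60]))  # keep a short snippet
    return bad


# Example: pipe `kubectl logs ...` into this script via stdin
# import sys; print(find_invalid_json_lines(sys.stdin))
```

Non-JSON lines are often startup banners or stack traces written directly to stderr; once identified, they can be excluded or handled with a fallback parser in your collector.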