Logging infrastructure
The ToolHive Kubernetes operator provides comprehensive logging capabilities for monitoring, auditing, and troubleshooting MCP servers in production environments. This guide covers the logging infrastructure, configuration options, and integration with logging tools.
Overview
The ToolHive operator provides two types of logs:
- Standard application logs - Structured operational logs from the ToolHive operator and proxy components
- Audit logs - Security and compliance logs tracking all MCP operations
Log formats and content specifications
Standard application logs
ToolHive uses structured JSON logging. All logs are output in a consistent JSON format for easy parsing and analysis.
Log format specification
{
  "level": "info",
  "ts": 1704067200.123456,
  "caller": "controllers/mcpserver_controller.go:123",
  "msg": "Starting MCP server",
  "server": "github",
  "transport": "sse",
  "container": "thv-github-abc123",
  "namespace": "default",
  "version": "0.1.0"
}
Field definitions
| Field | Type | Description |
|---|---|---|
| level | string | Log level: debug, info, warn, error |
| ts | float | Unix timestamp with microseconds |
| caller | string | Source code location |
| msg | string | Log message |
| server | string | MCP server name |
| transport | string | Transport type: stdio, sse, streamable-http |
| container | string | Container name |
| namespace | string | Kubernetes namespace |
| version | string | Component version |
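For example, to surface only warnings and errors for a particular server using the fields above (a quick sketch; substitute the deployment name and namespace from your own installation):

# Show warn/error entries for the "github" server
kubectl logs -n default deployment/github-server-proxy | \
jq 'select((.level=="warn" or .level=="error") and .server=="github") | {ts, level, msg, caller}'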
Audit logs
Audit logs provide detailed, structured records of all MCP operations for security and compliance purposes. When audit is enabled, the ToolHive proxy generates structured audit events for every MCP operation.
Audit log format specification
{
  "time": "2024-01-01T12:00:00.123456789Z",
  "level": "INFO+2",
  "msg": "audit_event",
  "audit_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "mcp_tool_call",
  "logged_at": "2024-01-01T12:00:00.123456Z",
  "outcome": "success",
  "component": "github-server",
  "source": {
    "type": "network",
    "value": "10.0.1.5",
    "extra": {
      "user_agent": "node"
    }
  },
  "subjects": {
    "user": "john.doe@example.com",
    "user_id": "user-123"
  },
  "target": {
    "endpoint": "/messages",
    "method": "tools/call",
    "name": "search_issues",
    "type": "tool"
  },
  "metadata": {
    "extra": {
      "duration_ms": 245,
      "transport": "http"
    }
  }
}
Audit field definitions
| Field | Type | Description |
|---|---|---|
| time | string | Timestamp when the log was generated |
| level | string | Log level (INFO+2 for audit events) |
| msg | string | Always "audit_event" for audit logs |
| audit_id | string | Unique identifier for the audit event |
| type | string | Type of MCP operation (see event types below) |
| logged_at | string | UTC timestamp of the event |
| outcome | string | Result of the operation: success or failure |
| component | string | Name of the MCP server |
| source | object | Request source information |
| source.type | string | Source type (e.g., "network") |
| source.value | string | Source identifier (e.g., IP address) |
| source.extra | object | Additional source metadata |
| subjects | object | User/identity information |
| subjects.user | string | User display name (from JWT claims: name, preferred_username, or email) |
| subjects.user_id | string | User identifier (from JWT sub claim) |
| subjects.client_name | string | Optional: Client application name (if present in JWT claims) |
| subjects.client_version | string | Optional: Client version (if present in JWT claims) |
| target | object | Target resource information |
| target.endpoint | string | API endpoint path |
| target.method | string | MCP method called |
| target.name | string | Tool/resource name |
| target.type | string | Target type (e.g., "tool") |
| metadata | object | Additional metadata |
| metadata.extra.duration_ms | number | Operation duration in milliseconds |
| metadata.extra.transport | string | Transport protocol used |
Audit event types
| Event Type | Description |
|---|---|
| mcp_initialize | MCP server initialization |
| mcp_tool_call | Tool execution request |
| mcp_tools_list | List available tools |
| mcp_resource_read | Resource access |
| mcp_resources_list | List available resources |
| mcp_prompt_get | Prompt retrieval |
| mcp_prompts_list | List available prompts |
| mcp_notification | MCP notifications |
| mcp_ping | Health check pings |
| mcp_completion | Request completion |
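To get a quick sense of which operations dominate the audit trail, count events by the type field (a sketch using the example deployment name from later sections):

# Count audit events per type over the last 24 hours
kubectl logs --since=24h deployment/github-server-proxy | \
jq -r 'select(.audit_id) | .type' | sort | uniq -c | sort -rn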
Configuration
Operator-level logging
Configure logging for the ToolHive operator in the Helm values:
# values.yaml
operator:
  # Log level is controlled by the debug flag
  debug: false # Production: use info level (set to true for debug level)
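If the operator was installed with Helm, the same flag can be toggled on a running release. The release and chart references below are placeholders, so substitute the ones from your installation:

# Switch the operator to debug-level logging (placeholder release/chart names)
helm upgrade toolhive-operator <your-chart-reference> \
-n toolhive-system --reuse-values --set operator.debug=true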
MCPServer logging configuration
Configure audit logging for individual MCP servers in the MCPServer resource:
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: github-server
spec:
  image: ghcr.io/stacklok/toolhive/servers/github:latest
  # Audit logging configuration
  audit:
    enabled: true # Audit logs are output to stdout alongside standard logs
:::info
Audit logs are output to stdout alongside standard application logs. Log collectors can differentiate between standard and audit logs by checking for the presence of the audit_id field.

Note: User information in the subjects field is populated from JWT claims when OIDC authentication is configured. The system uses the name, preferred_username, or email claim (in that order) for the user display name. Without authentication middleware, the user appears as "anonymous".
:::
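Apply the manifest and confirm the audit block was accepted. A minimal check, assuming the resource above was saved as github-server.yaml:

kubectl apply -f github-server.yaml   # hypothetical file name
kubectl get mcpserver github-server -o jsonpath='{.spec.audit}'
# Output should include "enabled":true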
Log collection strategies
Using Kubernetes native logging
Kubernetes automatically collects container stdout/stderr logs. Access them using:
# View operator logs
kubectl logs -n toolhive-system deployment/toolhive-operator
# View MCP server logs with JSON formatting
kubectl logs -n default deployment/github-server-proxy | jq '.'
# Follow logs in real-time
kubectl logs -f -n default deployment/github-server-proxy
# View logs from the last hour
kubectl logs --since=1h -n default deployment/github-server-proxy
Log collection with Fluent Bit
Since both standard and audit logs are output to stdout, configure Fluent Bit to parse and route them appropriately:
# fluent-bit-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: toolhive-system
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Daemon            off
        Log_Level         info
        Parsers_File      parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/*toolhive*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     5MB

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

    [FILTER]
        Name          parser
        Match         kube.*
        Key_Name      log
        Parser        json
        Reserve_Data  On

    [FILTER]
        Name          rewrite_tag
        Match         kube.*
        Rule          $audit_id ^.+$ audit.$TAG false
        Emitter_Name  re_emitted

    [OUTPUT]
        Name   es
        Match  kube.*
        Host   elasticsearch.logging.svc.cluster.local
        Port   9200
        Index  toolhive-logs
        Type   _doc

    [OUTPUT]
        Name   es
        Match  audit.*
        Host   elasticsearch.logging.svc.cluster.local
        Port   9200
        Index  toolhive-audit
        Type   _doc

  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name    json
        Format  json
Deploy Fluent Bit as a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: toolhive-system
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:latest
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
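The DaemonSet references a fluent-bit service account, and the kubernetes filter needs read access to pod metadata. A minimal RBAC sketch (names chosen to match the DaemonSet above):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: toolhive-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: ['']
    resources: ['pods', 'namespaces']
    verbs: ['get', 'list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: toolhive-system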
Parsing standard logs
Using jq for JSON log analysis
Parse structured logs efficiently with jq:
# Extract error messages with timestamps
kubectl logs deployment/toolhive-operator -n toolhive-system | \
jq -r 'select(.level=="error") | "\(.ts | floor | todate) [\(.level)] \(.msg)"'
# Count log entries by level
kubectl logs deployment/github-server-proxy | \
jq -r '.level' | sort | uniq -c
# Find slow operations (>1000ms) using the duration recorded on audit events
kubectl logs deployment/github-server-proxy | \
jq 'select(.metadata.extra.duration_ms > 1000) | {time: .logged_at, tool: .target.name, duration_ms: .metadata.extra.duration_ms}'
# Extract logs for specific MCP server
kubectl logs deployment/github-server-proxy | \
jq 'select(.server=="github")'
# Group errors by source
kubectl logs deployment/toolhive-operator -n toolhive-system | \
jq -r 'select(.level=="error") | .caller' | sort | uniq -c | sort -rn
Log streaming and filtering
Stream and filter logs in real-time:
# Stream logs and filter for errors
kubectl logs -f deployment/toolhive-operator | \
jq 'select(.level=="error" or .level=="warn")'
# Monitor specific MCP operations
kubectl logs -f deployment/github-server-proxy | \
jq 'select(.msg | contains("tool_call"))'
# Watch for authentication issues
kubectl logs -f deployment/github-server-proxy | \
jq 'select(.msg | contains("auth") or .msg | contains("unauthorized"))'
Parsing audit logs
Security analysis queries
Extract security-relevant information from audit logs (which are mixed with standard logs in stdout):
# Extract only audit logs (they have audit_id field)
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id)'
# Find all tool calls by a specific user
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .subjects.user_id=="user-123" and .type=="mcp_tool_call")'
# Generate audit report CSV
kubectl logs deployment/github-server-proxy | \
jq -r 'select(.audit_id) | [.logged_at, .subjects.user_id, .type, .outcome, .metadata.extra.duration_ms] | @csv' > audit_report.csv
# Detect suspicious activity (multiple failures)
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .outcome=="failure")' | \
jq -s 'group_by(.subjects.user_id) | map({user: .[0].subjects.user_id, failures: length}) | sort_by(.failures) | reverse'
# Track resource access patterns
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .type=="mcp_resource_read")' | \
jq -r '[.logged_at, .subjects.user_id, .target.name] | @csv'
# Analyze tool usage frequency
kubectl logs deployment/github-server-proxy | \
jq -r 'select(.audit_id and .type=="mcp_tool_call") | .target.name' | \
sort | uniq -c | sort -rn
Compliance reporting
Generate compliance reports from audit logs:
# Daily activity summary
kubectl logs --since=24h deployment/github-server-proxy | \
jq 'select(.audit_id)' | \
jq -s '{
total_operations: length,
unique_users: [.[].subjects.user_id] | unique | length,
tool_calls: [.[] | select(.type=="mcp_tool_call")] | length,
failures: [.[] | select(.outcome=="failure")] | length
}'
# User activity timeline
kubectl logs deployment/github-server-proxy | \
jq 'select(.audit_id and .subjects.user_id=="user-123")' | \
jq -r '[.logged_at, .type, .outcome] | @tsv' | \
sort
# Export audit logs for a specific time period (kubectl has no --until-time flag,
# so bound the end of the window with jq on logged_at)
kubectl logs --since-time="2024-01-01T00:00:00Z" deployment/github-server-proxy | \
jq 'select(.audit_id and .logged_at < "2024-01-02T00:00:00Z")' > audit_export_20240101.json
Integration with logging tools
ELK Stack (Elasticsearch, Logstash, Kibana)
Configure Filebeat to collect ToolHive structured logs:
# filebeat-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: logging
data:
  filebeat.yml: |
    filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*toolhive*.log
        json.keys_under_root: true
        json.add_error_key: true
        json.message_key: msg
        exclude_lines: ['audit_id'] # audit events are collected by the input below
        processors:
          - add_kubernetes_metadata:
              host: ${NODE_NAME}
              matchers:
                - logs_path:
                    logs_path: "/var/log/containers/"
          - drop_event:
              when:
                equals:
                  level: "debug" # Don't forward debug logs to save space
      # Separate audit logs from standard logs
      - type: container
        paths:
          - /var/log/containers/*toolhive*.log
        json.keys_under_root: true
        include_lines: ['audit_id']
        fields:
          log_type: audit
        fields_under_root: true

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      indices:
        - index: "toolhive-audit-%{+yyyy.MM.dd}"
          when.equals:
            log_type: "audit"
        - index: "toolhive-%{+yyyy.MM.dd}"
Logstash pipeline for processing ToolHive logs:
# logstash-pipeline.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # ToolHive records carry a server field (standard logs) or an audit_id (audit logs)
  if [server] or [audit_id] {
    # Parse the standard-log timestamp when present
    if [ts] {
      date {
        match => [ "ts", "UNIX" ]
        target => "@timestamp"
      }
    }

    # Add severity field for Kibana
    mutate {
      add_field => {
        "severity" => "%{level}"
      }
    }

    # Identify and tag audit logs
    if [audit_id] {
      mutate {
        add_tag => [ "audit" ]
        add_field => { "log_type" => "audit" }
      }
    }
  }
}

output {
  if "audit" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "toolhive-audit-%{+YYYY.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "toolhive-%{+YYYY.MM.dd}"
    }
  }
}
Splunk
Configure Splunk Universal Forwarder for structured logs:
# inputs.conf
[monitor:///var/log/containers/*toolhive*]
sourcetype = _json
index = toolhive
# Since audit logs are in stdout, use props.conf to route them
Configure Splunk props.conf to identify and route audit logs:
# props.conf
[_json]
SHOULD_LINEMERGE = false
KV_MODE = json
TRUNCATE = 0
# Route audit logs to separate index
TRANSFORMS-route_audit = route_audit_logs
Configure Splunk transforms.conf:
# transforms.conf
[route_audit_logs]
REGEX = "audit_id":\s*"[^"]+"
DEST_KEY = _MetaData:Index
FORMAT = toolhive_audit
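The transform routes audit events to a toolhive_audit index, which must exist on the indexers (as must the toolhive index referenced in inputs.conf). A minimal indexes.conf sketch with illustrative paths and retention:

# indexes.conf
[toolhive_audit]
homePath = $SPLUNK_DB/toolhive_audit/db
coldPath = $SPLUNK_DB/toolhive_audit/colddb
thawedPath = $SPLUNK_DB/toolhive_audit/thaweddb
# Retain roughly one year of audit data for compliance
frozenTimePeriodInSecs = 31536000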
Splunk search queries for ToolHive:
# MCP tool usage dashboard
index=toolhive_audit type="mcp_tool_call"
| stats count by component, target.name
| sort -count
# Error rate by MCP server
index=toolhive level="error"
| timechart span=5m count by server
# Audit trail for compliance
index=toolhive_audit
| table logged_at, subjects.user_id, type, outcome, metadata.extra.duration_ms
| sort logged_at
Datadog
Configure Datadog Agent for structured log collection:
# datadog-values.yaml
datadog:
  logs:
    enabled: true
    containerCollectAll: true
    containerCollectUsingFiles: true
  # Process JSON logs
  confd:
    toolhive.yaml: |-
      logs:
        - type: file
          path: /var/log/containers/*toolhive*.log
          service: toolhive
          source: kubernetes
          sourcecategory: kubernetes
          processing_rules:
            - type: multi_line
              name: json_logs
              pattern: ^\{
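Once logs are flowing, audit events can be isolated in the Datadog Logs Explorer by filtering on the audit_id attribute (you may need to create facets for these attributes first). Example queries, assuming the service name configured above:

service:toolhive @audit_id:*
service:toolhive @type:mcp_tool_call @outcome:failure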
Best practices
Security considerations
- Encrypt audit logs: Ensure the audit log storage is encrypted
- Implement RBAC: Restrict access to log data using Kubernetes RBAC
# RBAC for log access
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: log-reader
  namespace: toolhive-system
rules:
  - apiGroups: ['']
    resources: ['pods/log']
    verbs: ['get', 'list']
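A Role grants nothing on its own; bind it to the group or users who should be able to read logs. A sketch with a placeholder group name:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: log-reader-binding
  namespace: toolhive-system
subjects:
  - kind: Group
    name: sre-team # placeholder: your log readers' group or individual users
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: log-reader
  apiGroup: rbac.authorization.k8s.io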
Performance optimization
- Set appropriate log levels: Use the info level in production
- Exclude high-frequency events: Don't audit ping/health checks (see the Fluent Bit sketch after this list)
- Use log sampling: For high-volume environments, sample non-critical logs
- Implement log rotation: Prevent disk exhaustion
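If your audit configuration doesn't offer event filtering, high-frequency events such as mcp_ping can be dropped in the log pipeline instead. A Fluent Bit sketch that drops any record whose type is mcp_ping, placed alongside the filters in the collection config above:

[FILTER]
    Name     grep
    Match    *
    Exclude  type mcp_ping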
Troubleshooting
Common issues
Missing audit events
Verify audit configuration and check logs:
# Check audit config
kubectl get mcpserver github-server -o jsonpath='{.spec.audit}'
# Check if audit logs are being generated (look for audit_id field)
kubectl logs deployment/github-server-proxy | jq 'select(.audit_id)' | head -5
# Count audit events in the last hour
kubectl logs --since=1h deployment/github-server-proxy | \
jq 'select(.audit_id)' | wc -l
Example configurations
Basic production setup
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: github-server
  namespace: production
spec:
  image: ghcr.io/stacklok/toolhive/servers/github:latest
  # Enable audit logging (outputs to stdout)
  audit:
    enabled: true
High-compliance environment
For environments requiring audit log separation and long-term retention:
apiVersion: mcp.toolhive.io/v1alpha1
kind: MCPServer
metadata:
  name: secure-server
  namespace: secure
spec:
  image: ghcr.io/stacklok/toolhive/servers/secure:latest
  # Enable comprehensive audit logging
  audit:
    enabled: true
---
# Deploy Fluent Bit to separate and forward audit logs
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-audit-config
  namespace: secure
data:
  fluent-bit.conf: |
    [INPUT]
        Name    tail
        Path    /var/log/containers/*secure-server*.log
        Parser  json
        Tag     audit.*

    [FILTER]
        Name   grep
        Match  audit.*
        Regex  audit_id .+

    [OUTPUT]
        Name            s3
        Match           audit.*
        bucket          audit-logs-compliance
        region          us-east-1
        use_put_object  On
        s3_key_format   /logs/%Y/%m/%d/$TAG-%H%M%S-$UUID.json
High log volume
Reduce logging verbosity:
# Disable debug logs
kubectl patch deployment toolhive-operator -n toolhive-system --type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args", "value": ["--debug=false"]}]'
Log parsing errors
Ensure logs are valid JSON:
# Validate JSON format
kubectl logs deployment/github-server-proxy | head -1 | jq '.'
# If parsing fails, check for non-JSON output
kubectl logs deployment/github-server-proxy | grep -v '^{'
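To keep jq from aborting on stray non-JSON lines (startup banners, stack traces), filter to JSON lines first:

# Process only well-formed JSON lines
kubectl logs deployment/github-server-proxy | grep '^{' | jq 'select(.level=="error")'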