Observability

Complete Code
The end result of the code developed in this document can be found in the GitHub monorepo springboot-demo-projects, under the tag observability.

Observability is about understanding what your application is doing in production without having to add print statements and redeploy. When something breaks at 3 AM, you need to trace requests across services, see error logs in context, and understand performance bottlenecks without guessing.

How Instrumentation Works

The process breaks down into three logical steps:

  1. Data Collection (The Application): You modify the app to include specialized code that collects various kinds of data about the app's internal state and the host server environment.

  2. Data Storage (The Telemetry Backends): The collected data ships out of the application process and goes to corresponding, optimized telemetry backends, which are databases designed specifically for logs, metrics, or traces.

  3. Visualization (The Dashboard): You use a powerful visualization tool (like Grafana) to pull the stored data from the backends and present it in cohesive, readable dashboards.

The Three Pillars of Telemetry Data

Instrumentation focuses on collecting three distinct kinds of data, often referred to as "The Three Pillars" of observability:

| Kind of Data | Definition | Common Telemetry Backend |
| --- | --- | --- |
| Logs | Text records of specific events or states happening within the application. | Loki |
| Metrics | Numerical, aggregate data points (e.g., CPU usage, request latency counts, memory consumption). | Prometheus |
| Traces | The complete journey of a single request as it flows through the various parts of your system. | Tempo |

Grafana acts as the "single pane of glass" that unifies all three data types. It is a web-based visualization platform that connects to multiple telemetry backends simultaneously.

These are the most common choices in the Grafana ecosystem, but they are not the only options. Alternatives include Elasticsearch for logs, InfluxDB for metrics, and Jaeger for traces.

Observability Architecture Overview

Here's how the moving parts integrate with each other:

[Architecture diagram: Spring Boot apps, Promtail, Prometheus, Loki, Tempo, and Grafana]

The flow works like this:

  • Spring Boot apps expose metrics via Micrometer and send traces via OTLP to Tempo
  • Promtail scrapes Docker container logs and sends them to Loki
  • Prometheus scrapes metrics from each app's /actuator/prometheus endpoint
  • Grafana queries all three backends to display unified dashboards

Here's a summary of the new and modified files:

Files to Create/Modify
File Tree
springboot-demo-projects/
├── build.gradle
├── docker-compose.yml
├── observability/
│   ├── grafana.Dockerfile
│   ├── grafana/
│   │   ├── dashboards/
│   │   │   ├── dashboards.yml
│   │   │   └── *.json
│   │   └── datasources/
│   │       └── datasources.yml
│   ├── loki.Dockerfile
│   ├── loki-config.yml
│   ├── prometheus.Dockerfile
│   ├── prometheus.yml
│   ├── promtail.Dockerfile
│   ├── promtail-config.yml
│   ├── tempo.Dockerfile
│   └── tempo.yml
└── src/
    └── main/
        └── resources/
            └── application.yaml

Repository Setup

Micrometer Registry Prometheus

Add micrometer-registry-prometheus to expose metrics in Prometheus format at /actuator/prometheus:

build.gradle
// ...
dependencies {
    // ...
    implementation 'io.micrometer:micrometer-registry-prometheus:1.17.0-M2'
}
// ...

Application Configuration

Enable the observability endpoints and configure where to send traces.

resources/application.yaml
# ...
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  endpoint:
    health:
      show-details: always
    metrics:
      enabled: true
    prometheus:
      enabled: true
  prometheus:
    metrics:
      export:
        enabled: true
  metrics:
    distribution:
      percentiles-histogram:
        http:
          server:
            requests: true
    tags:
      application: ${spring.application.name}
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://tempo:4318/v1/traces
    metrics:
      export:
        enabled: false

logging:
  pattern:
    level: "trace_id=%mdc{traceId} span_id=%mdc{spanId} trace_flags=%mdc{traceFlags} %p"
  • management.endpoints.web.exposure.include: Exposes health, info, prometheus, and metrics endpoints
  • management.tracing.sampling.probability: Set to 1.0 to trace 100% of requests (reduce in production)
  • management.otlp.tracing.endpoint: Sends traces to Tempo via OTLP HTTP protocol
  • logging.pattern.level: Embeds trace context (trace_id, span_id, trace_flags) in every log line
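To build intuition for what the sampling probability does, here is a minimal, hypothetical sketch. This is not Micrometer's actual sampler (the real decision is made by the tracing SDK and propagated with the trace context); it only illustrates the knob:

```java
import java.util.Random;

// Conceptual sketch of probability-based trace sampling. The class name and
// logic are illustrative, not Micrometer internals.
class SamplingSketch {
    private final double probability; // maps to management.tracing.sampling.probability
    private final Random random = new Random();

    SamplingSketch(double probability) {
        this.probability = probability;
    }

    // Decide whether this request's trace is recorded and exported.
    boolean shouldSample() {
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        SamplingSketch dev = new SamplingSketch(1.0);  // trace every request
        SamplingSketch prod = new SamplingSketch(0.1); // trace ~10% of requests

        int kept = 0;
        for (int i = 0; i < 10_000; i++) {
            if (prod.shouldSample()) kept++;
        }
        System.out.println(dev.shouldSample()); // true: nextDouble() is always < 1.0
        System.out.println(kept + " of 10000 sampled"); // close to 1000
    }
}
```

At 1.0 every request is traced, which is convenient for a demo; in production a lower probability keeps Tempo's storage and the per-request overhead proportional to the sample rate.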

Observability Setup

Docker Compose Configuration

Add the observability stack services to docker-compose.yml:

docker-compose.yml
services:
  spring-java:
    # ...
    depends_on:
      - tempo
    networks:
      - monitoring

  spring-kotlin:
    # ...
    depends_on:
      - tempo
    networks:
      - monitoring

  spring-groovy:
    # ...
    depends_on:
      - tempo
    networks:
      - monitoring

  prometheus:
    build:
      context: .
      dockerfile: observability/prometheus.Dockerfile
    container_name: prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    volumes:
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring

  loki:
    build:
      context: .
      dockerfile: observability/loki.Dockerfile
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3100/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring

  promtail:
    build:
      context: .
      dockerfile: observability/promtail.Dockerfile
    container_name: promtail
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9080/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      - loki
    networks:
      - monitoring

  tempo:
    build:
      context: .
      dockerfile: observability/tempo.Dockerfile
    container_name: tempo
    ports:
      - "3200:3200"
      - "4317:4317"
      - "4318:4318"
    volumes:
      - tempo-data:/tmp/tempo
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3200/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring

  grafana:
    build:
      context: .
      dockerfile: observability/grafana.Dockerfile
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=${GF_SECURITY_ADMIN_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      - prometheus
      - loki
      - tempo
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus-data:
    driver: local
  loki-data:
    driver: local
  grafana-data:
    driver: local
  tempo-data:
    driver: local

Each Spring Boot service gets two additions:

  • depends_on: Ensures Tempo starts before the apps
  • networks: Joins the monitoring network so apps can reach Tempo

The observability stack includes:

  • Prometheus: Scrapes metrics from all services
  • Loki: Stores and indexes log data
  • Promtail: Collects Docker container logs and forwards to Loki
  • Tempo: Receives and stores distributed traces
  • Grafana: Visualizes metrics, logs, and traces in unified dashboards
Real-World Setups Look Different

In real projects, the observability services often live in a completely separate Docker Compose project, or are managed by third-party providers such as Datadog or Grafana Cloud. Bundling everything into a single docker-compose.yml keeps this guide simple and the setup easy to follow.

Loki

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus.

Dockerfile:

observability/loki.Dockerfile
FROM alpine:latest AS builder

RUN mkdir -p /loki/chunks /loki/rules

FROM grafana/loki:3.5.10

COPY --from=builder --chown=10001:10001 /loki /loki
COPY observability/loki-config.yml /etc/loki/local-config.yaml

USER 10001

Uses a multi-stage build to create directories with correct permissions (Loki runs as user 10001).

Configuration:

observability/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

limits_config:
  retention_period: 360h # 15 days, matches Prometheus and Tempo

# By default, Loki will send anonymous usage data to Grafana.
# This can be disabled by setting this to false
analytics:
  reporting_enabled: false
  • auth_enabled: false: Disables authentication for local development
  • storage.filesystem: Uses local filesystem storage (suitable for single-node setups)
  • retention_period: Keeps logs for 15 days (360 hours)
  • analytics.reporting_enabled: false: Disables anonymous usage reporting

Promtail

Promtail is an agent which ships the contents of local logs to Loki.

Dockerfile:

observability/promtail.Dockerfile
FROM grafana/promtail:3.5.10
COPY observability/promtail-config.yml /etc/promtail/config.yml

Configuration:

observability/promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: compose_service
      - source_labels: ['compose_service']
        regex: 'spring-.*'
        action: keep
      - source_labels: ['compose_service']
        regex: 'spring-(.*)'
        target_label: compose_service
        replacement: 'spring_${1}'
    pipeline_stages:
      - regex:
          expression: 'trace_id=\S+ span_id=\S+ trace_flags=\S+ (?P<type>\w+) \S+ ---'
      - labels:
          type:
  • docker_sd_configs: Discovers Docker containers automatically
  • relabel_configs: Filters for only spring-* services and renames labels
  • pipeline_stages: Parses log lines to extract the log level and create indexed labels

The regex pattern trace_id=\S+ span_id=\S+ trace_flags=\S+ (?P<type>\w+) \S+ --- extracts the log level from your Spring Boot log format, enabling filtering by log type (INFO, ERROR, DEBUG, etc.) in Grafana.
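To see the extraction in action, here is a small Java sketch applying the same pattern to an illustrative log line. One syntax caveat: Promtail uses Go's RE2 named-group form `(?P<type>...)`, while Java spells the same group `(?<type>...)`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java equivalent of the Promtail pipeline regex; the sample log line in
// main() is illustrative, following the logging.pattern.level format.
class LogLevelExtractor {
    private static final Pattern LEVEL = Pattern.compile(
        "trace_id=\\S+ span_id=\\S+ trace_flags=\\S+ (?<type>\\w+) \\S+ ---");

    static String extractLevel(String logLine) {
        Matcher m = LEVEL.matcher(logLine);
        return m.find() ? m.group("type") : null; // null when the line has no trace context
    }

    public static void main(String[] args) {
        String line = "2024-01-01T12:00:00.000Z "
            + "trace_id=4bf92f3577b34da6a3ce929d0e0e4736 "
            + "span_id=00f067aa0ba902b7 trace_flags=01 "
            + "INFO 1 --- [main] com.example.Demo : Started application";
        System.out.println(extractLevel(line)); // INFO
    }
}
```

The `\S+` after the level matches the process ID that Spring Boot prints before the `---` separator, which anchors the match to the standard console layout.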

Tempo

Tempo is a high-volume, minimal-dependency distributed tracing backend.

Dockerfile:

observability/tempo.Dockerfile
FROM alpine:latest AS builder

RUN mkdir -p /tmp/tempo/blocks /tmp/tempo/wal /tmp/tempo/generator/wal && \
    chown -R 10001:10001 /tmp/tempo

FROM grafana/tempo:2.10.0

COPY --from=builder --chown=10001:10001 /tmp/tempo /tmp/tempo
COPY observability/tempo.yml /etc/tempo/tempo.yml

CMD ["-config.file=/etc/tempo/tempo.yml"]

A builder stage creates the required directories with the correct ownership, and they are then copied into the final Tempo image (Tempo runs as user 10001).

Configuration:

observability/tempo.yml
auth_enabled: false

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"

ingester:
  max_block_duration: 5m
  trace_idle_period: 10s
  max_block_bytes: 1_000_000

storage:
  trace:
    backend: local
    wal:
      path: /tmp/tempo/wal
    local:
      path: /tmp/tempo/blocks

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09

metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /tmp/tempo/generator/wal

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]

usage_report:
  reporting_enabled: false
  • distributor.receivers.otlp: Accepts traces via OTLP on ports 4317 (gRPC) and 4318 (HTTP)
  • storage.trace.backend: local: Uses local filesystem for trace storage
  • metrics_generator: Enables service graph and span metrics generation
  • usage_report.reporting_enabled: false: Disables telemetry reporting

Prometheus

Prometheus is a systems monitoring and alerting toolkit that collects and stores its metrics as time series data.

Dockerfile:

observability/prometheus.Dockerfile
FROM prom/prometheus:v3.9.1
COPY observability/prometheus.yml /etc/prometheus/prometheus.yml

Configuration:

observability/prometheus.yml
global:
  scrape_interval: 60s
  evaluation_interval: 60s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'spring-java'
    static_configs:
      - targets: ['spring-java:8080']
    metrics_path: '/actuator/prometheus'

  - job_name: 'spring-kotlin'
    static_configs:
      - targets: ['spring-kotlin:8080']
    metrics_path: '/actuator/prometheus'

  - job_name: 'spring-groovy'
    static_configs:
      - targets: ['spring-groovy:8080']
    metrics_path: '/actuator/prometheus'
  • scrape_interval: Collects metrics every 60 seconds
  • scrape_configs: Defines three jobs to scrape metrics from each Spring Boot service
  • metrics_path: Points to /actuator/prometheus where Micrometer exposes metrics
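What Prometheus pulls from /actuator/prometheus is the plain-text exposition format: a metric name, optional `{labels}`, and a numeric value per line. A minimal sketch of reading the value from one such line (the sample metric line is illustrative; this assumes no trailing timestamp, which matches Micrometer's output):

```java
// Sketch of parsing one line of the Prometheus text exposition format.
class ExpositionLineParser {
    static double valueOf(String line) {
        // The value is the last whitespace-separated token; splitting from
        // the end also tolerates spaces inside quoted label values.
        String[] parts = line.trim().split("\\s+");
        return Double.parseDouble(parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        String sample =
            "http_server_requests_seconds_count{method=\"GET\",status=\"200\",uri=\"/hello\"} 42.0";
        System.out.println(valueOf(sample)); // 42.0
    }
}
```

Fetching the endpoint in a browser while the apps run locally is an easy way to confirm Micrometer is wired up before pointing Prometheus at it.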

Grafana

Grafana provides visualization and analytics for your observability data.

Dockerfile:

observability/grafana.Dockerfile
FROM grafana/grafana:11.6.11
COPY observability/grafana/datasources /etc/grafana/provisioning/datasources
COPY observability/grafana/dashboards/dashboards.yml /etc/grafana/provisioning/dashboards/dashboards.yml
COPY observability/grafana/dashboards/*.json /var/lib/grafana/dashboards/

Copies provisioning configuration for datasources and dashboards at build time.

Datasources Configuration:

observability/grafana/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    uid: prometheus
    isDefault: true
    editable: false
    jsonData:
      httpMethod: POST
      manageAlerts: true
      exemplarTraceIdDestinations:
        - datasourceUid: tempo
          name: TraceID
          urlDisplayLabel: "View Trace"

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    uid: loki
    editable: false
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: "trace_id=(\\w+)"
          url: "$${__value.raw}"
          datasourceUid: tempo
          urlDisplayLabel: "View Trace"

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    uid: tempo
    editable: false
    jsonData:
      nodeGraph:
        enabled: true
      tracesToLogs:
        datasourceUid: loki
        filterByTraceID: true
        filterBySpanID: false
        tags:
          - service.name

Configures three datasources:

  • Prometheus: For metrics, marked as default
  • Loki: For logs, with trace ID extraction for correlation
  • Tempo: For traces, with links back to Loki logs

The exemplarTraceIdDestinations and derivedFields configurations enable trace-to-log correlation. When you see a metric spike, you can click to view the trace; when viewing logs, you can click the trace ID to see the full distributed trace.
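As a rough illustration of what the `derivedFields` matcher does, here is a hypothetical Java sketch: group 1 of `trace_id=(\w+)` becomes `${__value.raw}`, which the Tempo datasource link opens as a trace ID:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative equivalent of Grafana's derivedFields extraction; Grafana
// itself runs this match in the browser, not via Java.
class DerivedFieldSketch {
    private static final Pattern TRACE_ID = Pattern.compile("trace_id=(\\w+)");

    static String traceId(String logLine) {
        Matcher m = TRACE_ID.matcher(logLine);
        return m.find() ? m.group(1) : null; // group 1 feeds ${__value.raw}
    }

    public static void main(String[] args) {
        String line = "trace_id=4bf92f3577b34da6a3ce929d0e0e4736 "
            + "span_id=00f067aa0ba902b7 trace_flags=01 INFO ...";
        System.out.println(traceId(line)); // 4bf92f3577b34da6a3ce929d0e0e4736
    }
}
```

This is why the `logging.pattern.level` prefix matters: without `trace_id=` in every log line, neither Promtail's pipeline nor Grafana's derived field has anything to match.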

Dashboards Configuration:

observability/grafana/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: false
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false

Enables automatic dashboard loading from /var/lib/grafana/dashboards.

Pre-configured Dashboards

The repository includes two pre-configured dashboards adapted from the Grafana community:

  • JVM Micrometer (dashboard 4701): JVM metrics including memory, threads, GC, and class loading

  • Spring Boot Observability (dashboard 17175): Application-level metrics with HTTP request rates, response times, and error rates

These are omitted from the patch due to their size (thousands of lines of JSON), but you can find them in the repository at observability/grafana/dashboards/.

Production Deployment With Coolify

When deploying to Coolify, the platform automatically detects the new services defined in your docker-compose.yml and starts them alongside your Spring Boot applications. You do not need to manually configure the monitoring stack.

The only additional step is to assign a domain to Grafana so you can access the dashboards:

  1. In Coolify, find the Grafana service in your project
  2. Click on it and set a domain (e.g., grafana.yourdomain.com)
  3. Coolify will handle SSL certificates and routing
Grafana Environment Variables

Grafana expects the environment variables GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD to be set. Make sure to define them in your Coolify service configuration before starting the stack.

Grafana Access

Once logged in, the pre-configured dashboards are available at:

https://grafana-domain-you-have-set-in-coolify/dashboards

You will find:

  • JVM Micrometer: JVM internals (memory pools, garbage collection, threads)
  • Spring Boot Observability: HTTP metrics, response times, error rates