Observability

Complete Code
The end result of the code developed in this document can be found in the GitHub monorepo springboot-demo-projects, under the tag observability.

Observability is about understanding what your application is doing in production without having to add print statements and redeploy. When something breaks at 3 AM, you need to trace requests across services, see error logs in context, and understand performance bottlenecks without guessing.

How Instrumentation Works

The process breaks down into three logical steps:

  1. Data Collection (The Application): You modify the app to include specialized code that collects various kinds of data about the app's internal state and the host server environment.

  2. Data Storage (The Telemetry Backends): The collected data ships out of the application process and goes to corresponding, optimized telemetry backends, which are databases designed specifically for logs, metrics, or traces.

  3. Visualization (The Dashboard): You use a powerful visualization tool (like Grafana) to pull the stored data from the backends and present it in cohesive, readable dashboards.

The Three Pillars of Telemetry Data

Instrumentation focuses on collecting three distinct kinds of data, often referred to as "The Three Pillars" of observability:

| Kind of Data | Definition | Common Telemetry Backend |
| --- | --- | --- |
| Logs | Text records of specific events or states happening within the application. | Loki |
| Metrics | Numerical, aggregate data points (e.g., CPU usage, request latency counts, memory consumption). | Prometheus |
| Traces | The complete journey of a single request as it flows through the various parts of your system. | Tempo |

Grafana acts as the "single pane of glass" that unifies all three data types. It is a web-based visualization platform that connects to multiple telemetry backends simultaneously.

These are the most common choices in the Grafana ecosystem, but they are not the only options. Alternatives include Elasticsearch for logs, InfluxDB for metrics, and Jaeger for traces.

Observability Architecture Overview

Here's how the moving parts integrate with each other:

[Architecture diagram: Spring Boot apps, Promtail, Prometheus, Loki, Tempo, and Grafana]

The flow works like this:

  • Spring Boot apps expose metrics via Micrometer and send traces via OTLP to Tempo
  • Promtail scrapes Docker container logs and sends them to Loki
  • Prometheus scrapes metrics from each app's /actuator/prometheus endpoint
  • Grafana queries all three backends to display unified dashboards

Here's a summary of the new and modified files:

Files to Create/Modify
File Tree
springboot-demo-projects/
├── build.gradle
├── docker-compose.yml
├── observability/
│   ├── grafana.Dockerfile
│   ├── grafana/
│   │   ├── dashboards/
│   │   │   ├── dashboards.yml
│   │   │   └── *.json
│   │   └── datasources/
│   │       └── datasources.yml
│   ├── loki.Dockerfile
│   ├── loki-config.yml
│   ├── prometheus.Dockerfile
│   ├── prometheus.yml
│   ├── promtail.Dockerfile
│   ├── promtail-config.yml
│   ├── tempo.Dockerfile
│   └── tempo.yml
└── src/
    └── main/
        └── resources/
            └── application.yaml

Repository Setup

Micrometer Registry Prometheus

Add micrometer-registry-prometheus to expose metrics in Prometheus format at /actuator/prometheus:

build.gradle
// ...
dependencies {
    // ...
    implementation 'io.micrometer:micrometer-registry-prometheus:1.17.0-M2'
}
// ...

Application Configuration

Enable the observability endpoints and configure where to send traces.

resources/application.yaml
# ...
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  endpoint:
    health:
      show-details: always
    metrics:
      enabled: true
    prometheus:
      enabled: true
  prometheus:
    metrics:
      export:
        enabled: true
  metrics:
    distribution:
      percentiles-histogram:
        http:
          server:
            requests: true
    tags:
      application: ${spring.application.name}
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://tempo:4318/v1/traces
    metrics:
      export:
        enabled: false

logging:
  pattern:
    level: "trace_id=%mdc{traceId} span_id=%mdc{spanId} trace_flags=%mdc{traceFlags} %p"
  • management.endpoints.web.exposure.include: Exposes health, info, prometheus, and metrics endpoints
  • management.tracing.sampling.probability: Set to 1.0 to trace 100% of requests (reduce in production)
  • management.otlp.tracing.endpoint: Sends traces to Tempo via OTLP HTTP protocol
  • logging.pattern.level: Embeds trace context (trace_id, span_id, trace_flags) in every log line
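To build intuition for what the sampling probability does, here is a minimal, hypothetical sketch. This is not Micrometer's actual sampler (the real decision is made by the tracing SDK and propagated with the trace context); it only illustrates the knob:

```java
import java.util.Random;

// Conceptual sketch of probability-based trace sampling. The class name and
// logic are illustrative, not Micrometer internals.
class SamplingSketch {
    private final double probability; // maps to management.tracing.sampling.probability
    private final Random random = new Random();

    SamplingSketch(double probability) {
        this.probability = probability;
    }

    // Decide whether this request's trace is recorded and exported.
    boolean shouldSample() {
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        SamplingSketch dev = new SamplingSketch(1.0);  // trace every request
        SamplingSketch prod = new SamplingSketch(0.1); // trace ~10% of requests

        int kept = 0;
        for (int i = 0; i < 10_000; i++) {
            if (prod.shouldSample()) kept++;
        }
        System.out.println(dev.shouldSample()); // true: nextDouble() is always < 1.0
        System.out.println(kept + " of 10000 sampled"); // close to 1000
    }
}
```

At 1.0 every request is traced, which is convenient for a demo; in production a lower probability keeps Tempo's storage and the per-request overhead proportional to the sample rate.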

Observability Setup

Docker Compose Configuration

Add the observability stack services to docker-compose.yml:

docker-compose.yml
services:
  spring-java:
    # ...
    depends_on:
      - tempo
    networks:
      - monitoring

  spring-kotlin:
    # ...
    depends_on:
      - tempo
    networks:
      - monitoring

  spring-groovy:
    # ...
    depends_on:
      - tempo
    networks:
      - monitoring

  prometheus:
    build:
      context: .
      dockerfile: observability/prometheus.Dockerfile
    container_name: prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    volumes:
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring

  loki:
    build:
      context: .
      dockerfile: observability/loki.Dockerfile
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3100/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring

  promtail:
    build:
      context: .
      dockerfile: observability/promtail.Dockerfile
    container_name: promtail
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:9080/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      - loki
    networks:
      - monitoring

  tempo:
    build:
      context: .
      dockerfile: observability/tempo.Dockerfile
    container_name: tempo
    ports:
      - "3200:3200"
      - "4317:4317"
      - "4318:4318"
    volumes:
      - tempo-data:/tmp/tempo
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3200/ready"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - monitoring

  grafana:
    build:
      context: .
      dockerfile: observability/grafana.Dockerfile
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=${GF_SECURITY_ADMIN_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      - prometheus
      - loki
      - tempo
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus-data:
    driver: local
  loki-data:
    driver: local
  grafana-data:
    driver: local
  tempo-data:
    driver: local

Each Spring Boot service gets two additions:

  • depends_on: Ensures Tempo starts before the apps
  • networks: Joins the monitoring network so apps can reach Tempo

The observability stack includes:

  • Prometheus: Scrapes metrics from all services
  • Loki: Stores and indexes log data
  • Promtail: Collects Docker container logs and forwards to Loki
  • Tempo: Receives and stores distributed traces
  • Grafana: Visualizes metrics, logs, and traces in unified dashboards
Real-World Setups Look Different

In real projects, the observability services often live in a completely separate Docker Compose project, or are managed by third-party providers such as Datadog or Grafana Cloud. Bundling everything into a single docker-compose.yml keeps this guide simple and the setup easy to follow.

Loki

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus.

Dockerfile:

observability/loki.Dockerfile
FROM alpine:latest AS builder

RUN mkdir -p /loki/chunks /loki/rules

FROM grafana/loki:3.5.10

COPY --from=builder --chown=10001:10001 /loki /loki
COPY observability/loki-config.yml /etc/loki/local-config.yaml

USER 10001

Uses a multi-stage build to create directories with correct permissions (Loki runs as user 10001).

Configuration:

observability/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

limits_config:
  retention_period: 360h # 15 days, matches Prometheus and Tempo

# By default, Loki will send anonymous usage data to Grafana.
# This can be disabled by setting this to false
analytics:
  reporting_enabled: false
  • auth_enabled: false: Disables authentication for local development
  • storage.filesystem: Uses local filesystem storage (suitable for single-node setups)
  • retention_period: Keeps logs for 15 days (360 hours)
  • analytics.reporting_enabled: false: Disables anonymous usage reporting

Promtail

Promtail is an agent which ships the contents of local logs to Loki.

Dockerfile:

observability/promtail.Dockerfile
FROM grafana/promtail:3.5.10
COPY observability/promtail-config.yml /etc/promtail/config.yml

Configuration:

observability/promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: compose_service
      - source_labels: ['compose_service']
        regex: 'spring-.*'
        action: keep
      - source_labels: ['compose_service']
        regex: 'spring-(.*)'
        target_label: compose_service
        replacement: 'spring_${1}'
    pipeline_stages:
      - regex:
          expression: 'trace_id=\S+ span_id=\S+ trace_flags=\S+ (?P<type>\w+) \S+ ---'
      - labels:
          type:
  • docker_sd_configs: Discovers Docker containers automatically
  • relabel_configs: Filters for only spring-* services and renames labels
  • pipeline_stages: Parses log lines to extract the log level and create indexed labels

The regex pattern trace_id=\S+ span_id=\S+ trace_flags=\S+ (?P<type>\w+) \S+ --- extracts the log level from your Spring Boot log format, enabling filtering by log type (INFO, ERROR, DEBUG, etc.) in Grafana.
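To see the extraction in action, here is a small Java sketch applying the same pattern to an illustrative log line. One syntax caveat: Promtail uses Go's RE2 named-group form `(?P<type>...)`, while Java spells the same group `(?<type>...)`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java equivalent of the Promtail pipeline regex; the sample log line in
// main() is illustrative, following the logging.pattern.level format.
class LogLevelExtractor {
    private static final Pattern LEVEL = Pattern.compile(
        "trace_id=\\S+ span_id=\\S+ trace_flags=\\S+ (?<type>\\w+) \\S+ ---");

    static String extractLevel(String logLine) {
        Matcher m = LEVEL.matcher(logLine);
        return m.find() ? m.group("type") : null; // null when the line has no trace context
    }

    public static void main(String[] args) {
        String line = "2024-01-01T12:00:00.000Z "
            + "trace_id=4bf92f3577b34da6a3ce929d0e0e4736 "
            + "span_id=00f067aa0ba902b7 trace_flags=01 "
            + "INFO 1 --- [main] com.example.Demo : Started application";
        System.out.println(extractLevel(line)); // INFO
    }
}
```

The `\S+` after the level matches the process ID that Spring Boot prints before the `---` separator, which anchors the match to the standard console layout.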

Tempo

Tempo is a high-volume, minimal-dependency distributed tracing backend.

Dockerfile:

observability/tempo.Dockerfile
FROM alpine:latest AS builder

RUN mkdir -p /tmp/tempo/blocks /tmp/tempo/wal /tmp/tempo/generator/wal && \
    chown -R 10001:10001 /tmp/tempo

FROM grafana/tempo:2.10.0

COPY --from=builder --chown=10001:10001 /tmp/tempo /tmp/tempo
COPY observability/tempo.yml /etc/tempo/tempo.yml

CMD ["-config.file=/etc/tempo/tempo.yml"]

A builder stage creates the required directories with the correct ownership, and they are then copied into the final Tempo image (Tempo runs as user 10001).

Configuration:

observability/tempo.yml
auth_enabled: false

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"

ingester:
  max_block_duration: 5m
  trace_idle_period: 10s
  max_block_bytes: 1_000_000

storage:
  trace:
    backend: local
    wal:
      path: /tmp/tempo/wal
    local:
      path: /tmp/tempo/blocks

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09

metrics_generator:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /tmp/tempo/generator/wal

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]

usage_report:
  reporting_enabled: false
  • distributor.receivers.otlp: Accepts traces via OTLP on ports 4317 (gRPC) and 4318 (HTTP)
  • storage.trace.backend: local: Uses local filesystem for trace storage
  • metrics_generator: Enables service graph and span metrics generation
  • usage_report.reporting_enabled: false: Disables telemetry reporting

Prometheus

Prometheus is a systems monitoring and alerting toolkit that collects and stores its metrics as time series data.

Dockerfile:

observability/prometheus.Dockerfile
FROM prom/prometheus:v3.9.1
COPY observability/prometheus.yml /etc/prometheus/prometheus.yml

Configuration:

observability/prometheus.yml
global:
  scrape_interval: 60s
  evaluation_interval: 60s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'spring-java'
    static_configs:
      - targets: ['spring-java:8080']
    metrics_path: '/actuator/prometheus'

  - job_name: 'spring-kotlin'
    static_configs:
      - targets: ['spring-kotlin:8080']
    metrics_path: '/actuator/prometheus'

  - job_name: 'spring-groovy'
    static_configs:
      - targets: ['spring-groovy:8080']
    metrics_path: '/actuator/prometheus'
  • scrape_interval: Collects metrics every 60 seconds
  • scrape_configs: Defines three jobs to scrape metrics from each Spring Boot service
  • metrics_path: Points to /actuator/prometheus where Micrometer exposes metrics
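What Prometheus pulls from /actuator/prometheus is the plain-text exposition format: a metric name, optional `{labels}`, and a numeric value per line. A minimal sketch of reading the value from one such line (the sample metric line is illustrative; this assumes no trailing timestamp, which matches Micrometer's output):

```java
// Sketch of parsing one line of the Prometheus text exposition format.
class ExpositionLineParser {
    static double valueOf(String line) {
        // The value is the last whitespace-separated token; splitting from
        // the end also tolerates spaces inside quoted label values.
        String[] parts = line.trim().split("\\s+");
        return Double.parseDouble(parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        String sample =
            "http_server_requests_seconds_count{method=\"GET\",status=\"200\",uri=\"/hello\"} 42.0";
        System.out.println(valueOf(sample)); // 42.0
    }
}
```

Fetching the endpoint in a browser while the apps run locally is an easy way to confirm Micrometer is wired up before pointing Prometheus at it.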

Grafana

Grafana provides visualization and analytics for your observability data.

Dockerfile:

observability/grafana.Dockerfile
FROM grafana/grafana:11.6.11
COPY observability/grafana/datasources /etc/grafana/provisioning/datasources
COPY observability/grafana/dashboards/dashboards.yml /etc/grafana/provisioning/dashboards/dashboards.yml
COPY observability/grafana/dashboards/*.json /var/lib/grafana/dashboards/

Copies provisioning configuration for datasources and dashboards at build time.

Datasources Configuration:

observability/grafana/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    uid: prometheus
    isDefault: true
    editable: false
    jsonData:
      httpMethod: POST
      manageAlerts: true
      exemplarTraceIdDestinations:
        - datasourceUid: tempo
          name: TraceID
          urlDisplayLabel: "View Trace"

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    uid: loki
    editable: false
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: "trace_id=(\\w+)"
          url: "$${__value.raw}"
          datasourceUid: tempo
          urlDisplayLabel: "View Trace"

  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    uid: tempo
    editable: false
    jsonData:
      nodeGraph:
        enabled: true
      tracesToLogs:
        datasourceUid: loki
        filterByTraceID: true
        filterBySpanID: false
        tags:
          - service.name

Configures three datasources:

  • Prometheus: For metrics, marked as default
  • Loki: For logs, with trace ID extraction for correlation
  • Tempo: For traces, with links back to Loki logs

The exemplarTraceIdDestinations and derivedFields configurations enable trace-to-log correlation. When you see a metric spike, you can click to view the trace; when viewing logs, you can click the trace ID to see the full distributed trace.
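As a rough illustration of what the `derivedFields` matcher does, here is a hypothetical Java sketch: group 1 of `trace_id=(\w+)` becomes `${__value.raw}`, which the Tempo datasource link opens as a trace ID:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative equivalent of Grafana's derivedFields extraction; Grafana
// itself runs this match in the browser, not via Java.
class DerivedFieldSketch {
    private static final Pattern TRACE_ID = Pattern.compile("trace_id=(\\w+)");

    static String traceId(String logLine) {
        Matcher m = TRACE_ID.matcher(logLine);
        return m.find() ? m.group(1) : null; // group 1 feeds ${__value.raw}
    }

    public static void main(String[] args) {
        String line = "trace_id=4bf92f3577b34da6a3ce929d0e0e4736 "
            + "span_id=00f067aa0ba902b7 trace_flags=01 INFO ...";
        System.out.println(traceId(line)); // 4bf92f3577b34da6a3ce929d0e0e4736
    }
}
```

This is why the `logging.pattern.level` prefix matters: without `trace_id=` in every log line, neither Promtail's pipeline nor Grafana's derived field has anything to match.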

Dashboards Configuration:

observability/grafana/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: false
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: false

Enables automatic dashboard loading from /var/lib/grafana/dashboards.

Pre-configured Dashboards

The repository includes two pre-configured dashboards adapted from the Grafana community:

  • JVM Micrometer (dashboard 4701): JVM metrics including memory, threads, GC, and class loading

  • Spring Boot Observability (dashboard 17175): Application-level metrics with HTTP request rates, response times, and error rates

These are omitted from the patch due to their size (thousands of lines of JSON), but you can find them in the repository at observability/grafana/dashboards/.

Production Deployment With Coolify

When deploying to Coolify, the platform automatically detects the new services defined in your docker-compose.yml and starts them alongside your Spring Boot applications. You do not need to manually configure the monitoring stack.

The only additional step is to assign a domain to Grafana so you can access the dashboards:

  1. In Coolify, find the Grafana service in your project
  2. Click on it and set a domain (e.g., grafana.yourdomain.com)
  3. Coolify will handle SSL certificates and routing
Grafana Environment Variables

Grafana expects the environment variables GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD to be set. Make sure to define them in your Coolify service configuration before starting the stack.

Grafana Access

Once logged in, the pre-configured dashboards are available at:

https://grafana-domain-you-have-set-in-coolify/dashboards

You will find:

  • JVM Micrometer: JVM internals (memory pools, garbage collection, threads)
  • Spring Boot Observability: HTTP metrics, response times, error rates