In this example we are going to store and visualise the traces of a Golang application. The exported telemetry data will be stored in Elasticsearch and visualised with Jaeger. Jaeger's default storage is in-memory, which does not persist data and is not suitable for production. The recommended way of installing and managing Jaeger in a production Kubernetes cluster is via the Jaeger Operator, which is what we will be using too. By default Jaeger stores data in daily indices, but it also lets you roll indices over (see the doc). There is an automated way of rolling indices, but for some reason it was not behaving as expected for me, so I am going to handle this process manually, which is also shown in the doc. The commands are at the bottom of the post. One more important note: you could use Kafka (doc) as a buffer rather than pushing data directly to Elasticsearch, which is probably a better way of working, but I won't get into it here. Have a look at the doc, which covers a lot of information.


I have chosen the production deployment strategy but you can go for any other strategy if you wish. Check the doc for it.


As usual, the whole example is open for improvements and adjustments. I left out some "must have" settings to keep the post as small as possible.



Elasticsearch


This would normally be running somewhere on the Internet, but I'll run it in Docker and expose it to the Internet so that Jaeger can access it from within Kubernetes.


$ DOCKER_BUILDKIT=0 docker run \
    --rm \
    --env discovery.type=single-node \
    --publish 9200:9200 \
    --name trace-elastic \
    docker.elastic.co/elasticsearch/elasticsearch:7.17.1

Expose it to the Internet. We will use the resulting URL later in the jaeger.yaml file.


$ ssh -p 443 -R0:localhost:9200 a.pinggy.io

http://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link
https://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link

Before deploying Jaeger, it is mandatory to create the read/write aliases and the initial write indices as shown below. This is a one-off process.


$ docker run -it --rm --net=host jaegertracing/jaeger-es-rollover:latest init http://localhost:9200 --shards 2 --replicas 1 --index-prefix prod

health status index                           id                     pri rep docs.count docs.deleted store.size creation.date.string
yellow open prod-jaeger-dependencies-000001 CtOkYcWMSraMbDCfKaSH3A 2 1 0 0 1.1kb 2024-06-01T15:47:14.738Z
yellow open prod-jaeger-span-000001 07konOTQSnGf8wbb6s_LSg 2 1 0 0 1.1kb 2024-06-01T15:47:13.273Z
yellow open prod-jaeger-service-000001 QwUbhxeFQ9O4R2zEojN6Bw 2 1 0 0 1.1kb 2024-06-01T15:47:14.056Z

alias                                              index
prod-jaeger-dependencies-read assigned to index prod-jaeger-dependencies-000001
prod-jaeger-dependencies-write assigned to index prod-jaeger-dependencies-000001
prod-jaeger-service-read assigned to index prod-jaeger-service-000001
prod-jaeger-service-write assigned to index prod-jaeger-service-000001
prod-jaeger-span-read assigned to index prod-jaeger-span-000001
prod-jaeger-span-write assigned to index prod-jaeger-span-000001
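
The naming scheme visible in the output above can be sketched in Go as follows. This is only an illustration derived from that output: the Names type and the rolloverNames helper are hypothetical and not part of Jaeger.

```go
package main

import "fmt"

// Names holds the alias and first index names for one Jaeger index type.
// The type and helper below are hypothetical; they only mirror the output above.
type Names struct {
	Read  string // alias used for searching
	Write string // alias used for indexing
	Index string // first concrete index behind both aliases
}

// rolloverNames builds the names for a prefix such as "prod" and an index
// type such as "span", "service" or "dependencies".
func rolloverNames(prefix, kind string) Names {
	base := fmt.Sprintf("%s-jaeger-%s", prefix, kind)
	return Names{
		Read:  base + "-read",
		Write: base + "-write",
		Index: base + "-000001",
	}
}

func main() {
	for _, kind := range []string{"dependencies", "service", "span"} {
		n := rolloverNames("prod", kind)
		fmt.Printf("%s and %s -> %s\n", n.Read, n.Write, n.Index)
	}
}
```

Both aliases start out on the same index; they only diverge once the rollover and lookback commands have been run.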

Kubernetes


Run Minikube to boot your local Kubernetes cluster.


$ minikube start --memory 4000 --cpus=2

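Prepare Jaeger by installing cert-manager, which the Jaeger Operator depends on, and then the operator itself (see the doc).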


$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.6.1/cert-manager.yaml

$ kubectl get pods --namespace cert-manager
NAME READY STATUS RESTARTS AGE
cert-manager-5656f9c48-lz4sm 1/1 Running 0 103s
cert-manager-cainjector-765d9679c9-2btbx 1/1 Running 0 103s
cert-manager-webhook-586f8d6cf6-9pw8p 1/1 Running 0 103s


$ kubectl create namespace observability
$ kubectl create -n observability -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.56.0/jaeger-operator.yaml

$ kubectl -n observability get all
NAME READY STATUS RESTARTS AGE
pod/jaeger-operator-786c87cb64-vflww 2/2 Running 0 3m3s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-operator-metrics ClusterIP 10.104.211.141 8443/TCP 3m4s
service/jaeger-operator-webhook-service ClusterIP 10.103.136.40 443/TCP 3m4s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-operator 1/1 1 1 3m5s

NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-operator-786c87cb64 1 1 1 3m5s

Application


Dockerfile


FROM golang:1.22.0-alpine3.19 as build
WORKDIR /api
COPY . .
RUN go mod verify
RUN CGO_ENABLED=0 go build -ldflags "-s -w" -o ./bin/api main.go

FROM alpine:3.19
WORKDIR /api
COPY --from=build /api/bin/api bin/api
ENTRYPOINT ["./bin/api"]

main.go


package main

import (
	"context"
	"log"
	"net/http"
	"os"

	"playground/api"
	"playground/trace"
)

func main() {
	ctx := context.Background()

	// Pick the span exporter ("stdout" or "http") from the environment.
	exp, err := trace.NewExporter(ctx, trace.ExporterConfig{
		Type:    os.Getenv("TYPE"),
		Address: os.Getenv("JAEGER"),
	})
	if err != nil {
		log.Fatalln(err)
	}

	// Register the global tracer provider with the service metadata.
	pro, err := trace.NewProvider(ctx, trace.ProviderConfig{
		Exporter:    exp,
		Service:     os.Getenv("SVC"),
		Version:     os.Getenv("VER"),
		Environment: os.Getenv("ENV"),
	})
	if err != nil {
		log.Fatalln(err)
	}
	// Flush any batched spans when main returns.
	defer pro.Close(ctx)

	rtr := http.NewServeMux()
	rtr.HandleFunc("GET /api/v1/users/{id}", (api.User{}).Find)

	log.Println(http.ListenAndServe(os.Getenv("HOST")+":"+os.Getenv("PORT"), rtr))
}

exporter.go


package trace

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

type ExporterConfig struct {
	Type    string
	Address string
}

// NewExporter returns a span exporter for the given type: "stdout" prints
// spans to standard output and "http" sends them over OTLP/HTTP to Address.
func NewExporter(ctx context.Context, cfg ExporterConfig) (sdktrace.SpanExporter, error) {
	switch cfg.Type {
	case "stdout":
		return stdouttrace.New()
	case "http":
		return otlptracehttp.New(ctx,
			// Plain HTTP; the collector is reached inside the cluster.
			otlptracehttp.WithInsecure(),
			// Address is host:port, e.g. jaeger-collector.observability:4318.
			otlptracehttp.WithEndpoint(cfg.Address),
		)
	}

	return nil, errors.New("invalid type")
}
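
As far as I know, otlptracehttp.WithEndpoint expects just a host and a port, without a scheme or a path, which is why the JAEGER variable later in api.yaml is set to jaeger-collector.observability:4318. A small sketch of a check you could run on that value before building the exporter is below. The validateEndpoint helper is hypothetical and not part of OpenTelemetry.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateEndpoint checks that addr looks like "host:port" with no scheme or
// path. The helper is hypothetical and not part of OpenTelemetry.
func validateEndpoint(addr string) error {
	if strings.Contains(addr, "://") {
		return fmt.Errorf("endpoint %q must not include a scheme", addr)
	}
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("endpoint %q is not host:port: %w", addr, err)
	}
	if host == "" || port == "" {
		return fmt.Errorf("endpoint %q needs both a host and a port", addr)
	}
	if strings.Contains(port, "/") {
		return fmt.Errorf("endpoint %q must not include a path", addr)
	}
	return nil
}

func main() {
	for _, addr := range []string{
		"jaeger-collector.observability:4318",
		"http://jaeger-collector.observability:4318",
		"jaeger-collector.observability",
	} {
		fmt.Printf("%-45s valid=%t\n", addr, validateEndpoint(addr) == nil)
	}
}
```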

provider.go


package trace

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.25.0"
)

type ProviderConfig struct {
	Exporter    sdktrace.SpanExporter
	Service     string
	Version     string
	Environment string
}

type Provider struct {
	provider *sdktrace.TracerProvider
}

func NewProvider(ctx context.Context, cfg ProviderConfig) (Provider, error) {
	// Resource attributes describe the service in every exported span.
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName(cfg.Service),
		semconv.ServiceVersion(cfg.Version),
		semconv.DeploymentEnvironment(cfg.Environment),
	)

	// Propagate trace context and baggage across service boundaries.
	prp := propagation.NewCompositeTextMapPropagator(
		propagation.TraceContext{},
		propagation.Baggage{},
	)
	otel.SetTextMapPropagator(prp)

	// Spans are batched before being handed to the exporter.
	prv := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(cfg.Exporter),
		sdktrace.WithResource(res),
	)
	otel.SetTracerProvider(prv)

	return Provider{
		provider: prv,
	}, nil
}

// Close flushes the remaining spans and shuts the provider down.
func (p Provider) Close(ctx context.Context) error {
	return p.provider.Shutdown(ctx)
}

span.go


I strongly suggest that you improve this and add more functionality. The Span type comes with a lot of features; this is just a lazy implementation for now.


package trace

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

// Span starts a new span using the globally registered tracer provider.
func Span(ctx context.Context, name string, opts ...trace.SpanStartOption) (context.Context, trace.Span) {
	// A nil opts slice expands to no options, so no nil check is needed.
	return otel.Tracer("").Start(ctx, name, opts...)
}

// Error records err on the span and marks the span as failed.
func Error(span trace.Span, err error) {
	span.RecordError(err)
	span.SetStatus(codes.Error, err.Error())
}

api.go


package api

import (
	"context"
	"errors"
	"net/http"

	"playground/trace"
)

type User struct{}

func (u User) Find(w http.ResponseWriter, r *http.Request) {
	ctx, span := trace.Span(r.Context(), "user.Find")
	defer span.End()

	if err := u.isValid(ctx, r.PathValue("id")); err != nil {
		// Record the actual validation error on the span.
		trace.Error(span, err)

		w.WriteHeader(http.StatusBadRequest)

		return
	}
}

func (u User) isValid(ctx context.Context, id string) error {
	// Child span of user.Find, linked through ctx.
	_, span := trace.Span(ctx, "user.isValid")
	defer span.End()

	if id != "b" {
		return errors.New("invalid user id")
	}

	return nil
}

api.yaml


apiVersion: apps/v1
kind: Deployment

metadata:
  name: api-deployment
  namespace: default
  labels:
    app: api

spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: golang
          image: you/api:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
          env:
            - name: HOST
              value: "0.0.0.0"
            - name: PORT
              value: "8000"
            - name: ENV
              value: "production"
            - name: SVC
              value: "api"
            - name: VER
              value: "v0.0.1"
            - name: TYPE
              value: "http"
            - name: JAEGER
              value: "jaeger-collector.observability:4318"

---

apiVersion: v1
kind: Service

metadata:
  name: api-service
  namespace: default

spec:
  type: NodePort
  selector:
    app: api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000

jaeger.yaml


apiVersion: jaegertracing.io/v1
kind: Jaeger

metadata:
  name: jaeger
  namespace: observability

spec:
  strategy: production
  collector:
    maxReplicas: 2
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: https://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link
        version: 7
        index-prefix: prod
        use-aliases: true

Jaeger deployment


$ kubectl apply -f jaeger.yaml

I haven't verified it, but cronjob.batch/jaeger-es-index-cleaner might actually be cleaning up the old indices. It needs checking.


$ kubectl -n observability get all
NAME READY STATUS RESTARTS AGE
pod/jaeger-collector-7d4c468b9f-g8vq9 1/1 Running 0 2m46s
pod/jaeger-operator-786c87cb64-vflww 2/2 Running 0 6m51s
pod/jaeger-query-65f5979bc-btxln 2/2 Running 0 2m46s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-collector ClusterIP 10.103.189.50 9411/TCP,14250/TCP,14267/TCP,14268/TCP,14269/TCP,4317/TCP,4318/TCP 2m47s
service/jaeger-collector-headless ClusterIP None 9411/TCP,14250/TCP,14267/TCP,14268/TCP,14269/TCP,4317/TCP,4318/TCP 2m47s
service/jaeger-operator-metrics ClusterIP 10.104.211.141 8443/TCP 6m52s
service/jaeger-operator-webhook-service ClusterIP 10.103.136.40 443/TCP 6m52s
service/jaeger-query ClusterIP 10.102.252.116 16686/TCP,16685/TCP,16687/TCP 2m47s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-collector 1/1 1 1 2m46s
deployment.apps/jaeger-operator 1/1 1 1 6m52s
deployment.apps/jaeger-query 1/1 1 1 2m46s

NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-collector-7d4c468b9f 1 1 1 2m47s
replicaset.apps/jaeger-operator-786c87cb64 1 1 1 6m53s
replicaset.apps/jaeger-query-65f5979bc 1 1 1 2m47s

NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/jaeger-collector Deployment/jaeger-collector cpu: /90%, memory: /90% 1 5 1 79s

NAME SCHEDULE TIMEZONE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob.batch/jaeger-es-index-cleaner 55 23 * * * False 0 2m48s
cronjob.batch/jaeger-spark-dependencies 55 23 * * * False 0 2m48s

Application Docker deployment


We need to push the application image to Docker Hub.


$ docker build -t you/api:latest .
$ docker push you/api:latest

Application deployment


$ kubectl apply -f api.yaml

$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/api-deployment-fcdcc84d7-rkfbq 1/1 Running 0 11s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/api-service NodePort 10.100.179.159 80:30512/TCP 11s
service/kubernetes ClusterIP 10.96.0.1 443/TCP 17m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/api-deployment 1/1 1 1 11s

NAME DESIRED CURRENT READY AGE
replicaset.apps/api-deployment-fcdcc84d7 1 1 1 11s

Access Jaeger UI


After running the command below, visit http://localhost:16686.


$ kubectl port-forward -n observability service/jaeger-query 16686:16686
Forwarding from 127.0.0.1:16686 -> 16686

Populate trace data


Let's populate some dummy trace data. First, set up port forwarding so that our Go application is accessible from the local host.


$ kubectl port-forward service/api-service 8888:80
Forwarding from 127.0.0.1:8888 -> 8000
Forwarding from [::1]:8888 -> 8000

$ curl -i http://localhost:8888/api/v1/users/a
HTTP/1.1 400 Bad Request
Date: Wed, 29 May 2024 20:49:51 GMT
Content-Length: 0

$ curl -i http://localhost:8888/api/v1/users/b
HTTP/1.1 200 OK
Date: Wed, 29 May 2024 20:50:02 GMT
Content-Length: 0

Elasticsearch index


$ curl -i https://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link/_cat/indices

health status index id pri rep docs.count docs.deleted store.size creation.date.string
yellow open prod-jaeger-dependencies-000001 CtOkYcWMSraMbDCfKaSH3A 2 1 0 0 1.1kb 2024-06-01T15:47:14.738Z
yellow open prod-jaeger-span-000001 07konOTQSnGf8wbb6s_LSg 2 1 13 0 18.8kb 2024-06-01T15:47:13.273Z
yellow open prod-jaeger-service-000001 QwUbhxeFQ9O4R2zEojN6Bw 2 1 0 0 1.1kb 2024-06-01T15:47:14.056Z






Rolling over index


Jaeger uses an index-per-day pattern by default and creates a new index for each day based on the span's timestamp. This pattern might not distribute data evenly over shards, because one day's index might contain far more data than another's.


When rolling over indices, an index alias rolls over to a new index based on the configured conditions. There is one alias for reading and another for writing. The read alias points to a group of read-only indices and the write alias points to a single write index. Let's now go through the commands. Ideally you should automate these commands rather than running them manually, for example with a Kubernetes cron job that runs them once a day. Each command below has a --help flag for hints.
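
To make the read and write aliases a bit more concrete, here is a toy in-memory model of them in Go. It is only an illustration of the idea described above, not how Jaeger or Elasticsearch implement it: the aliases type and its methods are hypothetical, and lookback here simply drops everything but the current write index instead of deciding by age.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// aliases is a toy model of the read and write aliases of one index type.
type aliases struct {
	read  []string // indices that are searchable
	write string   // the single index that receives new documents
}

// nextIndex increments the numeric suffix, e.g. "...-000001" -> "...-000002".
func nextIndex(name string) (string, error) {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return "", fmt.Errorf("no numeric suffix in %q", name)
	}
	n, err := strconv.Atoi(name[i+1:])
	if err != nil {
		return "", fmt.Errorf("no numeric suffix in %q: %w", name, err)
	}
	return fmt.Sprintf("%s-%06d", name[:i], n+1), nil
}

// rollover moves the write alias to a new index and makes it searchable.
func (a *aliases) rollover() error {
	next, err := nextIndex(a.write)
	if err != nil {
		return err
	}
	a.write = next
	a.read = append(a.read, next)
	return nil
}

// lookback drops every index except the current write index from the read
// alias. The real command decides by age; this is simplified for brevity.
func (a *aliases) lookback() {
	a.read = []string{a.write}
}

func main() {
	a := &aliases{
		read:  []string{"prod-jaeger-span-000001"},
		write: "prod-jaeger-span-000001",
	}

	if err := a.rollover(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("after rollover:", a.read, a.write)

	a.lookback()
	fmt.Println("after lookback:", a.read, a.write)
}
```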


Rollover to a new index


This command rolls the write alias over to a new index if the given conditions are met. It also adds the new index to the read alias so that the new data becomes available for search. Here we roll the alias over to a new index if the current write index is older than 10 seconds.


$ docker run -it --rm --net=host jaegertracing/jaeger-es-rollover:latest rollover http://localhost:9200 --conditions '{"max_age": "10s"}' --index-prefix prod

Remove old indices from read aliases


This removes old indices from the read alias, which makes old data unavailable for search once it is older than the given number of units (here, 1 second).


$ docker run -it --rm --net=host jaegertracing/jaeger-es-rollover:latest lookback http://localhost:9200 --unit-count 1 --unit seconds --index-prefix prod

Removing old data


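This deletes historical data by deleting old indices. The first argument is the age threshold in days. Here I pass 0 so that the old indices are removed straight away for the demo; passing 1 would remove indices older than 1 day.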


$ docker run -it --rm --net=host jaegertracing/jaeger-es-index-cleaner:latest 0 http://localhost:9200 --rollover --index-prefix prod

Result


Just a note: I called the API once, just before the lookback command, to populate the new index.


health status index                           id                     pri rep docs.count docs.deleted store.size creation.date.string
yellow open prod-jaeger-dependencies-000002 atOkYcWMSrasdfsdfaSH3A 2 1 0 0 1.1kb 2024-06-01T15:47:14.738Z
yellow open prod-jaeger-span-000002 a7konOTQSnGf8hgfhs_LSg 2 1 2 0 4.8kb 2024-06-01T15:47:13.273Z
yellow open prod-jaeger-service-000002 awUbjyutj9O4R2zEojN6Bw 2 1 0 0 1.1kb 2024-06-01T15:47:14.056Z