30/05/2024 - DOCKER, ELASTICSEARCH, GO, KUBERNETES
In this example we are going to store and visualise Golang application traces. Exported telemetry data will be stored in Elasticsearch and visualised with Jaeger. The default storage for Jaeger is in-memory, which is not suitable for production. The recommended way of installing and managing Jaeger in a production Kubernetes cluster is via the Jaeger Operator, which is what we will be doing too. Jaeger stores data in daily indices by default and allows you to roll indices over - doc. There is an automated way of rolling indices over, but for some reason it was not behaving as expected, so I am going to handle this process manually, which is also shown in the doc. The commands are at the bottom of the post. One more important note: you could use Kafka (doc) rather than pushing data directly to Elasticsearch, which is probably a better way of working, but I won't get into it here. Have a look at the doc, which covers a lot of information.
I have chosen the production deployment strategy but you can go for any other strategy if you wish. Check the doc for it.
As usual, the whole example is open for improvements and adjustments. I left out some "must have" settings to keep the post as small as possible.
This would normally be running somewhere on the Internet, but I'll use the Docker version and expose it to the Internet so that Jaeger can access it from within Kubernetes.
$ docker run \
--rm \
--env discovery.type=single-node \
--publish 9200:9200 \
--name trace-elastic \
docker.elastic.co/elasticsearch/elasticsearch:7.17.1
Expose it to the Internet. We will use the resulting URL later in the jaeger.yaml file.
$ ssh -p 443 -R0:localhost:9200 a.pinggy.io
http://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link
https://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link
Before deploying Jaeger, it is mandatory to create the read/write aliases and write indices as shown below. This is a one-off process.
$ docker run -it --rm --net=host jaegertracing/jaeger-es-rollover:latest init http://localhost:9200 --shards 2 --replicas 1 --index-prefix prod
health status index id pri rep docs.count docs.deleted store.size creation.date.string
yellow open prod-jaeger-dependencies-000001 CtOkYcWMSraMbDCfKaSH3A 2 1 0 0 1.1kb 2024-06-01T15:47:14.738Z
yellow open prod-jaeger-span-000001 07konOTQSnGf8wbb6s_LSg 2 1 0 0 1.1kb 2024-06-01T15:47:13.273Z
yellow open prod-jaeger-service-000001 QwUbhxeFQ9O4R2zEojN6Bw 2 1 0 0 1.1kb 2024-06-01T15:47:14.056Z
alias index
prod-jaeger-dependencies-read assigned to index prod-jaeger-dependencies-000001
prod-jaeger-dependencies-write assigned to index prod-jaeger-dependencies-000001
prod-jaeger-service-read assigned to index prod-jaeger-service-000001
prod-jaeger-service-write assigned to index prod-jaeger-service-000001
prod-jaeger-span-read assigned to index prod-jaeger-span-000001
prod-jaeger-span-write assigned to index prod-jaeger-span-000001
Run Minikube to boot your local Kubernetes cluster.
$ minikube start --memory 4000 --cpus=2
Prepare Jaeger - doc.
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.6.1/cert-manager.yaml
$ kubectl get pods --namespace cert-manager
NAME READY STATUS RESTARTS AGE
cert-manager-5656f9c48-lz4sm 1/1 Running 0 103s
cert-manager-cainjector-765d9679c9-2btbx 1/1 Running 0 103s
cert-manager-webhook-586f8d6cf6-9pw8p 1/1 Running 0 103s
$ kubectl create namespace observability
$ kubectl create -n observability -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.56.0/jaeger-operator.yaml
$ kubectl -n observability get all
NAME READY STATUS RESTARTS AGE
pod/jaeger-operator-786c87cb64-vflww 2/2 Running 0 3m3s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-operator-metrics ClusterIP 10.104.211.141 <none> 8443/TCP 3m4s
service/jaeger-operator-webhook-service ClusterIP 10.103.136.40 <none> 443/TCP 3m4s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-operator 1/1 1 1 3m5s
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-operator-786c87cb64 1 1 1 3m5s
FROM golang:1.22.0-alpine3.19 as build
WORKDIR /api
COPY . .
RUN go mod verify
RUN CGO_ENABLED=0 go build -ldflags "-s -w" -o ./bin/api main.go
FROM alpine:3.19
WORKDIR /api
COPY --from=build /api/bin/api bin/api
ENTRYPOINT ./bin/api
package main
import (
"context"
"log"
"net/http"
"os"
"playground/api"
"playground/trace"
)
func main() {
ctx := context.Background()
exp, err := trace.NewExporter(ctx, trace.ExporterConfig{
Type: os.Getenv("TYPE"),
Address: os.Getenv("JAEGER"),
})
if err != nil {
log.Fatalln(err)
}
pro, err := trace.NewProvider(ctx, trace.ProviderConfig{
Exporter: exp,
Service: os.Getenv("SVC"),
Version: os.Getenv("VER"),
Environment: os.Getenv("ENV"),
})
if err != nil {
log.Fatalln(err)
}
defer pro.Close(ctx)
rtr := http.NewServeMux()
rtr.HandleFunc("GET /api/v1/users/{id}", (api.User{}).Find)
log.Println(http.ListenAndServe(os.Getenv("HOST")+":"+os.Getenv("PORT"), rtr))
}
package trace
import (
"context"
"errors"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
)
type ExporterConfig struct {
Type string
Address string
}
// NewExporter returns a span exporter based on the configured type: "stdout"
// writes spans to standard output and "http" pushes them to an OTLP/HTTP
// endpoint such as the Jaeger collector.
func NewExporter(ctx context.Context, cfg ExporterConfig) (sdktrace.SpanExporter, error) {
switch cfg.Type {
case "stdout":
return stdouttrace.New()
case "http":
return otlptracehttp.New(ctx,
otlptracehttp.WithInsecure(),
otlptracehttp.WithEndpoint(cfg.Address),
)
}
return nil, errors.New("invalid type")
}
package trace
import (
"context"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.25.0"
)
type ProviderConfig struct {
Exporter sdktrace.SpanExporter
Service string
Version string
Environment string
}
type Provider struct {
provider *sdktrace.TracerProvider
}
// NewProvider builds a tracer provider with the given exporter and service
// resource attributes, registers it as the global tracer provider and sets
// up W3C trace context and baggage propagation.
func NewProvider(ctx context.Context, cfg ProviderConfig) (Provider, error) {
res := resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceName(cfg.Service),
semconv.ServiceVersion(cfg.Version),
semconv.DeploymentEnvironment(cfg.Environment),
)
prp := propagation.NewCompositeTextMapPropagator(
propagation.TraceContext{},
propagation.Baggage{},
)
otel.SetTextMapPropagator(prp)
prv := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(cfg.Exporter),
sdktrace.WithResource(res),
)
otel.SetTracerProvider(prv)
return Provider{
provider: prv,
}, nil
}
// Close flushes any buffered spans and shuts down the tracer provider.
func (p Provider) Close(ctx context.Context) error {
return p.provider.Shutdown(ctx)
}
I strongly suggest you improve this to add more functionality; the Span type comes with a lot of features. This is just a lazy implementation for now (a small sketch of one possible extension follows the code below).
package trace
import (
"context"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
)
// Span starts a new span using the globally registered tracer provider.
func Span(ctx context.Context, name string, opts ...trace.SpanStartOption) (context.Context, trace.Span) {
if opts == nil {
return otel.Tracer("").Start(ctx, name)
}
return otel.Tracer("").Start(ctx, name, opts...)
}
// Error records the error on the span and sets the span status to Error.
func Error(span trace.Span, err error) {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
}
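To give an idea of what such an improvement could look like, here is a minimal sketch of an extra helper built on top of the Span function above. It is my own addition rather than part of the example, and the SpanWithAttributes name is made up.
package trace

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// SpanWithAttributes starts a span with an explicit span kind and initial
// attributes, reusing the lazy Span helper above. The caller can still
// enrich the returned span with events, extra attributes and so on.
func SpanWithAttributes(ctx context.Context, name string, kind trace.SpanKind, attrs ...attribute.KeyValue) (context.Context, trace.Span) {
	return Span(ctx, name,
		trace.WithSpanKind(kind),
		trace.WithAttributes(attrs...),
	)
}
A handler could then start its span with something like trace.SpanWithAttributes(r.Context(), "user.Find", trace.SpanKindServer, attribute.String("user.id", id)) and call span.AddEvent("validating user id") along the way.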
package api
import (
"context"
"errors"
"net/http"
"playground/trace"
)
type User struct{}
func (u User) Find(w http.ResponseWriter, r *http.Request) {
ctx, span := trace.Span(r.Context(), "user.Find")
defer span.End()
if err := u.isValid(ctx, r.PathValue("id")); err != nil {
trace.Error(span, errors.New("invalid user id"))
w.WriteHeader(http.StatusBadRequest)
return
}
}
func (u User) isValid(ctx context.Context, id string) error {
_, span := trace.Span(ctx, "user.isValid")
defer span.End()
if id != "b" {
return errors.New("invalid user id")
}
return nil
}
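For completeness, building the image above assumes a go.mod roughly along these lines. The module is named playground to match the imports, and the dependency versions are only indicative; run go mod tidy to pin the real ones.
module playground

go 1.22.0

require (
	// Versions below are indicative only.
	go.opentelemetry.io/otel v1.27.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
	go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.27.0
	go.opentelemetry.io/otel/sdk v1.27.0
	go.opentelemetry.io/otel/trace v1.27.0
)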
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-deployment
namespace: default
labels:
app: api
spec:
replicas: 1
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
containers:
- name: golang
image: you/api:latest
imagePullPolicy: Always
ports:
- containerPort: 8000
env:
- name: HOST
value: "0.0.0.0"
- name: PORT
value: "8000"
- name: ENV
value: "production"
- name: SVC
value: "api"
- name: VER
value: "v0.0.1"
- name: TYPE
value: "http"
- name: JAEGER
value: "jaeger-collector.observability:4318"
---
apiVersion: v1
kind: Service
metadata:
name: api-service
namespace: default
spec:
type: NodePort
selector:
app: api
ports:
- protocol: TCP
port: 80
targetPort: 8000
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: jaeger
namespace: observability
spec:
strategy: production
collector:
maxReplicas: 2
resources:
limits:
cpu: 100m
memory: 128Mi
storage:
type: elasticsearch
options:
es:
server-urls: https://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link
version: 7
index-prefix: prod
use-aliases: true
$ kubectl apply -f jaeger.yaml
I haven't checked it, but cronjob.batch/jaeger-es-index-cleaner might actually be cleaning up the old indices; it just needs verifying.
$ kubectl -n observability get all
NAME READY STATUS RESTARTS AGE
pod/jaeger-collector-7d4c468b9f-g8vq9 1/1 Running 0 2m46s
pod/jaeger-operator-786c87cb64-vflww 2/2 Running 0 6m51s
pod/jaeger-query-65f5979bc-btxln 2/2 Running 0 2m46s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jaeger-collector ClusterIP 10.103.189.50 <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP,14269/TCP,4317/TCP,4318/TCP 2m47s
service/jaeger-collector-headless ClusterIP None <none> 9411/TCP,14250/TCP,14267/TCP,14268/TCP,14269/TCP,4317/TCP,4318/TCP 2m47s
service/jaeger-operator-metrics ClusterIP 10.104.211.141 <none> 8443/TCP 6m52s
service/jaeger-operator-webhook-service ClusterIP 10.103.136.40 <none> 443/TCP 6m52s
service/jaeger-query ClusterIP 10.102.252.116 <none> 16686/TCP,16685/TCP,16687/TCP 2m47s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jaeger-collector 1/1 1 1 2m46s
deployment.apps/jaeger-operator 1/1 1 1 6m52s
deployment.apps/jaeger-query 1/1 1 1 2m46s
NAME DESIRED CURRENT READY AGE
replicaset.apps/jaeger-collector-7d4c468b9f 1 1 1 2m47s
replicaset.apps/jaeger-operator-786c87cb64 1 1 1 6m53s
replicaset.apps/jaeger-query-65f5979bc 1 1 1 2m47s
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
horizontalpodautoscaler.autoscaling/jaeger-collector Deployment/jaeger-collector cpu: <unknown>/90%, memory: <unknown>/90% 1 5 1 79s
NAME SCHEDULE TIMEZONE SUSPEND ACTIVE LAST SCHEDULE AGE
cronjob.batch/jaeger-es-index-cleaner 55 23 * * * <none> False 0 <none> 2m48s
cronjob.batch/jaeger-spark-dependencies 55 23 * * * <none> False 0 <none> 2m48s
We need to build the application image, push it to Docker Hub and deploy the application to the cluster.
$ docker build -t you/api:latest .
$ docker push you/api:latest
$ kubectl apply -f api.yaml
$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/api-deployment-fcdcc84d7-rkfbq 1/1 Running 0 11s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/api-service NodePort 10.100.179.159 <none> 80:30512/TCP 11s
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 17m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/api-deployment 1/1 1 1 11s
NAME DESIRED CURRENT READY AGE
replicaset.apps/api-deployment-fcdcc84d7 1 1 1 11s
After running the command below, visit http://localhost:16686.
$ kubectl port-forward -n observability service/jaeger-query 16686:16686
Forwarding from 127.0.0.1:16686 -> 16686
Let's populate some dummy trace data. First, port-forward so that our Go application is accessible from localhost.
$ kubectl port-forward service/api-service 8888:80
Forwarding from 127.0.0.1:8888 -> 8000
Forwarding from [::1]:8888 -> 8000
$ curl -i http://localhost:8888/api/v1/users/a
HTTP/1.1 400 Bad Request
Date: Wed, 29 May 2024 20:49:51 GMT
Content-Length: 0
$ curl -i http://localhost:8888/api/v1/users/b
HTTP/1.1 200 OK
Date: Wed, 29 May 2024 20:50:02 GMT
Content-Length: 0
$ curl -i https://rntsp-2a02-c7c-6502-900-24bc-b496-574f-783.a.free.pinggy.link/_cat/indices
health status index id pri rep docs.count docs.deleted store.size creation.date.string
yellow open prod-jaeger-dependencies-000001 CtOkYcWMSraMbDCfKaSH3A 2 1 0 0 1.1kb 2024-06-01T15:47:14.738Z
yellow open prod-jaeger-span-000001 07konOTQSnGf8wbb6s_LSg 2 1 13 0 18.8kb 2024-06-01T15:47:13.273Z
yellow open prod-jaeger-service-000001 QwUbhxeFQ9O4R2zEojN6Bw 2 1 0 0 1.1kb 2024-06-01T15:47:14.056Z
Jaeger uses an index-per-day pattern and creates a new index for each day based on the span's timestamp; for example, spans from 2024-06-01 would land in prod-jaeger-span-2024-06-01. This pattern might not be as effective for distributing data over shards, because one index might contain much more data than the others.
When rolling indices over, an index alias rolls over to a new index based on the given conditions. There is one alias for reading and another for writing: the read alias points to a group of read-only indices, and the write alias points to one write index. Let's now go through the commands. You should ideally automate these commands rather than running them manually, for example with a Kubernetes CronJob that does this for you once a day (a sketch of such a CronJob is at the end of the post). Each command below has a --help flag for hints.
This command rolls the write alias over to a new index when the given condition is met. It also adds the new index to the read alias so that the new data becomes available for search. Here we roll the alias over to a new index as long as the current write index is older than 10 seconds.
$ docker run -it --rm --net=host jaegertracing/jaeger-es-rollover:latest rollover http://localhost:9200 --conditions '{"max_age": "10s"}' --index-prefix prod
This makes old data unavailable for search, based on the given unit and count (here, indices older than 1 second).
$ docker run -it --rm --net=host jaegertracing/jaeger-es-rollover:latest lookback http://localhost:9200 --unit-count 1 --unit seconds --index-prefix prod
This deletes historical data by removing old indices. Here we remove indices older than 1 day.
$ docker run -it --rm --net=host jaegertracing/jaeger-es-index-cleaner:latest 0 http://localhost:9200 --rollover --index-prefix prod
Just a note: I called the API once just before the lookback command to populate the new index.
health status index id pri rep docs.count docs.deleted store.size creation.date.string
yellow open prod-jaeger-dependencies-000002 atOkYcWMSrasdfsdfaSH3A 2 1 0 0 1.1kb 2024-06-01T15:47:14.738Z
yellow open prod-jaeger-span-000002 a7konOTQSnGf8hgfhs_LSg 2 1 2 0 4.8kb 2024-06-01T15:47:13.273Z
yellow open prod-jaeger-service-000002 awUbjyutj9O4R2zEojN6Bw 2 1 0 0 1.1kb 2024-06-01T15:47:14.056Z
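Coming back to the automation suggestion above, a Kubernetes CronJob wrapping the rollover step could look roughly like the sketch below. This is only an illustration based on my own assumptions: the namespace, schedule, Elasticsearch URL and conditions are placeholders, and the lookback and index-cleaner steps would get similar CronJobs of their own.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prod-jaeger-es-rollover
  namespace: observability
spec:
  # Run the rollover once a day.
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: rollover
              image: jaegertracing/jaeger-es-rollover:latest
              # Same command as the manual run above, with a daily condition.
              args:
                - rollover
                - http://elasticsearch:9200
                - --conditions
                - '{"max_age": "1d"}'
                - --index-prefix
                - prod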