Valamis Microservice Specification

Requirements

Multi-tenancy

TODO: General multi-tenancy requirements

TODO: Multi-tenant data access layer

Hybrid multi-tenant sharding

Replica-ready

TODO: Replica requirements

TODO: Table/row lock dangers?

Rolling upgrade and migrations

DB migrations must take into account that two versions of the same service can be running at the same time. For example, never remove a column that the old version needs until every instance of the service has been rolled to the new version. In practice this means that non-backward-compatible migrations need to be done in multiple phases.
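A minimal sketch of such a multi-phase (expand/contract) migration, using Python's built-in sqlite3 purely for illustration. The table `users` and the columns `name` and `full_name` are hypothetical; the ordering of the phases is the point, not the schema.

```python
import sqlite3

# Hypothetical scenario: the old service version uses users.name,
# the new version uses users.full_name instead.
EXPAND = [
    # Phase 1: add the new column. Nothing is removed, so old instances keep working.
    "ALTER TABLE users ADD COLUMN full_name TEXT",
    # Phase 2: backfill so that new instances see complete data.
    "UPDATE users SET full_name = name WHERE full_name IS NULL",
]
CONTRACT = [
    # Phase 3: run only after every instance has been rolled to the new version.
    "ALTER TABLE users DROP COLUMN name",
]


def run_phase(conn, statements):
    """Apply one migration phase as a unit."""
    for sql in statements:
        conn.execute(sql)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
run_phase(conn, EXPAND)

# During the rollout both versions can read the data they expect.
old_version_row = conn.execute("SELECT name FROM users").fetchone()
new_version_row = conn.execute("SELECT full_name FROM users").fetchone()
```

The CONTRACT phase is deliberately not executed here: it belongs to a later release. Note that old instances still write only `name` during the rollout, so the backfill may have to be repeated (or the new version must fall back to `name`) before the old column is dropped.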

TODO: some tips/links?

Health check endpoints

TODO: Liveness/readiness probes

Implement DB and Kafka checks.
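One possible shape for these checks, sketched in plain Python without any web framework. The function names and the response bodies are illustrative assumptions, and the two dependency checks are placeholders. The sketch also assumes a common convention that is not decided in this document: liveness does not depend on DB or Kafka, while readiness does.

```python
import json


def check_db():
    # Placeholder: a real check might run "SELECT 1" against the service database.
    return True


def check_kafka():
    # Placeholder: a real check might request cluster metadata from a broker.
    return True


def liveness():
    # Liveness only reports that the process is up. It does not touch DB or Kafka,
    # so a dependency outage does not cause healthy instances to be restarted.
    return 200, json.dumps({"status": "ok"})


def readiness(checks):
    # Readiness runs every dependency check. A False result or an exception marks
    # the instance as not ready, so traffic can be routed elsewhere.
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, json.dumps({"checks": results})
```

A handler for the readiness endpoint would call `readiness({"db": check_db, "kafka": check_kafka})` and return the status code and body as the HTTP response.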

Metrics endpoint

Add a Prometheus /metrics endpoint to every service. Client implementations exist for most popular languages and frameworks.

TODO suggestions:

  • JVM/Scala/Kotlin
  • Node.js
  • Go
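To make the target concrete, the sketch below renders a single counter in the Prometheus text exposition format, which is what a /metrics endpoint returns. It is hand-written only to show the format; a real service would normally use one of the Prometheus client libraries. The metric name and labels are made up for the example.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label_name, label_value) pairs to the counter value.
    Label values are not escaped here; client libraries take care of that.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        suffix = "{" + label_str + "}" if label_str else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: requests served, split by method and status code.
metrics_text = render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    {(("method", "GET"), ("status", "200")): 3},
)
```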

Configuration

12-factor app? Dotfiles?

Cross-cutting concerns

Authentication

TODO: OIDC, JWT; link to the actual authn/authz doc

Authorization and policies

TODO: OPA as a sidecar or as a per-node instance? Link to the actual authn/authz doc

Configuration Management

TODO: 12-factor app style?
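If the 12-factor style is adopted, configuration would be read from environment variables at startup. The sketch below shows what that could look like; the variable names DATABASE_URL, KAFKA_BROKERS and DEBUG are assumptions made for the example, not agreed names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    kafka_brokers: tuple
    debug: bool


def load_settings(env=os.environ):
    """Build settings from environment variables (hypothetical names)."""
    return Settings(
        # Required: a missing variable raises KeyError, so the service fails fast.
        database_url=env["DATABASE_URL"],
        # Optional comma-separated list; empty entries are dropped.
        kafka_brokers=tuple(b for b in env.get("KAFKA_BROKERS", "").split(",") if b),
        # Optional flag, off by default.
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Passing the environment in as a parameter keeps the loader easy to test with a plain dictionary.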

Service Discovery

TODO: Kubernetes CoreDNS

Load Balancing

TODO: Cloud provider

Feature flags

TODO

Logging

TODO: Fluentd, EFK

Audit and error logs, optional debug logs. Logs are written to stderr/stdout.
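A minimal sketch of this setup with Python's standard logging module: everything goes to stdout, and debug output is switched on only when requested. The JSON line format and the LOG_DEBUG variable are assumptions made for the example; the document does not prescribe a log format.

```python
import json
import logging
import os
import sys


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object per line (assumed format)."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def configure_logging(stream=sys.stdout, env=os.environ):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    # Debug logging is optional: enabled only via the hypothetical LOG_DEBUG variable.
    debug = env.get("LOG_DEBUG", "false").lower() in ("1", "true")
    root.setLevel(logging.DEBUG if debug else logging.INFO)
    return root
```

With this in place, `logging.getLogger("audit").info(...)` and `logging.getLogger("app").error(...)` both end up on stdout, where a log collector can pick them up.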

Errors

TBD

Debugging

TBD

Audit logs

Who?

Monitoring and alerting

Health checks

Liveness/readiness probing

Prometheus and Alertmanager

Libraries

  • JVM: https://github.com/prometheus/client_java
  • Node.js: https://github.com/siimon/prom-client
  • Go: https://github.com/prometheus/client_golang

HTTP cache

Surrogate-Control cache headers should be used to tell the Fastly CDN that API responses can be cached. Fastly strips the header before the response reaches the browser.

Additionally, the standard Cache-Control header can be used if the response is something that the browser should also cache. See Cache-Control. Be extra careful with the browser cache, as it is hard to control: a response cached in the browser cannot be purged from the server side.

Note that if CDN cache lifetimes are set to be long, entries can be purged explicitly from the CDN cache. TODO: general event-driven service for CDN purges.
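The two headers can be combined as in the following sketch. The helper name `cache_headers` and its parameters are hypothetical, and the lifetimes in the usage note are example values only.

```python
def cache_headers(cdn_max_age, browser_max_age=None):
    """Build response headers for CDN and, optionally, browser caching.

    cdn_max_age: seconds the CDN may keep the response (Surrogate-Control).
    browser_max_age: seconds the browser may keep it (Cache-Control), or None
    to send no browser caching instruction from this helper.
    """
    headers = {"Surrogate-Control": f"max-age={cdn_max_age}"}
    if browser_max_age is not None:
        # Keep browser lifetimes short: browser caches cannot be purged.
        headers["Cache-Control"] = f"max-age={browser_max_age}"
    return headers
```

For example, `cache_headers(3600, 60)` asks the CDN to cache for an hour while the browser caches for only one minute.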

For more information, see Fastly documentation about caching at the edge.

Compression (gzip, brotli)

The main responsibility for compressing microservice responses should lie with the CDN and/or the API gateway or, in the future, with a service mesh proxy layer (Envoy, Linkerd, etc.) if there is one.

The current problem with efficient Brotli compression is that it is not widely supported. We are waiting for a resolution of either of these issues: