In today's world, Docker and Kubernetes are the de facto standard for running microservices. Containerisation and orchestration engines let us run microservices at scale and serve more traffic, but network and access-control complexity grows at the same time. Here we address some of the security concerns in complex distributed systems under the following categories:
- Docker Container Security
- Kubernetes Security and Governance
- Application Governance Issues in the Microservices Domain
- Solving Application Risk Concerns Using a Service Mesh (Istio)
NOTE: Not every point mentioned below may look like a typical security vulnerability, but human negligence can also lead to serious security problems. Here we address points on both vulnerabilities and governance-modelling issues in large enterprises.
This discussion applies strictly to microservices on a cloud-native platform.
Docker Container Security
- Here we address issues regarding Docker containers and the pipelines in the software development lifecycle that build and ship Docker images.
- Containers rely on a base image, so we need to know whether that base image comes from a reliable source.
- Containers need to be monitored at runtime, since they have short life spans. As operators we need complete visibility/observability into the containers running on our clusters.
- Containers run under an orchestration engine. If a container is compromised, the area of impact grows, which means the orchestration engine can be compromised as well.
- Traditional compliance methods such as firewall rules become very complex to implement in a containerised world.
- Most images are built and shipped in a continuous-delivery DevOps lifecycle, so vulnerability assessments need to happen at the pipeline level. A proper DevSecOps lifecycle needs to be embedded in the pipeline.
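As an illustration of a pipeline-level check, here is a minimal sketch of a gate that rejects Dockerfiles whose base images do not come from an allow-listed source. The registry prefixes in `TRUSTED_PREFIXES` and the function names are assumptions made for this example, not part of any particular tool, and a real pipeline would pair a check like this with an image vulnerability scanner.

```python
"""Pipeline gate sketch: find base images that are not on an allow-list."""

# Hypothetical allow-list; replace with the registries your organisation trusts.
TRUSTED_PREFIXES = ("registry.example.com/base/", "docker.io/library/")


def base_images(dockerfile_text):
    """Return the image reference of every FROM instruction."""
    images = []
    stages = set()
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Ignore flags such as --platform=linux/amd64.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # "FROM builder" can refer to an earlier build stage, and "scratch"
        # is the empty image; neither is pulled from a registry.
        if image not in stages and image != "scratch":
            images.append(image)
        # Remember the stage name from "FROM image AS name".
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return images


def untrusted_images(dockerfile_text, trusted=TRUSTED_PREFIXES):
    """Return the base images that do not start with a trusted prefix."""
    return [img for img in base_images(dockerfile_text) if not img.startswith(trusted)]
```

A CI step could call `untrusted_images()` on each Dockerfile in the repository and fail the build when the returned list is not empty.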
Kubernetes Security and Governance
- Kubernetes has its own set of challenges, like any complex distributed system. The core algorithmic issues of consensus and the commit log are taken care of by Raft within etcd, which gives the cluster a stable state. But issues such as validation of manifests, security best practices, network and application traceability, RBAC, user authentication with the cluster, and sidecars as a recent strategy for supporting application containers all add a lot of work.
- Even once all of the above is in place, the governance model around running large clusters in large enterprises can be just as complex.
- One of the more complex and significant issues is operators. We may be running Prometheus/etcd operators that assure HA of our application workload, but we also need more than one replica of each operator so that it stays available in case one goes down. That brings in the problem of leader election between operator replicas, as we don't want two operators reconciling against the Kubernetes API at the same time.
- Leader election between multiple pods of the same application is another complex use case to implement. Streaming or long-lived TCP connections want traffic served by only one pod, but a Kubernetes Service distributes traffic across the pods with a randomised algorithm. How can we enable leader election without added complexity?
- As a k8s operator you deploy a bunch of manifests, and maybe a bunch of Helm charts. Are those charts validated against best practices? What if a k8s operator skips the security context in the manifest file? How can you monitor and validate all the manifest files deployed by N users in your cluster?
- Data science teams want access to your cluster because they need to deploy their Jupyter notebooks in it. As a k8s operator we need to give them restricted access, limited to their own namespace and to specific verbs such as create and watch. Doing this for one team by creating RBAC manually can go smoothly, but what happens when N teams access the cluster's namespaces?
- Improper scheduling in Kubernetes leads to poor resource utilisation, which can cause massive downtime and increased costs. As a result, a lot of security practices can be compromised.
- Ingress traffic, and whether it is authenticated with the application, is generally controlled by third-party integrations such as Okta or OAuth. Egress traffic leaving the cluster also has to be controlled, and that can involve complex use cases.
- End-to-end TLS: traffic needs to be on TLS from the outside all the way to the pod, to prevent man-in-the-middle attacks and spoofing. Implementing this can be a pain.
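To make the manifest-validation question above concrete, here is a minimal sketch of a check that flags containers with a missing or weak security context. It assumes manifests have already been parsed into Python dictionaries, handles only Pods and workloads with a `spec.template` (a CronJob, for example, nests its pod spec one level deeper), and the three rules it applies are illustrative choices rather than a complete policy. In a cluster, checks of this kind would usually run inside an admission controller or a policy engine rather than a standalone script.

```python
"""Sketch: flag workloads whose containers omit basic securityContext settings."""


def pod_spec(manifest):
    """Locate the pod spec in a Pod, or in a workload with a pod template."""
    spec = manifest.get("spec", {})
    if manifest.get("kind") == "Pod":
        return spec
    return spec.get("template", {}).get("spec", {})


def security_findings(manifest):
    """Return human-readable findings; an empty list means the checks passed."""
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    spec = pod_spec(manifest)
    pod_ctx = spec.get("securityContext") or {}
    findings = []
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        label = f"{name}/{container.get('name', '<unnamed>')}"
        # runAsNonRoot may be set on the container or inherited from the pod.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{label}: runAsNonRoot is not true")
        # Treat an unset value the same as true: it must be disabled explicitly.
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{label}: allowPrivilegeEscalation is not set to false")
        if ctx.get("privileged", False):
            findings.append(f"{label}: container is privileged")
    return findings
```

Running `security_findings()` over every manifest a user submits gives one list of findings per workload, which can then be logged or used to reject the deployment.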
Application Security and Governance Model in Microservices
In general we have strategies for developing high-quality microservices, such as following the 12-factor app or a 10-factor app for microservices development. Each factor ends up as a piece of code in the microservice, which then needs to be maintained and versioned.
As a developer I need to focus on my business logic at the core. Here are some key points to reflect on while developing microservices.
- Is my microservice domain-driven or event-driven?
- Is my microservice making synchronous or asynchronous calls?
- REST is slowly losing ground as gRPC emerges as the tech stack for microservices.
- With gRPC comes protobuf, used as the data format for microservices communicating over HTTP/2.
- Microservices development should follow the concepts of Uncle Bob's Clean Architecture when it comes to writing clean code.
- Where should a messaging layer such as Kafka be used, and how should the application layer be designed around it?
- The choice of databases: where to run them? SQL or NoSQL? Databases are stateful and require persistent volumes to persist data. Running databases on K8s can be complex, and so can sharding them.
- Which model to follow when microservices use databases: when and where is the CQRS model required, and how can it help with faster retrieval and querying of data?
The above are some of the most general topics a developer works on, but now developers also need to write code for network/distributed concerns such as:
- Rate Limiting
- Circuit Breaking
- Fault Tolerance
- Traffic Routing/Shifting
- AuthN and AuthZ
This turns an 8-factor app into an app with 12 or more factors.
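To give a sense of what one of these concerns costs when it lives in application code, here is a minimal circuit breaker sketch. The class name, thresholds, and injectable clock are assumptions chosen for the example; production libraries handle concurrency, metrics, and per-endpoint state on top of this.

```python
"""Sketch: an in-application circuit breaker, one of the concerns listed above."""
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Every outbound call then has to be wrapped, for example `breaker.call(fetch_products)` with a hypothetical `fetch_products` function, and the same wrapper has to be written, versioned, and maintained in every service and every language in use.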
Solving Application Risks/Concerns using Service Mesh
As an architect I want application developers to focus strictly on business logic, with the five concerns listed above handled from an operational perspective. This is where the Istio service mesh comes into play. A service mesh reduces the application developers' workload by handling the routing, observability, and monitoring an application requires.
From a Traffic Routing Perspective:
- The service mesh handles all traffic routing: if version 1 of Service A is down, it shifts traffic to version 2 of Service A. No code change is required on the developer's end.
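- Suppose the URL endpoint /v1/products goes down in production and traffic needs to be shifted to /v2/products. Without any code change, Istio can route requests arriving at /v1/products to /v2/products, for example with a URI rewrite, while clients keep calling /v1/products. Istio's mirroring feature is related but different: it sends a copy of live traffic to a second destination, which is useful for testing a new version.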
- Rate limiting, fault tolerance, fault injection, and circuit breaking do not have to be written in code; these features are handled operationally.
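The version-shifting case above can be sketched as a small generator for an Istio VirtualService manifest. The host `service-a` and the subset names `v1` and `v2` are assumptions for illustration, and the subsets would have to be defined in a matching DestinationRule, which is not shown here.

```python
"""Sketch: build an Istio VirtualService that splits traffic across versions."""
import json


def weighted_virtual_service(host, weights, name=None):
    """Return a VirtualService manifest as a dict.

    `weights` maps subset name -> percentage of traffic; the percentages
    are required to add up to 100.
    """
    if sum(weights.values()) != 100:
        raise ValueError("route weights must add up to 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name or f"{host}-routes"},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": subset}, "weight": weight}
                    for subset, weight in weights.items()
                ],
            }],
        },
    }


if __name__ == "__main__":
    # Shift most traffic to v2 while keeping a small share on v1.
    manifest = weighted_virtual_service("service-a", {"v1": 10, "v2": 90})
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied to the cluster like any other manifest. Changing the split is then an operational change to this one resource, and the application code of Service A stays untouched.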
From a Security Perspective:
- Istio provides transport security and end-user security.
- Mutual TLS: all the microservices talk over TLS, without developers minting the certs or coding the logic for it.
- Authenticating end users from either an application context or a user context, working with JWTs.
- Let's take a sample case:
→ Service A wants to talk to Service B running in a Kubernetes cluster.
- Is Service A authenticated?
- If yes, is it authorised?
- If yes, then how is B validating A?
- Is B caching the request?
- Is Service A making the call from an application context or a user context?
- We want all services to have end-to-end TLS.
Here you can see that this single use case ends up in tons of permutations and combinations. How do we solve this when hundreds of services run in a Kubernetes cluster? Do we need to write an authorisation server? If I am using Okta or Auth0, will I need to make code changes in my application? These are the issues we shall address using Istio adapters.
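The first two questions in the sample case, plus the application-context versus user-context question, can be sketched as a single decision function. This is a plain-Python illustration of the logic only: the `Request` and `AllowRule` types and their fields are hypothetical and do not mirror Istio's policy resources, and the identities are assumed to have been verified already (for example the caller via mTLS and the end user via a JWT).

```python
"""Sketch: the authenticate-then-authorise decision for Service A -> Service B."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    source: Optional[str]  # verified workload identity, None if unauthenticated
    user: Optional[str]    # verified end-user identity, None for application context
    destination: str
    method: str
    path: str


@dataclass(frozen=True)
class AllowRule:
    source: str
    destination: str
    methods: frozenset
    path_prefix: str = "/"
    require_user: bool = False  # True: the call must carry an end-user identity


def decide(request, rules):
    """Return (allowed, reason), answering the questions in order."""
    # 1. Is the caller authenticated?
    if request.source is None:
        return False, "caller is not authenticated"
    # 2. If yes, is it authorised by at least one rule?
    for rule in rules:
        matches = (
            rule.source == request.source
            and rule.destination == request.destination
            and request.method in rule.methods
            and request.path.startswith(rule.path_prefix)
        )
        # 3. Some rules only apply when an end user is present.
        if matches and (request.user is not None or not rule.require_user):
            return True, "allowed"
    return False, "caller is authenticated but not authorised"
```

Even this small version needs a rule per source, destination, method, and path combination, which is why pushing the decision out of each service and into shared policy is attractive once there are hundreds of services.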
From an Observability Perspective:
- Distributed tracing for microservices
- Application/Infrastructure/API Monitoring
- Visualisation of the service mesh
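One piece of distributed tracing still touches application code: the sidecar proxies cannot tell which outgoing call belongs to which incoming request, so the service has to copy the trace headers from the request it received onto the requests it makes. A minimal sketch follows; the header list covers the common B3 and W3C Trace Context names and is an assumption here, since the exact set depends on the tracing backend in use.

```python
"""Sketch: forward trace headers so the mesh can stitch spans into one trace."""

# Assumed header set (B3 and W3C Trace Context); adjust to your tracing backend.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
    "traceparent",
    "tracestate",
)


def propagate_trace_headers(inbound_headers):
    """Return the trace headers to copy onto outgoing calls.

    Header names are matched case-insensitively and returned in lower case.
    """
    lowered = {key.lower(): value for key, value in inbound_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
```

A request handler would call `propagate_trace_headers()` on the incoming headers and merge the result into the headers of every downstream call it makes.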
Service mesh features are vast, and the adoption model is quite complex. The above are my own thoughts.