Bad boy(s) of DevOps: 2020

Thursday, November 12, 2020

Microservices for better performance

I'm becoming a fan of API-based communication and content loading. In this post I briefly describe why.

Let’s take a blog page that is a bit like this one. It has the following components:

Menu
Content (this text+title)
Comments

Let’s first look at the life cycles of these parts:

Comments - they change whenever someone posts a comment, so on a popular blog they change quite often. Each blog entry has its own comments.
Menu - it changes when new content is published or titles are updated. The menu is practically the same on every page.
Content - every page has its own content, and it rarely changes after it has been published. In most cases it doesn’t change at all. (Well, maybe some typo fixes, but not much more than that.)

First we have the traditional architecture, which e.g. WordPress uses. It doesn’t rely on any API for rendering. It just constructs the whole page on the server and returns it, so every request loads the menu, content and comments together. You can’t easily cache any of this data, or you risk that readers miss new comments. You could cache the pages and invalidate them whenever something changes, but that process is quite complex. In pseudocode:

If the menu changes -> Invalidate all pages that have the menu - this is a loop, and the invalidation process must know which pages have the menu
If the content changes -> Invalidate the cached page for that content
If there is a new comment -> Invalidate the cached page for that content

Menu changes are expensive: after one, every page load hits the backend for a while.
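The rules above can be sketched as a small whole-page cache. This is only an illustration under assumptions: the class name PageCache, its methods and the URL keys are hypothetical and are not taken from WordPress or any real caching plugin.

```python
class PageCache:
    """Whole-page cache: each entry is a fully rendered page (hypothetical sketch)."""

    def __init__(self):
        self.pages = {}                # url -> rendered HTML
        self.pages_with_menu = set()   # urls whose HTML embeds the menu

    def store(self, url, html, has_menu=True):
        self.pages[url] = html
        if has_menu:
            self.pages_with_menu.add(url)

    def _drop(self, url):
        self.pages.pop(url, None)
        self.pages_with_menu.discard(url)

    def on_menu_change(self):
        # The expensive loop: every page that embeds the menu is dropped.
        for url in list(self.pages_with_menu):
            self._drop(url)

    def on_content_change(self, url):
        self._drop(url)

    def on_new_comment(self, url):
        # A comment invalidates the whole rendered page, not just the comments.
        self._drop(url)
```

After on_menu_change() the cache is effectively empty, which is why the following page loads all reach the backend.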

What if we use API-based communication instead? The ‘static’ web page is a bit of HTML without any content, plus JavaScript and CSS files. The APIs are Menu, Content and Comments. Below is the architecture picture of the system. The user's cache can be e.g. the internal cache of the browser or the proxy of the Internet Service Provider.

There’s a good chance that a Content request never has to hit the real storage again after the content has been loaded for the first time. The Content TTL in our local cache can be forever, because we can easily invalidate that cache ourselves. The story for the remote caches is different, because we cannot invalidate them. There the TTL can be e.g. 30 seconds. In that case the user’s cache does not keep the data for long, but when it expires the request hits our local cache instead of our Content service.

When the Menu data changes we don’t have to run a complex loop that invalidates cached pages. A single call invalidates the cached menu for all pages. This simplifies our rules a lot. The rule for the local cache can be “forever”, but for the users’ caches it can be e.g. 30 seconds or even shorter.
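A minimal sketch of the same idea with one cache entry per API call instead of per rendered page. The class ApiCache, the key names and the load callback are assumptions made for illustration only.

```python
class ApiCache:
    """Local cache keyed by API call, not by rendered page (hypothetical sketch)."""

    def __init__(self):
        self.entries = {}  # (api, key) -> response body

    def get(self, api, key, load):
        # Call the backing service only on a cache miss.
        if (api, key) not in self.entries:
            self.entries[(api, key)] = load()
        return self.entries[(api, key)]

    def invalidate(self, api, key):
        # One call removes one entry; there is no loop over pages.
        self.entries.pop((api, key), None)


cache = ApiCache()
cache.get("menu", "site", lambda: ["Home", "Archive"])
cache.get("content", "/post-1", lambda: "Post text")

# The menu changed: a single call, and the cached content is untouched.
cache.invalidate("menu", "site")
```

Because the menu is shared by every page, it lives under a single key, so a menu change costs one invalidation and one reload.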

How the Comments API can be cached depends on its features. If it lets users modify or delete their own comments, then this API cannot be cached for a user who is logged in. The Comments API may need more complex caching rules, for example: user logged in -> never cache; anonymous user -> always cache, but invalidate when new comments are written.
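These two rules could be expressed as the Cache-Control header that the Comments API returns. This is a sketch only: the function name and the 30-second default are assumptions, and the invalidation of the local cache when a new comment arrives is not shown.

```python
def comments_cache_control(logged_in: bool, ttl_seconds: int = 30) -> str:
    """Pick a Cache-Control value for a Comments API response (hypothetical helper)."""
    if logged_in:
        # Logged-in users may edit or delete their comments: never store the response.
        return "no-store"
    # Anonymous users all see the same data, so shared caches may keep it briefly.
    return f"public, max-age={ttl_seconds}"
```

For example, comments_cache_control(False, ttl_seconds=10) gives anonymous readers a 10-second TTL, while any logged-in request bypasses every cache.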

A good microservice architecture can improve performance through good caching policies. Each API can have its own life cycle, and its caching rules should follow that life cycle. In many cases it’s enough that the component sets proper caching headers. But to apply different caching rules to the local cache and the user’s cache, the caching application must be able to modify those headers.
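One way to picture that last point is a caching layer that keeps its own TTL per API and rewrites the header it passes on to the user. The policy table, the function name and the TTL values below are hypothetical; a real reverse proxy would express this in its own configuration language.

```python
# Hypothetical per-API policy table. A local_ttl of None means "keep until invalidated".
POLICIES = {
    "menu": {"local_ttl": None, "user_ttl": 30},
    "content": {"local_ttl": None, "user_ttl": 30},
}


def split_caching(api, origin_headers):
    """Return (local_ttl, headers_for_user) for one API response."""
    headers = dict(origin_headers)  # copy, never mutate the component's headers
    policy = POLICIES.get(api)
    if policy is None:
        # Unknown API: do not cache locally, pass the component's headers through.
        return 0, headers
    # The user's cache gets a short TTL regardless of what the component sent.
    headers["Cache-Control"] = f"public, max-age={policy['user_ttl']}"
    return policy["local_ttl"], headers
```

With this split, the component can keep sending whatever header suits it, and the long-lived copy stays only where we are able to invalidate it.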

P.S. Good caching also lowers infrastructure costs and increases the reliability of the system.

Friday, August 14, 2020

Kubernetes (and Azure AKS) RBAC description

Part of Kubernetes security is using RBAC for authorization (with the AAD integration, authentication is handled by AAD). There are plenty of short articles about it, but I didn’t find any good and complete “how to” instructions. I hope this will be such a guide. If you want me to clarify something, please add it to the comments. This is written from the Azure AKS point of view, with AKS integrated with AAD, but many things are the same in other clusters too.

Here’s a description of the parts that a Kubernetes role and role binding consist of:

In this terminology a “Role” describes what the bound identity can do. The identity can be a user, a group or a service account. A role can be bound to multiple identities. Let’s look at things backwards and start from the role binding.

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
 name: <name for the binding>
subjects:
 - kind: Group
   name: <AAD group ID>
roleRef:
 kind: Role
 name: <name of the role>
 apiGroup: rbac.authorization.k8s.io

The role binding describes which identity can use the role. For humans the identity is a group or a single user. A service account is for pods that have to access the API server. A User is a single user (like testuser_1@youaaddomain.onmicrosoft.com). Binding a role to a single user is useful only if you have very few (less than two) users. With more than one user it becomes complex and time-consuming to maintain. With the AKS AAD integration, the Group name is the object ID of the AAD group, e.g. 6ec5b8f7-823c-491c-97d6-977ae68afbf3.

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
 namespace: <mandatory for Role>
 name: <name of the role>
rules:
 - apiGroups:
     - ""
     - <some other API group>
   resources:
     - <resource>
     - <another resource>
   verbs:
     - <verb 1>
     - <verb 2>
 - apiGroups: # Another block - there can be any number of objects
     - <some other API group>
   resources:
     - <resource>
     - <another resource>
   verbs:
     - <verb 1>
     - <verb 2>

The verbs are the actions that are allowed. Resources have the following verbs: create, get, list, watch, update, patch, delete, deletecollection. In addition to those there are several special verbs:

use verb for the podsecuritypolicies in the policy API group
bind and escalate verbs on roles and clusterroles resources in the rbac.authorization.k8s.io API group
impersonate verb on user

You have to read the API documentation to see exactly what each verb does for each resource.

The API group is the group the resource belongs to, and it must be listed in the apiGroups part. The empty string means the core group. All other groups must be named explicitly. When a resource is looked up, Kubernetes checks all API groups that have been defined for this rule. If you have defined ‘*’ as the resource, any resource from the listed API groups matches the rule.
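The matching described above can be sketched roughly as follows. This is a simplified illustration, not the Kubernetes implementation: the function names are hypothetical, and it ignores resourceNames, subresources and non-resource URLs.

```python
def rule_allows(rule, api_group, resource, verb):
    """True if a single rule block covers the requested group, resource and verb."""
    def matches(allowed, value):
        # '*' is the wildcard for verbs, resources and API groups.
        return "*" in allowed or value in allowed

    return (matches(rule["apiGroups"], api_group)
            and matches(rule["resources"], resource)
            and matches(rule["verbs"], verb))


def role_allows(rules, api_group, resource, verb):
    """Rules are purely additive: access is granted if any rule block matches."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


rules = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
    {"apiGroups": ["apps"], "resources": ["*"], "verbs": ["get"]},
]

print(role_allows(rules, "", "pods", "list"))           # pods are in the core group
print(role_allows(rules, "apps", "deployments", "get")) # '*' covers every apps resource
print(role_allows(rules, "", "deployments", "get"))     # wrong group for deployments
```

The last call shows why the API group matters: the resource name alone is not enough if the rule lists a group the resource does not belong to.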

The resources are divided into two separate groups: namespace resources and cluster resources. For example, a pod is a namespace resource while a node is a cluster resource. The ClusterRole is the only object that can allow access to cluster resources. A ClusterRole can also allow access to namespace resources. In that case the access applies to the resource in all namespaces. The binding is done with the ClusterRoleBinding.

If the access to namespace resources is meant to cover only a specific namespace, the Role is used. The Role defines which namespace it applies to. The binding is done with the RoleBinding.
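The combinations can be summarised in a small lookup. The function below is a hypothetical helper for illustration only. It also covers one case the text does not mention: a RoleBinding may reference a ClusterRole, and then the access is limited to the namespace of the binding.

```python
def access_scope(role_kind, binding_kind, binding_namespace=None):
    """Describe where a role/binding pair grants access (illustrative sketch)."""
    if role_kind not in ("Role", "ClusterRole"):
        raise ValueError(f"unknown role kind: {role_kind}")
    if binding_kind == "ClusterRoleBinding":
        if role_kind != "ClusterRole":
            raise ValueError("a ClusterRoleBinding can only reference a ClusterRole")
        return "cluster resources, and namespace resources in all namespaces"
    if binding_kind == "RoleBinding":
        if binding_namespace is None:
            raise ValueError("a RoleBinding always lives in a namespace")
        # Holds for both Role and ClusterRole: the binding's namespace sets the limit.
        return f"namespace resources in namespace '{binding_namespace}'"
    raise ValueError(f"unknown binding kind: {binding_kind}")
```

For example, access_scope("ClusterRole", "RoleBinding", "team-a") reports access only inside team-a, which is a common way to reuse one role definition across several namespaces.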

I've created the Kubernetes RBAC Matrix for better readability.