Share IT!

Sunday, June 26, 2022

[How To] Prevent Container with Privileged Mode to Run on Kubernetes Cluster

My Kubernetes exploration brought me to the topic of privileged pods. A privileged pod, or a container running in privileged mode, is a configuration option of a K8s deployment that can be useful but can also be dangerous.

This is an excerpt of a deployment specification where privileged mode is defined.

kind: Deployment
...
spec:
  template:
    spec:
      containers:
      - name: ...
        image: nginx:1.14.2
        securityContext:
          privileged: true

I found this article:

https://www.cncf.io/blog/2020/10/16/hack-my-mis-configured-kubernetes-privileged-pods/

It explains the true intent of running a privileged pod and the security risk it creates, including how a privileged pod can be exploited for malicious purposes.

Now the question is, how can we prevent containers in privileged mode from running on our Kubernetes cluster?
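To make the target of such a rule concrete, here is a minimal Python sketch of the check itself. It is an illustration only and enforces nothing on a cluster: it walks a Deployment manifest that has already been parsed into a dictionary and lists the containers whose securityContext sets privileged to true. The function name and the container name in the sample are hypothetical.

```python
from typing import Any, Dict, List


def find_privileged_containers(deployment: Dict[str, Any]) -> List[str]:
    """Return the names of containers that request privileged mode.

    Hypothetical helper for illustration: it only inspects a manifest
    already loaded into a dict and enforces nothing on a cluster.
    """
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    flagged = []
    # Both init containers and regular containers can carry a securityContext.
    for key in ("initContainers", "containers"):
        for container in pod_spec.get(key) or []:
            security_context = container.get("securityContext") or {}
            if security_context.get("privileged") is True:
                flagged.append(container.get("name", "<unnamed>"))
    return flagged


# Sample manifest mirroring the excerpt above (the container name is made up).
sample = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.14.2",
                        "securityContext": {"privileged": True},
                    }
                ]
            }
        }
    },
}

if __name__ == "__main__":
    print(find_privileged_containers(sample))
```

A check like this could run in a CI pipeline before manifests reach the cluster, but it is no substitute for enforcement inside the cluster itself.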

[lunar.lab] Cannot Resolve ".local" Domain from TKGm Workload Cluster

Problem Statement

The Kubernetes pod status is ImagePullBackOff.
Describing the pod shows this error message:

dial tcp: lookup harbor-01a.corp.local: Temporary failure in name resolution

The container image is pulled from a local container registry with a ".local" domain suffix.

[lunar.lab] Allow TKGm Workload Cluster to Pull Image from Harbor Configured with Self-signed Certificate

Disclaimer

This method is something of a hack and hence ** Unsupported **.
I do this only in my lab or in a PoC within a controlled environment.

Problem Statement

A TKGm workload cluster does not allow pulling images from a container registry configured with a self-signed certificate.

Attempting it throws the following error message:

x509: certificate signed by unknown authority

[How To] Enhance Online Boutique App to Use Persistent Volume

Online Boutique (https://github.com/GoogleCloudPlatform/microservices-demo) is a web-based e-commerce microservices demo app built by folks at Google. I use it as a demo app to deploy on top of the Tanzu Kubernetes platform. One of my demo scenarios shows how to consume a vSphere datastore as persistent storage for a Kubernetes app in an easy, on-demand, fully automated, and scalable fashion. This is done through a feature called Cloud Native Storage (CNS). Read more about CNS here:

https://blogs.vmware.com/virtualblocks/2019/08/14/introducing-cloud-native-storage-for-vsphere/

One of the Online Boutique services is redis-cart, the service in charge of the shopping cart. When an item is added to the shopping cart, the record is handled by this service. With the default configuration, the data volume used by redis-cart is not a persistent volume, so if redis-cart fails, the shopping cart data is lost. This article explains how to change this and use a vSphere datastore to provide persistent storage for the redis-cart service.

[How To] Avoid Hitting Docker Pull Rate Limit by Authenticate Pull Request

When demoing a Kubernetes platform, I definitely need a sample application to deploy. There are some great references here: https://williamlam.com/2020/06/interesting-kubernetes-application-demos.html, where most of the source container images come from the Docker registry. If you try to deploy the app manifests, you might hit an error like the following:

429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit.
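On Kubernetes, authenticated pulls are commonly done with an image pull secret of type kubernetes.io/dockerconfigjson. The excerpt does not show which approach the post takes, so treat the following as a minimal Python sketch of how the payload of such a secret is put together, not as the post's method. The credentials are placeholders, the function name is hypothetical, and the default registry key is an assumption based on the default server that kubectl uses for Docker Hub.

```python
import base64
import json


def build_docker_config(username: str, password: str,
                        server: str = "https://index.docker.io/v1/") -> str:
    """Return the JSON payload stored under `.dockerconfigjson` in the secret.

    Hypothetical helper; `server` defaults to the Docker Hub registry key
    (an assumption, adjust it for other registries).
    """
    # The `auth` field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder credentials; use an access token rather than a real password.
    print(build_docker_config("demo-user", "demo-token"))
```

The resulting secret still has to be referenced from the pod spec or service account (imagePullSecrets) before pulls are actually authenticated.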

[lunar.lab] Deploy TKG Management Cluster on vSphere

Now that all the preparation is complete, I am finally able to deploy the TKG management cluster. The recommended (and easiest) way to do this for the first time is to use the installer interface. From the bootstrap machine prepared earlier (https://dy.si/TAg1M72), I type this:

tanzu management-cluster create --ui --browser none --bind 192.168.110.101:8081

192.168.110.101 is the IP address of my bootstrap machine
I set the installer interface to be accessible on port 8081
--browser none tells the installer not to open a browser locally, as my bootstrap machine has no GUI
Open http://192.168.110.101:8081 in a browser with network access to the bootstrap machine
As always, this is the official documentation that I refer to: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.5/vmware-tanzu-kubernetes-grid-15/GUID-mgmt-clusters-deploy-ui.html

Step 1 - IaaS Provider

I created a base image template that matches the management cluster’s Kubernetes version.

Official documentation: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.5/vmware-tanzu-kubernetes-grid-15/GUID-mgmt-clusters-vsphere.html#import-a-base-image-template-into-vsphere-4
Import the latest available version and the one before it (N-1) to test a version upgrade

In my case, for TKGm 1.5.1, I imported Kubernetes v1.21.8 and v1.22.5
I found that the Ubuntu template is not recognized when deploying a workload cluster from TMC, so I also imported the Photon image.

[lunar.lab] Install Harbor Container Registry as Docker Containers

This is an installation note for the Harbor container registry in lunar.lab. To minimize footprint under my resource constraints, I decided to install Harbor on the bootstrap machine VM that I had already deployed (see this article: https://dy.si/TAg1M72).

The official documentation I followed can be found here: https://goharbor.io/docs/2.5.0/install-config/
Software versions:

Harbor 2.5.0
Docker Engine 20.10.16
Docker Compose 2.5.0

For a more proper way to deploy Harbor in TKG, you may want to check this official documentation:

[lunar.lab] Prepare Bootstrap Machine for TKGm Deployment

Having a bootstrap machine is one of the steps required for deploying TKGm to vSphere, as stated here:

https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.5/vmware-tanzu-kubernetes-grid-15/GUID-mgmt-clusters-vsphere.html

This article runs through how I configured that bootstrap machine, which involves the following 4 steps.

Step 1 - Starting Point: Ubuntu VM

I created an Ubuntu VM from scratch with the following configuration:

Virtual Hardware specification

8 vCPU
8GB RAM
40GB disk

Ubuntu 20.04
Minimum install + OpenSSH
Configure static IP
Configure Internet access (using proxy)

Once the VM is created on vSphere, boot it from the Ubuntu 20.04 installation image; all the above settings can then be configured easily through the installation wizard. Pretty straightforward.

Help! Where is my tanzu cluster Plugin?

So you have lost your tanzu cluster Plugin after upgrading Tanzu CLI?

I tried to upgrade my TKG from version 1.4.2 to 1.5.1. One of the first steps is to upgrade the Tanzu CLI. Once it was upgraded, I realized that the tanzu cluster plugin was missing! 😱 How can I manage my TKG cluster then? This did not happen when I upgraded from 1.3.1 to 1.4.2!

Help, I Cannot Pass IaaS Provider Step when Deploying TKG 1.4.2 Management Cluster to vSphere!

In TKG 1.4, if you deploy management clusters to vSphere with the installer interface, the first step of configuring vCenter Server as the IaaS Provider is to fill in your vCenter Server IP address or FQDN and username/password, then hit Connect. If your vCenter uses the default certificate, you’ll get this error:

Failed to connect to the specified vCenter Server. Post "https://IP_or_FQDN/sdk": x509: cannot validate certificate for IP_or_FQDN because it doesn't contain any IP SANs

[lunar.lab] Configuring NSX Advanced Load Balancer for Tanzu Kubernetes Grid (TKG) on VMware vSphere

Disclaimer:

This is for lunar.lab only and not intended as a guide for production environment.
All deployed on vSphere environment.
I use nested environment.
I use vSphere 7, NSX ALB 20.1.6, TKGm 1.5.1
The official documentation is here: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.5/vmware-tanzu-kubernetes-grid-15/GUID-mgmt-clusters-install-nsx-adv-lb.html

To support my learning as part of the VMware SEAK (South East Asia & Korea) Tanzu Take-12 Program, I am building my own lab. I don't have the luxury of a physical lab of my own, so I build on a nested environment that my company provides. Here is the first of my documentation.

I started with a base pod that provides me with a working vSphere cluster. The networking setup is one distributed switch with the following port groups:

ESXi: management network vmkernel - 192.168.110.0/24
vMotion: vmotion vmkernel - 10.10.30.0/24
storage: storage vmkernel - 10.10.20.0/24
VM: VM management network - 192.168.110.0/24
tkg-vip-network: load balancer/ingress virtual IP - 192.168.120.0/24
tkg-network: management/workload cluster nodes - 192.168.100.0/24
avi-internal: placeholder network for ALB Service Engine

All subnets are routable.
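When planning a layout like this, it helps to confirm which port groups share a subnet and where a given address lands. The following is a minimal Python sketch using the standard ipaddress module; the subnets are copied from the list above, avi-internal is left out because no subnet is given for it, and the function names are hypothetical.

```python
import ipaddress
from itertools import combinations

# Subnets as listed above; avi-internal is omitted because it has no subnet listed.
PORTGROUPS = {
    "ESXi": "192.168.110.0/24",
    "vMotion": "10.10.30.0/24",
    "storage": "10.10.20.0/24",
    "VM": "192.168.110.0/24",
    "tkg-vip-network": "192.168.120.0/24",
    "tkg-network": "192.168.100.0/24",
}


def shared_subnets(portgroups):
    """Return pairs of port group names whose subnets overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in portgroups.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


def portgroups_containing(address, portgroups):
    """Return the port groups whose subnet contains the given IP address."""
    ip = ipaddress.ip_address(address)
    return [
        name
        for name, cidr in portgroups.items()
        if ip in ipaddress.ip_network(cidr)
    ]


if __name__ == "__main__":
    print(shared_subnets(PORTGROUPS))
    print(portgroups_containing("192.168.120.10", PORTGROUPS))
```

With the values above, the only overlap reported is between the ESXi and VM port groups, which both sit on 192.168.110.0/24.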

[lunar.lab] NSX-T Deploy & Initial Configuration

Disclaimer:

This is for lunar.lab only and not intended as a guide for production environments. I’m aware that some of the configurations I use here are not supported.
All deployed on vSphere environment.

I use nested environment.

The official documentation is here: https://docs.vmware.com/en/VMware-NSX-T/2.1/com.vmware.nsxt.install.doc/GUID-414C33B3-674F-44E0-94B8-BFC0B9056B33.html

The documentation is quite clear; I just add some notes specific to my deployment.

Step-by-step:

Deploy

NSX-T Manager: https://docs.vmware.com/en/VMware-NSX-T/2.1/com.vmware.nsxt.install.doc/GUID-FA0ABBBD-34D8-4DA9-882D-085E7E0D269E.html
NSX-T Controller (for lab purpose I only deploy 1 NSX-T Controller): https://docs.vmware.com/en/VMware-NSX-T/2.1/com.vmware.nsxt.install.doc/GUID-24428FD4-EC8F-4063-9CF9-D8136740963A.html
NSX-T Edge (for lab purpose I only deploy 1 NSX-T Edge): Please note that PKS requires Large (8 vCPU 16GB RAM) size of NSX-T Edge Node. https://docs.vmware.com/en/VMware-NSX-T/2.1/com.vmware.nsxt.install.doc/GUID-AECC66D0-C968-4EF2-9CAD-7772B0245BF6.html

vSAN Effective Capacity - Quick and Dirty Sizer Tool

I found this online tool to quickly approximate how much datastore capacity you can get from a certain number of vSAN hosts with a certain configuration. I think it is very useful when you already have a number of physical hosts with vSAN-compatible disks and want to determine how much capacity you can get from them, and how much cache disk capacity you need to provide to fulfil the recommendation that cache capacity be 10% of consumed storage (reference). In my experience, I have seen a couple of customers who already had fairly new server hardware with vSAN-compatible components and wanted to see what they would get if they used vSAN. Please note that VMware has an official vSAN Sizing Calculator, where you can get a recommendation of the hardware specification you need to accommodate a certain number of workloads (VMs).
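The 10% cache guideline mentioned above is simple arithmetic, sketched below in Python. This is not how the online tool or the official calculator works internally; it only applies the percentage to an amount of consumed storage and spreads the result evenly across hosts. The function names and the sample figures (20,000 GB consumed, 4 hosts) are made up for illustration.

```python
def recommended_cache_gb(consumed_gb: float, cache_percent: float = 10) -> float:
    """Cache capacity implied by the rule of thumb: cache_percent of consumed storage."""
    if consumed_gb < 0:
        raise ValueError("consumed_gb must not be negative")
    return consumed_gb * cache_percent / 100


def cache_per_host_gb(consumed_gb: float, hosts: int, cache_percent: float = 10) -> float:
    """Spread the recommended cache capacity evenly across the vSAN hosts."""
    if hosts <= 0:
        raise ValueError("hosts must be a positive number")
    return recommended_cache_gb(consumed_gb, cache_percent) / hosts


if __name__ == "__main__":
    # Hypothetical example: 20,000 GB of consumed storage on a 4-host cluster.
    print(recommended_cache_gb(20000))
    print(cache_per_host_gb(20000, 4))
```

The sketch deliberately ignores storage policy overhead, slack space, and disk group layout, which a real sizing exercise has to account for.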

vSphere 5.5 End of General Support Reminder

Maintain Full Level of Support by Upgrading Your vSphere Environment

In 2016 I wrote a note on vSphere upgrades because many of my customers at that time were upgrading as vSphere 5.1 reached its EoGS (End of General Support) phase. Well, it’s 2018, and you may already be aware that vSphere 5.5 will enter its EoGS phase on 19 September 2018. This means that if you still have ESXi 5.5 hosts and/or vCenter Server 5.5, you’ll need to upgrade them in order to maintain your full level of support and subscription services, as referenced here.

Source: https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/support/product-lifecycle-matrix.pdf

Maximum Supported CPU on Windows Server 2003

Even though Windows Server 2003 has been out of support since July 2015, I still find it in my customers' environments. Last week, one of my customers raised a performance issue related to an application running on Windows Server 2003. The application had been migrated from physical to virtual about 2 weeks before the application team observed slower performance during the end-of-month process. During the migration, they changed the configuration from 2 CPU sockets x 4 cores/socket to 4 vSockets x 4 vCores/vSocket. Given that change, my first thought was: did we hit a maximum CPU limit? What is the maximum CPU count on Windows Server 2003? This article is a note to self about the maximum CPU supported on Windows Server 2003.

Remember that Windows Server 2003 was released in an era when single-core CPUs were the standard. Official documents from Microsoft, such as this document, show the Symmetric Multiprocessing (SMP) level supported by each edition of Windows Server 2003. For instance, Windows Server 2003 Standard Edition supports 4-way SMP. Now the question is, what is defined as 4-way SMP? I found this VMware KB article, which says that 4-way SMP means 4 CPU sockets (or 4 vSockets in a vSphere environment).
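To see what the socket interpretation means for the two layouts in this case, here is a minimal Python sketch. It models only a per-socket limit, taking 4-way SMP as 4 sockets as the KB article says, and ignores any other limit the OS may have. The class, the function, and the 8 x 2 comparison layout are hypothetical, added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuTopology:
    sockets: int
    cores_per_socket: int

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket


def usable_cores(topology: CpuTopology, max_sockets: int) -> int:
    """Cores the OS can use when it is limited to max_sockets sockets.

    Only the socket limit is modelled; no other OS limit is considered.
    """
    return min(topology.sockets, max_sockets) * topology.cores_per_socket


# 4-way SMP read as 4 sockets, per the KB article cited above.
STANDARD_EDITION_MAX_SOCKETS = 4

physical = CpuTopology(sockets=2, cores_per_socket=4)    # before the migration
virtual = CpuTopology(sockets=4, cores_per_socket=4)     # after the migration
wide_layout = CpuTopology(sockets=8, cores_per_socket=2)  # made-up layout for comparison

if __name__ == "__main__":
    for name, topo in (("physical", physical), ("virtual", virtual), ("wide", wide_layout)):
        usable = usable_cores(topo, STANDARD_EDITION_MAX_SOCKETS)
        print(f"{name}: {usable} of {topo.total_cores} cores usable")
```

Under this reading, both the physical and the virtual layout stay within the socket limit, while a layout with more than 4 sockets would leave cores unused.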

Build a Case for Infrastructure & Operation Automation Initiative

This post is part of blog post series focusing on how a company might kickstart their automation or cloud journey.
Part 1: Improve IT Service Delivery Quality with Automation
Part 2: Build a Case for Infrastructure & Operation Automation Initiative (this post)

A couple of days ago my newsfeed brought me to this paper from Gartner titled “How to Measure the Potential Value of Your I&O Automation Initiatives”. You may get free access to this publication through this link. I found the paper really interesting, as it neatly summarizes what I always try to tell my customers when they want to start their automation or cloud journey. In my previous post, “Improve IT Service Delivery Quality with Automation”, I highlighted some use cases that companies have used to kick-start their automation journey, which happen to align with what is explained in this Gartner publication.

What initially caught my attention and made me read the whole paper were the recommendations quoted below:

I&O leaders who are optimizing operations and need to show the value of their automation spend should:

Justify I&O automation initiatives by focusing first on efficiency improvements in labor usage and on effectiveness gains in quality, consistency, agility and risk reduction.

Improve IT Service Delivery Quality with Automation

This post is part of blog post series focusing on how a company might kickstart their automation or cloud journey.
Part 1: Improve IT Service Delivery Quality with Automation (this post)
Part 2: Build a Case for Infrastructure & Operation Automation Initiative

I delivered a session at VMware vForum 2017 Indonesia back in November 2017 about how a company can kick-start its automation journey and get value from it. Automation itself is a very extensive topic, and one should choose carefully what to automate. IMO, it should start with repetitive tasks that bring value to the business once automated. Begin with something that is easy to automate and, along the journey, add more tasks to complete what is required to deliver a service. Quick wins build confidence. Do not try to automate everything in one go, as it will be complex, costly, and a headache to support. This post is a rewrite of what I delivered during my vForum 2017 Indonesia session, and aims to give ideas on where to start and which challenges doing so will solve.

There are 4 use cases that I presented:

Accelerate delivery and improve consistency of application environments
Manage VM sprawl by automated lifecycle enforcement
Providing secure access to 3rd party vendor
Continuous Delivery for apps and SDDC

Below I explain the challenges that each use case tries to solve and how VMware can solve them. The solution is mainly powered by VMware vRealize Automation.

#NSXUenak: A Testimony from An NSX User

One of my customers spoke in a customer testimony session during VMware vForum 2017, held in Jakarta, Indonesia back in November 2017. One word he kept using to describe his satisfaction with NSX was UENAK. Uenak is an emphatic form of the word enak, which is Bahasa Indonesia for good, comfortable, or pleasant. During his roughly 10-minute session, he said uenak at least 5 times to express how he felt after using NSX. This post is a rewrite of what he said during the session.

NSX is uenak because it helps with scalability. The distributed router and distributed firewall move east-west traffic processing from where it used to be centralized, in a physical device (core switch or firewall), to the transport layer (e.g. the ESXi hosts). This is important to them, as the nature of their business requires flexible scalability. With NSX (and vSphere), every time they add new hosts, they add not only compute (CPU and RAM) capacity but also network capacity (in terms of bandwidth and additional power to process traffic routing and segmentation/isolation).

Infrastructure as Code

Consuming vRA Catalog Item using REST API

I'm preparing a demo to show how developers can consume the infrastructure layer through an API. I have the cloud management platform, VMware vRealize Automation, up and working, with blueprints already created and the catalog set up for users and ready to consume. Now the question is how to consume a Catalog Item using the API. Thankfully, a colleague pointed me to a blog post from Ryan Kelly here:
http://www.vmtocloud.com/how-to-script-a-vrealize-automation-7-rest-api-request/.
The article is awesome: it shows how to do it step by step, so you understand the flow, which is useful for further exploration. At the end, it wraps all the steps into one script that a user can call to request the catalog item. After that, you only need to run the script to make the request.
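As a rough idea of the first step in such a flow, here is a minimal Python sketch that prepares a token request without sending anything. It is not the script from the referenced article. The endpoint path (/identity/api/tokens), the JSON field names, and the Bearer header reflect my understanding of the vRA 7 REST API and should be treated as assumptions to verify against the official API documentation; the host name, user, and tenant are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request


def build_token_request(host: str, username: str, password: str,
                        tenant: str) -> urllib.request.Request:
    """Prepare (but do not send) a request for a vRA bearer token.

    Assumption: vRA 7 issues tokens at /identity/api/tokens and expects
    username, password and tenant in a JSON body.
    """
    body = json.dumps(
        {"username": username, "password": password, "tenant": tenant}
    ).encode()
    return urllib.request.Request(
        url=f"https://{host}/identity/api/tokens",
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(token: str) -> dict:
    """Headers for follow-up calls once a token has been obtained."""
    return {"Accept": "application/json", "Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    request = build_token_request("vra.example.com", "dev@example.com", "secret", "vsphere.local")
    print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) and then requesting a catalog item are left out on purpose, since those steps depend on the environment and on details covered in the referenced article.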

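Sunday, June 26, 2022: [How To] Prevent Container with Privileged Mode to Run on Kubernetes Cluster
Sunday, June 5, 2022: [lunar.lab] Cannot Resolve ".local" Domain from TKGm Workload Cluster
Thursday, June 2, 2022: [lunar.lab] Allow TKGm Workload Cluster to Pull Image from Harbor Configured with Self-signed Certificate
Monday, May 30, 2022: [How To] Enhance Online Boutique App to Use Persistent Volume
Saturday, May 28, 2022: [How To] Avoid Hitting Docker Pull Rate Limit by Authenticate Pull Request
Tuesday, May 24, 2022: [lunar.lab] Deploy TKG Management Cluster on vSphere
Sunday, May 22, 2022: [lunar.lab] Install Harbor Container Registry as Docker Containers
Sunday, March 27, 2022: [lunar.lab] Prepare Bootstrap Machine for TKGm Deployment
Thursday, March 17, 2022: Help! Where is my tanzu cluster Plugin?
Tuesday, March 15, 2022: Help, I Cannot Pass IaaS Provider Step when Deploying TKG 1.4.2 Management Cluster to vSphere!
Saturday, March 12, 2022: [lunar.lab] Configuring NSX Advanced Load Balancer for Tanzu Kubernetes Grid (TKG) on VMware vSphere
Saturday, June 9, 2018: [lunar.lab] NSX-T Deploy & Initial Configuration
Sunday, May 27, 2018: vSAN Effective Capacity - Quick and Dirty Sizer Tool
Saturday, May 12, 2018: vSphere 5.5 End of General Support Reminder
Sunday, May 6, 2018: Maximum Supported CPU on Windows Server 2003
Thursday, January 25, 2018: Build a Case for Infrastructure & Operation Automation Initiative
Friday, December 29, 2017: Improve IT Service Delivery Quality with Automation
Tuesday, December 26, 2017: #NSXUenak: A Testimony from An NSX User
Thursday, July 27, 2017: Infrastructure as Code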