Kubernetes – Production-Grade Container Orchestration

Editor's note: Today’s post is by the Infrastructure Platform Department team at JD.com about their transition from OpenStack to Kubernetes. JD.com is one of China’s largest companies and the first Chinese Internet company to make the Global Fortune 500 list.

History of cluster building

The era of physical machines (2004-2014)

Before 2014, our company's applications were all deployed on the physical machine. In the age of physical machines, we needed to wait an average of one week for the allocation to application coming on-line. Due to the lack of isolation, applications would affected each other, resulting in a lot of potential risks. At that time, the average number of tomcat instances on each physical machine was no more than nine. The resource of physical machine was seriously wasted and the scheduling was inflexible. The time of application migration took hours due to the breakdown of physical machines. And the auto-scaling cannot be achieved. To enhance the efficiency of application deployment, we developed compilation-packaging, automatic deployment, log collection, resource monitoring and some other systems.

Containerized era (2014-2016)

The Infrastructure Platform Department (IPD) led by Liu Haifeng--Chief Architect of JD.COM, sought a new resolution in the fall of 2014. Docker ran into our horizon. At that time, docker had been rising, but was slightly weak and lacked of experience in production environment. We had repeatedly tested docker. In addition, docker was customized to fix a couple of issues, such as system crash caused by device mapper and some Linux kernel bugs. We also added plenty of new features into docker, including disk speed limit, capacity management, and layer merging in image building and so on.

To manage the container cluster properly, we chose the architecture of OpenStack + Novadocker driver. Containers are managed as virtual machines. It is known as the first generation of JD container engine platform--JDOS1.0 (JD Datacenter Operating System). The main purpose of JDOS 1.0 is to containerize the infrastructure. All applications run in containers rather than physical machines since then. As for the operation and maintenance of applications, we took full advantage of existing tools. The time for developers to request computing resources in production environment reduced to several minutes rather than a week. After the pooling of computing resources, even the scaling of 1,000 containers would be finished in seconds. Application instances had been isolated from each other. Both the average deployment density of applications and the physical machine utilization had increased by three times, which brought great economic benefits.

We deployed clusters in each IDC and provided unified global APIs to support deployment across the IDC. There are 10,000 compute nodes at most and 4,000 at least in a single OpenStack distributed container cluster in our production environment. The first generation of container engine platform (JDOS 1.0) successfully supported the “6.18” and “11.11” promotional activities in both 2015 and 2016. There are already 150,000 running containers online by November 2016.

“6.18” and “11.11” are known as the two most popular online promotion of JD.COM, similar to the black Friday promotions. Fulfilled orders in November 11, 2016 reached 30 million.

In the practice of developing and promoting JDOS 1.0, applications were migrated directly from physical machines to containers. Essentially, JDOS 1.0 was an implementation of IaaS. Therefore, deployment of applications was still heavily dependent on compilation-packaging and automatic deployment tools. However, the practice of JDOS1.0 is very meaningful. Firstly, we successfully moved business into containers. Secondly, we have a deep understanding of container network and storage, and know how to polish them to the best. Finally, all the experiences lay a solid foundation for us to develop a brand new application container platform.

New container engine platform (JDOS 2.0)

Platform architecture

When JDOS 1.0 grew from 2,000 containers to 100,000, we launched a new container engine platform (JDOS 2.0). The goal of JDOS 2.0 is not just an infrastructure management platform, but also a container engine platform faced to applications. On the basic of JDOS 1.0 and Kubernetes, JDOS 2.0 integrates the storage and network of JDOS 1.0, gets through the process of CI/CD from the source to the image, and finally to the deployment. Also, JDOS 2.0 provides one-stop service such as log, monitor, troubleshooting, terminal and orchestration. The platform architecture of JDOS 2.0 is shown below.

$D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\arc.png$

Function	Product
Source Code Management	Gitlab
Container Tool	Docker
Container Networking	Cane
Container Engine	Kubernetes
Image Registry	Harbor
CI Tool	Jenkins
Log Management	Logstash + Elastic Search
Monitor	Prometheus

In JDOS 2.0, we define two levels, system and application. A system consists of several applications and an application consists of several Pods which provide the same service. In general, a department can apply for one or more systems which directly corresponds to the namespace of Kubernetes. This means that the Pods of the same system will be in the same namespace.

Most of the JDOS 2.0 components (GitLab / Jenkins / Harbor / Logstash / Elastic Search / Prometheus) are also containerized and deployed on the Kubernetes platform.

One Stop Solution

$D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\cicd.png$

JDOS 2.0 takes docker image as the core to implement continuous integration and continuous deployment.
Developer pushes code to git.
Git triggers the jenkins master to generate build job.
Jenkins master invokes Kubernetes to create jenkins slave Pod.
Jenkins slave pulls the source code, compiles and packs.
Jenkins slave sends the package and the Dockerfile to the image build node with docker.
The image build node builds the image.
The image build node pushes the image to the image registry Harbor.
User creates or updates app Pods in different zone.

The docker image in JDOS 1.0 consisted primarily of the operating system and the runtime software stack of the application. So, the deployment of applications was still dependent on the auto-deployment and some other tools. While in JDOS 2.0, the deployment of the application is done during the image building. And the image contains the complete software stack, including App. With the image, we can achieve the goal of running applications as designed in any environment.

$D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\image.png$

Networking and External Service Load Balancing

JDOS 2.0 takes the network solution of JDOS 1.0, which is implemented with the VLAN model of OpenStack Neutron. This solution enables highly efficient communication between containers, making it ideal for a cluster environment within a company. Each Pod occupies a port in Neutron, with a separate IP. Based on the Container Network Interface standard (CNI) standard, we have developed a new project Cane for integrating kubelet and Neutron.

$D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\network.png$

At the same time, Cane is also responsible for the management of LoadBalancer in Kubernetes service. When a LoadBalancer is created / deleted / modified, Cane will call the creating / removing / modifying interface of the lbaas service in Neutron. In addition, the Hades component in the Cane project provides an internal DNS resolution service for the Pods.

The source code of the Cane project is currently being finished and will be released on GitHub soon.

Flexible Scheduling

$D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\schedule.png$ JDOS 2.0 accesses applications, including big data, web applications, deep learning and some other types, and takes more diverse and flexible scheduling approaches. In some IDCs, we experimentally mixed deployment of online tasks and offline tasks. Compared to JDOS 1.0, overall resource utilization increased by about 30%.

Summary

The rich functionality of Kubernetes allows us to pay more attention to the entire ecosystem of the platform, such as network performance, rather than the platform itself. In particular, the SREs highly appreciated the functionality of replication controller. With it, the scaling of the applications is achieved in several seconds. JDOS 2.0 now has accessed about 20% of the applications, and deployed 2 clusters with about 20,000 Pods running daily. We plan to access more applications of our company, to replace the current JDOS 1.0. And we are also glad to share our experience in this process with the community.

Thank you to all the contributors of Kubernetes and the other open source projects.

--Infrastructure Platform Department team at JD.com

Get involved with the Kubernetes project on GitHub
Post questions (or answer questions) on Stack Overflow
DownloadKubernetes
Connect with the community on Slack
Follow us on Twitter @Kubernetesiofor latest updates

Today’s post is by Brendan Burns, Partner Architect, at Microsoft & Kubernetes co-founder.

Containers are revolutionizing the way that people build, package and deploy software. But what is often overlooked is how they are revolutionizing the way that people build the software that builds, packages and deploys software. (it’s ok if you have to read that sentence twice…) Today, and in a talk at Container World tomorrow, I’m taking a look at how container orchestrators like Kubernetes form the foundation for next generation platform as a service (PaaS). In particular, I’m interested in how cloud container as a service (CaaS) platforms like Azure Container Service, Google Container Engine and others are becoming the new infrastructure layer that PaaS is built upon.

To see this, it’s important to consider the set of services that have traditionally been provided by PaaS platforms:

Source code and executable packaging and distribution
Reliable, zero-downtime rollout of software versions
Healing, auto-scaling, load balancing

When you look at this list, it’s clear that most of these traditional “PaaS” roles have now been taken over by containers. The container image and container image build tooling has become the way to package up your application. Container registries have become the way to distribute your application across the world. Reliable software rollout is achieved using orchestrator concepts like Deployment in Kubernetes, and service healing, auto-scaling and load-balancing are all properties of an application deployed in Kubernetes using ReplicaSets and Services.

What then is left for PaaS? Is PaaS going to be replaced by container as a service? I think the answer is “no.” The piece that is left for PaaS is the part that was always the most important part of PaaS in the first place, and that’s the opinionated developer experience. In addition to all of the generic parts of PaaS that I listed above, the most important part of a PaaS has always been the way in which the developer experience and application framework made developers more productive within the boundaries of the platform. PaaS enables developers to go from source code on their laptop to a world-wide scalable service in less than an hour. That’s hugely powerful.

However, in the world of traditional PaaS, the skills needed to build PaaS infrastructure itself, the software on which the user’s software ran, required very strong skills and experience with distributed systems. Consequently, PaaS tended to be built by distributed system engineers rather than experts in a particular vertical developer experience. This means that PaaS platforms tended towards general purpose infrastructure rather than targeting specific verticals. Recently, we have seen this start to change, first with PaaS targeted at mobile API backends, and later with PaaS targeting “function as a service”. However, these products were still built from the ground up on top of raw infrastructure.

More recently, we are starting to see these platforms build on top of container infrastructure. Taking for example “function as a service” there are at least two (and likely more) open source implementations of functions as a service that run on top of Kubernetes (fission and funktion). This trend will only continue. Building a platform as a service, on top of container as a service is easy enough that you could imagine giving it out as an undergraduate computer science assignment. This ease of development means that individual developers with specific expertise in a vertical (say software for running three-dimensional simulations) can and will build PaaS platforms targeted at that specific vertical experience. In turn, by targeting such a narrow experience, they will build an experience that fits that narrow vertical perfectly, making their solution a compelling one in that target market.

This then points to the other benefit of next generation PaaS being built on top of container as a service. It frees the developer from having to make an “all-in” choice on a particular PaaS platform. When layered on top of container as a service, the basic functionality (naming, discovery, packaging, etc) are all provided by the CaaS and thus common across multiple PaaS that happened to be deployed on top of that CaaS. This means that developers can mix and match, deploying multiple PaaS to the same container infrastructure, and choosing for each application the PaaS platform that best suits that particular platform. Also, importantly, they can choose to “drop down” to raw CaaS infrastructure if that is a better fit for their application. Freeing PaaS from providing the infrastructure layer, enables PaaS to diversify and target specific experiences without fear of being too narrow. The experiences become more targeted, more powerful, and yet by building on top of container as a service, more flexible as well.

Kubernetes is infrastructure for next generation applications, PaaS and more. Given this, I’m really excited by our announcement today that Kubernetes on Azure Container Service has reached general availability. When you deploy your next generation application to Azure, whether on a PaaS or deployed directly onto Kubernetes itself (or both) you can deploy it onto a managed, supported Kubernetes cluster.

Furthermore, because we know that the world of PaaS and software development in general is a hybrid one, we’re excited to announce the preview availability of Windows clusters in Azure Container Service. We’re also working on hybrid clusters in ACS-Engine and expect to roll those out to general availability in the coming months.

I’m thrilled to see how containers and container as a service is changing the world of compute, I’m confident that we’re only scratching the surface of the transformation we’ll see in the coming months and years.

--Brendan Burns, Partner Architect, at Microsoft and co-founder of Kubernetes

Get involved with the Kubernetes project on GitHub
Post questions (or answer questions) on Stack Overflow
DownloadKubernetes
Connect with the community on Slack
Follow us on Twitter @Kubernetesiofor latest updates

Editor’s note: Today’s guest post is by Jeff McCormick, a developer at Crunchy Data, showing how to build a PostgreSQL cluster using the new Kubernetes StatefulSet feature.

In an earlier post, I described how to deploy a PostgreSQL cluster using Helm, a Kubernetes package manager. The following example provides the steps for building a PostgreSQL cluster using the new Kubernetes StatefulSets feature.

StatefulSets Example

Step 1 - Create Kubernetes Environment

StatefulSets is a new feature implemented in Kubernetes 1.5 (prior versions it was known as PetSets). As a result, running this example will require an environment based on Kubernetes 1.5.0 or above.

The example in this blog deploys on Centos7 using kubeadm. Some instructions on what kubeadm provides and how to deploy a Kubernetes cluster is located here.

Step 2 - Install NFS

The example in this blog uses NFS for the Persistent Volumes, but any shared file system would also work (ex: ceph, gluster).

The example script assumes your NFS server is running locally and your hostname resolves to a known IP address.

In summary, the steps used to get NFS working on a Centos 7 host are as follows:

sudo setsebool -P virt_use_nfs 1

sudo yum -y install nfs-utils libnfsidmap

sudo systemctl enable rpcbind nfs-server

sudo systemctl start rpcbind nfs-server rpc-statd nfs-idmapd

sudo mkdir /nfsfileshare

sudo chmod 777 /nfsfileshare/

sudo vi /etc/exports

sudo exportfs -r

The /etc/exports file should contain a line similar to this one except with the applicable IP address specified:

/nfsfileshare 192.168.122.9(rw,sync)

After these steps NFS should be running in the test environment.

Step 3 - Clone the Crunchy PostgreSQL Container Suite

The example used in this blog is found at in the Crunchy Containers GitHub repo here. Clone the Crunchy Containers repository to your test Kubernertes host and go to the example:

cd $HOME

git clone https://github.com/CrunchyData/crunchy-containers.git

cd crunchy-containers/examples/kube/statefulset

Next, pull down the Crunchy PostgreSQL container image:

docker pull crunchydata/crunchy-postgres:centos7-9.5-1.2.6

Step 4 - Run the Example

To begin, it is necessary to set a few of the environment variables used in the example:

export BUILDBASE=$HOME/crunchy-containers

export CCP_IMAGE_TAG=centos7-9.5-1.2.6

BUILDBASE is where you cloned the repository and CCP_IMAGE_TAG is the container image version we want to use.

Next, run the example:

./run.sh

That script will create several Kubernetes objects including:

Persistent Volumes (pv1, pv2, pv3)
Persistent Volume Claim (pgset-pvc)
Service Account (pgset-sa)
Services (pgset, pgset-master, pgset-replica)
StatefulSet (pgset)
Pods (pgset-0, pgset-1)

At this point, two pods will be running in the Kubernetes environment:

$ kubectl get pod

NAME READY STATUS RESTARTS AGE

pgset-0 1/1 Running 0 2m

pgset-1 1/1 Running 1 2m

Immediately after the pods are created, the deployment will be as depicted below:

Step 5 - What Just Happened?

This example will deploy a StatefulSet, which in turn creates two pods.

The containers in those two pods run the PostgreSQL database. For a PostgreSQL cluster, we need one of the containers to assume the master role and the other containers to assume the replica role.

So, how do the containers determine who will be the master, and who will be the replica?

This is where the new StateSet mechanics come into play. The StateSet mechanics assign a unique ordinal value to each pod in the set.

The StatefulSets provided unique ordinal value always start with 0. During the initialization of the container, each container examines its assigned ordinal value. An ordinal value of 0 causes the container to assume the master role within the PostgreSQL cluster. For all other ordinal values, the container assumes a replica role. This is a very simple form of discovery made possible by the StatefulSet mechanics.

PostgreSQL replicas are configured to connect to the master database via a Service dedicated to the master database. In order to support this replication, the example creates a separate Service for each of the master role and the replica role. Once the replica has connected, the replica will begin replicating state from the master.

During the container initialization, a master container will use a Service Account (pgset-sa) to change it’s container label value to match the master Service selector. Changing the label is important to enable traffic destined to the master database to reach the correct container within the Stateful Set. All other pods in the set assume the replica Service label by default.

Step 6 - Deployment Diagram

The example results in a deployment depicted below:

In this deployment, there is a Service for the master and a separate Service for the replica. The replica is connected to the master and replication of state has started.

The Crunchy PostgreSQL container supports other forms of cluster deployment, the style of deployment is dictated by setting the PG_MODE environment variable for the container. In the case of a StatefulSet deployment, that value is set to: PG_MODE=set

This environment variable is a hint to the container initialization logic as to the style of deployment we intend.

Step 7 - Testing the Example

The tests below assume that the psql client has been installed on the test system. If if not, the psql client has been previously installed, it can be installed as follows:

sudo yum -y install postgresql

In addition, the tests below assume that the tested environment DNS resolves to the Kube DNS and that the tested environment DNS search path is specified to match the applicable Kube namespace and domain. The master service is named pgset-master and the replica service is named pgset-replica.

Test the master as follows (the password is password):

psql -h pgset-master -U postgres postgres -c 'table pg_stat_replication'

If things are working, the command above will return output indicating that a single replica is connecting to the master.

Next, test the replica as follows:

psql -h pgset-replica -U postgres postgres -c 'create table foo (id int)'

The command above should fail as the replica is read-only within a PostgreSQL cluster.

Next, scale up the set as follows:

kubectl scale statefulset pgset --replicas=3

The command above should successfully create a new replica pod called pgset-2 as depicted below:

Step 8 - Persistence Explained

Take a look at the persisted PostgreSQL data files on the resulting NFS mount path:

$ ls -l /nfsfileshare/

total 12

drwx------ 20 26 26 4096 Jan 17 16:35 pgset-0

drwx------ 20 26 26 4096 Jan 17 16:35 pgset-1

drwx------ 20 26 26 4096 Jan 17 16:48 pgset-2

Each container in the stateful set binds to the single NFS Persistent Volume Claim (pgset-pvc) created in the example script.

Since NFS and the PVC can be shared, each pod can write to this NFS path.

The container is designed to create a subdirectory on that path using the pod host name for uniqueness.

Conclusion

StatefulSets is an exciting feature added to Kubernetes for container builders that are implementing clustering. The ordinal values assigned to the set provide a very simple mechanism to make clustering decisions when deploying a PostgreSQL cluster.

--Jeff McCormick, Developer, Crunchy Data

Editor's note: Today’s post is by Ryan Quackenbush, Advocacy Programs Manager at Apprenda, showing a new community portal for Kubernetes advocates: the K8sPort.

The K8sPort is a hub designed to help you, the Kubernetes community, earn credit for the hard work you’re putting forth in making this one of the most successful open source projects ever. Back at KubeCon Seattle in November, I presented a lightning talk of a preview of K8sPort.

This hub, and our intentions in helping to drive this initiative in the community, grew out of a desire to help cultivate an engaged community of Kubernetes advocates. This is done through gamification in a community hub full of different activities called “challenges,” which are activities meant to help direct members of the community to attend various events and meetings, share and provide feedback on important content, answer questions posed on sites like Stack Overflow, and more.

By completing these challenges, you collect points and can redeem them for different types of rewards and experiences, examples of which include charitable donations, gift certificates, conference tickets and more. As advocates complete challenges and gain points, they’ll earn performance-related badges, move up in community tiers and participate in a fun community leaderboard.

My presentation at KubeCon, simply put, was a call for early signups. Those who’ve been piloting the program have, for the most part, had positive things to say about their experiences.

I know I'm the only one playing with @K8sPort but it may be the most important thing the #Kubernetes community has.
— Justin Garrison (@rothgar) November 22, 2016

“Great way of improving the community and documentation. The gamification of Kubernetes gave me more insight into the stack as well.”
- Jonas Kint, Devops Engineer at Showpad
“A great way to engage with the kubernetes project and also help the community. Fun stuff.”
- Kevin Duane, Systems Engineer at The Walt Disney Company
“K8sPort seems like an awesome idea for incentivising giving back to the community in a way that will hopefully cause more valuable help from more people than might usually be helping.”
- William Stewart, Site Reliability Engineer at Superbalist

Today I am pleased to announce that the Cloud Native Computing Foundation (CNCF) is making the K8sPort generally available to the entire contributing community! We’ve simplified the signup process by allowing would-be advocates to authenticate and register through the use of their existing GitHub accounts.

Screen Shot 2017-03-22 at 10.59.04 AM.png

If you’re a contributing member of the Kubernetes community and you have an active GitHub account tied to the Kubernetes repository at GitHub, you can authenticate using your GitHub credentials and gain access to the K8sPort.

Screen Shot 2017-03-22 at 12.48.08 PM.png

Screen Shot 2017-03-22 at 12.48.08 PM.png

Beyond the challenges that get posted regularly, community members will be recognized and compile points for things they’re already doing today. This will be accomplished through the K8sPort’s full integration with GitHub and the core Kubernetes repository. Once you authenticate, you’ll automatically begin earning points and recognition for various contributions -- including logging issues, making pull requests, code commits & more.

If you’re interested in joining the advocacy hub, please join us at k8sport.org! We hope you’re as excited about what you see as we are to continue to build it and present it to you.

For a quick walkthrough on K8sPort authentication and the hub itself, see this quick demo, below.

--Ryan Quackenbush, Advocacy Programs Manager, Apprenda

Today we’re announcing the release of Kubernetes 1.6.

In this release the community’s focus is on scale and automation, to help you deploy multiple workloads to multiple users on a cluster. We are announcing that 5,000 node clusters are supported. We moved dynamic storage provisioning to stable. Role-based access control (RBAC), kubefed, kubeadm, and several scheduling features are moving to beta. We have also added intelligent defaults throughout to enable greater automation out of the box.

What’s New

Scale and Federation: Large enterprise users looking for proof of at-scale performance will be pleased to know that Kubernetes’ stringent scalability SLO now supports 5,000 node (150,000 pod) clusters. This 150% increase in total cluster size, powered by a new version of etcd v3 by CoreOS, is great news if you are deploying applications such as search or games which can grow to consume larger clusters.

For users who want to scale beyond 5,000 nodes or spread across multiple regions or clouds, federation lets you combine multiple Kubernetes clusters and address them through a single API endpoint. In this release, the kubefed command line utility graduated to beta - with improved support for on-premise clusters. kubefed now automatically configureskube-dns on joining clusters and can pass arguments to federated components.

Security and Setup: Users concerned with security will find that RBAC, now beta adds a significant security benefit through more tightly scoped default roles for system components. The default RBAC policies in 1.6 grant scoped permissions to control-plane components, nodes, and controllers. RBAC allows cluster administrators to selectively grant particular users or service accounts fine-grained access to specific resources on a per-namespace basis. RBAC users upgrading from 1.5 to 1.6 should view the guidance here.

Users looking for an easy way to provision a secure cluster on physical or cloud servers can use kubeadm, which is now beta. kubeadm has been enhanced with a set of command line flags and a base feature set that includes RBAC setup, use of the Bootstrap Token system and an enhanced Certificates API.

Advanced Scheduling: This release adds a set of powerful and versatile scheduling constructs to give you greater control over how pods are scheduled, including rules to restrict pods to particular nodes in heterogeneous clusters, and rules to spread or pack pods across failure domains such as nodes, racks, and zones.

Node affinity/anti-affinity, now in beta, allows you to restrict pods to schedule only on certain nodes based on node labels. Use built-in or custom node labels to select specific zones, hostnames, hardware architecture, operating system version, specialized hardware, etc. The scheduling rules can be required or preferred, depending on how strictly you want the scheduler to enforce them.

A related feature, called taints and tolerations, makes it possible to compactly represent rules for excluding pods from particular nodes. The feature, also now in beta, makes it easy, for example, to dedicate sets of nodes to particular sets of users, or to keep nodes that have special hardware available for pods that need the special hardware by excluding pods that don’t need it.

Sometimes you want to co-schedule services, or pods within a service, near each other topologically, for example to optimize North-South or East-West communication. Or you want to spread pods of a service for failure tolerance, or keep antagonistic pods separated, or ensure sole tenancy of nodes. Pod affinity and anti-affinity, now in beta, enables such use cases by letting you set hard or soft requirements for spreading and packing pods relative to one another within arbitrary topologies (node, zone, etc.).

Lastly, for the ultimate in scheduling flexibility, you can run your own custom scheduler(s) alongside, or instead of, the default Kubernetes scheduler. Each scheduler is responsible for different sets of pods. Multiple schedulers is beta in this release.

Dynamic Storage Provisioning: Users deploying stateful applications will benefit from the extensive storage automation capabilities in this release of Kubernetes.

Since its early days, Kubernetes has been able to automatically attach and detach storage, format disk, mount and unmount volumes per the pod spec, and do so seamlessly as pods move between nodes. In addition, the PersistentVolumeClaim (PVC) and PersistentVolume (PV) objects decouple the request for storage from the specific storage implementation, making the pod spec portable across a range of cloud and on-premise environments. In this release StorageClass and dynamic volume provisioning are promoted to stable, completing the automation story by creating and deleting storage on demand, eliminating the need to pre-provision.

The design allows cluster administrators to define and expose multiple flavors of storage within a cluster, each with a custom set of parameters. End users can stop worrying about the complexity and nuances of how storage is provisioned, while still selecting from multiple storage options.

In 1.6 Kubernetes comes with a set of built-in defaults to completely automate the storage provisioning lifecycle, freeing you to work on your applications. Specifically, Kubernetes now pre-installs system-defined StorageClass objects for AWS, Azure, GCP, OpenStack and VMware vSphere by default. This gives Kubernetes users on these providers the benefits of dynamic storage provisioning without having to manually setup StorageClass objects. This is a change in the default behavior of PVC objects on these clouds. Note that default behavior is that dynamically provisioned volumes are created with the “delete” reclaim policy. That means once the PVC is deleted, the dynamically provisioned volume is automatically deleted so users do not have the extra step of ‘cleaning up’.

In addition, we have expanded the range of storage supported overall including:

ScaleIO Kubernetes Volume Plugin enabling pods to seamlessly access and use data stored on ScaleIO volumes.
Portworx Kubernetes Volume Plugin adding the capability to use Portworx as a storage provider for Kubernetes clusters. Portworx pools your server capacity and turns your servers or cloud instances into converged, highly available compute and storage nodes.
Support for NFSv3, NFSv4, and GlusterFS on clusters using the COS node image
Support for user-written/run dynamic PV provisioners. A golang library and examples can be found here.
Beta support for mount options in persistent volumes

Container Runtime Interface, etcd v3 and Daemon set updates: while users may not directly interact with the container runtime or the API server datastore, they are foundational components for user facing functionality in Kubernetes’. As such the community invests in expanding the capabilities of these and other system components.

The Docker-CRI implementation is beta and is enabled by default in kubelet. Alpha support for other runtimes, cri-o, frakti, rkt, has also been implemented.
The default backend storage for the API server has been upgraded to use etcd v3 by default for new clusters. If you are upgrading from a 1.5 cluster, care should be taken to ensure continuity by planning a data migration window.
Node reliability is improved as Kubelet exposes an admin configurable Node Allocatable feature to reserve compute resources for system daemons.
Daemon set updates lets you perform rolling updates on a daemon set

Alpha features: this release was mostly focused on maturing functionality, however, a few alpha features were added to support the roadmap

Out-of-tree cloud provider support adds a new cloud-controller-manager binary that may be used for testing the new out-of-core cloud provider flow
Per-pod-eviction in case of node problems combined with tolerationSeconds, lets users tune the duration a pod stays bound to a node that is experiencing problems
Pod Injection Policy adds a new API resource PodPreset to inject information such as secrets, volumes, volume mounts, and environment variables into pods at creation time.
Custom metrics support in the Horizontal Pod Autoscaler changed to use
Multiple Nvidia GPU support is introduced with the Docker runtime only

These are just some of the highlights in our first release for the year. For a complete list please visit the release notes.

Community

This release is possible thanks to our vast and open community. Together, we’ve pushed nearly 5,000 commits by some 275 authors. To bring our many advocates together, the community has launched a new program called K8sPort, an online hub where the community can participate in gamified challenges and get credit for their contributions. Read more about the program here.

Release Process

A big thanks goes out to the release team for 1.6 (lead by Dan Gillespie of CoreOS) for their work bringing the 1.6 release to light. This release team is an exemplar of the Kubernetes community’s commitment to community governance. Dan is the first non-Google release manager and he, along with the rest of the team, worked throughout the release (building on the 1.5 release manager, Saad Ali’s, great work) to uncover and document tribal knowledge, shine light on tools and processes that still require special permissions, and prioritize work to improve the Kubernetes release process. Many thanks to the team.

User Adoption

We’re continuing to see rapid adoption of Kubernetes in all sectors and sizes of businesses. Furthermore, adoption is coming from across the globe, from a startup in Tennessee, USA to a Fortune 500 company in China.

JD.com, one of China's largest internet companies, uses Kubernetes in conjunction with their OpenStack deployment. They’ve move 20% of their applications thus far on Kubernetes and are already running 20,000 pods daily. Read more about their setup here.

Spire, a startup based in Tennessee, witnessed their public cloud provider experience an outage, but suffered zero downtime because Kubernetes was able to move their workloads to different zones. Read their full experience here.

“With Kubernetes, there was never a moment of panic, just a sense of awe watching the automatic mitigation as it happened.”

Share your Kubernetes use case story with the community here.

Availability

Kubernetes 1.6 is available for download here on GitHub and via get.k8s.io. To get started with Kubernetes, try one of the these interactive tutorials.

Get Involved

CloudNativeCon + KubeCon in Berlin is this week March 29-30, 2017. We hope to get together with much of the community and share more there!

Share your voice at our weekly community meeting:

Post questions (or answer questions) on Stack Overflow
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack

Many thanks for your contributions and advocacy!

-- Aparna Sinha, Senior Product Manager, Kubernetes, Google

With the help of our growing community of 1,110 plus contributors, we pushed around 5,000 commits to deliver Kubernetes 1.6, bringing focus on multi-user, multi-workloads at scale. While many improvements have been contributed, we selected few features to highlight in a series of in-depths posts listed below.

Follow along and read what’s new:

Day 1	* Dynamic Provisioning and Storage Classes in Kubernetes Stable in 1.6
Day 2	* ???
Day 3	* ???
Day 4	* ???
Day 5	* ??

Connect

Follow us on Twitter @Kubernetesio for latest updates
Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Get involved with the Kubernetes project on GitHub
Connect with the community on Slack
Download Kubernetes

Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.6

Storage is a critical part of running stateful containers, and Kubernetes offers powerful primitives for managing it. Dynamic volume provisioning, a feature unique to Kubernetes, allows storage volumes to be created on-demand. Before dynamic provisioning, cluster administrators had to manually make calls to their cloud or storage provider to provision new storage volumes, and then create PersistentVolume objects to represent them in Kubernetes. With dynamic provisioning, these two steps are automated, eliminating the need for cluster administrators to pre-provision storage. Instead, the storage resources can be dynamically provisioned using the provisioner specified by the StorageClass object (see user-guide). StorageClasses are essentially blueprints that abstract away the underlying storage provider, as well as other parameters, like disk-type (e.g.; solid-state vs standard disks).

StorageClasses use provisioners that are specific to the storage platform or cloud provider to give Kubernetes access to the physical media being used. Several storage provisioners are provided in-tree (see user-guide), but additionally out-of-tree provisioners are now supported (see kubernetes-incubator).

In the Kubernetes 1.6 release, dynamic provisioning has been promoted to stable (having entered beta in 1.4). This is a big step forward in completing the Kubernetes storage automation vision, allowing cluster administrators to control how resources are provisioned and giving users the ability to focus more on their application. With all of these benefits, there are a few important user-facing changes (discussed below) that are important to understand before using Kubernetes 1.6.

Storage Classes and How to Use them

StorageClasses are the foundation of dynamic provisioning, allowing cluster administrators to define abstractions for the underlying storage platform. Users simply refer to a StorageClass by name in the PersistentVolumeClaim (PVC) using the “storageClassName” parameter.

In the following example, a PVC refers to a specific storage class named “gold”.

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

namespace: testns

spec:

accessModes:

- ReadWriteOnce

resources:

requests:

storage: 100Gi

storageClassName: gold

In order to promote the usage of dynamic provisioning this feature permits the cluster administrator to specify a defaultStorageClass. When present, the user can create a PVC without having specifying a storageClassName, further reducing the user’s responsibility to be aware of the underlying storage provider. When using default StorageClasses, there are some operational subtleties to be aware of when creating PersistentVolumeClaims (PVCs). This is particularly important if you already have existing PersistentVolumes (PVs) that you want to re-use:

PVs that are already “Bound” to PVCs will remain bound with the move to 1.6

They will not have a StorageClass associated with them unless the user manually adds it
If PVs become “Available” (i.e.; if you delete a PVC and the corresponding PV is recycled), then they are subject to the following

If storageClassNameis not specified in the PVC, the default storage class will be used for provisioning.

Existing, “Available”, PVs that do not have the default storage class label will not be considered for binding to the PVC

If storageClassNameis set to an empty string(‘’) in the PVC, no storage class will be used (i.e.; dynamic provisioning is disabled for this PVC)

Existing, “Available”, PVs (that do not have a specified storageClassName) will be considered for binding to the PVC

If storageClassName is set to a specific value, then the matching storage class will be used

Existing, “Available”, PVs that have a matching storageClassNamewill be considered for binding to the PVC
If no corresponding storage class exists, the PVC will fail.

To reduce the burden of setting up default StorageClasses in a cluster, beginning with 1.6, Kubernetes installs (via the add-on manager) default storage classes for several cloud providers. To use these default StorageClasses, users do not need refer to them by name – that is, storageClassName need not be specified in the PVC.

The following table provides more detail on default storage classes pre-installed by cloud provider as well as the specific parameters used by these defaults.

Cloud Provider	Default StorageClass Name	Default Provisioner
Amazon Web Services	gp2	aws-ebs
Microsoft Azure	standard	azure-disk
Google Cloud Platform	standard	gce-pd
OpenStack	standard	cinder
VMware vSphere	thin	vsphere-volume

While these pre-installed default storage classes are chosen to be “reasonable” for most storage users, this guide provides instructions on how to specify your own default.

Dynamically Provisioned Volumes and the Reclaim Policy

All PVs have a reclaim policy associated with them that dictates what happens to a PV once it becomes released from a claim (see user-guide). Since the goal of dynamic provisioning is to completely automate the lifecycle of storage resources, the default reclaim policy for dynamically provisioned volumes is “delete”. This means that when a PersistentVolumeClaim (PVC) is released, the dynamically provisioned volume is de-provisioned (deleted) on the storage provider and the data is likely irretrievable. If this is not the desired behavior, the user must change the reclaim policy on the corresponding PersistentVolume (PV) object after the volume is provisioned.

How do I change the reclaim policy on a dynamically provisioned volume?

You can change the reclaim policy by editing the PV object and changing the “persistentVolumeReclaimPolicy” field to the desired value. For more information on various reclaim policies see user-guide.

FAQs

How do I use a default StorageClass?

If your cluster has a default StorageClass that meets your needs, then all you need to do is create a PersistentVolumeClaim (PVC) and the default provisioner will take care of the rest – there is no need to specify the storageClassName:

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

namespace: testns

spec:

accessModes:

- ReadWriteOnce

resources:

requests:

storage: 100Gi

Can I add my own storage classes?
Yes. To add your own storage class, first determine which provisioners will work in your cluster. Then, create a StorageClass object with parameters customized to meet your needs (see user-guide for more detail). For many users, the easiest way to create the object is to write a yaml file and apply it with “kubectl create -f”. The following is an example of a StorageClass for Google Cloud Platform named “gold” that creates a “pd-ssd”. Since multiple classes can exist within a cluster, the administrator may leave the default enabled for most workloads (since it uses a “pd-standard”), with the “gold” class reserved for workloads that need extra performance.

kind: StorageClass

apiVersion: storage.k8s.io/v1

metadata:

provisioner: kubernetes.io/gce-pd

parameters:

type: pd-ssd

How do I check if I have a default StorageClass Installed?

You can use kubectl to check for StorageClass objects. In the example below there are two storage classes: “gold” and “standard”. The “gold” class is user-defined, and the “standard” class is installed by Kubernetes and is the default.

$ kubectl get sc

NAME TYPE

gold kubernetes.io/gce-pd

standard (default) kubernetes.io/gce-pd

$ kubectl describe storageclass standard

Name: standard

IsDefaultClass:Yes

Annotations:storageclass.beta.kubernetes.io/is-default-class=true

Provisioner:kubernetes.io/gce-pd

Parameters:type=pd-standard

Events: <none>

Can I delete/turn off the default StorageClasses?
You cannot delete the default storage class objects provided. Since they are installed as cluster addons, they will be recreated if they are deleted.

You can, however, disable the defaulting behavior by removing (or setting to false) the following annotation: storageclass.beta.kubernetes.io/is-default-class.

If there are no StorageClass objects marked with the default annotation, then PersistentVolumeClaim objects (without a StorageClass specified) will not trigger dynamic provisioning. They will, instead, fall back to the legacy behavior of binding to an available PersistentVolume object.

Can I assign my existing PVs to a particular StorageClass?
Yes, you can assign a StorageClass to an existing PV by editing the appropriate PV object and adding (or setting) the desired storageClassName field to it.

What happens if I delete a PersistentVolumeClaim (PVC)?
If the volume was dynamically provisioned, then the default reclaim policy is set to “delete”. This means that, by default, when the PVC is deleted, the underlying PV and storage asset will also be deleted. If you want to retain the data stored on the volume, then you must change the reclaim policy from “delete” to “retain” after the PV is provisioned.

--Saad Ali & Michelle Au, Software Engineers, and Matthew De Lio, Product Manager, Google

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Get involved with the Kubernetes project on GitHub
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Download Kubernetes

Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.6

Last summer we shared updates on Kubernetes scalability, since then we’ve been working hard and are proud to announce that Kubernetes 1.6 can handle 5,000-node clusters with up to 150,000 pods. Moreover, those cluster have even better end-to-end pod startup time than the previous 2,000-node clusters in the 1.3 release; and latency of the API calls are within the one-second SLO.

In this blog post we review what metrics we monitor in our tests and describe our performance results from Kubernetes 1.6. We also discuss what changes we made to achieve the improvements, and our plans for upcoming releases in the area of system scalability.

X-node clusters - what does it mean?

Now that Kubernetes 1.6 is released, it is a good time to review what it means when we say we “support” X-node clusters. As described in detail in a previous blog post, we currently have two performance-related Service Level Objectives (SLO):

API-responsiveness: 99% of all API calls return in less than 1s
Pod startup time: 99% of pods and their containers (with pre-pulled images) start within 5s.

As before, it is possible to run larger deployments than the stated supported 5,000-node cluster (and users have), but performance may be degraded and it may not meet our strict SLO defined above.

We are aware of the limited scope of these SLOs. There are many aspects of the system that they do not exercise. For example, we do not measure how soon a new pod that is part of a service will be reachable through the service IP address after the pod is started. If you are considering using large Kubernetes clusters and have performance requirements not covered by our SLOs, please contact the Kubernetes Scalability SIG so we can help you understand whether Kubernetes is ready to handle your workload now.

The top scalability-related priority for upcoming Kubernetes releases is to enhance our definition of what it means to support X-node clusters by:

refining currently existing SLOs
adding more SLOs (that will cover various areas of Kubernetes, including networking)

Kubernetes 1.6 performance metrics at scale

So how does performance in large clusters look in Kubernetes 1.6? The following graph shows the end-to-end pod startup latency with 2000- and 5000-node clusters. For comparison, we also show the same metric from Kubernetes 1.3, which we published in our previous scalability blog post that described support for 2000-node clusters. As you can see, Kubernetes 1.6 has better pod startup latency with both 2000 and 5000 nodes compared to Kubernetes 1.3 with 2000 nodes [1].

The next graph shows API response latency for a 5000-node Kubernetes 1.6 cluster. The latencies at all percentiles are less than 500ms, and even 90th percentile is less than about 100ms.

How did we get here?

Over the past nine months (since the last scalability blog post), there have been a huge number of performance and scalability related changes in Kubernetes. In this post we will focus on the two biggest ones and will briefly enumerate a few others.

etcd v3
In Kubernetes 1.6 we switched the default storage backend (key-value store where the whole cluster state is stored) from etcd v2 to etcd v3. The initial works towards this transition has been started during the 1.3 release cycle. You might wonder why it took us so long, given that:

the first stable version of etcd supporting the v3 API was announced on June 30, 2016
the new API was designed together with the Kubernetes team to support our needs (from both a feature and scalability perspective)
the integration of etcd v3 with Kubernetes had already mostly been finished when etcd v3 was announced (indeed CoreOS used Kubernetes as a proof-of-concept for the new etcd v3 API)

As it turns out, there were a lot of reasons. We will describe the most important ones below.

Changing storage in a backward incompatible way, as is in the case for the etcd v2 to v3 migration, is a big change, and thus one for which we needed a strong justification. We found this justification in September when we determined that we would not be able to scale to 5000-node clusters if we continued to use etcd v2 (kubernetes/32361 contains some discussion about it). In particular, what didn’t scale was the watch implementation in etcd v2. In a 5000-node cluster, we need to be able to send at least 500 watch events per second to a single watcher, which wasn’t possible in etcd v2.
Once we had the strong incentive to actually update to etcd v3, we started thoroughly testing it. As you might expect, we found some issues. There were some minor bugs in Kubernetes, and in addition we requested a performance improvement in etcd v3’s watch implementation (watch was the main bottleneck in etcd v2 for us). This led to the 3.0.10 etcd patch release.
Once those changes had been made, we were convinced that new Kubernetes clusters would work with etcd v3. But the large challenge of migrating existing clusters remained. For this we needed to automate the migration process, thoroughly test the underlying CoreOS etcd upgrade tool, and figure out a contingency plan for rolling back from v3 to v2.

But finally, we are confident that it should work.

Switching storage data format to protobuf
In the Kubernetes 1.3 release, we enabled protobufs as the data format for Kubernetes components to communicate with the API server (in addition to maintaining support for JSON). This gave us a huge performance improvement.

However, we were still using JSON as a format in which data was stored in etcd, even though technically we were ready to change that. The reason for delaying this migration was related to our plans to migrate to etcd v3. Now you are probably wondering how this change was depending on migration to etcd v3. The reason for it was that with etcd v2 we couldn’t really store data in binary format (to workaround it we were additionally base64-encoding the data), whereas with etcd v3 it just worked. So to simplify the transition to etcd v3 and avoid some non-trivial transformation of data stored in etcd during it, we decided to wait with switching storage data format to protobufs until migration to etcd v3 storage backend is done.

Other optimizations
We made tens of optimizations throughout the Kubernetes codebase during the last three releases, including:

optimizing the scheduler (which resulted in 5-10x higher scheduling throughput)
switching all controllers to a new recommended design using shared informers, which reduced resource consumption of controller-manager - for reference see this document
optimizing individual operations in the API server (conversions, deep-copies, patch)
reducing memory allocation in the API server (which significantly impacts the latency of API calls)

We want to emphasize that the optimization work we have done during the last few releases, and indeed throughout the history of the project, is a joint effort by many different companies and individuals from the whole Kubernetes community.

What’s next?

People frequently ask how far we are going to go in improving Kubernetes scalability. Currently we do not have plans to increase scalability beyond 5000-node clusters (within our SLOs) in the next few releases. If you need clusters larger than 5000 nodes, we recommend to use federation to aggregate multiple Kubernetes clusters.

However, that doesn’t mean we are going to stop working on scalability and performance. As we mentioned at the beginning of this post, our top priority is to refine our two existing SLOs and introduce new ones that will cover more parts of the system, e.g. networking. This effort has already started within the Scalability SIG. We have made significant progress on how we would like to define performance SLOs, and this work should be finished in the coming month.

Join the effort
If you are interested in scalability and performance, please join our community and help us shape Kubernetes. There are many ways to participate, including:

Chat with us in the Kubernetes Slack scalability channel:
Join our Special Interest Group, SIG-Scalability, which meets every Thursday at 9:00 AM PST

Thanks for the support and contributions! Read more in-depth posts on what's new in Kubernetes 1.6 here.

-- Wojciech Tyczynski, Software Engineer, Google

[1] We are investigating why 5000-node clusters have better startup time than 2000-node clusters. The current theory is that it is related to running 5000-node experiments using 64-core master and 2000-node experiments using 32-core master.

Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.6

The Kubernetes scheduler’s default behavior works well for most cases -- for example, it ensures that pods are only placed on nodes that have sufficient free resources, it ties to spread pods from the same set (ReplicaSet, StatefulSet, etc.) across nodes, it tries to balance out the resource utilization of nodes, etc.

But sometimes you want to control how your pods are scheduled. For example, perhaps you want to ensure that certain pods only schedule on nodes with specialized hardware, or you want to co-locate services that communicate frequently, or you want to dedicate a set of nodes to a particular set of users. Ultimately, you know much more about how your applications should be scheduled and deployed than Kubernetes ever will. So Kubernetes 1.6 offers four advanced scheduling features: node affinity/anti-affinity, taints and tolerations, pod affinity/anti-affinity, and custom schedulers. Each of these features are now in beta in Kubernetes 1.6.

Node Affinity/Anti-Affinity

Node Affinity/Anti-Affinity is one way to set rules on which nodes are selected by the scheduler. This feature is a generalization of the nodeSelector feature which has been in Kubernetes since version 1.0. The rules are defined using the familiar concepts of custom labels on nodes and selectors specified in pods, and they can be either required or preferred, depending on how strictly you want the scheduler to enforce them.

Required rules must be met for a pod to schedule on a particular node. If no node matches the criteria (plus all of the other normal criteria, such as having enough free resources for the pod’s resource request), then the pod won’t be scheduled. Required rules are specified in the requiredDuringSchedulingIgnoredDuringExecution field of nodeAffinity.

For example, if we want to require scheduling on a node that is in the us-central1-a GCE zone of a multi-zone Kubernetes cluster, we can specify the following affinity rule as part of the Pod spec:

affinity:

nodeAffinity:

requiredDuringSchedulingIgnoredDuringExecution:

nodeSelectorTerms:

- matchExpressions:

- key: "failure-domain.beta.kubernetes.io/zone"

operator: In

values: ["us-central1-a"]

“IgnoredDuringExecution” means that the pod will still run if labels on a node change and affinity rules are no longer met. There are future plans to offer requiredDuringSchedulingRequiredDuringExecution which will evict pods from nodes as soon as they don’t satisfy the node affinity rule(s).

Preferred rules mean that if nodes match the rules, they will be chosen first, and only if no preferred nodes are available will non-preferred nodes be chosen. You can prefer instead of require that pods are deployed to us-central1-a by slightly changing the pod spec to use preferredDuringSchedulingIgnoredDuringExecution:

affinity:

nodeAffinity:

preferredDuringSchedulingIgnoredDuringExecution:

nodeSelectorTerms:

- matchExpressions:

- key: "failure-domain.beta.kubernetes.io/zone"

operator: In

values: ["us-central1-a"]

Node anti-affinity can be achieved by using negative operators. So for instance if we want our pods to avoid us-central1-a we can do this:

affinity:

nodeAffinity:

requiredDuringSchedulingIgnoredDuringExecution:

nodeSelectorTerms:

- matchExpressions:

- key: "failure-domain.beta.kubernetes.io/zone"

operator: NotIn

values: ["us-central1-a"]

Valid operators you can use are In, NotIn, Exists, DoesNotExist. Gt, and Lt.

Additional use cases for this feature are to restrict scheduling based on nodes’ hardware architecture, operating system version, or specialized hardware. Node affinity/anti-affinity is beta in Kubernetes 1.6.

Taints and Tolerations

A related feature is “taints and tolerations,” which allows you to mark (“taint”) a node so that no pods can schedule onto it unless a pod explicitly “tolerates” the taint. Marking nodes instead of pods (as in node affinity/anti-affinity) is particularly useful for situations where most pods in the cluster should avoid scheduling onto the node. For example, you might want to mark your master node as schedulable only by Kubernetes system components, or dedicate a set of nodes to a particular group of users, or keep regular pods away from nodes that have special hardware so as to leave room for pods that need the special hardware.

The kubectl command allows you to set taints on nodes, for example:

kubectl taint nodes node1 key=value:NoSchedule

creates a taint that marks the node as unschedulable by any pods that do not have a toleration for taint with key key, value value, and effect NoSchedule. (The other taint effects are PreferNoSchedule, which is the preferred version of NoSchedule, and NoExecute, which means any pods that are running on the node when the taint is applied will be evicted unless they tolerate the taint.) The toleration you would add to a PodSpec to have the corresponding pod tolerate this taint would look like this

tolerations:

- key: "key"

operator: "Equal"

value: "value"

effect: "NoSchedule"

In addition to moving taints and tolerations to beta in Kubernetes 1.6, we have introduced an alpha feature that uses taints and tolerations to allow you to customize how long a pod stays bound to a node when the node experiences a problem like a network partition instead of using the default five minutes. See this section of the documentation for more details.

Pod Affinity/Anti-Affinity

Node affinity/anti-affinity allows you to constrain which nodes a pod can run on based on the nodes’ labels. But what if you want to specify rules about how pods should be placed relative to one another, for example to spread or pack pods within a service or relative to pods in other services? For that you can use pod affinity/anti-affinity, which is also beta in Kubernetes 1.6.

Let’s look at an example. Say you have front-ends in service S1, and they communicate frequently with back-ends that are in service S2 (a “north-south” communication pattern). So you want these two services to be co-located in the same cloud provider zone, but you don’t want to have to choose the zone manually--if the zone fails, you want the pods to be rescheduled to another (single) zone. You can specify this with a pod affinity rule that looks like this (assuming you give the pods of this service a label “service=S2” and the pods of the other service a label “service=S1”):

affinity:

podAffinity:

requiredDuringSchedulingIgnoredDuringExecution:

- labelSelector:

matchExpressions:

- key: service

operator: In

values: [“S1”]

topologyKey: failure-domain.beta.kubernetes.io/zone

As with node affinity/anti-affinity, there is also a preferredDuringSchedulingIgnoredDuringExecution variant.

Pod affinity/anti-affinity is very flexible. Imagine you have profiled the performance of your services and found that containers from service S1 interfere with containers from service S2 when they share the same node, perhaps due to cache interference effects or saturating the network link. Or maybe due to security concerns you never want containers of S1 and S2 to share a node. To implement these rules, just make two changes to the snippet above -- change podAffinity to podAntiAffinity and change topologyKey to kubernetes.io/hostname.

Custom Schedulers

If the Kubernetes scheduler’s various features don’t give you enough control over the scheduling of your workloads, you can delegate responsibility for scheduling arbitrary subsets of pods to your own custom scheduler(s) that run(s) alongside, or instead of, the default Kubernetes scheduler. Multiple schedulers is beta in Kubernetes 1.6.

Each new pod is normally scheduled by the default scheduler. But if you provide the name of your own custom scheduler, the default scheduler will ignore that Pod and allow your scheduler to schedule the Pod to a node. Let’s look at an example.

Here we have a Pod where we specify the schedulerName field:

apiVersion: v1

kind: Pod

metadata:

labels:

app: nginx

spec:

schedulerName: my-scheduler

containers:

- name: nginx

image: nginx:1.10

If we create this Pod without deploying a custom scheduler, the default scheduler will ignore it and it will remain in a Pending state. So we need a custom scheduler that looks for, and schedules, pods whose schedulerName field is my-scheduler.

A custom scheduler can be written in any language and can be as simple or complex as you need. Here is a very simple example of a custom scheduler written in Bash that assigns a node randomly. Note that you need to run this along with kubectl proxy for it to work.

#!/bin/bash

SERVER='localhost:8001'

while true;

;

NODES=($(kubectl --server $SERVER get nodes -o json | jq '.items[].metadata.name' | tr -d '"'))

NUMNODES=${#NODES[@]}

CHOSEN=${NODES[$[ $RANDOM % $NUMNODES ]]}

curl --header "Content-Type:application/json" --request POST --data '{"apiVersion":"v1", "kind": "Binding", "metadata": {"name": "'$PODNAME'"}, "target": {"apiVersion": "v1", "kind"

: "Node", "name": "'$CHOSEN'"}}' http://$SERVER/api/v1/namespaces/default/pods/$PODNAME/binding/

echo "Assigned $PODNAME to $CHOSEN"

done

sleep 1

done

Learn more

The Kubernetes 1.6 release notes have more information about these features, including details about how to change your configurations if you are already using the alpha version of one or more of these features (this is required, as the move from alpha to beta is a breaking change for these features).

Acknowledgements

The features described here, both in their alpha and beta forms, were a true community effort, involving engineers from Google, Huawei, IBM, Red Hat and more.

Get Involved

Share your voice at our weekly community meeting:

Post questions (or answer questions) on Stack Overflow
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack (room #sig-scheduling)

Many thanks for your contributions.

--Ian Lewis, Developer Advocate, and David Oppenheimer, Software Engineer, Google

Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.6

Many users have existing domain name zones that they would like to integrate into their Kubernetes DNS namespace. For example, hybrid-cloud users may want to resolve their internal “.corp” domain addresses within the cluster. Other users may have a zone populated by a non-Kubernetes service discovery system (like Consul). We’re pleased to announce that, in Kubernetes 1.6, kube-dns adds support for configurable private DNS zones (often called “stub domains”) and external upstream DNS nameservers. In this blog post, we describe how to configure and use this feature.

Default lookup flow

Kubernetes currently supports two DNS policies specified on a per-pod basis using the dnsPolicy flag: “Default” and “ClusterFirst”. If dnsPolicy is not explicitly specified, then “ClusterFirst” is used:

If dnsPolicy is set to “Default”, then the name resolution configuration is inherited from the node the pods run on. Note: this feature cannot be used in conjunction with dnsPolicy: “Default”.
If dnsPolicy is set to “ClusterFirst”, then DNS queries will be sent to the kube-dns service. Queries for domains rooted in the configured cluster domain suffix (any address ending in “.cluster.local” in the example above) will be answered by the kube-dns service. All other queries (for example, www.kubernetes.io) will be forwarded to the upstream nameserver inherited from the node.

Before this feature, it was common to introduce stub domains by replacing the upstream DNS with a custom resolver. However, this caused the custom resolver itself to become a critical path for DNS resolution, where issues with scalability and availability may cause the cluster to lose DNS functionality. This feature allows the user to introduce custom resolution without taking over the entire resolution path.

Customizing the DNS Flow

Beginning in Kubernetes 1.6, cluster administrators can specify custom stub domains and upstream nameservers by providing a ConfigMap for kube-dns. For example, the configuration below inserts a single stub domain and two upstream nameservers. As specified, DNS requests with the “.acme.local” suffix will be forwarded to a DNS listening at 1.2.3.4. Additionally, Google Public DNS will serve upstream queries. See ConfigMap Configuration Notes at the end of this section for a few notes about the data format.

apiVersion: v1

kind: ConfigMap

metadata:

namespace: kube-system

data:

stubDomains: |

{“acme.local”: [“1.2.3.4”]}

upstreamNameservers: |

[“8.8.8.8”, “8.8.4.4”]

The diagram below shows the flow of DNS queries specified in the configuration above. With the dnsPolicy set to “ClusterFirst” a DNS query is first sent to the DNS caching layer in kube-dns. From here, the suffix of the request is examined and then forwarded to the appropriate DNS. In this case, names with the cluster suffix (e.g.; “.cluster.local”) are sent to kube-dns. Names with the stub domain suffix (e.g.; “.acme.local”) will be sent to the configured custom resolver. Finally, requests that do not match any of those suffixes will be forwarded to the upstream DNS.

Below is a table of example domain names and the destination of the queries for those domain names:

Domain name	Server answering the query
kubernetes.default.svc.cluster.local	kube-dns
foo.acme.local	custom DNS (1.2.3.4)
widget.com	upstream DNS (one of 8.8.8.8, 8.8.4.4)

ConfigMap Configuration Notes

stubDomains (optional)

Format: a JSON map using a DNS suffix key (e.g.; “acme.local”) and a value consisting of a JSON array of DNS IPs.
Note: The target nameserver may itself be a Kubernetes service. For instance, you can run your own copy of dnsmasq to export custom DNS names into the ClusterDNS namespace.

upstreamNameservers (optional)

Format: a JSON array of DNS IPs.
Note: If specified, then the values specified replace the nameservers taken by default from the node’s /etc/resolv.conf
Limits: a maximum of three upstream nameservers can be specified

Example #1: Adding a Consul DNS Stub Domain

In this example, the user has Consul DNS service discovery system they wish to integrate with kube-dns. The consul domain server is located at 10.150.0.1, and all consul names have the suffix “.consul.local”. To configure Kubernetes, the cluster administrator simply creates a ConfigMap object as shown below. Note: in this example, the cluster administrator did not wish to override the node’s upstream nameservers, so they didn’t need to specify the optional upstreamNameservers field.

apiVersion: v1

kind: ConfigMap

metadata:

namespace: kube-system

data:

stubDomains: |

{“consul.local”: [“10.150.0.1”]}

Example #2: Replacing the Upstream Nameservers

In this example the cluster administrator wants to explicitly force all non-cluster DNS lookups to go through their own nameserver at 172.16.0.1. Again, this is easy to accomplish; they just need to create a ConfigMap with the upstreamNameservers field specifying the desired nameserver.

apiVersion: v1

kind: ConfigMap

metadata:

namespace: kube-system

data:

upstreamNameservers: |

[“172.16.0.1”]

Get involved

If you’d like to contribute or simply help provide feedback and drive the roadmap, join our community. Specifically for network related conversations participate though one of these channels:

Chat with us on the Kubernetes Slack network channel
Join our Special Interest Group, SIG-Network, which meets on Tuesdays at 14:00 PT

Thanks for your support and contributions. Read more in-depth posts on what's new in Kubernetes 1.6 here.

--Bowei Du, Software Engineer and Matthew DeLio, Product Manager, Google

Post questions (or answer questions) on Stack Overflow

Join the community portal for advocates on K8sPort

Get involved with the Kubernetes project on GitHub

Connect with the community on Slack

Download Kubernetes

Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.6

One of the highlights of the Kubernetes 1.6 release is the RBAC authorizer feature moving to beta. RBAC, Role-based access control, is an an authorization mechanism for managing permissions around Kubernetes resources. RBAC allows configuration of flexible authorization policies that can be updated without cluster restarts.

The focus of this post is to highlight some of the interesting new capabilities and best practices.

RBAC vs ABAC

Currently there are several authorization mechanisms available for use with Kubernetes. Authorizers are the mechanisms that decide who is permitted to make what changes to the cluster using the Kubernetes API. This affects things like kubectl, system components, and also certain applications that run in the cluster and manipulate the state of the cluster, like Jenkins with the Kubernetes plugin, or Helm that runs in the cluster and uses the Kubernetes API to install applications in the cluster. Out of the available authorization mechanisms, ABAC and RBAC are the mechanisms local to a Kubernetes cluster that allow configurable permissions policies.

ABAC, Attribute Based Access Control, is a powerful concept. However, as implemented in Kubernetes, ABAC is difficult to manage and understand. It requires ssh and root filesystem access on the master VM of the cluster to make authorization policy changes. For permission changes to take effect the cluster API server must be restarted.

RBAC permission policies are configured using kubectl or the Kubernetes API directly. Users can be authorized to make authorization policy changes using RBAC itself, making it possible to delegate resource management without giving away ssh access to the cluster master. RBAC policies map easily to the resources and operations used in the Kubernetes API.

Based on where the Kubernetes community is focusing their development efforts, going forward RBAC should be preferred over ABAC.

Basic Concepts

The are a few basic ideas behind RBAC that are foundational in understanding it. At its core, RBAC is a way of granting users granular access to Kubernetes API resources.

The connection between user and resources is defined in RBAC using two objects.

Roles
A Role is a collection of permissions. For example, a role could be defined to include read permission on pods and list permission for pods. A ClusterRole is just like a Role, but can be used anywhere in the cluster.

Role Bindings
A RoleBinding maps a Role to a user or set of users, granting that Role's permissions to those users for resources in that namespace. A ClusterRoleBinding allows users to be granted a ClusterRole for authorization across the entire cluster.

Additionally there are cluster roles and cluster role bindings to consider. Cluster roles and cluster role bindings function like roles and role bindings except they have wider scope. The exact differences and how cluster roles and cluster role bindings interact with roles and role bindings are covered in the Kubernetes documentation.

RBAC in Kubernetes

RBAC is now deeply integrated into Kubernetes and used by the system components to grant the permissions necessary for them to function. System roles are typically prefixed with system: so they can be easily recognized.

➜ kubectl get clusterroles --namespace=kube-system

NAME KIND

admin ClusterRole.v1beta1.rbac.authorization.k8s.io

cluster-admin ClusterRole.v1beta1.rbac.authorization.k8s.io

edit ClusterRole.v1beta1.rbac.authorization.k8s.io

kubelet-api-admin ClusterRole.v1beta1.rbac.authorization.k8s.io

system:auth-delegator ClusterRole.v1beta1.rbac.authorization.k8s.io

system:basic-user ClusterRole.v1beta1.rbac.authorization.k8s.io

system:controller:attachdetach-controller ClusterRole.v1beta1.rbac.authorization.k8s.io

system:controller:certificate-controller ClusterRole.v1beta1.rbac.authorization.k8s.io

...

The RBAC system roles have been expanded to cover the necessary permissions for running a Kubernetes cluster with RBAC only.

During the permission translation from ABAC to RBAC, some of the permissions that were enabled by default in many deployments of ABAC authorized clusters were identified as unnecessarily broad and were scoped down in RBAC. The area most likely to impact workloads on a cluster is the permissions available to service accounts. With the permissive ABAC configuration, requests from a pod using the pod mounted token to authenticate to the API server have broad authorization. As a concrete example, the curl command at the end of this sequence will return a JSON formatted result when ABAC is enabled and an error when only RBAC is enabled.

➜ kubectl run nginx --image=nginx:latest

➜ kubectl exec -it $(kubectl get pods -o jsonpath='{.items[0].metadata.name}') bash

➜ apt-get update && apt-get install -y curl

➜ curl -ik \

-H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \

https://kubernetes/api/v1/namespaces/default/pods

Any applications you run in your Kubernetes cluster that interact with the Kubernetes API have the potential to be affected by the permissions changes when transitioning from ABAC to RBAC.

To smooth the transition from ABAC to RBAC, you can create Kubernetes 1.6 clusters with both ABAC and RBAC authorizers enabled. When both ABAC and RBAC are enabled, authorization for a resource is granted if either authorization policy grants access. However, under that configuration the most permissive authorizer is used and it will not be possible to use RBAC to fully control permissions.

At this point, RBAC is complete enough that ABAC support should be considered deprecated going forward. It will still remain in Kubernetes for the foreseeable future but development attention is focused on RBAC.

Two different talks at the at the Google Cloud Next conference touched on RBAC related changes in Kubernetes 1.6, jump to the relevant parts here and here. For more detailed information about using RBAC in Kubernetes 1.6 read the full RBAC documentation.

Get Involved

If you’d like to contribute or simply help provide feedback and drive the roadmap, join our community. Specifically interested in security and RBAC related conversation, participate through one of these channels:

Chat with us on the Kubernetes Slack sig-auth channel
Join the biweekly SIG-Auth meetings on Wednesday at 11:00 AM PT

Thanks for your support and contributions. Read more in-depth posts on what's new in Kubernetes 1.6 here.

-- Jacob Simpson, Greg Castle & CJ Cullen, Software Engineers at Google

Post questions (or answer questions) on Stack Overflow

Join the community portal for advocates on K8sPort

Get involved with the Kubernetes project on GitHub

Connect with the community on Slack

Download Kubernetes

Editor's Note: Today’s post is by Daniel Hoelbling-Inzko, Infrastructure Architect at Bitmovin, a company that provides services that transcode digital video and audio to streaming formats, sharing insights about their use of Kubernetes.

Running a large scale video encoding infrastructure on multiple public clouds is tough. At Bitmovin, we have been doing it successfully for the last few years, but from an engineering perspective, it’s neither been enjoyable nor particularly fun.

So obviously, one of the main things that really sold us on using Kubernetes, was it’s common abstraction from the different supported cloud providers and the well thought out programming interface it provides. More importantly, the Kubernetes project did not settle for the lowest common denominator approach. Instead, they added the necessary abstract concepts that are required and useful to run containerized workloads in a cloud and then did all the hard work to map these concepts to the different cloud providers and their offerings.

The great stability, speed and operational reliability we saw in our early tests in mid-2016 made the migration to Kubernetes a no-brainer.

And, it didn’t hurt that the vision for scale the Kubernetes project has been pursuing is closely aligned with our own goals as a company. Aiming for >1,000 node clusters might be a lofty goal, but for a fast growing video company like ours, having your infrastructure aim to support future growth is essential. Also, after initial brainstorming for our new infrastructure, we immediately knew that we would be running a huge number of containers and having a system, with the expressed goal of working at global scale, was the perfect fit for us. Now with the recent Kubernetes 1.6 release and its support for 5,000 node clusters, we feel even more validated in our choice of a container orchestration system.

During the testing and migration phase of getting our infrastructure running on Kubernetes, we got quite familiar with the Kubernetes API and the whole ecosystem around it. So when we were looking at expanding our cloud video encoding offering for customers to use in their own datacenters or cloud environments, we quickly decided to leverage Kubernetes as our ubiquitous cloud operating system to base the solution on.

Just a few months later this effort has become our newest service offering: Bitmovin Managed On-Premise encoding. Since all Kubernetes clusters share the same API, adapting our cloud encoding service to also run on Kubernetes enabled us to deploy into our customer’s datacenter, regardless of the hardware infrastructure running underneath. With great tools from the community, like kube-up and turnkey solutions, like Google Container Engine, anyone can easily provision a new Kubernetes cluster, either within their own infrastructure or in their own cloud accounts.

To give us the maximum flexibility for customers that deploy to bare metal and might not have any custom cloud integrations for Kubernetes yet, we decided to base our solution solely on facilities that are available in any Kubernetes install and don’t require any integration into the surrounding infrastructure (it will even run inside Minikube!). We don’t rely on Services of type LoadBalancer, primarily because enterprise IT is usually reluctant to open up ports to the open internet - and not every bare metal Kubernetes install supports externally provisioned load balancers out of the box. To avoid these issues, we deploy a BitmovinAgent that runs inside the Cluster and polls our API for new encoding jobs without requiring any network setup. This agent then uses the locally available Kubernetes credentials to start up new deployments that run the encoders on the available hardware through the Kubernetes API.

Even without having a full cloud integration available, the consistent scheduling, health checking and monitoring we get from using the Kubernetes API really enabled us to focus on making the encoder work inside a container rather than spending precious engineering resources on integrating a bunch of different hypervisors, machine provisioners and monitoring systems.

Multi-Stage Canary Deployments

Our first encounters with the Kubernetes API were not for the On-Premise encoding product. Building our containerized encoding workflow on Kubernetes was rather a decision we made after seeing how incredibly easy and powerful the Kubernetes platform proved during development and rollout of our Bitmovin API infrastructure. We migrated to Kubernetes around four months ago and it has enabled us to provide rapid development iterations to our service while meeting our requirements of downtime-free deployments and a stable development to production pipeline. To achieve this we came up with an architecture that runs almost a thousand containers and meets the following requirements we had laid out on day one:

Zero downtime deployments for our customers
Continuous deployment to production on each git mainline push
High stability of deployed services for customers

Obviously #2 and #3 are at odds with each other, if each merged feature gets deployed to production right away - how can we ensure these releases are bug-free and don’t have adverse side effects for our customers?

To overcome this oxymoron, we came up with a four-stage canary pipeline for each microservice where we simultaneously deploy to production and keep changes away from customers until the new build has proven to work reliably and correctly in the production environment.

Once a new build is pushed, we deploy it to an internal stage that’s only accessible for our internal tests and the integration test suite. Once the internal test suite passes, QA reports no issues, and we don’t detect any abnormal behavior, we push the new build to our free stage. This means that 5% of our free users would get randomly assigned to this new build. After some time in this stage the build gets promoted to the next stage that gets 5% of our paid users routed to it. Only once the build has successfully passed all 3 of these hurdles, does it get deployed to the production tier, where it will receive all traffic from our remaining users as well as our enterprise customers, which are not part of the paid bucket and never see their traffic routed to a canary track.

This setup makes us a pretty big Kubernetes installation by default, since all of our canary tiers are available at a minimum replication of 2. Since we are currently deploying around 30 microservices (and growing) to our clusters, it adds up to a minimum of 10 pods per service (8 application pods + minimum 2 HAProxy pods that do the canary routing). Although, in reality our preferred standard configuration is usually running 2 internal, 4 free, 4 others and 10 production pods alongside 4 HAProxy pods - totalling around 700 pods in total. This also means that we are running at least 150 services that provide a static ClusterIP to their underlying microservice canary tier.

A typical deployment looks like this:

Services (ClusterIP)	Deployments	#Pods
account-service	account-service-haproxy	4
account-service-internal	account-service-internal-v1.18.0	2
account-service-canary	account-service-canary-v1.17.0	4
account-service-paid	account-service-paid-v1.15.0	4
account-service-production	account-service-production-v1.15.0	10

An example service definition the production track will have the following label selectors:

apiVersion: v1

kind:Service

metadata:

labels:

app: account-service-production

tier: service

lb:private

spec:

ports:

- port:8080

targetPort:8080

protocol: TCP

selector:

app: account-service

tier: service

track: production

In front of the Kubernetes services, load balancing the different canary versions of the service, lives a small cluster of HAProxy pods that get their haproxy.conf from the Kubernetes ConfigMaps that looks something like this:

frontend http-in

bind *:80

log 127.0.0.1 local2 debug

acl traffic_internal hdr(X-Traffic-Group)-m str -i INTERNAL

acl traffic_free hdr(X-Traffic-Group)-m str -i FREE

acl traffic_enterprise hdr(X-Traffic-Group)-m str -i ENTERPRISE

use_backend internal if traffic_internal

use_backend canary if traffic_free

use_backend enterprise if traffic_enterprise

default_backend paid

backend internal

balance roundrobin

server internal-lb user-resource-service-internal:8080 resolvers dns check inter 2000

backend canary

balance roundrobin

server canary-lb user-resource-service-canary:8080 resolvers dns check inter 2000 weight 5

server production-lb user-resource-service-production:8080 resolvers dns check inter 2000 weight 95

backend paid

balance roundrobin

server canary-paid-lb user-resource-service-paid:8080 resolvers dns check inter 2000 weight 5

server production-lb user-resource-service-production:8080 resolvers dns check inter 2000 weight 95

backend enterprise

balance roundrobin

server production-lb user-resource-service-production:8080 resolvers dns check inter 2000 weight 100

Each HAProxy will inspect a header that gets assigned by our API-Gateway called X-Traffic-Group that determines which bucket of customers this request belongs to. Based on that, a decision is made to hit either a canary deployment or the production deployment.

Obviously, at this scale, kubectl (while still our main day-to-day tool to work on the cluster) doesn’t really give us a good overview of whether everything is actually running as it’s supposed to and what is maybe over or under replicated.

Since we do blue/green deployments, we sometimes forget to shut down the old version after the new one comes up, so some services might be running over replicated and finding these issues in a soup of 25 deployments listed in kubectl is not trivial, to say the least.

So, having a container orchestrator like Kubernetes, that’s very API driven, was really a godsend for us, as it allowed us to write tools that take care of that.

We built tools that either run directly off kubectl (eg bash-scripts) or interact directly with the API and understand our special architecture to give us a quick overview of the system. These tools were mostly built in Go using the client-go library.

One of these tools is worth highlighting, as it’s basically our only way to really see service health at a glance. It goes through all our Kubernetes services that have the tier: service selector and checks if the accompanying HAProxy deployment is available and all pods are running with 4 replicas. It also checks if the 4 services behind the HAProxys (internal, free, others and production) have at least 2 endpoints running. If any of these conditions are not met, we immediately get a notification in Slack and by email.

Managing this many pods with our previous orchestrator proved very unreliable and the overlay network frequently caused issues. Not so with Kubernetes - even doubling our current workload for test purposes worked flawlessly and in general, the cluster has been working like clockwork ever since we installed it.

Another advantage of switching over to Kubernetes was the availability of the kubernetes resource specifications, in addition to the API (which we used to write some internal tools for deployment). This enabled us to have a Git repo with all our Kubernetes specifications, where each track is generated off a common template and only contains placeholders for variable things like the canary track and the names.

All changes to the cluster have to go through tools that modify these resource specifications and get checked into git automatically so, whenever we see issues, we can debug what changes the infrastructure went through over time!

To summarize this post - by migrating our infrastructure to Kubernetes, Bitmovin is able to have:

Zero downtime deployments, allowing our customers to encode 24/7 without interruption
Fast development to production cycles, enabling us to ship new features faster
Multiple levels of quality assurance and high confidence in production deployments
Ubiquitous abstractions across cloud architectures and on-premise deployments
Stable and reliable health-checking and scheduling of services
Custom tooling around our infrastructure to check and validate the system
History of deployments (resource specifications in git + custom tooling)

We want to thank the Kubernetes community for the incredible job they have done with the project. The velocity at which the project moves is just breathtaking! Maintaining such a high level of quality and robustness in such a diverse environment is really astonishing.

--Daniel Hoelbling-Inzko, Infrastructure Architect, Bitmovin

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Get involved with the Kubernetes project on GitHub
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Download Kubernetes

Editor's note: Today’s post is by Jess Frazelle of Google and Brandon Philips of CoreOS about the Kubernetes security disclosures and response policy.

Software running on servers underpins ever growing amounts of the world's commerce, communications, and physical infrastructure. And nearly all of these systems are connected to the internet; which means vital security updates must be applied rapidly. As software developers and IT professionals, we often find ourselves dancing on the edge of a volcano: we may either fall into magma induced oblivion from a security vulnerability exploited before we can fix it, or we may slide off the side of the mountain because of an inadequate process to address security vulnerabilities.

The Kubernetes community believes that we can help teams restore their footing on this volcano with a foundation built on Kubernetes. And the bedrock of this foundation requires a process for quickly acknowledging, patching, and releasing security updates to an ever growing community of Kubernetes users.

With over 1,200 contributors and over a million lines of code, each release of Kubernetes is a massive undertaking staffed by brave volunteer release managers. These normal releases are fully transparent and the process happens in public. However, security releases must be handled differently to keep potential attackers in the dark until a fix is made available to users.

We drew inspiration from other open source projects in order to create the Kubernetes security release process. Unlike a regularly scheduled release, a security release must be delivered on an accelerated schedule, and we created the Product Security Team to handle this process.

This team quickly selects a lead to coordinate work and manage communication with the persons that disclosed the vulnerability and the Kubernetes community. The security release process also documents ways to measure vulnerability severity using the Common Vulnerability Scoring System (CVSS) Version 3.0 Calculator. This calculation helps inform decisions on release cadence in the face of holidays or limited developer bandwidth. By making severity criteria transparent we are able to better set expectations and hit critical timelines during an incident where we strive to:

Respond to the person or team who reported the vulnerability and staff a development team responsible for a fix within 24 hours
Disclose a forthcoming fix to users within 7 days of disclosure
Provide advance notice to vendors within 14 days of disclosure
Release a fix within 21 days of disclosure

As we continue to harden Kubernetes, the security release process will help ensure that Kubernetes remains a secure platform for internet scale computing. If you are interested in learning more about the security release process please watch the presentation from KubeCon Europe 2017 on YouTube and follow along with the slides. If you are interested in learning more about authentication and authorization in Kubernetes, along with the Kubernetes cluster security model, consider joining Kubernetes SIG Auth. We also hope to see you at security related presentations and panels at the next Kubernetes community event: CoreOS Fest 2017 in San Francisco on May 31 and June 1.

As a thank you to the Kubernetes community, a special 25 percent discount to CoreOS Fest is available using k8s25code or via this special 25 percent off link to register today for CoreOS Fest 2017.

--Brandon Philips of CoreOS and Jess Frazelle of Google

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Get involved with the Kubernetes project on GitHub

Today’s guest post is by Rob Hirschfeld, co-founder of open infrastructure automation project, Digital Rebar and co-chair of the SIG Cluster Ops.

Why Kargo?

Making Kubernetes operationally strong is a widely held priority and I track many deployment efforts around the project. The incubated Kargo project is of particular interest for me because it uses the popular Ansible toolset to build robust, upgradable clusters on both cloud and physical targets. I believe using tools familiar to operators grows our community.

We’re excited to see the breadth of platforms enabled by Kargo and how well it handles a wide range of options like integrating Ceph for StatefulSet persistence and Helm for easier application uploads. Those additions have allowed us to fully integrate the OpenStack Helm charts (demo video).

By working with the upstream source instead of creating different install scripts, we get the benefits of a larger community. This requires some extra development effort; however, we believe helping share operational practices makes the whole community stronger. That was also the motivation behind the SIG-Cluster Ops.

With Kargo delivering robust installs, we can focus on broader operational concerns.

For example, we can now drive parallel deployments, so it’s possible to fully exercise the options enabled by Kargo simultaneously for development and testing.

That’s helpful to built-test-destroy coordinated Kubernetes installs on CentOS, Red Hat and Ubuntu as part of an automation pipeline. We can also set up a full classroom environment from a single command using Digital Rebar’s providers, tenants and cluster definition JSON.

Let’s explore the classroom example:

First, we define a student cluster in JSON like the snippet below

{

"attribs": {

"k8s-version": "v1.6.0",

"k8s-kube_network_plugin": "calico",

"k8s-docker_version": "1.12"

"name": "cluster01",

"tenant": "cluster01",

"public_keys": {

"cluster01": "ssh-rsa AAAAB..... user@example.com"

"provider": {

"name": "google-provider"

"nodes": [

{

"roles": [ "etcd","k8s-addons", "k8s-master" ],

"count": 1

{

"roles": [ "k8s-worker" ],

"count": 3

}

]

}

Then we run the Digital Rebar workloads Multideploy.sh reference script which inspects the deployment files to pull out key information. Basically, it automates the following steps:

rebar provider create {“name”:“google-provider”, [secret stuff]}

rebar tenants create {“name”:“cluster01”}

rebar deployments create [contents from cluster01 file]

The deployments create command will automatically request nodes from the provider. Since we’re using tenants and SSH key additions, each student only gets access to their own cluster. When we’re done, adding the --destroy flag will reverse the process for the nodes and deployments but leave the providers and tenants.

We are invested in operational scripts like this example using Kargo and Digital Rebar because if we cannot manage variation in a consistent way then we’re doomed to operational fragmentation.

I am excited to see and be part of the community progress towards enterprise-ready Kubernetes operations on both cloud and on-premises. That means I am seeing reasonable patterns emerge with sharable/reusable automation. I strongly recommend watching (or better, collaborating in) these efforts if you are deploying Kubernetes even at experimental scale. Being part of the community requires more upfront effort but returns dividends as you get the benefits of shared experience and improvement.

When deploying at scale, how do you set up a system to be both repeatable and multi-platform without compromising scale or security?

With Kargo and Digital Rebar as a repeatable base, extensions get much faster and easier. Even better, using upstream directly allows improvements to be quickly cycled back into upstream. That means we’re closer to building a community focused on the operational side of Kubernetes with an SRE mindset.

If this is interesting, please engage with us in the Cluster Ops SIG, Kargo or Digital Rebar communities.

-- Rob Hirschfeld, co-founder of RackN and co-chair of the Cluster Ops SIG

Get involved with the Kubernetes project on GitHub

Post questions (or answer questions) on Stack Overflow

Connect with the community on Slack

Today’s post is by Jean-Mathieu Saponaro, Research & Analytics Engineer at Datadog, discussing what Kubernetes changes for monitoring, and how you can prepare to properly monitor a containerized infrastructure orchestrated by Kubernetes.

Container technologies are taking the infrastructure world by storm. While containers solve or simplify infrastructure management processes, they also introduce significant complexity in terms of orchestration. That’s where Kubernetes comes to our rescue. Just like a conductor directs an orchestra, Kubernetes oversees our ensemble of containers—starting, stopping, creating, and destroying them automatically to keep our applications humming along.

Kubernetes makes managing a containerized infrastructure much easier by creating levels of abstractions such as pods and services. We no longer have to worry about where applications are running or if they have enough resources to work properly. But that doesn’t change the fact that, in order to ensure good performance, we need to monitor our applications, the containers running them, and Kubernetes itself.

Rethinking monitoring for the Kubernetes era

Just as containers have completely transformed how we think about running services on virtual machines, Kubernetes has changed the way we interact with containers. The good news is that with proper monitoring, the abstraction levels inherent to Kubernetes provide a comprehensive view of your infrastructure, even if the containers and applications are constantly moving. But Kubernetes monitoring requires us to rethink and reorient our strategies, since it differs from monitoring traditional hosts such as VMs or physical machines in several ways.

Tags and labels become essential
With containers and their orchestration completely managed by Kubernetes, labels are now the only way we have to interact with pods and containers. That’s why they are absolutely crucial for monitoring since all metrics and events will be sliced and diced using labels across the different layers of your infrastructure. Defining your labels with a logical and easy-to-understand schema is essential so your metrics will be as useful as possible.

There are now more components to monitor

In traditional, host-centric infrastructure, we were used to monitoring only two layers: applications and the hosts running them. Now with containers in the middle and Kubernetes itself needing to be monitored, there are four different components to monitor and collect metrics from.

Applications are constantly moving
Kubernetes schedules applications dynamically based on scheduling policy, so you don’t always know where applications are running. But they still need to be monitored. That’s why using a monitoring system or tool with service discovery is a must. It will automatically adapt metric collection to moving containers so applications can be continuously monitored without interruption.

Be prepared for distributed clusters

Kubernetes has the ability to distribute containerized applications across multiple data centers and potentially different cloud providers. That means metrics must be collected and aggregated among all these different sources.

For more details about all these new monitoring challenges inherent to Kubernetes and how to overcome them, we recently published an in-depth Kubernetes monitoring guide. Part 1 of the series covers how to adapt your monitoring strategies to the Kubernetes era.

Metrics to monitor

Whether you use Heapster data or a monitoring tool integrating with Kubernetes and its different APIs, there are several key types of metrics that need to be closely tracked:

Running pods and their deployments
Usual resource metrics such as CPU, memory usage, and disk I/O
Container-native metrics
Application metrics for which a service discovery feature in your monitoring tool is essential

All these metrics should be aggregated using Kubernetes labels and correlated with events from Kubernetes and container technologies.

Part 2 of our series on Kubernetes monitoring guides you through all the data that needs to be collected and tracked.

Collecting these metrics

Whether you want to track these key performance metrics by combining Heapster, a storage backend, and a graphing tool, or by integrating a monitoring tool with the different components of your infrastructure, Part 3, about Kubernetes metric collection, has you covered.

Anchors aweigh!

Using Kubernetes drastically simplifies container management. But it requires us to rethink our monitoring strategies on several fronts, and to make sure all the key metrics from the different components are properly collected, aggregated, and tracked. We hope our monitoring guide will help you to effectively monitor your Kubernetes clusters. Feedback and suggestions are more than welcome.

--Jean-Mathieu Saponaro, Research & Analytics Engineer, Datadog

Get involved with the Kubernetes project on GitHub
Post questions (or answer questions) on Stack Overflow
Connect with the community on Slack
Follow us on Twitter @Kubernetesio for latest updates

Today’s post is by the Istio team showing how you can get visibility, resiliency, security and control for your microservices in Kubernetes.

Services are at the core of modern software architecture. Deploying a series of modular, small (micro-)services rather than big monoliths gives developers the flexibility to work in different languages, technologies and release cadence across the system; resulting in higher productivity and velocity, especially for larger teams.

With the adoption of microservices, however, new problems emerge due to the sheer number of services that exist in a larger system. Problems that had to be solved once for a monolith, like security, load balancing, monitoring, and rate limiting need to be handled for each service.

Kubernetes and Services

Kubernetes supports a microservices architecture through the Service construct. It allows developers to abstract away the functionality of a set of Pods, and expose it to other developers through a well-defined API. It allows adding a name to this level of abstraction and perform rudimentary L4 load balancing. But it doesn’t help with higher-level problems, such as L7 metrics, traffic splitting, rate limiting, circuit breaking, etc.

Istio, announced last week at GlueCon 2017, addresses these problems in a fundamental way through a service mesh framework. With Istio, developers can implement the core logic for the microservices, and let the framework take care of the rest – traffic management, discovery, service identity and security, and policy enforcement. Better yet, this can be also done for existing microservices without rewriting or recompiling any of their parts. Istio uses Envoy as its runtime proxy component and provides an extensible intermediation layer which allows global cross-cutting policy enforcement and telemetry collection.

The current release of Istio is targeted to Kubernetes users and is packaged in a way that you can install in a few lines and get visibility, resiliency, security and control for your microservices in Kubernetes out of the box.

In a series of blog posts, we'll look at a simple application that is composed of 4 separate microservices. We'll start by looking at how the application can be deployed using plain Kubernetes. We'll then deploy the exact same services into an Istio-enabled cluster without changing any of the application code -- and see how we can observe metrics.

In subsequent posts, we’ll focus on more advanced capabilities such as HTTP request routing, policy, identity and security management.

Example Application: BookInfo

We will use a simple application called BookInfo, that displays information, reviews and ratings for books in a store. The application is composed of four microservices written in different languages:

BookInfo-all (2).png

Since the container images for these microservices can all be found in Docker Hub, all we need to deploy this application in Kubernetes are the yaml configurations.

It’s worth noting that these services have no dependencies on Kubernetes and Istio, but make an interesting case study. Particularly, the multitude of services, languages and versions for the reviews service make it an interesting service mesh example. More information about this example can be found here.

Running the Bookinfo Application in Kubernetes

In this post we’ll focus on the v1 version of the app:

Deploying it with Kubernetes is straightforward, no different than deploying any other services. Service and Deployment resources for the productpage microservice looks like this:

apiVersion: v1

kind: Service

metadata:

labels:

app: productpage

spec:

type: NodePort

ports:

- port: 9080

selector:

app: productpage

---

apiVersion: extensions/v1beta1

kind: Deployment

metadata:

spec:

replicas: 1

template:

metadata:

labels:

app: productpage

track: stable

spec:

containers:

- name: productpage

image: istio/examples-bookinfo-productpage-v1

imagePullPolicy: IfNotPresent

ports:

- containerPort: 9080

The other two services that we will need to deploy if we want to run the app are details and reviews-v1. We don’t need to deploy the ratings service at this time because v1 of the reviews service doesn’t use it. The remaining services follow essentially the same pattern as productpage. The yaml files for all services can be found here.

To run the services as an ordinary Kubernetes app:

kubectl apply -f bookinfo-v1.yaml

To access the application from outside the cluster we’ll need the NodePort address of the productpage service:

export BOOKINFO_URL=$(kubectl get po -l app=productpage -o jsonpath={.items[0].status.hostIP}):$(kubectl get svc productpage -o jsonpath={.spec.ports[0].nodePort})

We can now point the browser to http://$BOOKINFO_URL/productpage, and see:

Running the Bookinfo Application with Istio

Now that we’ve seen the app, we’ll adjust our deployment slightly to make it work with Istio. We first need to install Istio in our cluster. To see all of the metrics and tracing features in action, we also install the optional Prometheus, Grafana, and Zipkin addons. We can now delete the previous app and start the Bookinfo app again using the exact same yaml file, this time with Istio:

kubectl delete -f bookinfo-v1.yaml

kubectl apply -f <(istioctl kube-inject -f bookinfo-v1.yaml)

Notice that this time we use the istioctl kube-inject command to modify bookinfo-v1.yaml before creating the deployments. It injects the Envoy sidecar into the Kubernetes pods as documented here. Consequently, all of the microservices are packaged with an Envoy sidecar that manages incoming and outgoing traffic for the service.

In the Istio service mesh we will not want to access the application productpage directly, as we did in plain Kubernetes. Instead, we want an Envoy sidecar in the request path so that we can use Istio’s management features (version routing, circuit breakers, policies, etc.) to control external calls to productpage, just like we can for internal requests. Istio’s Ingress controller is used for this purpose.

To use the Istio Ingress controller, we need to create a Kubernetes Ingress resource for the app, annotated with kubernetes.io/ingress.class: "istio", like this:

cat <<EOF | kubectl create -f -

apiVersion: extensions/v1beta1

kind: Ingress

metadata:

annotations:

kubernetes.io/ingress.class: "istio"

spec:

rules:

- http:

paths:

- path: /productpage

backend:

serviceName: productpage

servicePort: 9080

- path: /login

backend:

serviceName: productpage

servicePort: 9080

- path: /logout

backend:

serviceName: productpage

servicePort: 9080

EOF

The resulting deployment with Istio and v1 version of the bookinfo app looks like this:

BookInfo-v1-Istio (5).png

This time we will access the app using the NodePort address of the Istio Ingress controller:

export BOOKINFO_URL=$(kubectl get po -l istio=ingress -o jsonpath={.items[0].status.hostIP}):$(kubectl get svc istio-ingress -o jsonpath={.spec.ports[0].nodePort})

We can now load the page at http://$BOOKINFO_URL/productpage and once again see the running app -- there should be no difference from the previous deployment without Istio for the user.

However, now that the application is running in the Istio service mesh, we can immediately start to see some benefits.

Metrics collection

The first thing we get from Istio out-of-the-box is the collection of metrics in Prometheus. These metrics are generated by the Istio filter in Envoy, collected according to default rules (which can be customized), and then sent to Prometheus. The metrics can be visualized in the Istio dashboard in Grafana. Note that while Prometheus is the out-of-the-box default metrics backend, Istio allows you to plug in to others, as we’ll demonstrate in future blog posts.

To demonstrate, we'll start by running the following command to generate some load on the application:

wrk -t1 -c1 -d20s http://$BOOKINFO_URL/productpage

We obtain Grafana’s NodePort URL:

export GRAFANA_URL=$(kubectl get po -l app=grafana -o jsonpath={.items[0].status.hostIP}):$(kubectl get svc grafana -o jsonpath={.spec.ports[0].nodePort})

We can now open a browser at http://$GRAFANA_URL/dashboard/db/istio-dashboard and examine the various performance metrics for each of the Bookinfo services:

Distributed tracing The next thing we get from Istio is call tracing with Zipkin. We obtain its NodePort URL:

export ZIPKIN_URL=$(kubectl get po -l app=zipkin -o jsonpath={.items[0].status.hostIP}):$(kubectl get svc zipkin -o jsonpath={.spec.ports[0].nodePort})

We can now point a browser at http://$ZIPKIN_URL/ to see request trace spans through the Bookinfo services.

Although the Envoy proxies send trace spans to Zipkin out-of-the-box, to leverage its full potential, applications need to be Zipkin aware and forward some headers to tie the individual spans together. See zipkin-tracing for details.

Holistic view of the entire fleet The metrics that Istio provides are much more than just a convenience. They provide a consistent view of the service mesh, by generating uniform metrics throughout. We don’t have to worry about reconciling different types of metrics emitted by various runtime agents, or add arbitrary agents to gather metrics for legacy uninstrumented apps. We also no longer have to rely on the development process to properly instrument the application to generate metrics. The service mesh sees all the traffic, even into and out of legacy "black box" services, and generates metrics for all of it. Summary The demo above showed how in a few steps, we can launch Istio-backed services and observe L7 metrics on them. Over the next weeks we’ll follow on with demonstration of more Istio capabilities like policy management and HTTP request routing. Google, IBM and Lyft joined forces to create Istio based on our common experiences building and operating large and complex microservice deployments for internal and enterprise customers. Istio is an industry-wide community effort. We’ve been thrilled to see the enthusiasm from the industry partners and the insights they brought. As we take the next step and release Istio to the wild, we cannot wait to see what the broader community of contributors will bring to it. If you’re using or considering to use a microservices architecture on Kubernetes, we encourage you to give Istio a try, learn about it more at istio.io, let us know what you think, or better yet, join the developer community to help shape its future!

--On behalf of the Istio team. Frank Budinsky, Software Engineer at IBM, Andra Cismaru, Software Engineer and Israel Shalom, Product Manager at Google.

Get involved with the Kubernetes project on GitHub
Post questions (or answer questions) on Stack Overflow
Connect with the community on Slack
Follow us on Twitter @Kubernetesio for latest updates

Today's post is by Brendan Burns, Director of Engineering at Microsoft Azure and Kubernetes co-founder.

About a month ago Microsoft announced the acquisition of Deis to expand our expertise in containers and Kubernetes. Today, I’m excited to announce a new open source project derived from this newly expanded Azure team: Draft.

While by now the strengths of Kubernetes for deploying and managing applications at scale are well understood. The process of developing a new application for Kubernetes is still too hard. It’s harder still if you are new to containers, Kubernetes, or developing cloud applications.

Draft fills this role. As it’s name implies it is a tool that helps you begin that first draft of a containerized application running in Kubernetes. When you first run the draft tool, it automatically discovers the code that you are working on and builds out the scaffolding to support containerizing your application. Using heuristics and a variety of pre-defined project templates draft will create an initial Dockerfile to containerize your application, as well as a Helm Chart to enable your application to be deployed and maintained in a Kubernetes cluster. Teams can even bring their own draft project templates to customize the scaffolding that is built by the tool.

But the value of draft extends beyond simply scaffolding in some files to help you create your application. Draft also deploys a server into your existing Kubernetes cluster that is automatically kept in sync with the code on your laptop. Whenever you make changes to your application, the draft daemon on your laptop synchronizes that code with the draft server in Kubernetes and a new container is built and deployed automatically without any user action required. Draft enables the “inner loop” development experience for the cloud.

Of course, as is the expectation with all infrastructure software today, Draft is available as an open source project, and it itself is in “draft” form :) We eagerly invite the community to come and play around with draft today, we think it’s pretty awesome, even in this early form. But we’re especially excited to see how we can develop a community around draft to make it even more powerful for all developers of containerized applications on Kubernetes.

To give you a sense for what Draft can do, here is an example drawn from the Getting Started page in the GitHub repository.

There are multiple example applications included within the examples directory. For this walkthrough, we'll be using the python example application which uses Flask to provide a very simple Hello World webserver.

$ cd examples/python

Draft Create

We need some "scaffolding" to deploy our app into a Kubernetes cluster. Draft can create a Helm chart, a Dockerfile and a draft.toml with draft create:

$ draft create

--> Python app detected

--> Ready to sail

$ ls

Dockerfile app.py chart/ draft.toml requirements.txt

The chart/ and Dockerfile assets created by Draft default to a basic Python configuration. This Dockerfile harnesses the python:onbuild image, which will install the dependencies in requirements.txt and copy the current directory into /usr/src/app. And to align with the service values in chart/values.yaml, this Dockerfile exposes port 80 from the container.

The draft.toml file contains basic configuration about the application like the name, which namespace it will be deployed to, and whether to deploy the app automatically when local files change.

$ cat draft.toml
[environments]
[environments.development]
   name = "tufted-lamb"
   namespace = "default"
   watch = true
   watch_delay = 2

Draft Up

Now we're ready to deploy app.py to a Kubernetes cluster.

Draft handles these tasks with one draft up command:

reads configuration from draft.toml
compresses the chart/ directory and the application directory as two separate tarballs
uploads the tarballs to draftd, the server-side component
draftd then builds the docker image and pushes the image to a registry
draftd instructs helm to install the Helm chart, referencing the Docker registry image just built

With the watch option set to true, we can let this run in the background while we make changes later on…

$ draft up
--> Building Dockerfile
Step 1 : FROM python:onbuild
onbuild: Pulling from library/python
...
Successfully built 38f35b50162c
--> Pushing docker.io/microsoft/tufted-lamb:5a3c633ae76c9bdb81b55f5d4a783398bf00658e
The push refers to a repository [docker.io/microsoft/tufted-lamb]
...
5a3c633ae76c9bdb81b55f5d4a783398bf00658e: digest: sha256:9d9e9fdb8ee3139dd77a110fa2d2b87573c3ff5ec9c045db6009009d1c9ebf5b size: 16384
--> Deploying to Kubernetes
   Release "tufted-lamb" does not exist. Installing it now.
--> Status: DEPLOYED
--> Notes:
    1. Get the application URL by running these commands:
    NOTE: It may take a few minutes for the LoadBalancer IP to be available.
          You can watch the status of by running 'kubectl get svc -w tufted-lamb-tufted-lamb'
export SERVICE_IP=$(kubectl get svc --namespace default tufted-lamb-tufted-lamb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:80

Watching local files for changes...

Interact with the Deployed App

Using the handy output that follows successful deployment, we can now contact our app. Note that it may take a few minutes before the load balancer is provisioned by Kubernetes. Be patient!

$ export SERVICE_IP=$(kubectl get svc --namespace default tufted-lamb-tufted-lamb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ curl http://$SERVICE_IP

When we curl our app, we see our app in action! A beautiful "Hello World!" greets us.

Update the App

Now, let's change the "Hello, World!" output in app.py to output "Hello, Draft!" instead:

$ cat <<EOF > app.py
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
return "Hello, Draft!\n"

if __name__ == "__main__":
app.run(host='0.0.0.0', port=8080)
EOF

Draft Up(grade)

Now if we watch the terminal that we initially called draft up with, Draft will notice that there were changes made locally and call draft up again. Draft then determines that the Helm release already exists and will perform a helm upgrade rather than attempting another helm install:

--> Building Dockerfile
Step 1 : FROM python:onbuild
...
Successfully built 9c90b0445146
--> Pushing docker.io/microsoft/tufted-lamb:f031eb675112e2c942369a10815850a0b8bf190e
The push refers to a repository [docker.io/microsoft/tufted-lamb]
...
--> Deploying to Kubernetes
--> Status: DEPLOYED
--> Notes:
    1. Get the application URL by running these commands:
    NOTE: It may take a few minutes for the LoadBalancer IP to be available.
          You can watch the status of by running 'kubectl get svc -w tufted-lamb-tufted-lamb'
export SERVICE_IP=$(kubectl get svc --namespace default tufted-lamb-tufted-lamb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:80

Now when we run curl http://$SERVICE_IP, our first app has been deployed and updated to our Kubernetes cluster via Draft!

We hope this gives you a sense for everything that Draft can do to streamline development for Kubernetes. Happy drafting!

--Brendan Burns, Director of Engineering, Microsoft Azure

Post questions (or answer questions) on Stack Overflow

Join the community portal for advocates on K8sPort

Connect with the community on Slack

Get involved with the Kubernetes project on GitHub

Today we’re announcing Kubernetes 1.7, a milestone release that adds security, storage and extensibility features motivated by widespread production use of Kubernetes in the most demanding enterprise environments.

At-a-glance, security enhancements in this release include encrypted secrets, network policy for pod-to-pod communication, node authorizer to limit kubelet access and client / server TLS certificate rotation.

For those of you running scale-out databases on Kubernetes, this release has a major feature that adds automated updates to StatefulSets and enhances updates for DaemonSets. We are also announcing alpha support for local storage and a burst mode for scaling StatefulSets faster.

Also, for power users, API aggregation in this release allows user-provided apiservers to be served along with the rest of the Kubernetes API at runtime. Additional highlights include support for extensible admission controllers, pluggable cloud providers, and container runtime interface (CRI) enhancements.

What’s New
Security:

The Network Policy API is promoted to stable. Network policy, implemented through a network plug-in, allows users to set and enforce rules governing which pods can communicate with each other.
Node authorizer and admission control plugin are new additions that restrict kubelet’s access to secrets, pods and other objects based on its node.
Encryption for Secrets, and other resources in etcd, is now available as alpha.
Kubelet TLS bootstrapping now supports client and server certificate rotation.
Audit logs stored by the API server are now more customizable and extensible with support for event filtering and webhooks. They also provide richer data for system audit.

Stateful workloads:

StatefulSet Updates is a new beta feature in 1.7, allowing automated updates of stateful applications such as Kafka, Zookeeper and etcd, using a range of update strategies including rolling updates.
StatefulSets also now support faster scaling and startup for applications that do not require ordering through Pod Management Policy. This can be a major performance improvement.
Local Storage (alpha) was one of most frequently requested features for stateful applications. Users can now access local storage volumes through the standard PVC/PV interface and via StorageClasses in StatefulSets.
DaemonSets, which create one pod per node already have an update feature, and in 1.7 have added smart rollback and history capability.
A new StorageOS Volume plugin provides highly-available cluster-wide persistent volumes from local or attached node storage.

Extensibility:

API aggregation at runtime is the most powerful extensibility features in this release, allowing power users to add Kubernetes-style pre-built, 3rd party or user-created APIs to their cluster.
Container Runtime Interface (CRI) has been enhanced with New RPC calls to retrieve container metrics from the runtime. Validation tests for the CRI have been published and Alpha integration with containerd, which supports basic pod lifecycle and image management is now available. Read our previous in-depth post introducing CRI.

Additional Features:

Alpha support for external admission controllers is introduced, providing two options for adding custom business logic to the API server for modifying objects as they are created and validating policy.
Policy-based Federated Resource Placement is introduced as Alpha providing placement policies for the federated clusters, based on custom requirements such as regulation, pricing or performance.

Deprecation:

Third Party Resource (TPR) has been replaced with Custom Resource Definitions (CRD) which provides a cleaner API, and resolves issues and corner cases that were raised during the beta period of TPR. If you use the TPR beta feature, you are encouraged to migrate, as it is slated for removal by the community in Kubernetes 1.8.

The above are a subset of the feature highlights in Kubernetes 1.7. For a complete list please visit the release notes.

Adoption
This release is possible thanks to our vast and open community. Together, we’ve already pushed more than 50,000 commits in just three years, and that’s only in the main Kubernetes repo. Additional extensions to Kubernetes are contributed in associated repos bringing overall stability to the project. This velocity makes Kubernetes one of the fastest growing open source projects -- ever.

Kubernetes adoption has been coming from every sector across the world. Recent user stories from the community include:

GolfNow, a member of the NBC Sports Group, migrated their application to Kubernetes giving them better resource utilization and slashing their infrastructure costs in half.
Bitmovin, provider of video infrastructure solutions, showed us how they’re using Kubernetes to do multi-stage canary deployments in the cloud and on-prem.
Ocado, world’s largest online supermarket, uses Kubernetes to create a distributed data center for their smart warehouses. Read about their full setup here.
Is Kubernetes helping your team? Share your story with the community. See our growing resource of user case studies and learn from great companies like Box that have adopted Kubernetes in their organization.

Huge kudos and thanks go out to the Kubernetes 1.7 release team, led by Dawn Chen of Google.

Availability
Kubernetes 1.7 is available for download on GitHub. To get started with Kubernetes, try one of the these interactive tutorials.

Get Involved
Join the community at CloudNativeCon + KubeCon in Austin Dec. 6-8 for the largest Kubernetes gathering ever. Speaking submissions are open till August 21 and discounted registration ends October 6.

The simplest way to get involved is joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and these channels:

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Share your Kubernetes story.

Many thanks to our vast community of contributors and supporters in making this and all releases possible.

-- Aparna Sinha, Group Product Manager, Kubernetes Google and Ihor Dvoretskyi, Program Manager, Kubernetes Mirantis

Today’s post is by Sandhya Kapoor, Senior Technologist, Watson Platform for Health, IBM

For more than a year, Watson Platform for Health at IBM deployed healthcare applications in virtual machines on our cloud platform. Because virtual machines had been a costly, heavyweight solution for us, we were interested to evaluate Kubernetes for our deployments.

Our design was to set up the application and data containers in the same namespace, along with the required agents using sidecars, to meet security and compliance requirements in the healthcare industry.

I was able to run more processes on a single physical server than I could using a virtual machine. Also, running our applications in containers ensured optimal usage of system resources.

To orchestrate container deployment, we are usingArmada infrastructure, a Kubernetes implementation by IBM for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure.

With Kubernetes, our developers can rapidly develop highly available applications by leveraging the power and flexibility of containers, and with integrated and secure volume service, we can store persistent data, share data between Kubernetes pods, and restore data when needed.

Here is a snapshot of Watson Care Manager, running inside a Kubernetes cluster:

Before deploying an app, a user must create a worker node cluster. I can create a cluster using the kubectl cli commands or create it from a Bluemix dashboard.

Our clusters consist of one or more physical or virtual machines, also known as worker nodes, that are loosely coupled, extensible, and centrally monitored and managed by the Kubernetes master. When we deploy a containerized app, the Kubernetes master decides where to deploy the app, taking into consideration the deployment requirements and available capacity in the cluster.

A user makes a request to Kubernetes to deploy the containers, specifying the number of replicas required for high availability. The Kubernetes scheduler decides where the pods (groups of one or more containers) will be scheduled and which worker nodes they will be deployed on, storing this information internally in Kubernetes and etcd. The deployment of pods in worker nodes is updated based on load at runtime, optimizing the placement of pods in the cluster.

Kubelet running in each worker node regularly polls the kube API server. If there is new work to do, kubelet pulls the configuration information and takes action, for example, spinning off a new pod.

Process Flow:

UCD– IBM UrbanCode Deploy is a tool for automating application deployments through your environments. WH Cluster – Kubernetes worker node.

Usage of GitLab in the Process Flow:

We stored all our artifacts in GitLab, which includes the Docker files that are required for creating the image, YAML files needed to create a pod, and the configuration files to make the Healthcare application run.

GitLab and Jenkins interaction in the Process Flow:

We use Jenkins for continuous integration and build automation to create/pull/retag the Docker image and push the image to a Docker registry in the cloud.

Basically, we have a Jenkins job configured to interact with GitLab project to get the latest artifacts and, based on requirements, it will either create a new Docker image from scratch by pulling the needed intermediate images from Docker/Bluemix repository or update the Docker image.

After the image is created/updated the Jenkins job pushes the image to a Bluemix repository to save the latest image to be pulled by UrbanCode Deploy (UCD) component.

Jenkins and UCD interaction in the Process Flow:

The Jenkins job is configured to use the UCD component and its respective application, application process, and the UCD environment to deploy the application. The Docker image version files that will be used by the UCD component are also passed via Jenkins job to the UCD component.

Usage of UCD in the Process Flow:

UCD is used for deployment and the end-to end deployment process is automated here. UCD component process involves the following steps:

Download the required artifacts for deployment from the Gitlab.
Login to Bluemix and set the KUBECONFIG based on the Kubernetes cluster used for creating the pods.
Create the application pod in the cluster using kubectl create command.
If needed, run a rolling update to update the existing pod.

Deploying the application in Armada:

Provision a cluster in Armada with <x> worker nodes. Create Kubernetes controllers for deploying the containers in worker nodes, the Armada infrastructure pulls the Docker images from IBM Bluemix Docker registry to create containers. We tried deploying an application container and running a logmet agent (see Reading and displaying logs using logmet container, below) inside the containers that forwards the application logs to an IBM cloud logging service. As part of the process, YAML files are used to create a controller resource for the UrbanCode Deploy (UCD). UCD agent is deployed as a DaemonSet controller, which is used to connect to the UCD server. The whole process of deployment of application happens in UCD. To support the application for public access, we created a service resource to interact between pods and access container services. For storage support, we created persistent volume claims and mounted the volume for the containers.

UCD: IBM UrbanCode Deploy is a tool for automating application deployments through your environments. Armada: Kubernetes implementation of IBM. WH Docker Registry: Docker Private image registry. Common agent containers: We expect to configure our services to use the WHC mandatory agents. We deployed all ion containers.

Reading and displaying logs using logmet container:

Logmet is a cloud logging service that helps to collect, store, and analyze an application’s log data. It also aggregates application and environment logs for consolidated application or environment insights and forwards them. Metrics are transmitted with collectd. We chose a model that runs a logmet agent process inside the container. The agent takes care of forwarding the logs to the cloud logging service configured in containers.

The application pod mounts the application logging directory to the storage space, which is created by persistent volume claim, and stores the logs, which are not lost even when the pod dies. Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster.

Exposing services with Ingress:

Ingress controllers are reverse proxies that expose services outside cluster through URLs. They act as an external HTTP load balancer that uses a unique public entry point to route requests to the application.

To expose our services to outside the cluster, we used Ingress. In Armada, if we create a paid cluster, an Ingress controller is automatically installed for us to use. We were able to access services through Ingress by creating a YAML resource file that specifies the service path.

–Sandhya Kapoor, Senior Technologist, Watson Platform for Health, IBM

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Get involved with the Kubernetes project on GitHub

As we do every July, we’re excited to celebrate Kubernetes 2nd birthday! In the two years since GA 1.0 launched as an open source project, Kubernetes(abbreviated as K8s) has grown to become the highest velocity cloud-related project. With more than 2,611 diverse contributors, from independents to leading global companies, the project has had 50,685 commits in the last 12 months. Of the 54 million projects on GitHub, Kubernetes is in the top 5 for number of unique developers contributing code. It also has more pull requests and issue comments than any other project on GitHub.

Screen Shot 2017-07-18 at 9.39.42 AM.png

Figure 1: Kubernetes Rankings

At the center of the community are Special Interest Groups with members from different companies and organizations, all with a common interest in a specific topic. Given how fast Kubernetes is growing, SIGs help nurture and distribute leadership, while advancing new proposals, designs and release updates. Here's a look at the SIG building blocks supporting Kubernetes:

Screen Shot 2017-07-18 at 11.32.07 AM.png

Kubernetes has also earned the trust of many Fortune 500 companies with deployments at Box, Comcast, Pearson, GolfNow, eBay, Ancestry.com and contributions from CoreOS, Fujitsu, Google, Huawei, Mirantis, Red Hat, Weaveworks and ZTE Company and others. Today, on the second anniversary of the Kubernetes 1.0 launch, we take a look back at some of the major accomplishments of the last year:

July 2016

Kubernauts celebrated its first anniversary of the Kubernetes 1.0 launch with 20#k8sbdayparties hosted worldwide
Kubernetes v1.3 release

September 2016

Kubernetes v1.4 release
Launch of kubeadm, a tool that makes Kubernetes dramatically easier to install
Pokemon Go - one of the largest installs of Kubernetes ever

October 2016

Introduced Kubernetes service partners program and a redesigned partners page

November 2016

CloudNativeCon/KubeCon Seattle
Cloud Native Computing Foundation partners with The Linux Foundation to launch a new Kubernetes certification, training and managed service provider program

December 2016

Kubernetes v1.5 release

January 2017

Survey from CloudNativeCon + KubeCon Seattle showcases the maturation of Kubernetes deployment

March 2017

CloudNativeCon/KubeCon Europe
Kubernetes v1.6 release

April 2017

The Battery Open Source Software (BOSS) Index lists Kubernetes as #33 in the top 100 popular open-source software projects

May 2017

Four Kubernetes projects accepted to TheGoogle Summer of Code(GSOC) 2017 program
Stutterstock and Kubernetes appear in The Wall Street Journal: “On average we [Shutterstock] deploy 45 different releases into production a day using that framework. We use Docker, Kubernetes and Jenkins [to build and run containers and automate development,” said CTO Marty Brodbeck on the company’s IT overhaul and adoption of containerization.

June 2017

Kubernetes v1.7 release
Surveyfrom CloudNativeCon + KubeCon Europe shows Kubernetes leading as the orchestration platform of choice
Kubernetes ranked #4 in the 30 highest velocity open source projects

Figure 2: The 30 highest velocity open source projects. Source: https://github.com/cncf/velocity

July 2017

Kubernauts celebrate the second anniversary of the Kubernetes 1.0 launch with #k8sbdayparties worldwide!

At the one year anniversary of the Kubernetes 1.0 launch, there were 130 Kubernetes-related Meetup groups. Today, there are more than322 Meetup groupswith 104,195 members. Local Meetups around the world joined the#k8sbday celebration! Take a look at some of the pictures from their celebrations. We hope you’ll join us at CloudNativeCon + KubeCon, December 6- 8 in Austin, TX.

Celebrating at the K8s birthday party in San Francisco

Celebrating in RTP, NC with a presentation from Jason McGee, VP and CTO, IBM Cloud Platform. Photo courtesy of @FranklyBriana

The Kubernetes Singapore meetup celebrating with an intro to GKE. Photo courtesy of @hunternield

New York celebrated with mini k8s cupcakes and a presentation on the history of cloud native from CNCF Executive Director, Dan Kohn. Photo courtesy of @arieljatib and @coreos

Quebec City had custom k8s cupcakes too! Photo courtesy of @zig_max

Beijing celebrated with custom k8s lollipops. Photo courtesy of @maxwell9215

-- Sarah Novotny, Program Manager, Kubernetes Community

Inside JD.com's Shift to Kubernetes from OpenStack

Containers as a Service, the foundation for next generation PaaS

Deploying PostgreSQL Clusters using StatefulSets

The K8sPort: Engaging Kubernetes Community One Activity at a Time

Kubernetes 1.6: Multi-user, Multi-workloads at Scale

Five Days of Kubernetes 1.6

Dynamic Provisioning and Storage Classes in Kubernetes

Scalability updates in Kubernetes 1.6: 5,000 node and 150,000 pod clusters

Advanced Scheduling in Kubernetes

Configuring Private DNS Zones and Upstream Nameservers in Kubernetes

RBAC Support in Kubernetes

How Bitmovin is Doing Multi-Stage Canary Deployments with Kubernetes in the Cloud and On-Prem

Dancing at the Lip of a Volcano: The Kubernetes Security Process - Explained

Kargo Ansible Playbooks foster Collaborative Kubernetes Ops

Kubernetes: a monitoring guide

Managing microservices with the Istio service mesh

Draft: Kubernetes container development made easy

Kubernetes 1.7: Security Hardening, Stateful Application Updates and Extensibility

How Watson Health Cloud Deploys Applications with Kubernetes

Happy Second Birthday: A Kubernetes Retrospective