
Kompose Helps Developers Move Docker Compose Files to Kubernetes

Editor's note: today's post is by Charlie Drage, Software Engineer at Red Hat, giving an update on the Kubernetes project Kompose.

I'm pleased to announce that Kompose, a conversion tool for developers to transition Docker Compose applications to Kubernetes, has graduated from the Kubernetes Incubator to become an official part of the project.

Since our first commit on June 27, 2016, Kompose has achieved 13 releases over 851 commits, gaining 21 contributors since the inception of the project. Our work started at Skippbox (now part of Bitnami) and grew through contributions from Google and Red Hat.

The Kubernetes Incubator allowed contributors to get to know each other across companies, as well as collaborate effectively under guidance from Kubernetes contributors and maintainers. Our incubation led to the development and release of a new and useful tool for the Kubernetes ecosystem.

With Kompose, you can create a reliable, scalable Kubernetes environment from an initial Docker Compose file. We worked hard to convert as many keys as possible to their Kubernetes equivalents. Running a single command gets you up and running on Kubernetes: kompose up.

We couldn’t have done it without feedback and contributions from the community!

If you haven’t yet tried Kompose, check it out on GitHub!

Kubernetes guestbook

The go-to example for Kubernetes is the famous guestbook, which we use as a base for conversion.

Here is an example from the official kompose.io site, starting with a simple Docker Compose file.

First, we’ll retrieve the file:

$ wget https://raw.githubusercontent.com/kubernetes/kompose/master/examples/docker-compose.yaml
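
For reference, the Compose file for the guestbook contains three services and looks roughly like the abbreviated sketch below; the downloaded docker-compose.yaml is the authoritative version, and the images shown here are only illustrative.

version: "2"

services:

  redis-master:
    image: gcr.io/google_containers/redis:e2e        # illustrative image
    ports:
      - "6379"

  redis-slave:
    image: gcr.io/google_samples/gb-redisslave:v1    # illustrative image
    ports:
      - "6379"

  frontend:
    image: gcr.io/google-samples/gb-frontend:v4      # illustrative image
    ports:
      - "80:80"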


You can test it out by first deploying to Docker Compose:

$ docker-compose up -d
Creating network "examples_default" with the default driver
Creating examples_redis-slave_1
Creating examples_frontend_1
Creating examples_redis-master_1


And when you’re ready to deploy to Kubernetes:

$ kompose up

We are going to create Kubernetes Deployments, Services and PersistentVolumeClaims for your Dockerized application.

If you need different kind of resources, use the kompose convert and kubectl create -f commands instead.

INFO Successfully created Service: redis          
INFO Successfully created Service: web            
INFO Successfully created Deployment: redis       
INFO Successfully created Deployment: web         

Your application has been deployed to Kubernetes. You can run kubectl get deployment,svc,pods,pvc for details
Check out other examples of what Kompose can do.
Converting to alternative Kubernetes controllers

Kompose can also convert to specific Kubernetes controllers with the use of flags:
$ kompose convert --help
Usage:
 kompose convert [file] [flags]

Kubernetes Flags:
     --daemon-set               Generate a Kubernetes daemonset object
 -d, --deployment               Generate a Kubernetes deployment object
 -c, --chart                    Create a Helm chart for converted objects
     --replication-controller   Generate a Kubernetes replication controller object


For example, let’s convert our guestbook example to a DaemonSet:

$ kompose convert --daemon-set
INFO Kubernetes file "frontend-service.yaml" created
INFO Kubernetes file "redis-master-service.yaml" created
INFO Kubernetes file "redis-slave-service.yaml" created
INFO Kubernetes file "frontend-daemonset.yaml" created
INFO Kubernetes file "redis-master-daemonset.yaml" created
INFO Kubernetes file "redis-slave-daemonset.yaml" created
Key Kompose 1.0 features

With our graduation comes the release of Kompose 1.0.0. Here’s what’s new:
  • Docker Compose Version 3: Kompose now supports Docker Compose Version 3. New keys such as ‘deploy’ now convert to their Kubernetes equivalent.
  • Docker Push and Build Support: When you supply a ‘build’ key within your `docker-compose.yaml` file, Kompose will automatically build the image and push it to the respective Docker repository for Kubernetes to consume (see the sketch after this list).
  • New Keys: With the addition of version 3 support, new keys such as pid and deploy are supported. For full details on what Kompose supports, view our conversion document.
  • Bug Fixes: In every release we fix any bugs related to edge-cases when converting. This release fixes issues relating to converting volumes with ‘./’ in the target name.
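
To illustrate the build and push support mentioned above, here is a hedged sketch of a Compose service that uses a ‘build’ key; the build context and image name are hypothetical. Running kompose up against a file like this builds the image from ./web and pushes it to the named repository so the cluster can pull it.

version: "3"

services:

  web:
    build: ./web                         # hypothetical build context
    image: docker.io/myuser/web:latest   # hypothetical repository to push to
    ports:
      - "8080:8080"
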
What’s ahead?

As we continue development, we will strive to convert as many Docker Compose keys as possible for all future and current Docker Compose releases, converting each one to their Kubernetes equivalent. All future releases will be backwards-compatible.



--Charlie Drage, Software Engineer, Red Hat

  • Post questions (or answer questions) on Stack Overflow
  • Join the community portal for advocates on K8sPort
  • Follow us on Twitter @Kubernetesio for latest updates
  • Connect with the community on Slack
  • Get involved with the Kubernetes project on GitHub


High Performance Networking with EC2 Virtual Private Clouds


One of the most popular platforms for running Kubernetes is Amazon Web Services’ Elastic Compute Cloud (AWS EC2). With more than a decade of experience delivering IaaS, and expanding over time to include a rich set of services with easy to consume APIs, EC2 has captured developer mindshare and loyalty worldwide.

When it comes to networking, however, EC2 has some limits that hinder performance and make deploying Kubernetes clusters to production unnecessarily complex. The preview release of Romana v2.0, a network and security automation solution for Cloud Native applications, includes features that address some well known network issues when running Kubernetes in EC2.

Traditional VPC Networking Performance Roadblocks


A Kubernetes pod network is separate from an Amazon Virtual Private Cloud (VPC) instance network; consequently, off-instance pod traffic needs a route to the destination pods. Fortunately, VPCs support setting these routes. When building a cluster network with the kubenet plugin, whenever new nodes are added, the AWS cloud provider will automatically add a VPC route to the pods running on that node.

Using kubenet to set routes provides native VPC network performance and visibility. However, since kubenet does not support more advanced network functions like network policy for pod traffic isolation, many users choose to run a Container Network Interface (CNI) provider on the back end.

Before Romana v2.0, all CNI network providers required an overlay when used across Availability Zones (AZs), leaving CNI users who want to deploy HA clusters unable to get the performance of native VPC networking.

Even users who don’t need advanced networking encounter restrictions, since VPC route tables support a maximum of 50 entries, which limits the size of a cluster to 50 nodes (or less, if some VPC routes are needed for other purposes). Until Romana v2.0, users also needed to run an overlay network to get around this limit.

Whether you were interested in advanced networking for traffic isolation or running large production HA clusters (or both), you were unable to get the performance and visibility of native VPC networking.

Native VPC Networking Availability

                  Advanced Network Features     HA Production Deployment
                  Small         >50             Small         >50
 Single Zone      Native VPC    Native VPC      XXX           XXX
 Multi-zone       N/A           N/A             Native VPC    N/A
Before Romana v2.0, native VPC networking wasn’t available for HA clusters greater than 50 nodes and network policy required overlay across zones.

Kubernetes on Multi-Segment Networks

The way to avoid running out of VPC routes is to use them sparingly by making them forward pod traffic for multiple instances. From a networking perspective, what that means is that the VPC route needs to forward to a router, which can then forward traffic on to the final destination instance.

Romana is a CNI network provider that configures routes on the host to forward pod network traffic without an overlay. Since inter-node routes are installed on hosts, no VPC routes are necessary at all. However, when the VPC is split into subnets for an HA deployment across zones, VPC routes are necessary.

Fortunately, the inter-node routes on each host allow it to act as a network router and forward traffic arriving from another zone just as it would forward traffic from local pods. This makes any Kubernetes node configured by Romana able to accept inbound pod traffic from other zones and forward it to the proper destination node on the subnet.

Because of this local routing function, top-level routes to pods on other instances on the subnet can be aggregated, collapsing the total number of routes necessary to as few as one per subnet. To avoid using a single instance to forward all traffic, more routes can be used to spread traffic across multiple instances, up to the maximum number of available routes (i.e. equivalent to kubenet).
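
As a rough illustration of what an aggregated route looks like, the sketch below adds a single VPC route that sends an entire per-zone pod CIDR to one forwarding instance on that subnet. The route table ID, instance ID, and CIDR are hypothetical, and Romana automates this step; the command is shown only to make the mechanism concrete.

$ aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 10.112.0.0/16 \
    --instance-id i-0123456789abcdef0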

The net result is that you can now build clusters of any size across AZs without an overlay. Romana clusters also support network policies for better security through network isolation.

Making it All Work


While the combination of aggregated routes and node forwarding on a subnet eliminates overlays and avoids the VPC 50 route limitation, it imposes certain requirements on the CNI provider. For example, hosts should be configured with inter-node routes only to other nodes in the same zone on the local subnet. Traffic to all other hosts must use the default route off host, then use the (aggregated) VPC route to forward traffic out of the zone. Also: when adding a new host, in order to maintain aggregated VPC routes, the CNI plugin needs to use IP addresses for pods that are reachable on the new host.
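
A sketch of the resulting routing table on a node, with hypothetical addresses: pods hosted on a peer node in the same zone are reached through a direct host route, while pods in other zones fall through to the default route and are then carried by the aggregated VPC routes described above.

# Route to the pod CIDR of another node in the same zone and subnet:
$ ip route add 10.112.1.0/24 via 10.0.1.21 dev eth0

# Everything else, including pod CIDRs in other zones, follows the default route
# off the host, where the aggregated VPC routes take over:
$ ip route show
default via 10.0.1.1 dev eth0
10.112.1.0/24 via 10.0.1.21 dev eth0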

The latest release of Romana also addresses questions about how VPC routes are installed; what happens when a node that is forwarding traffic fails; how forwarding node failures are detected; and how routes get updated and the cluster recovers.

Romana v2.0 includes a new AWS route configuration function to set VPC routes. This is part of a new set of network advertising features that automate route configuration in L3 networks. Romana v2.0 includes topology-aware IP address management (IPAM) that enables VPC route aggregation to stay within the 50 route limit as described here, as well as new health checks to update VPC routes when a routing instance fails. For smaller clusters, Romana configures VPC routes as kubenet does, with a route to each instance, taking advantage of every available VPC route.


Native VPC Networking Everywhere


When using Romana v2.0, native VPC networking is now available for clusters of any size, with or without network policies and for HA production deployment split across multiple zones.

Native VPC Networking Availability

                  Advanced Network Features     HA Production Deployment
                  Small         >50             Small         >50
 Single Zone      Native VPC    Native VPC      XXX           XXX
 Multi-zone       Native VPC    Native VPC      Native VPC    Native VPC
With Romana v2.0, native VPC networking is available for HA clusters of any size, and network policy never requires an overlay.


The preview release of Romana v2.0 is available here. We welcome comments and feedback so we can make EC2 deployments of Kubernetes as fast and reliable as possible.

--Juergen Brendel and Chris Marino, co-founders of Pani Networks, sponsor of the Romana project


Kubernetes Meets High-Performance Computing

Editor's note: today's post is by Robert Lalonde, general manager at Univa, on supporting mixed HPC and containerized applications  
Anyone who has worked with Docker can appreciate the enormous gains in efficiency achievable with containers. While Kubernetes excels at orchestrating containers, high-performance computing (HPC) applications can be tricky to deploy on Kubernetes.
In this post, I discuss some of the challenges of running HPC workloads with Kubernetes, explain how organizations approach these challenges today, and suggest an approach for supporting mixed workloads on a shared Kubernetes cluster. We will also provide information and links to a case study on a customer, IHME, showing how Kubernetes is extended to service their HPC workloads seamlessly while retaining scalability and interfaces familiar to HPC users.

HPC workloads’ unique challenges

In Kubernetes, the base unit of scheduling is a Pod: one or more Docker containers scheduled to a cluster host. Kubernetes assumes that workloads are containers. While Kubernetes has the notion of Cron Jobs and Jobs that run to completion, applications deployed on Kubernetes are typically long-running services, like web servers, load balancers, or data stores. While these services are highly dynamic, with pods coming and going, they differ greatly from HPC application patterns.
Traditional HPC applications often exhibit different characteristics:
  • In financial or engineering simulations, a job may be comprised of tens of thousands of short-running tasks, demanding low-latency and high-throughput scheduling to complete a simulation in an acceptable amount of time.
  • A computational fluid dynamics (CFD) problem may execute in parallel across many hundred or even thousands of nodes using a message passing library to synchronize state. This requires specialized scheduling and job management features to allocate and launch such jobs and then to checkpoint, suspend/resume or backfill them.
  • Other HPC workloads may require specialized resources like GPUs or require access to limited software licenses. Organizations may enforce policies around what types of resources can be used by whom to ensure projects are adequately resourced and deadlines are met.
HPC workload schedulers have evolved to support exactly these kinds of workloads. Examples include Univa Grid Engine, IBM Spectrum LSF and Altair’s PBS Professional. Sites managing HPC workloads have come to rely on capabilities like array jobs, configurable pre-emption, user, group or project based quotas and a variety of other features.

Blurring the lines between containers and HPC

HPC users believe containers are valuable for the same reasons as other organizations. Packaging logic in a container to make it portable, insulated from environmental dependencies, and easily exchanged with other containers clearly has value. However, making the switch to containers can be difficult.
HPC workloads are often integrated at the command line level. Rather than requiring coding, jobs are submitted to queues via the command line as binaries or simple shell scripts that act as wrappers. There are literally hundreds of engineering, scientific and analytic applications used by HPC sites that take this approach and have mature and certified integrations with popular workload schedulers.
While the notion of packaging a workload into a Docker container, publishing it to a registry, and submitting a YAML description of the workload is second nature to users of Kubernetes, this is foreign to most HPC users. An analyst running models in R, MATLAB or Stata simply wants to submit their simulation quickly, monitor their execution, and get a result as quickly as possible.

Existing approaches

To deal with the challenges of migrating to containers, organizations running container and HPC workloads have several options:
  • Maintain separate infrastructures
For sites with sunk investments in HPC, this may be a preferred approach. Rather than disrupt existing environments, it may be easier to deploy new containerized applications on a separate cluster and leave the HPC environment alone. The challenge is that this comes at the cost of siloed clusters, increasing infrastructure and management cost.
  • Run containerized workloads under an existing HPC workload manager
For sites running traditional HPC workloads, another approach is to use existing job submission mechanisms to launch jobs that in turn instantiate Docker containers on one or more target hosts. Sites using this approach can introduce containerized workloads with minimal disruption to their environment. Leading HPC workload managers such as Univa Grid Engine Container Edition and IBM Spectrum LSF are adding native support for Docker containers. Shifter and Singularity are also important open source tools supporting this type of deployment. While this is a good solution for sites with simple requirements that want to stick with their HPC scheduler, such sites will not have access to native Kubernetes features, which may constrain flexibility in managing long-running services where Kubernetes excels.
  • Use native job scheduling features in Kubernetes
Sites less invested in existing HPC applications can use existing scheduling facilities in Kubernetes for jobs that run to completion. While this is an option, it may be impractical for many HPC users. HPC applications are often either optimized towards massive throughput or large scale parallelism. In both cases startup and teardown latencies have a discriminating impact. Latencies that appear to be acceptable for containerized microservices today would render such applications unable to scale to the required levels.
All of these solutions involve tradeoffs. The first option doesn’t allow resources to be shared (increasing costs) and the second and third options require customers to pick a single scheduler, constraining future flexibility.

Mixed workloads on Kubernetes

A better approach is to support HPC and container workloads natively in the same shared environment. Ideally, users should see the environment appropriate to their workload or workflow type.
One approach to supporting mixed workloads is to allow Kubernetes and the HPC workload manager to co-exist on the same cluster, throttling resources to avoid conflicts. While simple, this means that neither workload manager can fully utilize the cluster.
Another approach is to use a peer scheduler that coordinates with the Kubernetes scheduler. Navops Command by Univa is a solution that takes this approach, augmenting the functionality of the Kubernetes scheduler. Navops Command provides its own web interface and CLI and allows additional scheduling policies to be enabled on Kubernetes without impacting the operation of the Kubernetes scheduler and existing containerized applications. Navops Command plugs into the Kubernetes architecture via the 'schedulerName' attribute in the pod spec as a peer scheduler that workloads can choose to use instead of the Kubernetes stock scheduler, as shown below.
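
A minimal sketch of how a workload opts into a peer scheduler through the standard schedulerName field follows; the scheduler name, pod name, and image are illustrative assumptions, not the actual Navops Command configuration. Pods that leave the field at its default ("default-scheduler") continue to be placed by the stock Kubernetes scheduler.

apiVersion: v1
kind: Pod
metadata:
  name: hpc-task                           # hypothetical pod name
spec:
  schedulerName: navops                    # assumption: the name the peer scheduler registers under
  containers:
  - name: solver
    image: example.com/hpc/solver:latest   # hypothetical image
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
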
With this approach, Kubernetes acts as a resource manager, making resources available to a separate HPC scheduler. Cluster administrators can use a visual interface to allocate resources based on policy or simply drag sliders via a web UI to allocate different proportions of the Kubernetes environment to non-container (HPC) workloads, and native Kubernetes applications and services.
From a client perspective, the HPC scheduler runs as a service deployed in Kubernetes pods, operating just as it would on a bare metal cluster. Navops Command provides additional scheduling features including things like resource reservation, run-time quotas, workload preemption and more. This environment works equally well for on-premise, cloud-based or hybrid deployments.

Deploying mixed workloads at IHME

One client having success with mixed workloads is the Institute for Health Metrics & Evaluation (IHME), an independent health research center at the University of Washington. In support of their globally recognized Global Health Data Exchange (GHDx), IHME operates a significantly sized environment of 500 nodes and 20,000 cores running a mix of analytic, HPC, and container-based applications on Kubernetes. This case study describes IHME’s success hosting existing HPC workloads on a shared Kubernetes cluster using Navops Command.


For sites deploying new clusters that want access to the rich capabilities in Kubernetes but need the flexibility to run non-containerized workloads, this approach is worth a look. It offers the opportunity for sites to share infrastructure between Kubernetes and HPC workloads without disrupting existing applications and business processes. It also allows them to migrate their HPC workloads to use Docker containers at their own pace.

Windows Networking at Parity with Linux for Kubernetes

Editor's note: today's post is by Jason Messer, Principal PM Manager at Microsoft, on improvements to the Windows network stack to support the Kubernetes CNI model.


Since I last blogged about Kubernetes Networking for Windows four months ago, the Windows Core Networking team has made tremendous progress in both the platform and open source Kubernetes projects. With the updates, Windows is now on par with Linux in terms of networking. Customers can now deploy mixed-OS Kubernetes clusters in any environment, including Azure, on-premises, and 3rd-party cloud stacks, with the same network primitives and topologies supported on Linux, without any workarounds, “hacks”, or 3rd-party switch extensions.


"So what?", you may ask. There are multiple application and infrastructure-related reasons why these platform improvements make a substantial difference in the lives of developers and operations teams wanting to run Kubernetes.  Read on to learn more!

Tightly-Coupled Communication

These improvements enable tightly-coupled communication between multiple Windows Server containers (without Hyper-V isolation) within a single "Pod". Think of Pods as the scheduling unit for the Kubernetes cluster, inside of which one or more application containers are co-located and able to share storage and networking resources. All containers within a Pod share the same IP address and port range and are able to communicate with each other using localhost. This enables applications to easily leverage "helper" programs for tasks such as monitoring, configuration updates, log management, and proxies. Another way to think of a Pod is as a compute host with the app containers representing processes.
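
As a hedged sketch, not taken from the original post, a Pod with a Windows Server web container and a logging "helper" container sharing one network namespace might look like this; the names and images are illustrative, and the two containers share one IP and reach each other over localhost.

apiVersion: v1
kind: Pod
metadata:
  name: win-web-with-helper                # hypothetical pod name
spec:
  nodeSelector:
    beta.kubernetes.io/os: windows         # schedule onto a Windows node
  containers:
  - name: web
    image: microsoft/iis                   # illustrative Windows Server container image
    ports:
    - containerPort: 80
  - name: log-helper
    image: example.com/log-helper:latest   # hypothetical helper image
    # reaches the web container at http://localhost:80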

Simplified Network Topology

We also simplified the network topology on Windows nodes in a Kubernetes cluster by reducing the number of endpoints required per container (or more generally, per pod) to one. Previously, Windows containers (pods) running in a Kubernetes cluster required two endpoints - one for external (internet) communication and a second for intra-cluster communication between other nodes or pods in the cluster. This was due to the fact that external communication from containers attached to a host network with local scope (i.e. not publicly routable) required a NAT operation which could only be provided through the Windows NAT (WinNAT) component on the host. Intra-cluster communication required containers to be attached to a separate network with "global" (cluster-level) scope through a second endpoint. Recent platform improvements now enable NAT'ing to occur directly on a container endpoint, which is implemented with the Microsoft Virtual Filtering Platform (VFP) Hyper-V switch extension. Now, both external and intra-cluster traffic can flow through a single endpoint.

Load-Balancing using VFP in Windows kernel

Kubernetes worker nodes rely on the kube-proxy to load-balance ingress network traffic to Service IPs between pods in a cluster. Previous versions of Windows implemented the Kube-proxy's load-balancing through a user-space proxy. We recently added support for "Proxy mode: iptables" which is implemented using VFP in the Windows kernel so that any IP traffic can be load-balanced more efficiently by the Windows OS kernel. Users can also configure an external load balancer by specifying the externalIP parameter in a service definition.
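
For reference, a hedged sketch of a Service definition that uses the externalIPs field (the plural form the API actually exposes) to accept traffic on an external load balancer address; the service name, selector, and address are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: win-webserver              # hypothetical service name
spec:
  selector:
    app: win-webserver             # hypothetical pod label
  ports:
  - port: 80
    targetPort: 80
  externalIPs:
  - 10.240.0.100                   # illustrative external load balancer address
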
In addition to the aforementioned improvements, we have also added platform support for the following:
  • Support for DNS search suffixes per container / Pod (Docker improvement - removes additional work previously done by kube-proxy to append DNS suffixes)
  • [Platform Support] 5-tuple rules for creating ACLs (Looking for help from community to integrate this with support for K8s Network Policy)


Now that Windows Server has joined the Windows Insider Program, customers and partners can take advantage of these new platform features today, ahead of the eagerly anticipated feature release later this year and the new builds that follow every six months. The latest Windows Server insider build now includes support for all of these platform improvements.


In addition to the platform improvements  for Windows, the team submitted code (PRs) for CNI, kubelet, and kube-proxy with the goal of mainlining Windows support into the Kubernetes v1.8 release. These PRs remove previous work-arounds required on Windows for items such as user-mode proxy for internal load balancing, appending additional DNS suffixes to each Kube-DNS request, and a separate container endpoint for external (internet) connectivity.


These new platform features and work on kubelet and kube-proxy align with the CNI network model used by Kubernetes on Linux and simplify the deployment of a K8s cluster without additional configuration or custom (Azure) resource templates. To this end, we completed work on CNI network and IPAM plugins to create/remove endpoints and manage IP addresses. The CNI plugin works through kubelet to target the Windows Host Networking Service (HNS) APIs to create an 'l2bridge' network (analogous to macvlan on Linux) which is enforced by the VFP switch extension.


The 'l2bridge' network driver re-writes the MAC address of container network traffic on ingress and egress to use the container host's MAC address. This obviates the need for multiple MAC addresses (one per container running on the host) to be "learned" by the upstream network switch port to which the container host is connected. This preserves memory space in physical switch TCAM tables and relies on the Hyper-V virtual switch to do MAC address translation in the host to forward traffic to the correct container. IP addresses are managed by a default, Windows IPAM plug-in which requires that POD CIDR IPs be taken from the container host's network IP space.


The team demoed (link to video) these new platform features and open-source updates to the SIG-Windows group on 8/8. We are working with the community to merge the kubelet and kube-proxy PRs to mainline these changes in time for the Kubernetes v1.8 release due out this September. These capabilities can then be used on current Windows Server insider builds and on Windows Server, version 1709.


Soon after RTM, we will also introduce these improvements into the Azure Container Service (ACS) so that Windows worker nodes and the containers they host are first-class Azure VNet citizens. An Azure IPAM plugin for Windows CNI will enable these endpoints to attach directly to Azure VNets, with network policies for Windows containers enforced the same way as for VMs.


Feature comparison of Windows Server 2016 (in-market), the next Windows Server feature release (Semi-Annual Channel), and Linux:
  • Multiple containers per Pod with a shared network namespace (compartment): Windows Server 2016 supports only one container per Pod; the next feature release and Linux both support this.
  • Single (shared) endpoint per Pod: Windows Server 2016 requires two endpoints, WinNAT (external) plus Transparent (intra-cluster); the next feature release and Linux use a single endpoint.
  • Kernel-mode load balancing: Windows Server 2016 provides only user-mode load balancing; kernel-mode load balancing arrives in the next feature release.
  • Support for DNS search suffixes per Pod (Docker update): on Windows Server 2016, kube-proxy added multiple DNS suffixes to each request; the next feature release and Linux support this natively.
  • CNI plugin support: not supported on Windows Server 2016; supported in the next feature release and on Linux.

The Kubernetes SIG Windows group meets bi-weekly on Tuesdays at 12:30 PM ET. To join or view notes from previous meetings, check out this document.

Introducing the Resource Management Working Group

Editor's note: today's post is by Jeremy Eder, Senior Principal Software Engineer at Red Hat, on the formation of the Resource Management Working Group


Why are we here?

Kubernetes has evolved to support diverse and increasingly complex classes of applications. We can onboard and scale out modern, cloud-native web applications based on microservices, batch jobs, and stateful applications with persistent storage requirements.

However, there are still opportunities to improve Kubernetes; for example, the ability to run workloads that require specialized hardware or those that perform measurably better when hardware topology is taken into account. These gaps can make it difficult for some application classes (particularly in established verticals) to adopt Kubernetes.

We see an unprecedented opportunity here, with a high cost if it’s missed. The Kubernetes ecosystem must create a consumable path forward to the next generation of system architectures by catering to the needs of as-yet unserviced workloads in meaningful ways. The Resource Management Working Group, along with other SIGs, must demonstrate the vision customers want to see, while enabling solutions to run well in a fully integrated, thoughtfully planned end-to-end stack.
 
Kubernetes Working Groups are created when a particular challenge requires cross-SIG collaboration. The Resource Management Working Group, for example, works primarily with sig-node and sig-scheduling to drive support for additional resource management capabilities in Kubernetes. We make sure that key contributors from across SIGs are frequently consulted because working groups are not meant to make system-level decisions on behalf of any SIG.
 
An example and key benefit of this is the working group’s relationship with sig-node.  We were able to ensure completion of several releases of node reliability work (complete in 1.6) before contemplating feature design on top. Those designs are use-case driven: research into technical requirements for a variety of workloads, then sorting based on measurable impact to the largest cross-section.


Target Workloads and Use-cases

One of the working group’s key design tenets is that user experience must remain clean and portable, while still surfacing infrastructure capabilities that are required by businesses and applications.
 
While not representing any commitment, we hope in the fullness of time that Kubernetes can optimally run financial services workloads, machine learning/training, grid schedulers, map-reduce, animation workloads, and more. As a use-case driven group, we account for potential application integration that can help an ecosystem of complementary independent software vendors flourish on top of Kubernetes.




Why do this?

Kubernetes covers generic web hosting capabilities very well, so why go through the effort of expanding workload coverage for Kubernetes at all? The fact is that the workloads elegantly covered by Kubernetes today represent only a fraction of the world’s compute usage. We have a tremendous opportunity to safely and methodically expand the set of workloads that can run optimally on Kubernetes.

To date, there’s demonstrable progress in the areas of expanded workload coverage:
  • Stateful applications such as Zookeeper, etcd, MySQL, Cassandra, ElasticSearch
  • Jobs, such as timed events to process the day’s logs or any other batch processing
  • Machine Learning and compute-bound workload acceleration through Alpha GPU support
Collectively, the folks working on Kubernetes are hearing from their customers that we need to go further. Following the tremendous popularity of containers in 2014, industry rhetoric circled around a more modern, container-based, datacenter-level workload orchestrator as folks looked to plan their next architectures.

As a consequence, we began advocating for increasing the scope of workloads covered by Kubernetes, from overall concepts to specific features. Our aim is to put control and choice in users’ hands, helping them move with confidence towards whatever infrastructure strategy they choose. In this advocacy, we quickly found a large group of like-minded companies interested in broadening the types of workloads that Kubernetes can orchestrate. And thus the working group was born.


Genesis of the Resource Management Working Group

After extensive development/feature discussions during the Kubernetes Developer Summit 2016 after CloudNativeCon | KubeCon Seattle, we decided to formalize our loosely organized group. In January 2017, the Kubernetes Resource Management Working Group was formed. This group (led by Derek Carr from Red Hat and Vishnu Kannan from Google) was originally cast as a temporary initiative to provide guidance back to sig-node and sig-scheduling (primarily). However, due to the cross-cutting nature of the goals within the working group, and the depth of roadmap quickly uncovered, the Resource Management Working Group became its own entity within the first few months.

Recently, Brian Grant from Google (@bgrant0607) posted the following image on his Twitter feed. This image helps to explain the role of each SIG, and shows where the Resource Management Working Group fits into the overall project organization.

[Image: diagram of the role of each Kubernetes SIG and where the Resource Management Working Group fits into the overall project organization]

To help bootstrap this effort, the Resource Management Working Group had its first face-to-face kickoff meeting in May 2017. Thanks to Google for hosting!

[Photo: Resource Management Working Group face-to-face kickoff meeting, May 2017]

Folks from Intel, NVIDIA, Google, IBM, Red Hat, and Microsoft (among others) participated.
You can read the outcomes of that 3-day meeting here.

The group’s prioritized list of features for increasing workload coverage on Kubernetes enumerated in the charter of the Resource Management Working group includes:
  • Support for performance sensitive workloads (exclusive cores, cpu pinning strategies, NUMA)
  • Integrating new hardware devices (GPUs, FPGAs, Infiniband, etc.)
  • Improving resource isolation (local storage, hugepages, caches, etc.)
  • Improving Quality of Service (performance SLOs)
  • Performance benchmarking
  • APIs and extensions related to the features mentioned above
The discussions made it clear that there was tremendous overlap between needs for various workloads, and that we ought to de-duplicate requirements, and plumb generically.


Workload Characteristics

The set of initially targeted use-cases share one or more of the following characteristics:
  • Deterministic performance (address long tail latencies)
  • Isolation within a single node, as well as within groups of nodes sharing a control plane
  • Requirements on advanced hardware and/or software capabilities
  • Predictable, reproducible placement: applications need granular guarantees around placement 
The Resource Management Working Group is spearheading the feature design and development in support of these workload requirements. Our goal is to provide best practices and patterns for these scenarios.


Initial Scope

In the months leading up to our recent face-to-face, we had discussed how to safely abstract resources in a way that retains portability and clean user experience, while still meeting application requirements. The working group came away with a multi-release roadmap that included 4 short- to mid-term targets with great overlap between target workloads:
  • Device Manager (Plugin) Proposal
    • Kubernetes should provide access to hardware devices such as NICs, GPUs, FPGA, Infiniband and so on.
  • CPU Manager
    • Kubernetes should provide a way for users to request static CPU assignment via the Guaranteed QoS tier. No support for NUMA in this phase.
  • HugePages support in Kubernetes
    • Kubernetes should provide a way for users to consume huge pages of any size (see the sketch after this list).
  • Resource Class proposal
    • Kubernetes should implement an abstraction layer (analogous to StorageClasses) for devices other than CPU and memory that allows a user to consume a resource in a portable way. For example, how can a pod request a GPU that has a minimum amount of memory?
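
As one concrete illustration of the HugePages item above, the sketch below shows the general shape this support later took in Kubernetes: a pod requests huge pages as a first-class resource and mounts them through a HugePages-backed emptyDir volume. The pod name and sizes are illustrative.

apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo            # hypothetical pod name
spec:
  containers:
  - name: example
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi      # fifty 2 MiB huge pages
        memory: 100Mi
      requests:
        memory: 100Mi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages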


Getting Involved & Summary

Our charter document includes a Contact Us section with links to our mailing list, Slack channel, and Zoom meetings. Recordings of previous meetings are uploaded to Youtube. We plan to discuss these topics and more at the 2017 Kubernetes Developer Summit at CloudNativeCon | KubeCon in Austin. Please come and join one of our meetings (users, customers, software and hardware vendors are all welcome) and contribute to the working group!

Kubernetes StatefulSets & DaemonSets Updates

Editor's note: today's post is by Janet Kuo and Kenneth Owens, Software Engineers at Google.

This post talks about recent updates to the DaemonSet and StatefulSet API objects for Kubernetes. We explore these features using Apache ZooKeeper and Apache Kafka StatefulSets and a Prometheus node exporter DaemonSet.

In Kubernetes 1.6, we added the RollingUpdate update strategy to the DaemonSet API Object. Configuring your DaemonSets with the RollingUpdate strategy causes the DaemonSet controller to perform automated rolling updates to the Pods in your DaemonSets when their spec.template is updated.

In Kubernetes 1.7, we enhanced the DaemonSet controller to track a history of revisions to the PodTemplateSpecs of DaemonSets. This allows the DaemonSet controller to roll back an update. We also added the RollingUpdate strategy to the StatefulSet API Object, and implemented revision history tracking for the StatefulSet controller. Additionally, we added the Parallel pod management policy to support stateful applications that require Pods with unique identities but not ordered Pod creation and termination.

StatefulSet rolling update and Pod management policy

First, we’re going to demonstrate how to use StatefulSet rolling updates and Pod management policies by deploying a ZooKeeper ensemble and a Kafka cluster.

Prerequisites

To follow along, you’ll need to set up a Kubernetes 1.7 cluster with at least 3 schedulable nodes. Each node needs 1 CPU and 2 GiB of memory available. You will also need either a dynamic provisioner to allow the StatefulSet controller to provision 6 persistent volumes (PVs) with 10 GiB each, or you will need to manually provision the PVs prior to deploying the ZooKeeper ensemble or deploying the Kafka cluster.

Deploying a ZooKeeper ensemble

Apache ZooKeeper is a strongly consistent, distributed system used by other distributed systems for cluster coordination and configuration management.

Note: You can create a ZooKeeper ensemble using this zookeeper_mini.yaml manifest. You can learn more about running a ZooKeeper ensemble on Kubernetes here, as well as a more in-depth explanation of the manifest and its contents.

When you apply the manifest, you will see output like the following.

$ kubectl apply -f zookeeper_mini.yaml
service "zk-hs" created
service "zk-cs" created
poddisruptionbudget "zk-pdb" created
statefulset "zk" created

The manifest creates an ensemble of three ZooKeeper servers using a StatefulSet, zk; a Headless Service, zk-hs, to control the domain of the ensemble; a Service, zk-cs, that clients can use to connect to the ready ZooKeeper instances; and a PodDisruptionBudget, zk-pdb, that allows for one planned disruption. (Note that while this ensemble is suitable for demonstration purposes, it isn’t sized correctly for production use.)

If you use kubectl get to watch Pod creation in another terminal you will see that, in contrast to the OrderedReady strategy (the default policy that implements the full version of the StatefulSet guarantees), all of the Pods in the zk StatefulSet are created in parallel.

$ kubectl get po -lapp=zk -w
NAME      READY     STATUS     RESTARTS   AGE
zk-0      0/1       Pending    0          0s
zk-0      0/1       Pending   0          0s
zk-1      0/1       Pending   0          0s
zk-1      0/1       Pending   0          0s
zk-0      0/1       ContainerCreating    0          0s
zk-2      0/1       Pending    0          0s
zk-1      0/1       ContainerCreating   0          0s
zk-2      0/1       Pending    0          0s
zk-2      0/1       ContainerCreating    0          0s
zk-0      0/1       Running   0          10s
zk-2      0/1       Running   0          11s
zk-1      0/1       Running    0          19s
zk-0      1/1       Running    0          20s
zk-1      1/1       Running    0          30s
zk-2      1/1       Running    0          30s

This is because the zookeeper_mini.yaml manifest sets the podManagementPolicy of the StatefulSet to Parallel.

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
 name: zk
spec:
 serviceName: zk-hs
 replicas: 3
 updateStrategy:
   type: RollingUpdate
 podManagementPolicy: Parallel
...

Many distributed systems, like ZooKeeper, do not require ordered creation and termination for their processes. You can use the Parallel Pod management policy to accelerate the creation and deletion of StatefulSets that manage these systems. Note that, when Parallel Pod management is used, the StatefulSet controller will not block when it fails to create a Pod. Ordered, sequential Pod creation and termination is performed when a StatefulSet’s podManagementPolicy is set to  OrderedReady.

Deploying a Kafka Cluster

Apache Kafka is a popular distributed streaming platform. Kafka producers write data to partitioned topics which are stored, with a configurable replication factor, on a cluster of brokers. Consumers consume the produced data from the partitions stored on the brokers.

Note: Details of the manifest’s contents can be found here. You can learn more about running a Kafka cluster on Kubernetes here.

To create a cluster, you only need to download and apply the kafka_mini.yaml manifest. When you apply the manifest, you will see output like the following:

$ kubectl apply -f kafka_mini.yaml
service "kafka-hs" created
poddisruptionbudget "kafka-pdb" created
statefulset "kafka" created

The manifest creates a three-broker cluster using the kafka StatefulSet; a Headless Service, kafka-hs, to control the domain of the brokers; and a PodDisruptionBudget, kafka-pdb, that allows for one planned disruption. The brokers are configured to use the ZooKeeper ensemble we created above by connecting through the zk-cs Service. As with the ZooKeeper ensemble deployed above, this Kafka cluster is fine for demonstration purposes, but it’s probably not sized correctly for production use.

If you watch Pod creation, you will notice that, like the ZooKeeper ensemble created above, the Kafka cluster uses the Parallel podManagementPolicy.

$ kubectl get po -lapp=kafka -w
NAME      READY     STATUS     RESTARTS   AGE
kafka-0   0/1       Pending    0          0s
kafka-0   0/1       Pending    0          0s
kafka-1   0/1       Pending    0          0s
kafka-1   0/1       Pending    0          0s
kafka-2   0/1       Pending    0          0s
kafka-0   0/1       ContainerCreating   0          0s
kafka-2   0/1       Pending    0          0s
kafka-1   0/1       ContainerCreating   0          0s
kafka-1   0/1       Running   0          11s
kafka-0   0/1       Running   0          19s
kafka-1   1/1       Running   0          23s
kafka-0   1/1       Running   0          32s

Producing and consuming data

You can use kubectl run to execute the kafka-topics.sh script to create a topic named test.

$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 createtopic --restart=Never --rm -- kafka-topics.sh --create \
> --topic test \
> --zookeeper zk-cs.default.svc.cluster.local:2181 \
> --partitions 1 \
> --replication-factor 3

Now you can use kubectl run to execute the kafka-console-consumer.sh command to listen for messages.

$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 consume --restart=Never --rm -- kafka-console-consumer.sh --topic test --bootstrap-server kafka-0.kafka-hs.default.svc.cluster.local:9093

In another terminal, you can run the kafka-console-producer.sh command.

$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 produce --restart=Never --rm \
>  -- kafka-console-producer.sh --topic test --broker-list kafka-0.kafka-hs.default.svc.cluster.local:9093,kafka-1.kafka-hs.default.svc.cluster.local:9093,kafka-2.kafka-hs.default.svc.cluster.local:9093

Output from the second terminal appears in the first terminal. If you continue to produce and consume messages while updating the cluster, you will notice that no messages are lost. You may see error messages as the leader for the partition changes when individual brokers are updated, but the client retries until the message is committed. This is due to the ordered, sequential nature of StatefulSet rolling updates which we will explore further in the next section.

Updating the Kafka cluster
StatefulSet updates are like DaemonSet updates in that they are both configured by setting the spec.updateStrategy of the corresponding API object. When the update strategy is set to OnDelete, the respective controllers will only create new Pods when a Pod in the StatefulSet or DaemonSet has been deleted. When the update strategy is set to RollingUpdate, the controllers will delete and recreate Pods when a modification is made to the spec.template field of a DaemonSet or StatefulSet. You can use rolling updates to change the configuration (via environment variables or command line parameters), resource requests, resource limits, container images, labels, and/or annotations of the Pods in a StatefulSet or DaemonSet. Note that all updates are destructive, always requiring that each Pod in the DaemonSet or StatefulSet be destroyed and recreated. StatefulSet rolling updates differ from DaemonSet rolling updates in that Pod termination and creation is ordered and sequential.
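
For reference, both strategies are set on the same field of the spec; a minimal sketch for a StatefulSet follows (the DaemonSet field is identical).

# Automated rolling updates whenever spec.template changes:
spec:
  updateStrategy:
    type: RollingUpdate

# Replace Pods only after they are manually deleted:
spec:
  updateStrategy:
    type: OnDelete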

You can patch the kafka StatefulSet to reduce the CPU resource request to 250m.

$ kubectl patch sts kafka --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"250m"}]'
statefulset "kafka" patched

If you watch the status of the Pods in the StatefulSet, you will see that each Pod is deleted and recreated in reverse ordinal order (starting with the Pod with the largest ordinal and progressing to the smallest). The controller waits for each updated Pod to be running and ready before updating the subsequent Pod.

$ kubectl get po -lapp=kafka -w
NAME      READY     STATUS    RESTARTS   AGE
kafka-0   1/1       Running   0          13m
kafka-1   1/1       Running   0          13m
kafka-2   1/1       Running   0          13m
kafka-2   1/1       Terminating   0         14m
kafka-2   0/1       Terminating   0         14m
kafka-2   0/1       Terminating   0         14m
kafka-2   0/1       Terminating   0         14m
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       ContainerCreating   0         0s
kafka-2   0/1       Running   0         10s
kafka-2   1/1       Running   0         21s
kafka-1   1/1       Terminating   0         14m
kafka-1   0/1       Terminating   0         14m
kafka-1   0/1       Terminating   0         14m
kafka-1   0/1       Terminating   0         14m
kafka-1   0/1       Pending   0         0s
kafka-1   0/1       Pending   0         0s
kafka-1   0/1       ContainerCreating   0         0s
kafka-1   0/1       Running   0         11s
kafka-1   1/1       Running   0         21s
kafka-0   1/1       Terminating   0         14m
kafka-0   0/1       Terminating   0         14m
kafka-0   0/1       Terminating   0         14m
kafka-0   0/1       Terminating   0         14m
kafka-0   0/1       Pending   0         0s
kafka-0   0/1       Pending   0         0s
kafka-0   0/1       ContainerCreating   0         0s
kafka-0   0/1       Running   0         10s
kafka-0   1/1       Running   0         22s

Note that unplanned disruptions will not lead to unintentional updates during the update process. That is, the StatefulSet controller will always recreate the Pod at the correct version to ensure the ordering of the update is preserved. If a Pod is deleted, and if it has already been updated, it will be created from  the updated version of the StatefulSet’s spec.template. If the Pod has not already been updated, it will be created from the previous version of the StatefulSet’s spec.template. We will explore this further in the following sections.

Staging an update

Depending on how your organization handles deployments and configuration modifications, you may want or need to stage updates to a StatefulSet prior to allowing the roll out to progress. You can accomplish this by setting a partition for the RollingUpdate. When the StatefulSet controller detects a partition in the updateStrategy of a StatefulSet, it will only apply the updated version of the StatefulSet’s spec.template to Pods whose ordinal is greater than or equal to the value of the partition.

You can patch the kafka StatefulSet to add a partition to the RollingUpdate update strategy. If you set the partition to a number greater than or equal to the StatefulSet’s spec.replicas (as below), any subsequent updates you perform to the StatefulSet’s spec.template will be staged for roll out, but the StatefulSet controller will not start a rolling update.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
statefulset "kafka" patched

If you patch the StatefulSet to set the requested CPU to 0.3, you will notice that none of the Pods are updated.

$ kubectl patch sts kafka --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"0.3"}]'
statefulset "kafka" patched

Even if you delete a Pod and wait for the StatefulSet controller to recreate it, you will notice that the Pod is recreated with the current CPU request.

$  kubectl delete po kafka-1
pod "kafka-1" deleted

$ kubectl get po kafka-1 -w
NAME      READY     STATUS              RESTARTS   AGE
kafka-1   0/1       ContainerCreating   0          10s
kafka-1   0/1       Running   0         19s
kafka-1   1/1       Running   0         21s

$ kubectl get po kafka-1 -o yaml
apiVersion: v1
kind: Pod
metadata:
 ...
   resources:
     requests:
       cpu: 250m
       memory: 1Gi

Rolling out a canary

Often, we want to verify an image update or configuration change on a single instance of an application before rolling it out globally. If you modify the partition created above to be 2, the StatefulSet controller will roll out a canary that can be used to verify that the update is working as intended.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
statefulset "kafka" patched

You can watch the StatefulSet controller update the kafka-2 Pod and pause after the update is complete.

$  kubectl get po -lapp=kafka -w
NAME      READY     STATUS    RESTARTS   AGE
kafka-0   1/1       Running   0          50m
kafka-1   1/1       Running   0          10m
kafka-2   1/1       Running   0          29s
kafka-2   1/1       Terminating   0         34s
kafka-2   0/1       Terminating   0         38s
kafka-2   0/1       Terminating   0         39s
kafka-2   0/1       Terminating   0         39s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Terminating   0         20s
kafka-2   0/1       Terminating   0         20s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       ContainerCreating   0         0s
kafka-2   0/1       Running   0         19s
kafka-2   1/1       Running   0         22s

Phased roll outs

Similar to rolling out a canary, you can roll out updates based on a phased progression (e.g. linear, geometric, or exponential roll outs).

If you patch the kafka StatefulSet to set the partition to 1, the StatefulSet controller updates one more broker.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}'
statefulset "kafka" patched

If you set it to 0, the StatefulSet controller updates the final broker and completes the update.

$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'
statefulset "kafka" patched

Note that you don’t have to decrement the partition by one. For a larger StatefulSet--for example, one with 100 replicas--you might use a progression more like 100, 99, 90, 50, 0. In this case, you would stage your update, deploy a canary, roll out to 10 instances, update fifty percent of the Pods, and then complete the update.

Cleaning up

To delete the API Objects created above, you can use kubectl delete on the two manifests you used to create the ZooKeeper ensemble and the Kafka cluster.

$ kubectl delete -f kafka_mini.yaml
service "kafka-hs" deleted
poddisruptionbudget "kafka-pdb" deleted
statefulset "kafka" deleted

$ kubectl delete -f zookeeper_mini.yaml
service "zk-hs" deleted
service "zk-cs" deleted
poddisruptionbudget "zk-pdb" deleted
statefulset "zk" deleted

By design, the StatefulSet controller does not delete any persistent volume claims (PVCs): the PVCs created for the ZooKeeper ensemble and the Kafka cluster must be manually deleted. Depending on the storage reclamation policy of your cluster, you may also need to manually delete the backing PVs.
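
PVCs created from a StatefulSet’s volumeClaimTemplates are named <claim-template-name>-<statefulset-name>-<ordinal>. Assuming the claim template in these manifests is named datadir (an assumption; check kubectl get pvc for the actual names), the cleanup looks something like this:

$ kubectl get pvc
$ kubectl delete pvc datadir-zk-0 datadir-zk-1 datadir-zk-2
$ kubectl delete pvc datadir-kafka-0 datadir-kafka-1 datadir-kafka-2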

DaemonSet rolling update, history, and rollback

In this section, we’re going to show you how to perform a rolling update on a DaemonSet, look at its history, and then perform a rollback after a bad rollout. We will use a DaemonSet to deploy a Prometheus node exporter on each Kubernetes node in the cluster. These node exporters export node metrics to the Prometheus monitoring system. For the sake of simplicity, we’ve omitted the installation of the Prometheus server and the service for communication with DaemonSet pods from this blogpost.

Prerequisites

To follow along with this section of the blog, you need a working Kubernetes 1.7 cluster and kubectl version 1.7 or later. If you followed along with the first section, you can use the same cluster.

DaemonSet rolling update: Prometheus node exporters

First, prepare the node exporter DaemonSet manifest to run a v0.13 Prometheus node exporter on every node in the cluster:

$ cat >> node-exporter-v0.13.yaml <<EOF
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
 name: node-exporter
spec:
 updateStrategy:
   type: RollingUpdate
 template:
   metadata:
     labels:
       app: node-exporter
     name: node-exporter
   spec:
     containers:
     - image: prom/node-exporter:v0.13.0
       name: node-exporter
       ports:
       - containerPort: 9100
         hostPort: 9100
         name: scrape
     hostNetwork: true
     hostPID: true
EOF

Note that you need to enable the DaemonSet rolling update feature by explicitly setting DaemonSet.spec.updateStrategy.type to RollingUpdate.

Apply the manifest to create the node exporter DaemonSet:

$ kubectl apply -f node-exporter-v0.13.yaml --record
daemonset "node-exporter" created

Wait for the first DaemonSet rollout to complete:

$ kubectl rollout status ds node-exporter
daemon set "node-exporter" successfully rolled out

You should see each of your nodes running one copy of the node exporter pod:

$ kubectl get pods -l app=node-exporter -o wide

To perform a rolling update on the node exporter DaemonSet, prepare a manifest that includes the v0.14 Prometheus node exporter:

$ cat node-exporter-v0.13.yaml | sed "s/v0.13.0/v0.14.0/g"> node-exporter-v0.14.yaml

Then apply the v0.14 node exporter DaemonSet:

$ kubectl apply -f node-exporter-v0.14.yaml --record
daemonset "node-exporter" configured

Wait for the DaemonSet rolling update to complete:

$ kubectl rollout status ds node-exporter
...
Waiting for rollout to finish: 3 out of 4 new pods have been updated...
Waiting for rollout to finish: 3 of 4 updated pods are available...
daemon set "node-exporter" successfully rolled out

We just triggered a DaemonSet rolling update by updating the DaemonSet template. By default, one old DaemonSet pod will be killed and one new DaemonSet pod will be created at a time.
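
The pace of the rollout can be tuned with the DaemonSet’s rollingUpdate.maxUnavailable setting; a minimal sketch follows (1 is the default and is shown only to make it explicit).

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # how many DaemonSet pods may be unavailable during the update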

Now we’ll cause a rollout to fail by updating the image to an invalid value:

$ cat node-exporter-v0.13.yaml | sed "s/v0.13.0/bad/g"> node-exporter-bad.yaml

$ kubectl apply -f node-exporter-bad.yaml --record
daemonset "node-exporter" configured

Notice that the rollout never finishes:

$ kubectl rollout status ds node-exporter
Waiting for rollout to finish: 0 out of 4 new pods have been updated...
Waiting for rollout to finish: 1 out of 4 new pods have been updated…
# Use ^C to exit

This behavior is expected. We mentioned earlier that a DaemonSet rolling update kills and creates one pod at a time. Because the new pod never becomes available, the rollout is halted, preventing the invalid specification from propagating to more than one node. StatefulSet rolling updates implement the same behavior with respect to failed deployments. Unsuccessful updates are blocked until they are corrected by rolling back or by rolling forward with a fixed specification.

$ kubectl get pods -l app=node-exporter
NAME                  READY     STATUS         RESTARTS   AGE
node-exporter-f2n14   0/1       ErrImagePull   0          3m
...

# N = number of nodes
$ kubectl get ds node-exporter
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   N         N         N-1       1            N           <none>          46m

DaemonSet history, rollbacks, and rolling forward

Next, perform a rollback. Take a look at the node exporter DaemonSet rollout history:

$ kubectl rollout history ds node-exporter
daemonsets "node-exporter"
REVISION        CHANGE-CAUSE
1               kubectl apply --filename=node-exporter-v0.13.yaml --record=true
2               kubectl apply --filename=node-exporter-v0.14.yaml --record=true
3               kubectl apply --filename=node-exporter-bad.yaml --record=true

Check the details of the revision you want to roll back to:

$ kubectl rollout history ds node-exporter --revision=2
daemonsets "node-exporter" with revision #2
Pod Template:
 Labels:       app=node-exporter
 Containers:
  node-exporter:
   Image:      prom/node-exporter:v0.14.0
   Port:       9100/TCP
   Environment:        <none>
   Mounts:     <none>
 Volumes:      <none>

You can quickly roll back to any DaemonSet revision you found through kubectl rollout history:

# Roll back to the last revision
$ kubectl rollout undo ds node-exporter
daemonset "node-exporter" rolled back

# Or use --to-revision to roll back to a specific revision
$ kubectl rollout undo ds node-exporter --to-revision=2
daemonset "node-exporter" rolled back

A DaemonSet rollback is done by rolling forward. Therefore, after the rollback, DaemonSet revision 2 becomes revision 4 (current revision):

$ kubectl rollout history ds node-exporter
daemonsets "node-exporter"
REVISION        CHANGE-CAUSE
1               kubectl apply --filename=node-exporter-v0.13.yaml --record=true
3               kubectl apply --filename=node-exporter-bad.yaml --record=true
4               kubectl apply --filename=node-exporter-v0.14.yaml --record=true

The node exporter DaemonSet is now healthy again:

$ kubectl rollout status ds node-exporter
daemon set "node-exporter" successfully rolled out

# N = number of nodes
$ kubectl get ds node-exporter
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   N         N         N         N            N           <none>          46m

If the current DaemonSet revision is specified when performing a rollback, the rollback is skipped:

$ kubectl rollout undo ds node-exporter --to-revision=4
daemonset "node-exporter" skipped rollback (current template already matches revision 4)

You will see this complaint from kubectl if the DaemonSet revision is not found:

$ kubectl rollout undo ds node-exporter --to-revision=10
error: unable to find specified revision 10 in history

Note that kubectl rollout history and kubectl rollout status support StatefulSets, too!
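
For example, for a hypothetical StatefulSet named web, the equivalent commands would be:

$ kubectl rollout status statefulset web
$ kubectl rollout history statefulset web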

Cleaning up

$ kubectl delete ds node-exporter

What’s next for DaemonSet and StatefulSet

Rolling updates and rollbacks close an important feature gap for DaemonSets and StatefulSets. As we plan for Kubernetes 1.8, we want to continue to focus on advancing the core controllers to GA. This likely means that some advanced feature requests (e.g. automatic rollback, infant mortality detection) will be deferred in favor of ensuring the consistency, usability, and stability of the core controllers. We welcome feedback and contributions, so please feel free to reach out on Slack, to ask questions on Stack Overflow, or to open issues or pull requests on GitHub.

  • Post questions (or answer questions) on Stack Overflow
  • Join the community portal for advocates on K8sPort
  • Follow us on Twitter @Kubernetesio for latest updates
  • Connect with the community on Slack
  • Get involved with the Kubernetes project on GitHub

    Kubernetes 1.8: Security, Workloads and Feature Depth

    Editor's note: today's post is by Aparna Sinha, Group Product Manager, Kubernetes, Google; Ihor Dvoretskyi, Developer Advocate, CNCF; Jaice Singer DuMars, Kubernetes Ambassador, Microsoft; and Caleb Miles, Technical Program Manager, CoreOS on the latest release of Kubernetes 1.8.

    We’re pleased to announce the delivery of Kubernetes 1.8, our third release this year. Kubernetes 1.8 represents a snapshot of many exciting enhancements and refinements underway. In addition to functional improvements, we’re increasing project-wide focus on maturing process, formalizing architecture, and strengthening Kubernetes’ governance model. The evolution of mature processes clearly signals that sustainability is a driving concern, and helps to ensure that Kubernetes is a viable and thriving project far into the future.

    Spotlight on security


    Kubernetes 1.8 graduates support for role based access control (RBAC) to stable. RBAC allows cluster administrators to dynamically define roles to enforce access policies through the Kubernetes API. Beta support for filtering outbound traffic through network policies augments existing support for filtering inbound traffic to a pod. RBAC and Network Policies are two powerful tools for enforcing organizational and regulatory security requirements within Kubernetes.

    Transport Layer Security (TLS) certificate rotation for the Kubelet graduates to beta. Automatic certificate rotation eases secure cluster operation.

    Spotlight on workload support


    Kubernetes 1.8 promotes the core Workload APIs to beta with the apps/v1beta2 group and version. The beta contains the current version of Deployment, DaemonSet, ReplicaSet, and StatefulSet. The Workloads APIs provide a stable foundation for migrating existing workloads to Kubernetes as well as developing cloud native applications that target Kubernetes natively.

    For those considering running Big Data workloads on Kubernetes, the Workloads API now enables native Kubernetes support in Apache Spark.

    Batch workloads, such as nightly ETL jobs, will benefit from the graduation of CronJobs to beta.

    Custom Resource Definitions (CRDs) remain in beta for Kubernetes 1.8. A CRD provides a powerful mechanism to extend Kubernetes with user-defined API objects. One use case for CRDs is the automation of complex stateful applications such as key-value stores, databases and storage engines through the Operator Pattern. Expect continued enhancements to CRDs such as validation as stabilization continues.
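
    As a quick illustration, a minimal CRD manifest looks like the following (the group and resource names here are made up for the example):

    apiVersion: apiextensions.k8s.io/v1beta1
    kind: CustomResourceDefinition
    metadata:
     name: backups.example.com
    spec:
     group: example.com
     version: v1
     scope: Namespaced
     names:
       plural: backups
       singular: backup
       kind: Backup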

    Spoilers ahead


    volume snapshots, PV resizing, automatic taints, priority pods, kubectl plugins, oh my!

    In addition to stabilizing existing functionality, Kubernetes 1.8 offers a number of alpha features that preview new functionality.

    Each Special Interest Group (SIG) in the community continues to deliver the most requested user features for their area. For a complete list, please visit the release notes.

    Availability


    Kubernetes 1.8 is available for download on GitHub. To get started with Kubernetes, check out these interactive tutorials.

    Release team


    The Release team for 1.8 was led by Jaice Singer DuMars, Kubernetes Ambassador at Microsoft, and was comprised of 14 individuals  responsible for managing all aspects of the release, from documentation to testing, validation, and feature completeness.

    As the Kubernetes community has grown, our release process has become an amazing demonstration of collaboration in open source software development. Kubernetes continues to gain new users at a rapid clip. This growth creates a positive feedback cycle where more contributors commit code creating a more vibrant ecosystem.

    User highlights


    According to Redmonk, 54 percent of Fortune 100 companies are running Kubernetes in some form with adoption coming from every sector across the world. Recent user stories from the community include:
    • Ancestry.com currently holds 20 billion historical records and 90 million family trees, making it the largest consumer genomics DNA network in the world. With the move to Kubernetes, its deployment time for its Shaky Leaf icon service was cut from 50 minutes to 2 to 5 minutes.
    • Wink, provider of smart home devices and apps, runs 80 percent of its workloads on a unified stack of Kubernetes-Docker-CoreOS, allowing them to continually innovate and improve its products and services.
    • Pear Deck, a teacher communication app for students, ported their Heroku apps into Kubernetes, allowing them to deploy the exact same configuration in lots of different clusters in 30 seconds.
    • Buffer, social media management for agencies and marketers, has a remote team of 80 spread across a dozen different time zones. Kubernetes has provided the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary.
    Is Kubernetes helping your team? Share your story with the community.

    Ecosystem updates


    • Announced on September 11, Kubernetes Certified Service Providers (KCSPs) are pre-qualified organizations with deep experience helping enterprises successfully adopt Kubernetes. Individual professionals can now register for the new Certified Kubernetes Administrator (CKA) program and exam, which requires passing an online, proctored, performance-based exam that tests one’s ability to solve multiple issues in a hands-on, command-line environment.
    • CNCF also offers online training that teaches the skills needed to create and configure a real-world Kubernetes cluster.

    KubeCon


    Join the community at KubeCon + CloudNativeCon in Austin, December 6-8 for the largest Kubernetes gathering ever. The premiere Kubernetes event will feature technical sessions, case studies, developer deep dives, salons and more! A full schedule of events and speakers will be available here on September 28. Discounted registration ends October 6.

    Open Source Summit EU


    Ihor Dvoretskyi, Kubernetes 1.8 features release lead, will present new features and enhancements at Open Source Summit EU in Prague, October 23. Registration is still open.

    Get involved


    The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below.

    Thank you for your continued feedback and support.


    Share your Kubernetes story.

    Kubernetes Community Steering Committee Election Results

    Beginning with the announcement of Kubernetes 1.0 at OSCON in 2015, there has been a concerted effort to share the power and burden of leadership across the Kubernetes community.

    With the work of the Bootstrap Governance Committee, consisting of Brandon Philips, Brendan Burns, Brian Grant, Clayton Coleman, Joe Beda, Sarah Novotny and Tim Hockin - a cross section of long-time leaders representing 5 different companies with major investments of talent and effort in the Kubernetes Ecosystem - we wrote an initial Steering Committee Charter and launched a community wide election to seat a Kubernetes Steering Committee.

    To quote from the Charter -

    The initial role of the steering committee is to instantiate the formal process for Kubernetes governance. In addition to defining the initial governance process, the bootstrap committee strongly believes that it is important to provide a means for iterating the processes defined by the steering committee. We do not believe that we will get it right the first time, or possibly ever, and won’t even complete the governance development in a single shot. The role of the steering committee is to be a live, responsive body that can refactor and reform as necessary to adapt to a changing project and community.

    This is our largest step yet toward making an implicit governance structure explicit. Kubernetes vision has been one of an inclusive and broad community seeking to build software which empowers our users with the portability of containers. The Steering Committee will be a strong leadership voice guiding the project toward success.

    The Kubernetes Community is pleased to announce the results of the 2017 Steering Committee Elections. Please congratulate Aaron Crickenberger, Derek Carr, Michelle Noorali, Phillip Wittrock, Quinton Hoole and Timothy St. Clair, who will be joining the members of the Bootstrap Governance committee on the newly formed Kubernetes Steering Committee. Derek, Michelle, and Phillip will serve for 2 years. Aaron, Quinton, and Timothy will serve for 1 year.

    This group will meet regularly in order to clarify and streamline the structure and operation of the project. Early work will include electing a representative to the CNCF Governing Board, evolving project processes, refining and documenting the vision and scope of the project, and chartering and delegating to more topical community groups.

    Please see the full Steering Committee backlog for more details.

    Request Routing and Policy Management with the Istio Service Mesh

    Editor's note: Today’s post by Frank Budinsky, Software Engineer, IBM, Andra Cismaru, Software Engineer, Google, and Israel Shalom, Product Manager, Google, is the second post in a three-part series on Istio. It offers a closer look at request routing and policy management.

    In a previous article, we looked at a simple application (Bookinfo) that is composed of four separate microservices. The article showed how to deploy an application with Kubernetes and an Istio-enabled cluster without changing any application code. The article also outlined how to view Istio provided L7 metrics on the running services.

    This article follows up by taking a deeper look at Istio using Bookinfo. Specifically, we’ll look at two more features of Istio: request routing and policy management.

    Running the Bookinfo Application

    As before, we run the v1 version of the Bookinfo application. After installing Istio in our cluster, we start the app defined in bookinfo-v1.yaml using the following command:


    kubectl apply -f <(istioctl kube-inject -f bookinfo-v1.yaml)
    We created an Ingress resource for the app:


    cat <<EOF | kubectl create -f -
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
     name: bookinfo
     annotations:
       kubernetes.io/ingress.class: "istio"
    spec:
     rules:
     - http:
         paths:
         - path: /productpage
           backend:
             serviceName: productpage
             servicePort: 9080
         - path: /login
           backend:
             serviceName: productpage
             servicePort: 9080
         - path: /logout
           backend:
             serviceName: productpage
             servicePort: 9080
    EOF
    Then we retrieved the NodePort address of the Istio Ingress controller:

    export BOOKINFO_URL=$(kubectl get po -n istio-system -l istio=ingress -o jsonpath={.items[0].status.hostIP}):$(kubectl get svc -n istio-system istio-ingress -o jsonpath={.spec.ports[0].nodePort})
    Finally, we pointed our browser to http://$BOOKINFO_URL/productpage, to see the running v1 application:




    HTTP request routing

    Existing container orchestration platforms like Kubernetes, Mesos, and other microservice frameworks allow operators to control when a particular set of pods/VMs should receive traffic (e.g., by adding/removing specific labels). Unlike existing techniques, Istio decouples traffic flow and infrastructure scaling. This allows Istio to provide a variety of traffic management features that reside outside the application code, including dynamic HTTP request routing for A/B testing, canary releases, gradual rollouts, failure recovery using timeouts, retries, circuit breakers, and fault injection to test compatibility of failure recovery policies across services.

    To demonstrate, we’ll deploy v2 of the reviews service and use Istio to make it visible only for a specific test user. We can create a Kubernetes deployment, reviews-v2, with this YAML file


    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
     name: reviews-v2
    spec:
     replicas: 1
     template:
       metadata:
         labels:
           app: reviews
           version: v2
       spec:
         containers:
         - name: reviews
           image: istio/examples-bookinfo-reviews-v2:0.2.3
           imagePullPolicy: IfNotPresent
           ports:
           - containerPort: 9080
    From a Kubernetes perspective, the v2 deployment adds additional pods that the reviews service selector includes in the round-robin load balancing algorithm. This is also the default behavior for Istio.

    Before we start reviews:v2, we’ll start the last of the four Bookinfo services, ratings, which is used by the v2 version to provide ratings stars corresponding to each review:

    kubectl apply -f <(istioctl kube-inject -f bookinfo-ratings.yaml)
    If we were to start reviews:v2 now, we would see browser responses alternating between v1 (reviews with no corresponding ratings) and v2 (review with black rating stars). This will not happen, however, because we’ll use Istio’s traffic management feature to control traffic.

    With Istio, new versions don’t need to become visible based on the number of running pods. Version visibility is controlled instead by rules that specify the exact criteria. To demonstrate, we start by using Istio to specify that we want to send 100% of reviews traffic to v1 pods only.

    Immediately setting a default rule for every service in the mesh is an Istio best practice. Doing so avoids accidental visibility of newer, potentially unstable versions. For the purpose of this demonstration, however, we’ll only do it for the reviews service:


    cat <<EOF | istioctl create -f -
    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
     name: reviews-default
    spec:
     destination:
       name: reviews
     route:
     - labels:
         version: v1
       weight: 100
    EOF
    This command directs the service mesh to send 100% of traffic for the reviews service to pods with the label “version: v1”. With this rule in place, we can safely deploy the v2 version without exposing it.


    kubectl apply -f <(istioctl kube-inject -f bookinfo-reviews-v2.yaml)
    Refreshing the Bookinfo web page confirms that nothing has changed.

    At this point we have all kinds of options for how we might want to expose reviews:v2. If for example we wanted to do a simple canary test, we could send 10% of the traffic to v2 using a rule like this:


    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
     name: reviews-default
    spec:
     destination:
       name: reviews
     route:
     - labels:
         version: v2
       weight: 10
     - labels:
         version: v1
       weight: 90
    A better approach for early testing of a service version is to instead restrict access to it much more specifically. To demonstrate, we’ll set a rule to only make reviews:v2 visible to a specific test user. We do this by setting a second, higher priority rule that will only be applied if the request matches a specific condition:


    cat <<EOF | istioctl create -f -
    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
     name: reviews-test-v2
    spec:
     destination:
       name: reviews
     precedence: 2
     match:
       request:
         headers:
           cookie:
             regex: "^(.*?;)?(user=jason)(;.*)?$"
     route:
     - labels:
         version: v2
       weight: 100
    EOF
    Here we’re specifying that the request headers need to include a user cookie with value “jason” as the condition. If this rule is not matched, we fall back to the default routing rule for v1.

    If we log in to the Bookinfo UI with the user name “jason” (no password needed), we will now see version v2 of the application (each review includes 1-5 black rating stars). Every other user is unaffected by this change.



    Once the v2 version has been thoroughly tested, we can use Istio to proceed with a canary test using the rule shown previously, or we can simply migrate all of the traffic from v1 to v2, optionally in a gradual fashion by using a sequence of rules with weights less than 100 (for example: 10, 20, 30, ... 100). This traffic control is independent of the number of pods implementing each version. If, for example, we had auto scaling in place, and high traffic volumes, we would likely see a corresponding scale up of v2 and scale down of v1 pods happening independently at the same time. For more about version routing with autoscaling, check out "Canary Deployments using Istio".
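
    For example, a halfway point in such a gradual migration could be expressed with a rule like this (a sketch; the 50/50 split is arbitrary):

    cat <<EOF | istioctl replace -f -
    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
     name: reviews-default
    spec:
     destination:
       name: reviews
     route:
     - labels:
         version: v2
       weight: 50
     - labels:
         version: v1
       weight: 50
    EOF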

    In our case, we’ll send all of the traffic to v2 with one command:


    cat <<EOF | istioctl replace -f -
    apiVersion: config.istio.io/v1alpha2
    kind: RouteRule
    metadata:
     name: reviews-default
    spec:
     destination:
       name: reviews
     route:
     - labels:
         version: v2
       weight: 100
    EOF
    We should also remove the special rule we created for the tester so that it doesn’t override any future rollouts we decide to do:


    istioctl delete routerule reviews-test-v2
    In the Bookinfo UI, we’ll see that we are now exposing the v2 version of reviews to all users.

    Policy enforcement

    Istio provides policy enforcement functions, such as quotas, precondition checking, and access control. We can demonstrate Istio’s open and extensible framework for policies with an example: rate limiting.

    Let’s pretend that the Bookinfo ratings service is an external paid service--for example, Rotten Tomatoes®--with a free quota of 1 request per second (req/sec). To make sure the application doesn’t exceed this limit, we’ll specify an Istio policy to cut off requests once the limit is reached. We’ll use one of Istio’s built-in policies for this purpose.

    To set a 1 req/sec quota, we first configure a memquota handler with rate limits:


    cat <<EOF | istioctl create -f -
    apiVersion: "config.istio.io/v1alpha2"
    kind: memquota
    metadata:
     name: handler
     namespace: default
    spec:
     quotas:
     - name: requestcount.quota.default
       maxAmount: 5000
       validDuration: 1s
       overrides:
       - dimensions:
           destination: ratings
         maxAmount: 1
         validDuration: 1s
    EOF
    Then we create a quota instance that maps incoming attributes to quota dimensions, and create a rule that uses it with the memquota handler:


    cat <<EOF | istioctl create -f -
    apiVersion: "config.istio.io/v1alpha2"
    kind: quota
    metadata:
     name: requestcount
     namespace: default
    spec:
     dimensions:
       source: source.labels["app"] | source.service | "unknown"
       sourceVersion: source.labels["version"] | "unknown"
       destination: destination.labels["app"] | destination.service | "unknown"
       destinationVersion: destination.labels["version"] | "unknown"
    ---
    apiVersion: "config.istio.io/v1alpha2"
    kind: rule
    metadata:
     name: quota
     namespace: default
    spec:
     actions:
     - handler: handler.memquota
       instances:
       - requestcount.quota
    EOF
    To see the rate limiting in action, we’ll generate some load on the application:


    wrk -t1 -c1 -d20s http://$BOOKINFO_URL/productpage
    In the web browser, we’ll notice that while the load generator is running (i.e., generating more than 1 req/sec), browser traffic is cut off. Instead of the black stars next to each review, the page now displays a message indicating that ratings are not currently available.

    Stopping the load generator means the limit will no longer be exceeded: the black stars return when we refresh the page.

    Summary

    We’ve shown you how to introduce advanced features like HTTP request routing and policy injection into a service mesh configured with Istio without restarting any of the services. This lets you develop and deploy without worrying about the ongoing management of the service mesh; service-wide policies can always be added later.

    In the next and last installment of this series, we’ll focus on Istio’s security and authentication capabilities. We’ll discuss how to secure all interservice communications in a mesh, even against insiders with access to the network, without any changes to the application code or the deployment.

    Introducing Software Certification for Kubernetes


    Editor's Note: Today's post is by William Denniss, Product Manager, Google Cloud on the new Kubernetes Software Conformance Certification program.


    Over the last three years, Kubernetes® has seen wide-scale adoption by a vibrant and diverse community of providers. In fact, there are now more than 60 known Kubernetes platforms and distributions. From the start, one goal of Kubernetes has been consistency and portability.

    In order to better serve this goal, today the Kubernetes community and the Cloud Native Computing Foundation® (CNCF®) announce the availability of the beta Kubernetes Software Conformance Certification program. The Kubernetes conformance certification program gives users the confidence that when they use a Certified Kubernetes™ product, they can rely on a high level of common functionality. Certification provides Independent Software Vendors (ISVs) confidence that if their customer is using a Certified Kubernetes product, their software will behave as expected.

    CNCF and the Kubernetes community invite all vendors to run the conformance test suite and submit conformance testing results for review and certification by the CNCF. When the program graduates to GA (generally available) later this year, all vendors receiving certification during the beta period will be listed in the launch announcement.

    Just like Kubernetes itself, conformance certification is an evolving program managed by contributors in our community. Certification is versioned alongside Kubernetes, and certification requirements receive updates with each version of Kubernetes as features are added and the architecture changes. The Kubernetes community, through SIG Architecture, controls changes and oversees what it means to be Certified Kubernetes. The Testing SIG works on the mechanics of conformance tests, while the Conformance Working Group develops process and policy for the certification program.

    Once the program moves to GA, certified products can proudly display the new Certified Kubernetes logo mark with stylized version information on their marketing materials. Certified products can also take advantage of a new combination trademark rule the CNCF adopted for Certified Kubernetes providers that keep their certification up to date.

    Products must complete a recertification each year for the current or previous version of Kubernetes to remain certified. This ensures that when you see the Certified Kubernetes™ mark on a product, you’re not only getting something that’s proven conformant, but also contains the latest features and improvements from the community.

    Visit https://github.com/cncf/k8s-conformance for more information about Kubernetes Software Conformance Certification, and learn how you can include your product in a growing list of Certified Kubernetes providers.

    “Cloud Native Computing Foundation”, “CNCF” and “Kubernetes” are registered trademarks of The Linux Foundation in the United States and other countries. “Certified Kubernetes” and the Certified Kubernetes design are trademarks of The Linux Foundation in the United States and other countries.

    Five Days of Kubernetes 1.8

    Kubernetes 1.8 is live, made possible by hundreds of contributors pushing thousands of commits in this latest release.

    The community has tallied more than 66,000 commits in the main repo and continues rapid growth outside of the main repo, which signals growing maturity and stability for the project. The community has logged more than 120,000 commits across all repos and 17,839 commits across all repos for v1.7.0 to v1.8.0 alone.

    With the help of our growing community of 1,400 plus contributors, we issued more than 3,000 PRs and pushed more than 5,000 commits to deliver Kubernetes 1.8 with significant security and workload support updates. This all points to increased stability, a result of our project-wide focus on maturing process, formalizing architecture, and strengthening Kubernetes’ governance model.

    While many improvements have been contributed, we highlight key features in this series of in-depth posts listed below. Follow along and see what’s new and improved with storage, security and more.

    Day 1: 5 Days of Kubernetes 1.8
    Day 2: kubeadm v1.8 Introduces Easy Upgrades for Kubernetes Clusters
    Day 3: Kubernetes v1.8 Retrospective: It Takes a Village to Raise a Kubernetes
    Day 4: Using RBAC, Generally Available in Kubernetes v1.8
    Day 5: Enforcing Network Policies in Kubernetes

    Connect
    • Post questions (or answer questions) on Stack Overflow
    • Join the community portal for advocates on K8sPort
    • Follow us on Twitter @Kubernetesio for latest updates 
    • Connect with the community on Slack
    • Get involved with the Kubernetes project on GitHub

    kubeadm v1.8 Released: Introducing Easy Upgrades for Kubernetes Clusters

    Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.8

    Since its debut in September 2016, the Cluster Lifecycle Special Interest Group (SIG) has established kubeadm as the easiest Kubernetes bootstrap method. Now, we’re releasing kubeadm v1.8.0 in tandem with the release of Kubernetes v1.8.0. In this blog post, I’ll walk you through the changes we’ve made to kubeadm since the last update, the scope of kubeadm, and how you can contribute to this effort.

    Security first: kubeadm v1.6 & v1.7

    Previously, we discussed planned updates for kubeadm v1.6. Our primary focus for v1.6 was security. We started enforcing role based access control (RBAC) as it graduated to beta, gave unique identities and locked-down privileges for different system components in the cluster, disabled the insecure `localhost:8080` API server port, started authorizing all API calls to the kubelets, and improved the token discovery method used formerly in v1.5. Token discovery (aka Bootstrap Tokens) graduated to beta in v1.8.

    In terms of features, kubeadm v1.7.0 was a much smaller release compared to v1.6.0 and v1.8.0. The main additions were enforcing the Node Authorizer, which significantly reduces the attack surface for a Kubernetes cluster, and initial, limited upgrade support from v1.6 clusters.

    Easier upgrades, extensibility, and stabilization in v1.8

    We had eight weeks between Kubernetes v1.7.0 and our stabilization period (code freeze) to implement new features and to stabilize the upcoming v1.8.0 release. Our goal for kubeadm v1.8.0 was to make it more extensible. We wanted to add a lot of new features and improvements in this cycle, and we succeeded.

    Upgrades along with better introspectability. The most important update in kubeadm v1.8.0 (and my favorite new feature) is one-command upgrades of the control plane. While v1.7.0 had the ability to upgrade clusters, the user experience was far from optimal, and the process was risky.

    Now, you can easily check to see if your system can handle an upgrade by entering:


    $ kubeadm upgrade plan

    This gives you information about which versions you can upgrade to, as well as the health of your cluster.

    You can examine the effects an upgrade will have on your system by specifying the --dry-run flag. In previous versions of kubeadm, upgrades were essentially blind in that you could only make assumptions about how an upgrade would impact your cluster. With the new dry run feature, there is no more mystery. You can see exactly what applying an upgrade would do before applying it.
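
    For example, assuming v1.8.0 is one of the versions suggested by kubeadm upgrade plan, a dry run looks like this:

    $ kubeadm upgrade apply v1.8.0 --dry-run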

    After checking to see how an upgrade will affect your cluster, you can apply the upgrade by typing:


    $ kubeadm upgrade apply v1.8.0

    This is a much cleaner and safer way of performing an upgrade than the previous version. As with any type of upgrade or downgrade, it’s a good idea to backup your cluster first using your preferred solution.

    Self-hosting

    Self-hosting in this context refers to a specific way of setting up the control plane. The self-hosting concept was initially developed by CoreOS in their bootkube project. The long-term goal is to move this functionality (currently in an alpha stage) to the generic kubeadm toolbox. Self-hosting means that the control plane components (the API Server, Controller Manager, and Scheduler) are themselves workloads in the cluster they manage. This means the control plane components can be managed using Kubernetes primitives, which has numerous advantages. For instance, leader-elected components like the scheduler and controller-manager will automatically be run on all masters when HA is implemented if they are run in a DaemonSet. Rolling upgrades in Kubernetes can be used for upgrades of the control plane components, and next to no extra code has to be written for that to work; it’s one of Kubernetes’ built-in primitives!

    Self-hosting won’t be the default until v1.9.0, but users can easily test the feature in experimental clusters. If you test this feature, we’d love your feedback!

    You can test out self-hosting by enabling its feature gate:

    $ kubeadm init --feature-gates=SelfHosting=true

    Extensibility

    We’ve added some new extensibility features. You can delegate some tasks, like generating certificates or writing control plane arguments to kubeadm, but still drive the control plane bootstrap process yourself. Basically, you can let kubeadm do some parts and fill in yourself where you need customizations. Previously, you could only use kubeadm init to perform “the full meal deal.” The inclusion of the kubeadm alpha phase command supports our aim to make kubeadm more modular, letting you invoke atomic sub-steps of the bootstrap process.
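
    The exact set of sub-steps depends on your kubeadm version; you can list what’s available in your installation with:

    $ kubeadm alpha phase --help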

    In v1.8.0, kubeadm alpha phase is just that: an alpha preview. We hope that we can graduate the command to beta as kubeadm phase in v1.9.0. We can’t wait for feedback from the community on how to better improve this feature!

    Improvements

    Along with our new kubeadm features, we’ve also made improvements to existing ones. The Bootstrap Token feature that makes `kubeadm join` so short and sweet has graduated from alpha to beta and gained even more security features.

    If you made customizations to your system in v1.6 or v1.7, you had to remember what those customizations were when you upgraded your cluster. No longer: beginning with v1.8.0, kubeadm uploads your configuration to a ConfigMap inside of the cluster, and later reads that configuration when upgrading for a seamless user experience.
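
    If you’re curious about what was stored, the configuration lives in a ConfigMap in the kube-system namespace (named kubeadm-config in recent kubeadm versions); a command along these lines should display it:

    $ kubectl -n kube-system get configmap kubeadm-config -o yaml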

    The first certificate rotation feature has graduated to beta in v1.8, which is great to see. Thanks to the Auth Special Interest Group, the Kubernetes node component kubelet can now rotate its client certificate automatically. We expect this area to improve continuously, and will continue to be a part of this cross-SIG effort to easily rotate all certificates in any cluster.

    Last but not least, kubeadm is more resilient now. kubeadm init will detect even more faulty environments earlier, and time out instead of waiting forever for the expected condition.

    The scope of kubeadm

    As there are so many different end-to-end installers for Kubernetes, there is some fragmentation in the ecosystem. With each new release of Kubernetes, these installers naturally become more divergent. This can create problems down the line if users rely on installer-specific variations and hooks that aren’t standardized in any way. Our goal from the beginning has been to make kubeadm a building block for deploying Kubernetes clusters and to provide kubeadm init and kubeadm join as best-practice “fast paths” for new Kubernetes users. Ideally, using kubeadm as the basis of all deployments will make it easier to create conformant clusters.

    kubeadm performs the actions necessary to get a minimum viable cluster up and running. It only cares about bootstrapping, not about provisioning machines, by design. Likewise, installing various nice-to-have addons by default like the Kubernetes Dashboard, some monitoring solution, cloud provider-specific addons, etc. is not in scope. Instead, we expect higher-level and more tailored tooling to be built on top of kubeadm, that installs the software the end user needs.

    v1.9.0 and beyond

    What’s in store for the future of kubeadm?

    Planned features

    We plan to address high availability (replicated etcd and multiple, redundant API servers and other control plane components) as an alpha feature in v1.9.0. This has been a regular request from our user base.

    Also, we want to make self-hosting the default way to deploy your control plane: Kubernetes becomes much easier to manage if we can rely on Kubernetes' own tools to manage the cluster components.

    Promoting kubeadm adoption and getting involved

    The kubeadm adoption working group is an ongoing effort between SIG Cluster Lifecycle and other parties in the Kubernetes ecosystem. This working group focuses on making kubeadm more extensible in order to promote adoption of it for other end-to-end installers in the community. Everyone is welcome to join. So far, we’re glad to announce that kubespray started using kubeadm under the hood, and gained new features at the same time! We’re excited to see others follow and make the ecosystem stronger.

    kubeadm is a great way to learn about Kubernetes: it binds all of Kubernetes’ components together in a single package. To learn more about what kubeadm really does under the hood, this document describes kubeadm functions in v1.8.0.

    If you want to get involved in these efforts, join SIG Cluster Lifecycle. We meet on Zoom once a week on Tuesdays at 16:00 UTC. For more information about what we talk about in our weekly meetings, check out our meeting notes. Meetings are a great educational opportunity, even if you don’t want to jump in and present your own ideas right away. You can also sign up for our mailing list, join our Slack channel, or check out the video archive of our past meetings. Even if you’re only interested in watching the video calls initially, we’re excited to welcome you as a new member to SIG Cluster Lifecycle!

    If you want to know what a kubeadm developer does at a given time in the Kubernetes release cycle, check out this doc. Finally, don’t hesitate to join if any of our upcoming projects are of interest to you!

    Thank you,
    Lucas Käldström
    Kubernetes maintainer & SIG Cluster Lifecycle co-lead
    Weaveworks contractor

    It Takes a Village to Raise a Kubernetes

    Editor’s note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.8, written by Jaice Singer DuMars from Microsoft.

    Each time we release a new version of Kubernetes, it’s enthralling to see how the community responds to all of the hard work that went into it. Blogs on new or enhanced capabilities crop up all over the web like wildflowers in the spring. Talks, videos, webinars, and demos are not far behind. As soon as the community seems to take this all in, we turn around and add more to the mix. It’s a thrilling time to be a part of this project, and even more so, the movement. It’s not just software anymore.

    When circumstances opened the door for me to lead the 1.8 release, I signed on despite a minor case of the butterflies. In a private conversation with another community member, they assured me that “being organized, following up, and knowing when to ask for help” were the keys to being a successful lead. That’s when I knew I could do it — and so I did.

    From that point forward, I was wrapped in a patchwork quilt of community that magically appeared at just the right moments. The community’s commitment and earnest passion for quality, consistency, and accountability formed a bedrock from which the release itself was chiseled.

    The 1.8 release team proved incredibly cohesive despite a late start. We approached even the most difficult situations with humor, diligence, and sincere curiosity. My experience leading large teams served me well, and underscored another difference about this release: it was more valuable for me to focus on leadership than diving into the technical weeds to solve every problem.

    Also, the uplifting power of emoji in Slack cannot be overestimated.

    An important inflection point is underway in the Kubernetes project. If you’ve taken a ride on a “startup rollercoaster,” this is a familiar story. You come up with an idea so crazy that it might work. You build it, get traction, and slowly clickity-clack up that first big hill. The view from the top is dizzying, as you’ve poured countless hours of life into something completely unknown. Once you go over the top of that hill, everything changes. Breakneck acceleration defines or destroys what has been built.

    In my experience, that zero gravity point is where everyone in the company (or in this case, project) has to get serious about not only building something, but also maintaining it. Without a commitment to maintenance, things go awry really quickly. From codebases that resemble the Winchester Mystery House to epidemics of crashing production implementations, a fiery descent into chaos can happen quickly despite the outward appearance of success. Thankfully, the Kubernetes community seems to be riding our growth rollercoaster with increasing success at each release.

    As software startups mature, there is a natural evolution reflected in the increasing distribution of labor. Explosive adoption means that full-time security, operations, quality, documentation, and project management staff become necessary to deliver stability, reliability, and extensibility. Also, you know things are getting serious when intentional architecture becomes necessary to ensure consistency over time.

    Kubernetes has followed a similar path. In the absence of company departments or skill-specific teams, Special Interest Groups (SIGs) have organically formed around core project needs like storage, networking, API machinery, applications, and the operational lifecycle. As SIGs have proliferated, the Kubernetes governance model has crystallized around them, providing a framework for code ownership and shared responsibility. SIGs also help ensure the community is sustainable because success is often more about people than code.

    At the Kubernetes leadership summit in June, a proposed SIG architecture was ratified with a unanimous vote, underscoring a stability theme that seemed to permeate every conversation in one way or another. The days of filling in major functionality gaps appear to be over, and a new era of feature depth has emerged in its place.

    Another change is the move away from project-level release “feature themes” to SIG-level initiatives delivered in increments over the course of several releases. That’s an important shift: SIGs have a mission, and everything they deliver should ultimately serve that. As a community, we need to provide facilitation and support so SIGs are empowered to do their best work with minimal overhead and maximum transparency.

    Wisely, the community also spotted the opportunity to provide safe mechanisms for innovation that are increasingly less dependent on the code in kubernetes/kubernetes. This in turn creates a flourishing habitat for experimentation without hampering overall velocity. The project can also address technical debt created during the initial ride up the rollercoaster. However, new mechanisms for innovation present an architectural challenge in defining what is and is not Kubernetes. SIG Architecture addresses the challenge of defining Kubernetes’ boundaries. It’s a work in progress that trends toward continuous improvement.

    This can be a little overwhelming at the individual level. In reality, it’s not that much different from any other successful startup, save for the fact that authority does not come from a traditional org chart. It comes from SIGs, community technical leaders, the newly-formed steering committee, and ultimately you.

    The Kubernetes release process provides a special opportunity to see everything that makes this project tick. I’ll tell you what I saw: people, working together, to do the best they can, in service to everyone who sets out on the cloud native journey.

    Using RBAC, Generally Available in Kubernetes v1.8

    Editor's note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.8. Today’s post comes from Eric Chiang, software engineer, CoreOS, and SIG-Auth co-lead.

    Kubernetes 1.8 represents a significant milestone for the role-based access control (RBAC) authorizer, which was promoted to GA in this release. RBAC is a mechanism for controlling access to the Kubernetes API, and since its beta in 1.6, many Kubernetes clusters and provisioning strategies have enabled it by default.

    Going forward, we expect to see RBAC become a fundamental building block for securing Kubernetes clusters. This post explores using RBAC to manage user and application access to the Kubernetes API.

    Granting access to users


    RBAC is configured using standard Kubernetes resources. Users can be bound to a set of roles (ClusterRoles and Roles) through bindings (ClusterRoleBindings and RoleBindings). Users start with no permissions and must explicitly be granted access by an administrator.

    All Kubernetes clusters install a default set of ClusterRoles, representing common buckets users can be placed in. The “edit” role lets users perform basic actions like deploying pods; “view” lets a user observe non-sensitive resources; “admin” allows a user to administer a namespace; and “cluster-admin” grants access to administer a cluster.


    $ kubectl get clusterroles
    NAME            AGE
    admin           40m
    cluster-admin   40m
    edit            40m
    # ...

    view            40m

    ClusterRoleBindings grant a user, group, or service account a ClusterRole’s power across the entire cluster. Using kubectl, we can let a sample user “jane” perform basic actions in all namespaces by binding her to the “edit” ClusterRole:

    $ kubectl create clusterrolebinding jane --clusterrole=edit --user=jane
    $ kubectl get namespaces --as=jane
    NAME          STATUS    AGE
    default       Active    43m
    kube-public   Active    43m
    kube-system   Active    43m
    $ kubectl auth can-i create deployments --namespace=dev --as=jane
    yes
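
    The “edit” role does not include cluster-administration powers, so a cluster-scoped action such as creating ClusterRoleBindings should be denied (illustrative check):

    $ kubectl auth can-i create clusterrolebindings --as=jane
    no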

    RoleBindings grant a ClusterRole’s power within a namespace, allowing administrators to manage a central list of ClusterRoles that are reused throughout the cluster. For example, as new resources are added to Kubernetes, the default ClusterRoles are updated to automatically grant the correct permissions to RoleBinding subjects within their namespace.

    Next we’ll let the group “infra” modify resources in the “dev” namespace:

    $ kubectl create rolebinding infra --clusterrole=edit --group=infra --namespace=dev
    rolebinding "infra" created

    Because we used a RoleBinding, these powers only apply within the RoleBinding’s namespace. In our case, a user in the “infra” group can view resources in the “dev” namespace but not in “prod”:

    $ kubectl get deployments --as=dave --as-group=infra --namespace dev
    No resources found.
    $ kubectl get deployments --as=dave --as-group=infra --namespace prod
    Error from server (Forbidden): deployments.extensions is forbidden: User "dave" cannot list deployments.extensions in the namespace "prod".

    Creating custom roles

    When the default ClusterRoles aren’t enough, it’s possible to create new roles that define a custom set of permissions. Since ClusterRoles are just regular API resources, they can be expressed as YAML or JSON manifests and applied using kubectl.

    Each ClusterRole holds a list of permissions specifying “rules.” Rules are purely additive and allow specific HTTP verbs to be performed on a set of resources. For example, the following ClusterRole holds the permissions to perform any action on “deployments”, “configmaps”, or “secrets”, and to view any “pod”:

    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
     name: deployer
    rules:
    - apiGroups: ["apps"]
      resources: ["deployments"]
      verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]

    - apiGroups: [""] # "" indicates the core API group
      resources: ["configmaps", "secrets"]
      verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]

    - apiGroups: [""] # "" indicates the core API group
      resources: ["pods"]
      verbs: ["get", "list", "watch"]

    Verbs correspond to the HTTP verb of the request, while the resource and API groups refer to the resource being referenced. Consider the following Ingress resource:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
     name: test-ingress
    spec:
     backend:
       serviceName: testsvc
       servicePort: 80

    To POST the resource, the user would need the following permissions:

    rules:
    - apiGroups: ["extensions"] # "apiVersion" without version
      resources: ["ingresses"]  # Plural of "kind"
      verbs: ["create"]         # "POST" maps to "create"

    Roles for applications

    When deploying containers that require access to the Kubernetes API, it’s good practice to ship an RBAC Role with your application manifests. Besides ensuring your app works on RBAC enabled clusters, this helps users audit what actions your app will perform on the cluster and consider their security implications.

    A namespaced Role is usually more appropriate for an application, since apps are traditionally run inside a single namespace and the namespace's resources should be tied to the lifecycle of the app. However, Roles cannot grant access to non-namespaced resources (such as nodes) or across namespaces, so some apps may still require ClusterRoles.

    The following Role allows a Prometheus instance to monitor and discover services, endpoints, and pods in the “dev” namespace:

    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
     name: prometheus-role
     namespace: dev
    rules:
    - apiGroups: [""] # "" refers to the core API group
      resources: ["services", "endpoints", "pods"]
      verbs: ["get", "list", "watch"]

    Containers running in a Kubernetes cluster receive service account credentials to talk to the Kubernetes API, and service accounts can be targeted by a RoleBinding. Pods normally run with the “default” service account, but it’s good practice to run each app with a unique service account so RoleBindings don’t unintentionally grant permissions to other apps.

    To run a pod with a custom service account, create a ServiceAccount resource in the same namespace and specify the `serviceAccountName` field of the manifest.

    apiVersion: apps/v1beta2 # Abbreviated, not a full manifest
    kind: Deployment
    metadata:
     name: prometheus-deployment
     namespace: dev
    spec:
     replicas: 1
     template:
       spec:
         containers:
         - name: prometheus
           image: prom/prometheus:v1.8.0
           command: ["prometheus", "-config.file=/etc/prom/config.yml"]
          # Run this pod using the "prometheus-sa" service account.
          serviceAccountName: prometheus-sa
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
     name: prometheus-sa
     namespace: dev
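
    To actually grant the Role to that service account, bind the two together with a RoleBinding in the same namespace. This is a sketch; the binding name is arbitrary, and it assumes the prometheus-role and prometheus-sa objects defined above:

    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
     name: prometheus-rolebinding
     namespace: dev
    roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: Role
     name: prometheus-role
    subjects:
    - kind: ServiceAccount
      name: prometheus-sa
      namespace: dev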

    Get involved

    Development of RBAC is a community effort organized through the Auth Special Interest Group, one of the many SIGs responsible for maintaining Kubernetes. A great way to get involved in the Kubernetes community is to join a SIG that aligns with your interests, provide feedback, and help with the roadmap.

    About the author

    Eric Chiang is a software engineer and technical lead of Kubernetes development at CoreOS, the creator of Tectonic, the enterprise-ready Kubernetes platform. Eric co-leads Kubernetes SIG Auth and maintains several open source projects and libraries on behalf of CoreOS.

    Enforcing Network Policies in Kubernetes

    Editor's note: this post is part of a series of in-depth articles on what's new in Kubernetes 1.8. Today’s post comes from Ahmet Alp Balkan, Software Engineer, Google.

    Kubernetes now offers functionality to enforce rules about which pods can communicate with each other using network policies. This feature became stable in Kubernetes 1.7 and is ready to use with supported networking plugins. The Kubernetes 1.8 release has added better capabilities to this feature.

    Network policy: What does it mean?

    In a Kubernetes cluster configured with default settings, all pods can discover and communicate with each other without any restrictions. The new Kubernetes object type NetworkPolicy lets you allow and block traffic to pods.

    If you’re running multiple applications in a Kubernetes cluster or sharing a cluster among multiple teams, it’s a security best practice to create firewalls that permit pods to talk to each other while blocking other network traffic. Networking policy corresponds to the Security Groups concepts in the Virtual Machines world.

    How do I add Network Policy to my cluster?

    Networking Policies are implemented by networking plugins. These plugins typically install an overlay network in your cluster to enforce the Network Policies configured. A number of networking plugins, including Calico, Romana and Weave Net, support using Network Policies.

    Google Container Engine (GKE) also provides beta support for Network Policies using the Calico networking plugin when you create clusters with the following command:

    gcloud beta container clusters create --enable-network-policy


    How do I configure a Network Policy?

    Once you install a networking plugin that implements Network Policies, you need to create a Kubernetes resource of type NetworkPolicy. This object describes two sets of label-based pod selector fields, matching:
    1. a set of pods the network policy applies to (required)
    2. a set of pods allowed to access the first set (optional). If you omit this field, it matches no pods; therefore, no pods are allowed. If you specify an empty pod selector, it matches all pods; therefore, all pods are allowed.

    Example: restricting traffic to a pod

    The following example of a network policy blocks all in-cluster traffic to a set of web server pods, except the pods allowed by the policy configuration.


    To achieve this setup, create a NetworkPolicy with the following manifest:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
     name: access-nginx
    spec:
     podSelector:
       matchLabels:
         app: nginx
     ingress:
     - from:
       - podSelector:
           matchLabels:
             app: foo

    Once you apply this configuration, only pods with label app: foo can talk to the pods with the label app: nginx. For a more detailed tutorial, see the Kubernetes documentation.
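
    One quick way to verify the policy (assuming a Service named nginx fronts the nginx pods) is to start a temporary pod carrying the allowed label and try to reach the service; repeating the test without the app: foo label should time out:

    $ kubectl run test --rm -ti --labels="app=foo" --image=busybox -- /bin/sh
    / # wget -qO- --timeout=2 http://nginx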

    Example: restricting traffic between all pods by default

    If you specify an empty spec.podSelector field, the network policy matches all pods in the namespace, blocking all traffic between pods by default. In this case, you must explicitly create network policies whitelisting all communication between the pods.



    You can enable a policy like this by applying the following manifest in your Kubernetes cluster:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
     name: default-deny
    spec:
     podSelector:

    Other Network Policy features

    In addition to the previous examples, you can make the Network Policy API enforce more complicated rules:

    • Egress network policies: Introduced in Kubernetes 1.8, you can restrict your workloads from establishing connections to resources outside specified IP ranges.
    • IP blocks support: In addition to using podSelector/namespaceSelector, you can specify IP ranges with CIDR blocks to allow/deny traffic in ingress or egress rules.
    • Cross-namespace policies: Using the ingress.namespaceSelector field, you can enforce Network Policies for particular namespaces or for all namespaces in the cluster. For example, you can create privileged/system namespaces that can communicate with pods even though the default policy is to block traffic.
    • Restricting traffic to port numbers: Using the ingress.ports field, you can specify port numbers for the policy to enforce. If you omit this field, the policy matches all ports by default. For example, you can use this to allow a monitoring pod to query only the monitoring port number of an application.
• Multiple ingress rules on a single policy: Because the spec.ingress field is an array, you can use the same NetworkPolicy object to give access to different ports using different pod selectors. For example, a NetworkPolicy can have one ingress rule giving pods with the kind: monitoring label access to port 9000, and another ingress rule for the label app: foo giving access to port 80, without creating an additional NetworkPolicy resource (see the sketch after this list).
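
As a concrete illustration of that last point, here is a minimal sketch of a single NetworkPolicy with two ingress rules, one for monitoring pods on port 9000 and one for app: foo on port 80. The policy name and the app: nginx target selector are illustrative:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: access-nginx-multi
spec:
  podSelector:
    matchLabels:
      app: nginx
  ingress:
  # Rule 1: pods labeled kind: monitoring may reach port 9000
  - from:
    - podSelector:
        matchLabels:
          kind: monitoring
    ports:
    - port: 9000
  # Rule 2: pods labeled app: foo may reach port 80
  - from:
    - podSelector:
        matchLabels:
          app: foo
    ports:
    - port: 80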

    Learn more


    Kubernetes the Easy Way

    Editor's note: Today's post is by Dan Garfield, VP of Marketing at Codefresh, on how to set up and easily deploy a Kubernetes cluster.

    Kelsey Hightower wrote an invaluable guide for Kubernetes called Kubernetes the Hard Way. It’s an awesome resource for those looking to understand the ins and outs of Kubernetes—but what if you want to put Kubernetes on easy mode? That’s something we’ve been working on together with Google Cloud. In this guide, we’ll show you how to get a cluster up and running, as well as how to actually deploy your code to that cluster and run it.

    This is Kubernetes the easy way. 

    What We’ll Accomplish

    1. Set up a cluster
    2. Deploy an application to the cluster
    3. Automate deployment with rolling updates

    Prerequisites

    • A containerized application
    • A Google Cloud Account or a Kubernetes cluster on another provider
      • Everything after Cluster creation is identical with all providers.
    • A free account on Codefresh
      • Codefresh is a service that handles Kubernetes deployment configuration and automation. 
    We made Codefresh free for open-source projects and offer 200 builds/mo free for private projects, to make adopting Kubernetes as easy as possible. Deploy as much as you like on as many clusters as you like. 

    Set Up a Cluster

    1. Create an account at cloud.google.com and log in.

    Note: If you’re using a Cluster outside of Google Cloud, you can skip this step.

    Google Container Engine is Google Cloud’s managed Kubernetes service. In our testing, it’s both powerful and easy to use.

    If you’re new to the platform, you can get a $500 credit at the end of this process.

    2. Open the menu and scroll down to Container Engine. Then select Container Clusters.



    3. Click Create cluster.

    We’re done with step 1. In my experience it usually takes less than 5 minutes for a cluster to be created. 

    Deploy an Application to Kubernetes

First go to Codefresh and create an account using GitHub, Bitbucket, or GitLab. As mentioned previously, Codefresh is free for open-source projects and for smaller private projects. We’ll use it to create the configuration YAML necessary to deploy our application to Kubernetes. Then we'll deploy our application and automate the process so it happens every time we commit code changes. Here are the steps:
    1. Create a Codefresh account
    2. Connect to Google Cloud (or other cluster)
    3. Add Cluster
    4. Deploy static image
    5. Build and deploy an image
    6. Automate the process

    Connect to Google Cloud

To connect your clusters in Google Container Engine, go to Account Settings > Integrations > Kubernetes and click Authenticate. This prompts you to log in with your Google credentials.

    Once you log in, all of your clusters are available within Codefresh.


    Add Cluster

To add your cluster, click the down arrow, then click Add Cluster and select the project and cluster name. You can now deploy images!

    Optional: Use an Alternative Cluster

    To connect a non-GKE cluster we’ll need to add a token and certificate to Codefresh. Go to Account Settings (bottom left) > Integrations > Kubernetes > Configure > Add Provider > Custom Providers. Expand the dropdown and click Add Cluster.



    Follow the instructions on how to generate the needed information and click Save. Your cluster now appears under the Kubernetes tab. 

    Deploy Static Image to Kubernetes

    Now for the fun part! Codefresh provides an easily modifiable boilerplate that takes care of the heavy lifting of configuring Kubernetes for your application.

    1. Click on the Kubernetes tab: this shows a list of namespaces.

    Think of namespaces as acting a bit like VLANs on a Kubernetes cluster. Each namespace can contain all the services that need to talk to each other on a Kubernetes cluster. For now, we’ll just work off the default namespace (the easy way!).
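
If you prefer the command line, namespaces can also be listed and created with kubectl (the namespace name below is just an example):

# List existing namespaces and create a new one
kubectl get namespaces
kubectl create namespace my-app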

    2. Click Add Service and fill in the details.

You can use the demo application I mentioned earlier, which has a Node.js frontend with a MongoDB backend.

    Here’s the info we need to pass:

Cluster - This is the cluster we added earlier; our application will be deployed there.
Namespace - We’ll use default for our namespace but you can create and use a new one if you’d prefer. Namespaces are discrete units for grouping all the services associated with an application.
Service name - You can name the service whatever you like. Since we’re deploying Mongo, I’ll just name it mongo!
Expose port - We don’t need to expose the port outside of our cluster, so we won’t check the box for now, but we will specify a port where other containers can talk to this service. Mongo’s default port is ‘27017’.
Image - Mongo is a public image on Docker Hub, so I can reference it by name and tag, ‘mongo:latest’.
Internal Ports - This is the port the mongo application listens on; in this case it’s ‘27017’ again.

    We can ignore the other options for now.

    3. Scroll down and click Deploy.



    Boom! You’ve just deployed this image to Kubernetes. You can see by clicking on the status that the service, deployment, replicas, and pods are all configured and running. If you click Edit > Advanced, you can see and edit all the raw YAML files associated with this application, or copy them and put them into your repository for use on any cluster. 
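
For reference, the raw YAML behind a simple service like this mongo example typically boils down to a Deployment plus a Service along the lines of the sketch below. The names, labels, and API versions here are illustrative, not the exact files Codefresh generates:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo:latest
        ports:
        - containerPort: 27017   # Mongo's default port
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
spec:
  selector:
    app: mongo
  ports:
  - port: 27017        # port other services in the cluster use
    targetPort: 27017  # port the mongo container listens on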

    Build and Deploy an Image

    To get the rest of our demo application up and running we need to build and deploy the Node.js portion of the application. To do that we’ll need to add our repository to Codefresh.

    1. Click on Repositories > Add Repository, then copy and paste the demochat repo url (or use your own repo).



We have the option to use a Dockerfile, or to use a template if we need help creating one. In this case, the demochat repo already has a Dockerfile, so we’ll select that. Click through the next few screens until the image builds.

Once the build is finished, the image is automatically saved inside the Codefresh Docker registry. You can also add any other registry to your account and use that instead.

    To deploy the image we’ll need
    • a pull secret
    • the image name and registry
    • the ports that will be used

    Creating the Pull Secret

    The pull secret is a token that the Kubernetes cluster can use to access a private Docker registry. To create one, we’ll need to generate the token and save it to Codefresh.

    1. Click on User Settings (bottom left) and generate a new token.

    2. Copy the token to your clipboard.



    3. Go to Account Settings > Integrations > Docker Registry > Add Registry and select Codefresh Registry. Paste in your token and enter your username (entry is case sensitive). Your username must match your name displayed at the bottom left of the screen.

    4. Test and save it.

We’ll now be able to create our secret later on when we deploy our image; for reference, the equivalent kubectl command is sketched below.
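
If you ever need to create the same kind of pull secret by hand, the standard kubectl command looks roughly like this (the secret name and placeholder values are illustrative; the registry host matches the r.cfcr.io images used below):

# Create an image pull secret for the Codefresh registry (values are placeholders)
kubectl create secret docker-registry codefresh-registry \
  --docker-server=r.cfcr.io \
  --docker-username=<your-codefresh-username> \
  --docker-password=<the-token-you-generated> \
  --docker-email=<your-email>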

    Get the image name

    1. Click on Images and open the image you just built. Under Comment you’ll see the image name starting with r.cfcr.io.



    2. Copy the image name; we’ll need to paste it in later.

    Deploy the private image to Kubernetes

    We’re now ready to deploy the image we built.

    1. Go to the Kubernetes page and, like we did with mongo, click Add Service and fill out the page. Make sure to select the same namespace you used to deploy mongo earlier. 



    Now let’s expose the port so we can access this application. This provisions an IP address and automatically configures ingress.
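
Under the hood, exposing a port this way generally maps to a Kubernetes Service of type LoadBalancer. A minimal sketch of such a Service follows; the name, labels, and ports are illustrative rather than what Codefresh generates verbatim:

apiVersion: v1
kind: Service
metadata:
  name: demochat
spec:
  type: LoadBalancer    # asks the cloud provider for an external IP
  selector:
    app: demochat
  ports:
  - port: 80            # externally reachable port
    targetPort: 5000    # port the Node.js app listens on (illustrative)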

    2. Click Deploy: your application will be up and running within a few seconds! The IP address may take longer to provision depending on your cluster location.



From this view you can scale the replicas, check application status, and perform similar tasks.

    3. Click on the IP address to view the running application.

    At this point you should have your entire application up and running! Not so bad huh? Now to automate deployment!

    Automate Deployment to Kubernetes

    Every time we make a change to our application, we want to build a new image and deploy it to our cluster. We’ve already set up automated builds, but to automate deployment:

    1. Click on Repositories (top left).

    2. Click on the pipeline for the demochat repo (the gear icon).



    3. It’s a good idea to run some tests before deploying. Under Build and Unit Test, add npm test for the unit test script.

    4. Click Deploy Script and select Kubernetes (Beta). Enter the information for the service you’ve already deployed.



    You can see the option to use a deployment file from your repo, or to use the deployment file that you just generated.

    5. Click Save.

    You’re done with deployment automation! Now whenever a change is made, the image will build, test, and deploy. 

    Conclusions

We want to make it easy for every team, not just big enterprise teams, to adopt Kubernetes while preserving all of Kubernetes’ power and flexibility. At any point on the Kubernetes service screen you can switch to YAML to view all of the YAML files generated by the configuration you performed in this walkthrough. You can tweak the file content, copy and paste them into local files, and so on.

    This walkthrough gives everyone a solid base to start with. When you’re ready, you can tweak the entities directly to specify the exact configuration you’d like.

    We’d love your feedback! Please share with us on Twitter, or reach out directly.

    Addendums

    Do you have a video to walk me through this? You bet.

    Does this work with Helm Charts? Yes! We’re currently piloting Helm Charts with a limited set of users. Ping us if you’d like to try it early.

Does this work with any Kubernetes cluster? It should work with any Kubernetes cluster and is tested on Kubernetes 1.5 and later.

Can I deploy Codefresh in my own data center? Sure, Codefresh is built on top of Kubernetes using Helm Charts. Codefresh cloud is free for open source and includes 200 builds/mo for private projects. Codefresh on-prem is currently for enterprise users only.

Won’t the database be wiped every time we update?
Yes, in this case we skipped creating a persistent volume. It’s a bit more work to get a persistent volume configured; if you’d like help, feel free to reach out and we’re happy to assist!

    Containerd Brings More Container Runtime Options for Kubernetes

    Editor's note: Today's post is by Lantao Liu, Software Engineer at Google, and Mike Brown, Open Source Developer Advocate at IBM.

    A container runtime is software that executes containers and manages container images on a node. Today, the most widely known container runtime is Docker, but there are other container runtimes in the ecosystem, such as rkt, containerd, and lxd. Docker is by far the most common container runtime used in production Kubernetes environments, but Docker’s smaller offspring, containerd, may prove to be a better option. This post describes using containerd with Kubernetes.

    Kubernetes 1.5 introduced an internal plugin API named Container Runtime Interface (CRI) to provide easy access to different container runtimes. CRI enables Kubernetes to use a variety of container runtimes without the need to recompile. In theory, Kubernetes could use any container runtime that implements CRI to manage pods, containers and container images.

    Over the past 6 months, engineers from Google, Docker, IBM, ZTE, and ZJU have worked to implement CRI for containerd. The project is called cri-containerd, which had its feature complete v1.0.0-alpha.0 release on September 25, 2017. With cri-containerd, users can run Kubernetes clusters using containerd as the underlying runtime without Docker installed.


    containerd


Containerd is an OCI-compliant core container runtime designed to be embedded into larger systems. It provides the minimum set of functionality to execute containers and manage images on a node. It was initiated by Docker Inc. and donated to the CNCF in March 2017. The Docker engine itself is built on top of earlier versions of containerd, and will soon be updated to the newest version. Containerd is close to a feature-complete stable release, with 1.0.0-beta.1 available right now.

Containerd has a much smaller scope than Docker, provides a golang client API, and is more focused on being embeddable. The smaller scope results in a smaller codebase that’s easier to maintain and support over time, matching Kubernetes requirements as summarized below:




• Container Lifecycle Management
  • Containerd scope: In
  • Kubernetes requirement: Container Create/Start/Stop/Delete/List/Inspect (✔️)
• Image Management
  • Containerd scope: In
  • Kubernetes requirement: Pull/List/Inspect (✔️)
• Networking
  • Containerd scope: Out. No concrete network solution; the user can set up a network namespace and put containers into it.
  • Kubernetes requirement: Kubernetes networking deals with pods, rather than containers, so container runtimes should not provide complex networking solutions that don't satisfy requirements. (✔️)
• Volumes
  • Containerd scope: Out. No volume management; the user can set up a host path and mount it into a container.
  • Kubernetes requirement: Kubernetes manages volumes. Container runtimes should not provide internal volume management that may conflict with Kubernetes. (✔️)
• Persistent Container Logging
  • Containerd scope: Out. No persistent container log; container STDIO is provided as FIFOs, which can be redirected/decorated as required.
  • Kubernetes requirement: Kubernetes has specific requirements for persistent container logs, such as format and path. Container runtimes should not persist an unmanageable container log. (✔️)
• Metrics
  • Containerd scope: In. Containerd provides container and snapshot metrics as part of the API.
  • Kubernetes requirement: Kubernetes expects the container runtime to provide container metrics (CPU, memory, writable layer size, etc.) and image filesystem usage (disk, inode usage, etc.). (✔️)

    Overall, from a technical perspective, containerd is a very good alternative container runtime for Kubernetes.


    cri-containerd


Cri-containerd is exactly that: an implementation of CRI for containerd. It operates on the same node as the Kubelet and containerd. Layered between Kubernetes and containerd, cri-containerd handles all CRI service requests from the Kubelet and uses containerd to manage containers and container images. Cri-containerd services these requests by translating them into containerd requests and supplying the additional functionality needed to meet the CRI requirements.



    Compared with the current Docker CRI implementation (dockershim), cri-containerd eliminates an extra hop in the stack, making the stack more stable and efficient.

    Architecture

    Cri-containerd uses containerd to manage the full container lifecycle and all container images. As also shown below, cri-containerd manages pod networking via CNI (another CNCF project).



    Let’s use an example to demonstrate how cri-containerd works for the case when Kubelet creates a single-container pod:
    1. Kubelet calls cri-containerd, via the CRI runtime service API, to create a pod;
    2. cri-containerd uses containerd to create and start a special pause container (the sandbox container) and put that container inside the pod’s cgroups and namespace (steps omitted for brevity);
    3. cri-containerd configures the pod’s network namespace using CNI;
    4. Kubelet subsequently calls cri-containerd, via the CRI image service API, to pull the application container image;
    5. cri-containerd further uses containerd to pull the image if the image is not present on the node;
    6. Kubelet then calls cri-containerd, via the CRI runtime service API, to create and start the application container inside the pod using the pulled container image;
    7. cri-containerd finally calls containerd to create the application container, put it inside the pod’s cgroups and namespace, then to start the pod’s new application container.
    After these steps, a pod and its corresponding application container is created and running.
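
To make this flow possible, the Kubelet on the node is pointed at cri-containerd rather than Docker. A minimal sketch of the relevant Kubelet flags, assuming cri-containerd listens on its usual unix socket path, looks like this:

# Point the Kubelet at a remote CRI runtime; the socket path is an assumption
# based on the typical cri-containerd default and may differ in your install
kubelet --container-runtime=remote \
  --container-runtime-endpoint=unix:///var/run/cri-containerd.sock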

    Status

    Cri-containerd v1.0.0-alpha.0 was released on Sep. 25, 2017.

    It is feature complete. All Kubernetes features are supported.

    All CRI validation tests have passed. (A CRI validation is a test framework for validating whether a CRI implementation meets all the requirements expected by Kubernetes.)

    All regular node e2e tests have passed. (The Kubernetes test framework for testing Kubernetes node level functionalities such as managing pods, mounting volumes etc.)

    To learn more about the v1.0.0-alpha.0 release, see the project repository.

    Try it Out


For a multi-node cluster installer and bring-up steps using Ansible and kubeadm, see this repo link.

    For creating a cluster from scratch on Google Cloud, see Kubernetes the Hard Way.

    For a custom installation from release tarball, see this repo link.

For an installation with LinuxKit on a local VM, see this repo link.


    Next Steps

    We are focused on stability and usability improvements as our next steps.

    • Stability:
  • Set up a full set of Kubernetes integration tests in the Kubernetes test infrastructure on various OS distros such as Ubuntu and COS (Container-Optimized OS).
      • Actively fix any test failures and other issues reported by users.

    • Usability:
  • Improve the user experience of crictl. Crictl is a portable command line tool for all CRI container runtimes. The goal here is to make it easy to use in debugging and development scenarios (see the example invocations after this list).
      • Integrate cri-containerd with kube-up.sh, to help users bring up a production quality Kubernetes cluster using cri-containerd and containerd.
      • Improve our documentation for users and admins alike.
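
To give a flavor of crictl, here are a few typical invocations against a CRI runtime such as cri-containerd (the container ID is illustrative):

# Inspect what the CRI runtime is managing on the node
crictl pods
crictl ps
crictl images

# Debug a specific container (the ID below is illustrative)
crictl logs 3e025dd50a72d
crictl exec -it 3e025dd50a72d sh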

    We plan to release our v1.0.0-beta.0 by the end of 2017.


    Contribute

    Cri-containerd is a Kubernetes incubator project located at https://github.com/kubernetes-incubator/cri-containerd. Any contributions in terms of ideas, issues, and/or fixes are welcome. The getting started guide for developers is a good place to start for contributors.

    Community


    Cri-containerd is developed and maintained by the Kubernetes SIG-Node community. We’d love to hear feedback from you. To join the community:

    Securing Software Supply Chain with Grafeas

    Editor's note: This post is written by Kelsey Hightower, Staff Developer Advocate at Google, and Sandra Guo, Product Manager at Google.

    Kubernetes has evolved to support increasingly complex classes of applications, enabling the development of two major industry trends: hybrid cloud and microservices. With increasing complexity in production environments, customers—especially enterprises—are demanding better ways to manage their software supply chain with more centralized visibility and control over production deployments.

    On October 12th, Google and partners announced Grafeas, an open source initiative to define a best practice for auditing and governing the modern software supply chain. With Grafeas (“scribe” in Greek), developers can plug in components of the CI/CD pipeline into a central source of truth for tracking and enforcing policies. Google is also working on Kritis (“judge” in Greek), allowing devOps teams to enforce deploy-time image policy using metadata and attestations stored in Grafeas.

    Grafeas allows build, auditing and compliance tools to exchange comprehensive metadata on container images using a central API. This allows enforcing policies that provide central control over the software supply process.



    Example application: PaymentProcessor


    Let’s consider a simple application, PaymentProcessor, that retrieves, processes and updates payment info stored in a database. This application is made up of two containers: a standard ruby container and custom logic.


    Due to the sensitive nature of the payment data, the developers and DevOps team really want to make sure that the code meets certain security and compliance requirements, with detailed records on the provenance of this code. There are CI/CD stages that validate the quality of the PaymentProcessor release, but there is no easy way to centrally view/manage this information:



    Visibility and governance over the PaymentProcessor Code

    Grafeas provides an API for customers to centrally manage metadata created by various CI/CD components and enables deploy time policy enforcement through a Kritis implementation.



    Let’s consider a basic example of how Grafeas can provide deploy time control for the PaymentProcessor app using a demo verification pipeline.

    Assume that a PaymentProcessor container image has been created and pushed to Google Container Registry. This example uses the gcr.io/exampleApp/PaymentProcessor container for testing. You as the QA engineer want to create an attestation certifying this image for production usage. Instead of trusting an image tag like 0.0.1, which can be reused and point to a different container image later, we can trust the image digest to ensure the attestation links to the full image contents.
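
One way to look up that digest, assuming the image has already been pulled from the registry (and using the illustrative image name from this example), is docker inspect, saving the result for the signing step later on:

# Resolve the tag to a repo digest and save it for signing
docker pull gcr.io/exampleApp/PaymentProcessor:0.0.1
docker inspect --format='{{index .RepoDigests 0}}' \
  gcr.io/exampleApp/PaymentProcessor:0.0.1 > image-digest.txt
cat image-digest.txt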

    1. Set up the environment

    Generate a signing key:


    gpg --quick-generate-key --yes qa_bob@example.com

    Export the image signer's public key:


gpg --armor --export qa_bob@example.com > ${GPG_KEY_ID}.pub
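
The steps below reference ${GPG_KEY_ID}. One way to capture it (a sketch; any method of reading the long key ID works) is from gpg's machine-readable output:

# Capture the long key ID of the QA signing key for use in later steps
GPG_KEY_ID="$(gpg --list-keys --with-colons qa_bob@example.com \
  | awk -F: '/^pub:/ {print $5; exit}')"
echo "${GPG_KEY_ID}"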

    Create the ‘qa’ AttestationAuthority note via the Grafeas API:


    curl -X POST \
     "http://127.0.0.1:8080/v1alpha1/projects/image-signing/notes?noteId=qa" \
     -d @note.json

Create the Kubernetes ConfigMap for admission control and store the QA signer's public key:


    kubectl create configmap image-signature-webhook \
     --from-file ${GPG_KEY_ID}.pub
    kubectl get configmap image-signature-webhook -o yaml

Set up an admission control webhook to require a QA signature during deployment.


    kubectl apply -f kubernetes/image-signature-webhook.yaml

    2. Attempt to deploy an image without QA attestation

Attempt to run the image defined in pods/paymentProcessor.yaml before it is QA attested. The manifest looks like this:


apiVersion: v1
kind: Pod
metadata:
  name: payment
spec:
  containers:
    - name: payment
      image: "gcr.io/hightowerlabs/payment@sha256:aba48d60ba4410ec921f9d2e8169236c57660d121f9430dc9758d754eec8f887"

    Create the paymentProcessor pod:


    kubectl apply -f pods/paymentProcessor.yaml

    Notice the paymentProcessor pod was not created and the following error was returned:


    The  "" is invalid: : No matched signatures for container image: gcr.io/hightowerlabs/payment@sha256:aba48d60ba4410ec921f9d2e8169236c57660d121f9430dc9758d754eec8f887

    3. Create an image signature

Assuming the image digest is stored in image-digest.txt, sign the image digest:


    gpg -u qa_bob@example.com \
     --armor \
     --clearsign \
     --output=signature.gpg \
 image-digest.txt

    4. Upload the signature to the Grafeas API

Generate a pgpSignedAttestation occurrence from the signature:


    cat > occurrence.json <<EOF
    {
     "resourceUrl": "$(cat image-digest.txt)",
     "noteName": "projects/image-signing/notes/qa",
     "attestation": {
       "pgpSignedAttestation": {
          "signature": "$(cat signature.gpg)",
          "contentType": "application/vnd.gcr.image.url.v1",
          "pgpKeyId": "${GPG_KEY_ID}"
       }
     }
    }
    EOF

    Upload the attestation through the Grafeas API:


    curl -X POST \
     'http://127.0.0.1:8080/v1alpha1/projects/image-signing/occurrences' \
     -d @occurrence.json


    5. Verify QA attestation during a production deployment


Attempt to run the image in paymentProcessor.yaml now that it has the correct attestation in the Grafeas API:


    kubectl apply -f pods/paymentProcessor.yaml
pod "payment" created

With the attestation added, the pod is created because the execution criteria are met.

    For more detailed information, see this Grafeas tutorial.

    Summary

    The demo above showed how you can integrate your software supply chain with Grafeas and gain visibility and control over your production deployments. However, the demo verification pipeline by itself is not a full Kritis implementation. In addition to basic admission control, Kritis provides additional support for workflow enforcement, multi-authority signing, breakglass deployment and more. You can read the Kritis whitepaper for more details. The team is actively working on a full open-source implementation. We’d love your feedback!

    In addition, a hosted alpha implementation of Kritis, called Binary Authorization, is available on Google Container Engine and will be available for broader consumption soon.

    Google, JFrog, and other partners joined forces to create Grafeas based on our common experiences building secure, large, and complex microservice deployments for internal and enterprise customers. Grafeas is an industry-wide community effort.

    To learn more about Grafeas and contribute to the project:
    We hope you join us!
    The Grafeas Team

    Kubernetes is Still Hard (for Developers)


    Kubernetes has made the Ops experience much easier, but how does the developer experience compare? Ops teams can deploy a Kubernetes cluster in a matter of minutes. But developers need to understand a host of new concepts before beginning to work with Kubernetes. This can be a tedious and manual process, but it doesn’t have to be. In this talk, Michelle Noorali, co-lead of SIG-Apps, reimagines the Kubernetes developer experience. She shares her top 3 tips for building a successful developer experience including:
    1. A framework for thinking about cloud native applications
2. An integrated experience for debugging and fine-tuning cloud native applications
3. A way to get a cloud native application out the door quickly
    Interested in learning how far the Kubernetes developer experience has come? Join us at KubeCon in Austin on December 6-8. Register Now >>

    Check out Michelle’s keynote to learn about exciting new updates from CNCF projects.

    Certified Kubernetes Conformance Program: Launch Celebration Round Up


This week the CNCF certified the first group of Kubernetes® offerings under the Certified Kubernetes Conformance Program. These first certifications follow a beta phase during which we invited participants to submit conformance results. The community response was overwhelming: CNCF certified offerings from 32 vendors!

    The new Certified Kubernetes Conformance Program gives enterprise organizations the confidence that workloads running on any Certified Kubernetes distribution or platform will work correctly on other Certified Kubernetes distributions or platforms. A Certified Kubernetes product guarantees that the complete Kubernetes API functions as specified, so users can rely on a seamless, stable experience.

    Here’s what the world had to say about the Certified Kubernetes Conformance Program.

    Press coverage:
    Community blog round-up:

    Visit https://www.cncf.io/certification/software-conformance for more information about the Certified Kubernetes Conformance Program, and learn how you can join a growing list of Certified Kubernetes providers.

    “Cloud Native Computing Foundation”, “CNCF” and “Kubernetes” are registered trademarks of The Linux Foundation in the United States and other countries. “Certified Kubernetes” and the Certified Kubernetes design are trademarks of The Linux Foundation in the United States and other countries.