Kubernetes on OpenStack

May 19, 2015, 9:00 am

≫ Next: Weekly Kubernetes Community Hangout Notes - May 22 2015

≪ Previous: Weekly Kubernetes Community Hangout Notes - May 15 2015

Today, the OpenStack foundation made it even easier for you deploy and manage clusters of Docker containers on OpenStack clouds by including Kubernetes in its Community App Catalog. At a keynote today at the OpenStack Summit in Vancouver, Mark Collier, COO of the OpenStack Foundation, and Craig Peters, Mirantis product line manager, demonstrated the Community App Catalog workflow by launching a Kubernetes cluster in a matter of seconds by leveraging the compute, storage, networking and identity systems already present in an OpenStack cloud.

The entries in the catalog include not just the ability to start a Kubernetes cluster, but also a range of applications deployed in Docker containers managed by Kubernetes. These applications include:

Apache web server
Nginx web server
Crate - The Distributed Database for Docker
GlassFish - Java EE 7 Application Server
Tomcat - An open-source web server and servlet container
InfluxDB - An open-source, distributed, time series database
Grafana - Metrics dashboard for InfluxDB
Jenkins - An extensible open source continuous integration server
MariaDB database
MySql database
Redis - Key-value cache and store
PostgreSQL database
MongoDB NoSQL database
Zend Server - The Complete PHP Application Platform

This list will grow, and is curated here. You can examine (and contribute to) the YAML file that tells Murano how to install and start the Kubernetes cluster here.

The Kubernetes open source project has continued to see fantastic community adoption and increasing momentum, with over 11,000 commits and 7,648 stars on GitHub. With supporters ranging from Red Hat and Intel to CoreOS and Box.net, it has come to represent a range of customer interests ranging from enterprise IT to cutting edge startups. We encourage you to give it a try, give us your feedback, and get involved in our growing community.

- Martin Buhr, Product Manager, Kubernetes Open Source Project

↧

Weekly Kubernetes Community Hangout Notes - May 22 2015

June 2, 2015, 11:18 am

≫ Next: Cluster Level Logging with Kubernetes

≪ Previous: Kubernetes on OpenStack

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Discussion / Topics

Code Freeze
Upgrades of cluster
E2E test issues

Code Freeze process starts EOD 22-May, including

Code Slush -- draining PRs that are active. If there are issues for v1 to raise, please do so today.

Drain timeframe will be about 1-week.

Community PRs -- plan is to reopen in ~6 weeks.
Key areas for fixes in v1 -- docs, the experience.

E2E issues and LGTM process

Seen end-to-end tests go red.
Plan is to limit merging to on-call. Quinton to communicate.

Community committers, please label with LGTM and on-call will merge based on on-call’s judgement.

Can we expose Jenkins runs to community? (Paul)

Question/concern to work out is securing Jenkins. Short term conclusion: Will look at pushing Jenkins logs into GCS bucket. Lavalamp will follow up with Jeff Grafton.
Longer term solution may be a merge queue, where e2e runs for each merge (as opposed to multiple merges). This exists in Openshift today.

Cluster Upgrades for Kubernetes as final v1 feature

GCE will use Persistent Disk (PD) to mount new image.
OpenShift will follow a tradition update model, with “yum update”.
A strawman approach is to have an analog of “kube-push” to update the master, in-place. Feedback in the meeting was

Upgrading Docker daemon on the master will kill the master’s pods. Agreed. May consider an ‘upgrade’ phase or explicit step.
How is this different than HA master upgrade? See HA case as a superset. The work to do an upgrade would be a prerequisite for HA master upgrade.

Mesos scheduler implements a rolling node upgrade.

Attention requested for v1 in the Hangout

Downward plug-in #5093.

Discussed that it’s an eventually consistent design.
In the meeting, the outcome was: seeking a pattern for atomicity of update across multiple piece. Paul to ping Tim when ready to review.

Regression in e2e #8499 (Eric Paris)
Asking for review of direction, if not review. #8334 (Mark)
Handling graceful termination (e.g. sigterm to postgres) is not implemented. #2789 (Clayton)

Need is to bump up grace period or finish plumbing. In API, client tools, missing is kubelet does use and we don’t set the timeout (>0) value.
Brendan will look into this graceful term issue.

Load balancer almost ready by JustinSB.

↧

Cluster Level Logging with Kubernetes

June 11, 2015, 4:35 pm

≫ Next: Slides: Cluster Management with Kubernetes, talk given at the University of Edinburgh

≪ Previous: Weekly Kubernetes Community Hangout Notes - May 22 2015

A Kubernetes cluster will typically be humming along running many system and application pods. How does the system administrator collect, manage and query the logs of the system pods? How does a user query the logs of their application which is composed of many pods which may be restarted or automatically generated by the Kubernetes system? These questions are addressed by the Kubernetes cluster level logging services.

Cluster level logging for Kubernetes allows us to collect logs which persist beyond the lifetime of the pod’s container images or the lifetime of the pod or even cluster. In this article we assume that a Kubernetes cluster has been created with cluster level logging support for sending logs to Google Cloud Logging. This is an option when creating a Google Container Engine (GKE) cluster, and can be enabled for the open source GCE Kubernetes distribution by setting the environment variable KUBE_LOGGING_DESTINATION to gcp before creating a cluster (soon to be the default as it currently is for GKE). After a cluster has been created you will have a collection of system pods running that support monitoring, logging and DNS resolution for names of Kubernetes services:

$ kubectl get pods

NAME READY REASON RESTARTS AGE

fluentd-cloud-logging-kubernetes-minion-0f64 1/1 Running 0 32m

fluentd-cloud-logging-kubernetes-minion-27gf 1/1 Running 0 32m

fluentd-cloud-logging-kubernetes-minion-pk22 1/1 Running 0 31m

fluentd-cloud-logging-kubernetes-minion-20ej 1/1 Running 0 31m

kube-dns-v3-pk22 3/3 Running 0 32m

monitoring-heapster-v1-20ej 0/1 Running 9 32m

Here is the same information in a picture which shows how the pods might be placed on specific nodes.

Here is a close up of what is running on each node.

The first diagram shows four nodes created on a GCE cluster with the name of each VM node on a purple background. The internal and public IPs of each node are shown on gray boxes and the pods running in each node are shown in green boxes. Each pod box shows the name of the pod and the namespace it runs in, the IP address of the pod and the images which are run as part of the pod’s execution. The default namespace is used to run privileged system services rather than user applications. Here we see that every node is running a fluentd-cloud-logging pod which is collecting the log output of the containers running on the same node and sending them to Google Cloud Logging. A pod which provides a DNS service runs on one of the nodes and a pod which provides monitoring support runs on another node.

To help explain how cluster level logging works let’s start off with a synthetic log generator pod specification counter-pod.yaml:

apiVersion: v1

kind: Pod

metadata:

name: counter

spec:

containers:

- name: count

image: ubuntu:14.04

args: [bash, -c,

'for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 1; done']

This pod specification has one container which runs a bash script when the container is born. This script simply writes out the value of a counter and the date once per second and runs indefinitely. Let’s create the pod.

$ kubectl create -f counter-pod.yaml

pods/counter

We can observe the running pod:

$ kubectl get pods

NAME READY REASON RESTARTS AGE

counter 1/1 Running 0 5m

fluentd-cloud-logging-kubernetes-minion-0f64 1/1 Running 0 55m

fluentd-cloud-logging-kubernetes-minion-27gf 1/1 Running 0 55m

fluentd-cloud-logging-kubernetes-minion-pk22 1/1 Running 0 55m

fluentd-cloud-logging-kubernetes-minion-20ej 1/1 Running 0 55m

kube-dns-v3-pk22 3/3 Running 0 55m

monitoring-heapster-v1-20ej 0/1 Running 9 56m

This step may take a few minutes to download the ubuntu:14.04 image during which the pod status will be shown as Pending.

One of the nodes is now running the counter pod:

When the pod status changes to Running we can use the kubectl logs command to view the output of this counter pod.

$ kubectl logs counter

0: Tue Jun 2 21:37:31 UTC 2015

1: Tue Jun 2 21:37:32 UTC 2015

2: Tue Jun 2 21:37:33 UTC 2015

3: Tue Jun 2 21:37:34 UTC 2015

4: Tue Jun 2 21:37:35 UTC 2015

5: Tue Jun 2 21:37:36 UTC 2015

...

This command fetches the log text from the Docker log file for the image that is running in this container. We can connect to the running container and observe the running counter bash script.

$ kubectl exec -i counter bash

ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 1 0.0 0.0 17976 2888 ? Ss 00:02 0:00 bash -c for ((i = 0; ; i++)); do echo "$i: $(date)"; sleep 1; done

root 468 0.0 0.0 17968 2904 ? Ss 00:05 0:00 bash

root 479 0.0 0.0 4348 812 ? S 00:05 0:00 sleep 1

root 480 0.0 0.0 15572 2212 ? R 00:05 0:00 ps aux

What happens if for any reason the image in this pod is killed off and then restarted by Kubernetes? Will we still see the log lines from the previous invocation of the container followed by the log lines for the started container? Or will we lose the log lines from the original container’s execution and only see the log lines for the new container? Let’s find out. First let’s stop the currently running counter.

$ kubectl stop pod counter

pods/counter

Now let’s restart the counter.

$ kubectl create -f counter-pod.yaml

pods/counter

Let’s wait for the container to restart and get the log lines again.

$ kubectl logs counter

0: Tue Jun 2 21:51:40 UTC 2015

1: Tue Jun 2 21:51:41 UTC 2015

2: Tue Jun 2 21:51:42 UTC 2015

3: Tue Jun 2 21:51:43 UTC 2015

4: Tue Jun 2 21:51:44 UTC 2015

5: Tue Jun 2 21:51:45 UTC 2015

6: Tue Jun 2 21:51:46 UTC 2015

7: Tue Jun 2 21:51:47 UTC 2015

8: Tue Jun 2 21:51:48 UTC 2015

Oh no! We’ve lost the log lines from the first invocation of the container in this pod! Ideally, we want to preserve all the log lines from each invocation of each container in the pod. Furthermore, even if the pod is restarted we would still like to preserve all the log lines that were ever emitted by the containers in the pod. But don’t fear, this is the functionality provided by cluster level logging in Kubernetes. When a cluster is created, the standard output and standard error output of each container can be ingested using a Fluentd agent running on each node into either Google Cloud Logging or into Elasticsearch and viewed with Kibana. This blog article focuses on Google Cloud Logging.

When a Kubernetes cluster is created with logging to Google Cloud Logging enabled, the system creates a pod called fluentd-cloud-logging on each node of the cluster to collect Docker container logs. These pods were shown at the start of this blog article in the response to the first get pods command.

This log collection pod has a specification which looks something like this fluentd-gcp.yaml:

apiVersion: v1

kind: Pod

metadata:

name: fluentd-cloud-logging

spec:

containers:

- name: fluentd-cloud-logging

image: gcr.io/google_containers/fluentd-gcp:1.6

env:

- name: FLUENTD_ARGS

value: -qq

volumeMounts:

- name: containers

mountPath: /var/lib/docker/containers

volumes:

- name: containers

hostPath:

path: /var/lib/docker/containers

This pod specification maps the the directory on the host containing the Docker log files, /var/lib/docker/containers, to a directory inside the container which has the same path. The pod runs one image, gcr.io/google_containers/fluentd-gcp:1.6, which is configured to collect the Docker log files from the logs directory and ingest them into Google Cloud Logging. One instance of this pod runs on each node of the cluster. Kubernetes will notice if this pod fails and automatically restart it.

We can click on the Logs item under the Monitoring section of the Google Developer Console and select the logs for the counter container, which will be called kubernetes.counter_default_count. This identifies the name of the pod (counter), the namespace (default) and the name of the container (count) for which the log collection occurred. Using this name we can select just the logs for our counter container from the drop down menu:

When we view the logs in the Developer Console we observe the logs for both invocations of the container.

Screen Shot 2015-06-02 at 10.57.54 PM.png

Note the first container counted to 108 and then it was terminated. When the next container image restarted the counting process resumed from 0. Similarly if we deleted the pod and restarted it we would capture the logs for all instances of the containers in the pod whenever the pod was running.

Logs ingested into Google Cloud Logging may be exported to various other destinations includingGoogle Cloud Storagebuckets and BigQuery. Use the Exports tab in the Cloud Logging console to specify where logs should be streamed to.

We could query the ingested logs from BigQuery using the SQL query which reports the counter log lines showing the newest lines first.

SELECT metadata.timestamp, structPayload.log FROM [mylogs.kubernetes_counter_default_count_20150611] ORDERBY metadata.timestamp DESC

Here is some sample output:

We could also fetch the logs from Google Cloud Storage buckets to our desktop or laptop and then search them locally. The following command fetches logs for the counter pod running in a cluster which is itself in a GCE project called myproject. Only logs for the date 2015-06-11 are fetched.

$ gsutil -m cp -r gs://myproject/kubernetes.counter_default_count/2015/06/11 .

Now we can run queries over the ingested logs. The example below uses the jq program to extract just the log lines.

$ cat 21\:00\:00_21\:59\:59_S0.json | jq '.structPayload.log'

"0: Thu Jun 11 21:39:38 UTC 2015\n"

"1: Thu Jun 11 21:39:39 UTC 2015\n"

"2: Thu Jun 11 21:39:40 UTC 2015\n"

"3: Thu Jun 11 21:39:41 UTC 2015\n"

"4: Thu Jun 11 21:39:42 UTC 2015\n"

"5: Thu Jun 11 21:39:43 UTC 2015\n"

"6: Thu Jun 11 21:39:44 UTC 2015\n"

"7: Thu Jun 11 21:39:45 UTC 2015\n"

...

This article has touched briefly on the underlying mechanisms that support gathering cluster level logs on a Kubernetes deployment. The approach here only works for gathering the standard output and standard error output of the processes running in the pod’s containers. To gather other logs that are stored in files one can use a sidecar container to gather the required files and send them to Google Cloud Logging as described at the page Collecting log files within containers with Fluentd and sending them to the Google Cloud Logging service.

↧

Slides: Cluster Management with Kubernetes, talk given at the University of Edinburgh

June 25, 2015, 10:32 pm

≫ Next: The Distributed System ToolKit: Patterns for Composite Containers

≪ Previous: Cluster Level Logging with Kubernetes

On Friday 5 June 2015 I gave a talk called Cluster Management with Kubernetes to a general audience at the University of Edinburgh. The talk includes an example of a music store system with a Kibana front end UI and an Elasticsearch based back end which helps to make concrete concepts like pods, replication controllers and services. Please use the gears on the bottom row to select speaker notes. For a larger view of the slides please follow the link Cluster Management with Kubernetes.

↧

The Distributed System ToolKit: Patterns for Composite Containers

June 29, 2015, 12:08 pm

≫ Next: How did the Quake demo from DockerCon Work?

≪ Previous: Slides: Cluster Management with Kubernetes, talk given at the University of Edinburgh

Having had the privilege of presenting some ideas from Kubernetes at DockerCon 2015, I thought I would make a blog post to share some of these ideas for those of you who couldn’t be there.

Over the past two years containers have become an increasingly popular way to package and deploy code. Container images solve many real-world problems with existing packaging and deployment tools, but in addition to these significant benefits, containers offer us an opportunity to fundamentally re-think the way we build distributed applications. Just as service oriented architectures (SOA) encouraged the decomposition of applications into modular, focused services, containers should encourage the further decomposition of these services into closely cooperating modular containers. By virtue of establishing a boundary, containers enable users to build their services using modular, reusable components, and this in turn leads to services that are more reliable, more scalable and faster to build than applications built from monolithic containers.

In many ways the switch from VMs to containers is like the switch from monolithic programs of the 1970s and early 80s to modular object-oriented programs of the late 1980s and onward. The abstraction layer provided by the container image has a great deal in common with the abstraction boundary of the class in object-oriented programming, and it allows the same opportunities to improve developer productivity and application quality. Just like the right way to code is the separation of concerns into modular objects, the right way to package applications in containers is the separation of concerns into modular containers. Fundamentally this means breaking up not just the overall application, but also the pieces within any one server into multiple modular containers that are easy to parameterize and re-use. In this way, just like the standard libraries that are ubiquitous in modern languages, most application developers can compose together modular containers that are written by others, and build their applications more quickly and with higher quality components.

The benefits of thinking in terms of modular containers are enormous, in particular, modular containers provide the following:

Speed application development, since containers can be re-used between teams and even larger communities
Codify expert knowledge, since everyone collaborates on a single containerized implementation that reflects best-practices rather than a myriad of different home-grown containers with roughly the same functionality
Enable agile teams, since the container boundary is a natural boundary and contract for team responsibilities
Provide separation of concerns and focus on specific functionality that reduces spaghetti dependencies and un-testable components

Building an application from modular containers means thinking about symbiotic groups of containers that cooperate to provide a service, not one container per service. In Kubernetes, the embodiment of this modular container service is a Pod. A Pod is a group of containers that share resources like file systems, kernel namespaces and an IP address. The Pod is the atomic unit of scheduling in a Kubernetes cluster, precisely because the symbiotic nature of the containers in the Pod require that they be co-scheduled onto the same machine, and the only way to reliably achieve this is by making container groups atomic scheduling units.

When you start thinking in terms of Pods, there are naturally some general patterns of modular application development that re-occur multiple times. I’m confident that as we move forward in the development of Kubernetes more of these patterns will be identified, but here are three that we see commonly:

Example #1: Sidecar containers

Sidecar containers extend and enhance the "main" container, they take existing containers and make them better. As an example, consider a container that runs the Nginx web server. Add a different container that syncs the file system with a git repository, share the file system between the containers and you have built Git push-to-deploy. But you’ve done it in a modular manner where the git synchronizer can be built by a different team, and can be reused across many different web servers (Apache, Python, Tomcat, etc). Because of this modularity, you only have to write and test your git synchronizer once and reuse it across numerous apps. And if someone else writes it, you don’t even need to do that.

Screen Shot 2015-06-25 at 2.40.46 PM.png

Example #2: Ambassador containers

Ambassador containers proxy a local connection to the world. As an example, consider a Redis cluster with read-replicas and a single write master. You can create a Pod that groups your main application with a Redis ambassador container. The ambassador is a proxy is responsible for splitting reads and writes and sending them on to the appropriate servers. Because these two containers share a network namespace, they share an IP address and your application can open a connection on “localhost” and find the proxy without any service discovery. As far as your main application is concerned, it is simply connecting to a Redis server on localhost. This is powerful, not just because of separation of concerns and the fact that different teams can easily own the components, but also because in the development environment, you can simply skip the proxy and connect directly to a Redis server that is running on localhost.

Screen Shot 2015-06-24 at 8.24.00 PM.png

Example #3: Adapter containers

Adapter containers standardize and normalize output. Consider the task of monitoring N different applications. Each application may be built with a different way of exporting monitoring data. (e.g. JMX, StatsD, application specific statistics) but every monitoring system expects a consistent and uniform data model for the monitoring data it collects. By using the adapter pattern of composite containers, you can transform the heterogeneous monitoring data from different systems into a single unified representation by creating Pods that groups the application containers with adapters that know how to do the transformation. Again because these Pods share namespaces and file systems, the coordination of these two containers is simple and straightforward.

Screen Shot 2015-06-25 at 4.37.51 PM.png

In all of these cases, we've used the container boundary as an encapsulation/abstraction boundary that allows us to build modular, reusable components that we combine to build out applications. This reuse enables us to more effectively share containers between different developers, reuse our code across multiple applications, and generally build more reliable, robust distributed systems more quickly. I hope you’ve seen how Pods and composite container patterns can enable you to build robust distributed systems more quickly, and achieve container code re-use. To try these patterns out yourself in your own applications. I encourage you to go check out open source Kubernetes or Google Container Engine.

- Brendan Burns, Software Engineer at Google

↧

How did the Quake demo from DockerCon Work?

July 1, 2015, 10:07 pm

≫ Next: Weekly Kubernetes Community Hangout Notes - April 24 2015

≪ Previous: The Distributed System ToolKit: Patterns for Composite Containers

Shortly after its release in 2013, Docker became a very popular open source container management tool for Linux. Docker has a rich set of commands to control the execution of a container. Commands such as start, stop, restart, kill, pause, and unpause. However, what is still missing is the ability to Checkpoint and Restore (C/R) a container natively via Docker itself.

We’ve been actively working with upstream and community developers to add support in Docker for native C/R and hope that checkpoint and restore commands will be introduced in Docker 1.8. As of this writing, it’s possible to C/R a container externally because this functionality was recently merged in libcontainer.

External container C/R was demo’d at DockerCon 2015:

Screen Shot 2015-06-30 at 3.37.46 PM.png

Container C/R offers many benefits including the following:

Stop and restart the Docker daemon (say for an upgrade) without having to kill the running containers and restarting them from scratch, losing precious work they had done when they were stopped
Reboot the system without having to restart the containers from scratch. Same benefits as use case 1 above
Speed up the start time of slow-start applications
“Forensic debugging" of processes running in a container by examining their checkpoint images (open files, memory segments, etc.)
Migrate containers by restoring them on a different machine

CRIU

Implementing C/R functionality from scratch is a major undertaking and a daunting task. Fortunately, there is a powerful open source tool written in C that has been used in production for checkpointing and restoring entire process trees in Linux. The tool is called CRIU which stands for Checkpoint Restore In Userspace (http://criu.org). CRIU works by:

Freezing a running application.
Checkpointing the address space and state of the entire process tree to a collection of “image” files.
Restoring the process tree from checkpoint image files.
Resuming application from the point it was frozen.

In April 2014, we decided to find out if CRIU could checkpoint and restore Docker containers to facilitate container migration.

Phase 1 - External C/R

The first phase of this effort invoking CRIU directly to dump a process tree running inside a container and determining why the checkpoint or restore operation failed. There were quite a few issues that caused CRIU failure. The following three issues were among the more challenging ones.

External Bind Mounts

Docker sets up /etc/{hostname,hosts,resolv.conf} as targets with source files outside the container's mount namespace.

The --ext-mount-map command line option was added to CRIU to specify the path of the external bind mounts. For example, assuming default Docker configuration, /etc/hostname in the container's mount namespace is bind mounted from the source at /var/lib/docker/containers/<container-id>/hostname. When checkpointing, we tell CRIU to record /etc/hostname's "map" as, say, etc_hostname. When restoring, we tell CRIU that that the file previously recorded as etc_hostname should be mapped from the external bind mount at /var/lib/docker/containers/<container-id>/hostname.

AUFS Pathnames

Docker initially used AUFS as its preferred filesystem which is still in wide usage (the preferred filesystem is now OverlayFS).. Due to a bug, the AUFS symbolic link paths of /proc/<pid>/map_files point inside AUFS branches instead of their pathnames relative to the container's root. This problem has been fixed in AUFS source code but hasn't made it to all the distros yet. CRIU would get confused seeing the same file in its physical location (in the branch) and its logical location (from the root of mount namespace).

The --root command line option that was used only during restore was generalized to understand the root of the mount namespace during checkpoint and automatically "fix" the exposed AUFS pathnames.

Cgroups

After checkpointing, the Docker daemon removes the container’s cgroups subdirectories (because the container has “exited”). This causes restore to fail.

The --manage-cgroups command line option was added to CRIU to dump and restore the process's cgroups along with their properties.

The CRIU command lines are a simple container are shown below:

$ docker run -d busybox:latest /bin/sh -c 'i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done'

$ docker ps
CONTAINER ID IMAGE           COMMAND           CREATED        STATUS
168aefb8881b busybox:latest "/bin/sh -c 'i=0; 6 seconds ago Up 4 seconds

$ sudo criu dump -o dump.log -v4 -t 17810 \
       -D /tmp/img/<container_id> \
       --root /var/lib/docker/aufs/mnt/<container_id> \
       --ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
       --ext-mount-map /etc/hosts:/etc/hosts \
       --ext-mount-map /etc/hostname:/etc/hostname \
       --ext-mount-map /.dockerinit:/.dockerinit \
       --manage-cgroups \
       --evasive-devices

$ docker ps -a
CONTAINER ID IMAGE           COMMAND           CREATED        STATUS
168aefb8881b busybox:latest "/bin/sh -c 'i=0; 6 minutes ago Exited (-1) 4 minutes ago

$ sudo mount -t aufs -o br=\
/var/lib/docker/aufs/diff/<container_id>:\
/var/lib/docker/aufs/diff/<container_id>-init:\
/var/lib/docker/aufs/diff/a9eb172552348a9a49180694790b33a1097f546456d041b6e82e4d7716ddb721:\
/var/lib/docker/aufs/diff/120e218dd395ec314e7b6249f39d2853911b3d6def6ea164ae05722649f34b16:\
/var/lib/docker/aufs/diff/42eed7f1bf2ac3f1610c5e616d2ab1ee9c7290234240388d6297bc0f32c34229:\
/var/lib/docker/aufs/diff/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158:\
none /var/lib/docker/aufs/mnt/<container_id>

$ sudo criu restore -o restore.log -v4 -d
       -D /tmp/img/<container_id> \
       --root /var/lib/docker/aufs/mnt/<container_id> \
       --ext-mount-map /etc/resolv.conf:/var/lib/docker/containers/<container_id>/resolv.conf \
       --ext-mount-map /etc/hosts:/var/lib/docker/containers/<container_id>/hosts \
       --ext-mount-map /etc/hostname:/var/lib/docker/containers/<container_id>/hostname \
       --ext-mount-map /.dockerinit:/var/lib/docker/init/dockerinit-1.0.0 \
       --manage-cgroups \
       --evasive-devices

$ ps -ef | grep /bin/sh
root     18580     1 0 12:38 ?        00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=$(expr $i + 1); sleep 3; done

$ docker ps -a
CONTAINER ID IMAGE           COMMAND           CREATED        STATUS
168aefb8881b busybox:latest "/bin/sh -c 'i=0; 7 minutes ago Exited (-1) 5 minutes ago

docker_cr.sh

Since the command line arguments to CRIU were long, a helper script called docker_cr.sh was provided in the CRIU source tree to simplify the proces. So, for the above container, one would simply C/R the container as follows (for details see http://criu.org/Docker):

$ sudo docker_cr.sh -c 4397
dump successful

$ sudo docker_cr.sh -r 4397
restore successful

At the end of Phase 1, it was possible to externally checkpoint and restore a Docker 1.0 container using either VFS, AUFS, or UnionFS storage drivers with CRIU v1.3.

Phase 2 - Native C/R

While external C/R served as a successful proof of concept for container C/R, it had the following limitations:

State of a checkpointed container would show as "Exited".
Docker commands such as logs, kill, etc. will not work on a restored container.
The restored process tree will be a child of /etc/init instead of the Docker daemon.

Therefore, the second phase of the effort concentrated on adding native checkpoint and restore commands to Docker.

libcontainer, nsinit

Libcontainer is Docker’s native execution driver. It provides a set of APIs to create and manage containers. The first step of adding native support was the introduction of two methods, checkpoint() and restore(), to libcontainer and the corresponding checkpoint and restore subcommands to nsinit. Nsinit is a simple utility that is used to test and debug libcontainer.

docker checkpoint, docker restore

With C/R support in libcontainer, the next step was adding checkpoint and restore subcommands to Docker itself. A big challenge in this step was to rebuild the “plumbing” between the container and the daemon. When the daemon initially starts a container, it sets up individual pipes between itself (parent) and the standard input, output, and error file descriptors of the container (child). This is how docker logs can show the output of a container.

When a container exits after being checkpointed, the pipes between it and the daemon are deleted. During container restore, it’s actually CRIU that is the parent. Therefore, setting up a pipe between the child (container) and an unrelated process (Docker daemon) required is not a bit of challenge.

To address this issue, the --inherit-fd command line option was added to CRIU. Using this option, the Docker daemon tells CRIU to let the restored container “inherit” certain file descriptors passed from the daemon to CRIU.

The first version of native C/R was demo'ed at the Linux Plumbers Conference (LPC) in October 2014 (http://linuxplumbersconf.org/2014/ocw/proposals/1899).

The LPC demo was done with a simple container that did not require network connectivity. Support for restoring network connections was done in early 2015 and demonstrated in this 2-minute video clip.

Current Status of Container C/R

In May 2015, the criu branch of libcontainer was merged into master. Using the newly-introduced lightweight runC container runtime, container migration was demo’ed at DockerCon15. In this demo (minute 23:00), a container running Quake was checkpointed and restored on a different machine, effectively implementing container migration.

At the time of this writing, there are two repos on Github that have native C/R support in Docker:

Docker 1.5 (old libcontainer, relatively stable)
Docker 1.7 (newer, less stable)

Work is underway to merge C/R functionality into Docker. You can use either of the above repositories to experiment with Docker C/R. If you are using OverlayFS or your container workload uses AIO, please note the following:

OverlayFS

When OverlayFS support was officially merged into the Linux kernel version 3.18, it became the preferred storage driver (instead of AUFS) . However, OverlayFS in 3.18 has the following issues:

/proc/<pid>/fdinfo/<fd> contains mnt_id which isn’t in /proc/<pid>/mountinfo
/proc/<pid>/fd/<fd> does not contain an absolute path to the opened file

Both issues are fixed in this patch but the patch has not been merged upstream yet.

AIO

If you are using a kernel older than 3.19 and your container uses AIO, you need the following kernel patches from 3.19:

torvalds: bd9b51e7 by Al Viro
torvalds: e4a0d3e72 by Pavel Emelyanov

- Saied Kazemi, Software Engineer at Google

↧

Weekly Kubernetes Community Hangout Notes - April 24 2015

April 29, 2015, 5:27 pm

≫ Next: Kubernetes 1.0 Launch Event at OSCON

≪ Previous: How did the Quake demo from DockerCon Work?

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Agenda:

Flocker and Kubernetes integration demo

Notes:

flocker and kubernetes integration demo
Demo steps and DIY instructions at: https://clusterhq.com/blog/data-migration-kubernetes-flocker/

Cool demo by Kai Davenport

Flocker Q/A

Does the file still exists on node1 after migration?

Luke: still exists, but unmounted and cannot be written to. Data persists and can be used. Working on support for multiple storage backend.

Brendan: Any plan this to make it a volume? So we don't need powerstrip?

Luke: Need to figure out interest to decide if we want to make it a first-class persistent disk provider in kube.
Brendan: Removing need for powerstrip would make it simple to use. Totally go for it.
Tim: Should take no more than 45 minutes to add it to kubernetes:)

Derek: Contrast this with persistent volumes and claims?

Luke: Not much difference, except for the novel ZFS based backend. Makes workloads really portable.
Tim: very different than network-based volumes. Its interesting that it is the only offering that allows upgrading media.
Brendan: claims, how does it look for replicated claims? eg Cassandra wants to have replicated data underneath. It would be efficient to scale up and down. Create storage on the fly based on load dynamically. Its step beyond taking snapshots - programmatically creating replicas with preallocation.
Tim: helps with auto-provisioning.

Brian: Does flocker requires any other component?

Kai: Flocker control service co-located with the master. (dia on blog post). Powerstrip + Powerstrip Flocker. Very interested in mpersisting state in etcd. It keeps metadata about each volume.
Brendan: In future, flocker can be a plugin and we'll take care of persistence. Post v1.0.
Brian: Interested in adding generic plugin for services like flocker.
Luke: Zfs can become really valuable when scaling to lot of containers on a single node.

Alex: Can flocker service can be run as a pod?

Kai: Yes, only requirement is the flocker control service should be able to talk to zfs agent. zfs agent needs to be installed on the host and zfs binaries need to be accessible.
Brendan: In theory, all zfs bits can be put it into a container with devices.
Luke: Yes, still working through cross-container mounting issue.
Tim: pmorie is working through it to make kubelet work in a container. Possible re-use.

Kai: Cinder support is coming. Few days away.

Bob: What’s the process of pushing kube to GKE? Need more visibility for confidence.

Brendan: Not well documented, but released marked latest after GKE picks it up.
GKE does lot of heavy testing before a version is blessed.
gsutil cat gs://kubernetes-release/release/latest.txt
$ curl https://storage.googleapis.com/kubernetes-release/release/latest.txt

↧

Kubernetes 1.0 Launch Event at OSCON

July 2, 2015, 10:15 am

≫ Next: Announcing the First Kubernetes Enterprise Training Course

≪ Previous: Weekly Kubernetes Community Hangout Notes - April 24 2015

In case you haven't heard, the Kubernetes project team & community have some awesome stuff lined up for our release event at OSCON in a few weeks.

If you haven't already registered for in person or live stream, please do it now! check out kuberneteslaunch.com for all the details. You can also find out there how to get a free expo pass for OSCON which you'll need to attend in person.

We'll have talks from Google executives Brian Stevens, VP of Cloud Product, and Eric Brewer, VP of Google Infrastructure. They will share their perspective on where Kubernetes is and where it's going that you won't want to miss.

Several of our community partners will be there including CoreOS, Redapt, Intel, Mesosphere, Mirantis, the OpenStack Foundation, CloudBees, Kismatic and Bitnami.

And real life users of Kubernetes will be there too. We've announced that zulily Principal Engineer Steve Reed is speaking, and we will let you know about others over the next few days. Let's just say it's a pretty cool list.

Check it out now - kuberneteslaunch.com

↧

Announcing the First Kubernetes Enterprise Training Course

July 8, 2015, 8:44 am

≫ Next: Weekly Kubernetes Community Hangout Notes - July 10 2015

≪ Previous: Kubernetes 1.0 Launch Event at OSCON

At Google we rely on Linux application containers to run our core infrastructure. Everything from Search to Gmail runs in containers. In fact, we like containers so much that even our Google Compute Engine VMs run in containers! Because containers are critical to our business, we have been working with the community on many of the basic container technologies (from cgroups to Docker’s LibContainer) and even decided to build the next generation of Google’s container scheduling technology, Kubernetes, in the open.

One year into the Kubernetes project, and on the eve of our planned V1 release at OSCON, we are pleased to announce the first-ever formal Kubernetes enterprise-focused training session organized by a key Kubernetes contributor, Mesosphere. The inaugural session will be taught by Zed Shaw and Michael Hausenblas from Mesosphere, and will take place on July 20 at OSCON in Portland. Pre-registration is free for early registrants, but space is limited so act soon!

This one-day course will cover the basics of building and deploying containerized applications using Kubernetes. It will walk attendees through the end-to-end process of creating a Kubernetes application architecture, building and configuring Docker images, and deploying them on a Kubernetes cluster. Users will also learn the fundamentals of deploying Kubernetes applications and services on our Google Container Engine and Mesosphere’s Datacenter Operating System.

The upcoming Kubernetes bootcamp will be a great way to learn how to apply Kubernetes to solve long-standing deployment and application management problems. This is just the first of what we hope are many, and from a broad set of contributors.

↧

Weekly Kubernetes Community Hangout Notes - July 10 2015

July 13, 2015, 12:40 pm

≫ Next: Strong, Simple SSL for Kubernetes Services

≪ Previous: Announcing the First Kubernetes Enterprise Training Course

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Here are the notes from today's meeting:

1.0 Release Plan

0.21.0 is basically 1.0 but we plan to cut another release branch today for 1.0.0
This will be the 1.0 release modulo any cherry picks
Please use the new cherry pick script to propose a cherry pick to the release-1.0 branch
The first binary will be built today to soak over the weekend

Gabe / Josh from Engine Yard: Demo/Overview of Deis + Kubernetes

Deis is open source PAAS - make it easy to deploy and manage distributed apps on a CoreOS cluster (using docker and now kubernetes)

133 contributors

Looked at many orchestration layers (Fleet, Mesos, Swarm, Kubernetes)

Currently running on Mesos
k8s APIs feel right for an orchestration engine and k8s feels like a great building block for a PAAS

What does Deis add?

Integrated http routing with https
Builder (for push to deploy)
Integrated docker registry
Integrated ceph cluster for scale out storage
Log routing and aggregation
User management with LDAP and AD support
Providing a CLI workflow to drive k8s

Demo

deis create example-go
git push deis master

packaging via Heroku build-pack -> Docker image
push to registry co-located with the cluster
done and deployed

deis scale web=4
deis logs …

aggregates logs

deis config

application is running a release
release is made up of config + build
effectively sets environment variables

deis config:set POWERD_BY=k8s

tells example-go to print different output (based on the environment)

deis releases

ledger of changes that allows you to do rollbacks

deis rollback v2

actually a roll-forward to ‘v4’ with the old config

deis run ‘ls -la’
deisctl

shows components on a 5 node cluster on AWS using CoreOS

Future

Plan to embrace k8s on a deeper level
In the limit, run k8s with a small number of pods specific to Deis that turn the cluster into a heroku-like PAAS

Working on HTTP Router post 1.0

Going to use nginx or HAProxy (or both)
Work on getting API right, then implement with existing solutions

Google Intern Turbo Demos

Daemons (per-node controller)

Launch an application on every node of the cluster or on all nodes that have specific labels
kubectl create -f sample_dc.json

kind: DaemonController
No label selection → runs on all nodes

kubectl describe dc redis-master-copy

Tells you that the daemon is supposed to run on 4 nodes and is running on 4 nodes

kubectl create -f sample_dc_nodeselector.json

spec: nodeSelector inside the template
Will run on nodes that match the selector

kubectl label node kubernetes-minion-f917 color=red

Will run the pod on this node as well
If the node is full, the pod will try to launch and will stay pending
In the future, we may bump existing pods to make space

kubectl label --overwrite node kubernetes-minion-f917 color=grey

Removes the pod from this node

If the daemon and replication controller overlap, then they will fight
Can use a node selector or node name to restrict where the daemon runs
Can it run on a % of nodes?

Not in the current implementation

DiurnalController (PR #10881)

Varies the number of pod replicas that are running throughout the day
Specify the times when the number of replicas change and how many to run at each time (uses absolute time -- currently in UTC)

↧

Strong, Simple SSL for Kubernetes Services

July 14, 2015, 8:06 am

≫ Next: Weekly Kubernetes Community Hangout Notes - July 17 2015

≪ Previous: Weekly Kubernetes Community Hangout Notes - July 10 2015

Hi, I’m Evan Brown (@evandbrown) and I work on the solutions architecture team for Google Cloud Platform. I recently wrote an article and tutorial about using Jenkins on Kubernetes to automate the Docker and GCE image build process. Today I’m going to discuss how I used Kubernetes services and secrets to add SSL to the Jenkins web UI. After reading this, you’ll be able to add SSL termination (and HTTP->HTTPS redirects + basic auth) to your public HTTP Kubernetes services.

In the beginning

In the spirit of minimum viability, the first version of Jenkins-on-Kubernetes I built was very basic but functional:

The Jenkins leader was just a single container in one pod, but it was managed by a replication controller, so if it failed it would automatically respawn.
The Jenkins leader exposes two ports - TCP 8080 for the web UI and TCP 50000 for build agents to register - and those ports are made available as a Kubernetes service with a public load balancer.

Here’s a visual of that first version:

This works, but I have a few problems with it. First, authentication isn’t configured in a default Jenkins installation. The leader is sitting on the public Internet, accessible to anyone, until you connect and configure authentication. And since there’s no encryption, configuring authentication is kind of a symbolic gesture. We need SSL, and we need it now!

Do what you know

For a few milliseconds I considered trying to get SSL working directly on Jenkins. I’d never done it before, and I caught myself wondering if it would be as straightforward as working with SSL on Nginx, something I do have experience with. I’m all for learning new things, but this seemed like a great place to not invent a new wheel: SSL on Nginx is straightforward and well documented (as are its reverse-proxy capabilities), and Kubernetes is all about building functionality by orchestrating and composing containers. Let’s use Nginx, and add a few bonus features that Nginx makes simple: HTTP->HTTPS redirection, and basic access authentication.

SSL termination proxy as an nginx service

I started by putting together a Dockerfile that inherited from the standard nginx image, copied a few Nginx config files, and added a custom entrypoint (start.sh). The entrypoint script checks an environment variable (ENABLE_SSL) and activates the correct Nginx config accordingly (meaning that unencrypted HTTP reverse proxy is possible, but that defeats the purpose). The script also configures basic access authentication if it’s enabled (the ENABLE_BASIC_AUTH env var).

Finally, start.sh evaluates the SERVICE_HOST_ENV_NAME and SERVICE_PORT_ENV_NAME env vars. These variables should be set to the names of the environment variables for the Kubernetes service you want to proxy to. In this example, the service for our Jenkins leader is cleverly named jenkins, which means pods in the cluster will see an environment variable named JENKINS_SERVICE_HOST and JENKINS_SERVICE_PORT_UI (the port that 8080 is mapped to on the Jenkins leader). SERVICE_HOST_ENV_NAME and SERVICE_PORT_ENV_NAME simply reference the correct service to use for a particular scenario, allowing the image to be used generically across deployments.

Defining the Controller and Service

LIke every other pod in this example, we’ll deploy Nginx with a replication controller, allowing us to scale out or in, and recover automatically from container failures. This excerpt from a complete descriptor in the sample app shows some relevant bits of the pod spec:

spec:

containers:

name:"nginx-ssl-proxy"

image:"gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest"

env:

name:"SERVICE_HOST_ENV_NAME"

value:"JENKINS_SERVICE_HOST"

name:"SERVICE_PORT_ENV_NAME"

value:"JENKINS_SERVICE_PORT_UI"

name:"ENABLE_SSL"

value:"true"

name:"ENABLE_BASIC_AUTH"

value:"true"

ports:

name:"nginx-ssl-proxy-http"

containerPort:80

name:"nginx-ssl-proxy-https"

containerPort:443

The pod will have a service exposing TCP 80 and 443 to a public load balancer. Here’s the service descriptor (also available in the sample app):

kind:"Service"

apiVersion:"v1"

metadata:

name:"nginx-ssl-proxy"

labels:

name:"nginx"

role:"ssl-proxy"

spec:

ports:

name:"https"

port:443

targetPort:"nginx-ssl-proxy-https"

protocol:"TCP"

name:"http"

port:80

targetPort:"nginx-ssl-proxy-http"

protocol:"TCP"

selector:

name:"nginx"

role:"ssl-proxy"

type:"LoadBalancer"

And here’s an overview with the SSL termination proxy in place. Notice that Jenkins is no longer directly exposed to the public Internet:

Now, how did the Nginx pods get ahold of the super-secret SSL key/cert and htpasswd file (for basic access auth)?

Keep it secret, keep it safe

Kubernetes has an API and resource for Secrets. Secrets “are intended to hold sensitive information, such as passwords, OAuth tokens, and ssh keys. Putting this information in a secret is safer and more flexible than putting it verbatim in a pod definition or in a docker image.”

You can create secrets in your cluster in 3 simple steps:

Base64-encode your secret data (i.e., SSL key pair or htpasswd file)

$ cat ssl.key | base64
LS0tLS1CRUdJTiBDRVJUS...

Create a json document describing your secret, and add the base64-encoded values:

apiVersion:"v1"

kind:"Secret"

metadata:

name:"ssl-proxy-secret"

namespace:"default"

data:

proxycert:"LS0tLS1CRUd..."

proxykey:"LS0tLS1CR..."

htpasswd:"ZXZhb..."

Create the secrets resource:

$ kubectl create -f secrets.json

To access the secrets from a container, specify them as a volume mount in your pod spec. Here’s the relevant excerpt from the Nginx proxy template we saw earlier:

spec:

containers:

name:"nginx-ssl-proxy"

image:"gcr.io/cloud-solutions-images/nginx-ssl-proxy:latest"

env:[...]

ports:...[]

volumeMounts:

name:"secrets"

mountPath:"/etc/secrets"

readOnly:true

volumes:

name:"secrets"

secret:

secretName:"ssl-proxy-secret"

A volume of type secret that points to the ssl-proxy-secret secret resource is defined, and then mounted into /etc/secrets in the container. The secrets spec in the earlier example defined data.proxycert, data.proxykey, and data.htpasswd, so we would see those files appear (base64-decoded) in /etc/secrets/proxycert, /etc/secrets/proxykey, and /etc/secrets/htpasswd for the Nginx process to access.

All together now

I have “containers and Kubernetes are fun and cool!” moments all the time, like probably every day. I’m beginning to have “containers and Kubernetes are extremely useful and powerful and are adding value to what I do by helping me do important things with ease” more frequently. This SSL termination proxy with Nginx example is definitely one of the latter. I didn’t have to waste time learning a new way to use SSL. I was able to solve my problem using well-known tools, in a reusable way, and quickly (from idea to working took about 2 hours).

Check out the complete Automated Image Builds with Jenkins, Packer, and Kubernetes repo to see how the SSL termination proxy is used in a real cluster, or dig into the details of the proxy image in the nginx-ssl-proxy repo (complete with a Dockerfile and Packer template so you can build the image yourself).

↧

Weekly Kubernetes Community Hangout Notes - July 17 2015

July 23, 2015, 2:44 pm

≫ Next: The Growing Kubernetes Ecosystem

≪ Previous: Strong, Simple SSL for Kubernetes Services

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Here are the notes from today's meeting:

Eric Paris: replacing salt with ansible (if we want)

In contrib, there is a provisioning tool written in ansible
The goal in the rewrite was to eliminate as much of the cloud provider stuff as possible
The salt setup does a bunch of setup in scripts and then the environment is setup with salt

This means that things like generating certs is done differently on GCE/AWS/Vagrant

For ansible, everything must be done within ansible
Background on ansible

Does not have clients
Provisioner ssh into the machine and runs scripts on the machine
You define what you want your cluster to look like, run the script, and it sets up everything at once
If you make one change in a config file, ansible re-runs everything (which isn’t always desirable)
Uses a jinja2 template

Create machines with minimal software, then use ansible to get that machine into a runnable state

Sets up all of the add-ons

Eliminates the provisioner shell scripts
Full cluster setup currently takes about 6 minutes

CentOS with some packages
Redeploy to the cluster takes 25 seconds

Questions for Eric

Where does the provider-specific configuration go?

The only network setup that the ansible config does is flannel; you can turn it off

What about init vs. systemd?

Should be able to support in the code w/o any trouble (not yet implemented)

Discussion

Why not push the setup work into containers or kubernetes config?

To bootstrap a cluster drop a kubelet and a manifest

Running a kubelet and configuring the network should be the only things required. We can cut a machine image that is preconfigured minus the data package (certs, etc)

The ansible scripts install kubelet & docker if they aren’t already installed

Each OS (RedHat, Debian, Ubuntu) could have a different image. We could view this as part of the build process instead of the install process.
There needs to be solution for bare metal as well.
In favor of the overall goal -- reducing the special configuration in the salt configuration
Everything except the kubelet should run inside a container (eventually the kubelet should as well)

Running in a container doesn’t cut down on the complexity that we currently have
But it does more clearly define the interface about what the code expects

These tools (Chef, Puppet, Ansible) conflate binary distribution with configuration

Containers more clearly separate these problems

The mesos deployment is not completely automated yet, but the mesos deployment is completely different: kubelets get put on top on an existing mesos cluster

The bash scripts allow the mesos devs to see what each cloud provider is doing and re-use the relevant bits
There was a large reverse engineering curve, but the bash is at least readable as opposed to the salt

Openstack uses a different deployment as well
We need a well documented list of steps (e.g. create certs) that are necessary to stand up a cluster

This would allow us to compare across cloud providers
We should reduce the number of steps as much as possible
Ansible has 241 steps to launch a cluster

1.0 Code freeze

How are we getting out of code freeze?
This is a topic for next week, but the preview is that we will move slowly rather than totally opening the firehose

We want to clear the backlog as fast as possible while maintaining stability both on HEAD and on the 1.0 branch
The backlog of almost 300 PRs but there are also various parallel feature branches that have been developed during the freeze

Cutting a cherry pick release today (1.0.1) that fixes a few issues
Next week we will discuss the cadence for patch releases

↧

The Growing Kubernetes Ecosystem

July 23, 2015, 9:51 pm

≫ Next: Weekly Kubernetes Community Hangout Notes - May 22 2015

≪ Previous: Weekly Kubernetes Community Hangout Notes - July 17 2015

Over the past year, we’ve seen fantastic momentum in the Kubernetes project, culminating with the release of Kubernetes v1 earlier this week. We’ve also witnessed the ecosystem around Kubernetes blossom, and wanted to draw attention to some of the cooler offerings we’ve seen.

	CloudBees and the Jenkins community have created a Kubernetes plugin, allowing Jenkins slaves to be built as Docker images and run in Docker hosts managed by Kubernetes, either on the Google Cloud Platform or on a more local Kubernetes instance. These elastic slaves are then brought online as Jenkins schedules jobs for them and destroyed after their builds are complete, ensuring masters have steady access to clean workspaces and minimizing builds’ resource footprint.
	CoreOS has created launched Tectonic, an opinionated enterprise distribution of Kubernetes, CoreOS and Docker. Tectonic includes a management console for workflows and dashboards, an integrated registry to build and share containers, and additional tools to automate deployment and customize rolling updates. At KuberCon, CoreOS launched Tectonic Preview, giving users easy access to Kubernetes 1.0, 24x7 enterprise ready support, Kubernetes guides and Kubernetes training to help enterprises begin experiencing the power of Kubernetes, CoreOS and Docker.
	Hitachi Data Systems has announced that Kubernetes now joins the list of solutions validated to run on their enterprise Unified Computing Platform. With this announcement Hitachi has validated Kubernetes and VMware running side-by-side on the UCP platform, providing an enterprise solution for container-based applications and traditional virtualized workloads.
	Kismatic is providing enterprise support for pure play open source Kubernetes. They have announced open source and commercially supported Kubernetes plug-ins specifically built for production-grade enterprise environments. Any Kubernetes deployment can now benefit from modular role-based access controls (RBAC), Kerberos for bedrock authentication, LDAP/AD integration, rich auditing and platform-agnostic Linux distro packages.
	Meteor Development Group, creators of Meteor, a JavaScript App Platform, are using Kubernetes to build Galaxy to run Meteor apps in production. Galaxy will scale from free test apps to production-suitable high-availability hosting.
	Mesosphere has incorporated Kubernetes into its Data Center Operating System (DCOS) platform as a first class citizen. Using DCOS, enterprises can deploy Kubernetes across thousands of nodes, both bare-metal and virtualized machines that can run on-premise and in the cloud. Mesosphere also launched a beta of their Kubernetes Training Bootcamp and will be offering more in the future.
	Mirantis is enabling hybrid cloud applications across OpenStack and other clouds supporting Kubernetes. An OpenStack Murano app package supports full application lifecycle actions such as deploy, create cluster, create pod, add containers to pods, scale up and scale down.
	OpenContrail is creating a kubernetes-contrail plugin designed to stitch the cluster management capabilities of Kubernetes with the network service automation capabilities of OpenContrail. Given the event-driven abstractions of pods and services inherent in Kubernetes, it is a simple extension to address network service enforcement by leveraging OpenContrail’s Virtual Network policy approach and programmatic API’s.
	Pachyderm is a containerized data analytics engine which provides the broad functionality of Hadoop with the ease of use of Docker. Users simply provide containers with their data analysis logic and Pachyderm will distribute that computation over the data. They have just released full deployment on Kubernetes for on premise deployments, and on Google Container Engine, eliminating all the operational overhead of running a cluster yourself.
	Platalytics, Inc. and announced the release of one-touch deploy-anywhere feature for its Spark Application Platform. Based on Kubernetes, Docker, and CoreOS, it allows simple and automated deployment of Apache Hadoop, Spark, and Platalytics platform, with a single click, to all major public clouds, including Google, Amazon, Azure, Digital Ocean, and private on-premise clouds. It also enables hybrid cloud scenarios, where resources on public and private clouds can be mixed.
	Rackspace has created Corekube as a simple, quick way to deploy Kubernetes on OpenStack. By using a decoupled infrastructure that is coordinated by etcd, fleet and flannel, it enables users to try Kubernetes and CoreOS without all the fuss of setting things up by hand.
	Red Hat is a long time proponent of Kubernetes, and a significant contributor to the project. In their own words, “From Red Hat Enterprise Linux 7 and Red Hat Enterprise Linux Atomic Host to OpenShift Enterprise 3 and the forthcoming Red Hat Atomic Enterprise Platform, we are well-suited to bring container innovations into the enterprise, leveraging Kubernetes as the common backbone for orchestration.”
	Redapt has launching a variety of turnkey, on-premises Kubernetes solutions co-engineered with other partners in the Kubernetes partner ecosystem. These include appliances built to leverage the CoreOS/Tectonic, Mirantis OpenStack, and Mesosphere platforms for management and provisioning. Redapt also offers private, public, and multi-cloud solutions that help customers accelerate their Kubernetes deployments successfully into production.

We’ve also seen a community of services partners spring up to assist in adopting Kubernetes and containers:

	Biarca is using Kubernetes to ease application deployment and scale on demand across available hybrid and multi-cloud clusters through strategically managed policy. A video on their website illustrates how to use Kubernetes to deploy applications in a private cloud infrastructure based on OpenStack and use a public cloud like GCE to address bursting demand for applications.
	Cloud Technology Partners has developed a Container Services Offering featuring Kubernetes to assist enterprises with container best practices, adoption and implementation. This offering helps organizations understand how containers deliver competitive edge.
	DoIT International is offering a Kubernetes Bootcamp which consists of a series of hands-on exercises interleaved with mini-lectures covering hands on topics such as Container Basics, Using Docker, Kubernetes and Google Container Engine.
	OpenCredo provides a practical, lab style container and scheduler course in addition to consulting and solution delivery. The three-day course allows development teams to quickly ramp up and make effective use of containers in real world scenarios, covering containers in general along with Docker and Kubernetes.
	Pythian focuses on helping clients design, implement, and manage systems that directly contribute to revenue and business success. They provide small, dedicated teams of highly trained and experienced data experts have the deep Kubernetes and container experience necessary to help companies solve Big Data problems with containers.

- Martin Buhr, Product Manager at Google

↧

Weekly Kubernetes Community Hangout Notes - May 22 2015

June 2, 2015, 11:18 am

≫ Next: Weekly Kubernetes Community Hangout Notes - July 31 2015

≪ Previous: The Growing Kubernetes Ecosystem

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Discussion / Topics

Code Freeze
Upgrades of cluster
E2E test issues

Code Freeze process starts EOD 22-May, including

Code Slush -- draining PRs that are active. If there are issues for v1 to raise, please do so today.

Drain timeframe will be about 1-week.

Community PRs -- plan is to reopen in ~6 weeks.
Key areas for fixes in v1 -- docs, the experience.

E2E issues and LGTM process

Seen end-to-end tests go red.
Plan is to limit merging to on-call. Quinton to communicate.

Community committers, please label with LGTM and on-call will merge based on on-call’s judgement.

Can we expose Jenkins runs to community? (Paul)

Question/concern to work out is securing Jenkins. Short term conclusion: Will look at pushing Jenkins logs into GCS bucket. Lavalamp will follow up with Jeff Grafton.
Longer term solution may be a merge queue, where e2e runs for each merge (as opposed to multiple merges). This exists in Openshift today.

Cluster Upgrades for Kubernetes as final v1 feature

GCE will use Persistent Disk (PD) to mount new image.
OpenShift will follow a tradition update model, with “yum update”.
A strawman approach is to have an analog of “kube-push” to update the master, in-place. Feedback in the meeting was

Upgrading Docker daemon on the master will kill the master’s pods. Agreed. May consider an ‘upgrade’ phase or explicit step.
How is this different than HA master upgrade? See HA case as a superset. The work to do an upgrade would be a prerequisite for HA master upgrade.

Mesos scheduler implements a rolling node upgrade.

Attention requested for v1 in the Hangout

Downward plug-in #5093.

Discussed that it’s an eventually consistent design.
In the meeting, the outcome was: seeking a pattern for atomicity of update across multiple piece. Paul to ping Tim when ready to review.

Regression in e2e #8499 (Eric Paris)
Asking for review of direction, if not review. #8334 (Mark)
Handling graceful termination (e.g. sigterm to postgres) is not implemented. #2789 (Clayton)

Need is to bump up grace period or finish plumbing. In API, client tools, missing is kubelet does use and we don’t set the timeout (>0) value.
Brendan will look into this graceful term issue.

Load balancer almost ready by JustinSB.

↧

Weekly Kubernetes Community Hangout Notes - July 31 2015

August 4, 2015, 3:14 pm

≫ Next: Introducing Kubernetes API Version v1beta3

≪ Previous: Weekly Kubernetes Community Hangout Notes - May 22 2015

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Here are the notes from today's meeting:

Private Registry Demo - Muhammed

Run docker-registry as an RC/Pod/Service
Run a proxy on every node
Access as localhost:5000
Discussion:

Should we back it by GCS or S3 when possible?
Run real registry backed by $object_store on each node
DNS instead of localhost?

disassemble image strings?
more like DNS policy?

Running Large Clusters - Joe

Samsung keen to see large scale O(1000)

Starting on AWS

RH also interested - test plan needed
Plan for next week: discuss working-groups
If you are interested in joining conversation on cluster scalability send mail to joe@0xBEDA.com

Resource API Proposal - Clayton

New stuff wants more info on resources
Proposal for resources API - ask apiserver for info on pods
Send feedback to: #11951
Discussion on snapshot vs time-series vs aggregates

Containerized kubelet - Clayton

Open pull
Docker mount propagation - RH carries patches
Big issues around whole bootstrap of the system

dual: boot-docker/system-docker

Kube-in-docker is really nice, but maybe not critical

Do the small stuff to make progress
Keep pressure on docker

Web UI (preilly)

Where does web UI stand?

OK to split it back out
Use it as a container image
Build image as part of kube release process
Vendor it back in? Maybe, maybe not.

Will DNS be split out?

Probably more tightly integrated, instead

Other potential spin-outs:

apiserver
clients

↧

Introducing Kubernetes API Version v1beta3

April 16, 2015, 4:10 pm

≫ Next: Weekly Kubernetes Community Hangout Notes - April 17 2015

≪ Previous: Weekly Kubernetes Community Hangout Notes - July 31 2015

We've been hard at work on cleaning up the API over the past several months (see https://github.com/GoogleCloudPlatform/kubernetes/issues/1519 for details). The result is v1beta3, which is considered to be the release candidate for the v1 API.

We would like you to move to this new API version as soon as possible. v1beta1 and v1beta2 are deprecated, and will be removed by the end of June, shortly after we introduce the v1 API.

As of the latest release, v0.15.0, v1beta3 is the primary, default API. We have changed the default kubectl and client API versions as well as the default storage version (which means objects persisted in etcd will be converted from v1beta1 to v1beta3 as they are rewritten).

You can take a look at v1beta3 examples such as:

https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/guestbook/v1beta3

https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/walkthrough/v1beta3

https://github.com/GoogleCloudPlatform/kubernetes/tree/master/examples/update-demo/v1beta3

To aid the transition, we've also created a conversion tool and put together a list of important different API changes.

The resource id is now called name.
name, labels, annotations, and other metadata are now nested in a map called metadata
desiredState is now called spec, and currentState is now called status
/minions has been moved to /nodes, and the resource has kind Node
The namespace is required (for all namespaced resources) and has moved from a URL parameter to the path:/api/v1beta3/namespaces/{namespace}/{resource_collection}/{resource_name}
The names of all resource collections are now lower cased - instead of replicationControllers, usereplicationcontrollers.
To watch for changes to a resource, open an HTTP or Websocket connection to the collection URL and provide the?watch=true URL parameter along with the desired resourceVersion parameter to watch from.
The container entrypoint has been renamed to command, and command has been renamed to args.
Container, volume, and node resources are expressed as nested maps (e.g., resources{cpu:1}) rather than as individual fields, and resource values support scaling suffixes rather than fixed scales (e.g., milli-cores).
Restart policy is represented simply as a string (e.g., "Always") rather than as a nested map ("always{}").
The volume source is inlined into volume rather than nested.
Host volumes have been changed to hostDir to hostPath to better reflect that they can be files or directories

And the most recently generated Swagger specification of the API is here:

http://kubernetes.io/third_party/swagger-ui/#!/v1beta3

More details about our approach to API versioning and the transition can be found here:

https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/api.md

Another change we discovered is that with the change to the default API version in kubectl, commands that use "-o template" will break unless you specify "--api-version=v1beta1" or update to v1beta3 syntax. An example of such a change can be seen here:

https://github.com/GoogleCloudPlatform/kubernetes/pull/6377/files

If you use "-o template", I recommend always explicitly specifying the API version rather than relying upon the default. We may add this setting to kubeconfig in the future.

Let us know if you have any questions. As always, we're available on IRC (#google-containers) and github issues.

↧

Weekly Kubernetes Community Hangout Notes - April 17 2015

April 17, 2015, 3:27 pm

≫ Next: Kubernetes and the Mesosphere DCOS

≪ Previous: Introducing Kubernetes API Version v1beta3

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Agenda

Mesos Integration
High Availability (HA)
Adding performance and profiling details to e2e to track regressions
Versioned clients

Notes

Mesos integration

Mesos integration proposal: https://github.com/GoogleCloudPlatform/kubernetes/issues/6676
No blockers to integration.
Documentation needs to be updated.

Proposal should land today.
Etcd cluster.
Load-balance apiserver.
Cold standby for controller manager and other master components.

Adding performance and profiling details to e2e to track regression

Want red light for performance regression
Need a public DB to post the data

See https://github.com/GoogleCloudPlatform/kubernetes/issues/3118

Justin working on multi-platform e2e dashboard

Versioned clients

https://github.com/GoogleCloudPlatform/kubernetes/issues/4874
https://github.com/GoogleCloudPlatform/kubernetes/issues/3955
Client library currently uses internal API objects.
Nobody reported that frequent changes to types.go have been painful, but we are worried about it.
Structured types are useful in the client. Versioned structs would be ok.
If start with json/yaml (kubectl), shouldn’t convert to structured types. Use swagger.

Security context

https://github.com/GoogleCloudPlatform/kubernetes/pull/6287
Administrators can restrict who can run privileged containers or require specific unix uids
Kubelet will be able to get pull credentials from apiserver
Policy proposal coming in the next week or so

Discussing upstreaming of users, etc. into Kubernetes, at least as optional
1.0 Roadmap

Focus is performance, stability, cluster upgrades
TJ has been making some edits to roadmap.md but hasn’t sent out a PR yet

Kubernetes UI

Dependencies broken out into third-party
@lavalamp is reviewer

↧

Kubernetes and the Mesosphere DCOS

April 22, 2015, 6:00 am

≫ Next: Borg: The Predecessor to Kubernetes

≪ Previous: Weekly Kubernetes Community Hangout Notes - April 17 2015

Today Mesosphere announced the addition of Kubernetes as a standard part of their DCOS offering. This is a great step forwards in bringing cloud native application management to the world, and should lay to rest many questions we hear about ‘Kubernetes or Mesos, which one should I use?’. Now you can have your cake and eat it too: use both. Today’s announcement extends the reach of Kubernetes to a new class of users, and add some exciting new capabilities for everyone.

By way of background, Kubernetes is a cluster management framework that was started by Google nine months ago, inspired by the internal system known as Borg. You can learn a little more about Borg by checking out this paper. At the heart of it Kubernetes offers what has been dubbed ‘cloud native’ application management. To us, there are three things that together make something ‘cloud native’:

Container oriented deployments. Package up your application components with all their dependencies and deploy them using technologies like Docker or Rocket. Containers radically simplify the deployment process, making rollouts repeatable and predictable.

Dynamically managed. Rely on modern control systems to make moment-to-moment decisions around the health management and scheduling of applications to radically improve reliability and efficiency. There are some things that just machines do better than people, and actively running applications is one of those things.

Micro-services oriented. Tease applications apart into small semi-autonomous services that can be consumed easily so that the resulting systems are easier to understand, extend and adapt.

Kubernetes was designed from the start to make these capabilities available to everyone, and built by the same engineers that built the system internally known as Borg. For many users the promise of ‘Google style app management’ is interesting, but they want to run these new classes of applications on the same set of physical resources as their existing workloads like Hadoop, Spark, Kafka, etc. Now they will have access to commercially supported offering that brings the two worlds together.

Mesosphere, one of the earliest supporters of the Kubernetes project, has been working closely with the core Kubernetes team to create a natural experience for users looking to get the best of both worlds, adding Kubernetes to every Mesos deployment they instantiate, whether it be in the public cloud, private cloud, or in a hybrid deployment model. This is well aligned with the overall Kubernetes vision of creating ubiquitous management framework that runs anywhere a container can. It will be interesting to see how you blend together the old world and the new on a commercially supported, versatile platform.

Craig McLuckie

Product Manager, Google and Kubernetes co-founder

↧

Borg: The Predecessor to Kubernetes

April 23, 2015, 6:00 am

≫ Next: Weekly Kubernetes Community Hangout Notes - April 24 2015

≪ Previous: Kubernetes and the Mesosphere DCOS

Google has been running containerized workloads in production for more than a decade. Whether it's service jobs like web front-ends and stateful servers, infrastructure systems like Bigtable and Spanner, or batch frameworks like MapReduce and Millwheel, virtually everything at Google runs as a container. Today, we took the wraps off of Borg, Google’s long-rumored internal container-oriented cluster-management system, publishing details at the academic computer systems conference Eurosys. You can find the paper here.

Kubernetes traces its lineage directly from Borg. Many of the developers at Google working on Kubernetes were formerly developers on the Borg project. We've incorporated the best ideas from Borg in Kubernetes, and have tried to address some pain points that users identified with Borg over the years.

To give you a flavor, here are four Kubernetes features that came from our experiences with Borg:

1) Pods. A pod is the unit of scheduling in Kubernetes. It is a resource envelope in which one or more containers run. Containers that are part of the same pod are guaranteed to be scheduled together onto the same machine, and can share state via local volumes.

Borg has a similar abstraction, called an alloc (short for “resource allocation”). Popular uses of allocs in Borg include running a web server that generates logs alongside a lightweight log collection process that ships the logto a cluster filesystem (not unlike fluentd or logstash); running a web server that serves data from a disk directory that is populated by a process that reads data from a cluster filesystem and prepares/stages it for the web server (not unlike a Content Management System); and running user-defined processing functions alongside a storage shard. Pods not only support these use cases, but they also provide an environment similar to running multiple processes in a single VM -- Kubernetes users can deploy multiple co-located, cooperating processes in a pod without having to give up the simplicity of a one-application-per-container deployment model.

2) Services. Although Borg’s primary role is to manage the lifecycles of tasks and machines, the applications that run on Borg benefit from many other cluster services, including naming and load balancing. Kubernetes supports naming and load balancing using the service abstraction: a service has a name and maps to a dynamic set of pods defined by a label selector (see next section). Any container in the cluster can connect to the service using the service name. Under the covers, Kubernetes automatically load-balances connections to the service among the pods that match the label selector, and keeps track of where the pods are running as they get rescheduled over time due to failures.

3) Labels. A container in Borg is usually one replica in a collection of identical or nearly identical containers that correspond to one tier of an Internet service (e.g. the front-ends for Google Maps) or to the workers of a batch job (e.g. a MapReduce). The collection is called a Job, and each replica is called a Task. While the Job is a very useful abstraction, it can be limiting. For example, users often want to manage their entire service (composed of many Jobs) as a single entity, or to uniformly manage several related instances of their service, for example separate canary and stable release tracks. At the other end of the spectrum, users frequently want to reason about and control subsets of tasks within a Job -- the most common example is during rolling updates, when different subsets of the Job need to have different configurations.

Kubernetes supports more flexible collections than Borg by organizing pods using labels, which are arbitrary key/value pairs that users attach to pods (and in fact to any object in the system). Users can create groupings equivalent to Borg Jobs by using a “job:<jobname>” label on their pods, but they can also use additional labels to tag the service name, service instance (production, staging, test), and in general, any subset of their pods. A label query (called a “label selector”) is used to select which set of pods an operation should be applied to. Taken together, labels and replication controllers allow for very flexible update semantics, as well as for operations that span the equivalent of Borg Jobs.

4) IP-per-Pod. In Borg, all tasks on a machine use the IP address of that host, and thus share the host’s port space. While this means Borg can use a vanilla network, it imposes a number of burdens on infrastructure and application developers: Borg must schedule ports as a resource; tasks must pre-declare how many ports they need, and take as start-up arguments which ports to use; the Borglet (node agent) must enforce port isolation; and the naming and RPC systems must handle ports as well as IP addresses.

Thanks to the advent of software-defined overlay networks such as flannel or those built into public clouds, Kubernetes is able to give every pod and service its own IP address. This removes the infrastructure complexity of managing ports, and allows developers to choose any ports they want rather than requiring their software to adapt to the ones chosen by the infrastructure. The latter point is crucial for making it easy to run off-the-shelf open-source applications on Kubernetes--pods can be treated much like VMs or physical hosts, with access to the full port space, oblivious to the fact that they may be sharing the same physical machine with other pods.

With the growing popularity of container-based microservice architectures, the lessons Google has learned from running such systems internally have become of increasing interest to the external DevOps community. By revealing some of the inner workings of our cluster manager Borg, and building our next-generation cluster manager as both an open-source project (Kubernetes) and a publicly available hosted service (Google Container Engine), we hope these lessons can benefit the broader community outside of Google and advance the state-of-the-art in container scheduling and cluster management.

↧

Weekly Kubernetes Community Hangout Notes - April 24 2015

April 29, 2015, 5:27 pm

≫ Next: AppC Support for Kubernetes through RKT

≪ Previous: Borg: The Predecessor to Kubernetes

Every week the Kubernetes contributing community meet virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Agenda:

Flocker and Kubernetes integration demo

Notes:

flocker and kubernetes integration demo
Demo steps and DIY instructions at: https://clusterhq.com/blog/data-migration-kubernetes-flocker/

Cool demo by Kai Davenport

Flocker Q/A

Does the file still exists on node1 after migration?

Luke: still exists, but unmounted and cannot be written to. Data persists and can be used. Working on support for multiple storage backend.

Brendan: Any plan this to make it a volume? So we don't need powerstrip?

Luke: Need to figure out interest to decide if we want to make it a first-class persistent disk provider in kube.
Brendan: Removing need for powerstrip would make it simple to use. Totally go for it.
Tim: Should take no more than 45 minutes to add it to kubernetes:)

Derek: Contrast this with persistent volumes and claims?

Luke: Not much difference, except for the novel ZFS based backend. Makes workloads really portable.
Tim: very different than network-based volumes. Its interesting that it is the only offering that allows upgrading media.
Brendan: claims, how does it look for replicated claims? eg Cassandra wants to have replicated data underneath. It would be efficient to scale up and down. Create storage on the fly based on load dynamically. Its step beyond taking snapshots - programmatically creating replicas with preallocation.
Tim: helps with auto-provisioning.

Brian: Does flocker requires any other component?

Kai: Flocker control service co-located with the master. (dia on blog post). Powerstrip + Powerstrip Flocker. Very interested in mpersisting state in etcd. It keeps metadata about each volume.
Brendan: In future, flocker can be a plugin and we'll take care of persistence. Post v1.0.
Brian: Interested in adding generic plugin for services like flocker.
Luke: Zfs can become really valuable when scaling to lot of containers on a single node.

Alex: Can flocker service can be run as a pod?

Kai: Yes, only requirement is the flocker control service should be able to talk to zfs agent. zfs agent needs to be installed on the host and zfs binaries need to be accessible.
Brendan: In theory, all zfs bits can be put it into a container with devices.
Luke: Yes, still working through cross-container mounting issue.
Tim: pmorie is working through it to make kubelet work in a container. Possible re-use.