
Creating a Raspberry Pi cluster running Kubernetes, the installation (Part 2)


At Devoxx Belgium and Devoxx Morocco, Ray Tsang and I (Arjen Wassink) showed a Raspberry Pi cluster we built at Quintor running HypriotOS, Docker and Kubernetes. While we received many compliments on the talk, the most common question was how to build such a Pi cluster yourself. We’ll be doing just that, in two parts. The first part covered the shopping list for the cluster, and this second one will show you how to get Kubernetes up and running.


Now that you have your Raspberry Pi cluster all set up, it is time to run some software on it. As mentioned in the previous blog, I based this tutorial on the Hypriot Linux distribution for the ARM processor, mainly because of its bundled support for Docker. I used this version of Hypriot for this tutorial, so if you run into trouble with other versions, please consider the version I’ve used.
The first step is to make sure every Pi has Hypriot running; if not, please check their getting started guide. Also hook up the cluster switch to a network so that Internet access is available and every Pi gets an IP address assigned via DHCP. Because we will be running multiple Pi’s, it is practical to give each Pi a unique hostname. I renamed my Pi’s to rpi-master, rpi-node-1, rpi-node-2, etc. for convenience. Note that on Hypriot the hostname is set by editing the /boot/occidentalis.txt file, not /etc/hostname. You could also set the hostname using the Hypriot flash tool.
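
For example, after renaming my master node the relevant line in /boot/occidentalis.txt looks like this (a minimal sketch; the file may contain other settings, so edit the existing hostname line rather than replacing the file):

$ cat /boot/occidentalis.txt
hostname=rpi-master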

The most important thing about running software on a Pi is the availability of an ARM distribution. Thanks to Brendan Burns, there are Kubernetes components for ARM available in the Google Cloud Registry. That’s great. The second hurdle is how to install Kubernetes. There are two ways: directly on the system, or in a Docker container. Although the container support has an experimental status, I chose to go for that approach because it makes installing Kubernetes easier. Kubernetes requires several processes (etcd, flannel, kubelet, etc.) to run on a node, and they should be started in a specific order. To ease that, systemd services are made available that start the necessary processes in the right way. The systemd services also make sure that Kubernetes is spun up when a node is (re)booted. To make the installation really easy, I created a simple install script for the master node and the worker nodes. Everything is available on GitHub. So let’s get started!

Installing the Kubernetes master node

First we will install Kubernetes on the master node and add the worker nodes to the cluster later. It basically comes down to fetching the Git repository content and executing the installation script.

$ curl -L -o k8s-on-rpi.zip https://github.com/awassink/k8s-on-rpi/archive/master.zip
$ apt-get update
$ apt-get install unzip
$ unzip k8s-on-rpi.zip
$ k8s-on-rpi-master/install-k8s-master.sh

The install script will install five services (a quick verification check follows the list):
  • docker-bootstrap.service - a separate Docker daemon that runs etcd and flannel; it is needed because flannel must be running before the standard Docker daemon (docker.service) can be configured with the right network settings.
  • k8s-etcd.service - the etcd service for storing flannel and kubelet data.
  • k8s-flannel.service - the flannel process providing an overlay network across all nodes in the cluster.
  • docker.service - the standard Docker daemon, using flannel as a network bridge. It will run all Docker containers.
  • k8s-master.service - the Kubernetes master service providing the cluster functionality.
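
Once the script completes, you can check that all five services are active via systemd (assuming the script registers them under the unit names listed above):

$ systemctl status docker-bootstrap.service k8s-etcd.service k8s-flannel.service docker.service k8s-master.service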

The basic details of this installation procedure are also documented in the Getting Started Guide of Kubernetes. Please check it to get more insight into how a multi-node Kubernetes cluster is set up.

Let’s check if everything is working correctly. Two docker daemon processes must be running.
$ ps -ef|grep docker
root       302     1  0 04:37 ?        00:00:14 /usr/bin/docker daemon -H unix:///var/run/docker-bootstrap.sock -p /var/run/docker-bootstrap.pid --storage-driver=overlay --storage-opt dm.basesize=10G --iptables=false --ip-masq=false --bridge=none --graph=/var/lib/docker-bootstrap
root       722     1 11 04:38 ?        00:16:11 /usr/bin/docker -d -bip=10.0.97.1/24 -mtu=1472 -H fd:// --storage-driver=overlay -D

The etcd and flannel containers must be up.
$ docker -H unix:///var/run/docker-bootstrap.sock ps
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
4855cc1450ff        andrewpsuedonym/flanneld    "flanneld --etcd-endp"  2 hours ago         Up 2 hours                              k8s-flannel
ef410b986cb3        andrewpsuedonym/etcd:2.1.1   "/bin/etcd --addr=127"  2 hours ago         Up 2 hours                              k8s-etcd

The hyperkube kubelet, apiserver, scheduler, controller and proxy must be up.
$ docker ps
CONTAINER ID        IMAGE                                           COMMAND                  CREATED             STATUS              PORTS               NAMES
a17784253dd2        gcr.io/google_containers/hyperkube-arm:v1.1.2   "/hyperkube controlle"  2 hours ago         Up 2 hours                              k8s_controller-manager.7042038a_k8s-master-127.0.0.1_default_43160049df5e3b1c5ec7bcf23d4b97d0_2174a7c3
a0fb6a169094        gcr.io/google_containers/hyperkube-arm:v1.1.2   "/hyperkube scheduler"  2 hours ago         Up 2 hours                              k8s_scheduler.d905fc61_k8s-master-127.0.0.1_default_43160049df5e3b1c5ec7bcf23d4b97d0_511945f8
d93a94a66d33        gcr.io/google_containers/hyperkube-arm:v1.1.2   "/hyperkube apiserver"  2 hours ago         Up 2 hours                              k8s_apiserver.f4ad1bfa_k8s-master-127.0.0.1_default_43160049df5e3b1c5ec7bcf23d4b97d0_b5b4936d
db034473b334        gcr.io/google_containers/hyperkube-arm:v1.1.2   "/hyperkube kubelet -"  2 hours ago         Up 2 hours                              k8s-master
f017f405ff4b        gcr.io/google_containers/hyperkube-arm:v1.1.2   "/hyperkube proxy --m"  2 hours ago         Up 2 hours                              k8s-master-proxy

Deploying the first pod and service on the cluster

When that’s looking good, we can access the master node of the Kubernetes cluster with kubectl. kubectl for ARM can be downloaded from googleapis storage (don’t forget to make it executable). kubectl get nodes shows which cluster nodes are registered, along with their status. The master node is named 127.0.0.1.
$ curl -fsSL -o /usr/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/v1.1.2/bin/linux/arm/kubectl
$ chmod +x /usr/bin/kubectl
$ kubectl get nodes
NAME              LABELS                                   STATUS    AGE
127.0.0.1        kubernetes.io/hostname=127.0.0.1         Ready     1h

An easy way to test the cluster is by running a busybox Docker image for ARM. kubectl run can be used to run the image as a container in a pod. kubectl get pods shows the pods that are registered, along with their status.
$ kubectl run busybox --image=hypriot/rpi-busybox-httpd
$ kubectl get pods -o wide
NAME                   READY     STATUS    RESTARTS   AGE       NODE
busybox-fry54         1/1       Running  1          1h        127.0.0.1
k8s-master-127.0.0.1   3/3       Running   6          1h        127.0.0.1

Now the pod is running, but the application is not generally accessible. That can be achieved by creating a service. The cluster IP address is the IP address at which the service is available within the cluster. Use the IP address of your master node as the external IP, and the service becomes available outside the cluster as well (e.g. at http://192.168.192.161 in my case).
$ kubectl expose rc busybox --port=90 --target-port=80 --external-ip=<ip-address-master-node>
$ kubectl get svc
NAME         CLUSTER_IP   EXTERNAL_IP       PORT(S)   SELECTOR      AGE
busybox     10.0.0.87    192.168.192.161  90/TCP    run=busybox   1h
kubernetes   10.0.0.1     <none>            443/TCP   <none>        2h
$ curl http://10.0.0.87:90/
<html>
<head><title>Pi armed with Docker by Hypriot</title>
 <body style="width: 100%; background-color: black;">
   <div id="main" style="margin: 100px auto 0 auto; width: 800px;">
     <img src="pi_armed_with_docker.jpg" alt="pi armed with docker" style="width: 800px">
   </div>
 </body>
</html>

Installing the Kubernetes worker nodes

The next step is installing Kubernetes on each worker node and adding it to the cluster. This also basically comes down to fetching the Git repository content and executing the installation script, though for this installation the k8s.conf file needs to be copied beforehand and edited to contain the IP address of the master node.

$ curl -L -o k8s-on-rpi.zip https://github.com/awassink/k8s-on-rpi/archive/master.zip
$ apt-get update
$ apt-get install unzip
$ unzip k8s-on-rpi.zip
$ mkdir /etc/kubernetes
$ cp k8s-on-rpi-master/rootfs/etc/kubernetes/k8s.conf /etc/kubernetes/k8s.conf
### Change the ip-address in /etc/kubernetes/k8s.conf to match the master node ###
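### For example, assuming the file ships with a 127.0.0.1 placeholder (check its actual contents first) and your master node is at 192.168.192.161: ###
$ sed -i 's/127.0.0.1/192.168.192.161/g' /etc/kubernetes/k8s.conf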
$ k8s-on-rpi-master/install-k8s-worker.sh

The install script will install four services. These are quite similar to the ones on the master node, with the difference that no etcd service is running and the kubelet service is configured as a worker node.
Once all the services on the worker node are up and running we can check that the node is added to the cluster on the master node.
$ kubectl get nodes
NAME              LABELS                                   STATUS    AGE
127.0.0.1         kubernetes.io/hostname=127.0.0.1         Ready     2h
192.168.192.160  kubernetes.io/hostname=192.168.192.160   Ready    1h
$ kubectl scale --replicas=2 rc/busybox
$ kubectl get pods -o wide
NAME                   READY     STATUS    RESTARTS   AGE       NODE
busybox-fry54          1/1       Running   1          1h        127.0.0.1
busybox-j2slu         1/1       Running   0          1h        192.168.192.160
k8s-master-127.0.0.1   3/3       Running   6          2h        127.0.0.1

Enjoy your Kubernetes cluster!

Congratulations! You now have your Kubernetes Raspberry Pi cluster running and can start playing with Kubernetes and start learning. Check out the Kubernetes User Guide to find out everything you can do. And don’t forget to pull some plugs occasionally like Ray and I do :-)

Arjen Wassink, Java Architect and Team Lead, Quintor

Simple leader election with Kubernetes


Overview 


Kubernetes simplifies the deployment and operational management of services running on clusters. However, it also simplifies the development of these services. In this post we'll see how you can use Kubernetes to easily perform leader election in your distributed application. Distributed applications usually replicate the tasks of a service for reliability and scalability, but often it is necessary to designate one of the replicas as the leader who is responsible for coordination among all of the replicas.

Typically in leader election, a set of candidates for becoming leader is identified. These candidates all race to declare themselves the leader. One of the candidates wins and becomes the leader. Once the election is won, the leader continually “heartbeats” to renew their position as the leader, and the other candidates periodically make new attempts to become the leader. This ensures that a new leader is identified quickly if the current leader fails for some reason.

Implementing leader election usually requires either deploying software such as ZooKeeper, etcd or Consul and using it for consensus, or alternately, implementing a consensus algorithm on your own. We will see below that Kubernetes makes the process of using leader election in your application significantly easier.

Implementing leader election in Kubernetes 


The first requirement in leader election is the specification of the set of candidates for becoming the leader. Kubernetes already uses Endpoints to represent a replicated set of pods that comprise a service, so we will re-use this same object. (aside: You might have thought that we would use ReplicationControllers, but they are tied to a specific binary, and generally you want to have a single leader even if you are in the process of performing a rolling update)

To perform leader election, we use two properties of all Kubernetes API objects:

  • ResourceVersions - Every API object has a unique ResourceVersion, and you can use these versions to perform compare-and-swap on Kubernetes objects (see the sketch after this list)
  • Annotations - Every API object can be annotated with arbitrary key/value pairs to be used by clients. 
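
Once the ‘example’ endpoints object exists (it is created by the kubectl run below), you can get a feel for the compare-and-swap primitive by hand: kubectl annotate accepts a --resource-version flag, so the write succeeds only if the object has not changed since you read it. A minimal sketch (the annotation key, value, and version number below are made up):

$ kubectl get endpoints example -o yaml | grep resourceVersion
  resourceVersion: "1234"
# this annotate succeeds only if the object is still at resourceVersion 1234
$ kubectl annotate endpoints example example-leader=my-pod --resource-version="1234"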

Given these primitives, the code to use master election is relatively straightforward, and you can find it here. Let’s run it ourselves.

$ kubectl run leader-elector --image=gcr.io/google_containers/leader-elector:0.4 --replicas=3 -- --election=example
This creates a leader election set with 3 replicas:

$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
leader-elector-inmr1   1/1       Running   0          13s
leader-elector-qkq00   1/1       Running   0          13s
leader-elector-sgwcq   1/1       Running   0          13s

To see which pod was chosen as the leader, you can access the logs of one of the pods, substituting one of your own pod’s names in place of ${pod_name} (e.g. leader-elector-inmr1 from the above):

$ kubectl logs -f ${pod_name}
leader is (leader-pod-name)
Alternately, you can inspect the endpoints object directly:

# ‘example’ is the name of the candidate set from the above kubectl run … command
$ kubectl get endpoints example -o yaml
Now to validate that leader election actually works, in a different terminal, run:  

$ kubectl delete pods (leader-pod-name)
This will delete the existing leader. Because the set of pods is being managed by a replication controller, a new pod replaces the one that was deleted, ensuring that the size of the replicated set is still three. Via leader election, one of these three pods is selected as the new leader, and you should see the leader fail over to a different pod. Because pods in Kubernetes have a grace period before termination, this may take 30-40 seconds.

The leader-election container provides a simple webserver that can serve on any address (e.g. http://localhost:4040). You can test this out by deleting the existing leader election group and creating a new one where you additionally pass in a --http=(host):(port) specification to the leader-elector image. This causes each member of the set to serve information about the leader via a webhook.

# delete the old leader elector group
$ kubectl delete rc leader-elector

# create the new group, note the --http=localhost:4040 flag
$ kubectl run leader-elector --image=gcr.io/google_containers/leader-elector:0.4 --replicas=3 -- --election=example --http=0.0.0.0:4040

# create a proxy to your Kubernetes api server
$ kubectl proxy

You can then access:

http://localhost:8001/api/v1/proxy/namespaces/default/pods/(leader-pod-name):4040/
And you will see:
{"name":"(name-of-leader-here)"}


Leader election with sidecars 


OK, that’s great: you can do leader election and find out the leader over HTTP, but how can you use it from your own application? This is where the notion of sidecars comes in. In Kubernetes, Pods are made up of one or more containers. Oftentimes, this means that you add sidecar containers to your main application to make up a Pod. (For a much more detailed treatment of this subject, see my earlier blog post.)

The leader-election container can serve as a sidecar that you can use from your own application. Any container in the Pod that’s interested in who the current master is can simply access http://localhost:4040 and they’ll get back a simple JSON object that contains the name of the current master. Since all containers in a Pod share the same network namespace, there’s no service discovery required!

For example, here is a simple Node.js application that connects to the leader election sidecar and prints out whether or not it is currently the master. The leader election sidecar sets its identifier to `hostname` by default.

var http = require('http');
// This will hold info about the current master
var master = {};

// The web handler for our nodejs application
var handleRequest = function(request, response) {
  response.writeHead(200);
  response.end("Master is " + master.name);
};

// A callback that is used for our outgoing client requests to the sidecar
var cb = function(response) {
  var data = '';
  response.on('data', function(piece) { data = data + piece; });
  response.on('end', function() { master = JSON.parse(data); });
};

// Make an async request to the sidecar at http://localhost:4040
var updateMaster = function() {
  var req = http.get({host: 'localhost', path: '/', port: 4040}, cb);
  req.on('error', function(e) { console.log('problem with request: ' + e.message); });
  req.end();
};

// Set up regular updates
updateMaster();
setInterval(updateMaster, 5000);

// set up the web server
var www = http.createServer(handleRequest);
www.listen(8080);
Of course, you can use this sidecar from any language you choose that supports HTTP and JSON.
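
For instance, a quick check from a shell (assuming the group was created with the --http=0.0.0.0:4040 flag shown above, and run from inside any container in the pod, which shares the sidecar’s network namespace):

$ curl http://localhost:4040
{"name":"(name-of-leader-here)"}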

Conclusion 


Hopefully I’ve shown you how easy it is to build leader election for your distributed application using Kubernetes. In future installments we’ll show you how Kubernetes is making building distributed systems even easier. In the meantime, head over to Google Container Engine or kubernetes.io to get started with Kubernetes.

Why Kubernetes doesn’t use libnetwork

Kubernetes has had a very basic form of network plugins since before version 1.0 was released — around the same time as Docker's libnetwork and Container Network Model (CNM) were introduced. Unlike libnetwork, the Kubernetes plugin system still retains its "alpha" designation. Now that Docker's network plugin support is released and supported, an obvious question we get is why Kubernetes has not adopted it yet. After all, vendors will almost certainly be writing plugins for Docker — we would all be better off using the same drivers, right?

Before going further, it's important to remember that Kubernetes is a system that supports multiple container runtimes, of which Docker is just one. Configuring networking is a facet of each runtime, so when people ask "will Kubernetes support CNM?" what they really mean is "will Kubernetes support CNM drivers with the Docker runtime?" It would be great if we could achieve common network support across runtimes, but that’s not an explicit goal.

Indeed, Kubernetes has not adopted CNM/libnetwork for the Docker runtime. In fact, we’ve been investigating the alternative Container Network Interface (CNI) model put forth by CoreOS and part of the App Container (appc) specification. Why? There are a number of reasons, both technical and non-technical.

First and foremost, there are some fundamental assumptions in the design of Docker's network drivers that cause problems for us.

Docker has a concept of "local" and "global" drivers. Local drivers (such as "bridge") are machine-centric and don’t do any cross-node coordination. Global drivers (such as "overlay") rely on libkv (a key-value store abstraction) to coordinate across machines. This key-value store is another plugin interface, and a very low-level one (keys and values, no semantic meaning). To run something like Docker's overlay driver in a Kubernetes cluster, we would either need cluster admins to run a whole different instance of consul, etcd or zookeeper (see multi-host networking), or else we would have to provide our own libkv implementation that was backed by Kubernetes.

The latter sounds attractive, and we tried to implement it, but the libkv interface is very low-level, and the schema is defined internally to Docker. We would have to either directly expose our underlying key-value store or else offer key-value semantics (on top of our structured API which is itself implemented on a key-value system). Neither of those are very attractive for performance, scalability and security reasons. The net result is that the whole system would be significantly more complicated, when the goal of using Docker networking is to simplify things.

For users that are willing and able to run the requisite infrastructure to satisfy Docker global drivers and to configure Docker themselves, Docker networking should "just work." Kubernetes will not get in the way of such a setup, and no matter what direction the project goes, that option should be available. For default installations, though, the practical conclusion is that this is an undue burden on users and we therefore cannot use Docker's global drivers (including "overlay"), which eliminates a lot of the value of using Docker's plugins at all.

Docker's networking model makes a lot of assumptions that aren’t valid for Kubernetes. Docker versions 1.8 and 1.9 include a fundamentally flawed implementation of "discovery" that results in corrupted /etc/hosts files in containers (docker #17190) — and this cannot easily be turned off. In version 1.10 Docker is planning to bundle a new DNS server, and it’s unclear whether this will be able to be turned off. Container-level naming is not the right abstraction for Kubernetes — we already have our own concepts of service naming, discovery, and binding, and we already have our own DNS schema and server (based on the well-established SkyDNS). The bundled solutions are not sufficient for our needs and cannot be disabled.

Orthogonal to the local/global split, Docker has both in-process and out-of-process ("remote") plugins. We investigated whether we could bypass libnetwork (and thereby skip the issues above) and drive Docker remote plugins directly. Unfortunately, this would mean that we could not use any of the Docker in-process plugins, "bridge" and "overlay" in particular, which again eliminates much of the utility of libnetwork.

On the other hand, CNI is more philosophically aligned with Kubernetes. It's far simpler than CNM, doesn't require daemons, and is at least plausibly cross-platform (CoreOS’s rkt container runtime supports it). Being cross-platform means that there is a chance to enable network configurations which will work the same across runtimes (e.g. Docker, Rocket, Hyper). It follows the UNIX philosophy of doing one thing well.

Additionally, it's trivial to wrap a CNI plugin and produce a more customized CNI plugin — it can be done with a simple shell script. CNM is much more complex in this regard. This makes CNI an attractive option for rapid development and iteration. Early prototypes have proven that it's possible to eject almost 100% of the currently hard-coded network logic in kubelet into a plugin.
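
As an illustration of how light that wrapping can be, here is a hedged sketch of a shell-script CNI plugin that logs every invocation and then delegates, unchanged, to the standard bridge plugin (the log path and the choice of delegate are assumptions made for the example):

#!/bin/sh
# CNI hands a plugin its network config as JSON on stdin, and the operation
# (ADD/DEL) plus container details via CNI_* environment variables.
config=$(cat)
echo "$(date) cmd=${CNI_COMMAND} container=${CNI_CONTAINERID}" >> /var/log/cni-wrapper.log
# Delegate the real work to the bridge plugin; its JSON result goes to stdout.
echo "$config" | exec /opt/cni/bin/bridge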

We investigated writing a "bridge" CNM driver for Docker that ran CNI drivers. This turned out to be very complicated. First, the CNM and CNI models are very different, so none of the "methods" lined up. We still have the global vs. local and key-value issues discussed above. Assuming this driver would declare itself local, we have to get info about logical networks from Kubernetes.

Unfortunately, Docker drivers are hard to map to other control planes like Kubernetes. Specifically, drivers are not told the name of the network to which a container is being attached — just an ID that Docker allocates internally. This makes it hard for a driver to map back to any concept of network that exists in another system.

This and other issues have been brought up to Docker developers by network vendors, and are usually closed as "working as intended" (libnetwork #139, libnetwork #486, libnetwork #514, libnetwork #865, docker #18864), even though they make non-Docker third-party systems more difficult to integrate with. Throughout this investigation Docker has made it clear that they’re not very open to ideas that deviate from their current course or that delegate control. This is very worrisome to us, since Kubernetes complements Docker and adds so much functionality, but exists outside of Docker itself.

For all of these reasons we have chosen to invest in CNI as the Kubernetes plugin model. There will be some unfortunate side-effects of this. Most of them are relatively minor (for example, docker inspect will not show an IP address), but some are significant. In particular, containers started by docker run might not be able to communicate with containers started by Kubernetes, and network integrators will have to provide CNI drivers if they want to fully integrate with Kubernetes. On the other hand, Kubernetes will get simpler and more flexible, and a lot of the ugliness of early bootstrapping (such as configuring Docker to use our bridge) will go away.

As we proceed down this path, we’ll certainly keep our eyes and ears open for better ways to integrate and simplify. If you have thoughts on how we can do that, we really would like to hear them — find us on slack or on our network SIG mailing-list.

Tim Hockin, Software Engineer, Google

Kubernetes Community Meeting Notes - 20160114


January 14 - RackN demo, testing woes, and KubeCon EU CFP.

  • Note taker: Joe Beda
  • Demonstration: Automated Deploy on Metal, AWS and others w/ Digital Rebar, Rob Hirschfeld  and Greg Althaus from RackN
    • Greg Althaus. CTO.  Digital Rebar is the product.  Bare metal provisioning tool.
    • Detect hardware, bring it up, configure raid, OS and get workload deployed.
    • Been working on Kubernetes workload.
    • Seeing trend to start in cloud and then move back to bare metal.
    • New provider model to use provisioning system on both cloud and bare metal.
    • UI, REST API, CLI
    • Demo: Packet -- bare metal as a service
      • 4 nodes running grouped into a “deployment”
      • Functional roles/operations selected per node.
      • Decomposed the kubernetes bring up into units that can be ordered and synchronized.  Dependency tree -- things like wait for etcd to be up before starting k8s master.
      • Using the Ansible playbook under the covers.
      • Demo brings up 5 more nodes -- packet will build those nodes
      • Pulled out basic parameters from the ansible playbook.  Things like the network config, dns set up, etc.
        • Flannel now with work on opencontrail
      • Hierarchy of roles pulls in other components -- making a node a master brings in a bunch of other roles that are necessary for that.
      • Has all of this combined into a command line tool with a simple config file.
    • Forward: extending across multiple clouds for test deployments.  Also looking to create split/replicated across bare metal and cloud.
    • Q: secrets?
      A: using ansible playbooks.  Builds own certs and then distributes them.  Wants to abstract them out and push that stuff upstream.
    • Q: Do you support bringing up from real bare metal with PXE boot?
      A: yes -- will discover bare metal systems and install OS, install ssh keys, build networking, etc.
  • [from SIG-scalability] Q: What is the status of moving to golang 1.5?
    A: At HEAD we are 1.5 but will support 1.4 also. Some issues with flakiness but looks like things are stable now.  
    • Also looking to use the 1.5 vendor experiment.  Move away from godep.  But can’t do that until 1.5 is the baseline.
    • Sarah: one of the things we are working on is rewards for doing stuff like this.  Cloud credits, tshirts, poker chips, ponies.
  • [from SIG-scalability] Q: What is the status of cleaning up the jenkins based submit queue? What can the community do to help out?
    A: It has been rocky the last few days.  There should be issues associated with each of these. There is a flake label on those issues.  
    • Still working on test federation.  More test resources now.  Happening slowly but hopefully faster as new people come up to speed.  Will be great to having lots of folks doing e2e tests on their environments.
    • Erick Fjeta is the new test lead
    • Brendan is happy to help share details on Jenkins set up but that shouldn’t be necessary.
    • Federation may use Jenkins API but doesn’t require Jenkins itself.
    • Joe bitches about the fact that running the e2e tests the way Jenkins does is tricky.  Brendan says it should be easily runnable.  Joe will take another look.
    • Conformance tests? etune did this but he isn’t here.  - revisit 20160121
  • Kubecon.io/EU CFP is open https://kubecon.io/call-for-proposals/
    • March 10-11 in London.  Venue to be announced this week.
    • Please send talks!  CFP deadline looks to be Feb 5.
      • Would love to see more talks from production users.
    • Lots of excitement.  Looks to be 700-800 people.  Bigger than SF version (560 ppl).
    • Buy tickets early -- early bird prices will end soon and price will go up 100 GBP.
    • Accommodations provided for speakers?
      • Potentially but not 100% certain.  Need to figure it out.
    • Q from Bob @ Samsung: Can we get more warning/planning for stuff like this:
      • A: Sarah -- I don’t hear about this stuff much in advance but will try to pull together a list.  Working to make the events page on kubernetes.io easier to use.
      • A: JJ -- we’ll make sure we give more info earlier for the next US conf.
  • Scale tests [Rob Hirschfeld from RackN] -- if you want to help coordinate on scale tests we’d love to help.
    • Bob invited Rob to join the SIG-scale group.
    • There is also a big bare metal cluster through the CNCF (from Intel) that will be useful too.  No hard dates yet on that.
  • Notes/video going to be posted on k8s blog. (Video for 20160114 wasn’t recorded.  Fail.)
To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or joining the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation - January 27th, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.
We missed recording this meeting, but you can check out the archive of Kubernetes Community Meetings.

Kubernetes Community Meeting Notes - 20160121


January 21 - Configuration, Federation and Testing, oh my. 

To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or joining the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation -- January 27th, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.

Still want more Kubernetes? Check out the recording of this meeting and the growing archive of Kubernetes Community Meetings.

State of the Container World, January 2016

At the start of the new year, we sent out a survey to gauge the state of the container world. We’re ready to send the February edition, but before we do, let’s take a look at the January data from the 119 responses (thank you for participating!).

A note about these numbers: First, you may notice that the numbers don’t add up to 100%; the choices were not exclusive in most cases, so the percentages given are the percentage of all respondents who selected a particular choice. Second, while we attempted to reach a broad cross-section of the cloud community, the survey was initially sent out via Twitter to followers of @brendandburns, @kelseyhightower, @sarahnovotny, @juliaferraioli and @thagomizer_rb, so the audience is likely not a perfect cross-section. We’re working to broaden our sample size (have I mentioned our February survey? Come take it now).

Now, without further ado, the data:

First off, lots of you are using containers! 71% are currently using containers, while 24% of you are considering using them soon. Obviously this indicates a somewhat biased sample set. Numbers for container usage in the broader community vary, but are definitely lower than 71%.  Consequently, take all of the rest of these numbers with a grain of salt.

So what are folks using containers for? More than 80% of respondents are using containers for development, while only 50% are using them for production. But production use is coming: 78% of container users said that they were planning on moving to production sometime soon.

Where do you deploy containers? Your laptop was the clear winner here, with 53% of folks deploying to laptops. Next up, 44% of people run containers on their own VMs (Vagrant? OpenStack? We’ll try to dig into this in the February survey), followed by 33% of folks running on physical infrastructure, and 31% on public cloud VMs.

And how are you deploying containers? 54% of you are using Kubernetes, which is awesome to see, though likely somewhat biased by the sample set (see the notes above). Possibly more surprising, 45% of you are using shell scripts. Is it because of the extensive (and awesome) Bash scripting going on in the Kubernetes repository? Go on, you can tell me the truth… Rounding out the numbers, 25% are using CAPS (Chef/Ansible/Puppet/Salt) systems, and roughly 13% are using Docker Swarm, Mesos or other systems.

Finally, we asked people for free-text answers about the challenges of working with containers. Some of the most interesting answers are grouped and reproduced here:

Development Complexity

  • “Silo'd development environments / workflows can be fragmented, ease of access to tools like logs is available when debugging containers but not intuitive at times, massive amounts of knowledge is required to grasp the whole infrastructure stack and best practices from say deploying / updating kubernetes, to underlying networking etc.”
  • “Migrating developer workflow. People uninitiated with containers, volumes, etc just want to work.”

Security

  • “Network Security”
  • “Secrets”

Immaturity

  • “Lack of a comprehensive non-proprietary standard (i.e. non-Docker) like e.g runC / OCI”
  • “Still early stage with few tools and many missing features.”
  • “Poor CI support, a lot of tooling still in very early days.”
  • "We've never done it that way before."

Complexity

  • “Networking support, providing ip per pod on bare metal for kubernetes”
  • “Clustering is still too hard”
  • “Setting up Mesos and Kubernetes too damn complicated!!”

Data

  • “Lack of flexibility of volumes (which is the same problem with VMs, physical hardware, etc)”
  • “Persistency”
  • “Storage”
  • “Persistent Data”

Download the full survey results here (CSV file).

Update: 2/1/2016 - Fixed the CSV link.

-- Brendan Burns, Software Engineer, Google

Kubernetes Community Meeting Notes - 20160128


January 28 - 1.2 release update, Deis demo, flaky test surge and SIGs


The Kubernetes contributing community meets once a week to discuss the project's status via a videoconference. Here are the notes from the latest meeting.

  • Note taker: Erin Boyd
  • Discuss process around code freeze/code slush (TJ Goltermann)
    • Code wind down was happening during holiday (for 1.1)
    • Releasing ~ every 3 months
    • Build stability is still missing
    • Issue on Transparency (Bob Wise)
      • Email from Sarah for call to contribute (Monday, January 25)
        • Concern over publishing dates / understanding release schedule /etc…
    • Release targeted for early March
      • Where does one find information on the release schedule with the committed features?
        • For 1.2 - Send email / Slack to TJ
        • For 1.3 - Working on better process to communicate to the community
          • Twitter
          • Wiki
          • GitHub Milestones
    • How to better communicate issues discovered in the SIG
      • AI: People need to email the kubernetes-dev@ mailing list with summary of findings
      • AI: Each SIG needs a note taker
  • Release planning vs Release testing
    • Testing SIG lead Ike McCreery
      • Also part of the testing infrastructure team at Google
      • Community being able to integrate into the testing framework
        • Federated testing
    • Release Manager = David McMahon
      • Request to  introduce him to the community meeting
  • Demo: Jason Hansen, Deis
  • Testing
    • Called for community interaction
    • Need to understand friction points from community
      • Better documentation
      • Better communication on how things “should work”
    • Internally, Google is having daily calls to resolve test flakes
    • Started up SIG testing meetings (Tuesday at 10:30 am PT)
    • Everyone wants it, but no one wants to pony up the time to make it happen
      • Google is dedicating headcount to it (3-4 people, possibly more)
  • Best practices for labeling
      • Are there tools built on top of these to leverage?
      • AI: Generate artifact for labels and what they do (Create doc)
        • Help Wanted Label - good for new community members
        • Classify labels for team and area
          • User experience, test infrastructure, etc..
  • SIG Config (not about deployment)
    • Any interest in ansible, etc.. type
  • SIG Scale meeting (Bob Wise & Tim StClair)
    • Tests related to performance SLA get relaxed in order to get the tests to pass
      • exposed process issues
      • AI: outline of a proposal for a notice policy if things are being changed that are critical to the system (Bob Wise/Samsung)
        • Create a best-practices set of constants in a well-documented place

    To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or joining the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation — February 4th, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.

    The full recording is available on YouTube in the growing archive of Kubernetes Community Meetings.

    Kubernetes Community Meeting Notes - 20160204


    February 4th - rkt demo (congratulations on the 1.0, CoreOS!), eBay puts k8s on OpenStack and considers OpenStack on k8s, SIGs, and flaky test surge makes progress.


    The Kubernetes contributing community meets most Thursdays at 10:00PT to discuss the project's status via a videoconference. Here are the notes from the latest meeting.

    • Note taker: Rob Hirschfeld
    • Demo (20 min): CoreOS rkt + Kubernetes [Shaya Potter]
      • expect to see integrations w/ rkt & k8s in the coming months (“rkt-netes”). not integrated into the v1.2 release.
      • Shaya gave a demo (8 minutes into meeting for video reference)
        • CLI of rkt shown spinning up containers
        • [note: audio is garbled at points]
        • Discussion about integration w/ k8s & rkt
        • rkt community sync next week: https://groups.google.com/forum/#!topic/rkt-dev/FlwZVIEJGbY
        • Dawn Chen:
          • The remaining issues of integrating rkt with Kubernetes: 1) cAdvisor 2) DNS 3) bugs related to logging
          • But need more work on e2e test suites
    • Use Case (10 min): eBay k8s on OpenStack and OpenStack on k8s [Ashwin Raveendran]
      • eBay is currently running Kubernetes on OpenStack
      • Goal for eBay is to manage the OpenStack control plane w/ k8s.  Goal would be to achieve upgrades
      • OpenStack Kolla creates containers for the control plane.  Uses Ansible+Docker for management of the containers.  
      • Working on k8s control plane management - Saltstack is proving to be a management challenge at the scale they want to operate.  Looking for automated management of the k8s control plane.
    • SIG Report
    • Testing update [Jeff, Joe, and Erick]
      • Working to make the workflow for contributing to K8s easier to understand
        • pull/19714 has flow chart of the bot flow to help users understand
      • Need a consistent way to run tests w/ hacking config scripts (you have to fake a Jenkins process right now)
      • Want to create necessary infrastructure to make test setup less flaky
      • want to decouple test start (single or full) from Jenkins
      • goal is to get to point where you have 1 script to run that can be pointed to any cluster
      • demo included Google internal views - working to try to get that available externally.
      • want to be able to collect test run results
      • Bob Wise calls for testing infrastructure to be a blocker on v1.3
      • Long discussion about testing practices…
        • consensus that we want to have tests work over multiple platforms.
        • would be helpful to have a comprehensive state dump for test reports
        • “phone-home” to collect stack traces - should be available
    • 1.2 Release Watch
    • CoC [Sarah]
    • GSoC [Sarah]
      • need mentors!!  deadline is very soon.

    To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or joining the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation — February 11th, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.

    The full recording is available on YouTube in the growing archive of Kubernetes Community Meetings.





    ShareThis: Kubernetes In Production

    Today’s guest blog post is by Juan Valencia, Technical Lead at ShareThis, a service that helps website publishers drive engagement and consumer sharing behavior across social networks.
    ShareThis has grown tremendously since its first days as a tiny widget that allowed you to share to your favorite social services. It now serves over 4.5 million domains per month, helping publishers create a more authentic digital experience.
    Fast growth came with a price. We leveraged technical debt to scale fast and to grow our products, particularly when it came to infrastructure. As our company expanded, the infrastructure costs mounted as well - both in terms of inefficient utilization and in terms of people costs. About 1 year ago, it became clear something needed to change.

    TL;DR: Kubernetes has been a key component for us to reduce technical debt in our infrastructure by:

    • Fostering the Adoption of Docker
    • Simplifying Container Management
    • Onboarding Developers On Infrastructure
    • Unlocking Continuous Integration and Delivery
    We accomplished this by radically adopting Kubernetes and switching our DevOps team to a Cloud Platform team that worked in terms of containers and microservices. This included creating some tools to get around our own legacy debt.

    The Problem

    Alas, the cloud was new and we were young. We started with a traditional data-center mindset.  We managed all of our own services: MySQL, Cassandra, Aerospike, Memcache, you name it.  We set up VMs just like you would traditional servers, installed our applications on them, and managed them with Nagios or Ganglia.
    Unfortunately, this way of thinking was antithetical to a cloud-centric approach. Instead of thinking in terms of services, we were thinking in terms of servers. Instead of using modern cloud approaches such as autoscaling, microservices, or even managed VMs, we were thinking in terms of scripted setups, server deployments, and avoiding vendor lock-in.
    These ways of thinking were not bad per se, they were simply inefficient. They weren’t taking advantage of the changes to the cloud that were happening very quickly. It also meant that when changes needed to take place, we were treating those changes as big slow changes to a datacenter, rather than small fast changes to the cloud.

    The Solution

    Kubernetes As A Tool To Foster Docker Adoption

    As Docker became more of a force in our industry, engineers at ShareThis also started experimenting with it to good effect. It soon became obvious that we needed to have a working container for every app in our company just so we could simplify testing in our development environment.
    Some apps moved quickly into Docker because they were simple and had few dependencies.  For those that had small dependencies, we were able to manage using Fig (Fig was the original name of Docker Compose). Still, many of our data pipelines or interdependent apps were too gnarly to be directly dockerized. We still wanted to do it, but Docker was not enough.
    In late 2015, we were frustrated enough with our legacy infrastructure that we finally bit the bullet. We evaluated Docker’s tools, ECS, Kubernetes, and Mesosphere. It was quickly obvious that Kubernetes was in a more stable and user friendly state than its competitors for our infrastructure. As a company, we could solidify our infrastructure on Docker by simply setting the goal of having all of our infrastructure on Kubernetes.
    Engineers were skeptical at first. However, once they saw applications scale effortlessly into hundreds of instances per application, they were hooked. Now, not only were there pain points driving us forward into Docker and, by extension, Kubernetes, but there was genuine excitement for the technology pulling us in. This has allowed us to make an incredibly difficult migration fairly quickly. We now run Kubernetes in multiple regions on about 65 large VMs, increasing to over 100 in the next couple of months. Our Kubernetes cluster currently processes 800 million requests per day, with the plan to process over 2 billion requests per day in the coming months.

    Kubernetes As A Tool To Manage Containers

    Our earliest use of Docker was promising for development, but not so much so for production. The biggest friction point was the inability to manage Docker components at scale. Knowing which containers were running where, what version of a deployment was running, what state an app was in, how to manage subnets and VPCs, etc, plagued any chance of it going to production. The tooling required would have been substantial.


    When you look at Kubernetes, there are several key features that were immediately attractive:
    • It is easy to install on AWS (where all our apps were running)
    • There is a direct path from a Dockerfile to a replication controller through a yaml/json file (see the sketch after this list)
    • Pods are able to scale in number easily
    • We can easily scale the number of VMs running on AWS in a Kubernetes cluster
    • Rolling deployments and rollback are built into the tooling
    • Each pod gets monitored through health checks
    • Service endpoints are managed by the tool
    • There is an active and vibrant community
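
    For illustration, here is a hedged sketch of that Dockerfile-to-replication-controller path mentioned above: a minimal manifest for an image built from one of our Dockerfiles, fed straight to kubectl (the image name, labels, and port are hypothetical):

    $ cat my-app-rc.yaml
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: my-app
    spec:
      replicas: 3
      selector:
        app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: sharethis/my-app:1.0    # hypothetical image built from a Dockerfile
            ports:
            - containerPort: 8080
    $ kubectl create -f my-app-rc.yaml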


    Unfortunately, one of the biggest pain points was that the tooling didn’t solve our existing legacy infrastructure; it just provided an infrastructure to move onto. There were still a variety of network quirks which prevented us from directly moving our applications onto a new VPC. In addition, the reworking of so many applications required developers to jump onto problems that have classically been solved by sysadmins and operations teams.

    Kubernetes As A Tool For Onboarding Developers On Infrastructure

    When we decided to make the switch from what was essentially a Chef-run setup to Kubernetes, I do not think we understood all of the pain points that we would hit.  We ran our servers in a variety of different ways in a variety of different network configurations that were considerably different than the clean setup that you find on a fresh Kubernetes VPC.  
    In production we ran in both AWS VPCs and AWS classic across multiple regions. This means that we managed several subnets with different access controls across different applications. Our most recent applications were also very secure, having no public endpoints. This meant that we had a combination of VPC peering, network address translation (NAT), and proxies running in varied configurations.
    In the Kubernetes world, there’s only the VPC.  All the pods can theoretically talk to each other, and service endpoints are explicitly defined. It’s easy for the developer to gloss over some of the details, and it removes the need for operations (mostly).
    We made the decision to convert all of our infrastructure / DevOps developers into application developers (really!). We had already started hiring them on the basis of their development skills rather than their operational skills anyway, so perhaps that is not as wild as it sounds.
    We then made the decision to onboard our entire engineering organization onto Operations. Developers are flexible, they enjoy challenges, and they enjoy learning. It was remarkable.  After 1 month, our organization went from having a few DevOps folks, to having every engineer capable of modifying our architecture.
    The training ground for onboarding on networking, productionization, problem solving, root cause analysis, etc, was getting Kubernetes into prod at scale. After the first month, I was biting my nails and worrying about our choices. After 2 months, it looked like it might some day be viable. After 3 months, we were deploying 10 times per week. After 4 months, 40 apps per week. Only 30% of our apps have been migrated, yet the gains are not only remarkable, they are astounding. Kubernetes allowed us to go from an infrastructure-is-slowing-us-down-ugh! organization, to an infrastructure-is-speeding-us-up-yay! organization.

    Kubernetes As A Means To Unlock Continuous Integration And Delivery

    How did we get to 40+ deployments per week? Put simply, continuous integration and deployment (CI/CD) came as a byproduct of our migration. Our first application in Kubernetes was Jenkins, and every app that went in afterwards was also added to Jenkins. As we moved forward, we made Jenkins more automatic until pods were being added to and removed from Kubernetes faster than we could keep track.
    Interestingly, our problems with scaling are now about wanting to push out too many changes at once and people having to wait until their turn. Our goal is to get 100 deployments per week through the new infrastructure. This is achievable if we can continue to execute on our migration and on our commitment to a CI/CD process on Kubernetes and Jenkins.

    Next Steps

    We need to finish our migration. At this point the problems are mostly solved, the biggest difficulties are in the tedium of the task at hand. To move things out of our legacy infrastructure meant changing the network configurations to allow access to and from the Kubernetes VPC and across the regions. This is still a very real pain, and one we continue to address.  
    Some services do not play well in Kubernetes -- think stateful distributed databases. Luckily, we can usually migrate those to a 3rd party who will manage it for us. At the end of this migration, we will only be running pods on Kubernetes. Our infrastructure will become much simpler.
    All these changes do not come for free; committing our entire infrastructure to Kubernetes means that we need to have Kubernetes experts.  Our team has been unblocked in terms of infrastructure and they are busy adding business value through application development (as they should). However, we do not (yet) have committed engineers to stay up to date with changes to Kubernetes and cloud computing.  
    As such, we have transferred one engineer to a new “cloud platform team” and will hire a couple of others (have I mentioned we’re hiring!). They will be responsible for developing tools that we can use to interface well with Kubernetes and manage all of our cloud resources. In addition, they will be working in the Kubernetes source code, part of Kubernetes SIGs, and ideally, pushing code into the open source project.

    Summary

    All in all, while the move to Kubernetes initially seemed daunting, it was far less complicated and disruptive than we thought. And the reward at the other end was a company that could respond as fast as our customers wanted.



    Kubernetes Community Meeting Notes - 20160211


    February 11th - Pangaea Demo, #AWS SIG formed, release automation and documentation team introductions. 1.2 update and planning 1.3. 


    The Kubernetes contributing community meets most Thursdays at 10:00PT to discuss the project's status via videoconference. Here are the notes from the latest meeting.

    • Note taker: Rob Hirschfeld
    • Demo: Pangaea [Shahidh K Muhammed, Tanmai Gopal, and Akshaya Acharya]
      • Microservices packages
      • Focused on Application developers
      • Demo at recording +4 minutes
      • Single node kubernetes cluster — runs locally using Vagrant CoreOS image
      • Single user/system cluster allows use of DNS integration (unlike Compose)
        • prevents collisions
      • Can run locally or in cloud
      • Best contact is via the Pangaea repo
    • SIG Report
        • collecting operators and tools (inventory)
        • will set time for meeting:  recommendations?  time zone?
      • Hello world from SIG-AWS
        • running K8s on AWS
    • Release Automation and an introduction to David McMahon
      • Current is heavily documented but not automated
      • objectives
        • separate build and release
        • make it more of a software process (less manual & more repeatable)
        • target to have framework built (automation will not be ready next release)
    • Docs and k8s website redesign proposal and an introduction to John Mulhausen
      • Switching from native script munging to Jekyll (http://jekyllrb.com/docs/home/)
      • This will allow the system to build docs correctly from Github w/ minimal effort
      • Will be check-in triggered
      • Getting website style updates
      • Want to keep authoring really light
      • There will be some automated checks
      • Next week: preview of the new website during the community meeting
    • [@goltermann] 1.2 Release Watch (time +34 minutes)
      • code slush date: 2/9/2016
        • not 100% but close.  some resources still moving from beta to v1
      • no major features or refactors accepted
      • discussion about release criteria: we will hold release date for bugs
        • we’re getting accurate counts of bugs.  
        • hard to predict burn down at this point
    • Testing flake surge is over (one time event and then maintain test stability)
      • if you find a “flaky” test, then it’s a P0 to fix it.  Want to eliminate false fail test results
        • they are down by 75%!  (meaning that 75% of them have been eliminated)
    • 1.3 Planning (time +40 minutes)
      • working to cleanup the Github milestones — they should be a source of truth.  you can use Github for bug reporting
      • push off discussion while 1.2 crunch is under
      • Framework
        • dates
        • prioritization
        • feedback
      • Design Review meetings
      • General discussion about the PRD process — still in the beginning stages

      • Working on a contributor conference
      • Rob suggested tracking relationships between PRD/Mgmr authors
      • PLEASE DO REVIEWS — talked about the way people are authorized to +2 reviews.

    To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or joining the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation — February 18th, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.

    The full recording is available on YouTube in the growing archive of Kubernetes Community Meetings.

    Kubernetes Community Meeting Notes - 20160218


    February 18th - kmachine demo, clusterops SIG formed, new k8s.io website preview, 1.2 update and planning 1.3

    The Kubernetes contributing community meets most Thursdays at 10:00PT to discuss the project's status via videoconference. Here are the notes from the latest meeting.
    • Note taker: Rob Hirschfeld
    • Demo (10 min): kmachine [Sebastien Goasguen]
      • started :01 intro video
      • looking to create mirror of Docker tools for Kubernetes (similar to machine, compose, etc)
      • kmachine (forked from Docker Machine, so has the same endpoints)
        • creates a single machine w/ kubernetes endpoint setup
        • demo showing AWS & Virtual Box
    • Use Case (10 min): started at :15
      • Network isolation for different namespaces
        • Pods in namespace1 should not be able to contact Pods in namespace2
        • No native implementation at this point
        • Project Calico and Openshift have plugins that enable this
          • Calico implements this at the moment here: https://github.com/projectcalico/calico-containers/blob/master/docs/cni/kubernetes/Policy.md
        • More details from Networking SIG
    • SIG Report starter
      • Cluster Ops launch meeting Friday (doc). [Rob Hirschfeld]
    • Time Zone Discussion [:22]
      • This timezone does not work for Asia.  
      • Considering rotation - once per month
      • Likely 5 or 6 PT
      • Rob suggested moving the regular meeting up a little
    • k8s.io website preview [John Mulhausen] [:27]
      • using GitHub for docs: you can fork and do a pull request against the site
      • will be its own Kubernetes organization, but not in the code repo
      • Google will offer a “doc bounty” where you can get GCP credits for working on docs
      • Uses Jekyll to generate the site (e.g. the ToC)
      • Principle will be 100% GitHub Pages: no script trickery or plugins, just fork/clone, edit, and push
      • Hope to launch at Kubecon EU
      • Home Page Only Preview: http://kub.unitedcreations.xyz
    • 1.2 Release Watch [T.J. Goltermann] [:38]
    • 1.3 Planning update [T.J. Goltermann]
      • google 1.3 feature list to be presented 3/3
      • Community members (company or otherwise) commitments requested 3/17?
    • GSoC participation -- deadline 2/19  [Sarah Novotny]
      • if you’re interested in being a mentor, please reach out to Sarah today.
    • March 10th meeting? [Sarah Novotny]
      • March 10th is KubeCon EU.  I’m inclined to cancel the community meeting.  Thoughts?
    To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or join the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation — February 25th, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.   
    The full recording is available on YouTube in the growing archive of Kubernetes Community Meetings.


    -- Kubernetes Community

    KubeCon EU 2016: Kubernetes Community in London

    KubeCon EU 2016 is the inaugural European Kubernetes community conference that follows on the American launch in November 2015. KubeCon is fully dedicated to education and community engagement for Kubernetes enthusiasts, production users and the surrounding ecosystem.
    Come join us in London and hang out with hundreds from the Kubernetes community and experience a wide variety of deep technical expert talks and use cases.
    Don’t miss these great speaker sessions at the conference:
    • “Kubernetes Hardware Hacks: Exploring the Kubernetes API Through Knobs, Faders, and Sliders” by Ian Lewis and Brian Dorsey, Developer Advocate, Google -- http://sched.co/6Bl3
    • “rktnetes: what's new with container runtimes and Kubernetes” by Jonathan Boulle, Developer and Team Lead at CoreOS -- http://sched.co/6BY7

    Get your KubeCon EU tickets here.
    Venue Location: CodeNode - 10 South Pl, London, United Kingdom
    Accommodations: hotels
    Website: kubecon.io
    Twitter: @KubeConio #KubeCon
    Google is a proud Diamond sponsor of KubeCon EU 2016. Come to London next month, March 10th & 11th, and visit booth #13 to learn all about Kubernetes, Google Container Engine (GKE) and Google Cloud Platform!

    KubeCon is organized by KubeAcademy, LLC, a community-driven group of developers focused on the education of developers and the promotion of Kubernetes.

    -- Sarah Novotny, Kubernetes Community Manager, Google


    Kubernetes Community Meeting Notes - 20160225


    February 25th - Redspread demo, 1.2 update and planning 1.3, newbie introductions, SIG-networking and a shout out to CoreOS blog post.

    The Kubernetes contributing community meets most Thursdays at 10:00PT to discuss the project's status via videoconference. Here are the notes from the latest meeting.
    • Note taker: [Ilan Rabinovich]
    • Quick call out for sharing presentations/slides [JBeda]
    • Demo (10 min):Redspread [Mackenzie Burnett, Dan Gillespie]
      • Just open sourced
      • YC company
      • Streamline tool to build/push/deploy to k8s in one command
      • Looking ahead to offline development of k8s clusters.
      • Client only
        • reuses a lot of kubectl code
        • Convention on directory structure
      • Roadmap
        • Linking between Kube objects as an app definition.
        • Parameterization and versioning of configs (eg git for k8s)
      • Happily welcoming contributors and user feedback via github (https://github.com/redspread/spread or firstname@redspread.com)
      • Q/A
        • Brian Grant asked for feedback on how to reorganize the kubectl code base to make projects like redspread easier.
    • 1.2 Release Watch [T.J. Goltermann]
        • currently about 80 issues in the queue that need to be addressed before branching.
        • currently looks like March 7th may slip to later in the week, but up in the air until flaky tests are resolved.
        • non-1.2 changes may be delayed in review/merging until 1.2 stabilization work completes.
      • 1.3 release planning
        • March 17th meeting will discuss features from community members and Google. Bring your notes/plans to that meeting
    • Newbie Introductions
    • SIG Reports -
      • Networking [Tim Hockin]
        • Meets later today.
        • Working on a (proto)-specification for a 3rd party resource to describe network policies. (eg pod x can talk to service y, or frontends can/cannot talk to backends)
          • Currently ironing out naming/structure
          • Calico has a running demo that shows Calico project enforcing network policy based on an earlier form of the spec.
          • Shouldn’t require any changes to the 1.2 code
          • Goal is to submit it as part of the 1.3 cycle.
      • Scale [Bob Wise]
        • CoreOS Blog post on scheduler scaling- https://coreos.com/blog/improving-kubernetes-scheduler-performance.html
      • Cluster Ops [Rob Hirschfeld]
        • meeting last Friday went very well.  Discussed charter AND a working deployment
        • moved meeting to Thursdays @ 1 (so in 3 hours!)
        • Rob is posting a Cluster Ops announce on TheNewStack to recruit more members
    • GSoC participation -- no application submitted.  [Sarah Novotny]
    • Brian Grant has offered to review PRs that need attention for 1.2
    • Dynamic Provisioning
      • Currently overlaps a bit with the ubernetes work
      • PR in progress.
      • Should work in 1.2, but being targeted more in 1.3
    • Next meeting is March 3rd.
      • Demo from Weave on Kubernetes Anywhere
      • Another Kubernetes 1.2 update
      • Update from CNCF
      • 1.3 commitments from google
    • No meeting on March 10th.  
    To get involved in the Kubernetes community consider joining our Slack channel, taking a look at the Kubernetes project on GitHub, or join the Kubernetes-dev Google group. If you’re really excited, you can do all of the above and join us for the next community conversation — March 3rd, 2016. Please add yourself or a topic you want to know about to the agenda and get a calendar invitation by joining this group.   
    The full recording is available on YouTube in the growing archive of Kubernetes Community Meetings. -- Kubernetes Community

    State of the Container World, February 2016



    Hello, and welcome to the second installment of the Kubernetes state of the container world survey. At the beginning of February we sent out a survey about people’s usage of containers, and wrote about the results from the January survey. Here we are again. As before, while we try to reach a large and representative set of respondents, this survey was publicized across the social media accounts of myself and others on the Kubernetes team, so I expect some pro-container and pro-Kubernetes bias in the data. We continue to try to reach as large an audience as possible, and in that vein, please go take the March survey and share it with your friends and followers everywhere! Without further ado, the numbers...

    Containers continue to gain ground

    In January, 71% of respondents were currently using containers; in February, 89% were. The percentage of users not even considering containers also shrank, from 4% in January to a surprising 0% in February. We’ll see if that holds in March. Likewise, the usage of containers continued its march across the dev/canary/prod lifecycle. Container usage increased in all parts of the lifecycle:

    • Development: 80% -> 88%
    • Test: 67% -> 72%
    • Pre production: 41% -> 55%
    • Production: 50% -> 62%
    What is striking here is that pre-production growth continued even as workloads clearly transitioned into true production. Likewise, the share of people considering containers for production rose from 78% in January to 82% in February. Again, we’ll see if the trend continues into March.

    Container and cluster sizes

    We asked some new questions in the survey too, around container and cluster sizes, and there were some interesting numbers:

    How many containers are you running?
    [Chart: distribution of how many containers respondents are running]
    How many machines are you running containers on?


    [Chart: distribution of how many machines respondents run containers on]
    So while container usage continues to grow, the size and scope remain quite modest, with more than 50% of users running fewer than 50 containers on fewer than 10 machines.

    Things stay the same

    Across the orchestration space, things seemed pretty consistent between January and February. Kubernetes is quite popular with folks (54% -> 57%), though again, please see the note at the top about the likely bias in our survey population. Shell scripts are likewise quite popular and holding steady. You all certainly love your Bash (don’t worry, we do too ;)
    Likewise, people continue to use cloud services, both in raw IaaS form (10% on GCE, 30% on EC2, 2% on Azure) and as cloud container services (16% on GKE, 11% on ECS, 1% on ACS). Though the most popular deployment target by far remains your laptop/desktop, at ~53%.

    Raw data

    As always, the complete raw data is available in a spreadsheet here.

    Conclusions

    Containers continue to gain in popularity and usage. The world of orchestration is somewhat stabilizing, and cloud services continue to be a common place to run containers, though your laptop is even more popular.

    And if you are just getting started with containers (or looking to move beyond your laptop) please visit us at kubernetes.io and Google Container Engine. ‘Till next month, please get your friends, relatives and co-workers to take our March survey!



    Thanks!

    -- Brendan Burns, Software Engineer, Google

    Kubernetes in the Enterprise with Fujitsu’s Cloud Load Control

    Today’s guest post is by Florian Walker, Product Manager at Fujitsu and working on Cloud Load Control, an offering focused on the usage of Kubernetes in an enterprise context. Florian tells us what potential Fujitsu sees in Kubernetes, and how they make it accessible to enterprises.
    Earlier this year, Fujitsu released its Kubernetes-based offering, Fujitsu ServerView Cloud Load Control (CLC), to the public. Some might be surprised, since Fujitsu’s reputation is not necessarily related to software development but rather to hardware manufacturing and IT services. As a long-time member of the Linux Foundation and a founding member of the Open Container Initiative and the Cloud Native Computing Foundation, Fujitsu not only builds software but is committed to open source, and contributes to several projects, including Kubernetes. We not only believe in Kubernetes as an open source project; we also chose it as the core of our offering because it provides the best balance of feature set, resource requirements and complexity to run distributed applications at scale.
    Today, we want to take you on a short tour explaining the background of our offering, why we think Kubernetes is the right fit for your customers and what value Cloud Load Control provides on top of it.
    A long long time ago…
    In mid-2014 we looked at the challenges enterprises face in the context of digitization, where traditional enterprises find more and more competitors from the IT sector pushing into the core of their markets. A big part of Fujitsu’s customers are such traditional businesses, so we considered how we could help them and came up with three basic principles:
    • Decouple applications from infrastructure - Focus on where the value for the customer is: the application.
    • Decompose applications - Build applications from smaller, loosely coupled parts. Enable reconfiguration of those parts depending on the needs of the business. Also encourage innovation by low-cost experiments.
    • Automate everything - Fight the increasing complexity of the first two points by introducing a high degree of automation.
    We found that Linux containers themselves cover the first point and touch the second. But at that time there was little support for creating distributed applications and running them under automatic management. We found Kubernetes to be the missing piece.
    Not a free lunch
    The general approach of Kubernetes in managing containerized workloads is convincing, but as we looked at it with the eyes of customers, we realized that it’s not a free lunch. Many customers are medium-sized companies whose core business is often bound by strict data-protection regulations. The top three requirements we identified are:
    • On-premise deployments (with the option for hybrid scenarios)
    • Efficient operations as part of a (much) bigger IT infrastructure
    • Enterprise-grade support, potentially on global scale
    We created Cloud Load Control with these requirements in mind. It is basically a distribution of Kubernetes targeted for on-premise use, primarily focusing on operational aspects of container infrastructure. We are committed to work with the community, and contribute all relevant changes and extensions upstream to the Kubernetes project.
    On-premise deployments
    As Kubernetes core developer Tim Hockin often puts it in his talks, Kubernetes is “a story with two parts”, and setting up a Kubernetes cluster is not the easy part: it is often challenging due to variations in infrastructure. This is particularly true when it comes to production-ready deployments of Kubernetes. In the public cloud space, a customer could choose a service like Google Container Engine (GKE) to do this job. Since customers have fewer options on-premise, they often have to handle the deployment themselves.
    Cloud Load Control addresses these issues. It enables customers to reliably and readily provision production-grade Kubernetes clusters on their own infrastructure, with the following benefits:
    • Proven setup process, lowers risk of problems while setting up the cluster
    • Reduction of provisioning time to minutes
    • Repeatable process, relevant especially for large, multi-tenant environments
    Cloud Load Control delivers these benefits for a range of platforms, starting from selected OpenStack distributions in the first versions of Cloud Load Control, and successively adding more platforms depending on customer demand.  We are especially excited about the option to remove the virtualization layer and support Kubernetes bare-metal on Fujitsu servers in the long run. By removing a layer of complexity, the total cost to run the system would be decreased and the missing hypervisor would increase performance.
    Right now we are in the process of contributing a generic provider to set up Kubernetes on OpenStack. As a next step in driving multi-platform support, Docker-based deployment of Kubernetes seems to be crucial. We plan to contribute to this feature to ensure it is going to be Beta in Kubernetes 1.3.
    Efficient operations
    Reducing operation costs is the target of any organization providing IT infrastructure. This can be achieved by increasing the efficiency of operations and helping operators to get their job done. Considering large-scale container infrastructures, we found it is important to differentiate between two types of operations:
    • Platform-oriented, relates to the overall infrastructure, often including various systems, one of which might be Kubernetes.
    • Application-oriented, focuses rather on a single application, or a small set of applications, deployed on Kubernetes.
    Kubernetes is already great for the application-oriented part. Cloud Load Control was created to help platform-oriented operators to efficiently manage Kubernetes as part of the overall infrastructure and make it easy to execute Kubernetes tasks relevant to them.
    The first version of Cloud Load Control provides a user interface integrated in the OpenStack Horizon dashboard which enables the Platform ops to create and manage their Kubernetes clusters.
    Clusters are treated as first-class citizens of OpenStack. Their creation is as simple as the creation of a virtual machine. Operators do not need to learn a new system or method of provisioning, and the self-service approach enables large organizations to rapidly provide the Kubernetes infrastructure to their tenants.
    An intuitive UI is crucial for the simplification of operations. This is why we heavily contributed to the Kubernetes Dashboard project and ship it in Cloud Load Control. Especially for operators who don’t know the Kubernetes CLI by heart, because they have to care about other systems too, a great UI is perfectly suited to getting typical operational tasks done, such as checking the health of the system or deploying a new application.
    Monitoring is essential. With the dashboard, it is possible to get insights at the cluster level. To ensure that OpenStack operators have a deep understanding of their platform, we will soon add an integration with Monasca, OpenStack’s monitoring-as-a-service project, so Kubernetes metrics can be analyzed together with OpenStack metrics from a single point of access.
    Quality and enterprise-grade support
    As a Japanese company, quality and customer focus have the highest priority in every product and service we ship. This is where the actual value of Cloud Load Control comes from: it provides a specific version of the open source software which has been intensively tested and hardened to ensure stable operations on a particular set of platforms.
    Acknowledging that container technology and Kubernetes is new territory for a lot of enterprises, expert assistance is the key for setting up and running a production-grade container infrastructure. Cloud Load Control comes with a support service leveraging Fujitsu’s proven support structure. This enables support also for customers operating Kubernetes in different regions of the world, like Europe and Japan, as part of the same offering.
    Conclusion
    2014 seems light years away, but we believe the decision for Kubernetes was the right one. It is built from the ground up to enable the creation of container-based, distributed applications, and it best supports this use case.
    With Cloud Load Control, we’re excited to enable enterprises to run Kubernetes in production environments and to help their operators to efficiently use it, so DevOps teams can build awesome applications on top of it.

    -- Florian Walker, Product Manager, FUJITSU

    ElasticBox introduces ElasticKube to help manage Kubernetes within the enterprise

    Today’s guest post is brought to you by Brannan Matherson, from ElasticBox, who’ll discuss a new open source project to help standardize container deployment and management in enterprise environments. This highlights the advantages of authentication and user management for containerized applications.
    I’m delighted to share some exciting work that we’re doing at ElasticBox to contribute to the open source community regarding the rapidly changing advancements in container technologies.  Our team is kicking off a new initiative called ElasticKube to help solve the problem of challenging container management scenarios within the enterprise.  This project is a native container management experience that is specific to Kubernetes and leverages automation to provision clusters for containerized applications based on the latest release of Kubernetes 1.2.  
    I’ve talked to many enterprise companies, both large and small, and the plethora of cloud offerings is often confusing and makes the evaluation process very difficult. So why Kubernetes?  Of the large public cloud players - Amazon Web Services, Microsoft Azure and Google Cloud Platform - Kubernetes is poised to take an innovative leadership role in framing the container management space. The Kubernetes platform does not restrict or dictate any given technical approach for containers, but encourages the community to collectively solve problems as this container market still takes form.  With a proven track record of supporting open source efforts, the Kubernetes platform allows my team and me to actively contribute to this fundamental shift in the IT and developer world.
    We’ve chosen Kubernetes not just for the core infrastructure services, but also for the agility of Kubernetes in leveraging the cluster management layer across any cloud environment - GCP, AWS, Azure, vSphere, and Rackspace.  Kubernetes also provides a huge benefit in letting users run clusters for containers locally on many popular technologies such as Docker, Vagrant (and VirtualBox), CoreOS, Mesos and more.  This amount of choice enables our team and many others in the community to consider solutions that will be viable for a wide range of enterprise scenarios.  In the case of ElasticKube, we’re pleased with Kubernetes 1.2, which includes the full release of the Deployment API.  This gives us the ability to perform seamless rolling updates of containerized applications that are running in production.  In addition, we’ve been able to support new resource types like ConfigMaps and Horizontal Pod Autoscalers.
    Fundamentally, ElasticKube delivers a web console that complements Kubernetes for users managing their clusters. The initial experience incorporates team collaboration, lifecycle management and reporting, so organizations can efficiently manage resources in a predictable manner.  Users will see an ElasticKube portal that takes advantage of the infrastructure abstraction, enabling them to run a container that has already been built.  With ElasticKube assuming the cluster has been deployed, the overwhelming value is providing visibility into who did what and defining permissions for access to the cluster with multiple containers running on it.  Second, by partitioning clusters into namespaces, authorization management is more effective.  Finally, by empowering users to build a set of reusable templates in a modern portal, ElasticKube provides a vehicle for delivering a self-service template catalog that can be stored in GitHub (for instance, using Helm templates) and deployed easily.
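    For instance, the namespace partitioning mentioned above maps directly onto a core Kubernetes primitive; a minimal sketch (the namespace name is invented for illustration):

    $ cat <<EOF | kubectl create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
    EOF
    $ kubectl --namespace=team-a get pods   # each team sees and manages only its own namespace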
    ElasticKube enables organizations to accelerate adoption by developers, application operations and traditional IT operations teams and shares a mutual goal of increasing developer productivity, driving efficiency in container management and promoting the use of microservices as a modern application delivery methodology.   When leveraging ElasticKube in your environment, users need to ensure the following technologies are configured appropriately to guarantee everything runs correctly:
    • Configure Google Container Engine (GKE) for cluster installation and management
    • Use Kubernetes to provision the infrastructure and clusters for containers  
    • Use your existing tools of choice to actually build your containers
    • Use ElasticKube to run, deploy and manage your containers and services

    Getting Started with Kubernetes and ElasticKube


    (a 3-minute walkthrough video covering the following topics)
    1. Deploy ElasticKube to a Kubernetes cluster
    2. Configuration
    3. Admin: Setup and invite a user
    4. Deploy an instance

    Hear What Others are Saying
    “Kubernetes has provided us the level of sophistication required for enterprises to manage containers across complex networking environments and the appropriate amount of visibility into the application lifecycle.  Additionally, the community commitment and engagement has been exceptional, and we look forward to being a major contributor to this next wave of modern cloud computing and application management.”  
    ~Alberto Arias Maestro, Co-founder and Chief Technology Officer, ElasticBox

    -- Brannan Matherson, Head of Product Marketing, ElasticBox

    Kubernetes 1.2: Even more performance upgrades, plus easier application deployment and management

    Today we released Kubernetes 1.2. This release represents significant improvements for large organizations building distributed systems. Now with over 680 unique contributors to the project, this release represents our largest yet.

    From the beginning, our mission has been to make building distributed systems easy and accessible for all. With the Kubernetes 1.2 release we’ve made strides towards our goal by increasing scale, decreasing latency and overall simplifying the way applications are deployed and managed. Now, developers at organizations of all sizes can build production scale apps more easily than ever before. 

    What’s new: 

    • Significant scale improvements. Increased cluster scale by 400% to 1,000 nodes and 30,000 containers per cluster.
    • Simplified application deployment and management
      • Dynamic Configuration (via the ConfigMap API) enables applications to pull their configuration at run time rather than packaging it in at build time (see the sketch after this list). 
      • Turnkey Deployments (via the Beta Deployment API) let you declare your application and Kubernetes will do the rest. It handles versioning, multiple simultaneous rollouts, aggregating status across all pods, maintaining application availability and rollback. 
    • Automated cluster management:
      • Improved reliability through cross-zone failover and multi-zone scheduling
      • Simplified One-Pod-Per-Node Applications (via the Beta DaemonSet API) allows you to schedule a service (such as a logging agent) that runs one, and only one, pod per node. 
      • TLS and L7 support (via the Beta Ingress API) provides a straightforward way to integrate into custom networking environments by supporting TLS for secure communication and L7 for http-based traffic routing. 
      • Graceful Node Shutdown (aka Node Drain) takes care of transitioning pods off a node and allowing it to be shut down cleanly. 
      • Custom Metrics for Autoscaling allows you to specify a set of signals on which to autoscale pods. 
    • New GUI allows you to get started quickly and enables the same functionality found in the CLI for a more approachable and discoverable interface.
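    To make the first two of those concrete, here is a minimal kubectl sketch (the image name and flag values are hypothetical, not taken from the release notes):

    $ kubectl create configmap app-config --from-literal=log.level=info   # dynamic configuration: keep config out of the image
    $ kubectl run my-app --image=example/my-app:v1 --replicas=3           # in 1.2, kubectl run creates a Deployment
    $ kubectl rollout status deployment/my-app                            # watch the rollout converge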

    Community 

    All these improvements would not be possible without our enthusiastic and global community. The momentum is astounding. We’re seeing over 400 pull requests per week, a 50% increase since the previous 1.1 release. There are meetups and conferences discussing Kubernetes nearly every day, on top of the 85 Kubernetes-related meetup groups around the world. We’ve also seen significant participation in the community in the form of Special Interest Groups, with 18 active SIGs that cover topics from AWS and OpenStack to big data and scalability; to get involved, join or start a new SIG. Lastly, we’re proud that Kubernetes is the first project to be accepted to the Cloud Native Computing Foundation (CNCF); read more about the announcement here. 

    Documentation 

    With Kubernetes 1.2 comes a relaunch of our website at kubernetes.io. We’ve slimmed down the docs contribution process so that all you have to do is fork/clone and send a PR. And the site works the same whether you’re staging it on your laptop, on github.io, or viewing it in production. It’s a pure GitHub Pages project; no scripts, no plugins. 

    From now on, our docs are at a new repo: https://github.com/kubernetes/kubernetes.github.io

    To entice you even further to contribute, we’re also announcing our new bounty program. For every “bounty bug” you address with a merged pull request, we offer the listed amount in credit for Google Cloud Platform services. Just look for bugs labeled “Bounty” in the new repo for more details. 

    Roadmap 

    All of our work is done in the open; to learn the latest about the project, join the weekly community meeting or watch a recorded hangout. In keeping with our major release schedule of every three to four months, here are just a few items that are in development for the next release and beyond:
    • Improved stateful application support (aka Pet Set) 
    • Cluster Federation (aka Ubernetes) 
    • Even more (more!) performance improvements 
    • In-cluster IAM 
    • Cluster autoscaling 
    • Scheduled job 
    • Public dashboard that allows for nightly test runs across multiple cloud providers 
    • Lots, lots more! 
    Kubernetes 1.2 is available for download at get.k8s.io and via the open source repository hosted on GitHub. To get started with Kubernetes try our new Hello World app

    Connect 

    We’d love to hear from you and see you participate in this growing community: 
    • Get involved with the Kubernetes project on GitHub 
    • Post questions (or answer questions) on Stackoverflow 
    •  Connect with the community on Slack 
    • Follow us on Twitter @Kubernetesio for latest updates 
    Thank you for your support! 

     - David Aronchick, Senior Product Manager for Kubernetes, Google

    Scaling neural network image classification using Kubernetes with TensorFlow Serving

    In 2011, Google developed an internal deep learning infrastructure called DistBelief, which allowed Googlers to build ever larger neural networks and scale training to thousands of cores. Late last year, Google introduced TensorFlow, its second-generation machine learning system. TensorFlow is general, flexible, portable, easy-to-use and, most importantly, developed with the open source community.

    The process of introducing machine learning into your product involves creating and training a model on your dataset, and then pushing the model to production to serve requests. In this blog post, we’ll show you how you can use Kubernetes with TensorFlow Serving, a high performance, open source serving system for machine learning models, to meet the scaling demands of your application.

    Let’s use image classification as an example. Suppose your application needs to be able to correctly identify an image across a set of categories. For example, given the cute puppy image below, your system should classify it as a retriever.
    Image via Wikipedia
    You can implement image classification with TensorFlow using the Inception-v3 model trained on the data from the ImageNet dataset. This dataset contains images and their labels, which allows the TensorFlow learner to train a model that can be used by your application in production.
    Once the model is trained and exported, TensorFlow Serving uses the model to perform inference: predictions based on new data presented by its clients. In our example, clients submit image classification requests over gRPC, a high performance, open source RPC framework from Google.

    Inference can be very resource intensive. Our server executes the following TensorFlow graph to process every classification request it receives. The Inception-v3 model has over 27 million parameters and runs 5.7 billion floating point operations per inference.
    Schematic diagram of Inception-v3
    Fortunately, this is where Kubernetes can help us. Kubernetes distributes inference request processing across a cluster using its External Load Balancer. Each pod in the cluster contains a TensorFlow Serving Docker image with the TensorFlow Serving-based gRPC server and a trained Inception-v3 model. The model is represented as a set of files describing the shape of the TensorFlow graph, model weights, assets, and so on. Since everything is neatly packaged together, we can dynamically scale the number of replicated pods using the Kubernetes Replication Controller to keep up with the service demands.
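    A rough kubectl sketch of that topology (the resource and image names are hypothetical; the step-by-step tutorial below has the real manifests):

    $ kubectl create -f inception-rc.yaml                            # Replication Controller running the TensorFlow Serving gRPC server
    $ kubectl expose rc inception --port=9000 --type=LoadBalancer    # put an external load balancer in front of the pods
    $ kubectl scale rc inception --replicas=5                        # scale out as classification traffic grows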

    To help you try this out yourself, we’ve written a step-by-step tutorial, which shows you how to create the TensorFlow Serving Docker container to serve the Inception-v3 image classification model, configure a Kubernetes cluster and run classification requests against it. We hope this will make it easier for you to integrate machine learning into your own applications and scale it with Kubernetes! To learn more about TensorFlow Serving, check out tensorflow.github.io/serving

    - Fangwei Li, Software Engineer, Google

    Five Days of Kubernetes 1.2

    The Kubernetes project has had some huge milestones over the past few weeks. We released Kubernetes 1.2, had our first conference in Europe, and were accepted into the Cloud Native Computing Foundation. While we catch our breath, we would like to take a moment to highlight some of the great work contributed by the community since our last milestone, just four months ago.

    Our mission is to make building distributed systems easy and accessible for all. While Kubernetes 1.2 has LOTS of new features, there are a few that really highlight the strides we’re making towards that goal. Over the course of the next week, we’ll be publishing a series of in-depth posts covering what’s new, so come back daily this week to read about the new features that continue to make Kubernetes the easiest way to run containers at scale. Thanks, and stay tuned!

    You can follow us on twitter here @Kubernetesio

    --David Aronchick, Senior Product Manager for Kubernetes, Google


    1000 nodes and beyond: updates to Kubernetes performance and scalability in 1.2

    Editor's note: this is the first in a series of in-depth posts on what's new in Kubernetes 1.2

    We're proud to announce that with the release of 1.2, Kubernetes now supports 1000-node clusters, with a reduction of 80% in 99th percentile tail latency for most API operations. This means that in just six months we've increased our overall scale by 10 times while maintaining a great user experience: the 99th percentile pod startup time is less than 3 seconds, and the 99th percentile latency of most API operations is tens of milliseconds (the exception being LIST operations, which take hundreds of milliseconds in very large clusters).

    Words are fine, but nothing speaks louder than a demo. Check this out!


    In the above video, you saw the cluster scale up to 10 million queries per second (QPS) over 1,000 nodes, including a rolling update, with zero downtime and no impact on tail latency. That’s big enough to be one of the top 100 sites on the Internet!

    In this blog post, we’ll cover the work we did to achieve this result, and discuss some of our future plans for scaling even higher.

    Methodology 

    We benchmark Kubernetes scalability against the following Service Level Objectives (SLOs):
    1. API responsiveness [1]: 99% of all API calls return in less than 1s 
    2. Pod startup time: 99% of pods and their containers (with pre-pulled images) start within 5s. 
    We say Kubernetes scales to a certain number of nodes only if both of these SLOs are met. We continuously collect and report the measurements described above as part of the project test framework. This battery of tests breaks down into two parts: API responsiveness and Pod Startup Time.

    API responsiveness for user-level abstractions [2] 

    Kubernetes offers high-level abstractions for users to represent their applications. For example, the ReplicationController is an abstraction representing a collection of pods. Listing all ReplicationControllers or listing all pods from a given ReplicationController is a very common use case. On the other hand, there is little reason someone would want to list all pods in the system: for example, 30,000 pods (1,000 nodes with 30 pods per node) represent ~150MB of data (~5kB/pod * 30k pods). So this test uses ReplicationControllers.

    For this test (assuming N to be number of nodes in the cluster), we:
    1. Create roughly 3xN ReplicationControllers of different sizes (5, 30 and 250 replicas), which altogether have 30xN replicas. We spread their creation over time (i.e. we don’t start all of them at once) and wait until all of them are running. 

    2. Perform a few operations on every ReplicationController (scale it, list all its instances, etc.), spreading those over time, and measuring the latency of each operation. This is similar to what a real user might do in the course of normal cluster operation. 

    3. Stop and delete all ReplicationControllers in the system. 
    For the results of this test, see the “Metrics from Kubernetes 1.2” section below.
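    For a rough client-side feel for the operations this test exercises (this is not the test itself, which lives in test/e2e/load.go; the resource names are hypothetical):

    $ time kubectl get rc                          # LIST all ReplicationControllers
    $ time kubectl get pods -l name=my-rc          # LIST the pods of a single controller
    $ time kubectl scale rc my-rc --replicas=10    # a typical mutating operation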

    For the v1.3 release, we plan to extend this test by also creating Services, Deployments, DaemonSets, and other API objects.

    Pod startup end-to-end latency [3] 

    Users are also very interested in how long it takes Kubernetes to schedule and start a pod. This is true not only upon initial creation, but also when a ReplicationController needs to create a replacement pod to take over from one whose node failed.

    We (assuming N to be the number of nodes in the cluster):
    1. Create a single ReplicationController with 30xN replicas and wait until all of them are running. We are also running high-density tests, with 100xN replicas, but with fewer nodes in the cluster. 

    2. Launch a series of single-pod ReplicationControllers - one every 200ms. For each, we measure “total end-to-end startup time” (defined below). 

    3. Stop and delete all pods and replication controllers in the system. 
    We define “total end-to-end startup time” as the time from the moment the client sends the API server a request to create a ReplicationController, to the moment when “running & ready” pod status is returned to the client via watch. That means that “pod startup time” includes the ReplicationController being created and in turn creating a pod, scheduler scheduling that pod, Kubernetes setting up intra-pod networking, starting containers, waiting until the pod is successfully responding to health-checks, and then finally waiting until the pod has reported its status back to the API server and then API server reported it via watch to the client.

    While we could have decreased the “pod startup time” substantially (for example, by excluding the wait for the report via watch, or by creating pods directly rather than through ReplicationControllers), we believe that a broad definition that maps to the most realistic use cases is best for real users to understand the performance they can expect from the system.
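    A crude client-side approximation of this measurement (the real implementation is test/e2e/density.go; the file and label names here are hypothetical):

    $ date +%s.%N                                  # note the start time
    $ kubectl create -f single-pod-rc.yaml         # a one-replica ReplicationController
    $ kubectl get pods -l name=startup-probe -w    # watch until Running and Ready, then note the end time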

    Metrics from Kubernetes 1.2 


    So what was the result? We run our tests on Google Compute Engine, setting the size of the master VM based on the size of the Kubernetes cluster. In particular, for 1000-node clusters we use an n1-standard-32 VM for the master (32 cores, 120GB RAM) [4].

    API responsiveness 

    The following two charts present a comparison of 99th percentile API call latencies for the Kubernetes 1.2 release and the 1.0 release on 100-node clusters. (Smaller bars are better)

    We present results for LIST operations separately, since these latencies are significantly higher. Note that we slightly modified our tests in the meantime, so running current tests against v1.0 would result in higher latencies than they used to.


    We also ran these tests against 1000-node clusters. Note: we did not support clusters larger than 100 nodes on GKE, so we have no GKE metrics to compare these results to. However, customers have reported running on 1,000+ node clusters since Kubernetes 1.0.



    Since LIST operations are significantly larger, we again present them separately: all latencies, in both cluster sizes, are well within our 1-second SLO.


    Pod startup end-to-end latency 

    The results for “pod startup latency” (as defined in the “Pod startup end-to-end latency” section above) are presented in the following graph. For reference, the first part of the graph also shows results from v1.0 on 100-node clusters.



    As you can see, we substantially reduced tail latency in 100-node clusters, and now deliver low pod startup latency up to the largest cluster sizes we have measured. It is noteworthy that the metrics for 1000-node clusters, for both API latency and pod startup latency, are generally better than those reported for 100-node clusters just six months ago!

    How did we make these improvements? 


    To make these significant gains in scale and performance over the past six months, we made a number of improvements across the whole system. Some of the most important ones are listed below.

    • Created a “read cache” at the API server level 
      (https://github.com/kubernetes/kubernetes/issues/15945 )

      Since most Kubernetes control logic operates on an ordered, consistent snapshot kept up-to-date by etcd watches (via the API server), a slight delay in the arrival of that data has no impact on the correct operation of the cluster. These independent controller loops, distributed by design for extensibility of the system, are happy to trade a bit of latency for an increase in overall throughput.

      In Kubernetes 1.2 we exploited this fact to improve performance and scalability by adding an API server read cache. With this change, the API server’s clients can read data from an in-memory cache in the API server instead of reading it from etcd. The cache is updated directly from etcd via watch in the background. Those clients that can tolerate latency in retrieving data (usually the lag of cache is on the order of tens of milliseconds) can be served entirely from cache, reducing the load on etcd and increasing the throughput of the server. This is a continuation of an optimization begun in v1.1, where we added support for serving watch directly from the API server instead of etcd: https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/apiserver-watch.md

    • Thanks to contributions from Wojciech Tyczynski at Google and Clayton Coleman and Timothy St. Clair at Red Hat, we were able to join careful system design with the unique advantages of etcd to improve the scalability and performance of Kubernetes. 
    • Introduce a “Pod Lifecycle Event Generator” (PLEG) in the Kubelet (https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/pod-lifecycle-event-generator.md

      Kubernetes 1.2 also improved density from a pods-per-node perspective: for v1.2 we test and advertise up to 100 pods on a single node (vs. 30 pods in the 1.1 release). This improvement was possible because of diligent work by the Kubernetes community through an implementation of the Pod Lifecycle Event Generator (PLEG).

      The Kubelet (the Kubernetes node agent) has a worker thread per pod which is responsible for managing the pod’s lifecycle. In earlier releases each worker would periodically poll the underlying container runtime (Docker) to detect state changes, and perform any necessary actions to ensure the node’s state matched the desired state (e.g. by starting and stopping containers). As pod density increased, concurrent polling from each worker would overwhelm the Docker runtime, leading to serious reliability and performance issues (including additional CPU utilization which was one of the limiting factors for scaling up).

      To address this problem we introduced a new Kubelet subcomponent, the PLEG, to centralize state change detection and generate lifecycle events for the workers. With concurrent polling eliminated, we were able to lower the steady-state CPU usage of the Kubelet and the container runtime by 4x. This also allowed us to adopt a shorter polling period, so as to detect and react to changes more quickly.

    • Improved scheduler throughput

      Kubernetes community members from CoreOS (Hongchao Deng and Xiang Li) helped to dive deep into the Kubernetes scheduler and dramatically improve throughput without sacrificing accuracy or flexibility. They cut the total time to schedule 30,000 pods by nearly 1400%! You can read a great blog post on how they approached the problem here: https://coreos.com/blog/improving-kubernetes-scheduler-performance.html 

    • A more efficient JSON parser

      Go’s standard library includes a flexible and easy-to-use JSON parser that can encode and decode any Go struct using the reflection API. But that flexibility comes with a cost: reflection allocates lots of small objects that have to be tracked and garbage collected by the runtime. Our profiling bore that out, showing that a large chunk of both client and server time was spent in serialization. Given that our types don’t change frequently, we suspected that a significant amount of reflection could be bypassed through code generation.

      After surveying the Go JSON landscape and conducting some initial tests, we found the ugorji codec library offered the most significant speedups - a 200% improvement in encoding and decoding JSON when using generated serializers, with a significant reduction in object allocations. After contributing fixes to the upstream library to deal with some of our complex structures, we switched Kubernetes and the go-etcd client library over. Along with some other important optimizations in the layers above and below JSON, we were able to slash the cost in CPU time of almost all API operations, especially reads. 
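      As an illustration of the code-generation approach (not necessarily the exact invocation used in the Kubernetes build; the codecgen tool ships with the ugorji/go library, and the flags here are assumed from its documentation):

      $ go get github.com/ugorji/go/codec/codecgen
      $ codecgen -o types.generated.go types.go    # emit non-reflective encoders/decoders for the types in types.go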

    Kubernetes 1.3 and Beyond 

    Of course, our job is not finished. We will continue to invest in improving Kubernetes performance, as we would like it to scale to many thousands of nodes, just like Google’s Borg. Thanks to our investment in testing infrastructure and our focus on how teams use containers in production, we have already identified the next steps on our path to improving scale. 

    On deck for Kubernetes 1.3: 
    1.  Our main bottleneck is still the API server, which spends the majority of its time just marshaling and unmarshaling JSON objects. We plan to add support for protocol buffers to the API as an optional path for inter-component communication and for storing objects in etcd. Users will still be able to use JSON to communicate with the API server, but since the majority of Kubernetes communication is intra-cluster (API server to node, scheduler to API server, etc.) we expect a significant reduction in CPU and memory usage on the master. 

    2.  Kubernetes uses labels to identify sets of objects. For example, identifying which pods belong to a given ReplicationController requires iterating over all pods in a namespace and choosing those that match the controller’s label selector. The addition of an efficient indexer for labels that can take advantage of the existing API object cache will make it possible to quickly find the objects that match a label selector, making this common operation much faster (a quick illustration of selector-based listing follows this list). 

    3. Scheduling decisions are based on a number of different factors, including spreading pods based on requested resources, spreading pods with the same selectors (e.g. from the same Service, ReplicationController, Job, etc.), presence of needed container images on the node, etc. Those calculations, in particular selector spreading, have many opportunities for improvement: see https://github.com/kubernetes/kubernetes/issues/22262 for just one suggested change. 

    4. We are also excited about the upcoming etcd v3.0 release, which was designed with the Kubernetes use case in mind: it will both improve performance and introduce new features. Contributors from CoreOS have already begun laying the groundwork for moving Kubernetes to etcd v3.0 (see https://github.com/kubernetes/kubernetes/pull/22604). 
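    As a concrete example of the selector-based listing that item 2 would speed up (label values are hypothetical), today the server answers a query like this by filtering over every pod in the namespace:

    $ kubectl get pods -l app=frontend,tier=web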
    While this list does not capture all the efforts around performance, we are optimistic we will achieve as big a performance gain as we saw going from Kubernetes 1.0 to 1.2. 


    Conclusion 

    In the last six months we’ve significantly improved Kubernetes scalability, allowing v1.2 to run 1000-node clusters with the same excellent responsiveness (as measured by our SLOs) as we were previously achieving only on much smaller clusters. But that isn’t enough  we want to push Kubernetes even further and faster. Kubernetes v1.3 will improve the system’s scalability and responsiveness further, while continuing to add features that make it easier to build and run the most demanding container-based applications. 

    Please join our community and help us build the future of Kubernetes! There are many ways to participate. If you’re particularly interested in scalability, you’ll be interested in: 
     And of course for more information about the project in general, go to www.kubernetes.io

    - Wojciech Tyczynski, Software Engineer, Google


    [1] We exclude operations on “events” since these are more like system logs and are not required for the system to operate properly.
    [2] This is the test/e2e/load.go test from the Kubernetes GitHub repository.
    [3] This is the test/e2e/density.go test from the Kubernetes GitHub repository.
    [4] We are looking into optimizing this in the next release, but for now using a smaller master can result in significant (order-of-magnitude) performance degradation. We encourage anyone running benchmarks against Kubernetes or attempting to replicate these findings to use a similarly sized master, or performance will suffer.