Blog: Kubernetes the Easy Way

October 31, 2017, 5:00 pm

≫ Next: Blog: Containerd Brings More Container Runtime Options for Kubernetes

≪ Previous: Blog: Enforcing Network Policies in Kubernetes

Editor’s note: Today’s post is by Dan Garfield, VP of Marketing at Codefresh, on how to set up and easily deploy a Kubernetes cluster.

Kelsey Hightower wrote an invaluable guide for Kubernetes called Kubernetes the Hard Way. It’s an awesome resource for those looking to understand the ins and outs of Kubernetes—but what if you want to put Kubernetes on easy mode? That’s something we’ve been working on together with Google Cloud. In this guide, we’ll show you how to get a cluster up and running, as well as how to actually deploy your code to that cluster and run it.

This is Kubernetes the easy way.

What We’ll Accomplish

1.Set up a cluster
2.Deploy an application to the cluster
3.Automate deployment with rolling updates

Prerequisites

A containerized application
You can also use a demo app.
A Google Cloud Account or a Kubernetes cluster on another provider
Everything after Cluster creation is identical with all providers.
A free account on Codefresh
Codefresh is a service that handles Kubernetes deployment configuration and automation.

We made Codefresh free for open-source projects and offer 200 builds/mo free for private projects, to make adopting Kubernetes as easy as possible. Deploy as much as you like on as many clusters as you like.

Set Up a Cluster

Create an account at cloud.google.com and log in.

Note: If you’re using a Cluster outside of Google Cloud, you can skip this step.

Google Container Engine is Google Cloud’s managed Kubernetes service. In our testing, it’s both powerful and easy to use.

If you’re new to the platform, you can get a $500 credit at the end of this process.

Open the menu and scroll down to Container Engine. Then select Container Clusters.

Click Create cluster.

We’re done with step 1. In my experience it usually takes less than 5 minutes for a cluster to be created.

Deploy an Application to Kubernetes

First go to Codefresh and create an account using Github, Bitbucket, or Gitlab. As mentioned previously, Codefresh is free for both open source and smaller private projects. We’ll use it to create the configuration Yaml necessary to deploy our application to Kubernetes. Then we’ll deploy our application and automate the process to happen every time we commit code changes. Here are the steps:

1.Create a Codefresh account
2.Connect to Google Cloud (or other cluster)
3.Add Cluster
4.Deploy static image
5.Build and deploy an image
6.Automate the process

Connect to Google Cloud

To connect your Clusters in Google Container Engine, go to Account Settings > Integrations > Kubernetes and click Authenticate. This prompts you to login with your Google credentials.

Once you log in, all of your clusters are available within Codefresh.

Add Cluster

To add your cluster, click the down arrow, and then click add cluster, select the project and cluster name. You can now deploy images!

Optional: Use an Alternative Cluster

To connect a non-GKE cluster we’ll need to add a token and certificate to Codefresh. Go to Account Settings (bottom left) > Integrations > Kubernetes > Configure > Add Provider > Custom Providers. Expand the dropdown and click Add Cluster.

Follow the instructions on how to generate the needed information and click Save. Your cluster now appears under the Kubernetes tab.

Deploy Static Image to Kubernetes

Now for the fun part! Codefresh provides an easily modifiable boilerplate that takes care of the heavy lifting of configuring Kubernetes for your application.

Click on the Kubernetes tab: this shows a list of namespaces.

Think of namespaces as acting a bit like VLANs on a Kubernetes cluster. Each namespace can contain all the services that need to talk to each other on a Kubernetes cluster. For now, we’ll just work off the default namespace (the easy way!).

Click Add Service and fill in the details.

You can use the demo application I mentioned earlier that has a Node.js frontend with a MongoDB.

Here’s the info we need to pass:

Cluster - This is the cluster we added earlier, our application will be deployed there.
Namespace - We’ll use default for our namespace but you can create and use a new one if you’d prefer. Namespaces are discrete units for grouping all the services associated with an application.
Service name - You can name the service whatever you like. Since we’re deploying Mongo, I’ll just name it mongo!
Expose port - We don’t need to expose the port outside of our cluster so we won’t check the box for now but we will specify a port where other containers can talk to this service. Mongo’s default port is ‘27017’.
Image - Mongo is a public image on Dockerhub, so I can reference it by name and tag, ‘mongo:latest’.
Internal Ports - This is the port the mongo application listens on, in this case it’s ‘27017’ again.

We can ignore the other options for now.

Scroll down and click Deploy.

Boom! You’ve just deployed this image to Kubernetes. You can see by clicking on the status that the service, deployment, replicas, and pods are all configured and running. If you click Edit > Advanced, you can see and edit all the raw YAML files associated with this application, or copy them and put them into your repository for use on any cluster.

Build and Deploy an Image

To get the rest of our demo application up and running we need to build and deploy the Node.js portion of the application. To do that we’ll need to add our repository to Codefresh.

Click on Repositories > Add Repository, then copy and paste the demochat repo url (or use your own repo).

We have the option to use a dockerfile, or to use a template if we need help creating a dockerfile. In this case, the demochat repo already has a dockerfile so we’ll select that. Click through the next few screens until the image builds.

Once the build is finished the image is automatically saved inside of the Codefresh docker registry. You can also add any other registry to your account and use that instead.

To deploy the image we’ll need

a pull secret
the image name and registry
the ports that will be used

Creating the Pull Secret

The pull secret is a token that the Kubernetes cluster can use to access a private Docker registry. To create one, we’ll need to generate the token and save it to Codefresh.

Click on User Settings (bottom left) and generate a new token.
Copy the token to your clipboard.

Go to Account Settings > Integrations > Docker Registry > Add Registry and select Codefresh Registry. Paste in your token and enter your username (entry is case sensitive). Your username must match your name displayed at the bottom left of the screen.
Test and save it.

We’ll now be able to create our secret later on when we deploy our image.

Get the image name

Click on Images and open the image you just built. Under Comment you’ll see the image name starting with r.cfcr.io.

Copy the image name; we’ll need to paste it in later.

Deploy the private image to Kubernetes

We’re now ready to deploy the image we built.

Go to the Kubernetes page and, like we did with mongo, click Add Service and fill out the page. Make sure to select the same namespace you used to deploy mongo earlier.

Now let’s expose the port so we can access this application. This provisions an IP address and automatically configures ingress.

Click Deploy : your application will be up and running within a few seconds! The IP address may take longer to provision depending on your cluster location.

From this view you can scale the replicas, see application status, and similar tasks.

Click on the IP address to view the running application.

At this point you should have your entire application up and running! Not so bad huh? Now to automate deployment!

Automate Deployment to Kubernetes

Every time we make a change to our application, we want to build a new image and deploy it to our cluster. We’ve already set up automated builds, but to automate deployment:

Click on Repositories (top left).
Click on the pipeline for the demochat repo (the gear icon).

It’s a good idea to run some tests before deploying. Under Build and Unit Test, add npm test for the unit test script.
Click Deploy Script and select Kubernetes (Beta). Enter the information for the service you’ve already deployed.

You can see the option to use a deployment file from your repo, or to use the deployment file that you just generated.

Click Save.

You’re done with deployment automation! Now whenever a change is made, the image will build, test, and deploy.

Conclusions

We want to make it easy for every team, not just big enterprise teams, to adopt Kubernetes while preserving all of Kubernetes’ power and flexibility. At any point on the Kubernetes service screen you can switch to YAML to view all of the YAMLfiles generated by the configuration you performed in this walkthrough. You can tweak the file content, copy and paste them into local files, etc.

This walkthrough gives everyone a solid base to start with. When you’re ready, you can tweak the entities directly to specify the exact configuration you’d like.

We’d love your feedback! Please share with us on Twitter, or reach out directly.

Addendums

Do you have a video to walk me through this?You bet.

Does this work with Helm Charts? Yes! We’re currently piloting Helm Charts with a limited set of users. Ping us if you’d like to try it early.

Does this work with any Kubernetes cluster? It should work with any Kubernetes cluster and is tested for Kubernetes 1.5 forward.

Can I deploy Codefresh in my own data center? Sure, Codefresh is built on top of Kubernetes using Helm Charts. Codefresh cloud is free for open source, and 200 builds/mo. Codefresh on prem is currently for enterprise users only.

Won’t the database be wiped every time we update? Yes, in this case we skipped creating a persistent volume. It’s a bit more work to get the persistent volume configured, if you’d like, feel free to reach out and we’re happy to help!

↧

Blog: Containerd Brings More Container Runtime Options for Kubernetes

November 1, 2017, 5:00 pm

≫ Next: Blog: Securing Software Supply Chain with Grafeas

≪ Previous: Blog: Kubernetes the Easy Way

Editor’s note: Today’s post is by Lantao Liu, Software Engineer at Google, and Mike Brown, Open Source Developer Advocate at IBM.

A container runtime is software that executes containers and manages container images on a node. Today, the most widely known container runtime is Docker, but there are other container runtimes in the ecosystem, such as rkt, containerd, and lxd. Docker is by far the most common container runtime used in production Kubernetes environments, but Docker’s smaller offspring, containerd, may prove to be a better option. This post describes using containerd with Kubernetes.

Kubernetes 1.5 introduced an internal plugin API named Container Runtime Interface (CRI) to provide easy access to different container runtimes. CRI enables Kubernetes to use a variety of container runtimes without the need to recompile. In theory, Kubernetes could use any container runtime that implements CRI to manage pods, containers and container images.

Over the past 6 months, engineers from Google, Docker, IBM, ZTE, and ZJU have worked to implement CRI for containerd. The project is called cri-containerd, which had its feature complete v1.0.0-alpha.0 release on September 25, 2017. With cri-containerd, users can run Kubernetes clusters using containerd as the underlying runtime without Docker installed.

containerd

Containerd is an OCI compliant core container runtime designed to be embedded into larger systems. It provides the minimum set of functionality to execute containers and manages images on a node. It was initiated by Docker Inc. and donated to CNCF in March of 2017. The Docker engine itself is built on top of earlier versions of containerd, and will soon be updated to the newest version. Containerd is close to a feature complete stable release, with 1.0.0-beta.1 available right now.

Containerd has a much smaller scope than Docker, provides a golang client API, and is more focused on being embeddable.The smaller scope results in a smaller codebase that’s easier to maintain and support over time, matching Kubernetes requirements as shown in the following table:

| | Containerd Scope (In/Out) | Kubernetes Requirement | |-|-|-| | Container Lifecycle Management | In | Container Create/Start/Stop/Delete/List/Inspect (✔️) | | Image Management | In | Pull/List/Inspect (✔️) | | Networking | Out No concrete network solution. User can setup network namespace and put containers into it. | Kubernetes networking deals with pods, rather than containers, so container runtimes should not provide complex networking solutions that don’t satisfy requirements. (✔️) | | Volumes | Out, No volume management. User can setup host path, and mount it into container. |Kubernetes manages volumes. Container runtimes should not provide internal volume management that may conflict with Kubernetes. (✔️) | | Persistent Container Logging | Out, No persistent container log. Container STDIO is provided as FIFOs, which can be redirected/decorated as is required. | Kubernetes has specific requirements for persistent container logs, such as format and path etc. Container runtimes should not persist an unmanageable container log. (✔️) | | Metrics | In Containerd provides container and snapshot metrics as part of the API. | Kubernetes expects container runtime to provide container metrics (CPU, Memory, writable layer size, etc.) and image filesystem usage (disk, inode usage, etc.). (✔️) | Overall, from a technical perspective, containerd is a very good alternative container runtime for Kubernetes.|

cri-containerd

Cri-containerd is exactly that: an implementation of CRI for containerd. It operates on the same node as the Kubelet and containerd. Layered between Kubernetes and containerd, cri-containerd handles all CRI service requests from the Kubelet and uses containerd to manage containers and container images. Cri-containerd manages these service requests in part by forming containerd service requests while adding sufficient additional function to support the CRI requirements.

Compared with the current Docker CRI implementation (dockershim), cri-containerd eliminates an extra hop in the stack, making the stack more stable and efficient.

Architecture

Cri-containerd uses containerd to manage the full container lifecycle and all container images. As also shown below, cri-containerd manages pod networking via CNI (another CNCF project).

Let’s use an example to demonstrate how cri-containerd works for the case when Kubelet creates a single-container pod:

1.Kubelet calls cri-containerd, via the CRI runtime service API, to create a pod;
2.cri-containerd uses containerd to create and start a special pause container (the sandbox container) and put that container inside the pod’s cgroups and namespace (steps omitted for brevity);
3.cri-containerd configures the pod’s network namespace using CNI;
4.Kubelet subsequently calls cri-containerd, via the CRI image service API, to pull the application container image;
5.cri-containerd further uses containerd to pull the image if the image is not present on the node;
6.Kubelet then calls cri-containerd, via the CRI runtime service API, to create and start the application container inside the pod using the pulled container image;
7.cri-containerd finally calls containerd to create the application container, put it inside the pod’s cgroups and namespace, then to start the pod’s new application container. After these steps, a pod and its corresponding application container is created and running.

Status

Cri-containerd v1.0.0-alpha.0 was released on Sep. 25, 2017.

It is feature complete. All Kubernetes features are supported.

All CRI validation tests have passed. (A CRI validation is a test framework for validating whether a CRI implementation meets all the requirements expected by Kubernetes.)

All regular node e2e tests have passed. (The Kubernetes test framework for testing Kubernetes node level functionalities such as managing pods, mounting volumes etc.)

To learn more about the v1.0.0-alpha.0 release, see the project repository.

Try it Out

For a multi-node cluster installer and bring up steps using ansible and kubeadm, see this repo link.

For creating a cluster from scratch on Google Cloud, see Kubernetes the Hard Way.

For a custom installation from release tarball, see this repo link.

For a installation with LinuxKit on a local VM, see this repo link.

Next Steps

We are focused on stability and usability improvements as our next steps.

Stability:
- Set up a full set of Kubernetes integration test in the Kubernetes test infrastructure on various OS distros such as Ubuntu, COS (Container-Optimized OS) etc.
- Actively fix any test failures and other issues reported by users.
Usability:
- Improve the user experience of crictl. Crictl is a portable command line tool for all CRI container runtimes. The goal here is to make it easy to use for debug and development scenarios.
- Integrate cri-containerd with kube-up.sh, to help users bring up a production quality Kubernetes cluster using cri-containerd and containerd.
- Improve our documentation for users and admins alike.

We plan to release our v1.0.0-beta.0 by the end of 2017.

Contribute

Cri-containerd is a Kubernetes incubator project located at https://github.com/kubernetes-incubator/cri-containerd. Any contributions in terms of ideas, issues, and/or fixes are welcome. The getting started guide for developers is a good place to start for contributors.

Community

Cri-containerd is developed and maintained by the Kubernetes SIG-Node community. We’d love to hear feedback from you. To join the community:

sig-node community site
Slack: #sig-node channel in Kubernetes (kubernetes.slack.com)
Mailing List: https://groups.google.com/forum/#!forum/kubernetes-sig-node

↧

Blog: Securing Software Supply Chain with Grafeas

November 2, 2017, 5:00 pm

≫ Next: Blog: Kubernetes is Still Hard (for Developers)

≪ Previous: Blog: Containerd Brings More Container Runtime Options for Kubernetes

Editor’s note: This post is written by Kelsey Hightower, Staff Developer Advocate at Google, and Sandra Guo, Product Manager at Google.

Kubernetes has evolved to support increasingly complex classes of applications, enabling the development of two major industry trends: hybrid cloud and microservices. With increasing complexity in production environments, customers—especially enterprises—are demanding better ways to manage their software supply chain with more centralized visibility and control over production deployments.

On October 12th, Google and partners announced Grafeas, an open source initiative to define a best practice for auditing and governing the modern software supply chain. With Grafeas (“scribe” in Greek), developers can plug in components of the CI/CD pipeline into a central source of truth for tracking and enforcing policies. Google is also working on Kritis (“judge” in Greek), allowing devOps teams to enforce deploy-time image policy using metadata and attestations stored in Grafeas.

Grafeas allows build, auditing and compliance tools to exchange comprehensive metadata on container images using a central API. This allows enforcing policies that provide central control over the software supply process.

Example application: PaymentProcessor

Let’s consider a simple application, PaymentProcessor, that retrieves, processes and updates payment info stored in a database. This application is made up of two containers: a standard ruby container and custom logic.

Due to the sensitive nature of the payment data, the developers and DevOps team really want to make sure that the code meets certain security and compliance requirements, with detailed records on the provenance of this code. There are CI/CD stages that validate the quality of the PaymentProcessor release, but there is no easy way to centrally view/manage this information:

Visibility and governance over the PaymentProcessor Code

Grafeas provides an API for customers to centrally manage metadata created by various CI/CD components and enables deploy time policy enforcement through a Kritis implementation.

Let’s consider a basic example of how Grafeas can provide deploy time control for the PaymentProcessor app using a demo verification pipeline.

Assume that a PaymentProcessor container image has been created and pushed to Google Container Registry. This example uses the gcr.io/exampleApp/PaymentProcessor container for testing. You as the QA engineer want to create an attestation certifying this image for production usage. Instead of trusting an image tag like 0.0.1, which can be reused and point to a different container image later, we can trust the image digest to ensure the attestation links to the full image contents.

1. Set up the environment

Generate a signing key:

gpg --quick-generate-key --yes qa\_bob@example.com

Export the image signer’s public key:

gpg --armor --export image.signer@example.com \> ${GPG\_KEY\_ID}.pub

Create the ‘qa’ AttestationAuthority note via the Grafeas API:

curl -X POST \  "http://127.0.0.1:8080/v1alpha1/projects/image-signing/notes?noteId=qa" \  
  -d @note.json

Create the Kubernetes ConfigMap for admissions control and store the QA signer’s public key:

kubectl create configmap image-signature-webhook \  
  --from-file ${GPG\_KEY\_ID}.pub

kubectl get configmap image-signature-webhook -o yaml

Set up an admissions control webhook to require QA signature during deployment.

kubectl apply -f kubernetes/image-signature-webhook.yaml

2. Attempt to deploy an image without QA attestation

Attempt to run the image in paymentProcessor.ymal before it is QA attested:

kubectl apply -f pods/nginx.yaml

apiVersion: v1

kind: Pod

metadata:

  name: payment

spec:

  containers:

    - name: payment

      image: "gcr.io/hightowerlabs/payment@sha256:aba48d60ba4410ec921f9d2e8169236c57660d121f9430dc9758d754eec8f887"

Create the paymentProcessor pod:

kubectl apply -f pods/paymentProcessor.yaml

Notice the paymentProcessor pod was not created and the following error was returned:

The  "" is invalid: : No matched signatures for container image: gcr.io/hightowerlabs/payment@sha256:aba48d60ba4410ec921f9d2e8169236c57660d121f9430dc9758d754eec8f887

3. Create an image signature

Assume the image digest is stored in Image-digest.txt, sign the image digest:

gpg -u qa\_bob@example.com \  
  --armor \  
  --clearsign \  
  --output=signature.gpg \  
  Image-digest.txt

4. Upload the signature to the Grafeas API

Generate a pgpSignedAttestation occurrence from the signature :

cat \> occurrence.json \<\<EOF  
{  "resourceUrl": "$(cat image-digest.txt)",  "noteName": "projects/image-signing/notes/qa",  "attestation": {  "pgpSignedAttestation": {  "signature": "$(cat signature.gpg)",  "contentType": "application/vnd.gcr.image.url.v1",  "pgpKeyId": "${GPG\_KEY\_ID}"  
    }  
  }  
}  
EOF

Upload the attestation through the Grafeas API:

curl -X POST \  
  'http://127.0.0.1:8080/v1alpha1/projects/image-signing/occurrences' \  
  -d @occurrence.json

5. Verify QA attestation during a production deployment

Attempt to run the image in paymentProcessor.ymal now that it has the correct attestation in the Grafeas API:

kubectl apply -f pods/paymentProcessor.yaml

pod "PaymentProcessor" created

With the attestation added the pod will be created as the execution criteria are met.

For more detailed information, see this Grafeas tutorial.

Summary

The demo above showed how you can integrate your software supply chain with Grafeas and gain visibility and control over your production deployments. However, the demo verification pipeline by itself is not a full Kritis implementation. In addition to basic admission control, Kritis provides additional support for workflow enforcement, multi-authority signing, breakglass deployment and more. You can read the Kritis whitepaper for more details. The team is actively working on a full open-source implementation. We’d love your feedback!

In addition, a hosted alpha implementation of Kritis, called Binary Authorization, is available on Google Container Engine and will be available for broader consumption soon.

Google, JFrog, and other partners joined forces to create Grafeas based on our common experiences building secure, large, and complex microservice deployments for internal and enterprise customers. Grafeas is an industry-wide community effort.

To learn more about Grafeas and contribute to the project:

Register for the JFrog-Google webinar [here]
Try Grafeas now and join the GitHub project: https://github.com/grafeas
Try out the Grafeas demo and tutorial: https://github.com/kelseyhightower/grafeas-tutorial
Attend Shopify’s talks at KubeCon in December
Fill out [this form] if you’re interested in learning more about our upcoming releases or talking to us about integrations
See grafeas.io for documentation and examples We hope you join us!
The Grafeas Team

↧

Blog: Kubernetes is Still Hard (for Developers)

November 14, 2017, 4:00 pm

≫ Next: Blog: Certified Kubernetes Conformance Program: Launch Celebration Round Up

≪ Previous: Blog: Securing Software Supply Chain with Grafeas

Kubernetes has made the Ops experience much easier, but how does the developer experience compare? Ops teams can deploy a Kubernetes cluster in a matter of minutes. But developers need to understand a host of new concepts before beginning to work with Kubernetes. This can be a tedious and manual process, but it doesn’t have to be. In this talk, Michelle Noorali, co-lead of SIG-Apps, reimagines the Kubernetes developer experience. She shares her top 3 tips for building a successful developer experience including:

A framework for thinking about cloud native applications
An integrated experience for debugging and fine-tuning cloud native applicationsA way to get a cloud native application out the door quickly Interested in learning how far the Kubernetes developer experience has come? Join us at KubeCon in Austin on December 6-8. Register Now >>

Check out Michelle’s keynote to learn about exciting new updates from CNCF projects.

↧

Blog: Certified Kubernetes Conformance Program: Launch Celebration Round Up

November 15, 2017, 4:00 pm

≫ Next: Blog: Autoscaling in Kubernetes

≪ Previous: Blog: Kubernetes is Still Hard (for Developers)

This week the CNCFⓇ certified the first group of KubernetesⓇ offerings under the Certified Kubernetes Conformance Program. These first certifications follow a beta phase during which we invited participants to submit conformance results. The community response was overwhelming: CNCF certified offerings from 32 vendors!

The new Certified Kubernetes Conformance Program gives enterprise organizations the confidence that workloads running on any Certified Kubernetes distribution or platform will work correctly on other Certified Kubernetes distributions or platforms. A Certified Kubernetes product guarantees that the complete Kubernetes API functions as specified, so users can rely on a seamless, stable experience.

Here’s what the world had to say about the Certified Kubernetes Conformance Program.

Press coverage:

IBM, Google, Microsoft, and 33 more partner to ensure Kubernetes workload portability, VentureBeat.
The CNCF just got 36 companies to agree to a Kubernetes certification standard, TechCrunch
CNCF Ensures Kubernetes Interoperability with a New Cert Program, The New Stack
Cloud Native launches Certified Kubernetes program, SD Times
CNCF offers Kubernetes Software Conformance Certification program RCR Wireless
New Kubernetes Certification Program Announced Virtualization Review
Key Cloud Computing Group Launches Interoperability Certification for Kubernetes GeekWire
New Kubernetes Certification Program Helps Deliver Consistency in the Cloud BetaNews
Kubernetes vendors agree on standardization, ZDNet
CNCF Launches Kubernetes Conformance Certification Program, sdxcentral Community blog round-up:
Introducing Certified Kubernetes (and Google Kubernetes Engine!), Google
Certified Kubernetes: A key step forward for the open source ecosystem, Heptio
Is Kubernetes “crossing the chasm”? Yes!, IBM
CoreOS Tectonic is Certified Kubernetes, CoreOS
VMware Pivotal Container Service Achieves Kubernetes Certification, VMWare Pivotal
Huawei Cloud Container Engine gained first wave of Certificated Kubernetes Qualification, Huawei
Cloud Foundry Container Runtime Gets Kubernetes-Certified, Cloud Foundry
SUSE CaaS Platform 2 certified under Kubernetes Conformance Software Certification, SUSE
Kubernetes on Mesosphere DC/OS now certified by CNCF, Mesosphere
Rancher joins the CNCF Kubernetes Software Conformance Certification program, Rancher
StackPointCloud Becomes First Multi-Cloud CNCF Certified Kubernetes Offering, StackPointCloud Visit https://www.cncf.io/certification/software-conformance for more information about the Certified Kubernetes Conformance Program, and learn how you can join a growing list of Certified Kubernetes providers.

“Cloud Native Computing Foundation”, “CNCF” and “Kubernetes” are registered trademarks of The Linux Foundation in the United States and other countries. “Certified Kubernetes” and the Certified Kubernetes design are trademarks of The Linux Foundation in the United States and other countries.

↧

Blog: Autoscaling in Kubernetes

November 16, 2017, 4:00 pm

≫ Next: Blog: PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes

≪ Previous: Blog: Certified Kubernetes Conformance Program: Launch Celebration Round Up

Kubernetes allows developers to automatically adjust cluster sizes and the number of pod replicas based on current traffic and load. These adjustments reduce the amount of unused nodes, saving money and resources. In this talk, Marcin Wielgus of Google walks you through the current state of pod and node autoscaling in Kubernetes: .how it works, and how to use it, including best practices for deployments in production applications.

Enjoyed this talk? Join us for more exciting sessions on scaling and automating your Kubernetes clusters at KubeCon in Austin on December 6-8. Register Now

Be sure to check out Automating and Testing Production Ready Kubernetes Clusters in the Public Cloud by Ron Lipke, Senior Developer, Platform as a Service, Gannet/USA Today Network.

↧

Blog: PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes

December 5, 2017, 4:00 pm

≫ Next: Blog: Using eBPF in Kubernetes

≪ Previous: Blog: Autoscaling in Kubernetes

Editor’s note: Today’s post is a joint post from the deep learning team at Baidu and the etcd team at CoreOS.

PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes

Two open source communities—PaddlePaddle, the deep learning framework originated in Baidu, and Kubernetes®, the most famous containerized application scheduler—are announcing the Elastic Deep Learning (EDL) feature in PaddlePaddle’s new release codenamed Fluid.

Fluid EDL includes a Kubernetes controller, PaddlePaddle auto-scaler, which changes the number of processes of distributed jobs according to the idle hardware resource in the cluster, and a new fault-tolerable architecture as described in the PaddlePaddle design doc.

Industrial deep learning requires significant computation power. Research labs and companies often build GPU clusters managed by SLURM, MPI, or SGE. These clusters either run a submitted job if it requires less than the idle resource, or pend the job for an unpredictably long time. This approach has its drawbacks: in an example with 99 available nodes and a submitted job that requires 100, the job has to wait without using any of the available nodes. Fluid works with Kubernetes to power elastic deep learning jobs, which often lack optimal resources, by helping to expose potential algorithmic problems as early as possible.

Another challenge is that industrial users tend to run deep learning jobs as a subset stage of the complete data pipeline, including the web server and log collector. Such general-purpose clusters require priority-based elastic scheduling. This makes it possible to run more processes in the web server job and less in deep learning during periods of high web traffic, then prioritize deep learning when web traffic is low. Fluid talks to Kubernetes’ API server to understand the global picture and orchestrate the number of processes affiliated with various jobs.

In both scenarios, PaddlePaddle jobs are tolerant to a process spikes and decreases. We achieved this by implementing the new design, which introduces a master process in addition to the old PaddlePaddle architecture as described in a previous blog post. In the new design, as long as there are three processes left in a job, it continues. In extreme cases where all processes are killed, the job can be restored and resume.

We tested Fluid EDL for two use cases: 1) the Kubernetes cluster runs only PaddlePaddle jobs; and 2) the cluster runs PaddlePaddle and Nginx jobs.

In the first test, we started up to 20 PaddlePaddle jobs one by one with a 10-second interval. Each job has 60 trainers and 10 parameter server processes, and will last for hours. We repeated the experiment 20 times: 10 with FluidEDL turned off and 10 with FluidEDL turned on. In Figure one, solid lines correspond to the first 10 experiments and dotted lines the rest. In the upper part of the figure, we see that the number of pending jobs increments monotonically without EDL. However, when EDL is turned on, resources are evenly distributed to all jobs. Fluid EDL kills some existing processes to make room for new jobs and jobs coming in at a later point in time. In both cases, the cluster is equally utilized (see lower part of figure).

| | | Figure 1. Fluid EDL evenly distributes resource among jobs.
|

In the second test, each experiment ran 400 Nginx pods, which has higher priority than the six PaddlePaddle jobs. Initially, each PaddlePaddle job had 15 trainers and 10 parameter servers. We killed 100 Nginx pods every 90 seconds until 100 left, and then we started to increase the number of Nginx jobs by 100 every 90 seconds. The upper part of Figure 2 shows this process. The middle of the diagram shows that Fluid EDL automatically started some PaddlePaddle processes by decreasing Nginx pods, and killed PaddlePaddle processes by increasing Nginx pods later on. As a result, the cluster maintains around 90% utilization as shown in the bottom of the figure. When Fluid EDL was turned off, there were no PaddlePaddle processes autoincrement, and the utilization fluctuated with the varying number of Nginx pods.

| | | Figure 2. Fluid changes PaddlePaddle processes with the change of Nginx processes. |

We continue to work on FluidEDL and welcome comments and contributions. Visit the PaddlePaddle repo, where you can find the design doc, a simple tutorial, and experiment details.

Xu Yan (Baidu Research)
Helin Wang (Baidu Research)
Yi Wu (Baidu Research)
Xi Chen (Baidu Research)
Weibao Gong (Baidu Research)
Xiang Li (CoreOS)
Yi Wang (Baidu Research)

↧

Blog: Using eBPF in Kubernetes

December 6, 2017, 4:00 pm

≫ Next: Blog: Kubernetes 1.9: Apps Workloads GA and Expanded Ecosystem

≪ Previous: Blog: PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes

Introduction

Kubernetes provides a high-level API and a set of components that hides almost all of the intricate and—to some of us—interesting details of what happens at the systems level. Application developers are not required to have knowledge of the machines’ IP tables, cgroups, namespaces, seccomp, or, nowadays, even the container runtime that their application runs on top of. But underneath, Kubernetes and the technologies upon which it relies (for example, the container runtime) heavily leverage core Linux functionalities.

This article focuses on a core Linux functionality increasingly used in networking, security and auditing, and tracing and monitoring tools. This functionality is called extended Berkeley Packet Filter (eBPF)

Note:In this article we use both acronyms: eBPF and BPF. The former is used for the extended BPF functionality, and the latter for “classic” BPF functionality.

What is BPF?

BPF is a mini-VM residing in the Linux kernel that runs BPF programs. Before running, BPF programs are loaded with the bpf() syscall and are validated for safety: checking for loops, code size, etc. BPF programs are attached to kernel objects and executed when events happen on those objects—for example, when a network interface emits a packet.

BPF Superpowers

BPF programs are event-driven by definition, an incredibly powerful concept, and executes code in the kernel when an event occurs. Netflix’s Brendan Gregg refers to BPF as a Linux superpower.

The ‘e’ in eBPF

Traditionally, BPF could only be attached to sockets for socket filtering. BPF’s first use case was in tcpdump. When you run tcpdump the filter is compiled into a BPF program and attached to a raw AF_PACKET socket in order to print out filtered packets.

But over the years, eBPF added the ability to attach to other kernel objects. In addition to socket filtering, some supported attach points are:

Kprobes (and userspace equivalents uprobes)
Tracepoints
Network schedulers or qdiscs for classification or action (tc)
XDP (eXpress Data Path) This and other, newer features like in-kernel helper functions and shared data-structures (maps) that can be used to communicate with user space, extend BPF’s capabilities.
Existing Use Cases of eBPF with Kubernetes
Several open-source Kubernetes tools already use eBPF and many use cases warrant a closer look, especially in areas such as networking, monitoring and security tools.

Dynamic Network Control and Visibility with Cilium

Cilium is a networking project that makes heavy use of eBPF superpowers to route and filter network traffic for container-based systems. By using eBPF, Cilium can dynamically generate and apply rules—even at the device level with XDP—without making changes to the Linux kernel itself.

The Cilium Agent runs on each host. Instead of managing IP tables, it translates network policy definitions to BPF programs that are loaded into the kernel and attached to a container’s virtual ethernet device. These programs are executed—rules applied—on each packet that is sent or received.

This diagram shows how the Cilium project works:

Depending on what network rules are applied, BPF programs may be attached with tc or XDP. By using XDP, Cilium can attach the BPF programs at the lowest possible point, which is also the most performant point in the networking software stack.

If you’d like to learn more about how Cilium uses eBPF, take a look at the project’s BPF and XDP reference guide.

Tracking TCP Connections in Weave Scope

Weave Scope is a tool for monitoring, visualizing and interacting with container-based systems. For our purposes, we’ll focus on how Weave Scope gets the TCP connections.

Weave Scope employs an agent that runs on each node of a cluster. The agent monitors the system, generates a report and sends it to the app server. The app server compiles the reports it receives and presents the results in the Weave Scope UI.

To accurately draw connections between containers, the agent attaches a BPF program to kprobes that track socket events: opening and closing connections. The BPF program, tcptracer-bpf, is compiled into an ELF object file and loaded using gopbf.

(As a side note, Weave Scope also has a plugin that make use of eBPF: HTTP statistics.)

To learn more about how this works and why it’s done this way, read this extensive post that the Kinvolk team wrote for the Weaveworks Blog. You can also watch a recent talk about the topic.

Limiting syscalls with seccomp-bpf

Linux has more than 300 system calls (read, write, open, close, etc.) available for use—or misuse. Most applications only need a small subset of syscalls to function properly. seccomp is a Linux security facility used to limit the set of syscalls that an application can use, thereby limiting potential misuse.

The original implementation of seccomp was highly restrictive. Once applied, if an application attempted to do anything beyond reading and writing to files it had already opened, seccomp sent a SIGKILL signal.

seccomp-bpf enables more complex filters and a wider range of actions. Seccomp-bpf, also known as seccomp mode 2, allows for applying custom filters in the form of BPF programs. When the BPF program is loaded, the filter is applied to each syscall and the appropriate action is taken (Allow, Kill, Trap, etc.).

seccomp-bpf is widely used in Kubernetes tools and exposed in Kubernetes itself. For example, seccomp-bpf is used in Docker to apply custom seccomp security profiles, in rkt to apply seccomp isolators, and in Kubernetes itself in its Security Context.

But in all of these cases the use of BPF is hidden behind libseccomp. Behind the scenes, libseccomp generates BPF code from rules provided to it. Once generated, the BPF program is loaded and the rules applied.

Potential Use Cases for eBPF with Kubernetes

eBPF is a relatively new Linux technology. As such, there are many uses that remain unexplored. eBPF itself is also evolving: new features are being added in eBPF that will enable new use cases that aren’t currently possible. In the following sections, we’re going to look at some of these that have only recently become possible and ones on the horizon. Our hope is that these features will be leveraged by open source tooling.

Pod and container level network statistics

BPF socket filtering is nothing new, but BPF socket filtering per cgroup is. Introduced in Linux 4.10, cgroup-bpf allows attaching eBPF programs to cgroups. Once attached, the program is executed for all packets entering or exiting any process in the cgroup.

A cgroup is, amongst other things, a hierarchical grouping of processes. In Kubernetes, this grouping is found at the container level. One idea for making use of cgroup-bpf, is to install BPF programs that collect detailed per-pod and/or per-container network statistics.

Generally, such statistics are collected by periodically checking the relevant file in Linux’s /sys directory or using Netlink. By using BPF programs attached to cgroups for this, we can get much more detailed statistics: for example, how many packets/bytes on tcp port 443, or how many packets/bytes from IP 10.2.3.4. In general, because BPF programs have a kernel context, they can safely and efficiently deliver more detailed information to user space.

To explore the idea, the Kinvolk team implemented a proof-of-concept: https://github.com/kinvolk/cgnet. This project attaches a BPF program to each cgroup and exports the information to Prometheus.

There are of course other interesting possibilities, like doing actual packet filtering. But the obstacle currently standing in the way of this is having cgroup v2 support—required by cgroup-bpf—in Docker and Kubernetes.

Application-applied LSM

Linux Security Modules (LSM) implements a generic framework for security policies in the Linux kernel. SELinux and AppArmor are examples of these. Both of these implement rules at a system-global scope, placing the onus on the administrator to configure the security policies.

Landlock is another LSM under development that would co-exist with SELinux and AppArmor. An initial patchset has been submitted to the Linux kernel and is in an early stage of development. The main difference with other LSMs is that Landlock is designed to allow unprivileged applications to build their own sandbox, effectively restricting themselves instead of using a global configuration. With Landlock, an application can load a BPF program and have it executed when the process performs a specific action. For example, when the application opens a file with the open() system call, the kernel will execute the BPF program, and, depending on what the BPF program returns, the action will be accepted or denied.

In some ways, it is similar to seccomp-bpf: using a BPF program, seccomp-bpf allows unprivileged processes to restrict what system calls they can perform. Landlock will be more powerful and provide more flexibility. Consider the following system call:

C  
fd = open(“myfile.txt”, O\_RDWR);

The first argument is a “char *”, a pointer to a memory address, such as 0xab004718.

With seccomp, a BPF program only has access to the parameters of the syscall but cannot dereference the pointers, making it impossible to make security decisions based on a file. seccomp also uses classic BPF, meaning it cannot make use of eBPF maps, the mechanism for interfacing with user space. This restriction means security policies cannot be changed in seccomp-bpf based on a configuration in an eBPF map.

BPF programs with Landlock don’t receive the arguments of the syscalls but a reference to a kernel object. In the example above, this means it will have a reference to the file, so it does not need to dereference a pointer, consider relative paths, or perform chroots.

Use Case: Landlock in Kubernetes-based serverless frameworks

In Kubernetes, the unit of deployment is a pod. Pods and containers are the main unit of isolation. In serverless frameworks, however, the main unit of deployment is a function. Ideally, the unit of deployment equals the unit of isolation. This puts serverless frameworks like Kubeless or OpenFaaS into a predicament: optimize for unit of isolation or deployment?

To achieve the best possible isolation, each function call would have to happen in its own container—ut what’s good for isolation is not always good for performance. Inversely, if we run function calls within the same container, we increase the likelihood of collisions.

By using Landlock, we could isolate function calls from each other within the same container, making a temporary file created by one function call inaccessible to the next function call, for example. Integration between Landlock and technologies like Kubernetes-based serverless frameworks would be a ripe area for further exploration.

Auditing kubectl-exec with eBPF

In Kubernetes 1.7 the audit proposal started making its way in. It’s currently pre-stable with plans to be stable in the 1.10 release. As the name implies, it allows administrators to log and audit events that take place in a Kubernetes cluster.

While these events log Kubernetes events, they don’t currently provide the level of visibility that some may require. For example, while we can see that someone has used kubectl exec to enter a container, we are not able to see what commands were executed in that session. With eBPF one can attach a BPF program that would record any commands executed in the kubectl exec session and pass those commands to a user-space program that logs those events. We could then play that session back and know the exact sequence of events that took place.

Learn more about eBPF

If you’re interested in learning more about eBPF, here are some resources: - A comprehensive reading list about eBPF for doing just that - BCC (BPF Compiler Collection) provides tools for working with eBPF as well as many example tools making use of BCC. - Some videos

BPF: Tracing and More by Brendan Gregg
Cilium - Container Security and Networking Using BPF and XDP by Thomas Graf
Using BPF in Kubernetes by Alban Crequy

Conclusion

We are just starting to see the Linux superpowers of eBPF being put to use in Kubernetes tools and technologies. We will undoubtedly see increased use of eBPF. What we have highlighted here is just a taste of what you might expect in the future. What will be really exciting is seeing how these technologies will be used in ways that we have not yet thought about. Stay tuned!

The Kinvolk team will be hanging out at the Kinvolk booth at KubeCon in Austin. Come by to talk to us about all things, Kubernetes, Linux, container runtimes and yeah, eBPF.

↧

Blog: Kubernetes 1.9: Apps Workloads GA and Expanded Ecosystem

December 14, 2017, 4:00 pm

≫ Next: Blog: Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes

≪ Previous: Blog: Using eBPF in Kubernetes

We’re pleased to announce the delivery of Kubernetes 1.9, our fourth and final release this year.

Today’s release continues the evolution of an increasingly rich feature set, more robust stability, and even greater community contributions. As the fourth release of the year, it gives us an opportunity to look back at the progress made in key areas. Particularly notable is the advancement of the Apps Workloads API to stable. This removes any reservations potential adopters might have had about the functional stability required to run mission-critical workloads. Another big milestone is the beta release of Windows support, which opens the door for many Windows-specific applications and workloads to run in Kubernetes, significantly expanding the implementation scenarios and enterprise readiness of Kubernetes.

Workloads API GA

We’re excited to announce General Availability (GA) of the apps/v1 Workloads API, which is now enabled by default. The Apps Workloads API groups the DaemonSet, Deployment, ReplicaSet, and StatefulSet APIs together to form the foundation for long-running stateless and stateful workloads in Kubernetes. Note that the Batch Workloads API (Job and CronJob) is not part of this effort and will have a separate path to GA stability.

Deployment and ReplicaSet, two of the most commonly used objects in Kubernetes, are now stabilized after more than a year of real-world use and feedback. SIG Apps has applied the lessons from this process to all four resource kinds over the last several release cycles, enabling DaemonSet and StatefulSet to join this graduation. The v1 (GA) designation indicates production hardening and readiness, and comes with the guarantee of long-term backwards compatibility.

Windows Support (beta)

Kubernetes was originally developed for Linux systems, but as our users are realizing the benefits of container orchestration at scale, we are seeing demand for Kubernetes to run Windows workloads. Work to support Windows Server in Kubernetes began in earnest about 12 months ago. SIG-Windowshas now promoted this feature to beta status, which means that we can evaluate it for usage.

Storage Enhancements

From the first release, Kubernetes has supported multiple options for persistent data storage, including commonly-used NFS or iSCSI, along with native support for storage solutions from the major public and private cloud providers. As the project and ecosystem grow, more and more storage options have become available for Kubernetes. Adding volume plugins for new storage systems, however, has been a challenge.

Container Storage Interface (CSI) is a cross-industry standards initiative that aims to lower the barrier for cloud native storage development and ensure compatibility. SIG-Storage and the CSI Community are collaborating to deliver a single interface for provisioning, attaching, and mounting storage compatible with Kubernetes.

Kubernetes 1.9 introduces an alpha implementation of the Container Storage Interface (CSI), which will make installing new volume plugins as easy as deploying a pod, and enable third-party storage providers to develop their solutions without the need to add to the core Kubernetes codebase.

Because the feature is alpha in 1.9, it must be explicitly enabled and is not recommended for production usage, but it indicates the roadmap working toward a more extensible and standards-based Kubernetes storage ecosystem.

Additional Features

Custom Resource Definition (CRD) Validation, now graduating to beta and enabled by default, helps CRD authors give clear and immediate feedback for invalid objects

SIG Node hardware accelerator moves to alpha, enabling GPUs and consequently machine learning and other high performance workloads

CoreDNS alpha makes it possible to install CoreDNS with standard tools

IPVS mode for kube-proxy goes beta, providing better scalability and performance for large clusters

Each Special Interest Group (SIG) in the community continues to deliver the most requested user features for their area. For a complete list, please visit the release notes.

Availability

Kubernetes 1.9 is available for download on GitHub. To get started with Kubernetes, check out these interactive tutorials.

Release team

This release is made possible through the effort of hundreds of individuals who contributed both technical and non-technical content. Special thanks to the release team led by Anthony Yeh, Software Engineer at Google. The 14 individuals on the release team coordinate many aspects of the release, from documentation to testing, validation, and feature completeness.

As the Kubernetes community has grown, our release process has become an amazing demonstration of collaboration in open source software development. Kubernetes continues to gain new users at a rapid clip. This growth creates a positive feedback cycle where more contributors commit code creating a more vibrant ecosystem.

Project Velocity

The CNCF has embarked on an ambitious project to visualize the myriad contributions that go into the project. K8s DevStats illustrates the breakdown of contributions from major company contributors. Open issues remained relatively stable over the course of the release, while forks rose approximately 20%, as did individuals starring the various project repositories. Approver volume has risen slightly since the last release, but a lull is commonplace during the last quarter of the year. With 75,000+ comments, Kubernetes remains one of the most actively discussed projects on GitHub.

User highlights

According to the latest survey conducted by CNCF, 61 percent of organizations are evaluating and 83 percent are using Kubernetes in production. Example of user stories from the community include:

BlaBlaCar, the world’s largest long distance carpooling community connects 40 million members across 22 countries. The company has about 3,000 pods, with 1,200 of them running on Kubernetes, leading to improved website availability for customers.

Pokémon GO, the popular free-to-play, location-based augmented reality game developed by Niantic for iOS and Android devices, has its application logic running on Google Container Engine powered by Kubernetes. This was the largest Kubernetes deployment ever on Google Container Engine.

Is Kubernetes helping your team? Share your story with the community.

Ecosystem updates

Announced on November 13, the Certified Kubernetes Conformance Program ensures that Certified Kubernetes™ products deliver consistency and portability. Thirty-two Certified Kubernetes Distributions and Platforms are now available. Development of the certification program involved close collaboration between CNCF and the rest of the Kubernetes community, especially the Testing and Architecture Special Interest Groups (SIGs). The Kubernetes Architecture SIG is the final arbiter of the definition of API conformance for the program. The program also includes strong guarantees that commercial providers of Kubernetes will continue to release new versions to ensure that customers can take advantage of the rapid pace of ongoing development.

CNCF also offers online training that teaches the skills needed to create and configure a real-world Kubernetes cluster.

KubeCon

For recorded sessions from the largest Kubernetes gathering, KubeCon + CloudNativeCon in Austin from December 6-8, 2017, visit YouTube/CNCF. The premiere Kubernetes event will be back May 2-4, 2018 in Copenhagen and will feature technical sessions, case studies, developer deep dives, salons and more! CFP closes January 12, 2018.

Webinar

Join members of the Kubernetes 1.9 release team on January 9th from 10am-11am PT to learn about the major features in this release as they demo some of the highlights in the areas of Windows and Docker support, storage, admission control, and the workloads API. Register here.

Get involved:

The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below.

Thank you for your continued feedback and support.

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Follow us on Twitter @Kubernetesio for latest updates
Chat with the community on Slack
Share your Kubernetes story.

↧

Blog: Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes

December 20, 2017, 4:00 pm

≫ Next: Blog: Five Days of Kubernetes 1.9

≪ Previous: Blog: Kubernetes 1.9: Apps Workloads GA and Expanded Ecosystem

Today’s post is by David Aronchick and Jeremy Lewi, a PM and Engineer on the Kubeflow project, a new open source Github repo dedicated to making using machine learning (ML) stacks on Kubernetes easy, fast and extensible.

Kubernetes and Machine Learning

Kubernetes has quickly become the hybrid solution for deploying complicated workloads anywhere. While it started with just stateless services, customers have begun to move complex workloads to the platform, taking advantage of rich APIs, reliability and performance provided by Kubernetes. One of the fastest growing use cases is to use Kubernetes as the deployment platform of choice for machine learning.

Building any production-ready machine learning system involves various components, often mixing vendors and hand-rolled solutions. Connecting and managing these services for even moderately sophisticated setups introduces huge barriers of complexity in adopting machine learning. Infrastructure engineers will often spend a significant amount of time manually tweaking deployments and hand rolling solutions before a single model can be tested.

Worse, these deployments are so tied to the clusters they have been deployed to that these stacks are immobile, meaning that moving a model from a laptop to a highly scalable cloud cluster is effectively impossible without significant re-architecture. All these differences add up to wasted effort and create opportunities to introduce bugs at each transition.

Introducing Kubeflow

To address these concerns, we’re announcing the creation of the Kubeflow project, a new open source Github repo dedicated to making using ML stacks on Kubernetes easy, fast and extensible. This repository contains:

JupyterHub to create & manage interactive Jupyter notebooks
A Tensorflow Custom Resource (CRD) that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting
A TF Serving container Because this solution relies on Kubernetes, it runs wherever Kubernetes runs. Just spin up a cluster and go!

Using Kubeflow

Let’s suppose you are working with two different Kubernetes clusters: a local minikube cluster; and a GKE cluster with GPUs; and that you have two kubectl contexts defined named minikube and gke.

First we need to initialize our ksonnet application and install the Kubeflow packages. (To use ksonnet, you must first install it on your operating system - the instructions for doing so are here)

     ks init my-kubeflow  
     cd my-kubeflow  
     ks registry add kubeflow \  
     github.com/google/kubeflow/tree/master/kubeflow  
     ks pkg install kubeflow/core  
     ks pkg install kubeflow/tf-serving  
     ks pkg install kubeflow/tf-job  
     ks generate core kubeflow-core --name=kubeflow-core

We can now define environments corresponding to our two clusters.

     kubectl config use-context minikube  
     ks env add minikube  

     kubectl config use-context gke  
     ks env add gke

And we’re done! Now just create the environments on your cluster. First, on minikube:

     ks apply minikube -c kubeflow-core

And to create it on our multi-node GKE cluster for quicker training:

     ks apply gke -c kubeflow-core

By making it easy to deploy the same rich ML stack everywhere, the drift and rewriting between these environments is kept to a minimum.

To access either deployments, you can execute the following command:

     kubectl port-forward tf-hub-0 8100:8000

and then open up http://127.0.0.1:8100 to access JupyterHub. To change the environment used by kubectl, use either of these commands:

     # To access minikube  
     kubectl config use-context minikube  

     # To access GKE  
     kubectl config use-context gke

When you execute apply you are launching on K8s

JupyterHub for launching and managing Jupyter notebooks on K8s
A TF CRD

Let’s suppose you want to submit a training job. Kubeflow provides ksonnet prototypes that make it easy to define components. The tf-job prototype makes it easy to create a job for your code but for this example, we’ll use the tf-cnn prototype which runs TensorFlow’s CNN benchmark.

To submit a training job, you first generate a new job from a prototype:

     ks generate tf-cnn cnn --name=cnn

By default the tf-cnn prototype uses 1 worker and no GPUs which is perfect for our minikube cluster so we can just submit it.

     ks apply minikube -c cnn

On GKE, we’ll want to tweak the prototype to take advantage of the multiple nodes and GPUs. First, let’s list all the parameters available:

     # To see a list of parameters  
     ks prototype list tf-job

Now let’s adjust the parameters to take advantage of GPUs and access to multiple nodes.

     ks param set --env=gke cnn num\_gpus 1  
     ks param set --env=gke cnn num\_workers 1  

     ks apply gke -c cnn

Note how we set those parameters so they are used only when you deploy to GKE. Your minikube parameters are unchanged!

After training, you export your model to a serving location.

Kubeflow also includes a serving package as well. In a separate example, we trained a standard Inception model, and stored the trained model in a bucket we’ve created called ‘gs://kubeflow-models’ with the path ‘/inception’.

To deploy a the trained model for serving, execute the following:

     ks generate tf-serving inception --name=inception  
     ---namespace=default --model\_path=gs://kubeflow-models/inception  
     ks apply gke -c inception

This highlights one more option in Kubeflow - the ability to pass in inputs based on your deployment. This command creates a tf-serving service on the GKE cluster, and makes it available to your application.

For more information about of deploying and monitoring TensorFlow training jobs and TensorFlow models please refer to the user guide.

Kubeflow + ksonnet

One choice we want to call out is the use of the ksonnet project. We think working with multiple environments (dev, test, prod) will be the norm for most Kubeflow users. By making environments a first class concept, ksonnet makes it easy for Kubeflow users to easily move their workloads between their different environments.

Particularly now that Helm is integrating ksonnet with the next version of their platform, we felt like it was the perfect choice for us. More information about ksonnet can be found in the ksonnet docs.

We also want to thank the team at Heptio for expediting features critical to Kubeflow’s use of ksonnet.

What’s Next?

We are in the midst of building out a community effort right now, and we would love your help! We’ve already been collaborating with many teams - CaiCloud, Red Hat & OpenShift, Canonical, Weaveworks, Container Solutions and many others. CoreOS, for example, is already seeing the promise of Kubeflow:

“The Kubeflow project was a needed advancement to make it significantly easier to set up and productionize machine learning workloads on Kubernetes, and we anticipate that it will greatly expand the opportunity for even more enterprises to embrace the platform. We look forward to working with the project members in providing tight integration of Kubeflow with Tectonic, the enterprise Kubernetes platform.” – Reza Shafii, VP of product, CoreOS

If you’d like to try out Kubeflow right now right in your browser, we’ve partnered with Katacoda to make it super easy. You can try it here!

And we’re just getting started! We would love for you to help. How you might ask? Well…

Please join theslack channel
Please join thekubeflow-discuss email list
Please subscribe to theKubeflow twitter account
Please download and run kubeflow, and submit bugs! Thank you for your support so far, we could not be more excited!

Jeremy Lewi & David Aronchick Google

↧

Blog: Five Days of Kubernetes 1.9

January 7, 2018, 4:00 pm

≫ Next: Blog: Kubernetes v1.9 releases beta support for Windows Server Containers

≪ Previous: Blog: Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes

Kubernetes 1.9 is live, made possible by hundreds of contributors pushing thousands of commits in this latest releases.

The community has tallied around 32,300 commits in the main repo and continues rapid growth outside of the main repo, which signals growing maturity and stability for the project. The community has logged more than 90,700 commits across all repos and 7,800 commits across all repos for v1.8.0 to v1.9.0 alone.

With the help of our growing community of 1,400 plus contributors, we issued more than 4,490 PRs and pushed more than 7,800 commits to deliver Kubernetes 1.9 with many notable updates, including enhancements for the workloads and stateful application support areas. This all points to increased extensibility and standards-based Kubernetes ecosystem.

While many improvements have been contributed, we highlight key features in this series of in-depth posts listed below. Follow along and see what’s new and improved with workloads, Windows support and more.

Day 1: 5 Days of Kubernetes 1.9
Day 2: Windows and Docker support for Kubernetes (beta)
Day 3: Storage, CSI framework (alpha)
Day 4: Web Hook and Mission Critical, Dynamic Admission Control
Day 5: Introducing client-go version 6
Day 6: Workloads API

Connect

Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Follow us on Twitter @Kubernetesio for latest updates
Connect with the community on Slack
Get involved with the Kubernetes project on GitHub

↧

Blog: Kubernetes v1.9 releases beta support for Windows Server Containers

January 8, 2018, 4:00 pm

≫ Next: Blog: Introducing Container Storage Interface (CSI) Alpha for Kubernetes

≪ Previous: Blog: Five Days of Kubernetes 1.9

With the release of Kubernetes v1.9, our mission of ensuring Kubernetes works well everywhere and for everyone takes a great step forward. We’ve advanced support for Windows Server to beta along with continued feature and functional advancements on both the Kubernetes and Windows platforms. SIG-Windows has been working since March of 2016 to open the door for many Windows-specific applications and workloads to run on Kubernetes, significantly expanding the implementation scenarios and the enterprise reach of Kubernetes.

Enterprises of all sizes have made significant investments in .NET and Windows based applications. Many enterprise portfolios today contain .NET and Windows, with Gartner claiming that 80% of enterprise apps run on Windows. According to StackOverflow Insights, 40% of professional developers use the .NET programming languages (including .NET Core).

But why is all this information important? It means that enterprises have both legacy and new born-in-the-cloud (microservice) applications that utilize a wide array of programming frameworks. There is a big push in the industry to modernize existing/legacy applications to containers, using an approach similar to “lift and shift”. Modernizing existing applications into containers also provides added flexibility for new functionality to be introduced in additional Windows or Linux containers. Containers are becoming the de facto standard for packaging, deploying, and managing both existing and microservice applications. IT organizations are looking for an easier and homogenous way to orchestrate and manage containers across their Linux and Windows environments. Kubernetes v1.9 now offers beta support for Windows Server containers, making it the clear choice for orchestrating containers of any kind.

Features

Alpha support for Windows Server containers in Kubernetes was great for proof-of-concept projects and visualizing the road map for support of Windows in Kubernetes. The alpha release had significant drawbacks, however, and lacked many features, especially in networking. SIG-Windows, Microsoft, Cloudbase Solutions, Apprenda, and other community members banded together to create a comprehensive beta release, enabling Kubernetes users to start evaluating and using Windows.

Some key feature improvements for Windows Server containers on Kubernetes include:

Improved support for pods! Multiple Windows Server containers in a pod can now share the network namespace using network compartments in Windows Server. This feature brings the concept of a pod to parity with Linux-based containers
Reduced network complexity by using a single network endpoint per pod
Kernel-Based load-balancing using the Virtual Filtering Platform (VFP) Hyper-v Switch Extension (analogous to Linux iptables)
Container Runtime Interface (CRI) pod and node level statistics. Windows Server containers can now be profiled for Horizontal Pod Autoscaling using performance metrics gathered from the pod and the node
Support for kubeadm commands to add Windows Server nodes to a Kubernetes environment. Kubeadm simplifies the provisioning of a Kubernetes cluster, and with the support for Windows Server, you can use a single tool to deploy Kubernetes in your infrastructure
Support for ConfigMaps, Secrets, and Volumes. These are key features that allow you to separate, and in some cases secure, the configuration of the containers from the implementation The crown jewels of Kubernetes 1.9 Windows support, however, are the networking enhancements. With the release of Windows Server 1709, Microsoft has enabled key networking capabilities in the operating system and the Windows Host Networking Service (HNS) that paved the way to produce a number of CNI plugins that work with Windows Server containers in Kubernetes. The Layer-3 routed and network overlay plugins that are supported with Kubernetes 1.9 are listed below:

Upstream L3 Routing - IP routes configured in upstream ToR
Host-Gateway - IP routes configured on each host
Open vSwitch (OVS) & Open Virtual Network (OVN) with Overlay - Supports STT and Geneve tunneling types You can read more about each of their configuration, setup, and runtime capabilities to make an informed selection for your networking stack in Kubernetes.

Even though you have to continue running the Kubernetes Control Plane and Master Components in Linux, you are now able to introduce Windows Server as a Node in Kubernetes. As a community, this is a huge milestone and achievement. We will now start seeing .NET, .NET Core, ASP.NET, IIS, Windows Services, Windows executables and many more windows-based applications in Kubernetes.

What’s coming next

A lot of work went into this beta release, but the community realizes there are more areas of investment needed before we can release Windows support as GA (General Availability) for production workloads. Some keys areas of focus for the first two quarters of 2018 include:

Continue to make progress in the area of networking. Additional CNI plugins are under development and nearing completion
Overlay - win-overlay (vxlan or IP-in-IP encapsulation using Flannel)
Win-l2bridge (host-gateway)
OVN using cloud networking - without overlays
Support for Kubernetes network policies in ovn-kubernetes
Support for Hyper-V Isolation
Support for StatefulSet functionality for stateful applications
Produce installation artifacts and documentation that work on any infrastructure and across many public cloud providers like Microsoft Azure, Google Cloud, and Amazon AWS
Continuous Integration/Continuous Delivery (CI/CD) infrastructure for SIG-Windows
Scalability and Performance testing Even though we have not committed to a timeline for GA, SIG-Windows estimates a GA release in the first half of 2018.

Get Involved

As we continue to make progress towards General Availability of this feature in Kubernetes, we welcome you to get involved, contribute code, provide feedback, deploy Windows Server containers to your Kubernetes cluster, or simply join our community.

If you want to get started on deploying Windows Server containers in Kubernetes, read our getting started guide at https://kubernetes.io/docs/getting-started-guides/windows/
We meet every other Tuesday at 12:30 Eastern Standard Time (EST) at https://zoom.us/my/sigwindows. All our meetings are recorded on youtube and referenced at https://www.youtube.com/playlist?list=PL69nYSiGNLP2OH9InCcNkWNu2bl-gmIU4
Chat with us on Slack at https://kubernetes.slack.com/messages/sig-windows
Find us on GitHub at https://github.com/kubernetes/community/tree/master/sig-windows

Thank you,

Michael Michael (@michmike77)
SIG-Windows Lead
Senior Director of Product Management, Apprenda

↧

Blog: Introducing Container Storage Interface (CSI) Alpha for Kubernetes

January 9, 2018, 4:00 pm

≫ Next: Blog: Extensible Admission is Beta

≪ Previous: Blog: Kubernetes v1.9 releases beta support for Windows Server Containers

One of the key differentiators for Kubernetes has been a powerful volume plugin system that enables many different types of storage systems to:

Automatically create storage when required.
Make storage available to containers wherever they’re scheduled.
Automatically delete the storage when no longer needed. Adding support for new storage systems to Kubernetes, however, has been challenging.

Kubernetes 1.9 introduces an alpha implementation of the Container Storage Interface (CSI) which makes installing new volume plugins as easy as deploying a pod. It also enables third-party storage providers to develop solutions without the need to add to the core Kubernetes codebase.

Because the feature is alpha in 1.9, it must be explicitly enabled. Alpha features are not recommended for production usage, but are a good indication of the direction the project is headed (in this case, towards a more extensible and standards based Kubernetes storage ecosystem).

Why Kubernetes CSI?

Kubernetes volume plugins are currently “in-tree”, meaning they’re linked, compiled, built, and shipped with the core kubernetes binaries. Adding support for a new storage system to Kubernetes (a volume plugin) requires checking code into the core Kubernetes repository. But aligning with the Kubernetes release process is painful for many plugin developers.

The existing Flex Volume plugin attempted to address this pain by exposing an exec based API for external volume plugins. Although it enables third party storage vendors to write drivers out-of-tree, in order to deploy the third party driver files it requires access to the root filesystem of node and master machines.

In addition to being difficult to deploy, Flex did not address the pain of plugin dependencies: Volume plugins tend to have many external requirements (on mount and filesystem tools, for example). These dependencies are assumed to be available on the underlying host OS which is often not the case (and installing them requires access to the root filesystem of node machine).

CSI addresses all of these issues by enabling storage plugins to be developed out-of-tree, containerized, deployed via standard Kubernetes primitives, and consumed through the Kubernetes storage primitives users know and love (PersistentVolumeClaims, PersistentVolumes, StorageClasses).

What is CSI?

The goal of CSI is to establish a standardized mechanism for Container Orchestration Systems (COs) to expose arbitrary storage systems to their containerized workloads. The CSI specification emerged from cooperation between community members from various Container Orchestration Systems (COs)–including Kubernetes, Mesos, Docker, and Cloud Foundry. The specification is developed, independent of Kubernetes, and maintained at https://github.com/container-storage-interface/spec/blob/master/spec.md.

Kubernetes v1.9 exposes an alpha implementation of the CSI specification enabling CSI compatible volume drivers to be deployed on Kubernetes and consumed by Kubernetes workloads.

How do I deploy a CSI driver on a Kubernetes Cluster?

CSI plugin authors will provide their own instructions for deploying their plugin on Kubernetes.

How do I use a CSI Volume?

Assuming a CSI storage plugin is already deployed on your cluster, you can use it through the familiar Kubernetes storage primitives: PersistentVolumeClaims, PersistentVolumes, and StorageClasses.

CSI is an alpha feature in Kubernetes v1.9. To enable it, set the following flags:

CSI is an alpha feature in Kubernetes v1.9. To enable it, set the following flags:

API server binary:
--feature-gates=CSIPersistentVolume=true
--runtime-config=storage.k8s.io/v1alpha1=true
API server binary and kubelet binaries:
--feature-gates=MountPropagation=true
--allow-privileged=true

Dynamic Provisioning

You can enable automatic creation/deletion of volumes for CSI Storage plugins that support dynamic provisioning by creating a StorageClass pointing to the CSI plugin.

The following StorageClass, for example, enables dynamic creation of “fast-storage” volumes by a CSI volume plugin called “com.example.team/csi-driver”.

kind: StorageClass

apiVersion: storage.k8s.io/v1

metadata:

  name: fast-storage

provisioner: com.example.team/csi-driver

parameters:

  type: pd-ssd

To trigger dynamic provisioning, create a PersistentVolumeClaim object. The following PersistentVolumeClaim, for example, triggers dynamic provisioning using the StorageClass above.

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

  name: my-request-for-storage

spec:

  accessModes:

  - ReadWriteOnce

  resources:

    requests:

      storage: 5Gi

  storageClassName: fast-storage

When volume provisioning is invoked, the parameter “type: pd-ssd” is passed to the CSI plugin “com.example.team/csi-driver” via a “CreateVolume” call. In response, the external volume plugin provisions a new volume and then automatically create a PersistentVolume object to represent the new volume. Kubernetes then binds the new PersistentVolume object to the PersistentVolumeClaim, making it ready to use.

If the “fast-storage” StorageClass is marked default, there is no need to include the storageClassName in the PersistentVolumeClaim, it will be used by default.

Pre-Provisioned Volumes

You can always expose a pre-existing volume in Kubernetes by manually creating a PersistentVolume object to represent the existing volume. The following PersistentVolume, for example, exposes a volume with the name “existingVolumeName” belonging to a CSI storage plugin called “com.example.team/csi-driver”.

apiVersion: v1

kind: PersistentVolume

metadata:

  name: my-manually-created-pv

spec:

  capacity:

    storage: 5Gi

  accessModes:

    - ReadWriteOnce

  persistentVolumeReclaimPolicy: Retain

  csi:

    driver: com.example.team/csi-driver

    volumeHandle: existingVolumeName

    readOnly: false

Attaching and Mounting

You can reference a PersistentVolumeClaim that is bound to a CSI volume in any pod or pod template.

kind: Pod

apiVersion: v1

metadata:

  name: my-pod

spec:

  containers:

    - name: my-frontend

      image: dockerfile/nginx

      volumeMounts:

      - mountPath: "/var/www/html"

        name: my-csi-volume

  volumes:

    - name: my-csi-volume

      persistentVolumeClaim:

        claimName: my-request-for-storage

When the pod referencing a CSI volume is scheduled, Kubernetes will trigger the appropriate operations against the external CSI plugin (ControllerPublishVolume, NodePublishVolume, etc.) to ensure the specified volume is attached, mounted, and ready to use by the containers in the pod.

For more details please see the CSI implementation design doc and documentation.

How do I create a CSI driver?

Kubernetes is as minimally prescriptive on the packaging and deployment of a CSI Volume Driver as possible. The minimum requirements for deploying a CSI Volume Driver on Kubernetes are documented here.

The minimum requirements document also contains a section outlining the suggested mechanism for deploying an arbitrary containerized CSI driver on Kubernetes. This mechanism can be used by a Storage Provider to simplify deployment of containerized CSI compatible volume drivers on Kubernetes.

As part of this recommended deployment process, the Kubernetes team provides the following sidecar (helper) containers:

external-attacher
- Sidecar container that watches Kubernetes VolumeAttachment objects and triggers ControllerPublish and ControllerUnpublish operations against a CSI endpoint.
external-provisioner
- Sidecar container that watches Kubernetes PersistentVolumeClaim objects and triggers CreateVolume and DeleteVolume operations against a CSI endpoint.
driver-registrar
- Sidecar container that registers the CSI driver with kubelet (in the future), and adds the drivers custom NodeId (retrieved via GetNodeID call against the CSI endpoint) to an annotation on the Kubernetes Node API Object

Storage vendors can build Kubernetes deployments for their plugins using these components, while leaving their CSI driver completely unaware of Kubernetes.

Where can I find CSI drivers?

CSI drivers are developed and maintained by third-parties. You can find example CSI drivers here, but these are provided purely for illustrative purposes, and are not intended to be used for production workloads.

What about Flex?

The Flex Volume plugin exists as an exec based mechanism to create “out-of-tree” volume plugins. Although it has some drawbacks (mentioned above), the Flex volume plugin coexists with the new CSI Volume plugin. SIG Storage will continue to maintain the Flex API so that existing third-party Flex drivers (already deployed in production clusters) continue to work. In the future, new volume features will only be added to CSI, not Flex.

What will happen to the in-tree volume plugins?

Once CSI reaches stability, we plan to migrate most of the in-tree volume plugins to CSI. Stay tuned for more details as the Kubernetes CSI implementation approaches stable.

What are the limitations of alpha?

The alpha implementation of CSI has the following limitations:

The credential fields in CreateVolume, NodePublishVolume, and ControllerPublishVolume calls are not supported.
Block volumes are not supported; only file.
Specifying filesystems is not supported, and defaults to ext4.
CSI drivers must be deployed with the provided “external-attacher,” even if they don’t implement “ControllerPublishVolume”.
Kubernetes scheduler topology awareness is not supported for CSI volumes: in short, sharing information about where a volume is provisioned (zone, regions, etc.) to allow k8s scheduler to make smarter scheduling decisions.

What’s next?

Depending on feedback and adoption, the Kubernetes team plans to push the CSI implementation to beta in either 1.10 or 1.11.

How Do I Get Involved?

This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. A huge thank you to Vladimir Vivien (vladimirvivien), Jan Šafránek (jsafrane), Chakravarthy Nelluri (chakri-nelluri), Bradley Childs (childsb), Luis Pabón (lpabon), and Saad Ali (saad-ali) for their tireless efforts in bringing CSI to life in Kubernetes.

If you’re interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special-Interest-Group (SIG). We’re rapidly growing and always welcome new contributors.

↧

Blog: Extensible Admission is Beta

January 10, 2018, 4:00 pm

≫ Next: Blog: Introducing client-go version 6

≪ Previous: Blog: Introducing Container Storage Interface (CSI) Alpha for Kubernetes

In this post we review a feature, available in the Kubernetes API server, that allows you to implement arbitrary control decisions and which has matured considerably in Kubernetes 1.9.

The admission stage of API server processing is one of the most powerful tools for securing a Kubernetes cluster by restricting the objects that can be created, but it has always been limited to compiled code. In 1.9, we promoted webhooks for admission to beta, allowing you to leverage admission from outside the API server process.

What is Admission?

Admission is the phase of handling an API server request that happens before a resource is persisted, but after authorization. Admission gets access to the same information as authorization (user, URL, etc) and the complete body of an API request (for most requests).

The admission phase is composed of individual plugins, each of which are narrowly focused and have semantic knowledge of what they are inspecting. Examples include: PodNodeSelector (influences scheduling decisions), PodSecurityPolicy (prevents escalating containers), and ResourceQuota (enforces resource allocation per namespace).

Admission is split into two phases:

Mutation, which allows modification of the body content itself as well as rejection of an API request.
Validation, which allows introspection queries and rejection of an API request. An admission plugin can be in both phases, but all mutation happens before validation.

Mutation

The mutation phase of admission allows modification of the resource content before it is persisted. Because the same field can be mutated multiple times while in the admission chain, the order of the admission plugins in the mutation matters.

One example of a mutating admission plugin is the PodNodeSelector plugin, which uses an annotation on a namespace namespace.annotations[“scheduler.alpha.kubernetes.io/node-selector”] to find a label selector and add it to the pod.spec.nodeselector field. This positively restricts which nodes the pods in a particular namespace can land on, as opposed to taints, which provide negative restriction (also with an admission plugin).

Validation

The validation phase of admission allows the enforcement of invariants on particular API resources. The validation phase runs after all mutators finish to ensure that the resource isn’t going to change again.

One example of a validation admission plugin is also the PodNodeSelector plugin, which ensures that all pods’ spec.nodeSelector fields are constrained by the node selector restrictions on the namespace. Even if a mutating admission plugin tries to change the spec.nodeSelector field after the PodNodeSelector runs in the mutating chain, the PodNodeSelector in the validating chain prevents the API resource from being created because it fails validation.

What are admission webhooks?

Admission webhooks allow a Kubernetes installer or a cluster-admin to add mutating and validating admission plugins to the admission chain of kube-apiserver as well as any extensions apiserver based on k8s.io/apiserver 1.9, like metrics, service-catalog, or kube-projects, without recompiling them. Both kinds of admission webhooks run at the end of their respective chains and have the same powers and limitations as compiled admission plugins.

What are they good for?

Webhook admission plugins allow for mutation and validation of any resource on any API server, so the possible applications are vast. Some common use-cases include:

Mutation of resources like pods. Istio has talked about doing this to inject side-car containers into pods. You could also write a plugin which forcefully resolves image tags into image SHAs.
Name restrictions. On multi-tenant systems, reserving namespaces has emerged as a use-case.
Complex CustomResource validation. Because the entire object is visible, a clever admission plugin can perform complex validation on dependent fields (A requires B) and even external resources (compare to LimitRanges).
Security response. If you forced image tags into image SHAs, you could write an admission plugin that prevents certain SHAs from running.

Registration

Webhook admission plugins of both types are registered in the API, and all API servers (kube-apiserver and all extension API servers) share a common config for them. During the registration process, a webhook admission plugin describes:

How to connect to the webhook admission server
How to verify the webhook admission server (Is it really the server I expect?)
Where to send the data at that server (which URL path)
Which resources and which HTTP verbs it will handle
What an API server should do on connection failures (for example, if the admission webhook server goes down)1 apiVersion: admissionregistration.k8s.io/v1beta1 2 kind: ValidatingWebhookConfiguration 3 metadata: 4 name: namespacereservations.admission.online.openshift.io 5 webhooks: 6 - name: namespacereservations.admission.online.openshift.io 7 clientConfig: 8 service: 9 namespace: default 10 name: kubernetes 11 path: /apis/admission.online.openshift.io/v1alpha1/namespacereservations 12 caBundle: KUBE\_CA\_HERE 13 rules: 14 - operations: 15 - CREATE 16 apiGroups: 17 - "" 18 apiVersions: 19 - "\*" 20 resources: 21 - namespaces 22 failurePolicy: Fail Line 6: name - the name for the webhook itself. For mutating webhooks, these are sorted to provide ordering.
Line 7: clientConfig - provides information about how to connect to, trust, and send data to the webhook admission server.
Line 13: rules - describe when an API server should call this admission plugin. In this case, only for creates of namespaces. You can specify any resource here so specifying creates of serviceinstances.servicecatalog.k8s.io is also legal.
Line 22: failurePolicy - says what to do if the webhook admission server is unavailable. Choices are “Ignore” (fail open) or “Fail” (fail closed). Failing open makes for unpredictable behavior for all clients.

Authentication and trust

Because webhook admission plugins have a lot of power (remember, they get to see the API resource content of any request sent to them and might modify them for mutating plugins), it is important to consider:

How individual API servers verify their connection to the webhook admission server
How the webhook admission server authenticates precisely which API server is contacting it
Whether that particular API server has authorization to make the request There are three major categories of connection:

From kube-apiserver or extension-apiservers to externally hosted admission webhooks (webhooks not hosted in the cluster)
From kube-apiserver to self-hosted admission webhooks
From extension-apiservers to self-hosted admission webhooks To support these categories, the webhook admission plugins accept a kubeconfig file which describes how to connect to individual servers. For interacting with externally hosted admission webhooks, there is really no alternative to configuring that file manually since the authentication/authorization and access paths are owned by the server you’re hooking to.

For the self-hosted category, a cleverly built webhook admission server and topology can take advantage of the safe defaulting built into the admission plugin and have a secure, portable, zero-config topology that works from any API server.

Simple, secure, portable, zero-config topology

If you build your webhook admission server to also be an extension API server, it becomes possible to aggregate it as a normal API server. This has a number of advantages:

Your webhook becomes available like any other API under default kube-apiserver service kubernetes.default.svc (e.g. https://kubernetes.default.svc/apis/admission.example.com/v1/mymutatingadmissionreviews). Among other benefits, you can test using kubectl.
Your webhook automatically (without any config) makes use of the in-cluster authentication and authorization provided by kube-apiserver. You can restrict access to your webhook with normal RBAC rules.
Your extension API servers and kube-apiserver automatically (without any config) make use of their in-cluster credentials to communicate with the webhook.
Extension API servers do not leak their service account token to your webhook because they go through kube-apiserver, which is a secure front proxy.

_Source: https://drive.google.com/a/redhat.com/file/d/12nC9S2fWCbeX_P8nrmL6NgOSIha4HDNp_

In short: a secure topology makes use of all security mechanisms of API server aggregation and additionally requires no additional configuration.

Other topologies are possible but require additional manual configuration as well as a lot of effort to create a secure setup, especially when extension API servers like service catalog come into play. The topology above is zero-config and portable to every Kubernetes cluster.

How do I write a webhook admission server?

Writing a full server complete with authentication and authorization can be intimidating. To make it easier, there are projects based on Kubernetes 1.9 that provide a library for building your webhook admission server in 200 lines or less. Take a look at the generic-admission-apiserver and the kubernetes-namespace-reservation projects for the library and an example of how to build your own secure and portable webhook admission server.

With the admission webhooks introduced in 1.9 we’ve made Kubernetes even more adaptable to your needs. We hope this work, driven by both Red Hat and Google, will enable many more workloads and support ecosystem components. (Istio is one example.) Now is a good time to give it a try!

If you’re interested in giving feedback or contributing to this area, join us in the SIG API machinery.

↧

Blog: Introducing client-go version 6

January 11, 2018, 4:00 pm

≫ Next: Blog: Core Workloads API GA

≪ Previous: Blog: Extensible Admission is Beta

The Kubernetes API server exposes a REST interface consumable by any client. client-go is the official client library for the Go programming language. It is used both internally by Kubernetes itself (for example, inside kubectl) as well as by numerous external consumers:operators like the etcd-operator or prometheus-operator;higher level frameworks like KubeLess and OpenShift; and many more.

The version 6 update to client-go adds support for Kubernetes 1.9, allowing access to the latest Kubernetes features. While the changelog contains all the gory details, this blog post highlights the most prominent changes and intends to guide on how to upgrade from version 5.

This blog post is one of a number of efforts to make client-go more accessible to third party consumers. Easier access is a joint effort by a number of people from numerous companies, all meeting in the #client-go-docs channel of the Kubernetes Slack. We are happy to hear feedback and ideas for further improvement, and of course appreciate anybody who wants to contribute.

API group changes

The following API group promotions are part of Kubernetes 1.9:

Workload objects (Deployments, DaemonSets, ReplicaSets, and StatefulSets) have been promoted to the apps/v1 API group in Kubernetes 1.9. client-go follows this transition and allows developers to use the latest version by importing the k8s.io/api/apps/v1 package instead of k8s.io/api/apps/v1beta1 and by using Clientset.AppsV1().
Admission Webhook Registration has been promoted to the admissionregistration.k8s.io/v1beta1 API group in Kubernetes 1.9. The former ExternalAdmissionHookConfiguration type has been replaced by the incompatible ValidatingWebhookConfiguration and MutatingWebhookConfiguration types. Moreover, the webhook admission payload type AdmissionReview in admission.k8s.io has been promoted to v1beta1. Note that versioned objects are now passed to webhooks. Refer to the admission webhook documentation for details.

Validation for CustomResources

In Kubernetes 1.8 we introduced CustomResourceDefinitions (CRD) pre-persistence schema validation as an alpha feature. With 1.9, the feature got promoted to beta and will be enabled by default. As a client-go user, you will find the API types at k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1.

The OpenAPI v3 schema can be defined in the CRD spec as:


apiVersion: apiextensions.k8s.io/v1beta1  
kind: CustomResourceDefinition  
metadata: ...  
spec:  
  ...  
  validation:  
    openAPIV3Schema:  
      properties:  
        spec:  
          properties:  
            version:  
                type: string  
                enum:  
                - "v1.0.0"  
                - "v1.0.1"  
            replicas:  
                type: integer  
                minimum: 1  
                maximum: 10

The schema in the above CRD applies following validations for the instance:

spec.version must be a string and must be either “v1.0.0” or “v1.0.1”.
spec.replicas must be an integer and must have a minimum value of 1 and a maximum value of 10. A CustomResource with invalid values for spec.version (v1.0.2) and spec.replicas (15) will be rejected:


apiVersion: mygroup.example.com/v1  
kind: App  
metadata:  
  name: example-app  
spec:  
  version: "v1.0.2"  
  replicas: 15

$ kubectl create -f app.yaml

The App "example-app" is invalid: []: Invalid value: map[string]interface {}{"apiVersion":"mygroup.example.com/v1", "kind":"App", "metadata":map[string]interface {}{"creationTimestamp":"2017-08-31T20:52:54Z", "uid":"5c674651-8e8e-11e7-86ad-f0761cb232d1", "selfLink":"", "clusterName":"", "name":"example-app", "namespace":"default", "deletionTimestamp":interface {}(nil), "deletionGracePeriodSeconds":(\*int64)(nil)}, "spec":map[string]interface {}{"replicas":15, "version":"v1.0.2"}}:  
validation failure list:  
spec.replicas in body should be less than or equal to 10  
spec.version in body should be one of [v1.0.0 v1.0.1]

Note that with Admission Webhooks, Kubernetes 1.9 provides another beta feature to validate objects before they are created or updated. Starting with 1.9, these webhooks also allow mutation of objects (for example, to set defaults or to inject values). Of course, webhooks work with CRDs as well. Moreover, webhooks can be used to implement validations that are not easily expressible with CRD validation. Note that webhooks are harder to implement than CRD validation, so for many purposes, CRD validation is the right tool.

Creating namespaced informers

Often objects in one namespace or only with certain labels are to be processed in a controller. Informers now allow you to tweak the ListOptions used to query the API server to list and watch objects. Uninitialized objects (for consumption by initializers) can be made visible by setting IncludeUnitialized to true. All this can be done using the new NewFilteredSharedInformerFactory constructor for shared informers:


import “k8s.io/client-go/informers”
...  
sharedInformers := informers.NewFilteredSharedInformerFactory(  
 client,  
 30\*time.Minute,   
 “some-namespace”,  
 func(opt \*metav1.ListOptions) {  
  opt.LabelSelector = “foo=bar”  
 },  
)

Note that the corresponding lister will only know about the objects matching the namespace and the given ListOptions. Note that the same restrictions apply for a List or Watch call on a client.

This production code example of a cert-manager demonstrates how namespace informers can be used in real code.

Polymorphic scale client

Historically, only types in the extensions API group would work with autogenerated Scale clients. Furthermore, different API groups use different Scale types for their /scale subresources. To remedy these issues, k8s.io/client-go/scale provides a polymorphic scale client to scale different resources in different API groups in a coherent way:


import (


apimeta "k8s.io/apimachinery/pkg/api/meta"

 discocache "k8s.io/client-go/discovery/cached"  "k8s.io/client-go/discovery""k8s.io/client-go/dynamic"

“k8s.io/client-go/scale”  
)

...

cachedDiscovery := discocache.NewMemCacheClient(client.Discovery())  
restMapper := discovery.NewDeferredDiscoveryRESTMapper(

cachedDiscovery,

apimeta.InterfacesForUnstructured,

)  
scaleKindResolver := scale.NewDiscoveryScaleKindResolver(

client.Discovery(),

)  
scaleClient, err := scale.NewForConfig(

client, restMapper,

dynamic.LegacyAPIPathResolverFunc,

scaleKindResolver,

)
scale, err := scaleClient.Scales("default").Get(groupResource, "foo")

The returned scale object is generic and is exposed as the autoscaling/v1.Scale object. It is backed by an internal Scale type, with conversions defined to and from all the special Scale types in the API groups supporting scaling. We planto extend this to CustomResources in 1.10.

If you’re implementing support for the scale subresource, we recommend that you expose the autoscaling/v1.Scale object.

Type-safe DeepCopy

Deeply copying an object formerly required a call to Scheme.Copy(Object) with the notable disadvantage of losing type safety. A typical piece of code from client-go version 5 required type casting:


newObj, err := runtime.NewScheme().Copy(node)


if err != nil {

    return fmt.Errorf("failed to copy node %v: %s”, node, err)

}


newNode, ok := newObj.(\*v1.Node)

if !ok {

    return fmt.Errorf("failed to type-assert node %v", newObj)


}

Thanks to k8s.io/code-generator, Copy has now been replaced by a type-safe DeepCopy method living on each object, allowing you to simplify code significantly both in terms of volume and API error surface:

newNode := node.DeepCopy()

No error handling is necessary: this call never fails. If and only if the node is nil does DeepCopy() return nil.

To copy runtime.Objects there is an additional DeepCopyObject() method in the runtime.Object interface.

With the old method gone for good, clients need to update their copy invocations accordingly.

Code generation and CustomResources

Using client-go’s dynamic client to access CustomResources is discouraged and superseded by type-safe code using the generators in k8s.io/code-generator. Check out the Deep Dive on the Open Shift blog to learn about using code generation with client-go.

Comment Blocks

You can now place tags in the comment block just above a type or function, or in the second block above. There is no distinction anymore between these two comment blocks. This used to a be a source of subtle errors when using the generators:

// second block above  
// +k8s:some-tag  

// first block above  
// +k8s:another-tag  
type Foo struct {}

Custom Client Methods

You can now use extended tag definitions to create custom verbs . This lets you expand beyond the verbs defined by HTTP. This opens the door to higher levels of customization.

For example, this block leads to the generation of the method UpdateScale(s *autoscaling.Scale) (*autoscaling.Scale, error):

// genclient:method=UpdateScale,verb=update,subresource=scale,input=k8s.io/kubernetes/pkg/apis/autoscaling.Scale,result=k8s.io/kubernetes/pkg/apis/autoscaling.Scale

Resolving Golang Naming Conflicts

In more complex API groups it’s possible for Kinds, the group name, the Go package name, and the Go group alias name to conflict. This was not handled correctly prior to 1.9. The following tags resolve naming conflicts and make the generated code prettier:

// +groupName=example2.example.com  
// +groupGoName=SecondExample

These are usually in the doc.go file of an API package. The first is used as the CustomResource group name when RESTfully speaking to the API server using HTTP. The second is used in the generated Golang code (for example, in the clientset) to access the group version:

clientset.SecondExampleV1()

It’s finally possible to have dots in Go package names. In this section’s example, you would put the groupName snippet into the pkg/apis/example2.example.com directory of your project.

Example projects

Kubernetes 1.9 includes a number of example projects which can serve as a blueprint for your own projects:

k8s.io/sample-apiserver is a simple user-provided API server that is integrated into a cluster via API aggregation.
k8s.io/sample-controller is a full-featured controller (also called an operator) with shared informers and a workqueue to process created, changed or deleted objects. It is based on CustomResourceDefinitions and uses k8s.io/code-generator to generate deepcopy functions, typed clientsets, informers, and listers.

Vendoring

In order to update from the previous version 5 to version 6 of client-go, the library itself as well as certain third-party dependencies must be updated. Previously, this process had been tedious due to the fact that a lot of code got refactored or relocated within the existing package layout across releases. Fortunately, far less code had to move in the latest version, which should ease the upgrade procedure for most users.

State of the published repositories

In the past k8s.io/client-go, k8s.io/api, and k8s.io/apimachinery were updated infrequently. Tags (for example, v4.0.0) were created quite some time after the Kubernetes releases. With the 1.9 release we resumed running a nightly bot that updates all the repositories for public consumption, even before manual tagging. This includes the branches:

master
release-1.8 / release-5.0
release-1.9 / release-6.0 Kubernetes tags (for example, v1.9.1-beta1) are also applied automatically to the published repositories, prefixed with kubernetes- (for example, kubernetes-1.9.1-beta1).

These tags have limited test coverage, but can be used by early adopters of client-go and the other libraries. Moreover, they help to vendor the correct version of k8s.io/api and k8s.io/apimachinery. Note that we only create a v6.0.3-like semantic versioning tag on k8s.io/client-go. The corresponding tag for k8s.io/api and k8s.io/apimachinery is kubernetes-1.9.3.

Also note that only these tags correspond to tested releases of Kubernetes. If you depend on the release branch, e.g., release-1.9, your client is running on unreleased Kubernetes code.

State of vendoring of client-go

In general, the list of which dependencies to vendor is automatically generated and written to the file Godeps/Godeps.json. Only the revisions listed there are tested. This means especially that we do not and cannot test the code-base against master branches of our dependencies. This puts us in the following situation depending on the used vendoring tool:

godep reads Godeps/Godeps.json by running godep restore from k8s.io/client-go in your GOPATH. Then use godep save to vendor in your project. godep will choose the correct versions from your GOPATH.
glide reads Godeps/Godeps.json automatically from its dependencies including from k8s.io/client-go, both on init and on update. Hence, glide should be mostly automatic as long as there are no conflicts.
dep does not currently respect Godeps/Godeps.json in a consistent way, especially not on updates. It is crucial to specify client-go dependencies manually as constraints or overrides, also for non k8s.io/* dependencies. Without those, dep simply chooses the dependency master branches, which can cause problems as they are updated frequently.
The Kubernetes and golang/dep community are aware of the problems [issue #1124, issue #1236] and are working together on solutions. Until then special care must be taken. Please see client-go’s INSTALL.md for more details.

Updating dependencies – golang/dep

Even with the deficiencies of golang/dep today, dep is slowly becoming the de-facto standard in the Go ecosystem. With the necessary care and the awareness of the missing features, dep can be (and is!) used successfully. Here’s a demonstration of how to update a project with client-go 5 to the latest version 6 using dep:

(If you are still running client-go version 4 and want to play it safe by not skipping a release, now is a good time to check out this excellent blog post describing how to upgrade to version 5, put together by our friends at Heptio.)

Before starting, it is important to understand that client-go depends on two other Kubernetes projects: k8s.io/apimachinery and k8s.io/api. In addition, if you are using CRDs, you probably also depend on k8s.io/apiextensions-apiserver for the CRD client. The first exposes lower-level API mechanics (such as schemes, serialization, and type conversion), the second holds API definitions, and the third provides APIs related to CustomResourceDefinitions. In order for client-go to operate correctly, it needs to have its companion libraries vendored in correspondingly matching versions. Each library repository provides a branch named release-<version> where <version> refers to a particular Kubernetes version; for client-go version 6, it is imperative to refer to the release-1.9 branch on each repository.

Assuming the latest version 5 patch release of client-go being vendored through dep, the Gopkg.toml manifest file should look something like this (possibly using branches instead of versions):






[[constraint]]


  name = "k8s.io/api"

  version = "kubernetes-1.8.1"


[[constraint]]

  name = "k8s.io/apimachinery"

  version = "kubernetes-1.8.1"


[[constraint]]

  name = "k8s.io/apiextensions-apiserver"

  version = "kubernetes-1.8.1"


[[constraint]]

  name = "k8s.io/client-go"




  version = "5.0.1"

Note that some of the libraries could be missing if they are not actually needed by the client.

Upgrading to client-go version 6 means bumping the version and tag identifiers as following ( emphasis given):






[constraint]]


  name = "k8s.io/api"

  version = "kubernetes-1.9.0"


[[constraint]]

  name = "k8s.io/apimachinery"

  version = "kubernetes-1.9.0"


[[constraint]]

  name = "k8s.io/apiextensions-apiserver"

  version = "kubernetes-1.9.0"


[[constraint]]

  name = "k8s.io/client-go"




  version = "6.0.0"

The result of the upgrade can be found here.

A note of caution: dep cannot capture the complete set of dependencies in a reliable and reproducible fashion as described above. This means that for a 100% future-proof project you have to add constraints (or even overrides) to many other packages listed in client-go’s Godeps/Godeps.json. Be prepared to add them if something breaks. We are working with the golang/dep community to make this an easier and more smooth experience.

Finally, we need to tell dep to upgrade to the specified versions by executing dep ensure. If everything goes well, the output of the command invocation should be empty, with the only indication that it was successful being a number of updated files inside the vendor folder.

If you are using CRDs, you probably also use code-generation. The following block for Gopkg.toml will add the required code-generation packages to your project:


required = [  "k8s.io/code-generator/cmd/client-gen",  "k8s.io/code-generator/cmd/conversion-gen",  "k8s.io/code-generator/cmd/deepcopy-gen",  "k8s.io/code-generator/cmd/defaulter-gen",  "k8s.io/code-generator/cmd/informer-gen",  "k8s.io/code-generator/cmd/lister-gen",  
]


[[constraint]]

  branch = "kubernetes-1.9.0"


  name = "k8s.io/code-generator"

Whether you would also like to prune unneeded packages (such as test files) through dep or commit the changes into the VCS at this point is up to you – but from an upgrade perspective, you should now be ready to harness all the fancy new features that Kubernetes 1.9 brings through client-go.

↧

Blog: Core Workloads API GA

January 14, 2018, 4:00 pm

≫ Next: Blog: Reporting Errors from Control Plane to Applications Using Kubernetes Events

≪ Previous: Blog: Introducing client-go version 6

DaemonSet, Deployment, ReplicaSet, and StatefulSet are GA

Editor’s Note: We’re happy to announce that the Core Workloads API is GA in Kubernetes 1.9! This blog post from Kenneth Owens reviews how Core Workloads got to GA from its origins, reveals changes in 1.9, and talks about what you can expect going forward.

In the Beginning …

There were Pods, tightly coupled containers that share resource requirements, networking, storage, and a lifecycle. Pods were useful, but, as it turns out, users wanted to seamlessly, reproducibly, and automatically create many identical replicas of the same Pod, so we created ReplicationController.

Replication was a step forward, but what users really needed was higher level orchestration of their replicated Pods. They wanted rolling updates, roll backs, and roll overs. So the OpenShift team created DeploymentConfig. DeploymentConfigs were also useful, and OpenShift users were happy. In order to allow all OSS Kubernetes uses to share in the elation, and to take advantage of set-based label selectors, ReplicaSet and Deployment were added to the extensions/v1beta1 group version providing rolling updates, roll backs, and roll overs for all Kubernetes users.

That mostly solved the problem of orchestrating containerized 12 factor apps on Kubernetes, so the community turned its attention to a different problem. Replicating a Pod <n> times isn’t the right hammer for every nail in your cluster. Sometimes, you need to run a Pod on every Node, or on a subset of Nodes (for example, shared side cars like log shippers and metrics collectors, Kubernetes add-ons, and Distributed File Systems). The state of the art was Pods combined with NodeSelectors, or static Pods, but this is unwieldy. After having grown used to the ease of automation provided by Deployments, users demanded the same features for this category of application, so DaemonSet was added to extension/v1beta1 as well.

For a time, users were content, until they decided that Kubernetes needed to be able to orchestrate more than just 12 factor apps and cluster infrastructure. Whether your architecture is N-tier, service oriented, or micro-service oriented, your 12 factor apps depend on stateful workloads (for example, RDBMSs, distributed key value stores, and messaging queues) to provide services to end users and other applications. These stateful workloads can have availability and durability requirements that can only be achieved by distributed systems, and users were ready to use Kubernetes to orchestrate the entire stack.

While Deployments are great for stateless workloads, they don’t provide the right guarantees for the orchestration of distributed systems. These applications can require stable network identities, ordered, sequential deployment, updates, and deletion, and stable, durable storage. PetSet was added to the apps/v1beta1 group version to address this category of application. Unfortunately, we were less than thoughtful with its naming, and, as we always strive to be an inclusive community, we renamed the kind to StatefulSet.

Finally, we were done.

…Or were we?

Kubernetes 1.8 and apps/v1beta2

Pod, ReplicationController, ReplicaSet, Deployment, DaemonSet, and StatefulSet came to collectively be known as the core workloads API. We could finally orchestrate all of the things, but the API surface was spread across three groups, had many inconsistencies, and left users wondering about the stability of each of the core workloads kinds. It was time to stop adding new features and focus on consistency and stability.

Pod and ReplicationController were at GA stability, and even though you can run a workload in a Pod, it’s a nucleus primitive that belongs in core. As Deployments are the recommended way to manage your stateless apps, moving ReplicationController would serve no purpose. In Kubernetes 1.8, we moved all the other core workloads API kinds (Deployment, DaemonSet, ReplicaSet, and StatefulSet) to the apps/v1beta2 group version. This had the benefit of providing a better aggregation across the API surface, and allowing us to break backward compatibility to fix inconsistencies. Our plan was to promote this new surface to GA, wholesale and as is, when we were satisfied with its completeness. The modifications in this release, which are also implemented in apps/v1, are described below.

Selector Defaulting Deprecated

In prior versions of the apps and extensions groups, label selectors of the core workloads API kinds were, when left unspecified, defaulted to a label selector generated from the kind’s template’s labels.

This was completely incompatible with strategic merge patch and kubectl apply. Moreover, we’ve found that defaulting the value of a field from the value of another field of the same object is an anti-pattern, in general, and particularly dangerous for the API objects used to orchestrate workloads.

Immutable Selectors

Selector mutation, while allowing for some use cases like promotable Deployment canaries, is not handled gracefully by our workload controllers, and we have always strongly cautioned users against it. To provide a consistent, usable, and stable API, selectors were made immutable for all kinds in the workloads API.

We believe that there are better ways to support features like promotable canaries and orchestrated Pod relabeling, but, if restricted selector mutation is a necessary feature for our users, we can relax immutability in the future without breaking backward compatibility.

The development of features like promotable canaries, orchestrated Pod relabeling, and restricted selector mutability is driven by demand signals from our users. If you are currently modifying the selectors of your core workload API objects, please tell us about your use case via a GitHub issue, or by participating in SIG apps.

Default Rolling Updates

Prior to apps/v1beta2, some kinds defaulted their update strategy to something other than RollingUpdate (e.g. app/v1beta1/StatefulSet uses OnDelete by default). We wanted to be confident that RollingUpdate worked well prior to making it the default update strategy, and we couldn’t change the default behavior in released versions without breaking our promise with respect to backward compatibility. In apps/v1beta2 we enabled RollingUpdate for all core workloads kinds by default.

CreatedBy Annotation Deprecated

The “kubernetes.io/created-by” was a legacy hold over from the days before garbage collection. Users should use an object’s ControllerRef from its ownerReferences to determine object ownership. We deprecated this feature in 1.8 and removed it in 1.9.

Scale Subresources

A scale subresource was added to all of the applicable kinds in apps/v1beta2 (DaemonSet scales based on its node selector).

Kubernetes 1.9 and apps/v1

In Kubernetes 1.9, as planned, we promoted the entire core workloads API surface to GA in the apps/v1 group version. We made a few more changes to make the API consistent, but apps/v1 is mostly identical to apps/v1beta2. The reality is that most users have been treating the beta versions of the core workloads API as GA for some time now. Anyone who is still using ReplicationControllers and shying away from DaemonSets, Deployments, and StatefulSets, due to a perceived lack of stability, should plan migrate their workloads (where applicable) to apps/v1. The minor changes that were made during promotion are described below.

Garbage Collection Defaults to Delete

Prior to apps/v1 the default garbage collection policy for Pods in a DaemonSet, Deployment, ReplicaSet, or StatefulSet, was to orphan the Pods. That is, if you deleted one of these kinds, the Pods that they owned would not be deleted automatically unless cascading deletion was explicitly specified. If you use kubectl, you probably didn’t notice this, as these kinds are scaled to zero prior to deletion. In apps/v1 all core worloads API objects will now, by default, be deleted when their owner is deleted. For most users, this change is transparent.
Status Conditions

Prior to apps/v1 only Deployment and ReplicaSet had Conditions in their Status objects. For consistency’s sake, either all of the objects or none of them should have conditions. After some debate, we decided that Conditions are useful, and we added Conditions to StatefulSetStatus and DaemonSetStatus. The StatefulSet and DaemonSet controllers currently don’t populate them, but we may choose communicate conditions to clients, via this mechanism, in the future.

Scale Subresource Migrated to autoscale/v1

We originally added a scale subresource to the apps group. This was the wrong direction for integration with the autoscaling, and, at some point, we would like to use custom metrics to autoscale StatefulSets. So the apps/v1 group version uses the autoscaling/v1 scale subresource.

Migration and Deprecation

The question most you’re probably asking now is, “What’s my migration path onto apps/v1 and how soon should I plan on migrating?” All of the group versions prior to apps/v1 are deprecated as of Kubernetes 1.9, and all new code should be developed against apps/v1, but, as discussed above, many of our users treat extensions/v1beta1 as if it were GA. We realize this, and the minimum support timelines in our deprecation policy are just that, minimums.

In future releases, before completely removing any of the group versions, we will disable them by default in the API Server. At this point, you will still be able to use the group version, but you will have to explicitly enable it. We will also provide utilities to upgrade the storage version of the API objects to apps/v1. Remember, all of the versions of the core workloads kinds are bidirectionally convertible. If you want to manually update your core workloads API objects now, you can use kubectl convert to convert manifests between group versions.

What’s Next?

The core workloads API surface is stable, but it’s still software, and software is never complete. We often add features to stable APIs to support new use cases, and we will likely do so for the core workloads API as well. GA stability means that any new features that we do add will be strictly backward compatible with the existing API surface. From this point forward, nothing we do will break our backwards compatibility guarantees. If you’re looking to participate in the evolution of this portion of the API, please feel free to get involved in GitHub or to participate in SIG Apps.

–Kenneth Owens, Software Engineer, Google

Download Kubernetes
Get involved with the Kubernetes project on GitHub
Post questions (or answer questions) on Stack Overflow
Connect with the community on Slack
Follow us on Twitter @Kubernetesio for latest updates

↧

Blog: Reporting Errors from Control Plane to Applications Using Kubernetes Events

January 24, 2018, 4:00 pm

≫ Next: Blog: Kubernetes: First Beta Version of Kubernetes 1.10 is Here

≪ Previous: Blog: Core Workloads API GA

At Box, we manage several large scale Kubernetes clusters that serve as an internal platform as a service (PaaS) for hundreds of deployed microservices. The majority of those microservices are applications that power box.com for over 80,000 customers. The PaaS team also deploys several services affiliated with the platform infrastructure as the control plane.

One use case of Box’s control plane is public key infrastructure (PKI) processing. In our infrastructure, applications needing a new SSL certificate also need to trigger some processing in the control plane. The majority of our applications are not allowed to generate new SSL certificates due to security reasons. The control plane has a different security boundary and network access, and is therefore allowed to generate certificates.

| | | Figure1: Block Diagram of the PKI flow |

If an application needs a new certificate, the application owner explicitly adds a Custom Resource Definition (CRD) to the application’s Kubernetes config [1]. This CRD specifies parameters for the SSL certificate: name, common name, and others. A microservice in the control plane watches CRDs and triggers some processing for SSL certificate generation [2]. Once the certificate is ready, the same control plane service sends it to the API server in a Kubernetes Secret [3]. After that, the application containers access their certificates using Kubernetes Secret VolumeMounts [4]. You can see a working demo of this system in our example application on GitHub.

The rest of this post covers the error scenarios in this “triggered” processing in the control plane. In particular, we are especially concerned with user input errors. Because the SSL certificate parameters come from the application’s config file in a CRD format, what should happen if there is an error in that CRD specification? Even a typo results in a failure of the SSL certificate creation. The error information is available in the control plane even though the root cause is most probably inside the application’s config file. The application owner does not have access to the control plane’s state or logs.

Providing the right diagnosis to the application owner so she can fix the mistake becomes a serious productivity problem at scale. Box’s rapid migration to microservices results in several new deployments every week. Numerous first time users, who do not know every detail of the infrastructure, need to succeed in deploying their services and troubleshooting problems easily. As the owners of the infrastructure, we do not want to be the bottleneck while reading the errors from the control plane logs and passing them on to application owners. If something in an owner’s configuration causes an error somewhere else, owners need a fully empowering diagnosis. This error data must flow automatically without any human involvement.

After considerable thought and experimentation, we found that Kubernetes Events work great to automatically communicate these kind of errors. If the error information is placed in a pod’s event stream, it shows up in kubectl describe output. Even beginner users can execute kubectl describe pod and obtain an error diagnosis.

We experimented with a status web page for the control plane service as an alternative to Kubernetes Events. We determined that the status page could update every time after processing an SSL certificate, and that application owners could probe the status page and get the diagnosis from there. After experimenting with a status page initially, we have seen that this does not work as effectively as the Kubernetes Events solution. The status page becomes a new interface to learn for the application owner, a new web address to remember, and one more context switch to a distinct tool during troubleshooting efforts. On the other hand, Kubernetes Events show up cleanly at the kubectl describe output, which is easily recognized by the developers.

Here is a simplified example showing how we used Kubernetes Events for error reporting across distinct services. We have open sourced a sample golang application representative of the previously mentioned control plane service. It watches changes on CRDs and does input parameter checking. If an error is discovered, a Kubernetes Event is generated and the relevant pod’s event stream is updated.

The sample application executes this code to setup the Kubernetes Event generation:

// eventRecorder returns an EventRecorder type that can be  
// used to post Events to different object's lifecycles.  
func eventRecorder(  
   kubeClient \*kubernetes.Clientset) (record.EventRecorder, error) {  
   eventBroadcaster := record.NewBroadcaster()  
   eventBroadcaster.StartLogging(glog.Infof)  
   eventBroadcaster.StartRecordingToSink(  &typedcorev1.EventSinkImpl{  
         Interface: kubeClient.CoreV1().Events("")})  
   recorder := eventBroadcaster.NewRecorder(  
      scheme.Scheme,  
      v1.EventSource{Component: "controlplane"})  
   return recorder, nil  
}

After the one-time setup, the following code generates events affiliated with pods:

ref, err := reference.GetReference(scheme.Scheme, &pod)  
if err != nil {  
   glog.Fatalf("Could not get reference for pod %v: %v\n",  
      pod.Name, err)  
}  
recorder.Event(ref, v1.EventTypeWarning, "pki ServiceName error",  
   fmt.Sprintf("ServiceName: %s in pki: %s is not found in"+  " allowedNames: %s", pki.Spec.ServiceName, pki.Name,  
      allowedNames))

Further implementation details can be understood by running the sample application.

As mentioned previously, here is the relevant kubectl describe output for the application owner.

Events:  
  FirstSeen   LastSeen   Count   From         SubObjectPath   Type      Reason         Message  
  ---------   --------   -----   ----         -------------   --------   ------     
  ....  
  1d      1m      24   controlplane            Warning      pki ServiceName error   ServiceName: appp1 in pki: app1-pki is not found in allowedNames: [app1 app2]  
  ....

We have demonstrated a practical use case with Kubernetes Events. The automated feedback to programmers in the case of configuration errors has significantly improved our troubleshooting efforts. In the future, we plan to use Kubernetes Events in various other applications under similar use cases. The recently created sample-controller example also utilizes Kubernetes Events in a similar scenario. It is great to see there are more sample applications to guide the community. We are excited to continue exploring other use cases for Events and the rest of the Kubernetes API to make development easier for our engineers.

If you have a Kubernetes experience you’d like to share, submit your story. If you use Kubernetes in your organization and want to voice your experience more directly, consider joining the CNCF End User Community that Box and dozens of like-minded companies are part of.

Special thanks for Greg Lyons and Mohit Soni for their contributions.
Hakan Baba, Sr. Software Engineer, Box

↧

Blog: Kubernetes: First Beta Version of Kubernetes 1.10 is Here

March 1, 2018, 4:00 pm

≫ Next: Blog: Apache Spark 2.3 with Native Kubernetes Support

≪ Previous: Blog: Reporting Errors from Control Plane to Applications Using Kubernetes Events

Editor’s note: Today’s post is by Nick Chase. Nick is Head of Content at Mirantis. The Kubernetes community has released the first beta version of Kubernetes 1.10, which means you can now try out some of the new features and give your feedback to the release team ahead of the official release. The release, currently scheduled for March 21, 2018, is targeting the inclusion of more than a dozen brand new alpha features and more mature versions of more than two dozen more.

Specifically, Kubernetes 1.10 will include production-ready versions of Kubelet TLS Bootstrapping, API aggregation, and more detailed storage metrics.

Some of these features will look familiar because they emerged at earlier stages in previous releases. Each stage has specific meanings:

stable: The same as “generally available”, features in this stage have been thoroughly tested and can be used in production environments.
beta: The feature has been around long enough that the team is confident that the feature itself is on track to be included as a stable feature, and any API calls aren’t going to change. You can use and test these features, but including them in mission-critical production environments is not advised because they are not completely hardened.
alpha: New features generally come in at this stage. These features are still being explored. APIs and options may change in future versions, or the feature itself may disappear. Definitely not for production environments. You can download the latest release of Kubernetes 1.10 from . To give feedback to the development community, create an issue in the Kubernetes 1.10 milestone and tag the appropriate SIG before March 9.

Here’s what to look for, though you should remember that while this is the current plan as of this writing, there’s always a possibility that one or more features may be held for a future release. We’ll start with authentication.

Authentication (SIG-Auth)

Kubelet TLS Bootstrap (stable): Kubelet TLS bootstrapping is probably the “headliner” of the Kubernetes 1.10 release as it becomes available for production environments. It provides the ability for a new kubelet to create a certificate signing request, which enables you to add new nodes to your cluster without having to either manually add security certificates or use self-signed certificates that eliminate many of the benefits of having certificates in the first place.
Pod Security Policy moves to its own API group (beta): The beta release of the Pod Security Policy lets administrators decide what contexts pods can run in. In other words, you have the ability to prevent unprivileged users from creating privileged pods – that is, pods that can perform actions such as writing files or accessing Secrets – in particular namespaces.
Limit node access to API (beta): Also in beta, you now have the ability to limit calls to the API on a node to just that specific node, and to ensure that a node is only calling its own API, and not those on other nodes.
External client-go credential providers (alpha): client-go is the Go language client for accessing the Kubernetes API. This feature adds the ability to add external credential providers. For example, Amazon might want to create its own authenticator to validate interaction with EKS clusters; this feature enables them to do that without having to include their authenticator in the Kubernetes codebase.
TokenRequest API (alpha): The TokenRequest API provides the groundwork for much needed improvements to service account tokens; this feature enables creation of tokens that aren’t persisted in the Secrets API, that are targeted for specific audiences (such as external secret stores), have configurable expiries, and are bindable to specific pods.

Networking (SIG-Network)

Support configurable pod resolv.conf (beta): You now have the ability to specifically control DNS for a single pod, rather than relying on the overall cluster DNS.
Although the feature is called Switch default DNS plugin to CoreDNS (beta), that’s not actually what will happen in this cycle. The community has been working on the switch from kube-dns, which includes dnsmasq, to CoreDNS, another CNCF project with fewer moving parts, for several releases. In Kubernetes 1.10, the default will still be kube-dns, but when CoreDNS reaches feature parity with kube-dns, the team will look at making it the default.
Topology aware routing of services (alpha): The ability to distribute workloads is one of the advantages of Kubernetes, but one thing that has been missing until now is the ability to keep workloads and services geographically close together for latency purposes. Topology aware routing will help with this problem. (This functionality may be delayed until Kubernetes 1.11.)
Make NodePort IP address configurable (alpha): Not having to specify IP addresses in a Kubernetes cluster is great – until you actually need to know what one of those addresses is ahead of time, such as for setting up database replication or other tasks. You will now have the ability to specifically configure NodePort IP addresses to solve this problem. (This functionality may be delayed until Kubernetes 1.11.)

Kubernetes APIs (SIG-API-machinery)

API Aggregation (stable): Kubernetes makes it possible to extend its API by creating your own functionality and registering your functions so that they can be served alongside the core K8s functionality. This capability will be upgraded to “stable” in Kubernetes 1.10, so you can use it in production. Additionally, SIG-CLI is adding a feature called kubectl get and describe should work well with extensions (alpha) to make the server, rather than the client, return this information for a smoother user experience.
Support for self-hosting authorizer webhook (alpha): Earlier versions of Kubernetes brought us the authorizer webhooks, which make it possible to customize the enforcement of permissions before commands are executed. Those webhooks, however, have to live somewhere, and this new feature makes it possible to host them in the cluster itself.

Storage (SIG-Storage)

Detailed storage metrics of internal state (stable): With a distributed system such as Kubernetes, it’s particularly important to know what’s going on inside the system at any given time, either for troubleshooting purposes or simply for automation. This release brings to general availability detailed metrics of what’s going in inside the storage systems, including metrics such as mount and unmount time, number of volumes in a particular state, and number of orphaned pod directories. You can find a full list in this design document.
Mount namespace propagation (beta): This feature allows a container to mount a volume as rslave so that host mounts can be seen inside the container, or as rshared so that any mounts from inside the container are reflected in the host’s mount namespace. The default for this feature is rslave.
Local Ephemeral Storage Capacity Isolation (beta): Without this feature in place, every pod on a node that is using ephemeral storage is pulling from the same pool, and allocating storage is on a “best-effort” basis; in other words, a pod never knows for sure how much space it has available. This function provides the ability for a pod to reserve its own storage.
Out-of-tree CSI Volume Plugins (beta): Kubernetes 1.9 announced the release of the Container Storage Interface, which provides a standard way for vendors to provide storage to Kubernetes. This function makes it possible for them to create drivers that live “out-of-tree”, or out of the normal Kubernetes core. This means that vendors can control their own plugins and don’t have to rely on the community for code reviews and approvals.
Local Persistent Storage (beta): This feature enables PersistentVolumes to be created with locally attached disks, and not just network volumes.
Prevent deletion of Persistent Volume Claims that are used by a pod (beta) and 7. Prevent deletion of Persistent Volume that is bound to a Persistent Volume Claim (beta): In previous versions of Kubernetes it was possible to delete storage that is in use by a pod, causing massive problems for the pod. These features provide validation that prevents that from happening.
Running out of storage space on your Persistent Volume? If you are, you can use Add support for online resizing of PVs (alpha) to enlarge the underlying volume it without disrupting existing data. This also works in conjunction with the new Add resize support for FlexVolume (alpha); FlexVolumes are vendor-supported volumes implemented through FlexVolume plugins.
Topology Aware Volume Scheduling (beta): This feature enables you to specify topology constraints on PersistentVolumes and have those constraints evaluated by the scheduler. It also delays the initial PersistentVolumeClaim binding until the Pod has been scheduled so that the volume binding decision is smarter and considers all Pod scheduling constraints as well. At the moment, this feature is most useful for local persistent volumes, but support for dynamic provisioning is under development.

Node management (SIG-Node)

Dynamic Kubelet Configuration (beta): Kubernetes makes it easy to make changes to existing clusters, such as increasing the number of replicas or making a service available over the network. This feature makes it possible to change Kubernetes itself (or rather, the Kubelet that runs Kubernetes behind the scenes) without bringing down the node on which Kubelet is running.
CRI validation test suite (beta): The Container Runtime Interface (CRI) makes it possible to run containers other than Docker (such as Rkt containers or even virtual machines using Virtlet) on Kubernetes. This features provides a suite of validation tests to make certain that these CRI implementations are compliant, enabling developers to more easily find problems.
Configurable Pod Process Namespace Sharing (alpha): Although pods can easily share the Kubernetes namespace, the process, or PID namespace has been a more difficult issue due to lack of support in Docker. This feature enables you to set a parameter on the pod to determine whether containers get their own operating system processes or share a single process.
Add support for Windows Container Configuration in CRI (alpha): The Container Runtime Interface was originally designed with Linux-based containers in mind, and it was impossible to implement support for Windows-based containers using CRI. This feature solves that problem, making it possible to specify a WindowsContainerConfig.
Debug Containers (alpha): It’s easy to debug a container if you have the appropriate utilities. But what if you don’t? This feature makes it possible to run debugging tools on a container even if those tools weren’t included in the original container image.

Other changes:

Deployment (SIG-Cluster Lifecycle): Support out-of-process and out-of-tree cloud providers (beta): As Kubernetes gains acceptance, more and more cloud providers will want to make it available. To do that more easily, the community is working on extracting provider-specific binaries so that they can be more easily replaced.
Kubernetes on Azure (SIG-Azure): Kubernetes has a cluster-autoscaler that automatically adds nodes to your cluster if you’re running too many workloads, but until now it wasn’t available on Azure. The Add Azure support to cluster-autoscaler (alpha) feature aims to fix that. Closely related, the Add support for Azure Virtual Machine Scale Sets (alpha) feature makes use of Azure’s own autoscaling capabilities to make resources available. You can download the Kubernetes 1.10 beta from . Again, if you’ve got feedback (and the community hopes you do) please add an issue to the 1.10 milestone and tag the relevant SIG before March 9.
_
(Many thanks to community members Michelle Au, Jan Šafránek, Eric Chiang, Michał Nasiadka, Radosław Pieczonka, Xing Yang, Daniel Smith, sylvain boily, Leo Sunmo, Michal Masłowski, Fernando Ripoll, ayodele abejide, Brett Kochendorfer, Andrew Randall, Casey Davenport, Duffie Cooley, Bryan Venteicher, Mark Ayers, Christopher Luciano, and Sandor Szuecs for their invaluable help in reviewing this article for accuracy.)_

↧

Blog: Apache Spark 2.3 with Native Kubernetes Support

March 5, 2018, 4:00 pm

≫ Next: Blog: How to Integrate RollingUpdate Strategy for TPR in Kubernetes

≪ Previous: Blog: Kubernetes: First Beta Version of Kubernetes 1.10 is Here

Kubernetes and Big Data

The open source community has been working over the past year to enable first-class support for data processing, data analytics and machine learning workloads in Kubernetes. New extensibility features in Kubernetes, such as custom resources and custom controllers, can be used to create deep integrations with individual applications and frameworks.

Traditionally, data processing workloads have been run in dedicated setups like the YARN/Hadoop stack. However, unifying the control plane for all workloads on Kubernetes simplifies cluster management and can improve resource utilization.

“Bloomberg has invested heavily in machine learning and NLP to give our clients a competitive edge when it comes to the news and financial information that powers their investment decisions. By building our Data Science Platform on top of Kubernetes, we’re making state-of-the-art data science tools like Spark, TensorFlow, and our sizable GPU footprint accessible to the company’s 5,000+ software engineers in a consistent, easy-to-use way.” - Steven Bower, Team Lead, Search and Data Science Infrastructure at Bloomberg

Introducing Apache Spark + Kubernetes

Apache Spark 2.3 with native Kubernetes support combines the best of the two prominent open source projects — Apache Spark, a framework for large-scale data processing; and Kubernetes.

Apache Spark is an essential tool for data scientists, offering a robust platform for a variety of applications ranging from large scale data transformation to analytics to machine learning. Data scientists are adopting containers en masse to improve their workflows by realizing benefits such as packaging of dependencies and creating reproducible artifacts. Given that Kubernetes is the de facto standard for managing containerized environments, it is a natural fit to have support for Kubernetes APIs within Spark.

Starting with Spark 2.3, users can run Spark workloads in an existing Kubernetes 1.7+ cluster and take advantage of Apache Spark’s ability to manage distributed data processing tasks. Apache Spark workloads can make direct use of Kubernetes clusters for multi-tenancy and sharing through Namespaces and Quotas, as well as administrative features such as Pluggable Authorization and Logging. Best of all, it requires no changes or new installations on your Kubernetes cluster; simply create a container image and set up the right RBAC roles for your Spark Application and you’re all set.

Concretely, a native Spark Application in Kubernetes acts as a custom controller, which creates Kubernetes resources in response to requests made by the Spark scheduler. In contrast with deploying Apache Spark in Standalone Mode in Kubernetes, the native approach offers fine-grained management of Spark Applications, improved elasticity, and seamless integration with logging and monitoring solutions. The community is also exploring advanced use cases such as managing streaming workloads and leveraging service meshes like Istio.

To try this yourself on a Kubernetes cluster, simply download the binaries for the official Apache Spark 2.3 release. For example, below, we describe running a simple Spark application to compute the mathematical constant Pi across three Spark executors, each running in a separate pod. Please note that this requires a cluster running Kubernetes 1.7 or above, a kubectl client that is configured to access it, and the necessary RBAC rules for the default namespace and service account.

$ kubectl cluster-info  

Kubernetes master is running at https://xx.yy.zz.ww

$ bin/spark-submit

   --master k8s://https://xx.yy.zz.ww

   --deploy-mode cluster

   --name spark-pi

   --class org.apache.spark.examples.SparkPi

   --conf spark.executor.instances=5

   --conf spark.kubernetes.container.image=

   --conf spark.kubernetes.driver.pod.name=spark-pi-driver

   local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

To watch Spark resources that are created on the cluster, you can use the following kubectl command in a separate terminal window.

$ kubectl get pods -l 'spark-role in (driver, executor)' -w

NAME              READY STATUS  RESTARTS AGE

spark-pi-driver   1/1 Running  0 14s

spark-pi-da1968a859653d6bab93f8e6503935f2-exec-1   0/1 Pending 0 0s

The results can be streamed during job execution by running:


$ kubectl logs -f spark-pi-driver

When the application completes, you should see the computed value of Pi in the driver logs.

In Spark 2.3, we’re starting with support for Spark applications written in Java and Scala with support for resource localization from a variety of data sources including HTTP, GCS, HDFS, and more. We have also paid close attention to failure and recovery semantics for Spark executors to provide a strong foundation to build upon in the future. Get started with the open-source documentation today.

Get Involved

There’s lots of exciting work to be done in the near future. We’re actively working on features such as dynamic resource allocation, in-cluster staging of dependencies, support for PySpark & SparkR, support for Kerberized HDFS clusters, as well as client-mode and popular notebooks’ interactive execution environments. For people who fell in love with the Kubernetes way of managing applications declaratively, we’ve also been working on a Kubernetes Operator for spark-submit, which allows users to declaratively specify and submit Spark Applications.

And we’re just getting started! We would love for you to get involved and help us evolve the project further.

Huge thanks to the Apache Spark and Kubernetes contributors spread across multiple organizations who spent many hundreds of hours working on this effort. We look forward to seeing more of you contribute to the project and help it evolve further.

Anirudh Ramanathan and Palak Bhatia
Google

↧

Blog: How to Integrate RollingUpdate Strategy for TPR in Kubernetes

March 12, 2018, 5:00 pm

≫ Next: Blog: Expanding User Support with Office Hours

≪ Previous: Blog: Apache Spark 2.3 with Native Kubernetes Support

With Kubernetes, it’s easy to manage and scale stateless applications like web apps and API services right out of the box. To date, almost all of the talks about Kubernetes has been about microservices and stateless applications.

With the popularity of container-based microservice architectures, there is a strong need to deploy and manage RDBMS(Relational Database Management Systems). RDBMS requires experienced database-specific knowledge to correctly scale, upgrade, and re-configure while protecting against data loss or unavailability.

For example, MySQL (the most popular open source RDBMS) needs to store data in files that are persistent and exclusive to each MySQL database’s storage. Each MySQL database needs to be individually distinct, another, more complex is in cluster that need to distinguish one MySQL database from a cluster as a different role, such as master, slave, or shard. High availability and zero data loss are also hard to accomplish when replacing database nodes on failed machines.

Using powerful Kubernetes API extension mechanisms, we can encode RDBMS domain knowledge into software, named WQ-RDS, running atop Kubernetes like built-in resources.

WQ-RDS leverages Kubernetes primitive resources and controllers, it deliveries a number of enterprise-grade features and brings a significantly reliable way to automate time-consuming operational tasks like database setup, patching backups, and setting up high availability clusters. WQ-RDS supports mainstream versions of Oracle and MySQL (both compatible with MariaDB).

Let’s demonstrate how to manage a MySQL sharding cluster.

MySQL Sharding Cluster

MySQL Sharding Cluster is a scale-out database architecture. Based on the hash algorithm, the architecture distributes data across all the shards of the cluster. Sharding is entirely transparent to clients: Proxy is able to connect to any Shards in the cluster and issue queries to the correct shards directly.

| —– | | | |

Note: Each shard corresponds to a single MySQL instance. Currently, WQ-RDS supports a maximum of 64 shards.

All of the shards are built with Kubernetes Statefulset, Services, Storage Class, configmap, secrets and MySQL. WQ-RDS manages the entire lifecycle of the sharding cluster. Advantages of the sharding cluster are obvious:

Scale out queries per second (QPS) and transactions per second (TPS)
Scale out storage capacity: gain more storage by distributing data to multiple nodes

Create a MySQL Sharding Cluster

Let’s create a Kubernetes cluster with 8 shards.

 kubectl create -f mysqlshardingcluster.yaml

Next, create a MySQL Sharding Cluster including 8 shards.

TPR : MysqlCluster and MysqlDatabase

[root@k8s-master ~]# kubectl get mysqlcluster  


NAME             KIND

clustershard-c   MysqlCluster.v1.mysql.orain.com

MysqlDatabase from clustershard-c0 to clustershard-c7 belongs to MysqlCluster clustershard-c.

[root@k8s-master ~]# kubectl get mysqldatabase  

NAME KIND  

clustershard-c0 MysqlDatabase.v1.mysql.orain.com  

clustershard-c1 MysqlDatabase.v1.mysql.orain.com  

clustershard-c2 MysqlDatabase.v1.mysql.orain.com  

clustershard-c3 MysqlDatabase.v1.mysql.orain.com  

clustershard-c4 MysqlDatabase.v1.mysql.orain.com  

clustershard-c5 MysqlDatabase.v1.mysql.orain.com  

clustershard-c6 MysqlDatabase.v1.mysql.orain.com  

clustershard-c7 MysqlDatabase.v1.mysql.orain.com

Next, let’s look at two main features: high availability and RollingUpdate strategy.

To demonstrate, we’ll start by running sysbench to generate some load on the cluster. In this example, QPS metrics are generated by MySQL export, collected by Prometheus, and visualized in Grafana.

Feature: high availability

WQ-RDS handles MySQL instance crashes while protecting against data loss.

When killing clustershard-c0, WQ-RDS will detect that clustershard-c0 is unavailable and replace clustershard-c0 on failed machine, taking about 35 seconds on average.

zero data loss at same time.

Feature : RollingUpdate Strategy

MySQL Sharding Cluster brings us not only strong scalability but also some level of maintenance complexity. For example, when updating a MySQL configuration like innodb_buffer_pool_size, a DBA has to perform a number of steps:

1. Apply change time.
2. Disable client access to database proxies.
3. Start a rolling upgrade.

Rolling upgrades need to proceed in order and are the most demanding step of the process. One cannot continue a rolling upgrade until and unless previous updates to MySQL instances are running and ready.

4 Verify the cluster.
5. Enable client access to database proxies.

Possible problems with a rolling upgrade include:

node reboot
MySQL instances restart
human error Instead, WQ-RDS enables a DBA to perform rolling upgrades automatically.

StatefulSet RollingUpdate in Kubernetes

Kubernetes 1.7 includes a major feature that adds automated updates to StatefulSets and supports a range of update strategies including rolling updates.

Note: For more information about StatefulSet RollingUpdate, see the Kubernetes docs.

Because TPR (currently CRD) does not support the rolling upgrade strategy, we needed to integrate the RollingUpdate strategy into WQ-RDS. Fortunately, the Kubernetes repo is a treasure for learning. In the process of implementation, there are some points to share:

**MySQL Sharding Cluster has **changed: Each StatefulSet has its corresponding ControllerRevision, which records all the revision data and order (like git). Whenever StatefulSet is syncing, StatefulSet Controller will firstly compare it’s spec to the latest corresponding ControllerRevision data (similar to git diff). If changed, a new ControllerrRevision will be generated, and the revision number will be incremented by 1. WQ-RDS borrows the process, MySQL Sharding Cluster object will record all the revision and order in ControllerRevision.
**How to initialize MySQL Sharding Cluster to meet request **replicas: Statefulset supports two Pod management policies: Parallel and OrderedReady. Because MySQL Sharding Cluster doesn’t require ordered creation for its initial processes, we use the Parallel policy to accelerate the initialization of the cluster.
**How to perform a Rolling **Upgrade: Statefulset recreates pods in strictly decreasing order. The difference is that WQ-RDS updates shards instead of recreating them, as shown below:
When RollingUpdate ends: Kubernetes signals termination clearly. A rolling update completes when all of a set’s Pods have been updated to the updateRevision. The status’s currentRevision is set to updateRevision and its updateRevision is set to the empty string. The status’s currentReplicas is set to updateReplicas and its updateReplicas are set to 0.

Controller revision in WQ-RDS

Revision information is stored in MysqlCluster.Status and is no different than Statefulset.Status.


root@k8s-master ~]# kubectl get mysqlcluster -o yaml clustershard-c

apiVersion: v1

items:

\- apiVersion: mysql.orain.com/v1

 kind: MysqlCluster

 metadata:

   creationTimestamp: 2017-10-20T08:19:41Z

   labels:

     AppName: clustershard-crm

     Createdby: orain.com

     DBType: MySQL

   name: clustershard-c

   namespace: default

   resourceVersion: "415852"

   selfLink: /apis/mysql.orain.com/v1/namespaces/default/mysqlclusters/clustershard-c

   uid: 6bb089bb-b56f-11e7-ae02-525400e717a6

 spec:



     dbresourcespec:

       limitedcpu: 1200m

       limitedmemory: 400Mi

       requestcpu: 1000m

       requestmemory: 400Mi



 status:

   currentReplicas: 8

   currentRevision: clustershard-c-648d878965

   replicas: 8

   updateRevision: clustershard-c-648d878965

kind: List

Example: Perform a rolling upgrade

Finally, We can now update “clustershard-c” to update configuration “innodb_buffer_pool_size” from 6GB to 7GB and reboot.

The process takes 480 seconds.

The upgrade is in monotonically decreasing manner:

Conclusion

RollingUpgrade is meaningful to database administrators. It provides a more effective way to operator database.

--Orain Xiong, co-founder, Woqutech

↧

What We’ll Accomplish

Prerequisites

Set Up a Cluster

Deploy an Application to Kubernetes

Connect to Google Cloud

Add Cluster

Optional: Use an Alternative Cluster

Deploy Static Image to Kubernetes

Build and Deploy an Image

Creating the Pull Secret

Get the image name

Deploy the private image to Kubernetes

Automate Deployment to Kubernetes

Conclusions

Addendums

containerd

cri-containerd

Architecture

Status

Try it Out

Next Steps

Contribute

Community

Example application: PaymentProcessor

Visibility and governance over the PaymentProcessor Code

Summary

PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes

Introduction

What is BPF?

BPF Superpowers

The ‘e’ in eBPF

Existing Use Cases of eBPF with Kubernetes

Dynamic Network Control and Visibility with Cilium

Tracking TCP Connections in Weave Scope

Limiting syscalls with seccomp-bpf

Potential Use Cases for eBPF with Kubernetes

Pod and container level network statistics

Application-applied LSM

Use Case: Landlock in Kubernetes-based serverless frameworks

Auditing kubectl-exec with eBPF

Learn more about eBPF

Conclusion

Workloads API GA

Windows Support (beta)

Storage Enhancements

Additional Features

Availability

Release team

Project Velocity

User highlights

Ecosystem updates

KubeCon

Webinar

Get involved:

Kubernetes and Machine Learning

Introducing Kubeflow

Using Kubeflow

Kubeflow + ksonnet

What’s Next?

Features

What’s coming next

Get Involved

Why Kubernetes CSI?

What is CSI?

How do I deploy a CSI driver on a Kubernetes Cluster?

How do I use a CSI Volume?

Dynamic Provisioning

Pre-Provisioned Volumes

Attaching and Mounting

How do I create a CSI driver?

Where can I find CSI drivers?

What about Flex?

What will happen to the in-tree volume plugins?

What are the limitations of alpha?

What’s next?

How Do I Get Involved?

What is Admission?

Mutation

Validation

What are admission webhooks?