From 92b079f17ec1e98a9e11ef77c3a7082a81d7b483 Mon Sep 17 00:00:00 2001
From: Joost van der Griendt
Date: Fri, 2 Feb 2018 23:12:16 +0100
Subject: [PATCH 01/10] Update graceful shutdown with k8s. Add new file on
 MKDocs usage.

Signed-off-by: Joost van der Griendt
---
 docs/docker/graceful-shutdown.md |  96 ++++++++++++++++++++-
 docs/other/mkdocs.md             | 139 +++++++++++++++++++++++++++++++
 mkdocs.yml                       |   3 +
 3 files changed, 237 insertions(+), 1 deletion(-)
 create mode 100644 docs/other/mkdocs.md

diff --git a/docs/docker/graceful-shutdown.md b/docs/docker/graceful-shutdown.md
index 71b70ab..2c81134 100644
--- a/docs/docker/graceful-shutdown.md
+++ b/docs/docker/graceful-shutdown.md
@@ -211,7 +211,7 @@ func main() {
 }
 ```

-### Java plain
+### Java plain (Docker Swarm)

 This application is a Java 9 modular application, which can be found on github, [github.com/joostvdg](https://github.com/joostvdg/buming).

@@ -310,6 +310,100 @@ public class DockerApp {
 }
 ```

+### Java Plain (Kubernetes)
+
+So far we've utilized the utilities from Docker itself in conjunction with its native orchestrator, Docker Swarm.
+
+Unfortunately, when it comes to popularity, [Kubernetes beats Swarm hands down](https://platform9.com/blog/kubernetes-docker-swarm-compared/).
+
+So this guide isn't complete unless it also covers graceful shutdown in Kubernetes.
+
+#### In Dockerfile
+
+Our original Dockerfile had to be changed, as Debian's Slim image doesn't actually ship the package that provides `killall`.
+And we need that, as we cannot instruct Kubernetes to send a specific signal.
+Instead, we can configure a [PreStop exec command](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/), which we can utilise to execute [killall](https://packages.debian.org/wheezy/psmisc) java [-INT](https://www.tecmint.com/how-to-kill-a-process-in-linux/).
+
+The command will be specified in the Kubernetes deployment definition below.
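Why does sending `SIGINT` help at all? The JVM translates `SIGINT` (and `SIGTERM`) into an orderly shutdown that runs any registered shutdown hooks, which is what gives the application a chance to close its listeners and finish in-flight work. A minimal, standalone sketch of that mechanism (illustrative class name, not the buming application itself):

```java
public class ShutdownHookDemo {

    public static void main(String[] args) {
        // The JVM runs shutdown hooks on SIGINT/SIGTERM as well as on normal exit,
        // so `killall java -INT` from the preStop hook ends up in this code path.
        Runtime.getRuntime().addShutdownHook(new Thread(
                () -> System.out.println("shutdown hook: closing sockets")));
        System.out.println("server started");
        // main returns here; the JVM begins shutdown and fires the hook above.
    }
}
```

Starting this with `java ShutdownHookDemo` and interrupting it via `kill -INT <pid>` prints the same hook message as a normal exit does.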
+
+```dockerfile
+FROM openjdk:9-jdk AS build
+
+RUN mkdir -p /usr/src/mods/jars
+RUN mkdir -p /usr/src/mods/compiled
+
+COPY . /usr/src
+WORKDIR /usr/src
+
+RUN javac -Xlint:unchecked -d /usr/src/mods/compiled --module-source-path /usr/src/src $(find src -name "*.java")
+RUN jar --create --file /usr/src/mods/jars/joostvdg.dui.logging.jar --module-version 1.0 -C /usr/src/mods/compiled/joostvdg.dui.logging .
+RUN jar --create --file /usr/src/mods/jars/joostvdg.dui.api.jar --module-version 1.0 -C /usr/src/mods/compiled/joostvdg.dui.api .
+RUN jar --create --file /usr/src/mods/jars/joostvdg.dui.client.jar --module-version 1.0 -C /usr/src/mods/compiled/joostvdg.dui.client .
+RUN jar --create --file /usr/src/mods/jars/joostvdg.dui.server.jar --module-version 1.0 -e com.github.joostvdg.dui.server.cli.DockerApp \
+    -C /usr/src/mods/compiled/joostvdg.dui.server .
+
+RUN rm -rf /usr/bin/dui-image
+RUN jlink --module-path /usr/src/mods/jars/:${JAVA_HOME}/jmods \
+    --add-modules joostvdg.dui.api \
+    --add-modules joostvdg.dui.logging \
+    --add-modules joostvdg.dui.server \
+    --add-modules joostvdg.dui.client \
+    --launcher dui=joostvdg.dui.server \
+    --output /usr/bin/dui-image
+
+RUN ls -lath /usr/bin/dui-image
+RUN /usr/bin/dui-image/bin/java --list-modules
+
+FROM debian:stable-slim
+LABEL authors="Joost van der Griendt"
+LABEL version="0.1.0"
+LABEL description="Docker image for playing with java applications in a concurrent, parallel and distributed manner."
+# Add Tini (note: Docker can also provide it via the --init flag: https://docs.docker.com/engine/reference/commandline/run/)
+ENV TINI_VERSION v0.16.1
+ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
+RUN chmod +x /tini
+ENTRYPOINT ["/tini", "-vv", "-g", "--", "/usr/bin/dui/bin/dui"]
+ENV DATE_CHANGED="20180120-1525"
+RUN apt-get update && apt-get install --no-install-recommends -y psmisc=22.* && rm -rf /var/lib/apt/lists/*
+COPY --from=build /usr/bin/dui-image/ /usr/bin/dui
+RUN /usr/bin/dui/bin/java --list-modules
+```
+
+#### Kubernetes Deployment
+
+Here we have the image's Kubernetes [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) descriptor.
+
+It includes the Pod's [lifecycle](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) ```preStop``` hook with an exec-style command. You should know by now [why we prefer that](http://www.johnzaccone.io/entrypoint-vs-cmd-back-to-basics/).
+
+```yaml
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: dui-deployment
+  namespace: default
+  labels:
+    k8s-app: dui
+spec:
+  replicas: 3
+  template:
+    metadata:
+      labels:
+        k8s-app: dui
+    spec:
+      containers:
+      - name: master
+        image: caladreas/buming
+        ports:
+        - name: http
+          containerPort: 7777
+        lifecycle:
+          preStop:
+            exec:
+              command: ["killall", "java", "-INT"]
+      terminationGracePeriodSeconds: 60
+```
+
 ### Java Spring Boot (1.x)

 This example is for Spring Boot 1.x; in time, we will have an example for 2.x.
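Whichever framework is involved, graceful shutdown boils down to the same two steps: stop accepting new work, then drain what is in flight within the grace period. A framework-free sketch of the drain step, using plain `java.util.concurrent` (illustrative only, not the Spring Boot API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DrainDemo {

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> sleep(200)); // an "in-flight request"

        // Step 1: stop accepting new work.
        pool.shutdown();
        // Step 2: give in-flight work a bounded amount of time to finish,
        // mirroring terminationGracePeriodSeconds on the Pod.
        boolean drained = pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(drained ? "drained cleanly" : "grace period expired");
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```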
diff --git a/docs/other/mkdocs.md b/docs/other/mkdocs.md
new file mode 100644
index 0000000..fea6b90
--- /dev/null
+++ b/docs/other/mkdocs.md
@@ -0,0 +1,139 @@
+# MKDocs
+
+This website is built using the following:
+
+* [MKDocs](http://www.mkdocs.org/) - a Python tool for building static websites from [MarkDown](https://en.wikipedia.org/wiki/Markdown) files
+* [MKDocs Material](https://squidfunk.github.io/mkdocs-material/) - an extension/theme of MKDocs that turns it into a responsive website with Google's Material Design theme
+
+## Add information to the docs
+
+MKDocs can be a bit daunting to use, especially when extended with ```MKDocs Material``` and [PyMdown Extensions](https://facelessuser.github.io/pymdown-extensions/).
+
+There are two parts to the site: 1) the Markdown files, which live in ```docs/```, and 2) the site listing (mkdocs.yml) and automation scripts, which can be found in ```docs-scripts/```.
+
+### Extend a current page
+
+To extend a current page, simply write the MarkDown as you're used to.
+
+For the specific extensions offered by PyMdown and Material, check out the following pages:
+
+* [MKDocs Material Getting Started Guide](https://squidfunk.github.io/mkdocs-material/getting-started/)
+* [MKDocs Extensions](https://squidfunk.github.io/mkdocs-material/extensions/admonition/)
+* [PyMdown Extensions Usage Guide](https://squidfunk.github.io/mkdocs-material/extensions/pymdown/)
+
+### Add a new page
+
+In ```docs-scripts/mkdocs.yml``` you will find the site structure under the yml item ```pages```.
+
+```yml
+pages:
+- Home: index.md
+- Other Root Page: some-page.md
+- Root with children:
+    - ChildOne: root2/child1.md
+    - ChildTwo: root2/child2.md
+```
+
+### Things to know
+
+* All .md files listed in ```pages``` will be translated to an HTML file named {OriginalFileName}.html
+* Naming a file index.md will allow you to refer to it by path without the file name
+    * we can refer to root2 simply by ```site/root2``` and can omit the index.
+    ```yml
+    - Root: index.md
+    - Root2: root2/index.html
+    ```
+
+## Build the site locally
+
+As it is a Python tool, you can easily build it with Python (2.7 is recommended).
+
+The requirements are captured in a [pip](https://pip.pypa.io/en/stable/) install script, ```docs-scripts/install.sh```, where the dependencies are in [Pip's requirements.txt](https://pip.pypa.io/en/stable/user_guide/#requirements-files).
+
+Once that is done, you can do the following:
+
+```bash
+mkdocs build --clean
+```
+
+This will generate the site into ```docs-scripts/site```, where you can simply open the index.html with a browser - it is a static site.
+
+For Docker, you can use the ```*.sh``` scripts, or simply ```run.sh``` to kick off the entire build.
+
+## Jenkins build
+
+### Declarative format
+
+```groovy
+pipeline {
+    agent none
+    options {
+        timeout(time: 10, unit: 'MINUTES')
+        timestamps()
+        buildDiscarder(logRotator(numToKeepStr: '5'))
+    }
+    stages {
+        stage('Prepare'){
+            agent { label 'docker' }
+            steps {
+                deleteDir()
+            }
+        }
+        stage('Checkout'){
+            agent { label 'docker' }
+            steps {
+                checkout scm
+                script {
+                    env.GIT_COMMIT_HASH = sh returnStdout: true, script: 'git rev-parse --verify HEAD'
+                }
+            }
+        }
+        stage('Build Docs') {
+            agent {
+                docker {
+                    image "caladreas/mkdocs-docker-build-container"
+                    label "docker"
+                }
+            }
+            steps {
+                sh 'cd docs-scripts && mkdocs build'
+            }
+        }
+        stage('Prepare Docker Image'){
+            agent { label 'docker' }
+            environment {
+                DOCKER_CRED = credentials('ldap')
+            }
+            steps {
+                parallel (
+                    TestDockerfile: {
+                        script {
+                            def lintResult = sh returnStdout: true, script: 'cd docs-scripts && docker run --rm -i lukasmartinelli/hadolint < Dockerfile'
+                            if (lintResult.trim() == '') {
+                                println 'Lint finished with no errors'
+                            } else {
+                                println 'Error found in Lint'
+                                println "${lintResult}"
+                                currentBuild.result = 'UNSTABLE'
+                            }
+                        }
+                    }, // end test dockerfile
+                    BuildImage: {
+                        sh 'chmod +x docs-scripts/build.sh'
+                        sh 'cd docs-scripts && ./build.sh'
+                    },
+ login: { + sh "docker login -u ${DOCKER_CRED_USR} -p ${DOCKER_CRED_PSW} registry" + } + ) + } + post { + success { + sh 'chmod +x push.sh' + sh './push.sh' + } + } + } + } +} +``` \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 7407dd8..6dc5ab4 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -40,6 +40,9 @@ pages: - Paradigms: productivity/paradigms.md - Studies: productivity/studies.md +- Other: + - MKDocs (Static Website Generator): other/mkdocs.md + - Jenkins: jenkins/index.md - Jenkins Job Management: - JobDSL: jenkins-jobs/jobdsl.md From 808a3cc9afaed46564a75d96c55cb6e040b84aa8 Mon Sep 17 00:00:00 2001 From: Joost van der Griendt Date: Sat, 10 Mar 2018 14:42:28 +0100 Subject: [PATCH 02/10] Updating some pages --- docs/devops/index.md | 26 ++++++++++++++++++++++++++ docs/docker/graceful-shutdown.md | 3 ++- docs/docker/kubernetes.md | 18 +++++++++++++++++- docs/java/java9.md | 6 +++++- docs/jenkins/java-gradle.md | 0 docs/jenkins/plugins.md | 0 docs/productivity/remote.md | 5 +++++ docs/swe/naming.md | 7 ++++++- docs/swe/observability.md | 7 +++++++ mkdocs.yml | 1 + 10 files changed, 69 insertions(+), 4 deletions(-) create mode 100644 docs/devops/index.md create mode 100644 docs/jenkins/java-gradle.md create mode 100644 docs/jenkins/plugins.md create mode 100644 docs/productivity/remote.md create mode 100644 docs/swe/observability.md diff --git a/docs/devops/index.md b/docs/devops/index.md new file mode 100644 index 0000000..ccf6ab6 --- /dev/null +++ b/docs/devops/index.md @@ -0,0 +1,26 @@ +# DevOps Assessment + +How can I assess an organisation for what to do next. + +## Questions + +* To which extent can your development teams request/create an environment on their own, without going through lengthy approval processes? +* To which extent can your development teams use pre-configured/ template tool sets (e.g. Jenkins master jobs, master POM etc) which they can extend and/or modify to their needs? 
+* To which extent can your development teams deploy to any environment (including production)? If not, what do they lack: knowledge, or passwords to the higher environments?
+* Does your system of record provide you traceability from idea to production?
+* How tightly coupled are your key delivery pipeline tools?
+    * Is it easy to replace them?
+* Do you have different release management activities based on application blocks?
+    * Who is keeping it up-to-date?
+
+## Maturity Model
+
+## Resources
+
+* https://www.devon.nl/continuous-delivery-at-enterprise-level-pitfall-1/
+* https://www.devon.nl/continuous-delivery-at-enterprise-level-pitfall-2/
+* https://devops-research.com/
+
+## References
+
+
diff --git a/docs/docker/graceful-shutdown.md b/docs/docker/graceful-shutdown.md
index 2c81134..6ba940d 100644
--- a/docs/docker/graceful-shutdown.md
+++ b/docs/docker/graceful-shutdown.md
@@ -750,4 +750,5 @@ buming_dui.0.pnoui2x6elrz@dui-2 | [Server-Ken Thompson] [INFO] [14:19:0
 [^7]: [Grigorii Chudnov blog on Trapping Docker Signals](https://medium.com/@gchudnov/trapping-signals-in-docker-containers-7a57fdda7d86)
 [^8]: [Andy Wilkinson (from Pivotal) explaining Spring Boot shutdown hook for Tomcat](https://github.com/spring-projects/spring-boot/issues/4657#issuecomment-161354811)
 [^9]: [Docker Swarm issue with multicast](https://github.com/docker/swarm/issues/1691)
-[^10]: [Docker network library issue with multicast](https://github.com/docker/libnetwork/issues/552)
\ No newline at end of file
+[^10]: [Docker network library issue with multicast](https://github.com/docker/libnetwork/issues/552)
+[^11]: [Excellent article on JVM details inside containers](https://jaxenter.com/nobody-puts-java-container-139373.html)
\ No newline at end of file
diff --git a/docs/docker/kubernetes.md b/docs/docker/kubernetes.md
index ddf04c9..8023955 100644
--- a/docs/docker/kubernetes.md
+++ b/docs/docker/kubernetes.md
@@ -1 +1,17 @@
-# Kubernetes
\ No newline at end of file
+# Kubernetes
+
+## Kubernetes terminology + + + +## Kubernetes model + +## Resources + +* https://github.com/weaveworks/scope +* https://github.com/hjacobs/kube-ops-view +* https://coreos.com/tectonic/docs/latest/tutorials/sandbox/install.html +* https://github.com/kubernetes/dashboard +* https://blog.alexellis.io/you-need-to-know-kubernetes-and-swarm/ +* https://kubernetes.io/docs/reference/kubectl/cheatsheet/ +* https://blog.heptio.com/core-kubernetes-jazz-improv-over-orchestration-a7903ea92ca \ No newline at end of file diff --git a/docs/java/java9.md b/docs/java/java9.md index 65c1523..364c886 100644 --- a/docs/java/java9.md +++ b/docs/java/java9.md @@ -1 +1,5 @@ -# Java 9 \ No newline at end of file +# Java 9 + +* https://jaxenter.com/maven-on-java-9-things-you-need-to-know-140985.html +* https://blog.codefx.org/java/five-command-line-options-to-hack-the-java-9-module-system/ +* \ No newline at end of file diff --git a/docs/jenkins/java-gradle.md b/docs/jenkins/java-gradle.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/jenkins/plugins.md b/docs/jenkins/plugins.md new file mode 100644 index 0000000..e69de29 diff --git a/docs/productivity/remote.md b/docs/productivity/remote.md new file mode 100644 index 0000000..dfc2011 --- /dev/null +++ b/docs/productivity/remote.md @@ -0,0 +1,5 @@ +# Working Remote + +## Resources + +* https://open.nytimes.com/how-to-grow-as-an-engineer-working-remotely-3baff8211f3e \ No newline at end of file diff --git a/docs/swe/naming.md b/docs/swe/naming.md index f7a2464..af72901 100644 --- a/docs/swe/naming.md +++ b/docs/swe/naming.md @@ -1 +1,6 @@ -# On Naming \ No newline at end of file +# On Naming + +## Resources + +* https://www.slideshare.net/pirhilton/naming-guidelines-for-professional-programmers +* http://www.yourdictionary.com/diction4.html \ No newline at end of file diff --git a/docs/swe/observability.md b/docs/swe/observability.md new file mode 100644 index 0000000..1515db6 --- /dev/null +++ b/docs/swe/observability.md 
@@ -0,0 +1,7 @@ +# Observability + +## Resources + +* https://www.vividcortex.com/blog/monitoring-isnt-observability +* https://medium.com/@copyconstruct/monitoring-and-observability-8417d1952e1c +* https://codeascraft.com/2011/02/15/measure-anything-measure-everything/ \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 6dc5ab4..9309d02 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -31,6 +31,7 @@ pages: - Domain Driven Design: swe/ddd.md - Microservices: swe/microservices.md - Algorithms: swe/algorithms.md + - Observability: swe/observability.md - Others: swe/others.md - Productivity: From 25651f8b10d509d15e695dc3e3637797c1100d11 Mon Sep 17 00:00:00 2001 From: Joost van der Griendt Date: Sat, 10 Mar 2018 14:44:46 +0100 Subject: [PATCH 03/10] Updating productivity page. Only resources so far. --- docs/productivity/index.md | 48 +++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-) diff --git a/docs/productivity/index.md b/docs/productivity/index.md index 39fefb1..e353339 100644 --- a/docs/productivity/index.md +++ b/docs/productivity/index.md @@ -1,7 +1,53 @@ # Developer Productivity + ## The Balancing act between centralized and decentralized +## How do you measure productivity + +## On Multitasking + +## Learning from Lean/Toyota + +## Human Psychology + +## Conway's Law + +> "organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." - M. 
Conway [^1] + + ### Further reading -* [Blog on Twitter's Engineering Efficiency](http://www.gigamonkeys.com/flowers/) \ No newline at end of file +### Articles + +* [Blog on Twitter's Engineering Efficiency](http://www.gigamonkeys.com/flowers/) +* [Why Companies should have a Heroku platform for their developers](https://medium.com/@anubhavmishra/why-all-companies-should-have-a-heroku-like-platform-for-their-developers-ee96a6fc6bc0) +* [Multitasking is bad for your health](http://time.com/4737286/multitasking-mental-health-stress-texting-depression/) +* [Microsoft research on Developer's perception of productivity](https://www.microsoft.com/en-us/research/publication/software-developers-perceptions-of-productivity/) +* [Developer Productivity Struggles](https://www.mongodb.com/blog/post/stack-overflow-and-mongodb-research-unveils-developer-productivity-struggles) +* [You cannot measure productivity](https://martinfowler.com/bliki/CannotMeasureProductivity.html) +* [The Productivity Paradox](https://en.wikipedia.org/wiki/Productivity_paradox) +* [There is no Productivity Paradox: it lags behind investments](https://cs.stanford.edu/people/eroberts/cs201/projects/productivity-paradox/lag.html) +* [Economist: solving the paradox](https://www.economist.com/node/375522) +* [The Myth Of Developer Productivity](https://dev9.com/blog-posts/2015/1/the-myth-of-developer-productivity) +* [Effectiveness vs. 
Efficiency](http://www.insightsquared.com/2013/08/effectiveness-vs-efficiency-whats-the-difference/) +* [Lean Manufactoring](https://en.wikipedia.org/wiki/Lean_manufacturing) +* [Theory of Constraints](https://en.wikipedia.org/wiki/Theory_of_constraints) +* [Thoughtworks: demystifying Conway's Law](https://www.thoughtworks.com/insights/blog/demystifying-conways-law) +* [John Allspaw: a mature role for automation](https://www.kitchensoap.com/2012/09/21/a-mature-role-for-automation-part-i/) +* [Research from DORA](https://devops-research.com/research.html) + +### Books + +* [The Goal](https://en.wikipedia.org/wiki/The_Goal_(novel)) +* [The Phoenix Project](https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262592) +* [Continuous Delivery](https://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=pd_sim_14_2?_encoding=UTF8&pd_rd_i=0321601912&pd_rd_r=ZD568BK3F8WG61C5M7EJ&pd_rd_w=X19Xz&pd_rd_wg=0nvDe&psc=1&refRID=ZD568BK3F8WG61C5M7EJ) +* [The Lean Startup](https://www.amazon.com/Lean-Startup-Entrepreneurs-Continuous-Innovation/dp/0307887898/ref=pd_sim_14_13?_encoding=UTF8&pd_rd_i=0307887898&pd_rd_r=SRT93T4D0PQ42EMVM0M3&pd_rd_w=YzhGz&pd_rd_wg=ADDPF&psc=1&refRID=SRT93T4D0PQ42EMVM0M3) +* [The Lean Enterprise](https://www.amazon.com/Lean-Enterprise-Performance-Organizations-Innovate/dp/1449368425/ref=pd_sim_14_8?_encoding=UTF8&pd_rd_i=1449368425&pd_rd_r=ZD568BK3F8WG61C5M7EJ&pd_rd_w=X19Xz&pd_rd_wg=0nvDe&psc=1&refRID=ZD568BK3F8WG61C5M7EJ) +* [DevOps Handbook](https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations/dp/1942788002/ref=pd_bxgy_14_img_2?_encoding=UTF8&pd_rd_i=1942788002&pd_rd_r=SRT93T4D0PQ42EMVM0M3&pd_rd_w=DYLwa&pd_rd_wg=ADDPF&psc=1&refRID=SRT93T4D0PQ42EMVM0M3) +* [Thinking Fast and Slow](https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman-ebook/dp/B005MJFA2W/ref=sr_1_1?s=books&ie=UTF8&qid=1520193959&sr=1-1&keywords=thinking+fast+and+slow) +* 
[Sapiens](https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari-ebook/dp/B00K7ED54M/ref=pd_sim_351_2?_encoding=UTF8&psc=1&refRID=A1KNY9QCKWPQ94BA248Q)
+
+## References
+
+[^1]: [Conway's law in wikipedia](https://en.wikipedia.org/wiki/Conway%27s_law)

From 7a073b24f367133b4b9c390e87c950fce2eb167d Mon Sep 17 00:00:00 2001
From: Joost van der Griendt
Date: Wed, 1 Aug 2018 12:32:35 +0200
Subject: [PATCH 04/10] Writing down ideas for research

---
 docs/jenkins-pipeline/shared-library.md |  0
 docs/productivity/index.md              | 77 ++++++++++++++++++++++++-
 2 files changed, 75 insertions(+), 2 deletions(-)
 create mode 100644 docs/jenkins-pipeline/shared-library.md

diff --git a/docs/jenkins-pipeline/shared-library.md b/docs/jenkins-pipeline/shared-library.md
new file mode 100644
index 0000000..e69de29
diff --git a/docs/productivity/index.md b/docs/productivity/index.md
index e353339..b19e63d 100644
--- a/docs/productivity/index.md
+++ b/docs/productivity/index.md
@@ -1,25 +1,91 @@
 # Developer Productivity
+
+## Commoditization

-## The Balancing act between centralized and decentralized
+> "The big change has been in the hardware/software cost ratio. The buyer of a $2 million machine in 1960 felt that he could afford $250,000 more for a customized payroll program, one that slipped easily and nondisruptively into the computer-hostile social environment. Buyers of $50,000 office machines today cannot conceivably afford customized payroll programs; so they adapt their payroll procedures to the packages available." - F. Brooks, No Silver Bullet [^2]
+
+## Where should productivity be sought

+If you're looking to increase productivity, it would be best to answer some fundamental questions first.
+
+* What should we be productive in?
+* What is productivity?
+* How do you measure productivity?
+
+The first step is to determine what you should be productive in.
+If you're building software, for example, it is finding out what to build.
+
+> "The hardest single part of building a software system is deciding precisely what to build." - F. Brooks [^2]
+
+That is actually already one step too far, as you would first need a reason to build a software system.
+So the first step for any individual or organization (startup or otherwise) is to find out what people want that you can offer.
+
+> "The fundamental activity of a startup is to turn ideas into products, measure how customers respond, and then learn whether to pivot or persevere. All successful startup processes should be geared to accelerate that feedback loop." - The Lean Startup [^3]

 ## How do you measure productivity

+## Grow vs. Build
+
+
+## The Balancing act between centralized and decentralized
+
 ## On Multitasking

+* [Deep Work](https://www.amazon.com/Deep-Work-Focused-Success-Distracted/dp/1455586692)
+* Attention Residue: [Why is it so hard to do my work?](https://www.sciencedirect.com/science/article/pii/S0749597809000399)
+
 ## Learning from Lean/Toyota

-## Human Psychology
+## Open Space Floor Plans
+
+http://rstb.royalsocietypublishing.org/content/373/1753/20170239

 ## Conway's Law

 > "organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." - M. Conway [^1]

+## Undifferentiated Heavy Lifting
+
+> "Work that needs to get done, but having it done doesn't bring our customers any direct benefit."
- [Dave Hahn](https://www.youtube.com/watch?v=UTKIT6STSVM) ### Further reading +### Others + +* https://www.youtube.com/watch?v=UTKIT6STSVM +* https://en.wikipedia.org/wiki/Complex_adaptive_system +* https://jobs.netflix.com/culture +* http://blackswanfarming.com/cost-of-delay/ +* https://www.rundeck.com/blog/tickets_make_operations_unnecessarily_miserable +* https://www.digitalocean.com/community/tutorials/what-is-immutable-infrastructure +* https://hbr.org/2015/12/what-the-research-tells-us-about-team-creativity-and-innovation +* https://www.thoughtworks.com/insights/blog/continuous-improvement-safe-environment +* https://qualitysafety.bmj.com/content/13/suppl_2/ii22 +* https://www.plutora.com/wp-content/uploads/dlm_uploads/2018/03/StateOfDevOpsTools_v14.pdf +* https://medium.com/@ATavgen/never-fail-twice-608147cb49b +* https://blogs.dropbox.com/dropbox/2018/07/study-high-performing-teams/?_tk=social&oqa=183tl01liov&linkId=100000003064606 +* http://psycnet.apa.org/record/1979-28632-001 +* https://pdfs.semanticscholar.org/a85d/432f44e43d61753bb8a121c246127b562a39.pdf +* https://medium.com/@dr_eprice/laziness-does-not-exist-3af27e312d01 +* https://en.wikipedia.org/wiki/Mindset#Fixed_and_growth +* http://www.reinventingorganizationswiki.com/Teal_Organizations +* https://www.mckinsey.com/business-functions/organization/our-insights/the-irrational-side-of-change-management +* https://www.barrypopik.com/index.php/new_york_city/entry/how_do_you_eat_an_elephant +* https://kadavy.net/blog/posts/mind-management-intro/ +* https://en.wikipedia.org/wiki/Planning_fallacy +* https://stories.lemonade.com/lemonade-proves-trust-pays-off-big-time-fdcf587af5a1 +* https://www.venturi-group.com/developer-to-cto/ +* https://dzone.com/articles/an-introduction-to-devops-principles +* https://www.thoughtworks.com/insights/blog/evolving-thoughtworks-internal-it-solve-broader-cross-cutting-problems +* https://www.thoughtworks.com/insights/blog/platform-tech-strategy-three-layers +* 
https://www.thoughtworks.com/insights/blog/why-it-departments-must-reinvent-themselves-part-1 +* https://en.wikipedia.org/wiki/Peter_principle +* https://hackernoon.com/why-all-engineers-must-understand-management-the-view-from-both-ladders-cc749ae14905 + ### Articles +* [Article on the state of Systems Languages](https://blog.usejournal.com/systems-languages-an-experience-report-d008b2b12628) +* [Article on SILO's](https://www.rundeck.com/blog/whats-a-silo-and-why-they-ruin-everything) * [Blog on Twitter's Engineering Efficiency](http://www.gigamonkeys.com/flowers/) * [Why Companies should have a Heroku platform for their developers](https://medium.com/@anubhavmishra/why-all-companies-should-have-a-heroku-like-platform-for-their-developers-ee96a6fc6bc0) * [Multitasking is bad for your health](http://time.com/4737286/multitasking-mental-health-stress-texting-depression/) @@ -48,6 +114,13 @@ * [Thinking Fast and Slow](https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman-ebook/dp/B005MJFA2W/ref=sr_1_1?s=books&ie=UTF8&qid=1520193959&sr=1-1&keywords=thinking+fast+and+slow) * [Sapiens](https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari-ebook/dp/B00K7ED54M/ref=pd_sim_351_2?_encoding=UTF8&psc=1&refRID=A1KNY9QCKWPQ94BA248Q) +## On Writing + +* https://www.proofreadingservices.com/pages/very +* + ## References [^1]: [Conway's law in wikipedia](https://en.wikipedia.org/wiki/Conway%27s_law) +[^2]: [No Silver Bullet - F. Brooks](http://faculty.salisbury.edu/~xswang/Research/Papers/SERelated/no-silver-bullet.pdf) +[^3]: [The Lean Startup Principles](http://theleanstartup.com/principles) \ No newline at end of file From bc64882081a05d0147ecbfc379907fcfde72ece4 Mon Sep 17 00:00:00 2001 From: Joost van der Griendt Date: Sun, 5 Aug 2018 23:22:45 +0200 Subject: [PATCH 05/10] Add more notes about kubernetes. 
And along the way, also added a chapter on vim --- docs/kubernetes/cka-exam-prep.md | 255 ++++++++++++++++++++++++++++++ docs/kubernetes/cka-exam.md | 2 + docs/kubernetes/index.md | 68 ++++++++ docs/kubernetes/install-gke.md | 261 +++++++++++++++++++++++++++++++ docs/other/vim.md | 43 +++++ docs/productivity/index.md | 6 + mkdocs.yml | 6 + 7 files changed, 641 insertions(+) create mode 100644 docs/kubernetes/cka-exam-prep.md create mode 100644 docs/kubernetes/cka-exam.md create mode 100644 docs/kubernetes/index.md create mode 100644 docs/kubernetes/install-gke.md create mode 100644 docs/other/vim.md diff --git a/docs/kubernetes/cka-exam-prep.md b/docs/kubernetes/cka-exam-prep.md new file mode 100644 index 0000000..c7b6d55 --- /dev/null +++ b/docs/kubernetes/cka-exam-prep.md @@ -0,0 +1,255 @@ +# CKA Exam Prep + +* https://github.com/kelseyhightower/kubernetes-the-hard-way +* https://github.com/walidshaari/Kubernetes-Certified-Administrator +* https://github.com/kubernetes/community/blob/master/contributors/devel/e2e-tests.md +* https://www.cncf.io/certification/cka/ +* https://oscon2018.container.training +* https://github.com/ahmetb/kubernetes-network-policy-recipes +* https://github.com/ramitsurana/awesome-kubernetes +* https://sysdig.com/blog/kubernetes-security-guide/ +* https://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services +* https://docs.google.com/presentation/d/1Gp-2blk5WExI_QR59EUZdwfO2BWLJqa626mK2ej-huo/edit#slide=id.g27a78b354c_0_0 + +## Some basic commands + +```bash +kubectl -n kube-public get secrets +``` + +## Test network policy + +For some common recipes, look at [Ahmet's recipe repository](https://github.com/ahmetb/kubernetes-network-policy-recipes). + +!!! warning + Make sure you have CNI enabled and you have a network plugin that enforces the policies. + +!!! 
note + You can check current existing policies like this: ```kubectl get netpol --all-namespaces``` + +### Example Ingress Policy + +```yaml +kind: NetworkPolicy +apiVersion: networking.k8s.io/v1 +metadata: + name: dui-network-policy + namespace: dui +spec: + podSelector: + matchLabels: + app: dui + distribution: server + ingress: [] +``` + +### Run test pod + +Apply above network policy, and then test in the same `dui` namespace, and in the `default` namespace. + +!!! note + Use `alpine:3.6` because telnet was dropped starting 3.7. + +```bash +kubectl -n dui get pods -l app=dui -o wide +kubectl run --rm -i -t --image=alpine:3.6 -n dui test -- sh +telnet 10.32.0.7 8888 +``` + +This should now fail - timeout - due the packages being dropped. + +### Egress + +```yaml +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: dui-network-policy-egress + namespace: dui +spec: + podSelector: + matchLabels: + app: dui + policyTypes: + - Egress + egress: + - ports: + - port: 7777 + protocol: TCP + - to: + - podSelector: + matchLabels: + app: dui +``` + +!!! warning + This should in theory, block our test pod from reading this. + As it doesn't have the label `app=dui`. But it seems it is working just fine. + +#### Allow DNS + +If it should also be able to do DNS calls, we have to enable port 53. + +```yaml + + - ports: + - port: 53 + protocol: UDP + - port: 53 + protocol: TCP + - port: 7777 + protocol: TCP + - to: + - namespaceSelector: {} +``` + +#### Create a test pod with curl + +```bash +kubectl run --rm -i -t --image=alpine:3.6 -n dui test -- sh +apk --no-cache add curl +curl 10.32.0.11:7777/servers +``` + + +## Run minikube cluster + +```bash +###################### +# Create The Cluster # +###################### + +# Make sure that your minikube version is v0.25 or higher + +# WARNING!!! +# Some users experienced problems starting the cluster with minikuber v0.26 and v0.27. 
+# A few of the reported issues are https://github.com/kubernetes/minikube/issues/2707 and https://github.com/kubernetes/minikube/issues/2703 +# If you are experiencing problems creating a cluster, please consider downgrading to minikube v0.25. + +minikube start \ + --vm-driver virtualbox \ + --cpus 4 \ + --memory 12228 \ + --network-plugin=cni \ + --extra-config=kubelet.network-plugin=cni + +############################### +# Install Ingress and Storage # +############################### + +minikube addons enable ingress + +minikube addons enable storage-provisioner + +minikube addons enable default-storageclass + +################## +# Install Tiller # +################## + +kubectl create \ + -f https://raw.githubusercontent.com/vfarcic/k8s-specs/master/helm/tiller-rbac.yml \ + --record --save-config + +helm init --service-account tiller + +kubectl -n kube-system \ + rollout status deploy tiller-deploy + +################## +# Get Cluster IP # +################## + +export LB_IP=$(minikube ip) + +####################### +# Install ChartMuseum # +####################### + +CM_ADDR="cm.$LB_IP.nip.io" + +echo $CM_ADDR + +CM_ADDR_ESC=$(echo $CM_ADDR \ + | sed -e "s@\.@\\\.@g") + +echo $CM_ADDR_ESC + +helm install stable/chartmuseum \ + --namespace charts \ + --name cm \ + --values helm/chartmuseum-values.yml \ + --set ingress.hosts."$CM_ADDR_ESC"={"/"} \ + --set env.secret.BASIC_AUTH_USER=admin \ + --set env.secret.BASIC_AUTH_PASS=admin + +kubectl -n charts \ + rollout status deploy \ + cm-chartmuseum + +# http "http://$CM_ADDR/health" # It should return `{"healthy":true} + +###################### +# Install Weave Net ## +###################### + +kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" +kubectl -n kube-system rollout status daemonset weave-net +``` + +## Weave Net + +### On minikube + +> To run Weave Net on minikube, after upgrading minikube, you need to overwrite the default CNI config shipped with 
minikube: mkdir -p ~/.minikube/files/etc/cni/net.d/ && touch ~/.minikube/files/etc/cni.net.d/k8s.conf and then to start minikube with CNI enabled: minikube start --network-plugin=cni --extra-config=kubelet.network-plugin=cni. Afterwards, you can install Weave Net. + +```bash +kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" +``` + +## Install stern + +* [Stern - aggregate log rendering tool](https://github.com/wercker/stern) + +### via brew + +```bash +brew install stern +``` + +### Binary release + +```bash +sudo curl -L -o /usr/local/bin/stern \ + https://github.com/wercker/stern/releases/download/1.6.0/stern_linux_amd64 +sudo chmod +x /usr/local/bin/stern +``` + +## Sysdig + +### Install Sysdig + +### Run Sysdig for Kubernetes + +* collect API server address +* collect client cert + key +* https://www.digitalocean.com/community/tutorials/how-to-monitor-your-ubuntu-16-04-system-with-sysdig + +```bash +certificate-authority: /home/joostvdg/.minikube/ca.crt +server: https://192.168.99.100:8443 +client-certificate: /home/joostvdg/.minikube/client.crt +client-key: /home/joostvdg/.minikube/client.key +``` + +```bash +sysdig -k https://192.168.99.100:8443 -K /home/joostvdg/.minikube/client.crt:/home/joostvdg/.minikube/client.key + +sysdig -k https://192.168.99.100:8443 -K /home/joostvdg/.minikube/client.crt:/home/joostvdg/.minikube/client.key syslog.severity.str=info +``` + +### CSysdig + +```bash +sudo csysdig -k https://192.168.99.100:8443 -K /home/joostvdg/.minikube/client.crt:/home/joostvdg/.minikube/client.key +``` diff --git a/docs/kubernetes/cka-exam.md b/docs/kubernetes/cka-exam.md new file mode 100644 index 0000000..9e8019e --- /dev/null +++ b/docs/kubernetes/cka-exam.md @@ -0,0 +1,2 @@ +# Certified Kubernetes Administrator Exam + diff --git a/docs/kubernetes/index.md b/docs/kubernetes/index.md new file mode 100644 index 0000000..6314b5c --- /dev/null +++ b/docs/kubernetes/index.md @@ -0,0 +1,68 @@ +# 
Kubernetes + +## What is kubernetes + +## Kubernetes Objects + +## Kubernetes tutorials + + +## Kubernetes Guides + +### Linux basics + +#### Namespaces & CGroups + +* https://jvns.ca/blog/2016/10/10/what-even-is-a-container/ +* https://www.youtube.com/watch?v=sK5i-N34im8 +* https://www.ianlewis.org/en/what-are-kubernetes-pods-anyway + +### Networking + +* https://itnext.io/kubernetes-networking-behind-the-scenes-39a1ab1792bb +* https://github.com/nleiva/kubernetes-networking-links +* [IP Tables](https://www.booleanworld.com/depth-guide-iptables-linux-firewall/) +* [CIDR Explanation video](https://www.youtube.com/watch?v=Q1U9wVXRuHA) +* [Packets & Frames introduction](https://www.youtube.com/watch?v=zhlMLRNY5-4) + +### Ingress + +* [Traefik on AWS](https://medium.com/@carlosedp/multiple-traefik-ingresses-with-letsencrypt-https-certificates-on-kubernetes-b590550280cf) + + +### Metrics + +* https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-66936addedae + +### Secrets + +* [Hashicorp Vault - Kelsey Hightower](https://github.com/kelseyhightower/vault-on-google-kubernetes-engine) +* https://medium.com/qubit-engineering/kubernetes-up-integrated-secrets-configuration-5a15b9f5a6c6 + +### Security + +* [RBAC](https://docs.bitnami.com/kubernetes/how-to/configure-rbac-in-your-kubernetes-cluster/) +* [11 ways not to get hacked on Kubernetes](https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked/) + +## Tools to use + +* [Microscanner](https://github.com/aquasecurity/microscanner) +* [Dex - OpenID Connect solution](https://github.com/coreos/dex) +* [Sonubuoy](https://github.com/heptio/sonobuoy) +* [Helm](https://helm.sh/) + * [Learn LUA book](https://www.lua.org/pil/contents.html), lua's required for Helm 3.0 +* [ChartMuseum](https://github.com/kubernetes-helm/chartmuseum) +* [Skaffold](https://github.com/GoogleContainerTools/skaffold) +* [KSynch](https://github.com/vapor-ware/ksync) +* Traefik +* [Istio](https://istio.io/) +* Falco +* Prometheus +* 
Grafana +* Jenkins +* Kaniko +* Prow? +* Knative +* [Rook](https://rook.io/) +* [Stern - aggregate log rendering tool](https://github.com/wercker/stern) +* [Linkerd 2](https://linkerd.io/) \ No newline at end of file diff --git a/docs/kubernetes/install-gke.md b/docs/kubernetes/install-gke.md new file mode 100644 index 0000000..c0cb60d --- /dev/null +++ b/docs/kubernetes/install-gke.md @@ -0,0 +1,261 @@ +# GCE + +## GKE Install + +### Set env + +```bash +ZONE=$(gcloud compute zones list --filter "region:(europe-west4)" | awk '{print $1}' | tail -n 1) +ZONES=$(gcloud compute zones list --filter "region:(europe-west4)" | tail -n +2 | awk '{print $1}' | tr '\n' ',') + +MACHINE_TYPE=n1-highcpu-2 +MACHINE_TYPE=n1-standard-2 + +echo ZONE=$ZONE +echo ZONES=$ZONES +echo MACHINE_TYPE=$MACHINE_TYPE +``` + +### Get supported K8s versions + +```bash +gcloud container get-server-config --zone=$ZONE --format=json +``` +```bash +MASTER_VERSION="1.10.5-gke.0" +``` + +### Create cluster + +```bash +gcloud container clusters \ + create devops24 \ + --zone $ZONE \ + --node-locations $ZONES \ + --machine-type $MACHINE_TYPE \ + --enable-autoscaling \ + --num-nodes 1 \ + --max-nodes 1 \ + --min-nodes 1 \ + --cluster-version $MASTER_VERSION +``` + +### Kubernetes post install + +* create cluster role binding +* install nginx as ingress controller +* install tiller +* configure helm + +```bash +kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account) +kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/mandatory.yaml +kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/provider/cloud-generic.yaml +kubectl create -f https://raw.githubusercontent.com/vfarcic/k8s-specs/master/helm/tiller-rbac.yml --record --save-config +kubectl create serviceaccount --namespace kube-system tiller +kubectl create clusterrolebinding tiller-cluster-rule 
--clusterrole=cluster-admin --serviceaccount=kube-system:tiller +helm init --service-account tiller +``` + +### Install Weave net + +```bash +kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')" +``` + +#### Encryption + +You can run Weave Net with encryption on. +This requires a Kubernetes secret containing the encryption password. + +```bash +cat > weave-secret << EOF +MSjNDSC6Rw7F3P3j8klHZq1v +EOF + +kubectl create secret -n kube-system generic weave-secret --from-file=./weave-secret +kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&password-secret=weave-secret" +kubectl get pods -n kube-system -l name=weave-net -o wide +kubectl exec -n kube-system weave-net- -c weave -- /home/weave/weave --local status +``` + +#### Installation error + +Note, that installing Weave Net on GKE requires the cluster-admin role to be bound to yourself. +Else you will not have enough rights. + +If you see: + +```bash +rror from server (Forbidden): clusterroles.rbac.authorization.k8s.io "weave-net" is forbidden: attempt to grant extra privileges: [PolicyRule{APIGroups:[""], Resources:["pods"], Verbs:["get"]} PolicyRule{APIGroups:[""], Resources:["pods"], Verbs:["list"]} PolicyRule{APIGroups:[""], Resources:["pods"], Verbs:["watch"]} PolicyRule{APIGroups:[""], Resources:["namespaces"], Verbs:["get"]} PolicyRule{APIGroups:[""], Resources:["namespaces"], Verbs:["list"]} PolicyRule{APIGroups:[""], Resources:["namespaces"], Verbs:["watch"]} PolicyRule{APIGroups:[""], Resources:["nodes"], Verbs:["get"]} PolicyRule{APIGroups:[""], Resources:["nodes"], Verbs:["list"]} PolicyRule{APIGroups:[""], Resources:["nodes"], Verbs:["watch"]} PolicyRule{APIGroups:["networking.k8s.io"], Resources:["networkpolicies"], Verbs:["get"]} PolicyRule{APIGroups:["networking.k8s.io"], Resources:["networkpolicies"], Verbs:["list"]} PolicyRule{APIGroups:["networking.k8s.io"], Resources:["networkpolicies"], 
Verbs:["watch"]} PolicyRule{APIGroups:[""], Resources:["nodes/status"], Verbs:["patch"]} PolicyRule{APIGroups:[""], Resources:["nodes/status"], Verbs:["update"]}] user=&{joostvdg@gmail.com [system:authenticated] map[]} ownerrules=[PolicyRule{APIGroups:["authorization.k8s.io"], Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/openapi" "/openapi/*" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version" "/version/"], Verbs:["get"]}] ruleResolutionErrors=[] +Error from server (Forbidden): roles.rbac.authorization.k8s.io "weave-net" is forbidden: attempt to grant extra privileges: [PolicyRule{APIGroups:[""], Resources:["configmaps"], ResourceNames:["weave-net"], Verbs:["get"]} PolicyRule{APIGroups:[""], Resources:["configmaps"], ResourceNames:["weave-net"], Verbs:["update"]} PolicyRule{APIGroups:[""], Resources:["configmaps"], Verbs:["create"]}] user=&{joostvdg@gmail.com [system:authenticated] map[]} ownerrules=[PolicyRule{APIGroups:["authorization.k8s.io"], Resources:["selfsubjectaccessreviews" "selfsubjectrulesreviews"], Verbs:["create"]} PolicyRule{NonResourceURLs:["/api" "/api/*" "/apis" "/apis/*" "/healthz" "/openapi" "/openapi/*" "/swagger-2.0.0.pb-v1" "/swagger.json" "/swaggerapi" "/swaggerapi/*" "/version" "/version/"], Verbs:["get"]}] ruleResolutionErrors=[] +``` + +Execute the following before attempting to install Weave Net again. 
+ +```bash +kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account) +``` + +### Prometheus & Grafana + +https://rohanc.me/monitoring-kubernetes-prometheus-grafana/ + +```bash +helm install stable/prometheus --name my-prometheus +``` + +#### Grafana config + +```yaml +persistence: + enabled: true + accessModes: + - ReadWriteOnce + size: 5Gi + +datasources: + datasources.yaml: + apiVersion: 1 + datasources: + - name: Prometheus + type: prometheus + url: http://my-prometheus-server + access: proxy + isDefault: true + +dashboards: + default: + kube-dash: + gnetId: 6663 + revision: 1 + datasource: Prometheus + kube-official-dash: + gnetId: 2 + revision: 1 + datasource: Prometheus + +dashboardProviders: + dashboardproviders.yaml: + apiVersion: 1 + providers: + - name: 'default' + orgId: 1 + folder: '' + type: file + disableDeletion: false + editable: true + options: + path: /var/lib/grafana/dashboards +``` + +Other dashboards to import: +* 3131 +* 5309 +* 5312 +* 315 + +### Get Cluster IP + +```bash +export LB_IP=$(kubectl -n ingress-nginx \ + get svc ingress-nginx \ + -o jsonpath="{.status.loadBalancer.ingress[0].ip}") + +echo $LB_IP + +export DNS=${LB_IP}.nip.io +echo $DNS + +export JENKINS_DNS="jenkins.${DNS}" +echo $JENKINS_DNS +``` + +### Install CJE + +#### Create SSD SC + +```bash +echo "apiVersion: storage.k8s.io/v1 +kind: StorageClass +metadata: + name: ssd +provisioner: kubernetes.io/gce-pd +parameters: + type: pd-ssd" > ssd-storage.yaml + +kubectl create -f ssd-storage.yaml +``` + +#### Setup CJE Namespace + +```bash +kubectl create namespace cje +kubectl label namespace cje name=cje +kubectl config set-context $(kubectl config current-context) --namespace=cje +``` + +#### Adjust Domain name + +```bash +export PREV_DOMAIN_NAME= +``` + +```bash +sed -e s,$PREV_DOMAIN_NAME,$JENKINS_DNS,g < cje.yml > tmp && mv tmp cje.yml +``` + +### Install Jenkins + +```bash +kubectl apply -f cje.yml +kubectl 
rollout status sts cjoc +sleep 180 +kubectl exec cjoc-0 -- cat /var/jenkins_home/secrets/initialAdminPassword +``` + +### Install Jenkins - k8s-specs + +```bash +kubectl apply -f joost/jenkins.yml +sleep 180 +kubectl exec -it --namespace jenkins jenkins-0 cat /var/jenkins_home/secrets/initialAdminPassword +``` + +### Install Keycloak - k8s-specs + +```bash +kubectl apply -f joost/keycloak.yml +sleep 120 +kubectl -n jenkins exec -it keycloak-0 -- /bin/bash +keycloak/bin/add-user-keycloak.sh -u somekindofuser -p X5qpLMnWKUx7 +ps -ef | grep java +kill -9 +``` + +#### Follow log + +```bash +k -n jenkins logs -f keycloak +``` + +#### Jenkins Keycloak config + +```json +{ + "realm": "master", + "auth-server-url": "http://35.204.112.229/auth", + "ssl-required": "external", + "resource": "jenkins", + "public-client": true +} +``` + +### Destroy cluster + +```bash +gcloud container clusters \ + delete devops24 \ + --zone $ZONE \ + --quiet +``` \ No newline at end of file diff --git a/docs/other/vim.md b/docs/other/vim.md new file mode 100644 index 0000000..8174017 --- /dev/null +++ b/docs/other/vim.md @@ -0,0 +1,43 @@ +# VIM + +## Install Vundle + +```bash +git clone https://github.com/VundleVim/Vundle.vim.git ~/.vim/bundle/Vundle.vim +``` + +## Install plugins + +```bash +vim ~/.vimrc +``` + +```bash +filetype off +filetype plugin indent on +syntax on + +set rtp+=~/.vim/bundle/Vundle.vim +call vundle#begin() + +Plugin 'gmarik/Vundle.vim' +Plugin 'reedes/vim-thematic' +Plugin 'airblade/vim-gitgutter' +Plugin 'vim-airline/vim-airline' +Plugin 'vim-airline/vim-airline-themes' +Plugin 'itchyny/lightline.vim' +Plugin 'nathanaelkane/vim-indent-guides' +Plugin 'scrooloose/nerdtree' +Plugin 'editorconfig/editorconfig-vim' +Plugin 'mhinz/vim-signify' + +call vundle#end() + +filetype plugin indent on +``` + +Open VIM, and install the plugins: + +```bash +:installPlugins +``` \ No newline at end of file diff --git a/docs/productivity/index.md b/docs/productivity/index.md index 
b19e63d..4fbfa87 100644 --- a/docs/productivity/index.md +++ b/docs/productivity/index.md @@ -81,9 +81,15 @@ http://rstb.royalsocietypublishing.org/content/373/1753/20170239 * https://www.thoughtworks.com/insights/blog/why-it-departments-must-reinvent-themselves-part-1 * https://en.wikipedia.org/wiki/Peter_principle * https://hackernoon.com/why-all-engineers-must-understand-management-the-view-from-both-ladders-cc749ae14905 +* https://medium.freecodecamp.org/cognitive-bias-and-why-performance-management-is-so-hard-8852a1b874cd +* https://en.wikipedia.org/wiki/Horn_effect +* https://en.wikipedia.org/wiki/Halo_effect +* http://serendipstudio.org/bb/neuro/neuro02/web2/hhochman.html ### Articles +* [Concept of Shared Services and beyond](https://medium.com/@mattklein123/the-human-scalability-of-devops-e36c37d3db6a) +* [Introduction to Observability by Weave Net](https://www.weave.works/technologies/monitoring-kubernetes-with-prometheus/#observability-vs-monitoring) * [Article on the state of Systems Languages](https://blog.usejournal.com/systems-languages-an-experience-report-d008b2b12628) * [Article on SILO's](https://www.rundeck.com/blog/whats-a-silo-and-why-they-ruin-everything) * [Blog on Twitter's Engineering Efficiency](http://www.gigamonkeys.com/flowers/) diff --git a/mkdocs.yml b/mkdocs.yml index 9309d02..316c3ee 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -26,6 +26,12 @@ pages: - Swarm (mode): docker/swarm.md - Kubernetes: docker/kubernetes.md +- Kubernetes: + - Introduction: kubernetes/index.md + - CKA Exam details: kubernetes/cka-exam.md + - CKA Exam prep: kubernetes/cka-exam-prep.md + - GKE Installation: kubernetes/install-gke.md + - Software Engineering: - Naming: swe/naming.md - Domain Driven Design: swe/ddd.md From 00ad51aa5abd08bde7702bb4d9e6fa2757b64e6e Mon Sep 17 00:00:00 2001 From: Joost van der Griendt Date: Sun, 19 Aug 2018 13:05:18 +0200 Subject: [PATCH 06/10] Add more stuff about k8s and linux. 
--- docs/blogs/docker-graceful-shutdown.md | 419 +++++++++++++++++++++++++ docs/kubernetes/index.md | 7 +- docs/kubernetes/khw-controller.md | 114 +++++++ docs/kubernetes/khw-worker.md | 314 ++++++++++++++++++ docs/kubernetes/the-hard-way.md | 176 +++++++++++ docs/linux/index.md | 1 + docs/linux/iptables.md | 2 + docs/linux/networking.md | 1 + docs/linux/systemd.md | 19 ++ mkdocs.yml | 9 + 10 files changed, 1060 insertions(+), 2 deletions(-) create mode 100644 docs/blogs/docker-graceful-shutdown.md create mode 100644 docs/kubernetes/khw-controller.md create mode 100644 docs/kubernetes/khw-worker.md create mode 100644 docs/kubernetes/the-hard-way.md create mode 100644 docs/linux/index.md create mode 100644 docs/linux/iptables.md create mode 100644 docs/linux/networking.md create mode 100644 docs/linux/systemd.md diff --git a/docs/blogs/docker-graceful-shutdown.md b/docs/blogs/docker-graceful-shutdown.md new file mode 100644 index 0000000..eb73daa --- /dev/null +++ b/docs/blogs/docker-graceful-shutdown.md @@ -0,0 +1,419 @@ +# Graceful shutdown + +> We can speak about the graceful shutdown of our application, when all of the resources it used and all of the traffic and/or data processing what it handled are closed and released properly. It means that no database connection remains open and no ongoing request fails because we stop our application. - [Péter Márton](https://blog.risingstack.com/graceful-shutdown-node-js-kubernetes/) + +As I could not have done it better myself, I've quoted Péter Márton. + +I think we can say that cleaning up your mess and informing people of your impending departure is a good thing. Many programming languages and frameworks have hooks for listening to signals - which we explore later - allowing you to handle a shutdown, expected or not. + +When we have resources open, such as files, database connections, background processes and others. It would be best for ourselves, but also for our environment to clean those up before exiting. 
This cleanup would constitute a graceful shutdown.
+
+We're going to dive into this subject, exploring several complementary topics that together should help improve your (Docker) application's ability to shut down gracefully.
+
+* The case for graceful shutdown
+* How to run processes in Docker
+* Process management
+* Signals management
+
+## The case for graceful shutdown
+
+We're in an age where many applications run in Docker containers across a multitude of clusters and (potentially) different orchestrators. These bring with them other concerns to tackle, such as logging, monitoring, tracing and many more. One significant way we defend ourselves against the perils of the distributed nature of these clusters is to make our applications more resilient.
+
+However, there is still no guarantee your application is always up and running. So another concern we should tackle is how it responds when it does fail, including when it is told to stop by the orchestrator. This can happen for a variety of reasons: for example, your application's health check fails, or it consumed more resources than allowed.
+
+Handling this well not only increases the reliability of your application, it also increases the reliability of the cluster it lives in. As you cannot always know in advance where your application will run (you might not even be the one putting it in a Docker container), make sure your application knows how to quit!
+
+## How to run processes in Docker
+
+There are many ways to run a process in Docker. I prefer to make things easy to understand and easy to predict, so this article deals with processes started by commands in a Dockerfile.
+
+There are several ways to run a command in a Dockerfile.
+
+These are:
+
+* **RUN**: runs a command during the docker build phase
+* **CMD**: runs a command when the container gets started
+* **ENTRYPOINT**: sets the executable that runs when the container starts
+
+You need at least one ENTRYPOINT or CMD in a Dockerfile for it to be valid. They do similar things, and can also be used in combination, in which case CMD provides the default arguments for the ENTRYPOINT.
+
+You can write each of these commands in either a shell form or an exec form. For more information on these commands, you should check out [Docker's docs on Entrypoint vs. CMD](https://docs.docker.com/engine/reference/builder/#exec-form-entrypoint-example).
+
+In summary, the shell form runs the command via `/bin/sh -c`, spawning your process as a child of the shell, whereas the exec form executes your process directly, so it runs as PID 1.
+
+We'll show you what that looks like, borrowing the Docker docs example referred to earlier.
+
+### Docker Shell form example
+
+Create the following Dockerfile:
+
+```dockerfile
+FROM ubuntu:18.04
+ENTRYPOINT top -b
+```
+
+Then build and run it:
+
+```bash
+docker image build --tag shell-form .
+docker run --name shell-form --rm shell-form
+```
+
+This should yield the following:
+
+```bash
+top - 16:34:56 up 1 day,  5:15,  0 users,  load average: 0.00, 0.00, 0.00
+Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
+%Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.2 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
+KiB Mem :  2046932 total,   541984 free,   302668 used,  1202280 buff/cache
+KiB Swap:  1048572 total,  1042292 free,     6280 used.  1579380 avail Mem
+
+  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
+    1 root      20   0    4624    760    696 S   0.0  0.0   0:00.05 sh
+    6 root      20   0   36480   2928   2580 R   0.0  0.1   0:00.01 top
+```
+
+As you can see, two processes are running: **sh** (as PID 1) and **top**.
+This means that killing the process, with *ctrl+c* for example, signals the **sh** process, but not **top**.
+To kill this container, open a second terminal and execute the following command.
+ +```bash +docker rm -f shell-form +``` + +As you can imagine, this is usually not what you want. +So as a general rule, you should never use the shell form. So on to the exec form we go! + +### Docker exec form example + +The exec form is written as an array of parameters: `ENTRYPOINT ["top", "-b"]` + +To continue in the same line of examples, we will create a Dockerfile, build and run it. + +```dockerfile +FROM ubuntu:18.04 +ENTRYPOINT ["top", "-b"] +``` + +Then build and run it: + +```bash +docker image build --tag exec-form . +docker run --name exec-form --rm exec-form +``` + +This should yield the following: + +```bash +top - 18:12:30 up 1 day, 6:53, 0 users, load average: 0.00, 0.00, 0.00 +Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie +%Cpu(s): 0.4 us, 0.3 sy, 0.0 ni, 99.2 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st +KiB Mem : 2046932 total, 535896 free, 307196 used, 1203840 buff/cache +KiB Swap: 1048572 total, 1042292 free, 6280 used. 1574880 avail Mem + + PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND + 1 root 20 0 36480 2940 2584 R 0.0 0.1 0:00.03 top +``` + +### Docker exec form with parameters + +A caveat with the exec form is that it doesn't interpolate parameters. + +You can try the following: + +```dockerfile +FROM ubuntu:18.04 +ENV PARAM="-b" +ENTRYPOINT ["top", "${PARAM}"] +``` + +Then build and run it: + +```bash +docker image build --tag exec-param . +docker run --name exec-form --rm exec-param +``` + +This should yield the following: + +```bash +/bin/sh: 1: [top: not found +``` + +This is where Docker created a mix between the two styles. +It allows you to create an *Entrypoint* with a shell command - performing interpolation - but executing it as an exec form. +This can be done by prefixing the shell form, with, you guessed it, *exec*. + +```dockerfile +FROM ubuntu:18.04 +ENV PARAM="-b" +ENTRYPOINT exec "top" "${PARAM}" +``` + +Then build and run it: + +```bash +docker image build --tag exec-param . 
+docker run --name exec-form --rm exec-param
+```
+
+This will return the exact same output as if we had run `ENTRYPOINT ["top", "-b"]`.
+
+Now you can also override the parameter by using the environment variable flag.
+
+```bash
+docker image build --tag exec-param .
+docker run --name exec-form --rm -e PARAM="help" exec-param
+```
+
+Resulting in top's help string.
+
+### The special case of Alpine
+
+One of the main [best practices for Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) is to make them as small as possible.
+The easiest way to do this is to start with a minimal image.
+This is where [Alpine Linux](https://hub.docker.com/_/alpine/) comes in. We will revisit our shell form example, but replace ubuntu with alpine.
+
+Create the following Dockerfile.
+
+```dockerfile
+FROM alpine:3.8
+ENTRYPOINT top -b
+```
+
+Then build and run it.
+
+```bash
+docker image build --tag alpine-shell .
+docker run --name alpine-shell --rm alpine-shell
+```
+
+It will result in the following output.
+
+```bash
+Mem: 1509068K used, 537864K free, 640K shrd, 126756K buff, 1012436K cached
+CPU:   0% usr   0% sys   0% nic 100% idle   0% io   0% irq   0% sirq
+Load average: 0.00 0.00 0.00 2/404 5
+  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
+    1     0 root     R     1516   0%   0   0% top -b
+```
+
+Aside from **top**'s output looking a bit different, there is only one process running.
+
+Alpine Linux helps us avoid the problem of the shell form altogether!
+
+## Process management
+
+Now that we know how to create a Dockerfile that runs our process as PID 1, how do we make sure that process correctly responds to signals?
+
+We'll get into signal handling next, but first, let us explore how we can manage our process.
+As you're used to by now, there are multiple solutions at our disposal.
+
+We can broadly categorize them like this:
+
+* The process manages itself and its children by itself
+* We let Docker manage the process and its children
+* We use a process manager to do the work for us
+
+### Process manages itself
+
+Great: if this is the case, it saves you the trouble of relying on extra dependencies.
+Unfortunately, not all processes are [designed for PID1](https://www.fpcomplete.com/blog/2016/10/docker-demons-pid1-orphans-zombies-signals), and some might be [prone to zombie processes regardless](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem).
+
+In those cases, you still have to invest some time and effort to get a solution in place.
+
+### Docker manages PID1
+
+Docker has a built-in feature that uses a lightweight process manager to help you.
+
+So if you're running your images with Docker itself, either directly or via Compose or Swarm, you're fine. You can use the init flag in your run command or your compose file.
+
+Please note that the below examples require a certain minimum version of Docker.
+
+* run - 1.13+
+* [compose (v 2.2)](https://docs.docker.com/compose/compose-file/compose-file-v2/#image) - 1.13.0+
+* [swarm (v 3.7)](https://docs.docker.com/compose/compose-file/#init) - 18.06.0+
+
+#### Docker Run
+
+```bash
+docker run --rm -ti --init caladreas/dui
+```
+
+#### Docker Compose
+
+```yaml
+version: '2.2'
+services:
+  web:
+    image: caladreas/java-docker-signal-demo:no-tini
+    init: true
+```
+
+#### Docker Swarm
+
+```yaml
+version: '3.7'
+services:
+  web:
+    image: caladreas/java-docker-signal-demo:no-tini
+    init: true
+```
+
+Relying on Docker does create a dependency on how your container runs. It only runs correctly in Docker-related technologies (run, compose, swarm) and only if the proper versions are available.
+
+This creates either a different experience for users running your application somewhere else, or a failure for those unable to meet the version requirements.
So maybe another solution is to bake a process manager into your image and guarantee its behavior.
+
+### Depend on a process manager
+
+One of our goals for Docker images is to keep them small, so we should look for a lightweight process manager. It does not have to manage a whole machine's worth of processes, just one and perhaps some children.
+
+Here we would like to introduce you to [Tini](https://github.com/krallin/tini), a lightweight process manager [designed for this purpose](https://github.com/krallin/tini/issues/8).
+It is a very successful and widely adopted process manager in the Docker world. So successful, in fact, that the aforementioned init flags from Docker are implemented by baking [Tini into Docker](https://github.com/krallin/tini/issues/81).
+
+#### Debian example
+
+For brevity, the build stage is excluded, and to keep the image small, we use Debian slim instead of default Debian.
+
+```dockerfile
+FROM debian:stable-slim
+ENV TINI_VERSION v0.18.0
+ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
+RUN chmod +x /tini
+ENTRYPOINT ["/tini", "-vv", "-g", "--", "/usr/bin/dui/bin/dui", "-XX:+UseCGroupMemoryLimitForHeap", "-XX:+UnlockExperimentalVMOptions"]
+COPY --from=build /usr/bin/dui-image/ /usr/bin/dui
+```
+
+#### Alpine example
+
+Alpine Linux works wonders for Docker images, and Tini is available in its package repository, so you can very easily install it if you want.
+
+```dockerfile
+FROM alpine
+RUN apk add --no-cache tini
+ENTRYPOINT ["/sbin/tini", "-vv", "-g", "-s", "--"]
+CMD ["top", "-b"]
+```
+
+## Signals management
+
+Now that we can run our process under a proper PID 1, we have to see how we can manage the signals it receives.
There are three parts to this:
+
+* **Handle signals**: we should make sure our process can deal with the signals it receives
+* **Receive the right signals**: we might have to alter the signals we receive from our orchestrators
+* **Signals and Docker orchestrators**: we have to help our orchestrators know when to deliver these signals
+
+For more details on the subject of signals and Docker, please read this excellent blog from [Grigorii Chudnov](https://medium.com/@gchudnov/trapping-signals-in-docker-containers-7a57fdda7d86).
+
+### Handle signals
+
+Handling process signals depends on your application, programming language or framework.
+
+For Java and Go(lang) we dive into this further, exploring some of the options we have, including some of the most used frameworks.
+
+### Receive the right signals
+
+Sometimes your language or framework of choice doesn't handle signals all that well.
+It might be very rigid in what it does with specific signals, removing your ability to do the right thing.
+Of course, not all languages or frameworks are designed with Docker containers or microservices in mind, and some are yet to catch up to this more dynamic environment.
+
+Luckily, Docker and Kubernetes allow you to specify which signal to send to your process.
+
+#### Docker run
+
+```bash
+docker run --rm -ti --init --stop-signal=SIGINT \
+    caladreas/java-docker-signal-demo
+```
+
+#### Docker compose/swarm
+
+Docker's compose file format allows you to specify a [stop signal](https://docs.docker.com/compose/compose-file/compose-file-v2/#stop_signal).
+This is the signal sent when the container is stopped in a normal fashion - normal meaning `docker stop`, or Docker itself determining it should stop the container.
+
+If you forcefully remove the container, for example with `docker rm -f`, it will directly kill the process, so don't do that.
+ +```yaml +version: '2.2' +services: + web: + image: caladreas/java-docker-signal-demo + stop_signal: SIGINT + stop_grace_period: 15s +``` + +If you run this with `docker-compose up` and then in a second terminal, stop the container, you will see something like this. + +```bash +web_1 | HelloWorld! +web_1 | Shutdown hook called! +web_1 | We're told to stop early... +web_1 | java.lang.InterruptedException: sleep interrupted +web_1 | at java.base/java.lang.Thread.sleep(Native Method) +web_1 | at joostvdg.demo.signal@1.0/com.github.joostvdg.demo.signal.HelloWorld.printHelloWorld(Unknown Source) +web_1 | at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) +web_1 | at java.base/java.util.concurrent.FutureTask.run(Unknown Source) +web_1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) +web_1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) +web_1 | at java.base/java.lang.Thread.run(Unknown Source) +web_1 | [DEBUG tini (1)] Passing signal: 'Interrupt' +web_1 | [DEBUG tini (1)] Received SIGCHLD +web_1 | [DEBUG tini (1)] Reaped child with pid: '7' +web_1 | [INFO tini (1)] Main child exited with signal (with signal 'Interrupt') +``` + +#### Kubernetes + +In Kubernetes we can make use of [Container Lifecycle Hooks](https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/) to manage how our container should be stopped. +We could, for example, send a SIGINT (interrupt) to tell our application to stop. 
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: java-signal-demo
+  namespace: default
+  labels:
+    app: java-signal-demo
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: java-signal-demo
+  template:
+    metadata:
+      labels:
+        app: java-signal-demo
+    spec:
+      containers:
+      - name: main
+        image: caladreas/java-docker-signal-demo
+        lifecycle:
+          preStop:
+            exec:
+              command: ["killall", "java", "-INT"]
+      terminationGracePeriodSeconds: 60
+```
+
+When you save this as deployment.yml, then create and delete it - `kubectl apply -f deployment.yml` / `kubectl delete -f deployment.yml` - you will see the same behavior.
+
+### Signals and Docker orchestrators
+
+Now that we can respond to signals and receive the correct signals, there's one last thing to take care of.
+We have to make sure our orchestrator of choice sends these signals for the right reasons: quickly telling us there's something wrong with our running process and that it should shut down - which, of course, we'll do gracefully!
+
+As health, readiness and liveness checks are a topic of their own, we'll keep it short, giving some basic examples and pointing you to further material to investigate how to use them to your advantage.
+
+### Docker
+
+You can either configure your health check in your [Dockerfile](https://docs.docker.com/engine/reference/builder/#healthcheck) or configure it in your [docker-compose.yml](https://docs.docker.com/compose/compose-file/#healthcheck) for either compose or swarm.
+
+Considering only Docker can use the health check in your Dockerfile, it is strongly recommended to have health checks in your application and document how they can be used.
+
+### Kubernetes
+
+In Kubernetes we have the concept of [Container Probes](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).
+This allows you to configure whether your container is ready (readinessProbe) to be used and whether it is still working as expected (livenessProbe).
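+
+As a minimal sketch of how such probes look on a container spec - the HTTP `/health` endpoint and port 8080 here are assumptions for illustration, not something the demo image above is known to expose:
+
+```yaml
+# Hypothetical probe configuration; the /health endpoint and port 8080
+# are assumed for illustration only.
+containers:
+- name: main
+  image: caladreas/java-docker-signal-demo
+  readinessProbe:        # gates whether the pod receives traffic
+    httpGet:
+      path: /health
+      port: 8080
+    initialDelaySeconds: 5
+    periodSeconds: 10
+  livenessProbe:         # repeated failures make the kubelet restart the container
+    httpGet:
+      path: /health
+      port: 8080
+    initialDelaySeconds: 15
+    periodSeconds: 20
+```
+
+When the livenessProbe fails often enough, the kubelet stops the container, running the preStop hook first - so a well-behaved application still gets its chance to shut down gracefully.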
diff --git a/docs/kubernetes/index.md b/docs/kubernetes/index.md
index 6314b5c..65d7f6c 100644
--- a/docs/kubernetes/index.md
+++ b/docs/kubernetes/index.md
@@ -43,6 +43,7 @@

* [RBAC](https://docs.bitnami.com/kubernetes/how-to/configure-rbac-in-your-kubernetes-cluster/)
* [11 ways not to get hacked on Kubernetes](https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked/)
+* https://www.youtube.com/channel/UCiqnRXPAAk6iv2m47odUFzw

## Tools to use

@@ -61,8 +62,10 @@
* Grafana
* Jenkins
* Kaniko
-* Prow?
-* Knative
+* [Prow](https://github.com/kubernetes/test-infra/tree/master/prow)
+* [Tarmak](http://docs.tarmak.io/user-guide.html#user-guide)
+* [Kube-Lego](https://github.com/jetstack/kube-lego#usage)
+* [Knative](https://github.com/GoogleCloudPlatform/knative-build-tutorials#hello-world)
* [Rook](https://rook.io/)
* [Stern - aggregate log rendering tool](https://github.com/wercker/stern)
* [Linkerd 2](https://linkerd.io/)
\ No newline at end of file
diff --git a/docs/kubernetes/khw-controller.md b/docs/kubernetes/khw-controller.md
new file mode 100644
index 0000000..6ef4c1c
--- /dev/null
+++ b/docs/kubernetes/khw-controller.md
@@ -0,0 +1,114 @@
+# Controller Config
+
+## Configure API Server
+
+### Prepare folders
+
+```bash
+sudo mkdir -p /var/lib/kubernetes/
+
+sudo mv ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem \
+  service-account-key.pem service-account.pem \
+  encryption-config.yaml /var/lib/kubernetes/
+```
+
+### Get internal IP
+
+```bash
+INTERNAL_IP=$(curl -s -H "Metadata-Flavor: Google" \
+  http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip)
+echo INTERNAL_IP=$INTERNAL_IP
+```
+
+### Configure SystemD service
+
+```ini
+cat <<EOF
+...
+EOF
+```
+
+Set the `--non-masquerade-cidr` argument to ensure traffic to IPs outside this range will use IP masquerade.
+
+  Not sure if this is the cause, but it looks like this is a requirement and is missing from the Kubelet config.
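+
+The quoted fragment above matches the description of the kubelet's `--non-masquerade-cidr` flag.
+As a sketch only (the surrounding unit file is elided here, and the path and pod CIDR are the ones assumed throughout this guide), the flag would be added to the kubelet's `ExecStart` along these lines:
+
+```ini
+[Service]
+# assumed binary path; CIDR is this guide's pod network
+ExecStart=/usr/local/bin/kubelet \
+  --non-masquerade-cidr=10.200.0.0/16 \
+  ...
+```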
+
+
+
+```ini
+cat <<EOF
+...
+EOF
+```
+
+> Increasingly, Linux distributions are adopting or planning to adopt the `systemd` init system.
+This powerful suite of software can manage many aspects of your server,
+ from services to mounted devices and system states. [^1]
+
+
+## Concepts
+
+### Unit
+
+ > In `systemd`, a `unit` refers to any resource that the system knows how to operate on and manage.
+ This is the primary object that the `systemd` tools know how to deal with.
+ These resources are defined using configuration files called **unit files**. [^1]
+
+
+
+## References
+[^1]: [Introduction to systemd from Digital Ocean](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files)
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index 316c3ee..727a3ef 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -26,11 +26,20 @@ pages:
  - Swarm (mode): docker/swarm.md
  - Kubernetes: docker/kubernetes.md

+- Linux:
+  - Introduction: linux/index.md
+  - SystemD: linux/systemd.md
+  - Networking: linux/networking.md
+  - iptables: linux/iptables.md
+
- Kubernetes:
  - Introduction: kubernetes/index.md
  - CKA Exam details: kubernetes/cka-exam.md
  - CKA Exam prep: kubernetes/cka-exam-prep.md
  - GKE Installation: kubernetes/install-gke.md
+  - KHW - GKE Terraform infra: kubernetes/the-hard-way.md
+  - KHW - GKE Workers: kubernetes/khw-worker.md
+  - KHW - GKE Controller: kubernetes/khw-controller.md

- Software Engineering:
  - Naming: swe/naming.md

From 74f824233daf927417f627bf0d54d0d251a85e15 Mon Sep 17 00:00:00 2001
From: Joost van der Griendt
Date: Mon, 27 Aug 2018 19:55:42 +0200
Subject: [PATCH 07/10] Update K8S hardway and expand

---
 docs/kubernetes/cka-exam-prep.md | 1 +
 docs/kubernetes/khw-gce/certificates.md | 67 +++
 .../controller.md} | 3 +-
 docs/kubernetes/khw-gce/encryption.md | 28 ++
 docs/kubernetes/khw-gce/etcd.md | 45 ++
 docs/kubernetes/khw-gce/index.md | 144 +++++++
 docs/kubernetes/khw-gce/kubeconfigs.md | 53 +++
 docs/kubernetes/khw-gce/network.md | 75
++++ docs/kubernetes/khw-gce/remote-access.md | 29 ++ docs/kubernetes/khw-gce/terraform-compute.md | 399 ++++++++++++++++++ .../{khw-worker.md => khw-gce/worker.md} | 75 ---- docs/kubernetes/the-hard-way.md | 176 -------- docs/linux/systemd.md | 32 ++ docs/productivity/index.md | 13 + mkdocs.yml | 18 +- 15 files changed, 903 insertions(+), 255 deletions(-) create mode 100644 docs/kubernetes/khw-gce/certificates.md rename docs/kubernetes/{khw-controller.md => khw-gce/controller.md} (99%) create mode 100644 docs/kubernetes/khw-gce/encryption.md create mode 100644 docs/kubernetes/khw-gce/etcd.md create mode 100644 docs/kubernetes/khw-gce/index.md create mode 100644 docs/kubernetes/khw-gce/kubeconfigs.md create mode 100644 docs/kubernetes/khw-gce/network.md create mode 100644 docs/kubernetes/khw-gce/remote-access.md create mode 100644 docs/kubernetes/khw-gce/terraform-compute.md rename docs/kubernetes/{khw-worker.md => khw-gce/worker.md} (79%) delete mode 100644 docs/kubernetes/the-hard-way.md diff --git a/docs/kubernetes/cka-exam-prep.md b/docs/kubernetes/cka-exam-prep.md index c7b6d55..8d9a77d 100644 --- a/docs/kubernetes/cka-exam-prep.md +++ b/docs/kubernetes/cka-exam-prep.md @@ -10,6 +10,7 @@ * https://sysdig.com/blog/kubernetes-security-guide/ * https://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services * https://docs.google.com/presentation/d/1Gp-2blk5WExI_QR59EUZdwfO2BWLJqa626mK2ej-huo/edit#slide=id.g27a78b354c_0_0 +* https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html ## Some basic commands diff --git a/docs/kubernetes/khw-gce/certificates.md b/docs/kubernetes/khw-gce/certificates.md new file mode 100644 index 0000000..3f136b5 --- /dev/null +++ b/docs/kubernetes/khw-gce/certificates.md @@ -0,0 +1,67 @@ +# Certificates + +!!! note + Before we can continue here, we need to have our nodes up and running with their external ip addresses and our fixed public ip address. 
+    This is because some certificates require these external ip addresses!
+    ```bash
+    gcloud compute instances list
+    gcloud compute addresses list --filter="name=('kubernetes-the-hard-way')"
+    ```
+
+We need to create a whole lot of certificates, listed below, with the help of [cfssl](https://github.com/cloudflare/cfssl), a tool from CDN provider CloudFlare.
+
+## Required certificates
+
+* **CA** (or Certificate Authority): will be the root certificate of our trust chain
+    * result: `ca.pem` & `ca-key.pem`
+* **Admin**: the admin of our cluster (you!)
+    * result: `admin-key.pem` & `admin.pem`
+* **Kubelet**: the certificates of the kubelet processes on the worker nodes
+    * result:
+    ```
+    worker-0-key.pem
+    worker-0.pem
+    worker-1-key.pem
+    worker-1.pem
+    worker-2-key.pem
+    worker-2.pem
+    ```
+* **Controller Manager**
+    * result: `kube-controller-manager-key.pem` & `kube-controller-manager.pem`
+* **Scheduler**
+    * result: `kube-scheduler-key.pem` & `kube-scheduler.pem`
+* **API Server**
+    * result: `kubernetes-key.pem` & `kubernetes.pem`
+* **Service Account**: ???
+    * result: `service-account-key.pem` & `service-account.pem`
+
+## Certificate example
+
+Because we will use the `cfssl` tool from CloudFlare, we will define our certificate signing requests (CSRs) in JSON.
+
+```json
+{
+  "CN": "service-accounts",
+  "key": {
+    "algo": "rsa",
+    "size": 2048
+  },
+  "names": [
+    {
+      "C": "NL",
+      "L": "Utrecht",
+      "O": "Kubernetes",
+      "OU": "Kubernetes The Hard Way",
+      "ST": "Utrecht"
+    }
+  ]
+}
+```
+
+## Install scripts
+
+Make sure you're in `k8s-the-hard-way/scripts`.
+
+```bash
+./certs.sh
+```
diff --git a/docs/kubernetes/khw-controller.md b/docs/kubernetes/khw-gce/controller.md
similarity index 99%
rename from docs/kubernetes/khw-controller.md
rename to docs/kubernetes/khw-gce/controller.md
index 6ef4c1c..9152311 100644
--- a/docs/kubernetes/khw-controller.md
+++ b/docs/kubernetes/khw-gce/controller.md
@@ -111,4 +111,5 @@ RestartSec=5

 [Install]
 WantedBy=multi-user.target
 EOF
-```
\ No newline at end of file
+```
+
diff --git a/docs/kubernetes/khw-gce/encryption.md b/docs/kubernetes/khw-gce/encryption.md
new file mode 100644
index 0000000..96ae6c1
--- /dev/null
+++ b/docs/kubernetes/khw-gce/encryption.md
@@ -0,0 +1,28 @@
+# Encryption
+
+> Kubernetes stores a variety of data including cluster state, application configurations, and secrets. Kubernetes supports the ability to encrypt cluster data at rest.
+
+In order to use this ability to encrypt data at rest, each member of the control plane has to know the encryption key.
+
+So we will have to create one.
+
+## Encryption configuration
+
+We have to create an encryption key first.
+For the sake of embedding it into a yaml file, we will have to encode it to `base64`.
+
+```bash
+ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
+```
+
+Following Kelsey Hightower's guide, the resulting `encryption-config.yaml` looks like this:
+
+```yaml
+kind: EncryptionConfig
+apiVersion: v1
+resources:
+  - resources:
+      - secrets
+    providers:
+      - aescbc:
+          keys:
+            - name: key1
+              secret: ${ENCRYPTION_KEY}
+      - identity: {}
+```
+
+## Install scripts
+
+Make sure you're in `k8s-the-hard-way/scripts`.
+
+```bash
+./encryption.sh
+```
diff --git a/docs/kubernetes/khw-gce/etcd.md b/docs/kubernetes/khw-gce/etcd.md
new file mode 100644
index 0000000..2e2df65
--- /dev/null
+++ b/docs/kubernetes/khw-gce/etcd.md
@@ -0,0 +1,45 @@
+# ETCD
+
+> Kubernetes components are stateless and store cluster state in etcd.
In this lab you will bootstrap a three node etcd cluster and configure it for high availability and secure remote access.
+
+The bare minimum is to have a single `etcd` instance running, but for production purposes it is best to run etcd in HA mode.
+This means we need to have three instances running that know each other.
+
+Again, this is not a production-ready setup, as the static nature prevents automatic recovery if a node fails.
+
+## Steps to take
+
+* download & install the etcd binary
+* prepare the required certificates
+* create the `systemd` service definition
+* reload the `systemd` configuration, enable & start the service
+
+### Install script
+
+Make sure that the local install script is on every server; you can use the `etcd.sh` script for this.
+
+Then, make sure you're connected to all three controller VM's at the same time, for example via tmux or iTerm.
+For iTerm:
+
+* use `ctrl` + `shift` + `d` to open three horizontal windows
+* use `ctrl` + `shift` + `i` to write output to all three windows at once
+* log in to each controller: `gcloud compute ssh controller-?`
+* `./etcd-local.sh`
+
+## Verification
+
+```bash
+sudo ETCDCTL_API=3 etcdctl member list \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/etc/etcd/ca.pem \
+  --cert=/etc/etcd/kubernetes.pem \
+  --key=/etc/etcd/kubernetes-key.pem
+```
+
+### Expected Output
+
+```bash
+3a57933972cb5131, started, controller-2, https://10.240.0.12:2380, https://10.240.0.12:2379
+f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
+ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379
+```
diff --git a/docs/kubernetes/khw-gce/index.md b/docs/kubernetes/khw-gce/index.md
new file mode 100644
index 0000000..4de0008
--- /dev/null
+++ b/docs/kubernetes/khw-gce/index.md
@@ -0,0 +1,144 @@
+# Kubernetes the Hard Way - GCE
+
+This assumes OSX and GCE.
+
+## Goal
+
+The goal is to set up an HA Kubernetes cluster on GCE from its most basic parts.
+That means we will install and configure the basic components ourselves, such as the API server and Kubelets.
+
+## Setup
+
+To limit the scope to doing the setup of the Kubernetes cluster ourselves, we will make it static.
+That means we will create and configure the network and compute resources to be fit for 3 Control Plane VM's and 3 worker VM's.
+We will not be able to recover a failing node or accommodate additional resources.
+
+### Resources in GCE
+
+* Public IP address, as front-end for the three API servers
+* 3 VM's for the Control Plane
+* 3 VM's as workers
+* VPC
+* Network Routes: from POD CIDR blocks to the host VM (for workers)
+* Firewall configuration: allow health checks, dns, internal communication and connection to API server
+
+### Kubernetes Resources
+
+#### Control Plane
+
+* **etcd**: stores cluster state
+* **kube-api server**: entry point for interacting with the cluster by exposing the api
+* **kube-scheduler**: makes sure pods get scheduled
+* **kube-controller-manager**: aggregate of required controllers
+    * **Node Controller**: > Responsible for noticing and responding when nodes go down.
+    * **Replication Controller**: > Responsible for maintaining the correct number of pods for every replication controller object in the system.
+    * **Endpoints Controller**: > Populates the Endpoints object (that is, joins Services & Pods).
+    * **Service Account & Token Controller**: > Create default accounts and API access tokens for new namespaces.
+
+#### Worker nodes
+
+* **kubelet**: > An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.
+* **kube-proxy**: > kube-proxy enables the Kubernetes service abstraction by maintaining network rules on the host and performing connection forwarding
+* A container runtime: this can be `Docker`, `rkt` or, as in our case, `containerd`
+
+## Network
+
+* https://blog.csnet.me/k8s-thw/part1/
+* https://github.com/kelseyhightower/kubernetes-the-hard-way
+
+We will be using the network components - with Weave-Net and CoreDNS - as described in the csnet blog.
+But we will use the CIDR blocks as stated in Kelsey Hightower's Kubernetes the Hard Way (`KHW`).
+
+### Kelsey's KHW
+
+
+| Range | Use |
+|-------------- |------------------- |
+|10.240.0.10/24 | LAN (GCE VMS) |
+|10.200.0.0/16 | k8s Pod network |
+|10.32.0.0/24 | k8s Service network|
+|10.32.0.1 | k8s API server |
+|10.32.0.10 | k8s dns |
+
+* API Server: https://127.0.0.1:6443
+* service-cluster-ip-range=10.32.0.0/24
+* cluster-cidr=10.200.0.0/16
+
+
+### CSNETs
+
+| Range | Use |
+|-------------- |------------------- |
+|10.32.2.0/24 | LAN (csnet.me) |
+|10.16.0.0/16 | k8s Pod network |
+|10.10.0.0/22 | k8s Service network|
+|10.10.0.1 | k8s API server |
+|10.10.0.10 | k8s dns |
+
+* API Server: https://10.32.2.97:6443
+* service-cluster-ip-range=10.10.0.0/22
+* cluster-cidr=10.16.0.0/16
+
+
+## Install tools
+
+On the machine doing the installation, we will need some tools installed.
+We will use the following tools:
+
+* **kubectl**: for communicating with the API server
+* **cfssl**: for creating the certificates and signing them
+* **helm**: for installing additional tools later
+* **stern**: for viewing logs of multiple pods at once (for example, all kube-dns pods)
+* **terraform**: for managing our resources in GCE
+
+```bash
+brew install kubernetes-cli
+brew install cfssl
+brew install kubernetes-helm
+brew install stern
+brew install terraform
+```
+
+### Check versions
+
+```bash
+kubectl version -c -o yaml
+cfssl version
+helm version -c --short
+stern --version
+terraform version
+```
+
+### Terraform remote storage
+
+To help with the problems of local storage, and the potential loss of data when local OS problems occur,
+we will use an S3 bucket as Terraform state storage.
+
+* create an s3 bucket
+* configure Terraform to use this as remote state storage
+* see how to do this [here](https://medium.com/@jessgreb01/how-to-terraform-locking-state-in-s3-2dc9a5665cb6)
+* read more about this in [Terraform's docs](https://www.terraform.io/docs/backends/types/s3.html)
+
+```bash
+export AWS_ACCESS_KEY_ID="anaccesskey"
+export AWS_SECRET_ACCESS_KEY="asecretkey"
+export AWS_DEFAULT_REGION="eu-central-1"
+```
+
+```terraform
+terraform {
+  backend "s3" {
+    bucket = "euros-terraform-state"
+    key = "terraform.tfstate"
+    region = "eu-central-1"
+    encrypt = "true"
+  }
+}
+
+```
+
+## GKE Service Account
+
+Create a new GKE service account, and export its JSON credentials file for use with Terraform.
+
+See the [GKE Tutorial page](https://cloud.google.com/docs/authentication/production) for how you can do this.
\ No newline at end of file
diff --git a/docs/kubernetes/khw-gce/kubeconfigs.md b/docs/kubernetes/khw-gce/kubeconfigs.md
new file mode 100644
index 0000000..f6d2c29
--- /dev/null
+++ b/docs/kubernetes/khw-gce/kubeconfigs.md
@@ -0,0 +1,53 @@
+# Kubeconfigs
+
+Now that we have certificates, we have to make sure we have configurations that the Kubernetes parts can actually use - certificates themselves are not enough.
+
+This is where we will use Kubernetes configuration files, or `kubeconfigs`.
+
+We will have to create the following `kubeconfigs`:
+
+* controller manager
+* kubelet
+* kube-proxy
+* kube-scheduler
+* admin user
+
+## Create & Test kubeconfig file
+
+Here's an example script:
+
+```bash
+kubectl config set-cluster kubernetes-the-hard-way \
+  --certificate-authority=ca.pem \
+  --embed-certs=true \
+  --server=https://127.0.0.1:6443 \
+  --kubeconfig=kube-controller-manager.kubeconfig
+
+kubectl config set-credentials system:kube-controller-manager \
+  --client-certificate=kube-controller-manager.pem \
+  --client-key=kube-controller-manager-key.pem \
+  --embed-certs=true \
+  --kubeconfig=kube-controller-manager.kubeconfig
+
+kubectl config set-context default \
+  --cluster=kubernetes-the-hard-way \
+  --user=system:kube-controller-manager \
+  --kubeconfig=kube-controller-manager.kubeconfig
+
+kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig
+```
+
+The steps we execute in order are the following:
+
+* create a kubeconfig entry for our `kubernetes-the-hard-way` cluster and export this into a `.kubeconfig` file
+* add credentials to this config file, in the form of our Kubernetes component's certificate
+* create a context named `default` that combines the cluster with the user of the component we're configuring
+* activate the configuration by making this context the current one
+
+## Install scripts
+
+Make sure you're in `k8s-the-hard-way/scripts`.
+
+```bash
+./kube-configs.sh
+```
diff --git a/docs/kubernetes/khw-gce/network.md
b/docs/kubernetes/khw-gce/network.md
new file mode 100644
index 0000000..a1f8e04
--- /dev/null
+++ b/docs/kubernetes/khw-gce/network.md
@@ -0,0 +1,75 @@
+# Networking
+
+First, [configure external access](#remote-access) so we can run `kubectl` commands from our own machine.
+
+Confirm that you can now run the following:
+
+```bash
+kubectl get nodes -o wide
+```
+
+## Configure WeaveNet
+
+```bash
+kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.200.0.0/16"
+```
+
+### Confirm WeaveNet works
+
+```bash
+kubectl get pod --namespace=kube-system -l name=weave-net
+```
+
+It should look like this:
+
+```bash
+NAME READY STATUS RESTARTS AGE
+weave-net-fwvsr 2/2 Running 1 4h
+weave-net-v9z9n 2/2 Running 1 4h
+weave-net-zfghq 2/2 Running 1 4h
+```
+
+## Configure CoreDNS
+
+Before installing `CoreDNS`, please confirm networking is in order.
+
+```bash
+kubectl get nodes -o wide
+```
+
+!!! warning
+    If nodes are not `Ready`, something is wrong and needs to be fixed before you continue.
+
+```bash
+kubectl apply -f ../configs/core-dns-config.yaml
+```
+
+### Confirm CoreDNS pods
+
+```bash
+kubectl get pod --all-namespaces -l k8s-app=coredns -o wide
+```
+
+## Confirm DNS works
+
+```bash
+kubectl run busybox --image=busybox:1.28 --command -- sleep 3600
+```
+
+```bash
+POD_NAME=$(kubectl get pods -l run=busybox -o jsonpath="{.items[0].metadata.name}")
+```
+
+```bash
+kubectl exec -ti $POD_NAME -- nslookup kubernetes
+```
+
+!!!
note + It should look like this: + ```bash + Server: 10.10.0.10 + Address 1: 10.10.0.10 kube-dns.kube-system.svc.cluster.local + + Name: kubernetes + Address 1: 10.10.0.1 kubernetes.default.svc.cluster.local + ``` \ No newline at end of file diff --git a/docs/kubernetes/khw-gce/remote-access.md b/docs/kubernetes/khw-gce/remote-access.md new file mode 100644 index 0000000..170dc78 --- /dev/null +++ b/docs/kubernetes/khw-gce/remote-access.md @@ -0,0 +1,29 @@ +# Remote Access + +```bash +KUBERNETES_PUBLIC_ADDRESS=$(gcloud compute addresses describe kubernetes-the-hard-way \ + --region $(gcloud config get-value compute/region) \ + --format 'value(address)') +echo "KUBERNETES_PUBLIC_ADDRESS=${KUBERNETES_PUBLIC_ADDRESS}" + +kubectl config set-cluster kubernetes-the-hard-way \ + --certificate-authority=ca.pem \ + --embed-certs=true \ + --server=https://${KUBERNETES_PUBLIC_ADDRESS}:6443 + +kubectl config set-credentials admin \ + --client-certificate=admin.pem \ + --client-key=admin-key.pem + +kubectl config set-context kubernetes-the-hard-way \ + --cluster=kubernetes-the-hard-way \ + --user=admin + +kubectl config use-context kubernetes-the-hard-way +``` + +## Confirm + +```bash +kubectl get nodes -o wide +``` \ No newline at end of file diff --git a/docs/kubernetes/khw-gce/terraform-compute.md b/docs/kubernetes/khw-gce/terraform-compute.md new file mode 100644 index 0000000..c1c91b1 --- /dev/null +++ b/docs/kubernetes/khw-gce/terraform-compute.md @@ -0,0 +1,399 @@ +# Compute resources + +## Create network + +### VPC with Firewall rules + +```terraform +provider "google" { + credentials = "${file("${var.credentials_file_path}")}" + project = "${var.project_name}" + region = "${var.region}" +} + +resource "google_compute_network" "khw" { + name = "kubernetes-the-hard-way" + auto_create_subnetworks = "false" +} + +resource "google_compute_subnetwork" "khw-kubernetes" { + name = "kubernetes" + ip_cidr_range = "10.240.0.0/24" + region = "${var.region}" + network = 
"${google_compute_network.khw.self_link}"
+}
+
+resource "google_compute_firewall" "khw-allow-internal" {
+  name = "kubernetes-the-hard-way-allow-internal"
+  network = "${google_compute_network.khw.name}"
+
+  source_ranges = ["10.240.0.0/24", "10.200.0.0/16"]
+
+  allow {
+    protocol = "tcp"
+  }
+
+  allow {
+    protocol = "udp"
+  }
+
+  allow {
+    protocol = "icmp"
+  }
+}
+
+resource "google_compute_firewall" "khw-allow-external" {
+  name = "kubernetes-the-hard-way-allow-external"
+  network = "${google_compute_network.khw.name}"
+
+  allow {
+    protocol = "icmp"
+  }
+
+  allow {
+    protocol = "tcp"
+    ports = ["22", "6443"]
+  }
+
+  source_ranges = ["0.0.0.0/0"]
+}
+
+resource "google_compute_firewall" "khw-allow-dns" {
+  name = "kubernetes-the-hard-way-allow-dns"
+  network = "${google_compute_network.khw.name}"
+
+  source_ranges = ["0.0.0.0/0"]
+
+  allow {
+    protocol = "tcp"
+    ports = ["53", "443"]
+  }
+
+  allow {
+    protocol = "udp"
+    ports = ["53"]
+  }
+}
+
+resource "google_compute_firewall" "khw-allow-health-check" {
+  name = "kubernetes-the-hard-way-allow-health-check"
+  network = "${google_compute_network.khw.name}"
+
+  allow {
+    protocol = "tcp"
+  }
+
+  source_ranges = ["209.85.152.0/22", "209.85.204.0/22", "35.191.0.0/16"]
+}
+```
+
+### Confirm network
+
+```bash
+gcloud compute firewall-rules list --filter="network:kubernetes-the-hard-way"
+```
+
+Should look like:
+
+```bash
+NAME NETWORK DIRECTION PRIORITY ALLOW DENY
+kubernetes-the-hard-way-allow-external kubernetes-the-hard-way INGRESS 1000 icmp,tcp:22,tcp:6443
+kubernetes-the-hard-way-allow-internal kubernetes-the-hard-way INGRESS 1000 icmp,udp,tcp
+```
+
+## Public IP
+
+```terraform
+resource "google_compute_address" "khw-lb-public-ip" {
+  name = "kubernetes-the-hard-way"
+}
+```
+
+Confirm:
+
+```bash
+gcloud compute addresses list --filter="name=('kubernetes-the-hard-way')"
+```
+
+Output:
+
+```bash
+NAME REGION ADDRESS STATUS
+kubernetes-the-hard-way europe-west4 35.204.134.219 RESERVED
+```
+
+## VM Definitions
with Terraform modules
+
+We're going to need to create 6 VM's: 3 controller nodes and 3 worker nodes.
+
+Within each of the two categories, all three VM's will be the same.
+So it would be a waste to define them more than once.
+This can be achieved via Terraform's [Module system](https://www.terraform.io/docs/modules/usage.html) (read more [here](https://blog.gruntwork.io/how-to-create-reusable-infrastructure-with-terraform-modules-25526d65f73d)).
+
+### Define a module
+
+For the sake of naming convention, we'll put all of our `modules` in a *modules* subfolder.
+We'll start with the controller module, but you can do the same for the worker.
+
+```bash
+mkdir -p modules/controller
+```
+
+```bash hl_lines="4"
+ls -lath
+drwxr-xr-x 27 joostvdg staff 864B Aug 26 12:50 .
+drwxr-xr-x 20 joostvdg staff 640B Aug 22 14:47 ..
+drwxr-xr-x 4 joostvdg staff 128B Aug 7 22:43 modules
+```
+
+```bash
+ls -lath modules
+drwxr-xr-x 27 joostvdg staff 864B Aug 26 12:50 ..
+drwxr-xr-x 4 joostvdg staff 128B Aug 7 22:43 .
+drwxr-xr-x 4 joostvdg staff 128B Aug 7 22:03 controller
+```
+
+Inside `modules/controller` we create two files, `main.tf` and `variables.tf`.
+We have to create an additional variables file, as the module cannot use the main folder's variables.
+
+Then, in our main folder we'll create a tf file for using these modules, called `nodes.tf`.
+As stated above, we pass along any variable from our main `variables.tf` to the module.
+
+```terraform
+module "controller" {
+  source = "modules/controller"
+  machine_type = "${var.machine_type_controllers}"
+  num = "${var.num_controllers}"
+  zone = "${var.region_default_zone}"
+  subnet = "${var.subnet_name}"
+}
+
+module "worker" {
+  source = "modules/worker"
+  machine_type = "${var.machine_type_workers}"
+  num = "${var.num_workers}"
+  zone = "${var.region_default_zone}"
+  network = "${google_compute_network.khw.name}"
+  subnet = "${var.subnet_name}"
+}
+```
+
+### Controller config
+
+```terraform
+data "google_compute_image" "khw-ubuntu" {
+  family = "ubuntu-1804-lts"
+  project = "ubuntu-os-cloud"
+}
+
+resource "google_compute_instance" "khw-controller" {
+  count = "${var.num}"
+  name = "controller-${count.index}"
+  machine_type = "${var.machine_type}"
+  zone = "${var.zone}"
+  can_ip_forward = "true"
+
+  tags = ["kubernetes-the-hard-way", "controller"]
+
+  boot_disk {
+    initialize_params {
+      image = "${data.google_compute_image.khw-ubuntu.self_link}"
+      size = 200 // in GB
+    }
+  }
+
+  network_interface {
+    subnetwork = "${var.subnet}"
+    address = "10.240.0.1${count.index}"
+
+    access_config {
+      // Ephemeral External IP
+    }
+  }
+
+  # compute-rw,storage-ro,service-management,service-control,logging-write,monitoring
+  service_account {
+    scopes = ["compute-rw",
+      "storage-ro",
+      "service-management",
+      "service-control",
+      "logging-write",
+      "monitoring",
+    ]
+  }
+}
+```
+
+#### Variables
+
+```terraform
+variable "num" {
+  description = "The number of controller VMs"
+}
+
+variable "machine_type" {
+  description = "The type of VM for controllers"
+}
+
+variable "zone" {
+  description = "The zone to create the controllers in"
+}
+
+variable "subnet" {
+  description = "The subnet to create the nic in"
+}
+
+```
+
+### Worker config
+
+The extra config for the workers is the routes, which direct traffic for each node's pod CIDR to that node.
+
+```terraform
+data "google_compute_image" "khw-ubuntu" {
+  family = "ubuntu-1804-lts"
+  project = "ubuntu-os-cloud"
+}
+
+resource "google_compute_instance" "khw-worker" {
+  count = "${var.num}"
+  name = "worker-${count.index}"
+  machine_type = "${var.machine_type}"
+  zone = "${var.zone}"
+  can_ip_forward = "true"
+
+  tags = ["kubernetes-the-hard-way", "worker"]
+
+  metadata {
+    pod-cidr = "10.200.${count.index}.0/24"
+  }
+
+  boot_disk {
+    initialize_params {
+      image = "${data.google_compute_image.khw-ubuntu.self_link}"
+      size = 200 // in GB
+    }
+  }
+
+  network_interface {
+    subnetwork = "${var.subnet}"
+    address = "10.240.0.2${count.index}"
+
+    access_config {
+      // Ephemeral External IP
+    }
+  }
+
+  service_account {
+    scopes = ["compute-rw",
+      "storage-ro",
+      "service-management",
+      "service-control",
+      "logging-write",
+      "monitoring",
+    ]
+  }
+}
+
+resource "google_compute_route" "khw-worker-route" {
+  count = "${var.num}"
+  name = "kubernetes-route-10-200-${count.index}-0-24"
+  network = "${var.network}"
+  next_hop_ip = "10.240.0.2${count.index}"
+  dest_range = "10.200.${count.index}.0/24"
+}
+```
+
+#### Variables
+
+```terraform
+variable "num" {
+  description = "The number of worker VMs"
+}
+
+variable "machine_type" {
+  description = "The type of VM for workers"
+}
+
+variable "zone" {
+  description = "The zone to create the workers in"
+}
+
+variable "network" {
+  description = "The network to use for routes"
+}
+
+variable "subnet" {
+  description = "The subnet to create the nic in"
+}
+```
+
+### Health check
+
+Because we will have three controllers, we have to make sure that GCE forwards Kubernetes API requests to each of them via our public IP address.
+
+We do this via an HTTP health check, which involves a forwarding rule and a target pool.
+The target pool is the group of controller VM's to which the forwarding rule sends traffic.
+
+```terraform
+resource "google_compute_target_pool" "khw-hc-target-pool" {
+  name = "instance-pool"
+
+  # TODO: fixed set for now, maybe we can make this dynamic some day
+  instances = [
+    "${var.region_default_zone}/controller-0",
+    "${var.region_default_zone}/controller-1",
+    "${var.region_default_zone}/controller-2",
+  ]
+
+  health_checks = [
+    "${google_compute_http_health_check.khw-health-check.name}",
+  ]
+}
+
+resource "google_compute_http_health_check" "khw-health-check" {
+  name = "kubernetes"
+  request_path = "/healthz"
+  description = "The health check for Kubernetes API server"
+  host = "${var.kubernetes-cluster-dns}"
+}
+
+resource "google_compute_forwarding_rule" "khw-hc-forward" {
+  name = "kubernetes-forwarding-rule"
+  target = "${google_compute_target_pool.khw-hc-target-pool.self_link}"
+  region = "${var.region}"
+  port_range = "6443"
+  ip_address = "${google_compute_address.khw-lb-public-ip.self_link}"
+}
+```
+
+## Apply Terraform state
+
+In the end, our configuration should consist of several `.tf` files and look something like this.
+
+```bash
+ls -lath
+drwxr-xr-x 27 joostvdg staff 864B Aug 26 12:50 .
+drwxr-xr-x 20 joostvdg staff 640B Aug 22 14:47 ..
+drwxr-xr-x 4 joostvdg staff 128B Aug 7 22:43 modules
+-rw-r--r-- 1 joostvdg staff 1.5K Aug 26 12:50 variables.tf
+-rw-r--r-- 1 joostvdg staff 1.3K Aug 17 16:03 firewall.tf
+-rw-r--r-- 1 joostvdg staff 4.4K Aug 17 12:06 worker-config.md
+-rw-r--r-- 1 joostvdg staff 1.6K Aug 17 09:35 healthcheck.tf
+-rw-r--r-- 1 joostvdg staff 517B Aug 16 17:09 nodes.tf
+-rw-r--r-- 1 joostvdg staff 92B Aug 16 13:52 publicip.tf
+-rw-r--r-- 1 joostvdg staff 365B Aug 7 22:07 vpc.tf
+-rw-r--r-- 1 joostvdg staff 189B Aug 7 16:51 base.tf
+drwxr-xr-x 5 joostvdg staff 160B Aug 7 21:52 .terraform
+-rw-r--r-- 1 joostvdg staff 0B Aug 7 18:28 terraform.tfstate
+```
+
+We're now going to `plan` and then `apply` our Terraform configuration to create the resources in GCE.
+ +```bash +terraform plan +``` + +```bash +terraform apply +``` \ No newline at end of file diff --git a/docs/kubernetes/khw-worker.md b/docs/kubernetes/khw-gce/worker.md similarity index 79% rename from docs/kubernetes/khw-worker.md rename to docs/kubernetes/khw-gce/worker.md index 3b14bf3..20fc3ac 100644 --- a/docs/kubernetes/khw-worker.md +++ b/docs/kubernetes/khw-gce/worker.md @@ -237,78 +237,3 @@ gcloud compute ssh controller-0 --command "kubectl get nodes --kubeconfig admin. !!! note As we didn't configure networking yet, the nodes should be shown as `NotReady` status. -## Networking - -First, [configure external access]() so we can run `kubectl` commands from our own machine. - -Confirm the you can now call the following: - -```bash -kubectl get nodes -o wide -``` - -### Configure WeaveNet - -```bash -kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=10.200.0.0/16" -``` - -#### Confirm WeaveNet works - -```bash -kubectl get pod --namespace=kube-system -l name=weave-net -``` - -It should look like this: - -```bash -NAME READY STATUS RESTARTS AGE -weave-net-fwvsr 2/2 Running 1 4h -weave-net-v9z9n 2/2 Running 1 4h -weave-net-zfghq 2/2 Running 1 4h -``` - -### Configure CoreDNS - -Before installing `CoreDNS`, please confirm networking is in order. - -```bash -kubectl get nodes -o wide -``` - -!!! warning - If nodes are not `Ready`, something is wrong and needs to be fixed before you continue. - -```bash -kubectl apply -f https://raw.githubusercontent.com/mch1307/k8s-thw/master/coredns.yaml -``` - -#### Confirm CoreDNS pods - -```bash -kubectl get pod --all-namespaces -l k8s-app=coredns -o wide -``` - -### Confirm DNS works - -```bash -kubectl run busybox --image=busybox --command -- sleep 3600 -``` - -```bash -POD_NAME=$(kubectl get pods -l run=busybox -o jsonpath="{.items[0].metadata.name}") -``` - -```bash -kubectl exec -ti $POD_NAME -- nslookup kubernetes -``` - -!!! 
note - It should look like this: - ```bash - Server: 10.10.0.10 - Address 1: 10.10.0.10 kube-dns.kube-system.svc.cluster.local - - Name: kubernetes - Address 1: 10.10.0.1 kubernetes.default.svc.cluster.local - ``` \ No newline at end of file diff --git a/docs/kubernetes/the-hard-way.md b/docs/kubernetes/the-hard-way.md deleted file mode 100644 index 634dd12..0000000 --- a/docs/kubernetes/the-hard-way.md +++ /dev/null @@ -1,176 +0,0 @@ -# Kubernetes the Hard Way - -This assumes OSX and GKE. - -## Network - -* https://blog.csnet.me/k8s-thw/part1/ -* https://github.com/kelseyhightower/kubernetes-the-hard-way - - -### Kelseys - -| Range | Use | -|10.240.0.10/24 |LAN (GCE VMS) | -|10.200.0.0/16 |k8s Pod network| -|10.32.0.0/24 |k8s Service network| -|10.32.0.1 |k8s API server | -|10.32.0.10 |k8s dns | - -* API Server: https://127.0.0.1:6443 -* service-cluster-ip-range=10.32.0.0/24 -* cluster-cidr=10.200.0.0/1 - - -### CSNETs - -| Range | Use | -|10.32.2.0/24 |LAN (csnet.me) | -|10.16.0.0/16 |k8s Pod network| -|10.10.0.0/22 |k8s Service network| -|10.10.0.1 |k8s API server | -|10.10.0.10 |k8s dns | - -* API Server: https://10.32.2.97:6443 -* service-cluster-ip-range=10.10.0.0/22 -* cluster-cidr=10.16.0.0/16 - - -## Install tools - -```bash -brew install kubernetes-cli -brew install cfssl -brew install kubernetes-helm -brew install stern -brew install terraform -``` - -### Check versions - -```bash -kubectl version -c -o yaml -cfssl version -helm version -c --short -stern --version -terraform version -``` - -### Terraform remote storage - -* create s3 bucket -* configure terraform to use this as remote state storage - -```bash -export AWS_ACCESS_KEY_ID="anaccesskey" -export AWS_SECRET_ACCESS_KEY="asecretkey" -export AWS_DEFAULT_REGION="eu-central-1" -``` - -```terraform -terraform { - backend "s3" { - bucket = "euros-terraform-state" - key = "terraform.tfstate" - region = "eu-central-1" - encrypt = "true" - } -} - -``` - -## Compute resources - -### Create network - -#### 
VPC with Firewall rules - -```terraform -provider "google" { - credentials = "${file("${var.credentials_file_path}")}" - project = "${var.project_name}" - region = "${var.region}" -} - -resource "google_compute_network" "khw" { - name = "kubernetes-the-hard-way" - auto_create_subnetworks = "false" -} - -resource "google_compute_subnetwork" "khw-kubernetes" { - name = "kubernetes" - ip_cidr_range = "10.240.0.0/24" - region = "${var.region}" - network = "${google_compute_network.khw.self_link}" -} - -resource "google_compute_firewall" "khw-allow-internal" { - name = "kubernetes-the-hard-way-allow-internal" - network = "${google_compute_network.khw.name}" - - source_ranges = ["10.240.0.0/24", "10.200.0.0/16"] - - allow { - protocol = "tcp" - } - - allow { - protocol = "udp" - } - - allow { - protocol = "icmp" - } -} - -resource "google_compute_firewall" "khw-allow-external" { - name = "kubernetes-the-hard-way-allow-external" - network = "${google_compute_network.khw.name}" - - allow { - protocol = "icmp" - } - - allow { - protocol = "tcp" - ports = ["22", "6443"] - } - - source_ranges = ["0.0.0.0/0"] -} -``` - -#### Confirm network - -```bash -gcloud compute firewall-rules list --filter="network:kubernetes-the-hard-way" -``` - -Should look like: - -```bash -NAME NETWORK DIRECTION PRIORITY ALLOW DENY -kubernetes-the-hard-way-allow-external kubernetes-the-hard-way INGRESS 1000 icmp,tcp:22,tcp:6443 -kubernetes-the-hard-way-allow-internal kubernetes-the-hard-way INGRESS 1000 icmp,udp,tcp -``` - -### Public IP - -```json -resource "google_compute_address" "khw-lb-public-ip" { - name = "kubernetes-the-hard-way" -} -``` - -Confirm: - -```bash -gcloud compute addresses list --filter="name=('kubernetes-the-hard-way')" -``` - -Output: - -```bash -NAME REGION ADDRESS STATUS -kubernetes-the-hard-way europe-west4 35.204.134.219 RESERVED -``` \ No newline at end of file diff --git a/docs/linux/systemd.md b/docs/linux/systemd.md index a78c69e..3e8fe1c 100644 --- 
a/docs/linux/systemd.md
+++ b/docs/linux/systemd.md
@@ -14,6 +14,38 @@ This powerful suite of software can manage many aspects of your server,
 These resources are defined using configuration files called **unit files**. [^1]
 
+### Path
+
+> A path unit defines a filesystem `path` that `systemd` can monitor for changes.
+  Another unit must exist that will be activated when certain activity is detected at the path location.
+  Path activity is determined through `inotify` events.
+
+My idea: you could use this for services that should trigger on file uploads or backup dumps.
+Although I wonder whether the unit's main service knows which path was triggered?
+If it does, then it's easy; otherwise you still need a "file walker".
+
+## Example
+
+```ini
+[Unit]
+Description=Timezone Helper Service
+After=network.target
+StartLimitIntervalSec=0
+
+[Service]
+Type=simple
+Restart=always
+RestartSec=3
+User=joostvdg
+ExecStart=/usr/bin/timezone_helper_service
+
+[Install]
+WantedBy=multi-user.target
+```
+
+## Resources
+
+* https://www.linuxjournal.com/content/linux-filesystem-events-inotify
+
 ## References
 [^1]: [Introduction to systemd from Digital Ocean](https://www.digitalocean.com/community/tutorials/understanding-systemd-units-and-unit-files)
\ No newline at end of file
diff --git a/docs/productivity/index.md b/docs/productivity/index.md
index 4fbfa87..fcdd5ef 100644
--- a/docs/productivity/index.md
+++ b/docs/productivity/index.md
@@ -85,9 +85,22 @@ http://rstb.royalsocietypublishing.org/content/373/1753/20170239
 * https://en.wikipedia.org/wiki/Horn_effect
 * https://en.wikipedia.org/wiki/Halo_effect
 * http://serendipstudio.org/bb/neuro/neuro02/web2/hhochman.html
+* https://betterhumans.coach.me/how-to-be-a-better-manager-by-understanding-the-difference-between-market-norms-and-social-norms-3082d97d440f
+* https://skillsmatter.com/skillscasts/10466-deep-dive-on-kubernetes-networking
+* https://purplegriffon.com/blog/is-itil-agile-enough
+* 
https://launchdarkly.com/blog/progressive-delivery-a-history-condensed/ +* http://www.collaborativefund.com/blog/real-world-vs-book-knowledge/ + +### Presentations + +* https://speakerdeck.com/tylertreat/the-future-of-ops + ### Articles +* http://blog.christianposta.com/microservices/application-safety-and-correctness-cannot-be-offloaded-to-istio-or-any-service-mesh/ +* https://www.gatesnotes.com/Books/Capitalism-Without-Capital?WT.mc_id=08_16_2018_06_CapitalismWithoutCapital_BG-LI_&WT.tsrc=BGLI&linkId=55623312 +* https://uxdesign.cc/stop-delivering-software-with-agile-it-doesn-t-work-edccea3ab5d3 * [Concept of Shared Services and beyond](https://medium.com/@mattklein123/the-human-scalability-of-devops-e36c37d3db6a) * [Introduction to Observability by Weave Net](https://www.weave.works/technologies/monitoring-kubernetes-with-prometheus/#observability-vs-monitoring) * [Article on the state of Systems Languages](https://blog.usejournal.com/systems-languages-an-experience-report-d008b2b12628) diff --git a/mkdocs.yml b/mkdocs.yml index 727a3ef..b4bfe53 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -32,14 +32,26 @@ pages: - Networking: linux/networking.md - iptables: linux/iptables.md +- Blogs: + - Docker Graceful Shutdown: blogs/docker-graceful-shutdown.md + - Kubernetes: - Introduction: kubernetes/index.md - CKA Exam details: kubernetes/cka-exam.md - CKA Exam prep: kubernetes/cka-exam-prep.md - GKE Installation: kubernetes/install-gke.md - - KHW - GKE Terraform infra : kubernetes/the-hard-way.md - - KHW - GKE Workers: kubernetes/khw-worker.md - - KHW - GKE Controller: kubernetes/khw-controller.md + +- K8S Hard Way - GCE: + - Installer Preparation: kubernetes/khw-gce/index.md + - Create GCE resources: kubernetes/khw-gce/terraform-compute.md + - Prepare Certificates: kubernetes/khw-gce/certificates.md + - Prepare Kubeconfigs: kubernetes/khw-gce/kubeconfigs.md + - Encryption configuration: kubernetes/khw-gce/encryption.md + - ETCD configuration: kubernetes/khw-gce/etcd.md 
+ - Controller Config: kubernetes/khw-gce/controller.md + - Worker Config: kubernetes/khw-gce/worker.md + - Remote access: kubernetes/khw-gce/remote-access.md + - Network config: kubernetes/khw-gce/network.md - Software Engineering: - Naming: swe/naming.md From 33c068f25aa234a58f19d074bb5b8b2b0a3f3513 Mon Sep 17 00:00:00 2001 From: Joost van der Griendt Date: Tue, 28 Aug 2018 12:55:34 +0200 Subject: [PATCH 08/10] Improved controller config. Add info about debugging.: --- docs/kubernetes/khw-gce/controller.md | 136 ++++---------------- docs/kubernetes/khw-gce/debug.md | 178 ++++++++++++++++++++++++++ mkdocs.yml | 1 + 3 files changed, 205 insertions(+), 110 deletions(-) create mode 100644 docs/kubernetes/khw-gce/debug.md diff --git a/docs/kubernetes/khw-gce/controller.md b/docs/kubernetes/khw-gce/controller.md index 9152311..cbbcb15 100644 --- a/docs/kubernetes/khw-gce/controller.md +++ b/docs/kubernetes/khw-gce/controller.md @@ -1,115 +1,31 @@ # Controller Config -## Configure API Server - -### Prepare folders - -```bash -sudo mkdir -p /var/lib/kubernetes/ - -sudo mv ca.pem ca-key.pem kubernetes-key.pem kubernetes.pem \ - service-account-key.pem service-account.pem \ - encryption-config.yaml /var/lib/kubernetes/ -``` - -### Get internal IP - -```bash -INTERNAL_IP=$(curl -s -H "Metadata-Flavor: Google" \ - http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip) -echo INTERNAL_IP=$INTERNAL_IP -``` - -### Configure SystemD service - -```ini -cat <` leaves the pods. +The pods are terminated (they never started) but are not removed. + +To remove them, use the line below, as explained [on stackoverflow](https://stackoverflow.com/questions/35453792/pods-stuck-at-terminating-status). + +```bash +kubectl delete pod NAME --grace-period=0 --force +``` + +### Restart Kubelet + +I'm not sure if this is 100% required, but I've had better luck with restarting the kubelet before reinstalling weave-net. 
+ +So, login to each worker node, `gcloud compute ssh worker-?` and issue the following commands. + +```bash +sudo systemctl daemon-reload +sudo systemctl restart kubelet +``` + +## DNS on GCE not working + +It seemed something has changed in GCE after Kelsey Hightower's [Kubernetes The Hardway](https://github.com/kelseyhightower/kubernetes-the-hard-way/) was written/updated. + +This means that if you follow through the documentation, you will run into this: + +```bash +kubectl exec -ti $POD_NAME -- nslookup kubernetes +;; connection timed out; no servers could be reached + +command terminated with exit code 1 +``` + +The cure seems to be to add additional `resolve.conf` file configuration to the kubelet's systemd service definition. + +```ini hl_lines="13" +cat < Date: Sat, 8 Sep 2018 13:21:37 +0200 Subject: [PATCH 09/10] Update blog based on feedback viktor --- docs/blogs/docker-graceful-shutdown.md | 151 +++++++++++++------------ 1 file changed, 79 insertions(+), 72 deletions(-) diff --git a/docs/blogs/docker-graceful-shutdown.md b/docs/blogs/docker-graceful-shutdown.md index eb73daa..bb14a15 100644 --- a/docs/blogs/docker-graceful-shutdown.md +++ b/docs/blogs/docker-graceful-shutdown.md @@ -1,66 +1,74 @@ -# Graceful shutdown +# Gracefully Shutting Down Applications in Docker -> We can speak about the graceful shutdown of our application, when all of the resources it used and all of the traffic and/or data processing what it handled are closed and released properly. It means that no database connection remains open and no ongoing request fails because we stop our application. - [Péter Márton](https://blog.risingstack.com/graceful-shutdown-node-js-kubernetes/) +I'm not sure about you, but I like it when my neighbors leave our shared spaces clean and don't take up parking spaces when they don't need. -As I could not have done it better myself, I've quoted Péter Márton. +Imagine you live in an apartment complex with the above-mentioned parking lot. 
Some tenants go away and never come back. If nothing is done to clean up after them - to reclaim their apartment and parking space - then after some time, more and more apartments are unavailable for no reason, the parking lot fills up with cars which belong to no one. -I think we can say that cleaning up your mess and informing people of your impending departure is a good thing. Many programming languages and frameworks have hooks for listening to signals - which we explore later - allowing you to handle a shutdown, expected or not. +Some tenants did not get a parking lot and are getting frustrated that none are opening up. When they moved in, they were told when others leave, they would be next in line. While they're waiting, they parked outside the complex. Eventually, the entrance is blocked and no one can enter or leave. The end result is a completely unlivable apartment block with trapped tenants - never to be seen or heard. -When we have resources open, such as files, database connections, background processes and others. It would be best for ourselves, but also for our environment to clean those up before exiting. This cleanup would constitute a graceful shutdown. +If you agree with me that if a tenant leaves, the tenant should clean the apartment and free the parking spot to make it ready for the next inhabitant; then please read on. We're going to dive into the equivalent of doing this with containers. -We're going to dive into this subject, exploring several complimentary topics that together should help improve your (Docker) application's ability to gracefully shutdown. - -* The case for graceful shutdown -* How to run processes in Docker -* Process management -* Signals management +We will explore running our container with Docker (run, compose, swarm) and Kubernetes. +Even if you use another way to run your containers, this article should provide you with enough insight to get you on your way. 
## The case for graceful shutdown

We're in an age where many applications are running in Docker containers across a multitude of clusters and (potentially) different orchestrators. These bring with it, other concerns to tackle, such as logging, monitoring, tracing and many more. One significant way we defend ourselves against the perils of distributed nature of these clusters is to make our applications more resilient.

-However, there is still no guarantee your application is always up and running. So another concern we should tackle is how it responds when it does fail, including it being told to stop by the orchestrator. Now, this can happen for a variety of reasons, for example; your application's health check fails or your application consumed more resources than allowed.
+However, there is still no guarantee your application is always up and running. So another concern we should tackle is how it responds when it needs to shut down. Here we can differentiate between an unexpected shutdown - we crashed - and an expected shutdown.
+
+Shutting down can happen for a variety of reasons. In this post we dive into how to deal with an expected shutdown, such as being told to stop by an orchestrator like Kubernetes.
+
+This can happen for several reasons, including but not limited to:
+
+* your application's health check fails
+* your application consumed more resources than allowed
+* the application is scaling down
+* and more
+
+Handling shutdown well not only increases the reliability of your application, it also increases that of the cluster it lives in. As you cannot always know in advance where your application runs - you might not even be the one putting it in a Docker container - make sure your application knows how to quit!
+
+Graceful shutdown is not unique to Docker; it was part of Linux best practices for years before Docker existed. However, applying those practices to Docker containers adds extra dimensions.
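To make the case concrete, here is a minimal Go sketch of an application that terminates pending requests before exiting. It is not code from the application discussed in this post; the timeout is arbitrary, and the self-sent SIGTERM only simulates what an orchestrator would do.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Port 0: let the OS pick a free port, so the sketch runs anywhere.
	srv := &http.Server{Addr: "127.0.0.1:0"}

	// Relay SIGTERM/SIGINT - the signals Docker and Kubernetes send on stop.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		<-sigs
		// Stop accepting new connections, let in-flight requests finish,
		// then release the listener - our "parking spot".
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		srv.Shutdown(ctx)
	}()

	// Simulate the orchestrator asking us to stop shortly after startup.
	go func() {
		time.Sleep(100 * time.Millisecond)
		syscall.Kill(os.Getpid(), syscall.SIGTERM)
	}()

	// After a graceful Shutdown, ListenAndServe returns http.ErrServerClosed.
	if err := srv.ListenAndServe(); err == http.ErrServerClosed {
		fmt.Println("in-flight work done, exiting cleanly")
	}
}
```

An application that does this frees its resources the moment it is told to leave, instead of waiting to be force-killed.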
+ +## Start Good So You Can End Well -Not only does this increase the reliability of your application, but it also increases the reliability of the cluster it lives in. As you can not always know in advance where your application is run, you might not even be the one putting it in a docker container, make sure your application knows how to quit! +When you sign up for an apartment, you probably have to sign a contract detailing your rights and obligations. The more you state explicitly, the easier it is to deal with bad behaving neighbors. This holds the same when running a process; we should make sure we set the rules, obligations, and expectations from the start. -## How to run processes in Docker +As we say in Dutch: a good beginning is half the work. We will start with how you can run a process in a container that is beneficial to Graceful Shutdown. -There are many ways to run a process in Docker. I prefer to make things easy to understand and easy to know what to expect. So this article deals with processes started by commands in a Dockerfile. +There are many ways to start a process in a Docker container. I prefer to make things easy to understand and easy to know what to expect. So this article deals with processes started by commands in a Dockerfile. There are several ways to run a command in a Dockerfile. -These are: +These are as follows: -* **RUN**: runs a command during the docker build phase * **CMD**: runs a command when the container gets started -* **ENTRYPOINT**: provides the location from where commands get run when the container starts +* **ENTRYPOINT**: provides the location (entrypoint) from where commands get run when the container starts + You need at least one ENTRYPOINT or CMD in a Dockerfile for it to be valid. They can be used in collaboration but they can do similar things. You can put these commands in both a shell form and an exec form. For more information on these commands, you should check out [Docker's docs on Entrypoint vs. 
CMD](https://docs.docker.com/engine/reference/builder/#exec-form-entrypoint-example). -In summary, the shell form runs the command as a shell command and spawn a process via /bin/sh -c. - -Whereas the exec form executes a child process that is still attached to PID1. - -We'll show you what that looks like, borrowing the Docker docs example referred to earlier. ### Docker Shell form example -Create the following Dockerfile: +We start with the shell form and see if it can do what we want; begin in such a way, we can stop it nicely. + +We create the following Dockerfile: ```dockerfile FROM ubuntu:18.04 ENTRYPOINT top -b ``` -Then build and run it: +Then build and run it. ```bash docker image build --tag shell-form . docker run --name shell-form --rm shell-form ``` -This should yield the following: +This yields the following output. ```bash top - 16:34:56 up 1 day, 5:15, 0 users, load average: 0.00, 0.00, 0.00 @@ -74,19 +82,20 @@ KiB Swap: 1048572 total, 1042292 free, 6280 used. 1579380 avail Mem 6 root 20 0 36480 2928 2580 R 0.0 0.1 0:00.01 top ``` -As you can see, two processes are running, **sh** and **top**. -Meaning, that killing the process, with *ctrl+c* for example, terminates the **sh** process, but not **top**. +As you can see, two processes are running, **sh** and **top**. +Meaning, that killing the process, with *ctrl+c* for example, terminates the **sh** process, but not **top**. To kill this container, open a second terminal and execute the following command. ```bash docker rm -f shell-form ``` -As you can imagine, this is usually not what you want. -So as a general rule, you should never use the shell form. So on to the exec form we go! +Shell form doesn't do what we need. Starting a process with shell form will only lead us to the disaster of parking lots filling up unless there's a someone actively cleaning up. ### Docker exec form example +This leads us to the exec form. Hopefully, this gets us somewhere. 
+ The exec form is written as an array of parameters: `ENTRYPOINT ["top", "-b"]` To continue in the same line of examples, we will create a Dockerfile, build and run it. @@ -96,14 +105,14 @@ FROM ubuntu:18.04 ENTRYPOINT ["top", "-b"] ``` -Then build and run it: +Then build and run it. ```bash docker image build --tag exec-form . docker run --name exec-form --rm exec-form ``` -This should yield the following: +This yields the following output. ```bash top - 18:12:30 up 1 day, 6:53, 0 users, load average: 0.00, 0.00, 0.00 @@ -116,7 +125,13 @@ KiB Swap: 1048572 total, 1042292 free, 6280 used. 1574880 avail Mem 1 root 20 0 36480 2940 2584 R 0.0 0.1 0:00.03 top ``` -### Docker exec form with parameters +Now we got something we can work with. If something would tell this Container to stop, it will tell our only running process so it is sure to reach the correct one! + +### Gotchas + +Knowing we can use the exec form for our goal - gracefully shutting down our container - we can move on to the next part of our efforts. For the sake of imparting you with some hard learned lessons, we will explore two gotchas. They're optional, so you can also choose to skip to *Make Sure Your Process Listens*. + +#### Docker exec form with parameters A caveat with the exec form is that it doesn't interpolate parameters. @@ -169,7 +184,7 @@ docker run --name exec-form --rm -e PARAM="help" exec-param Resulting in top's help string. -### The special case of Alpine +#### The special case of Alpine One of the main [best practices for Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/), is to make them as small as possible. The easiest way to do this is to start with a minimal image. @@ -189,7 +204,7 @@ docker image build --tag exec-param . docker run --name exec-form --rm -e PARAM="help" exec-param ``` -It will result in the following output. +This yields the following output. 
```bash Mem: 1509068K used, 537864K free, 640K shrd, 126756K buff, 1012436K cached @@ -203,28 +218,25 @@ Aside from **top**'s output looking a bit different, there is only one command. Alpine Linux helps us avoid the problem of shell form altogether! -## Process management - -Now that we know how to create a Dockerfile that helps us make sure we can run as PID1 so that we can make sure our process correctly responds to signals? - -We'll get into signal handling next, but first, let us explore how we can manage our process. -As you're used to by now, there are multiple solutions at our disposal. +## Make Sure Your Process Listens -We can broadly categorize them like this: +It is excellent if your tenants are all signed up, know their rights and obligations. +But you can't contact them when something happens, how will they ever know when to act? -* Process manages itself and it's children, by itself -* We let Docker manage the process, and it's children -* We use a process manager to do the work for us +Translating that into our process. It starts and can be told to shut down, but does it process listen? +Can it interpret the message it gets from Docker or Kubernetes? And if it does, can it relay the message correctly to its Child Processes? +In order for your process to gracefully shutdown, it should know when to do so. As such, it should listen not only for itself but also on behalf of its children - yours never do anything wrong though! -### Process manages itself +Some processes do, but many aren't designed to [listen](https://www.fpcomplete.com/blog/2016/10/docker-demons-pid1-orphans-zombies-signals) or tell [their Children](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem). They expect someone else to listen for them and tell them and their children - process managers. -Great, if this is the case, it saves you some trouble of relying on dependencies. 
-Unfortunately, not all processes are [designed for PID1](https://www.fpcomplete.com/blog/2016/10/docker-demons-pid1-orphans-zombies-signals), and some might be [prone to zombie processes regardless](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem). +In order to listen to these **signals**, we can call in the help of others. We will look at two options. -In those cases, you still have to invest some time and effort to get a solution in place. +* we let Docker manage the process and its children +* we use a process manager +### Let Docker manage it for us -### Docker manages PID1 +If you're not using Docker to run or manage your containers, you should skip to *Depend on a process manager*. Docker has a build in feature, that it uses a lightweight process manager to help you. @@ -236,13 +248,13 @@ Please, note that the below examples require a certain minimum version of Docker * [compose (v 2.2)](https://docs.docker.com/compose/compose-file/compose-file-v2/#image) - 1.13.0+ * [swarm (v 3.7)](https://docs.docker.com/compose/compose-file/#init) - 18.06.0+ -#### Docker Run +#### With Docker Run ```bash docker run --rm -ti --init caladreas/dui ``` -#### Docker Compose +#### With Docker Compose ```yaml version: '2.2' @@ -252,7 +264,7 @@ services: init: true ``` -#### Docker Swarm +#### With Docker Swarm ```yaml version: '3.7' @@ -267,6 +279,7 @@ Relying on Docker does create a dependency on how your container runs. It only r Creating either a different experience for users running your application somewhere else or not able to meet the version requirements. So maybe another solution is to bake a process manager into your image and guarantee its behavior. ### Depend on a process manager + One of our goals for Docker images is to keep them small. We should look for a lightweight process manager. It does not have too many a whole machine worth or processes, just one and perhaps some children. 
Here we would like to introduce you to [Tini](https://github.com/krallin/tini), a lightweight process manager [designed for this purpose](https://github.com/krallin/tini/issues/8).
@@ -296,26 +309,26 @@ ENTRYPOINT ["/sbin/tini", "-vv","-g","-s", "--"]
 CMD ["top -b"]
 
-## Signals management
+## How To Be Told What You Want To Hear
+
+You've made it this far; your tenants are reachable, so you can inform them if they need to act. However, there's another problem lurking around the corner. Do they speak your language?
 
-Now that we can capture signals and manage our process, we have to see how we can manage those signals. There are three parts to this:
+Our process now starts knowing it can be talked to, and it has someone who takes care of listening for it and its children. Now we need to make sure it can understand what it hears; it should be able to handle the incoming signals. We have two main ways of doing this.
 
-* **Handle signals**: we should make sure our process can deal with the signals it receives
-* **Receive the right signals**: we might have to alter the signals we receive from our orchestrators
-* **Signals and Docker orchestrators**: we have to help our orchestrators to know when to deliver these signals.
+* **Handle signals as they come**: we should make sure our process deals with the signals as they come
+* **State the signals we want**: we can also tell up front which signals we want to hear, and put the burden of translation on our callers
 
 For more details on the subject of Signals and Docker, please read this excellent blog from [Grigorii Chudnov](https://medium.com/@gchudnov/trapping-signals-in-docker-containers-7a57fdda7d86).
 
-### Handle signals
+### Handle signals as they come
 
 Handling process signals depends on your application, programming language or framework.
-For Java and Go(lang) we dive into this further, exploring some options we have here, including some of the most used frameworks.
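As a minimal sketch of handling signals as they come, here is what it can look like in Go. The cleanup step is hypothetical, and the program signals itself only to simulate Docker sending SIGTERM to PID 1.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Register interest in the termination signals before doing any work.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	// Simulate Docker sending SIGTERM to PID 1 by signalling ourselves.
	go func() {
		p, _ := os.FindProcess(os.Getpid())
		p.Signal(syscall.SIGTERM)
	}()

	// Block until a signal arrives, then do our (hypothetical) cleanup:
	// close connections, flush logs, stop child processes.
	sig := <-sigs
	fmt.Printf("received %v, cleaning up before exit\n", sig)
}
```

Most languages offer an equivalent hook; the important part is registering it before the process starts accepting work.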
-### Receive the right signals
+### State the signals we want
 
-Sometimes your language or framework of choice, doesn't handle signals all that well.
-It might be very rigid in what it does with specific signals, removing your ability to do the right thing.
+Sometimes your language or framework of choice doesn't handle signals all that well.
+It might be very rigid in what it does with specific signals, removing your ability to do the right thing. Of course, not all languages or frameworks are designed with Docker containers or microservices in mind; some are yet to catch up to this more dynamic environment.
+
+Luckily, Docker and Kubernetes allow you to specify what signal to send to your process.
@@ -395,15 +408,9 @@
 spec:
 When you create this as deployment.yml, create and delete it - `kubectl apply -f deployment.yml` / `kubectl delete -f deployment.yml` - you will see the same behavior.
 
-  
-### Signals and Docker orchestrators
-
-Now that we can respond to signals and receive the correct signals, there's one last thing to take care off.
-We have to make sure our orchestrator of choice sends these signals for the right reasons.
-Quickly telling us, there's something wrong with our running process, and it should shut down, which of course, we'll do gracefully!
+## How To Be Told When You Want To Hear It
 
-As the topic for health, readiness and liveness checks is a topic on its own, we'll keep it short.
-Giving some basic examples and pointing you to more work to further investigate how to use it to your advantage.
+Our process will now start knowing it will hear what it wants to hear. But we now have to make sure we hear it when we need to hear it. An intervention is excellent when you can still be saved, but it is a bit useless if you're already dead.
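What Docker health checks and Kubernetes probes ultimately hit is usually a health endpoint exposed by the process itself. Before we look at each platform, here is a minimal Go sketch of such an endpoint; the `/health` path and the in-process probe are hypothetical stand-ins for an external `curl` or kubelet probe.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// healthHandler is a hypothetical health endpoint: an orchestrator probes it
// and uses the status code to decide whether the container is still healthy.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	// Real checks would verify database connections, disk space, etc.
	w.WriteHeader(http.StatusOK)
	io.WriteString(w, "OK")
}

func main() {
	// Probe the handler over HTTP, the way a Docker HEALTHCHECK command
	// or a Kubernetes livenessProbe would from the outside.
	srv := httptest.NewServer(http.HandlerFunc(healthHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/health")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("%d %s\n", resp.StatusCode, body)
}
```

A failing check (non-2xx status) is what prompts the orchestrator to send the stop signal we prepared for above.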
### Docker @@ -415,5 +422,5 @@ Considering only Docker can use the health check in your Dockerfile, ### Kubernetes -In Kubernetes we have the concept of [Container Probes](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes). +In Kubernetes we have the concept of [Container Probes](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes). This allows you to configure whether your container is ready (readinessProbe) to be used and if it is still working as expected (livenessProbe). From 9165eec62bc37588c47fab9e89ebee8bea57f241 Mon Sep 17 00:00:00 2001 From: Viktor Farcic Date: Tue, 25 Sep 2018 19:09:11 +0200 Subject: [PATCH 10/10] Review --- docs/blogs/docker-graceful-shutdown.md | 57 +++++++++++++++++++------- 1 file changed, 43 insertions(+), 14 deletions(-) diff --git a/docs/blogs/docker-graceful-shutdown.md b/docs/blogs/docker-graceful-shutdown.md index bb14a15..eea5b63 100644 --- a/docs/blogs/docker-graceful-shutdown.md +++ b/docs/blogs/docker-graceful-shutdown.md @@ -1,25 +1,31 @@ # Gracefully Shutting Down Applications in Docker -I'm not sure about you, but I like it when my neighbors leave our shared spaces clean and don't take up parking spaces when they don't need. +I'm not sure about you, but I prefer that my neighbors leave our shared spaces clean and don't take up parking spaces when they don't need them. -Imagine you live in an apartment complex with the above-mentioned parking lot. Some tenants go away and never come back. If nothing is done to clean up after them - to reclaim their apartment and parking space - then after some time, more and more apartments are unavailable for no reason, the parking lot fills up with cars which belong to no one. +Imagine you live in an apartment complex with the above-mentioned parking lot. Some tenants go away and never come back. 
If nothing is done to clean up after them - to reclaim their apartment and parking space - then after some time, more and more apartments are unavailable for no reason, and the parking lot fills with cars which belong to no one. -Some tenants did not get a parking lot and are getting frustrated that none are opening up. When they moved in, they were told when others leave, they would be next in line. While they're waiting, they parked outside the complex. Eventually, the entrance is blocked and no one can enter or leave. The end result is a completely unlivable apartment block with trapped tenants - never to be seen or heard. +Some tenants did not get a parking lot and are getting frustrated that none are becoming available. When they moved in, they were told that when others leave they would be next in line. While they're waiting, they have to park outside the complex. Eventually, the entrance gets blocked and no one can enter or leave. The end result is a completely unlivable apartment block with trapped tenants. -If you agree with me that if a tenant leaves, the tenant should clean the apartment and free the parking spot to make it ready for the next inhabitant; then please read on. We're going to dive into the equivalent of doing this with containers. +If you agree with me that when a tenant leaves, he or she should clean the apartment and free the parking spot to make it ready for the next inhabitant; then please read on. We're going to dive into the equivalent of doing this with containers. -We will explore running our container with Docker (run, compose, swarm) and Kubernetes. +We will explore running our containers with Docker (run, compose, swarm) and Kubernetes. Even if you use another way to run your containers, this article should provide you with enough insight to get you on your way. 
## The case for graceful shutdown

-We're in an age where many applications are running in Docker containers across a multitude of clusters and (potentially) different orchestrators. These bring with it, other concerns to tackle, such as logging, monitoring, tracing and many more. One significant way we defend ourselves against the perils of distributed nature of these clusters is to make our applications more resilient.
+We're in an age where many applications are running in Docker containers across a multitude of clusters and (potentially) different orchestrators. These bring with them other concerns to tackle, such as logging, monitoring, tracing, and many more. One significant way we defend ourselves against the perils of the distributed nature of these clusters is to make our applications more resilient.

NOTE: How are the perils from the last sentence related to logging, monitoring, and tracing?

NOTE: The last sentence sounds as if distributed systems make applications less resilient so we need to increase their resiliency. If anything, it's the other way around. Running applications in distributed systems makes them more resilient.

-However, there is still no guarantee your application is always up and running. So another concern we should tackle is how it responds when it needs to shut down. Where we can differentiate between an unexpected shutdown - we crashed - or an expected shutdown.

-Shutting down can happen for a variety of reasons, in this post we dive into how to deal with an expected shutdown such as it being told to stop by an orchestrator such as Kubernetes.

-This can happen for several reasons, including but limited too:

+However, there is still no guarantee your application is always up and running. So another concern we should tackle is how it responds when it needs to shut down. Here we can differentiate between an unexpected shutdown - a crash - and an expected shutdown.
+
+NOTE: The first sentence is misleading.
The subject is graceful shutdown, and that's not directly related to the subject of how containers and schedulers guarantee application uptime.

+Shutting down can happen for a variety of reasons. In this post we'll dive into expected shutdown, such as through an orchestrator like Kubernetes.
+
+Containers can be purposely shut down for a variety of reasons, including but not limited to:

 * your application's health check fails
 * your application consumed more resources than allowed
@@ -28,18 +34,28 @@ This can happen for several reasons, including but limited too:

Not only does this increase the reliability of your application, but it also increases that of the cluster it lives in. As you cannot always know in advance where your application will run - you might not even be the one putting it in a Docker container - make sure your application knows how to quit!

+NOTE: The previous paragraph sounds confusing.
+
Graceful shutdown is not unique to Docker; it was part of Linux best practices for quite some years before Docker's existence. However, applying those practices to a Docker container adds extra dimensions.

+NOTE: The previous paragraph sounds confusing.
+
+NOTE: The subtitle is "The case for graceful shutdown" and yet you did not explain the case. For example, terminating pending requests before shutdown.
+
## Start Good So You Can End Well

-When you sign up for an apartment, you probably have to sign a contract detailing your rights and obligations. The more you state explicitly, the easier it is to deal with bad behaving neighbors. This holds the same when running a process; we should make sure we set the rules, obligations, and expectations from the start.
+When you sign up for an apartment, you probably have to sign a contract detailing your rights and obligations. The more you state explicitly, the easier it is to deal with badly behaving neighbors.
The same is true for running processes; we should make sure that we set the rules, obligations, and expectations from the start.

-As we say in Dutch: a good beginning is half the work. We will start with how you can run a process in a container that is beneficial to Graceful Shutdown.
+As we say in Dutch: a good beginning is half the work. We will start with how you can run a process in a container so that it can shut down gracefully.

There are many ways to start a process in a Docker container. I prefer to make things easy to understand and easy to know what to expect. So this article deals with processes started by commands in a Dockerfile.

+NOTE: The second sentence sets expectations that you will make things easy, but the third sentence does not follow through on that promise. It's as if they're not connected.
+
There are several ways to run a command in a Dockerfile.

+NOTE: We do not run a command in a Dockerfile but in a container. A Dockerfile specifies how a command will be executed.
+
These are as follows:

* **CMD**: runs a command when the container gets started
@@ -49,26 +65,30 @@ You need at least one ENTRYPOINT or CMD in a Dockerfile for it to be valid. They

You can put these commands in both a shell form and an exec form. For more information on these commands, you should check out [Docker's docs on Entrypoint vs. CMD](https://docs.docker.com/engine/reference/builder/#exec-form-entrypoint-example).

+NOTE: If you start explaining something (e.g., shell form and exec form) provide at least some basic info. Otherwise, remove the first part and leave only the link (e.g., for more info...).

### Docker Shell form example

We start with the shell form and see if it can do what we want: start in such a way that we can stop it nicely.

-We create the following Dockerfile:

+NOTE: Explain (1 sentence is enough) what the shell form is. Since you assumed that readers don't know what CMD and ENTRYPOINT are, you must assume that they do not know what the shell form is.
In other words, you need to be clear who the target audience is, and you cannot first assume that they do not know stuff and then that they do know things that build on that stuff.

+Please create a Dockerfile with the content that follows.

```dockerfile
FROM ubuntu:18.04
ENTRYPOINT top -b
```

-Then build and run it.
+Then build an image and run a container.

```bash
docker image build --tag shell-form .
+
docker run --name shell-form --rm shell-form
```

-This yields the following output.
+The latter command yields the following output.

```bash
top - 16:34:56 up 1 day, 5:15, 0 users, load average: 0.00, 0.00, 0.00
@@ -84,6 +104,9 @@ KiB Swap: 1048572 total, 1042292 free, 6280 used. 1579380 avail Mem

As you can see, two processes are running, **sh** and **top**. This means that killing the process, with *ctrl+c* for example, terminates the **sh** process, but not **top**.
+
+NOTE: Elaborate on why that is so.
+
To kill this container, open a second terminal and execute the following command.

```bash
@@ -92,6 +115,12 @@ docker rm -f shell-form

Shell form doesn't do what we need. Starting a process with the shell form will only lead us to the disaster of parking lots filling up, unless there's someone actively cleaning up.

+NOTE: I'm not sure I understand why the shell form leads to a disaster.
+
+NOTE: If this is a blog post, you went too far into the different ways to execute commands inside containers, and there's still no sign of graceful (or non-graceful) shutdown. On the other hand, you're rushing through the shell form. If I don't know what it is, I would probably not understand it after this section. On the other hand, if I do know what it is, I'm bored and will probably give up on my expectation to read about graceful shutdown. I'd recommend writing two articles. One about the different ways to run commands in Docker containers and the other about graceful shutdown. The latter can contain a link to the former.
+
+NOTE: Have to go now, so I'll stop the review at this point.
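The core of the shell-form problem can be seen without Docker at all. The sketch below is a rough local simulation, not exactly what happens in a container (there the wrapper shell runs as PID 1, which has additional special signal rules): a command wrapped in `sh -c` does not pass signals on to its children, so the real workload never hears that it should stop. The marker file name is just an illustration.

```shell
#!/bin/sh
# Rough simulation of Docker's shell form, which wraps your command in
# "/bin/sh -c ...". The wrapper shell does not forward signals to its
# children. The marker file path below is an arbitrary example.
marker="/tmp/shell-form-demo.$$"
rm -f "$marker"

# The wrapper shell starts a child process that needs two seconds to
# finish its work. The trailing ":" keeps the wrapper alive as the
# parent instead of exec-ing the child directly.
sh -c "sh -c 'sleep 2; echo survived > $marker'; :" &
wrapper=$!
sleep 1

# Send SIGTERM to the wrapper shell, just like "docker stop" would.
kill -TERM "$wrapper" 2> /dev/null
sleep 2

# The wrapper is gone, but the child never received the signal: it kept
# running as an orphan and finished its work as if nothing happened.
if [ -f "$marker" ]; then
  child_state="survived"
else
  child_state="terminated"
fi
echo "child $child_state"
rm -f "$marker"
```

In a shell-form container it is the same story: the signal stops (or, for PID 1, is simply ignored by) **sh**, while **top** never gets the chance to shut down gracefully.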
+
### Docker exec form example

This leads us to the exec form. Hopefully, this will get us somewhere.
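In contrast to the shell form above, the exec form is written as a JSON array, so Docker starts the binary directly instead of wrapping it in `/bin/sh -c`. A minimal sketch of the earlier Dockerfile rewritten in exec form (assuming nothing else changes) could look like this:

```dockerfile
FROM ubuntu:18.04
# Exec form: a JSON array, no shell wrapper. top becomes PID 1 and
# receives signals such as SIGTERM and SIGINT directly.
ENTRYPOINT ["top", "-b"]
```

Whether the container then stops cleanly is still up to the program's own signal handling, but at least the signal now reaches **top** itself rather than a wrapper shell.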