![](https://i.imgur.com/yCIuIaF.png)

**I want to elaborate not only on the introduced features but also on my thoughts and reasoning, so everyone can benefit and introduce similar changes in their projects. This post will be long, but I believe it might be useful for the entire HiveDevs community.**

The workflow is built on top of GitLab CI/CD but can be easily adapted to any other CI/CD tool (e.g. CircleCI or Jenkins).

# Docker

I started my work by reviewing the `Dockerfile`. I built the `condenser` image and was a bit shocked:

```
$ docker image ls
condenser    latest    6d57c0c8a904    19 seconds ago    1.54GB
```

1.54 GB for a simple frontend application? Boy, it's just too much. Let's do something about it.

## Leaner docker image

Let's start by figuring out why it is so big. You can also look into your own `Dockerfile` and `package.json` and search for some common mistakes:

* using full-sized images rather than `alpine` versions
* installing dev dependencies for production images
* putting dev dependencies into the production dependencies list
* copying useless files into the final image

Yep, every point applies here. Let's do some work on a new Dockerfile:

```dockerfile
FROM node:12.16.2 as development

WORKDIR /var/app

COPY package.json yarn.lock ./

RUN yarn install --non-interactive --frozen-lockfile --ignore-optional

COPY . .

RUN mkdir tmp && yarn build

CMD [ "yarn", "run", "start" ]

### REMOVE DEV DEPENDENCIES ##
FROM development as dependencies

RUN yarn install --non-interactive --frozen-lockfile --ignore-optional --production

### BUILD MINIFIED PRODUCTION ##
FROM node:12.16.2-alpine as production

WORKDIR /var/app

ARG SOURCE_COMMIT
ENV SOURCE_COMMIT ${SOURCE_COMMIT}
ARG DOCKER_TAG
ENV DOCKER_TAG ${DOCKER_TAG}

COPY --from=dependencies /var/app/package.json /var/app/package.json
COPY --from=dependencies /var/app/config /var/app/config
COPY --from=dependencies /var/app/dist /var/app/dist
COPY --from=dependencies /var/app/lib /var/app/lib
COPY --from=dependencies /var/app/src /var/app/src
COPY --from=dependencies /var/app/tmp /var/app/tmp
COPY --from=dependencies /var/app/webpack /var/app/webpack
COPY --from=dependencies /var/app/node_modules /var/app/node_modules
COPY --from=dependencies /var/app/healthcheck.js /var/app/healthcheck.js

HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=5 CMD node /var/app/healthcheck.js

CMD [ "yarn", "run", "production" ]
```

What has been improved:

* Node.js was upgraded from `8.7` to `12.16.2`, which is the latest LTS at the moment. Always try to use the latest framework versions, which may include security fixes and performance improvements. It's also a good habit to pin a specific version down to the `patch` number.
* A multi-stage build is used to build an optimal image for production deployment. First, we build a `development` stage with every dependency so we can compile the React application. Next, we remove development dependencies with the `--production` switch in the `dependencies` stage. Finally, we create a minimal image from the `alpine` Node.js image, which is the smallest base available, by copying only the necessary files and directories.
* A health check has been introduced, so the docker daemon can easily manage containers, with automatic restarts if necessary and zero-downtime deployments, which are explained later in this post. It will also be useful for dynamic scaling with `docker swarm`.

Also, the `package.json` file was modified, but it's not worth posting its content here:

* Dev dependencies like `webpack`, `babel` or `eslint` were moved into the proper list, `devDependencies`, so `yarn` can install only production deps for the final image.
* Strict versions were introduced for every dependency to make sure every build produces exactly the same image. Semantic versioning is popular, but there is no way to be sure that a package author did not introduce breaking changes with only the `patch` version increased. If you need to update a package, do it manually. If you have enough test coverage, you can update the package and run CI/CD to check if everything works fine.

After all that work, the condenser image size was massively reduced:

```
$ docker image ls
condenser    latest    58406d338e67    8 seconds ago    226MB
```

![](https://i.imgur.com/xGtpXCQ.png)

![](https://i.imgur.com/aCCoCud.png)

The compressed image in the docker registry is even smaller. Much better, right? Shame on you, Steemit!

## Healthcheck

Simply speaking, docker runs containers and tries to keep them alive as long as possible. But the system needs a tool to determine if a container is `actually alive`. It may seem alive, but is your app responding to requests, for example? Fortunately, docker has a built-in health check mechanism which can be configured in a `Dockerfile` or `docker-compose.yml`. Usually, you need to create an endpoint for liveness checks. Fortunately, condenser already has one, so we can use it easily.

There are a lot of examples with `curl` used as a docker health check, but it's not a good way to go.
A health check should work cross-platform, and the `curl` implementation differs between Windows and Unix. You should write the health check in the same technology or framework your project is written in; for condenser it's Node.js.

```javascript
const http = require("http");

// Liveness endpoint exposed by condenser inside the container
const options = {
    host: "localhost",
    port: "8080",
    path: '/.well-known/healthcheck.json',
    timeout: 5000
};

const request = http.request(options, (res) => {
    console.log(`STATUS: ${res.statusCode}`);
    if (res.statusCode === 200) {
        process.exit(0); // exit code 0 marks the container as healthy
    }
    else {
        console.error(`ERROR: ${res.statusCode}`);
        process.exit(1); // exit code 1 marks the check as failed
    }
});

// The `timeout` option only emits an event, so it has to be handled explicitly
request.on('timeout', () => {
    console.error('ERROR: request timed out');
    process.exit(1);
});

request.on('error', (err) => {
    console.error('ERROR', err);
    process.exit(1);
});

request.end();
```

When ready, instruct Docker to use your health check mechanism. Add the following line to your `Dockerfile`:

`HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=5 CMD node /var/app/healthcheck.js`

Make sure the file `/var/app/healthcheck.js` exists inside your image. If you want to be sure your health check is working, inspect your container after running it:

```
$ docker container ls
CONTAINER ID    IMAGE                           COMMAND                   CREATED          STATUS                    PORTS                     NAMES
060166cf52ee    hiveio/condenser:development    "docker-entrypoint.s…"    5 minutes ago    Up 5 minutes (healthy)    0.0.0.0:8080->8080/tcp    mystifying_dhawan
```

There should be a `(healthy)` indicator in the `STATUS` column. And there is. Please also note that during the container startup it will show a slightly different status (`starting`) until the first check succeeds. The `start-period` parameter gives our app some time to start up: failed checks during that period don't count towards the `retries` limit. Depending on your app and traffic/load, those parameters should vary.

## Pushing images to Docker Hub

We already have an improved (smaller) docker image, so it's time to push it to the repository. Doing it manually is a waste of time and invites human error. The best way is to use GitLab Runner to do it for us in an automatic and bullet-proof manner.
Here is a `job` definition from the `.gitlab-ci.yml` file with some additional code, which we will break down:

```yaml
variables:
    DOCKER_IMAGE: hiveio/condenser

.docker-job: &docker-job
    image: docker:stable
    services:
        - docker:dind
    before_script:
        - echo $HUB_TOKEN | docker login -u $HUB_USERNAME --password-stdin

build-development:
    <<: *docker-job
    stage: build
    variables:
        DOCKER_TAG: $DOCKER_IMAGE:$CI_COMMIT_SHORT_SHA
        DOCKER_TAG_MAIN: $DOCKER_IMAGE:development
        SOURCE_COMMIT: $CI_COMMIT_SHA
    only:
        - develop
    script:
        - docker build -t $DOCKER_TAG -t $DOCKER_TAG_MAIN --build-arg SOURCE_COMMIT --build-arg DOCKER_TAG .
        - docker push $DOCKER_TAG
        - docker push $DOCKER_TAG_MAIN
```

First, we create a global variable `DOCKER_IMAGE` so we can reuse it in many places. If we ever want to change the image name, we do it in a single place.

Since we have multiple `jobs` defined in the `.gitlab-ci.yml` file, it's good to use advanced YAML syntax, which includes **hidden keys** and **anchors**. It reduces duplicated code and makes the file easier to read and maintain. Every job name starting with a dot is considered a hidden key and **won't be directly executed**. Btw, this is a quick way to temporarily disable any job in your GitLab CI/CD without commenting it out or removing it.

By using `.docker-job: &docker-job` we created an anchor which can later be used to extend any job. If you add `<<: *docker-job`, it will populate the `image`, `services` and `before_script` properties automatically. It's a good move if you have multiple jobs that do similar things.

Later on, we create some additional local (job-scoped) variables:

* `DOCKER_TAG_MAIN`, which evaluates to `hiveio/condenser:development`
* `DOCKER_TAG`, which evaluates to `hiveio/condenser:344e55ef` or similar
* `SOURCE_COMMIT`, which evaluates to `344e55efefd56e00b15eea6ccf8560a1107b9ff6` (or a similar commit SHA)

It's a good idea to double tag an image.
The `latest` tag is useless if you want to track your development process. Later on, I will describe how we use this specific, `sha`-tagged image to track deployments and roll them back anytime with a single click.

Finally, we build an image with additional build arguments by using `--build-arg`:

`docker build -t $DOCKER_TAG -t $DOCKER_TAG_MAIN --build-arg SOURCE_COMMIT --build-arg DOCKER_TAG .`

If you scroll up to the Dockerfile section, you will notice `ARG SOURCE_COMMIT` and `ENV SOURCE_COMMIT ${SOURCE_COMMIT}`, which means these build arguments will be injected as environment variables into your containers. It's a quick and easy way to pass additional, build-level variables into your images. Those specific variables are later returned by the condenser health endpoint. It may be useful for checking which source a specific instance was built from.

```shell
$ curl https://staging.condenser.engrave.dev/.well-known/healthcheck.json
{"status":"ok","docker_tag":"hiveio/condenser:344e55ef","source_commit":"344e55efefd56e00b15eea6ccf8560a1107b9ff6"}
```

***

An important thing to mention is that `HUB_TOKEN` and `HUB_USERNAME` are environment variables injected into the GitLab runner job and configured in the project's settings. To prevent unauthorized users from pushing malicious images into the official registry, those variables are configured as `protected` and `masked`, which means they can only be used on specific branches and are hidden in job logs, so they cannot leak unless an internal bad actor with elevated permissions is involved.

![](https://i.imgur.com/o3FAaQb.png)

# Merge requests

To improve the process of adding new features and fixes to the codebase, Merge Requests got a brand new CI/CD workflow which includes the following jobs:

![](https://i.imgur.com/3kV2zPi.png)

The entire pipeline is fired on every Merge Request and it's required to pass before changes can be merged. If the pipeline fails for some reason (e.g.
failing unit tests), there is no way to merge changes into the main branch. This enforces code quality and prevents regressions.

## Eslint

Code quality and standardization are important, especially in an open-source project that may be maintained by totally different developers from all around the world. `Eslint` is a tool that statically analyzes the code to quickly find potential problems and keep the code organized according to specified rules. Code analysis is especially useful when developing JavaScript applications. It's really easy to make some stupid mistakes.

The Eslint job is fired on every Merge Request and on every branch pushed to the repository:

```yaml
run-eslint:
    stage: test
    image: node:12.16.2
    only:
        - branches
        - merge_requests
    before_script:
        - yarn install --frozen-lockfile --ignore-optional
    script:
        - yarn ci:eslint
    allow_failure: true # will be changed to false when all linter errors are removed
```

Because the codebase is a bit neglected, the `run-eslint` job is allowed to fail for the moment (`allow_failure: true`), which is indicated by an orange exclamation mark on the MR view or pipelines list. There are "some" errors and warnings right now, but they should be cleaned up soon, so we can require the eslint job to pass before merging proposed changes:

```
✖ 1208 problems (1187 errors, 21 warnings)
  831 errors and 0 warnings potentially fixable with the `--fix` option.

error Command failed with exit code 1.
```

## Unit tests

Extensive testing is the only way to produce bullet-proof code and stable applications. Similar to `run-eslint`, the `run-unit-tests` job is fired on every branch and every merge request.
```yaml
run-unit-tests:
    stage: test
    image: node:12.16.2
    only:
        - branches
        - merge_requests
    before_script:
        - yarn install --frozen-lockfile --ignore-optional
    script:
        - yarn run ci:test
    coverage: /All files[^|]*\|[^|]*\s+([\d\.]+)/
```

The test suite (`jest`) was configured to produce a coverage report:

![](https://i.imgur.com/43JrdbE.png)

This report is later parsed by GitLab, using the `coverage: /All files[^|]*\|[^|]*\s+([\d\.]+)/` configuration. It displays the coverage status and percentage change on the Merge Request view, allowing reviewers to quickly see whether code quality is improving or not.

![](https://i.imgur.com/Yn6lbe8.png)

I would advise declining to merge new features not covered by tests. This may be waived for important fixes, but we should all try to make the code better, not worse.

Also, repository settings were changed and the `Pipelines must succeed` setting is now checked by default. It means no more broken code on the develop/production branches.

## Review apps

`Review Apps` are a huge and very important feature. From now on, every feature can be inspected visually by the reviewer with a single click. GitLab Runner creates a special instance built from the proposed code and exposes it to the reviewers:

![](https://i.imgur.com/FOEhZ6d.png)

A review app requires three jobs to run on a merge request:

```yaml
build-review-app:
    <<: *docker-job
    stage: build
    variables:
        DOCKER_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
        SOURCE_COMMIT: $CI_COMMIT_SHA
    only:
        - merge_requests
    before_script:
        - echo $CI_JOB_TOKEN | docker login -u $CI_REGISTRY_USER $CI_REGISTRY --password-stdin
    script:
        - docker build -t $DOCKER_TAG --build-arg SOURCE_COMMIT --build-arg DOCKER_TAG .
        - docker push $DOCKER_TAG

deploy-review-app:
    <<: *docker-job
    <<: *docker-remote-host-review
    stage: deploy
    variables:
        DOCKER_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
        SERVICE_NAME: review_$CI_ENVIRONMENT_SLUG
    only:
        - merge_requests
    script:
        - DOCKER_CERT_PATH=$CERTS docker -H $REVIEW_HOST --tlsverify pull $DOCKER_TAG
        - DOCKER_CERT_PATH=$CERTS docker -H $REVIEW_HOST --tlsverify service rm $SERVICE_NAME || true # try to remove the previous service but do not fail if it does not exist
        - DOCKER_CERT_PATH=$CERTS docker -H $REVIEW_HOST --tlsverify service create --with-registry-auth --network infrastructure --name $SERVICE_NAME $DOCKER_TAG
        - echo "Review app deployed"
    environment:
        name: review/$CI_COMMIT_REF_NAME
        url: https://$CI_ENVIRONMENT_SLUG$APP_REVIEW_SUBDOMAIN
        on_stop: stop-review-app
        auto_stop_in: 1 week

stop-review-app:
    <<: *docker-job
    <<: *docker-remote-host-review
    stage: deploy
    variables:
        SERVICE_NAME: review_$CI_ENVIRONMENT_SLUG
    only:
        - merge_requests
    when: manual
    script:
        - DOCKER_CERT_PATH=$CERTS docker -H $REVIEW_HOST --tlsverify service rm $SERVICE_NAME || true # try to remove the previous service but do not fail if it does not exist
        - echo "Review app stopped"
    environment:
        name: review/$CI_COMMIT_REF_NAME
        action: stop
```

The first job should look familiar if you read the previous parts of the post. The only difference is that we're overwriting `before_script`. Note that if you're using **anchors**, you can always overwrite the template freely, which is what we did here.

Because merge requests can be opened by developers who may not be well known in the community (potentially bad actors), it would be a security issue if the CI/CD on Merge Requests could push images to the official repository. To prevent this, we're using the internal registry provided by GitLab itself. It's private, works well for Review Apps, but isn't accessible to anyone else.
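The `before_script` override in `build-review-app` can be pictured with plain JavaScript objects. This is only a rough sketch of how YAML merge keys behave, not how GitLab actually parses the file: keys written directly in a job win over keys merged in through `<<: *docker-job`. The object names `dockerJob` and `buildReviewApp` are made up for this illustration.

```javascript
// Hypothetical stand-in for the `.docker-job` template (anchor `&docker-job`).
const dockerJob = {
    image: "docker:stable",
    services: ["docker:dind"],
    before_script: [
        "echo $HUB_TOKEN | docker login -u $HUB_USERNAME --password-stdin",
    ],
};

// Rough equivalent of `<<: *docker-job` followed by the job's own keys:
// template keys are copied first, then local keys replace any duplicates.
const buildReviewApp = {
    ...dockerJob,
    stage: "build",
    before_script: [
        "echo $CI_JOB_TOKEN | docker login -u $CI_REGISTRY_USER $CI_REGISTRY --password-stdin",
    ],
};

console.log(buildReviewApp.image);         // inherited from the template
console.log(buildReviewApp.before_script); // replaced by the job's own value
```

Note that the override replaces the whole `before_script` list; the template's list and the job's list are not concatenated.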
`echo $CI_JOB_TOKEN | docker login -u $CI_REGISTRY_USER $CI_REGISTRY --password-stdin`

We are using `CI_JOB_TOKEN`, `CI_REGISTRY_USER` and `CI_REGISTRY`, which are [environment variables injected automatically by GitLab](https://docs.gitlab.com/ee/ci/variables/predefined_variables.html), so there's no need to configure them. Also, please note the use of `--password-stdin`, which is a more secure way to log in as it prevents the password from being exposed in job logs.

By default, docker connects to the local daemon over an unsecured but unexposed socket. It is possible, however, to configure the Docker daemon to validate TLS certificates so it can be exposed to the external world in a secure way, which is how we deploy services from our runners to our machines. You need to pass additional parameters:

* `-H`, which is the remote docker address
* `--tlsverify`, which makes the client use TLS and verify the remote daemon's certificate
* `--with-registry-auth`, which sends registry authentication to swarm agents
* `--network`, which connects the service to the specified network so the reverse proxy can expose the instance
* `DOCKER_CERT_PATH`, which tells the docker client where to look for TLS certificates

`DOCKER_CERT_PATH=$CERTS docker -H $REVIEW_HOST --tlsverify service create --with-registry-auth --network infrastructure --name $SERVICE_NAME $DOCKER_TAG`

This creates a new `docker service` and exposes it automatically under the URL built with `CI_ENVIRONMENT_SLUG`, which is a variable provided by GitLab. It's guaranteed to be valid as part of a URL or as a docker/kubernetes service name.

Review App instances are automatically removed when the MR is closed or 1 week after it's opened. This is achieved by running the `stop-review-app` job, which is configured to be manually triggered (`when: manual`).

# Tracking environment deployments

![](https://i.imgur.com/8ncGrE9.png)

By using `sha`-tagged images, it's possible to quickly redeploy the environment at any moment, with a single click.
In case of emergency, project maintainers can roll back the environment to a specific point in time (a specific docker image, to be precise).

# Staging

Deploying the staging environment is quite similar to deploying a Review App. It also uses a remote docker daemon, but the service is not created on demand; it's updated with a new image.

With `docker swarm` mode, you can make your application highly available. The swarm agent takes care of your containers. It restarts them or spins up new ones if necessary (this is why the health check is so important). It is a built-in, native docker mode everyone should start using.

```yaml
deploy-staging:
    <<: *docker-job
    <<: *docker-remote-host-staging
    stage: deploy
    variables:
        DOCKER_TAG: $DOCKER_IMAGE:$CI_COMMIT_SHORT_SHA
        SERVICE_NAME: staging_condenser
    only:
        - develop
    script:
        - DOCKER_CERT_PATH=$CERTS docker -H $STAGING_HOST --tlsverify pull $DOCKER_TAG
        - DOCKER_CERT_PATH=$CERTS docker -H $STAGING_HOST --tlsverify service update --image $DOCKER_TAG --update-failure-action rollback --update-order start-first $SERVICE_NAME
    environment:
        name: staging
        url: https://$STAGING_DOMAIN
```

The job uses protected variables to prevent "bad" developers from pushing/deploying malicious code. It is only possible to deploy to staging from the protected `develop` branch. Pushing directly to `develop` is disabled.

## Zero downtime deployments

Updating a single service is easy with `docker swarm`:

`DOCKER_CERT_PATH=$CERTS docker -H $STAGING_HOST --tlsverify service update --image $DOCKER_TAG --update-failure-action rollback --update-order start-first $SERVICE_NAME`

There are some additional but important parameters provided:

* `--image $DOCKER_TAG` - updates the existing service by running new containers with the specified image. In this case, it's the sha-tagged image built from the develop branch
* `--update-failure-action rollback` - by default, the docker daemon will try to update the service and simply pause the update if it fails.
By passing this parameter, we instruct docker to roll the service back to its previous state, which means containers using the previous image.
* `--update-order start-first` - by default, docker stops the current containers and spins up new ones afterwards. It may cause some downtime, which we don't want. By setting `start-first`, we instruct docker to spin up new containers first. The swarm agent switches containers without downtime once the health check result becomes positive. And in case something goes wrong (the health check fails for any reason), we end up with a working staging environment because the old containers are not touched at all.

## Resources under control

With `docker swarm` you have full control over your services and containers. This is an example configuration used for the staging environment. With some tweaks, it could be used for production as well:

```yaml
version: "3.7"
services:
    condenser:
        image: hiveio/condenser:latest
        deploy:
            mode: replicated
            replicas: 2
            resources:
                limits:
                    cpus: "0.85"
                    memory: 2024M
            restart_policy:
                condition: any
                delay: 5s
            update_config:
                parallelism: 1
                delay: 10s
                failure_action: rollback
                order: start-first
            rollback_config:
                parallelism: 1
                delay: 5s
        networks:
            - reverse-proxy

networks:
    reverse-proxy:
        external:
            name: reverse-proxy
```

# Production

It is not finished yet, but my advice is to start using `docker swarm` mode for production deployments (CI/CD is ready). It's designed to run services like condenser.

Some key features of Docker Swarm (based on the official documentation):

* **Cluster management integrated with Docker Engine**: if you know how to build a docker image, start a container and read some logs, you're ready to use Docker Swarm. You don't need to install additional software as it is a native Docker feature.
* **Decentralized design**: adding a worker or a manager to your swarm is as easy as running a single command. Those could be machines from all around the world.
* **Scaling**: For each service, you can declare the number of tasks you want to run. When you scale up or down, the swarm manager automatically adapts by adding or removing tasks to maintain the desired state.
* **Multi-host networking**: You can specify an overlay network for your services. The swarm manager automatically assigns addresses to the containers on the overlay network when it initializes or updates the application.
* **Load balancing**: You can expose the ports for services to an external load balancer. Internally, the swarm lets you specify how to distribute service containers between nodes.
* **Secure by default**: Each node in the swarm enforces TLS mutual authentication and encryption to secure communications between itself and all other nodes. You have the option to use self-signed root certificates or certificates from a custom root CA.
* **Rolling updates**: At rollout time you can apply service updates to nodes incrementally. The swarm manager lets you control the delay between service deployment to different sets of nodes. If anything goes wrong, you can roll back to a previous version of the service.

And after all, it is called `Swarm`, can't be a coincidence! ;)

I'm using Swarm mode for most of my projects. Within Engrave (dblog.org), swarm automatically manages a set of 23 microservices with almost 30 containers at the moment.

![](https://i.imgur.com/JMs3d48.png)

# My requests and tips for condenser Developers (and not only)

* Please do write unit tests covering your code. Improving coverage will only lead to fewer problems and bugs. Coverage is now visible on every merge request, so it's damn easy to see a regression. Take care of the code quality.
* Use exact package versions in the `package.json` file. When using `^`, it's not guaranteed that the same version is installed in two different builds. And some developers like to introduce breaking changes without changing the `major` version number.
* Use `alpine` images to create minified production images.
* Use the latest images from the official Docker Hub when possible.
* Use [multi-stage builds](https://docs.docker.com/develop/develop-images/multistage-build/) to create leaner docker images.
* Write and configure health checks for your applications.
* Run `eslint` to clean up your code before you push it to the repository. You can use `husky` to ensure it happens automatically.

# Vote for @engrave witness if you find my work valuable

[![Vote for @engrave witness](https://i.imgur.com/lawFvZF.png)](https://hivesigner.com/sign/account-witness-vote?witness=engrave&approve=1)

See: Improvements for hive.blog, Continuous Integration and Deployments by @engrave