August 25, 2020 By Jason McGee 3 min read

After years of running Kubernetes, I’ve learned a few things about scale.

Automating the management of more than 1,000,000 containerized applications across the globe tends to expose weaknesses in your management approach, as standard systems and solutions fail under workloads of that magnitude.

But that upsurge in scale also revealed some critical best practices.

In its six-year lifespan, Kubernetes has solved a fundamental challenge—how to build a platform that lets app developers focus on building their apps instead of focusing on all the plumbing and the infrastructure for running those apps.

What we did 20 years ago in Java with app servers, we’re doing now with cloud. And we’re doing it on top of containers. Kubernetes is the open source container orchestration platform of choice. Surveys find that more than 70% of enterprises that run DevOps and use containers are using Kubernetes for web and API applications, databases, data warehouses, machine learning, blockchain, IoT applications, and high-volume websites.

Of course, you can manage one cluster by hand. Add a few more, in the 2-10 cluster range, and your familiar tools will still function satisfactorily. But more than 10? That’s when those stable tools are pressure-tested, and they fail. That’s when it’s time for help.

Many organizations initially lifted and shifted their applications to the cloud as monoliths. The next wave of cloud native applications is being built from microservices based on containers that primarily run on Kubernetes. Currently, monolithic and cloud native applications are being deployed in roughly equal numbers, while the early monolithic applications are often being modified with microservices-based extensions that make it easier to add functionality.

On a growth curve where the number of users dramatically outpaced the size of our development team, we learned that running cloud at scale required a solution to two persistent issues.

  1. How does the team manage such a vast system?
  2. How do we gain visibility into what’s running in our clusters and update them?

Adapt the system, not the team: bring all operational work to where the team is

If you want a small team to be able to manage a large environment, you have to make everything as efficient for the people as possible. This means adapting the technology to the people. Switching between tools and systems was slowing us down. The “aha” moment came when we realized that the team spent their entire day talking to each other on Slack, and that if we could bring the management system to Slack, we could all go faster.

That insight led to a ChatOps model, where all of the team’s data and the operations needed to manage production take place through bots integrated into the conversations the team is already having. Now, pushing an update, handling an incident, identifying the right runbook, accessing systems, and collecting audit data can all happen without ever having to leave the conversation.
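To make the idea concrete, here is a minimal sketch of the kind of command dispatcher a ChatOps bot can be built around. The command names and handlers are hypothetical illustrations, not our production tooling, and a real bot would sit behind the Slack APIs rather than reading strings directly.

```python
# Minimal ChatOps-style command dispatcher (illustrative sketch only).
# A real bot would receive these messages via the Slack APIs; the commands
# and handlers below are hypothetical.

from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[str], str]] = {}

def command(name: str):
    """Register a chat command so the bot can route '<name> <args>' to it."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[name] = fn
        return fn
    return register

@command("deploy")
def deploy(args: str) -> str:
    # In practice this would trigger the deployment system and record an audit event.
    return f"Deployment of '{args}' queued; audit record created."

@command("runbook")
def runbook(args: str) -> str:
    # In practice this would look up the runbook that matches the incident.
    return f"Runbook for '{args}': https://example.com/runbooks/{args}"

def handle_message(text: str) -> str:
    """Route a chat message like 'deploy frontend v1.2.3' to the right handler."""
    cmd, _, args = text.partition(" ")
    handler = HANDLERS.get(cmd)
    return handler(args) if handler else f"Unknown command: {cmd}"

if __name__ == "__main__":
    print(handle_message("deploy frontend v1.2.3"))
    print(handle_message("runbook high-cpu"))
```

The point of the pattern is that every operational action, and its audit trail, happens in the same place the team is already talking.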

Focus on managing change efficiently

At the start, we used a traditional Jenkins-based CI/CD model to update our systems. But it didn’t scale: deployments were too slow, the pipeline was fragile, and its rules governing deployment decisions grew too complex. So we built a different system to help us manage and inventory deployments at scale.

A pull-based, self-updating cluster model

Instead of pushing applications into production, we let every cluster pull changes and update itself. This allowed us to scale easily while maintaining control over what was running.
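As a rough sketch of what that pull model looks like, the agent loop below polls a hypothetical central config service for a cluster’s desired state and reconciles the cluster against it. This illustrates the pattern, not the Razee implementation; the function names and versions are assumptions.

```python
# Sketch of a pull-based update agent (illustrative only). Each cluster runs
# an agent that asks for its desired state and reconciles itself against it,
# instead of waiting for a central pipeline to push to it.

def fetch_desired_state(cluster_id: str) -> dict:
    # Hypothetical stand-in: a real agent would call the central config service.
    return {"frontend": "v1.2.3", "billing": "v2.0.1"}

def current_state() -> dict:
    # Hypothetical stand-in: a real agent would inspect the cluster itself.
    return {"frontend": "v1.2.2", "billing": "v2.0.1"}

def apply_update(app: str, version: str) -> None:
    # Hypothetical stand-in: a real agent would apply manifests to the cluster.
    print(f"updating {app} -> {version}")

def reconcile(cluster_id: str) -> None:
    """Bring the cluster in line with the desired state it just pulled."""
    desired, actual = fetch_desired_state(cluster_id), current_state()
    for app, version in desired.items():
        if actual.get(app) != version:
            apply_update(app, version)

if __name__ == "__main__":
    # A real agent runs this on a timer, or watches the source for changes.
    reconcile("cluster-eu-01")
```

Because each cluster reconciles itself, adding the ten-thousandth cluster costs the central team no more push work than adding the tenth.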

Flexible rule- and label-based configuration

Having tens of thousands of clusters means you’re not doing anything on an individual cluster; you’re operating on fleets of systems. To do this, we established rules to decide where applications ran and used labels within the environment to give us the fine-grained control we needed over the system.
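A minimal sketch of that kind of label-based targeting, with entirely hypothetical cluster names, labels, and rules, might look like this:

```python
# Sketch of rule- and label-based targeting (illustrative only). A rule
# selects the subset of the fleet an application should run on by matching
# cluster labels; the fleet below is hypothetical.

FLEET = [
    {"name": "prod-eu-01", "labels": {"env": "prod", "region": "eu"}},
    {"name": "prod-us-01", "labels": {"env": "prod", "region": "us"}},
    {"name": "stage-eu-01", "labels": {"env": "stage", "region": "eu"}},
]

def select(fleet: list, rule: dict) -> list:
    """Return the clusters whose labels satisfy every key/value in the rule."""
    return [c["name"] for c in fleet
            if all(c["labels"].get(k) == v for k, v in rule.items())]

# Roll an application out to every production cluster in the EU.
print(select(FLEET, {"env": "prod", "region": "eu"}))  # ['prod-eu-01']
```

The same rule can then be applied to a fleet of ten clusters or ten thousand without ever naming an individual cluster.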

We also needed the systems to report on themselves: what was running in every cluster and which capabilities were deployed in each system around the world.
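The self-reporting side can be sketched in the same spirit. The payload shape, endpoint, and cluster names below are assumptions for illustration, not the format our systems actually use.

```python
# Sketch of a cluster reporting its own inventory (illustrative only).
# Each cluster periodically tells a central service what it is actually
# running, so the fleet can be inspected without querying clusters one by one.

import json
import time

def build_report(cluster_id: str, deployments: dict) -> str:
    return json.dumps({
        "cluster": cluster_id,
        "reported_at": int(time.time()),
        "deployments": deployments,   # e.g. {"frontend": "v1.2.3"}
    })

report = build_report("cluster-eu-01", {"frontend": "v1.2.3", "billing": "v2.0.1"})
print(report)  # in practice this would be sent to the central inventory service
```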

This new approach allowed us to grow to tens of thousands of managed clusters that we can reliably update thousands of times every week without having to grow our team. In the spirit of sharing what we learned, we even open-sourced our tools at razee.io.

Beyond one million

Operations teams are typically tasked with running deployments across at least hundreds—usually thousands—of containers. In any enterprise’s IT infrastructure, the need to schedule and automate deployment, availability, and scalability is critical. Kubernetes is the de facto solution.

So, no matter how high the growth curve climbs or how far users outnumber developers, Kubernetes empowers small teams to operate at scale in public and hybrid environments.

Discover how Red Hat® OpenShift® on IBM Cloud does this with velocity, market responsiveness, scalability, and reliability.

