How self-healing networks help keep the digital world stable and secure
6 May 2021

There may never be another March 2020, when much of the world virtually (but not digitally) came to a halt because of the COVID-19 pandemic. But telecommunication providers still need to be ready for similar, if smaller, shocks in the future.

In fact, disruptions of all kinds could become more prevalent and more problematic as the world becomes more dependent on the very digital networks powered by telcos and communication service providers, or CSPs.

What if the network could predict and even preempt outages and spikes in demand, to help prevent interruptions from ever occurring?

That’s precisely what many carriers are increasingly looking for as they shift to cloud-supported networks infused with AI.

“The ability of the telecom networks to handle increased loads and exponential growth like we saw with the pandemic will diminish if they don’t move into a hybrid cloud, open framework architecture,” Utpal Mangla, a vice president for IBM’s Telecom, Media and Entertainment Center of Competency, told Industrious.

Carriers were already confronting these challenges even before COVID-19. As with so many industries, the pandemic brought the need for digital transformation into stark focus.

As workers and consumers shifted their labor, recreation, communications and shopping patterns to routines that still persist, the resulting stress on telecommunications networks was profound. Overnight, usage exploded in many places, as more and more people tried to squeeze into the same amount of network capacity (at least until the networks could respond by rapidly scaling up capacity).

When working from anywhere, people need a network that’s reliable everywhere.

This was an unprecedented and dramatic situation that exposed some hard truths about telecom network maintenance, namely that the current fragmented system is not positioned well to confront such an existential threat. A slowly buffering movie is one thing. A dropped sales pitch or remote wedding ceremony is another problem entirely.

If people were truly going to embrace a new digital lifestyle, they had to know they could rely on the network to support them.

Self-healing, a superpower for telcos

Mean time to failure (MTTF), the industry’s measure of how often the network fails, and the amount of system downtime are among the most important factors a communications service provider uses to gauge the efficacy of its network. Carriers must keep failures rare and downtime short to remain competitive and keep customers happy, especially the enterprise customers they will increasingly rely on for more of their revenues.
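To make these two measures concrete, here is a minimal sketch of how MTTF and availability might be derived from an outage log. The `Outage` record, the function names and the sample figures are all hypothetical, chosen only for illustration; a real carrier would pull this data from its own monitoring systems.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One recorded failure (hypothetical record layout)."""
    start_hour: float      # hours since the start of the observation window
    duration_hours: float  # how long service was down


def total_downtime(outages):
    """Sum of all outage durations, in hours."""
    return sum(o.duration_hours for o in outages)


def mttf_hours(window_hours, outages):
    """Mean time to failure: total uptime divided by the number of failures."""
    if not outages:
        return float("inf")  # no failures observed in the window
    return (window_hours - total_downtime(outages)) / len(outages)


def availability(window_hours, outages):
    """Fraction of the window during which the network was up."""
    return (window_hours - total_downtime(outages)) / window_hours


# Illustrative 30-day window (720 hours) with three made-up outages.
window = 720.0
outages = [Outage(100.0, 2.0), Outage(400.0, 1.0), Outage(650.0, 3.0)]

print(f"MTTF: {mttf_hours(window, outages):.1f} hours")
print(f"Availability: {availability(window, outages):.4%}")
```

A higher MTTF means failures are rarer, and higher availability means less downtime, so a carrier wants both numbers to rise.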

As many enterprises move onto the cloud from physical locations, their banking app or streaming service comes to effectively exist on, and rely on, a carrier’s network as much as any device or server users might encounter. This puts more and more responsibility on the carrier for continuous connectivity and reliability. It’s a herculean task, especially when any number of causes can lead to network problems, from a misconfiguration of the network’s capacity to hardware faults to software hiccups.

As enterprises rely on the network for more in the 5G era, they need connectivity they can count on.

“Many carriers lack a comprehensive, integrated network platform, for various reasons,” Mangla said. “As we know, the scale of the network is growing tremendously with 5G, edge, billions of IoT devices, and predicted to get into trillions over the next five years. The networks must be more agile and responsive to keep up, and AI, analytics and automation will be the means to do that.”

Networks were historically built with this kind of reliability in mind, though it has been almost impossible to keep up with innovation, at least until now. As the technology running on networks has evolved, it has been hard for both the physical infrastructure and its human operators and technicians to manage all the change and demands.

Furthermore, existing networks are fragmented because of many mergers and acquisitions over the years, with different lines of business that are not consolidated and operate in silos. Many are still very manual or only partially automated. While COVID-19 revealed some of these weaknesses, they will only become more glaring as the scale of networks continues to grow tremendously due to the emergence of 5G, edge computing and the resulting billions (soon to be trillions) of IoT devices coming online.

Digital investments compensate for infrastructure

One thing a cloud-based architecture with AI allows is for carriers to be proactive, even predictive, in their network maintenance. What Mangla calls “brutal automation” can occur, infused with analytics and AI. The network becomes so automated that it requires “zero touch” from humans and can self-heal. The benefit is obvious: the network can predict a likely failure and mitigate it before it ever happens.

Especially in industrial settings, network reliability is crucial to protect workers and machinery.

How does the AI on a self-healing network work? Mangla describes three critical pieces: the network’s nervous system, brain and then arms and legs.

The nervous system is the analytical component, the brain is the cognitive one and automation itself is the arms and legs. Analytics senses and detects changes in the network; the cognitive component understands network conditions and recommends the best actions for resolving any issues; and the arms and legs are the automation that performs the recommended actions to solve the problem.
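The three pieces can be pictured as a small detect, decide and act loop. The sketch below is only a toy illustration of that division of labor, not a description of IBM’s or any carrier’s system: the `Link` record, the z-score check, the 80 percent utilization rule and the 1.5x scale-up are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Link:
    """A network link with recent load samples (hypothetical model)."""
    name: str
    capacity_mbps: float
    load_history: list  # recent load samples in Mbps, oldest first


def detect(link, threshold=3.0):
    """Nervous system: flag the latest sample if it strays far from the baseline."""
    if len(link.load_history) < 3:
        return False  # not enough data to form a baseline
    *baseline, latest = link.load_history
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def recommend(link):
    """Brain: pick an action based on current utilization (assumed 80% rule)."""
    latest = link.load_history[-1]
    if latest > 0.8 * link.capacity_mbps:
        return "scale_up"
    return "monitor"


def act(link, action):
    """Arms and legs: carry out the recommended action."""
    if action == "scale_up":
        link.capacity_mbps *= 1.5  # assumed scale-up factor


def heal(link):
    """Run one pass of the loop with no human step in between."""
    if not detect(link):
        return "none"
    action = recommend(link)
    act(link, action)
    return action


# A sudden surge on one link, steady traffic on another (made-up numbers).
surging = Link("edge-1", 1000.0, [400, 410, 395, 405, 900])
steady = Link("edge-2", 1000.0, [400, 410, 395, 405, 402])

print(surging.name, heal(surging), surging.capacity_mbps)
print(steady.name, heal(steady), steady.capacity_mbps)
```

Keeping detection, recommendation and action in separate functions mirrors the metaphor: each piece could be swapped for something more sophisticated, such as a learned forecasting model in place of the z-score, without touching the others.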

As the scale and complexity of networks continue to skyrocket, Mangla says it would be almost impossible to orchestrate everything without AI. “I can’t imagine having to manage the network today and tomorrow without brutal automation,” he said.

 
Author
Matt A.V. Chaban Content Producer, IBM Industries