
Preparing Networks for AI Workloads Feels a Lot Like Getting Ready for VoIP

Written by Mark Mirrotto | Apr 1, 2026 12:00:01 PM

If you’ve been in networking long enough, you remember the anxiety that came with rolling out Voice over IP (VoIP) and converged voice, video, and data networks. What once felt like a best-effort, “good enough” IP network suddenly had to behave like a deterministic, real-time system. Latency mattered. Jitter mattered. Packet loss suddenly had real, tangible consequences. A big part of the anxiety was that we were competing with PBX systems, which were historically rock solid. Back in the PBX days, we didn’t talk about uptime in days or months; we talked about it in years, sometimes decades. If a PBX rebooted, something had gone terribly wrong. So how did VoIP gain traction to the point where the PBX has been all but eliminated from most companies? The benefits of converged infrastructure far outweighed the additional risks, and working through the challenges quite honestly became part of our new normal. At its core, the job was figuring out how to ensure our networks were properly prepared for these new workloads.

Fast forward to today, and we’re having many of the same conversations, this time about artificial intelligence (AI) workloads in the data center and how the current design norms are no longer valid.

While AI and VoIP couldn’t be more different in purpose or scale, the network design evolution occurring in the data center because of AI feels eerily familiar. The same foundational assumptions are being challenged, the same shortcomings exposed, and the same architectural maturity is being demanded.


Let’s explore the similarities and why the lessons from VoIP still matter.

“The Network Is Fine”… Until It Isn’t

Before VoIP, most enterprise networks were built for:

  • Email
  • File Transfers
  • Web Applications

In other words: bursty, delay-tolerant traffic.
Then VoIP arrived and exposed the truth: our networks were “fine” only because applications tolerated their flaws.
Voice traffic made those flaws stand out.

AI workloads are doing something similar in today’s data centers.

Many data center networks are designed using a spine-leaf architecture focused primarily on east-west traffic with relatively predictable application behavior. AI training and inference, especially distributed, GPU-driven workloads, strain those assumptions:

  • Massive east-west data movement between hundreds or thousands of endpoints
  • Sustained, extremely high, asymmetric throughput
  • Intolerance of latency and microbursts, where traditional congestion control methods become detrimental
  • Many systems requiring tight synchronization

Just like voice once did with the enterprise network, AI is forcing the data center network to evolve to handle modern workloads.


Latency Moves From “Nice to Have” to “Nonnegotiable”

With VoIP, latency needed to be measured and managed:

  • One-way latency under roughly 150 ms
  • Minimal jitter, with tolerances depending on the codec in use
  • Predictable delivery
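The ~150 ms one-way budget makes latency management concrete arithmetic: every codec, switch, link, and jitter buffer spends part of it. Here is an illustrative sketch (the per-component delay values are assumptions, not measurements from the post) of how that budget gets consumed:

```python
# Rough one-way VoIP latency budget check against the ~150 ms guideline.
# All per-component delays below are illustrative assumptions.
BUDGET_MS = 150.0

delays_ms = {
    "codec_packetization": 20.0,        # e.g. 20 ms voice frames
    "serialization_and_switching": 5.0, # per-hop delays, summed
    "wan_propagation": 40.0,            # distance-dependent
    "jitter_buffer": 40.0,              # de-jitter playout buffer
}

total = sum(delays_ms.values())
headroom = BUDGET_MS - total
print(f"one-way latency: {total:.0f} ms, headroom: {headroom:.0f} ms")
```

With these assumed numbers the budget holds, but note how quickly a congested queue or an oversized jitter buffer eats the remaining headroom.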

We learned that bandwidth alone didn’t solve the problem.
A congested or poorly designed network could cripple call quality even if links weren’t “full.”

AI workloads raise the stakes in the data center even further.

Distributed training frameworks require nodes to exchange gradients, parameters, and state in tight synchronization loops. When latency balloons or becomes inconsistent:

  • GPUs sit idle waiting for data
  • Training times explode
  • Costs skyrocket
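The idle-GPU effect above is easy to quantify with a back-of-the-envelope model (the numbers are hypothetical, not from the post): in a synchronous training loop, each step is compute time plus gradient-exchange time, and the GPUs sit idle during the exchange.

```python
# Sketch of how communication time dilutes GPU utilization in a
# synchronous distributed training step. Numbers are illustrative.
def step_utilization(compute_ms: float, comm_ms: float) -> float:
    """Fraction of each training step the GPUs spend computing."""
    return compute_ms / (compute_ms + comm_ms)

# Assume 100 ms of compute per step.
fast_net = step_utilization(100.0, 10.0)   # well-engineered fabric
slow_net = step_utilization(100.0, 100.0)  # congested fabric, long tail

print(f"fast fabric: {fast_net:.0%} busy, slow fabric: {slow_net:.0%} busy")
```

Under these assumptions, a tenfold increase in exchange time roughly halves effective GPU utilization, which is exactly how training times explode and costs skyrocket.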

This mirrors VoIP’s transition from “best effort IP” to “engineered IP,” where predictability mattered as much as raw speed.


Quality of Service: From Checkbox to Survival Skill

Ask any network engineer who lived through early VoIP days about QoS, and you’ll likely get a knowing sigh.

Before voice, QoS often existed as a line in a design document, configured, maybe, but rarely validated.

VoIP made it mandatory:

  • Traffic classification and marking
  • Queuing strategies
  • Congestion avoidance

And most importantly: end-to-end consistency
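The classification-and-marking bullet can be illustrated from the host side. This is a minimal sketch (not from the post) using Python’s standard socket API to set the DSCP Expedited Forwarding code point traditionally used for voice; queuing and congestion avoidance still happen on the network devices, and the marking only matters if every hop trusts and honors it:

```python
import socket

# Mark a UDP socket with DSCP EF (decimal 46), the per-hop behavior
# traditionally used for voice. DSCP occupies the top 6 bits of the
# IPv4 ToS byte, so the value written is EF shifted left by 2.
EF_DSCP = 46
TOS_VALUE = EF_DSCP << 2  # 0xB8

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)
print(f"ToS byte set to 0x{TOS_VALUE:02X}")
sock.close()
```

Marking at the edge is the easy part; the hard-won VoIP lesson was keeping the treatment of that marking consistent across every queue in the path.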

AI is forcing a similar reckoning.

While AI traffic may not use priority queues in the same way voice did, it absolutely demands:

  • Lossless or near-lossless transport
  • Intelligent buffer management
  • Congestion control that minimizes tail latency

Technologies like RDMA over Converged Ethernet (RoCE) echo the VoIP era’s hard lesson:

QoS only works when it’s coherent across the entire network path.


Network Design Can’t Be an Afterthought

Early VoIP failures were rarely about codecs or call managers. They were about networks that weren’t designed with voice in mind:

  • Oversubscribed uplinks
  • Daisy-chained switches
  • Inconsistent VLAN or MTU configurations

Voice forced architects to think holistically—core, distribution, access, WAN, and everything in between.

AI is raising the same architectural red flags.

Flat Layer 2 domains built without thought to scale, oversubscribed leaf-spine fabrics, mismatched MTUs, and “temporary” designs carried into production all become liabilities under AI workloads.

Once again, the network can’t be reactive. It must be intentionally designed for the workload, not retrofitted after performance complaints start rolling in.


Visibility Becomes Mission-Critical

When voice quality degraded, users didn’t open tickets saying “I’m experiencing 4% packet loss.” They said:

“The call sounds bad.”

That forced IT teams to develop better visibility: MOS scores, jitter graphs, call path analysis, tools that tied user experience back to network behavior.

AI introduces a similar challenge.

When AI jobs run slowly or fail unpredictably, the root cause might be:

  • Network congestion
  • Silent packet loss
  • Microbursts
  • Misconfigured flow control


Without deep visibility into traffic patterns, buffer behavior, and latency distribution, teams are left guessing. Just like VoIP, AI elevates observability from nice-to-have to mandatory.
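Latency distribution matters here because averages hide exactly the microbursts listed above. A small synthetic sketch (assumed numbers, using only the standard library) of why tail percentiles, not means, expose the problem a synchronous AI job actually feels:

```python
import random
import statistics

# Synthetic latency samples: a mostly-fast fabric with rare
# microburst-induced spikes. Values and proportions are illustrative.
random.seed(7)
samples_us = [random.gauss(10, 1) for _ in range(985)]       # normal hops
samples_us += [random.uniform(200, 500) for _ in range(15)]  # rare spikes

mean = statistics.mean(samples_us)
p99 = statistics.quantiles(samples_us, n=100)[98]  # 99th percentile

print(f"mean: {mean:.1f} us, p99: {p99:.1f} us")
```

The mean barely moves, while the 99th percentile lands squarely in the spike range — and a synchronization barrier makes every GPU wait for that slowest exchange.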


Cultural Shift: The Network Team Is Back in the Spotlight

VoIP pushed network teams into new roles. Suddenly, they weren’t just keeping packets moving; they were enabling a business-critical service. Collaboration with voice teams, application teams, and vendors became essential.

AI is doing this again.

AI projects often kick off with a focus on GPUs, frameworks, and models. But performance and scalability quickly expose the network as a gating factor. Network architects find themselves pulled into conversations earlier, and under more pressure, than they have been in years.
It’s a reminder that networks don’t just support innovation; they often determine its ceiling.


The Big Lesson: We’ve Been Here Before

The transition to VoIP taught us some painful but valuable lessons:

  • Best effort doesn’t scale forever
  • Latency and loss can be just as damaging as outages
  • End-to-end design matters more than point solutions
  • The network either enables transformation or blocks it

AI is a new workload with new scale, but the underlying truth hasn’t changed.

The organizations that succeed won’t be the ones that simply add faster links or bigger switches. They’ll be the ones that treat the network as a first-class platform, engineered deliberately for the demands of AI, just as voice once forced companies to change their view of the network.

If VoIP taught us how to build converged networks that could handle real-time traffic, AI is teaching us how to build networks that can handle intelligence itself.

And for those of us who remember the VoIP days, the message is clear:

We already know how this story ends. The question is whether we apply the lessons sooner this time.