Embracing The Culture Change of Network Automation
Adopting network automation requires the network design and operations teams to undergo a significant cultural shift. What must change and why?
The networking industry has done a great job of training network engineers to use the command line interface for per-device configuration, and that training has resulted in extensive use of manual processes. Even though we’ve included training on how routing and switching protocols work between multiple devices, the training has focused on building configurations for individual devices.
The manual processes were not too bad when networks were relatively small. But over time, the size and complexity of corporate networks have grown. We’ve adopted mechanisms to minimize human error, such as change control boards. But we still tend to maintain network configuration data in spreadsheets that must be manually copied or cut and pasted into network configuration templates. There are many opportunities for errors of various types.
Automation gives us the tools we need to gain effective control over the scale and complexity of today’s network configuration. The transition to automation isn’t the first time we’ve had a radical technology shift that required a cultural change. Every few years, it seems there’s a shift in what’s popular. Each shift has required a new set of configuration templates, new processes, new troubleshooting techniques, and changes to monitoring systems.
The transition from PBX to voice over Internet protocol (VoIP) is arguably the most recent change that required a similar culture shift. Teams of voice engineers had to transition from copper circuit technology to packet technology. Dedicated circuit-switched technologists claimed that packet-based systems simply wouldn’t work reliably. While they were correct to some extent, packetized voice and video are often good enough. Packet-based voice and video have been a resounding success. The video phone call that was unsuccessfully forecast for many decades is now commonplace.
In the end, some people adapted to VoIP, and others didn’t. Those who adapted acquired new knowledge, terminology, skills, and processes. Much of the older technology knowledge was not required in the new world. The breadth and depth of change are what drove the cultural shift.
The transition to automation is like the change to VoIP. Although the syntax of the configurations remains unchanged, almost everything else in the network configuration process is changing. The new knowledge and skills are a radical departure from the per-device configuration of routing, switching, and firewalls. You may have heard that adopting automation is a journey. That’s an apt description. The culture change and the learning curve for how to apply the new tools take time.
Here are the factors that you'll have to build into the journey:
First, terminology and concepts are very different, and it is those factors that are fundamental to adopting automation. The network engineers must have a good mental model and framework into which the new technologies fit. They need to understand what it means to refer to infrastructure as code (IaC), how source code management (SCM) helps, the function provided by a network source of truth (NSoT), the strengths and weaknesses of different types of automation frameworks (Ansible, Puppet, Chef, Gluware), and more. This understanding allows them to answer the following critical questions:
- How do these new technologies interact with one another?
- What are the correct design and implementation tradeoffs to make when implementing automation?
- How do I learn to effectively use these new technologies?
- What should my role be in an automation-driven network environment?
- When should one tool be favored over another?
How difficult is this transition? In one example, a client has about 400 network engineers, of whom about 40 are using automation, and an additional 10 have the knowledge and skills to create and maintain the automation systems. This ratio is common.
Automation will require new processes that streamline the workflow, use data to drive those processes, and validate the network’s operation after each change. A data-driven methodology will standardize the workflows and rely on changes in the data to drive changes to the network. This is where automation pays off. It is possible to automate the provisioning of equipment racks, groups of racks, or even entire data centers using data-driven automation systems.
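As a minimal sketch of what "changes in the data drive changes to the network" can look like, the Python below renders configuration text from a list of records. The VLAN records, the template text, and the function name `render_vlans` are assumptions made for illustration; the stanza syntax resembles common switch CLIs but is not tied to any vendor or to a particular automation framework.

```python
from string import Template

# Hypothetical intended-state data. In practice this would come from a
# data store rather than being hard-coded in the script.
VLANS = [
    {"id": 10, "name": "users"},
    {"id": 20, "name": "voice"},
]

# Illustrative template only; not a specific vendor's syntax.
VLAN_TEMPLATE = Template("vlan $id\n name $name")


def render_vlans(vlans):
    """Render one configuration stanza per VLAN record."""
    return "\n".join(VLAN_TEMPLATE.substitute(v) for v in vlans)


if __name__ == "__main__":
    print(render_vlans(VLANS))
```

Adding a third VLAN means adding one record to the data; the template and the rendering code stay the same.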
Practice will eventually reduce the amount of time needed to deploy a change. Don’t forget that pre-change and post-change testing will reduce the number of failures, even though it will increase the time needed to build each automated workflow. It shouldn’t take long before the automated workflows begin to cover all the normal operation tasks. You’ll stop making individual manual changes, and fewer outages will occur.
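One simple way to picture pre-change and post-change testing is to take a snapshot of a few health checks before the change, take another afterward, and flag anything that moved unexpectedly. The sketch below assumes the snapshots are plain dictionaries; the check names and the function `compare_snapshots` are hypothetical, and collecting the values from real devices is out of scope here.

```python
def compare_snapshots(before, after, expected_changes):
    """Return check names whose value changed unexpectedly.

    before / after: dicts mapping a check name to its observed value.
    expected_changes: set of check names the change is allowed to alter.
    """
    unexpected = []
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key) and key not in expected_changes:
            unexpected.append(key)
    return unexpected


if __name__ == "__main__":
    # Hypothetical snapshots; real values would be collected from devices.
    pre = {"bgp_neighbors_up": 4, "vlan_count": 12, "default_route": True}
    post = {"bgp_neighbors_up": 3, "vlan_count": 13, "default_route": True}
    # The change was meant to add a VLAN, so only vlan_count may differ.
    print(compare_snapshots(pre, post, {"vlan_count"}))
```

In this example the VLAN count changed as planned, but a BGP neighbor was lost, so `bgp_neighbors_up` is reported as an unexpected difference.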
The next phase is self-service operations and overall IT orchestration, where a complete chain of provisioning occurs in order to deploy an application. This type of operation can only be done with automation. It represents automation at its most complete.
I recommend that the automation journey begin by building read-only automated processes. You can start by learning how to perform network validation: are network functions configured and functioning correctly? These checks can be built incrementally to cover more network subsystems. Then use the same techniques to collect data and analyze it for network troubleshooting. The read-only approach allows you to gain experience without risk to the network.
Then learn how to use a network source-of-truth (NSoT) like NetBox, NSoT, or Nautobot to store the intended network state. This state information can be used to drive the network validation system. Just taking this step will allow you to quickly validate that any manual changes were implemented correctly, reducing the impact of failed manual changes.
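The idea of using intended state to drive validation can be sketched as a read-only comparison. The Python below does not call any NetBox or Nautobot API; two plain dictionaries stand in for the intended state (as it might be exported from a source of truth) and the observed state (as it might be collected from devices). The device names, setting names, and the `validate` function are assumptions for illustration.

```python
def validate(intended, observed):
    """Compare intended state with observed state, device by device.

    Both arguments map a device name to a dict of setting -> value.
    Returns a list of (device, setting, intended_value, observed_value)
    for every setting that does not match; a missing device or setting
    is reported with an observed value of None.
    """
    mismatches = []
    for device, settings in sorted(intended.items()):
        seen = observed.get(device, {})
        for setting, want in sorted(settings.items()):
            have = seen.get(setting)
            if have != want:
                mismatches.append((device, setting, want, have))
    return mismatches


if __name__ == "__main__":
    # Hypothetical data standing in for a source-of-truth export and a
    # read-only collection run.
    intended = {"sw1": {"ntp_server": "10.0.0.1", "mgmt_vlan": 99}}
    observed = {"sw1": {"ntp_server": "10.0.0.1", "mgmt_vlan": 1}}
    for device, setting, want, have in validate(intended, observed):
        print(f"{device}: {setting} should be {want!r}, found {have!r}")
```

Because nothing is written to the devices, a check like this can run after every manual change without adding risk.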
At some point, it will be time to begin making automated changes to the network. The biggest challenge will be to stop making manual changes to the network and to rely on the automation system to make those changes. Automation will initially seem much slower. Changes will take longer than if they were manually applied to the network devices. But once you get serious about transitioning to automation, it is imperative that manual changes be avoided. If manual changes are allowed, those modifications must then be merged back into the automation system, increasing the change deployment workload. The last thing you need is to try to determine whether the network or the automation system contains the desired configuration.
Fortunately, you can use a phased approach here too. Start with password or authentication system configuration changes. Then tackle simple interface or VLAN configurations and move on to more complexity with access control list changes or routing protocol changes. The ultimate goal is the ability to configure a full data center rack or a branch site by specifying the few parameters that are truly unique to that implementation. Then look for additional ways to use automation, perhaps in chat-ops or self-service IT operations.
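To illustrate configuring a site from "the few parameters that are truly unique," the sketch below expands a site number and a site name into a full set of subnets and gateways using Python's standard `ipaddress` module. The addressing plan (one /22 per site from 10.0.0.0/8, one /24 per VLAN role) and the names `SUPERNET`, `VLAN_ROLES`, and `build_site` are assumptions chosen for the example, not a recommended design.

```python
import ipaddress
from itertools import islice

# Hypothetical addressing plan: each site gets one /22 carved from a
# supernet, and each VLAN role gets one /24 inside that /22.
SUPERNET = ipaddress.ip_network("10.0.0.0/8")
VLAN_ROLES = ["users", "voice", "servers", "mgmt"]


def build_site(site_id, site_name):
    """Expand two site-specific inputs into a full parameter set."""
    if not 0 <= site_id < 2 ** 14:  # a /8 holds 16384 /22 blocks
        raise ValueError("site_id out of range for this addressing plan")
    # Pick the site_id-th /22 without building the whole list of blocks.
    site_net = next(islice(SUPERNET.subnets(new_prefix=22), site_id, None))
    vlans = []
    for role, subnet in zip(VLAN_ROLES, site_net.subnets(new_prefix=24)):
        vlans.append({
            "role": role,
            "subnet": str(subnet),
            # Convention assumed here: first usable host is the gateway.
            "gateway": str(next(subnet.hosts())),
        })
    return {"name": site_name, "block": str(site_net), "vlans": vlans}


if __name__ == "__main__":
    site = build_site(1, "branch-01")
    print(site["block"])
    for vlan in site["vlans"]:
        print(vlan["role"], vlan["subnet"], vlan["gateway"])
```

Everything else about the site follows from the plan, so a new branch needs only a new site number and name.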