Automating IT operations is mainstream today, and in fact a strategic priority for many enterprises, but this wasn’t always so. In an earlier era, when automation tools were still embryonic and difficult to use, IT executives had to grapple not only with technical challenges, but also with budget justifications, organizational complexities, and, not least, cultural resistance to radical technological change.
As a senior IT executive with market leaders such as Capgemini, Ellie Mae, and NetApp, Dave Robbins pioneered many bleeding-edge automation initiatives for these organizations. He shares his insights with us on what worked, what didn’t, and why advice from Henry Ford might be your most powerful tool for succeeding with automation in your enterprise.
Guy Nadivi: Welcome everyone, my name is Guy Nadivi and I’m the host of Intelligent Automation Radio. Our guest on today’s episode is Dave Robbins, formerly the CIO of Ellie Mae, a software company that processes 35% of all US mortgage applications. Prior to Ellie Mae, Dave was the CTO of NetApp, a Fortune 500 cloud data services company well known to many people in the IT industry. Dave currently serves as an advisor to a number of companies including Benhamou Global Ventures, better known as BGV, a leading Silicon Valley VC. In these leadership roles, Dave has had an extraordinary amount of first-hand involvement with automation. So we asked him on the show today to share some of his experiences and, more importantly, the insights he’s gained as a result. Dave, welcome to Intelligent Automation Radio.
Dave Robbins: Well Guy, thanks for having me. It’s a pleasure to be here.
Guy Nadivi: Dave, you were involved with IT automation at a fairly early stage during your career at Capgemini. Can you please tell us what it was like automating IT operations back then and what takeaways you learned from that experience?
Dave Robbins: Sure. Back in those days we were in the outsourcing business in North America. Capgemini is a global company, running various types of outsourcing in different places around North America and in Europe, and a lot of it was fledgling. I worked 15 years at EDS, and comparing the two at that point in time, EDS had been in it for much longer and was much more mature, so a lot of this was emerging. The outsourcing deals they were doing were the kind where you take over somebody’s infrastructure, you move it to your data center, you optimize process, and you try to drive costs down and make some money that way.
So while we were attempting to increase the capabilities of that out of our Kansas City office, we worked with IBM as a key partner to implement an SAP On Demand infrastructure. In those days people were talking about being able to do that, but not many had achieved it, if any, and we used Tivoli Provisioning Manager. IBM actually donated a bunch of equipment, you know, servers, storage, software, everything was gratis, and the team there went through a pretty lengthy process of planning. They were a highly committed team in Kansas City that worked on that. And at the time we created, I think, a completely state-of-the-art system: it was automatically provisioned, it had client segregation, and it had dynamic capacity management, so that if thresholds were surpassed in certain areas of the database or in the app server layers, servers were added. Capacity was added, and then it was decommissioned as the demand came down.
So it was really a great project, if you will, and we created a real nice environment. Of course IBM’s objective at the end of this project was to sell us the kit that they had actually worked hard with us to create, so that we could then provide that as a service through our outsourcing organization to clients. So we were very happy with the outcome, but the project in the end didn’t turn out quite the way we wanted it to, because when you automate, or when you create an environment such as that, there are a lot of things that have to be absolutely in lockstep. When you’re going along that process, a lot of things look aligned, but I think you have to check in very deeply to find out whether they are really aligned, and whether you’re really going to be able to use what you create in the end.
In the end, the sales team didn’t know how to sell it. They knew how to sell, “I’ll take your kit and move it into our data center and then we’ll optimize it.” They didn’t know how to sell, “I’ll take your SAP installation and host it for you.”
And so we really got tangled up in the end when it came time to utilize this thing, this great thing that we created, and in the end it kind of fell by the wayside and nobody used it. It was a great activity, great lab experiment if you will, and a great learning experience to really feed into future projects because while we did a lot, it really didn’t bring the value in the end that we wanted it to. But people learned a lot through the process. That was the good outcome from it.
Guy Nadivi: And I’m sure that was applicable down the road to a lot of other projects.
Dave Robbins: Sure. I think it helped Capgemini in that outsourcing area in North America really think things through, because everybody was aligned pretty well, I mean we thought they were. But they didn’t really know what it was. It sounded great to have a dynamic infrastructure, run it at a lower cost, higher leverage of assets. Those are all beautiful words, and everybody’s gonna say, “Yeah, I want that.” But how do you actually deploy it, sell it, run it, and gain revenue from it?
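As an illustrative aside, the dynamic capacity management Dave describes for the SAP On Demand environment (add servers when a threshold is surpassed, decommission them as demand comes down) can be sketched roughly as follows. This is a minimal sketch and not how Tivoli Provisioning Manager worked: the function name `next_server_count`, the 80% and 30% thresholds, and the one-server-at-a-time step are all assumptions made for illustration.

```python
def next_server_count(current, utilization,
                      scale_up_at=0.80, scale_down_at=0.30, minimum=1):
    """Return the server count after one threshold check.

    utilization is the measured load as a fraction of capacity (0.0 to 1.0).
    All default values are hypothetical, chosen only for illustration.
    """
    if utilization > scale_up_at:
        return current + 1   # threshold surpassed: add capacity
    if utilization < scale_down_at and current > minimum:
        return current - 1   # demand came down: decommission one server
    return current           # within the band: leave the pool as is


# Walk a pool of servers through rising and then falling demand.
servers = 2
for load in [0.50, 0.85, 0.90, 0.60, 0.20, 0.10, 0.05]:
    servers = next_server_count(servers, load)
    print(f"load={load:.2f} -> servers={servers}")
```

Keeping a gap between the scale-up and scale-down thresholds is a common way to stop a pool from repeatedly adding and removing the same server when load hovers near a single cutoff.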
Guy Nadivi: When you were CTO of NetApp, a Fortune 500 company with annual revenues at the time of over a billion dollars, your growth rate was close to 30% year-over-year for four consecutive years. And it was during that stretch you grappled with how to scale up server capacity and other infrastructure while maintaining a reasonable headcount. How did you leverage automation to achieve that kind of balanced upscaling?
Dave Robbins: Yeah, that was a really tough time. It was exciting, and a company like NetApp even today is very dynamic and growing, kind of the comeback kids, if you will, of that market. The situation we ran into was that the more capacity they demanded, essentially the only answer the team in place had was to add more people to support the incidents that would come out of that, or the configuration or deployment requirements that would come out of the growth, which was driving more server deployments.
Of course the infrastructure that was there that this company was growing on wasn’t brand new. It was in a state of atrophy really. Some of the leaders there told me when I met with them, we’re sorry we did this to you, because we starved G&A spending for years after the market bottomed out in 2001. The comeback was a long ride and they starved G&A, so a lot of the investment that would’ve normally been done to refresh the capital, refresh infrastructure and put more modern things in place really wasn’t there, so everything kind of had to be done at once.
So we looked at other tools at the time. We looked at Opsware, and we made a pretty big commitment to that product. It covered configuration management, so it had the CMDB component, the network components, and server provisioning automation. It even had some ties into incident management. At that time we were using Remedy, so it had some hooks to be able to create managed tickets automatically through the product. It was an interesting project. Like I said, it spanned the configuration, the network, and the server, and as we deployed it, we found that the most motivated team was the network team. Part of that motivation was that they had evolved, or devolved if you will, to this global network that was essentially a point-to-point network.
It had grown so large, there were “Sev Ones” 10 to 15 times a month, where network engineers were getting paged because a routing table got updated, a router flapped, a network went down, and they had to do something to fix that. So they were very motivated, but they were stalled because they couldn’t figure out how they could even inventory effectively what was there to put the plan together to normalize the IOS levels, patch levels, and then be able to go from there to migrate to newer assets and even a newer network.
They were point-to-point, and it was a time when we were looking at pretty rapidly going to MPLS to support the scale of the company. We were in 100 countries at the time, so it was a pretty big effort to manage all those assets. So they were very motivated, and very quickly they saw things they could never see before with that network tool. It basically gave them the configuration of every asset in the network, so they knew everything right away. Instead of manually finding the asset, logging into it, and determining what was there, they were able to do that inventory automatically. We found out we had over 55 versions of IOS at various patch levels. I don’t know what the actual permutation of variables was there, but it was a lot.
That automation allowed us to very, very quickly get a hold of the configuration and get a hold of change, because when changes were made, the key people in that organization were alerted. It actually surprised one network engineer that his boss called him the next day and said, “Did you have a change control for that change you made last night?” And the fellow was like, “Wait a minute, how did you know I made that change?”
So it really put kind of a very nice magnifying glass on that environment to make sure you knew what was going on, and it also gave us some tools to update the IOS. It got us down to, I think it was, eleven versions of IOS, and the only reason there were that many in the initial pass was that the diversity of the model numbers of the switches and routers that were out there drove a certain IOS configuration. But to take it down by that many that quickly, within about a 3-month time period, was pretty amazing. And 3 months later we were fully upgraded to an MPLS network and there were zero callouts per month to the network engineers. There were no network incidents after that.
They saw it, they believed it would help them, they knew they needed it, they drove the implementation themselves, and they had the business support to get the availability up. So they really achieved their objectives very quickly, and they had a much better quality of life after that, from the standpoint of not getting woken up in the middle of the night. And even change management after that, where we had to grow and add things, was a breeze. They were able to push a button and deploy an asset, basically.
So that piece of it worked fantastically. On the other hand, the server deployments component and the overall company-wide CMDB kind of flailed a bit. If I had to boil it down, I’d say it was the human factors. If you think about the Capgemini case, the human factors there were all positive. The team was committed to create something new. They were motivated by new technology, not being stuck in these old school customer migrations that they were doing, just moving assets and managing old technology. And at the end of the day, this team, the network team, was highly motivated because they didn’t want to get those Sev One calls anymore, and they knew the business impact was critical, and they knew they had to get a hold of this configuration that was such a mess. They had to do something.
The server guys didn’t quite believe they needed it. And they didn’t know exactly what would happen to them if this tool did their job. I think their belief in the benefit was low and their fear of change was high. Belief is a key factor in anything. If you don’t believe that it’s going to work, or if you don’t believe that it’s going to make your life better, then your commitment is going to be pretty low. So in the end we got a lot less value out of an area that probably would have had a lot more potential for value than the network piece, just because the organic growth of the company was driving server demands and storage demands and so on that would’ve been clearly met by that part of the product.
Guy Nadivi: During your tenure at NetApp, you were involved with an initiative called Cloupia, that was eventually acquired by Cisco. What interesting insights can you share with us about that venture?
Dave Robbins: Well again, you get down to motivation and belief and commitment. You know, Cloupia was an interesting product because when they came out, they were selling, order of magnitude, in the $100,000+ kind of category for a license. And they were competing with companies like Opsware that were millions of dollars to procure and deploy. And the feature set that Cloupia had really had a huge overlap between monitoring, automation, and a number of other pieces where you’d otherwise have three or four tools in place to do those things, and they had interfaces to change management systems as well.
So when I became aware of Cloupia, I met the CTO and CEO and we chatted about a number of things. I believe I met them first at a Gartner event perhaps, but anyway, we brought them back into NetApp, and it turned out they had been working with the product side of the organization a little bit as well. At that point I was CTO and I had a lab that was a model for how you manage NetApp assets in a typical kind of IT environment. We needed a way to rapidly provision and deprovision assets in that lab, and I said, “Why don’t we go ahead and look at Cloupia and see how it works in that environment?” The lab team was highly motivated to do this because they had to break down configurations and redo them all the time.
Cloupia, we put them in, and I’d say by the end of day two we had a product that was valuable. So the time to value with them was enormous, not only was their feature set great, but their time to value was really great. And so, at that point I was doing a lot of customer interaction and demos and showing how things work, and I actually demoed some provisioning, at one of my last runs at customer events over in Germany I think it was…anyway, I demonstrated the ability to do that type of provisioning and I had people rush the stage at the end, you know “I gotta have that.”
It was really kind of interesting. Again, the dynamics of the project were very different. You had a small team that was in high demand, that was very resource constrained, that believed they needed this kind of a tool with automation to really give them the ability to respond to that demand without killing themselves to get it done. It really turned out quite well, and we did a lot of joint presentations actually with Cisco, who ended up buying Cloupia, and called it UCS [Director] at the end of the day. That was their tag for it.
It ultimately became one of the only tools that was certified in the NetApp labs as full end-to-end management and automation for the whole FlexPod environment that NetApp and Cisco came up with. That was a great experience, and from a lab perspective it worked out perfect and gave us a lot of value very quickly. It ended up actually getting presented at a quarterly all hands meeting, and it’s very rare to see an IT project that’s kind of infrastructural-based get exposure at that level, but even the business side was very proud of what we had done.
Guy Nadivi: Back in that earlier era of IT automation, ten plus years ago, I think a lot of our listeners would be interested in knowing what kind of key metrics were used to persuade management that an automation project was justified?
Dave Robbins: Well, the classic metrics that were used then were, I’m not saying they were right, but the classic metrics were avoidance of staffing or reduction in effort to allow for growth. So if I had ten people and I could put some automation in, those ten people could configure ten servers a day today, with automation they could do a hundred. So that would allow me to scale my capabilities in a way that wasn’t linear to the business growth.
Those were the typical arguments that were made, and it was really the cost offset of those things. I think the failure in those days, and maybe today as well, is in asking what metrics make sense. Not only do those metrics make sense, but I think it also makes sense to ask, “How many servers am I provisioning per day? How many incidents did I avoid? Could I do automatic resolutions on certain things? When I saw a capacity issue, could I automatically raise a ticket, could I automatically do the provisioning, and could I automatically close the ticket?”
So those sorts of metrics, I don’t think at the time we saw a way to easily instrument that kind of data, so we focused more on the avoidance of staffing as the metric to get the project approved. At the end of the day, once you get the project approved, nobody goes back and looks at those numbers. So I think you’ve really hit on kind of a failure of our ability to articulate the value of automation and how it actually meets that value over time. And it’s not about staff avoidance, it’s more about what the automation actually achieves for you.
Guy Nadivi: So for an IT executive listening to this episode of our podcast, are there specific key metrics that you would suggest they use to justify an automation project today?
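As an illustrative aside, the outcome metrics Dave lists (servers provisioned per day, incidents resolved automatically) are simple to compute once the counts are recorded. The sketch below is only an assumption about how one might tabulate them: the `PeriodStats` record and both function names are hypothetical, the server figures reuse Dave’s example of ten people going from ten servers a day to a hundred, and the incident counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Counts for one day of operations (hypothetical record layout)."""
    servers_provisioned: int
    staff: int
    incidents_total: int
    incidents_auto_resolved: int


def servers_per_person(stats: PeriodStats) -> float:
    """Servers provisioned per staff member for the period."""
    return stats.servers_provisioned / stats.staff


def auto_resolution_rate(stats: PeriodStats) -> float:
    """Share of incidents closed without a person stepping in."""
    if stats.incidents_total == 0:
        return 0.0
    return stats.incidents_auto_resolved / stats.incidents_total


# Server figures follow the example in the conversation; incident
# counts are made up purely to exercise the second metric.
before = PeriodStats(servers_provisioned=10, staff=10,
                     incidents_total=40, incidents_auto_resolved=0)
after = PeriodStats(servers_provisioned=100, staff=10,
                    incidents_total=40, incidents_auto_resolved=25)

for label, stats in (("before", before), ("after", after)):
    print(label,
          f"servers/person/day={servers_per_person(stats):.1f}",
          f"auto-resolved={auto_resolution_rate(stats):.0%}")
```

Tracking numbers like these after approval, rather than only a staffing-avoidance estimate beforehand, is one way to show what the automation actually achieves over time.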
Dave Robbins: I think that what you have to do with an automation project, is you have to look at the full business stack and you have to think about what are the current metrics; how long does it take to do certain things? How long does it take to implement a change? What’s your QA process like? When you provision and deprovision servers or when you manage network assets, what are the things that relate to that whole business stack that you’re working on and then what’s the constraint that’s there today?
I think what you could do is analyze the full stack, ask what your constraints are today, make those your automation targets, and then make sure those are your key metrics. Say your constraint is that it takes 6 weeks to deploy a certain size of change in a business app, because of all the coordination required across the different organizations and the time to get your test kits redeployed if there are errors during testing, those sorts of things.
So think about the full stack and think about what does it take you today and make sure…and you may not have those metrics, you may have it just takes me that long. So you gotta get into, how’s automation going to impact that, or how’s some intelligence going to help me deal with that better, some insight that automation or AI might be able to bring. And then make sure you’re targeting the metrics that mean something to the business.
Guy Nadivi: Sometimes announcements of automation projects can trigger various forms of resistance among staff: psychological, emotional, even on an organizational level. Dave, what’s your prescription for IT executives to get buy-in from staff on automation initiatives?
Dave Robbins: That’s a great question. I think a lot of it depends on the culture of your organization and you have to be in touch with that. So what motivates them? What creates the belief that you want to create? Because I think “belief” is the key word in this thing from my perspective.
Dan Warmenhoven, who was the CEO at NetApp for many years, used to quote Henry Ford, and he’d say, “Believe you can or believe you can’t. Either way you’re probably right.”
I think that’s a great quote to remember, because of the human factor and whether they believe, and you gotta think about what they believe. Do they believe that this particular automation is the right one? Do they believe that it will help them? Do they believe it will fix the problem we’re after? Do they believe they have the right resources to get it done? Do they believe they’re being asked to do something reasonable given their current workload?
It’s kind of like, you’re usually asking somebody to do this that has a day job, and that has demands that are overwhelming for them in some cases, so how do they get through not doing the same thing they’re doing now over and over, and changing that cycle, and believe that this is the right thing to do.
And I think a lot of that depends on culture. In an organization that’s clearly aligned with the business, where people are motivated by business outcomes and can see how the business outcomes actually make their lives better, they might be more inclined to do that. Where you’ve got a larger organization, and the people are coming in, doing their jobs, not necessarily in touch with the details of the outcomes and their particular impact on the outcomes, they may think more about what it means to them. So I think you need to figure out where your culture is and then figure out how to market and sell this to that group of people, which is key to getting your deployment done and achieving the objectives you’re after, and they need to be engaged and involved in it too. But again, you’ve got this kind of cycle that happens in most organizations that says these resources are too busy to change, and so to get them to buy in to something takes a break from what they’re doing and some clean whiteboard thinking. And that’s hard for people.
And change is threatening to people too, and change equals problems they have to solve, because most things you change aren’t perfect the first time. So there are a lot of problems in the mind that go against the belief, and I think it takes understanding your environment, your culture, and where this person is in the organization. Are you a small organization, tightly aligned with the business? Or are you a big organization where these guys are in the bowels of the organization? How do you deal with those different environments? It’s very situational. But I think it comes down to this: the human buy-in can be either a great thing, like in the Capgemini example, or it can be a challenge, like it was on the server side of the NetApp example.
Guy Nadivi: Last question for you, what advice do you have for IT executives who want to dive into, or scale up, their automation initiatives?
Dave Robbins: I think I would start with a clear objective. That could be a particular project. I like to think green field, you know, if you can create a scope that is not encumbered by the thought that we’ve always done it this way, or I’m not sure how that’s going to change my life. Then you say, wait a minute, this is a new thing, it’s a new box, there’s nothing in it and we’re going to create this project. And if you can, isolate the thought process within: okay, we’re not thinking about what happened yesterday, we’re thinking about what’s going to happen tomorrow.
Get your best team on it, put clear objectives, and get back to those business metrics, whatever they are to your organization, and I mean business metrics not server metrics. How many servers you provisioned? How many incidents you might have avoided? What the avoided downtime was? How much more quickly did you get a release deployed and with how many fewer errors? Whatever the metrics are for your business but really not infrastructure and not business app, end-to-end. If you can create that team, kind of like an office pilot or POC, whatever you might want to call it, and focus on those metrics…
I think what’s an interesting sideline of human behavior is people will watch that. So if you take people out of one team and you put them on this dedicated team to achieve this result, they still go to the same lunch room together, they still talk about things. I think that the drag you get from a success, you know what the outcomes are, where somebody’s like, “well I don’t have to be up at night anymore doing that stuff” or “I don’t do that work anymore, I just push a button.”
And you see that that person is still gainfully employed, they’re still having fun, they’re still challenged. And the project met its objectives and you create a drag that’s like, well I want some of that too, hopefully. And then you create another project. But I don’t think big bangs work. I think you have to carve out, what is it that you can isolate to verify, validate, and prove to the whole organization that this isn’t just the next thing out of a CIO magazine, or whatever, it’s not just a fancy new product.
It actually, these things together, with this business app, create a different dynamic for managing your application.
Guy Nadivi: Sounds like sage advice. Alright, looks like that’s all the time we have for this episode of Intelligent Automation Radio. Dave, thank you so much for joining us today and providing some great first-hand accounts about the role of automation in IT operations. It’s been good having you as our guest.
Dave Robbins: Well thanks for having me, I’ve enjoyed it.
Guy Nadivi: Dave Robbins, former CIO of Ellie Mae, former CTO of NetApp, and currently an advisor to a number of companies including BGV. Thank you for listening everyone, and remember, don’t hesitate, automate.
Former CIO of Ellie Mae, former CTO of NetApp, and currently an advisor to a number of companies including BGV
Dave Robbins has over 30 years of management and directorship experience, including the executive leadership roles of CTO of NetApp and CIO/Sr. VP of Ellie Mae. During his four years with leading mortgage industry technology provider Ellie Mae, Mr. Robbins played a direct role in helping the company transition from a 70% self-hosted business model to more than 80% SaaS or Hosted Services. As CTO of IT at NetApp, Mr. Robbins spearheaded the development, implementation and enforcement of a three-year technology roadmap and adoption strategy for IT systems, networks and storage. Prior to this role, Mr. Robbins served as the company’s Global Infrastructure VP, during which time he was responsible for leading a global team of more than 170 professionals, managing annual OpEx spend of more than $80M and a capital spend of $20–40M. He is highly respected in the industry and has published several articles in both national and global publications, including CIO Magazine and Network Solutions.