EPISODE #68: Why Taming Unstructured Content Might Be AI’s Greatest Enterprise Accomplishment

Guy Nadivi
29 min read · Jul 1, 2021

( Click here to listen to this podcast episode )

People of a certain age will fondly remember an American TV game show called “Name That Tune”, where contestants won prizes for correctly identifying a song from as few notes as possible, played by an orchestra. Name That Tune offers a fitting analogy to explain Indico’s value proposition — automating complex processes involving unstructured content, using as few as 200 sample documents for its AI model to learn from. With just a fraction of what their competitors require, Indico can name that tune (automate unstructured content workflows) in very few notes (documents), using a proprietary “transfer learning” algorithm. Since Forrester Research estimates that up to 85% of all enterprise content is unstructured, this represents a huge untapped opportunity.

Founder/CTO Slater Victoroff is the driving force behind Indico’s ambitious goal of helping organizations turn process into profit. He joins us to discuss the path that led him to embrace transfer learning and what makes this technique ideal for automating workflows involving unstructured content of intertwined images and text. Along the way we learn why “time on task” is a much more intuitive metric than ROI, why machine teaching is better than machine learning, and why companies are lying about how much data they actually have.

Guy Nadivi: Welcome everyone. My name is Guy Nadivi, and I’m the host of Intelligent Automation Radio. Our guest on today’s episode is Slater Victoroff, Founder and CTO of Indico, pioneers of an AI solution that focuses on intake and understanding of structured and unstructured documents. Much of the automation and AI we’ve covered on the podcast is for predictable processes with structured content. However, we haven’t really touched upon how automation and AI can be applied to content that is unstructured, which according to Forrester Research accounts for up to 85% of all enterprise content. Slater’s company focuses on that opportunity, intelligently automating processes with a unique framework that intriguingly requires a thousand times less data than traditional machine learning algorithms. It sounded like something we needed to do a deeper dive on. So Slater joins us today to share his insights. Slater, welcome to Intelligent Automation Radio.

Slater Victoroff: Thank you so much for having me Guy, and thanks for the great introduction. You did a better job of hitting the nail on the head from a technical perspective than most do, so really well done on that.

Guy Nadivi: Slater, please tell us a bit about how you got into the field of AI and what led you to found Indico.

Slater Victoroff: I think, like most people Guy, I started because it was an incredibly fascinating place, right? I think we all grow up and we hear about AI and we hear about machine learning, and these are these very far off and seemingly almost intangible technologies. And I was very, very lucky early in my tech career to get firsthand experience with ML, and by the time I got to my sophomore year in undergraduate, I had actually had some success in the field. Now, of course, as any sophomore undergraduate with some success in the field, I believed that I knew everything. And so I said to one of my professors in 2012, “The war is over, deep learning lost.” And I’m going to remember those words forever, because it’s hard for me to remember a time when I have been more wrong about something than that specific line. Not that it was a particularly strange point of view to have in 2012, but clearly I’ve built a career around deep learning, so I’m eating crow in a pretty serious way now. But the way that I really learned how wrong I was, that’s the story of Indico. My co-founder-to-be, Alec Radford, reached out to me (again, we were sophomores in undergrad together), and we started competing in Kaggle competitions together. For folks that aren’t aware, Kaggle was kind of this crowdsourced data science competition platform, where companies would come, they’d upload their data, and whoever got the best accuracy on a holdout set would win some cash prize. It turns out, for various reasons, it didn’t make a particularly good business model for them, but I actually credit Kaggle with triggering a huge portion of the data science renaissance that we’re still seeing today. I think they did a really, really good job of that. And so what happened was that for the first six months that we were competing together on these Kaggle competitions, I was feeling very, very good about my traditional techniques, right, about my ridge regressions, about my feature engineering approaches. But after those first six months, I actually never made another winning submission for our Kaggle team again. What had happened was that deep learning really jumped to the fore, and a lot of these techniques that maybe Alex Krizhevsky had managed to get working once on ImageNet started to become more accessible, right? GPUs started to become easier to use, and there was this incredible moment when we realized that we could buy what would have been the most expensive supercomputer in the world in 2000, right? We’re talking a hundred-million-dollar Cray-type supercomputer. We would effectively get the same amount of compute from that in a $500 GPU card that we could buy on Amazon. And that was just an incredible, incredible thing. And so after the first three months, I would say, of just losing to these deep learning techniques left and right, trying every possible thing I could to get around them, I decided it was time to switch teams. No one likes losing, I’m a pretty competitive guy, so after three months I was pretty sick of it, frankly. And that’s where things got really interesting, because we tried to bring this deep learning technique out into industry. And what we immediately found is that while these techniques work really, really well in an academic scenario, they were almost completely impractical in the enterprise. To give an example, say you wanted to process invoices.
What you’d see on the academic side is that, look, with a beautiful data set of a million invoices, deep learning is going to be near human performance and it’s going to be head and shoulders above everything else. The problem is: what enterprise has a million meticulously labeled invoices? We didn’t think that much about the problem when we first started, right? We assumed everyone would have this kind of stuff. But very quickly we learned that we were wrong. And that question of, “Okay, how do we get past this block, right? How do we take this technology that we’ve seen such incredible promise from on the academic side, how do we actually make this accessible? How do we put this in the hands of industry professionals, and how do we do it in a way that is transparent and responsible, right?” That really was the founding motivation of Indico all those years ago.

Guy Nadivi: I understand that Indico’s intelligent process automation platform uses a technique called “transfer learning” for document intake and understanding. Slater, can you please tell us a bit about what transfer learning is and what differentiates it from other deep learning models?

Slater Victoroff: Absolutely. So this was really the way that we cracked through that problem. And at a 10,000-foot view, the way I talk about transfer learning is it’s the art and science of reusing portions of old models to solve new problems more quickly. And I think, as a lot of readers can appreciate, it’s not such a trivial thing as I’m going to… Well, I guess nowadays you can grab BERT, right? Like, those things didn’t exist when we were getting started. But it’s still not such a trivial thing as grab BERT, throw some data at it, and you’re good to go. And that’s a lot of why I say it’s a bit of an art and a science: understanding what data you really want to transfer across domains, right? What problem framings are effective, and really this question of how do you let a human interface with this technology in a straightforward way? So what transfer learning allows us to do, in effect, is show up for a particular problem with a huge amount of knowledge already contained in our models. The analogy that we like to use with our customers is: this model, it’s not a doctor, it’s not a lawyer, it just speaks English. We don’t think about it, but a traditional ML model doesn’t speak English, right? And so it’s much, much, much harder for it to learn these patterns. And so what we find is if you start your large deep learning models off with this base understanding, it basically means that every piece of data you then feed through it is going to be that much more effective in teaching the machine what you actually care about and what you don’t, because it doesn’t have to learn as much and it’s really focused on your specific problem, as opposed to trying to learn English at the same time. So what that allows us to do is it means that we can deliver to our customers a platform where they can pick whatever intelligent process automation problem they’ve got, whether that’s contract analysis or invoice processing. And the way I talk about documents is: documents are images plus text. So obviously we support them combined and separately, right? So it can also be pure text, like sentiment analysis, or pure image, like object detection. We think of all of that as the unstructured problem. And we can allow people to make production-quality, deep learning-based models with a couple hundred samples, right? And what that allows you to do is actually redefine the way that you do business in this space: no longer do you have to go and try to hack around Amazon’s Textract, which gets you 80% of what you want, because getting that last 20% is really, really hard. The idea is to really make it easier for you to build your own equivalent of Textract, make it easier for our customers to build something that is really custom developed. So we have targeted that 200 samples, right? That minimum volume to allow people to make those custom models really, really effectively, and transfer learning is really the engine that makes the whole thing go. And for our customers, it means that they get all of the accuracy and the profound efficacy of deep learning and the other benefits that it brings, without having to cross that massive data hurdle before they can start seeing the benefits.
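For readers who want to see what the general pattern looks like in practice, here is a minimal sketch of transfer learning for document classification: start from a pretrained language model (the part that already “speaks English”) and fine-tune it on a few hundred labeled examples rather than training from scratch. This is an illustration using the open-source Hugging Face transformers and datasets libraries, not Indico’s platform or pipeline; the model name, sample texts, labels, and hyperparameters below are hypothetical.

```python
# Illustrative transfer-learning sketch (not Indico's implementation):
# reuse a pretrained encoder and fine-tune it on a small labeled set.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Hypothetical tiny labeled set; in practice this would be ~200 documents.
docs = ["Invoice #1234, net 30 days ...", "Master services agreement ..."]
labels = [0, 1]  # e.g. 0 = invoice, 1 = contract

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # pretrained weights carry the "speaks English" knowledge

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

train = Dataset.from_dict({"text": docs, "labels": labels}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train,
)
trainer.train()  # only the task head and a light adaptation of the encoder are learned
```

With a couple hundred real samples you would also hold out a validation split, but the key point stands: the model starts with broad language knowledge, so each labeled document goes toward the specific task rather than toward learning English.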

Guy Nadivi: So can you speak about some of the more interesting use cases Indico’s transfer learning platform has been applied to and the results you achieved?

Slater Victoroff: Absolutely. There are a couple of public case studies we’ve got that I really urge people to look at. One that I’ll use, because they’ve been an incredible customer of ours and they’re really great partners, and I think their entire approach to intelligent automation is actually very progressive, is MetLife. I think people are generally familiar with MetLife as a very, very large insurer, but a lot of people are not necessarily familiar with what goes on behind the scenes at an insurance company. Just to give an example of one particular use case we had: there were a huge number of structured settlement agreements that they had to pay out, thousands and thousands and thousands of them. And they actually had an internal policy that, depending on whether or not these structured settlements were workers’ comp cases, they had to handle them differently. And they actually needed to get the workers’ comp cases over into one bucket. Now, it turns out that if you’re looking through a bundle of information on a structured settlement, which is going to be like 10 documents, thousands of pages of stuff, figuring out whether or not it’s a workers’ comp case is actually a pretty complex process. So this was a case of very, very acute need, right? They already had Indico deployed, so they had folks that were experts in the platform, and in about three weeks end to end, they were able to process this massive backlog of documents they had, which they estimated would have taken them 11,000 hours otherwise, right? It was such a great example to see, because the whole pitch of our platform is that you pick it up and you can become proficient in it. And maybe we helped a little bit just because there was a really, really tight timescale, but generally they were able to run that whole thing end-to-end. They saved over half a million dollars by using Indico to do that review, as opposed to building a whole review function. And again, they were able to do that in a couple of weeks. That’s now one of, I’ll say, six use cases at MetLife, but it’s an account that’s rapidly growing for us.

Guy Nadivi: If I’m a decision maker at an enterprise looking to deploy a solution for document intake and understanding that handles structured, semi-structured, and unstructured documents, like Indico can do, what kind of ROI can I expect to achieve?

Slater Victoroff: You know, it’s a really, really great question. And I think that the way our space talks about ROI and efficacy, or rather the way the intelligent document processing space tends to talk about how well our models work, is usually divorced from ROI. I think that you have a lot of people that are really, really focused on what’s the accuracy of the model, right? Oh, what’s going straight through or what’s not? But the problem is, when you focus on those metrics, you actually lose the forest for the trees, right? Because when you look at intelligent process automation in production, you’re going to have a mix of documents: you’re going to have exceptions that can’t be handled by the traditional flow, you’re going to have things that are processed 80% straight through, you’re going to have things that are 20% straight through, and then you’re going to have things where maybe a human has to tweak one field because the OCR can’t read it, right? So you actually have this really, really broad spectrum of different pieces. And whether you’re looking at STP or accuracy (and honestly, even defining accuracy or STP is a whole other conversation), whichever one you take, when you ask where the ROI is, you’re sort of still left holding the check at the end. So what we do is, certainly you can get all of those metrics in our platform, right? We report them very, very transparently; you don’t have to ask us about them. You can see them in the platform. But what we really orient our customers to think about is time on task, right? It’s a much, much more intuitive metric for people to think about, and what we like to target is an 80% reduction in time on task. So that means whatever mix we’re using to accomplish it, how much is augmentation versus automation, right? Our point of view is that as a customer, you don’t necessarily care so much about that at the end of the day. What you care about is freeing your people up for higher-value tasks, right? You care about getting an organizational understanding of this process and really making sure that different people aren’t doing it differently. And you care about delivering scalable growth, right? If you want to double the size of your business, you don’t want to have to double the head count in every single logistics department as you do it. And I would say that where our customers really, really understand those key trade-offs, and they understand that unstructured is actually a focus area in its own right, that’s really where we see customers having the most success.

Guy Nadivi: Slater, your company Indico experienced 300% growth in 2020. How did the pandemic affect your business?

Slater Victoroff: The pandemic, I think, in the early days was kind of scary for everyone, and we were certainly included in that. I would say that around April, May, I think everyone was wondering what was going to happen. And for a while I think everyone was in this boat of, “Oh, it’ll be a couple of weeks and everything’s going to be back to normal.” But what happened is that when we got towards the end of May, the magnitude really started to set in for people. They started to realize this is not a momentary disruption; this is a fundamentally different way that we actually have to do business now. And it ended up shining a really crisp light on this problem, because a lot of processes were very manual, highly bespoke, probably not very transparent, but in a physical office you lose track of that. It started to become really, really clear, once everything moved into this digitized remote landscape, that there was not the kind of transparency on these processes that organizations needed, that there were bottlenecks in the organization that they might not have even realized they had. So what we saw over the quarter after that was this massive, massive influx of people to the space, right? People started to realize, and I shouldn’t say started, people have always realized, but there was a dramatically renewed focus on this problem of unstructured data. And I think what’s different this time around… The unstructured data problem has been around for not as many decades as deep learning, but decades still. The difference is that people actually started to realize that these problems could be solved today. You saw all these massive, massive fundraises, and I think there have definitely been wins and losses in our space. AntWorks obviously had that embarrassing fallout with some of their founders, but people do realize that there is a pot of gold at the end of the rainbow now. And that’s really what the pandemic changed for us. Practically, it probably accelerated our industry forward 12 to 18 months, but it also meant that we had people coming to us, not with, “Hey, we want to play around with this one little use case. Maybe we want to do invoices, right?” That’s what things were like ahead of the pandemic, because people thought this was experimental: we’re going to play around with this a little bit, we’re going to dip our toes. After the pandemic, it was really clear that people were playing to win here, right? They said, “No, look, we’re investigating 20 different vendors. We’re picking one to be the solution for our bank.” And that’s actually where we’ve seen customers getting really, really high ROI: they’re executing really, really well. And we’ve seen a lot of other organizations that tried to do it in this very distributed fashion, where they’re relying on individual business units to find bespoke solutions to bespoke problems, and it has not really worked out well for them. They lack this layer of governance, and I think a lot of organizations that started down that path before the pandemic, when the pandemic hit, started to realize the massive governance challenge they had on their hands, where maybe they’ve got 20 different very, very piecemeal unstructured solutions throughout the enterprise. And these are high-risk applications, right? I mean, we’re talking mortgage applications, we’re talking payment systems. They’re very, very high risk. They’re extremely sensitive.
And so for a lot of our customers, again, with the pandemic, there’s been that renewed focus on the governance of these models, because they’re no longer seen as side projects. Organizations are realizing that if they don’t start to handle their document-based processes this way, they’re going to be left behind.

Guy Nadivi: What is your philosophy about artificial intelligence and how is that driving Indico’s roadmap?

Slater Victoroff: I think that there is huge potential in artificial intelligence technology. At the same time, I think that our space has been beset by snake oil sellers and hype that ultimately has obscured some of the biggest wins in the space. And I think a lot of that is because, and maybe I’m throwing stones in a glass house here, right? We did start because we love the technology too. I think the difference, though, between Indico and where a lot of other folks went is that we got beyond that, right? We recognize that the technology is really, really powerful, it’s very important, but it’s only important insofar as it enables the customer to do something different, right? That’s why you see us devoting such… And at the time, everyone thought we were completely nuts; people barely understood deep learning at the time. So the idea that we were, back in 2013, saying to everyone, “Okay, deep learning is cool, but deep learning is actually not going to work for folks, and we need to be focusing on transfer learning right now.” In 2013, people did not get that. And I think now people get that, right? Now people understand all of the gaps and issues with deep learning. And similarly, we always try to stay one step ahead. So I think the next question is, now that people understand that there are issues with deep learning, they’re sort of like, “Okay, what do we do next?” We’ve already started saying, “Look, transfer learning is the solution to that.” But actually, more than that, one term that we’ve been using more and more frequently is “machine teaching”, as opposed to machine learning. And I really, really like it. I think it’s a good summary of Indico’s AI philosophy in so many ways, right? The idea is that the technology is important because it allows humans to create these intelligent automation processes more effectively. So rather than focusing on a bunch of data scientists in the back room, where someone’s going to tell them what the thing should do, and then they’re going to scramble around and they’re going to tweak something, and then if you want some change to the output of your model, you have to wait two months or whatever it is, right? We kind of said, “Look, we’re not going to do it that way. We think that’s just fundamentally not a scalable approach. It’s not what customers want.” And so instead, our entire product is really oriented around this idea of: we’re going to meet the customer where they are, right? Figure out what their process looks like today. They can presumably just pick the product up and be very, very close to that, right? They’re going to have to change not very much about what they do. And then what we’ve done is we’ve back-filled the AI from that, and it’s forced us to deal with some very, very difficult problems. I won’t go too deep into this because, I mean, you probably know this is the kind of thing I could talk about for hours and hours and hours. But I will just talk very briefly about documents. What you see across the industry is that people will tell you either that documents are visual data or that documents are text data. And they solve it in one of those two ways, right? And you have people that are good at text data. They’re like, “Oh yeah, you just OCR it and then you treat it as text.” I think all of those ways of looking at the problem are wrong. You must, must, must, must have a single model that can rationalize, that can reason, between text and image data, right? You can’t have the two be separate, right?
You can’t talk about whether this is just a computer vision problem or just a natural language problem, because it is fundamentally both, and it’s fundamentally both intertwined in a way that you can’t solve with people, right? And the one saying that we have at Indico is: tables don’t exist. And I absolutely love it, right? Because you say tables don’t exist to someone and they look at you like you’ve got two heads. I say tables don’t exist, and they say, what on earth are you talking about? I’ve seen a million tables in my life. I’ve probably got a document with tables on it right on my desk. So I say, “Okay, tell me what you think a table is.” And they give a very, very normal answer: I’ve got columns, I’ve got rows. I’ve got a label for the columns and a label for the rows. And there’s one value at each of those columns and rows. I say, “Great, perfect. When is the last time you saw a table that looked like that?” And that’s when the gears start turning. That’s when they realize that we have this archetypical view of a table that never shows up anywhere. I promise you this: look through all the documents you can find, every single financial filing you ever find, you will not find that anywhere. You’ve got nested headers, you’ve got things with columns and no rows, or rows and no columns. You’ve got nested structures within tables. And what becomes really, really clear is that the document is not as structured as we like to think it is. Forms, maybe, yes, those are structured documents, so forms sit over on the left-hand side. But even when we’re talking about something like invoices, they are visual ways of presenting textual information to humans, right? And that’s what a document is at the end of the day. So without technology that can really reason between the two, I think it’s impossible to really solve the problem. And I think when you look at how humans think about it, they don’t want to have to negotiate with a data science team back and forth to figure out, “Oh, what features should I use here? Oh, you messed this up” kind of thing. So that’s where we start. We’ve done a lot of research on novel forms of supervision. We released a paper yesterday, actually, Doc2Dict, which, for folks on the more technical side, is some pretty far-future stuff, but it’s interesting.
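As a rough illustration of that single-model idea, the toy PyTorch sketch below shows a network that takes both the OCR’d tokens and the page image and makes one joint prediction, so the text and visual signals are reasoned over together rather than handled by two separate systems. This is not Indico’s architecture or the Doc2Dict model; the class name, encoders, and dimensions are invented purely for illustration.

```python
# Toy "documents are images plus text" model: one network, two inputs, one decision.
import torch
import torch.nn as nn

class TextImageDocumentModel(nn.Module):
    def __init__(self, vocab_size=30000, text_dim=128, img_dim=128, num_labels=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, text_dim)        # OCR'd word tokens
        self.text_encoder = nn.GRU(text_dim, text_dim, batch_first=True)
        self.image_encoder = nn.Sequential(                        # raw page image
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, img_dim))
        self.classifier = nn.Linear(text_dim + img_dim, num_labels)

    def forward(self, token_ids, page_image):
        _, text_state = self.text_encoder(self.token_emb(token_ids))
        fused = torch.cat([text_state[-1], self.image_encoder(page_image)], dim=-1)
        return self.classifier(fused)   # the decision sees both modalities jointly

model = TextImageDocumentModel()
logits = model(torch.randint(0, 30000, (1, 64)),   # 64 OCR tokens from one page
               torch.randn(1, 3, 224, 224))        # the page image itself
```

Production document-understanding models are far more sophisticated than this, but the design choice it demonstrates is the one Slater is arguing for: neither the text stream nor the image is discarded before the model reasons about the document.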

Guy Nadivi: Back in 2015, TechCrunch published an article you wrote entitled “Big Data Doesn’t Exist.” In it, you provocatively stated that, “My customers always lie to me. They don’t lie about what they can afford. They don’t lie about how much or how little customer service they’ll need. They don’t lie about how quickly they can pay us. They lie about how much data they have”. You went on to say, “It’s a good rule of thumb to assume a company has 1/1000th of the data they say they do, and that the reason they lie about the size of their data is they want to feel like one of the big dogs like Amazon, Facebook and Google.” Slater, six years later, according to Statista, the world’s data volume has gone from 15.5 zettabytes to an estimated 74 zettabytes this year. There’s more data than ever, so I’m curious, are companies still lying to you about how much data they have?

Slater Victoroff: I will start by saying yes, absolutely. And in fact, if anything, I think that looking back on that article, I was too optimistic. But you raise a very, very good point, right? Because it is sort of this paradox of big data, right? Where you do see, in terms of raw volume, in terms of what we have sitting on disks, those numbers are crazy, crazy huge, right? But when you look at the amount of data that people actually have organized in a useful way that they could use for things, it’s drastically, drastically smaller. And I think the best example I can give of this is that I mentioned Indico requires 200 examples. Now, from the machine learning space, if someone really knows their ML stuff and they hear you can train a model with 200 examples, we always say, “If you believe us, you don’t understand what we’re saying,” because it’s such an insanely, stupidly low number that, again, the right reaction to that is disbelief. However, I could not tell you how difficult it is for our customers to get us 200 samples, right? And this is the thing that’s really, really fascinating: customers have a huge amount of data just sitting in piles, sitting in repositories, sitting on file systems. And they’ve been told this magical fairytale lie that if you just keep piling it up, someday it’s going to be valuable, right? Unfortunately, it is a lie. It is not true; that’s not how data works. If you don’t have a really, really diligent strategy from day one, it doesn’t matter how much data you pile up, because you haven’t done it in a useful way. So I would say actually, in most cases, if our customers had appropriately handled 1/1000th of their data, it would be very easy. Indico would be ten, a hundred times bigger than we are now, right? Because everyone would just be able to adopt us with a snap of their fingers. I would actually say most of our customers, most industry folks, have zero data, right? I would say the most common situation for us is that people have not been storing their data correctly, they don’t have the kind of transparency on their processes that they think they do, and we have to start from scratch with them, right? And it can be very painful for folks to realize that the way they have been capturing their data, for decades in some cases, is just fundamentally not useful, fundamentally not transparent. That’s why we have oriented Indico in that way. That’s why it’s so important to be able to train Indico with such small amounts of data: because we’ve realized that unless someone can build this data set from scratch in our tool, they’re probably just not going to get there. It’s going to be too big a lift. And part of that is just because there’s so much noise out there in the space, right? If Indico is coming in behind 20 other vendors that you’re looking at, which happens, probably each one of those 20 vendors has failed. So if we were to show up and say, “Hey, you’ve got to label 20,000 examples, or even a thousand examples,” it’d be like, “Absolutely not. Go pound sand. Not another one of you guys.” Right? Because that’s what other folks do. So that’s why we’ve really, really focused on what someone can do in a couple of hours to really see that this works and that this is valuable, to get them over the hump.

Guy Nadivi: There are many flavors of automation out there, such as RPA of course, DevOps, IT Process automation, Runbook automation, Workload automation, and many more. Some people believe the industry is heading towards a convergence point. Gartner calls it “hyper automation”, and Phil Fersht from Horses For Sources refers to it as “integrated automation”. A recent guest on this podcast said that ultimately integrations combined with AI will mitigate the need for automation altogether. Slater, where do you think things are heading for the automation industry?

Slater Victoroff: You know, it’s a really great question. And I will say that I think the hyperautomation pitch has some strengths. Gartner does call it that, but they did get the term from UiPath. So I’ve got to credit UiPath on that, both for coming up with a great term and for getting Gartner to use it. But even Gartner has started to realize that, actually, if you think about all the things that are lumped under those umbrellas today, to your point, automation has gotten so broad that the question of what is and isn’t automation, what does UiPath handle, what do any of their competitors handle, is getting very, very fuzzy for folks. So I’m not going to talk about the entire space, right? Because look, the simple fact that DevOps automation companies exist, and Automation Anywhere exists, shows you that there are different slices of automation. So I’m actually just going to talk about the AI side. I think that the trap people get locked into is that RPA has what seems like a very, very clean interface. No, no, no, it is a clean interface, right? I’ve got blocks, and they’ve got inputs and outputs. What they don’t realize is that that is just a layer of abstraction, right? And here’s what I often say to people, because with IPA, Intelligent Process Automation, very frequently people ask this question: “Is IPA a part of RPA? Is it part of hyperautomation? Is it something separate?” And they’ll often point out that if I’ve got an invoice processing module, let’s say, I can drop that in as a block in Automation Anywhere, Blue Prism, UiPath, whatever. And you can buy Indico this way as well. What people don’t realize is that while IPA can be a block in RPA, the reverse is just as true, right? RPA is also one block in IPA. And so when you think about that invoice processing process, it’s something where I think management is often very, very confused, right? Because they have a person that handles this, and to them the person is the ultimate black box, right? But they don’t realize there’s a whole process there. Just the same way that figuring out who has to get the document in the first place is often kind of an RPA process, the question of what to do with the document, exactly what you need to extract from it, is actually its own process entirely. So yeah, we’re an activity block within RPA in some cases, but RPA is also just sort of a data source block from our perspective. What I’ll also say that I think is really, really interesting, especially as we look at RPA, is this narrative that the automation space is coalescing and collapsing. Obviously the automation providers would really like you to believe that, right? That’s very, very good for their business if it is collapsing. But I think that if we look at the evidence, the opposite is actually a lot more true. Certainly UiPath’s IPO was super successful. They’re doing awesome. They’re absolutely killing it. Good for them. But also look at how many new RPA vendors we’ve had pop up over the last 10 years, right? I mean, look at Power Automate, look at Lightning Automate, right? Look at the fact that all of these major big tech players are starting to get into the automation business. And then, to your point, it’s expanded really radically beyond this initial mission of just RPA to: we need automation across our entire industry, across the enterprise, right?
And this is another piece that I think people really, really miss: RPA actually solves… it is an important part of the problem, but it is a small part of the problem. And I think people can reasonably say that most RPA solutions serve as alternatives to better APIs, better internal tooling. And that’s certainly true, right? But I actually think that doesn’t give RPA enough credit, because RPA vendors like UiPath know that, right? And they’re very intentionally playing into that. What they’ve realized is that having the ability to develop more quickly, and develop in a way that is not IT-bound, because everything at a big organization is IT-bound, right? That’s really what’s valuable about RPA. So I think it continues to expand, and I think that when you look at the increased goal around broad automation, whether you call that hyperautomation or whether you call it intelligent automation, which I kind of prefer, I think if anything is going to become more of a commodity, it’s the more robotic side of the spectrum. But however you believe that’s going to shake out, right? I think that we’re still very much in an expansion phase. I mean, RPA’s CAGR is still growing at a stupid number, right? And IPA is close behind it. To your point, Forrester says 85% of the data out there is unstructured, and RPA only works on structured data. And so, in my view, IPA and these other successive automation industries that are popping up are larger than RPA, right? RPA is going to be a great business, but I think what it’s really done is open the path to the third wave of automation that now comes later. So I think it’s still very much an expansion space. I think the main question for me right now is, when you’re looking at Google in particular, they’ve got that partnership with Automation Anywhere, but I think the pressure is really on for Google to make a move in this space, because Microsoft and Salesforce have both made such strong entries and Google’s kind of got nothing. So that’s something where I’m very interested to see what they’re going to do. But I would also say that when you look at the intelligent process automation offerings out of Automation Anywhere in particular, IQ Bot has not been particularly successful. And I think that UiPath has done a really, really good job of avoiding a lot of the IPA problems, knowing that RPA shouldn’t try to do everything. So, that’s how I see things evolving.

Guy Nadivi: The automation industry and the greater IT industry in general, is aspiring to be more inclusive. That needs to start with the educational system though, by fostering more diversity in STEM subjects. Slater, what are your thoughts on making STEM more inclusive and diverse?

Slater Victoroff: I mean, it has to happen. I’ve got a very mercenary view on this, which is: we’ve got a lot of stuff to do. We’ve got a huge number of problems that we have to solve as a society, right? And the idea that we’re drawing these arbitrary prejudices… Maybe arbitrary is not right; there are plenty of historic reasons why they’re drawn along these lines, right? But we’re just segmenting so many people out, from my perspective, and sort of telling them, “Oh, no, you don’t look like an engineer. You don’t look like a scientist to me.” And it’s the most frustrating thing in the world for me as a hiring manager, because I’m so, so desperate for high-quality talent. And the idea that the K-12 education system has made all of these choices for me, ahead of time, about what kind of people I’m interested in. It sucks, right? It is a bad experience. You know, I’ve spent a lot of my time in K-12 education just personally volunteering, and at Indico we’ve done a pretty good job from a work perspective, right? We certainly have a more diverse company than is typical in tech, but that’s not a very high bar. I think that colleges need to be doing more. I think that when you look at the gender ratios in engineering schools, right? To me, that’s another really, really obvious sign. My alma mater, Olin, for folks that don’t know, has an approximately 50/50 gender ratio. It’s clearly a doable thing, right? Olin has made really incredible progress in engineering education and is recognized around the world for it. So this sort of false narrative of a meritocracy, or this notion that it has to be this way, I just absolutely don’t buy it. I appreciate people holding companies responsible for this. I think we absolutely need to continue to do that, and companies do need to do more. But similarly, we need to do a lot more to hold K-12 and higher education accountable, right? That feels like a massive, massive gap to me. And having seen what happens in a lot of these K-12 classrooms, by the time students get to college, if they get to college, it’s too late for me to help in a lot of ways.

Guy Nadivi: Slater, for the CIOs, CTOs, and other IT executives listening in, what is the one big must-have piece of advice you’d like them to take away from our discussion with regards to employing intelligent process automation for unstructured data at their organizations?

Slater Victoroff: I would say that the most important thing for you, if you want this to be valuable, is to toss away the science fiction notions that you’ve got, right? I can usually identify in the first five or ten minutes of talking to a prospect whether they’re going to be successful with intelligent process automation and get a lot of value out of it, or whether they’re not. And what it comes down to is how focused they are on ROI and the business problem, as opposed to how focused they are on tire-kicking the tech, right? And I think also, really importantly, we joke sometimes at Indico about what we call the scrape-the-internet-and-do-magic-things use case, right? People have this belief that AI is just this magic super brain in the sky and you toss it data and, again, it does magic, or it’s a genie in the bottle or some nonsense, right? And the people that will not be successful here are the ones that hold on to that way of thinking. It’s not what the technology’s for, it’s not what it’s useful for, it’s absolutely the wrong path. And there are hundreds of gravestones, right? Every project that has gone down that path has failed, but, you know, it’s appealing, so I get why people still do it. The companies that we have been really, really successful for, the customers for whom adopting intelligent automation turns into a huge competitive advantage and they just start saving millions and millions of dollars overnight, they’re the people that realize that unstructured is a capability that they need to work on in their business, and like any other capability, they’re going to get better at it over time. It’s not a magic wand, right? And I would also say they’re the people that treat it the most like regular software, right? They have governance processes, right? They think about staging and production, right? And they don’t put it into this magic bucket of AI, right? That’s my number one lesson. And I know people out there pitch these notions of 100% accuracy, right? That absolutely doesn’t exist. If you think 100% accuracy exists, even for your human process, you will not succeed in intelligent automation. Period.

Guy Nadivi: All right, looks like that’s all the time we have for this episode of Intelligent Automation Radio. Slater, it’s a bit mind-bending to ponder that despite all the automation and AI being put to work at organizations today, it’s primarily only being applied to the 15% of enterprise content that’s structured. The other 85% of content, which is unstructured, reveals that automation and AI have barely scratched the surface of the monumental opportunity ahead. Thank you so much for sharing your thoughts with us and articulating how Indico is leading the way on that challenge.

Slater Victoroff: Thank you so much for having me and thanks for a great overview of the space.

Guy Nadivi: Slater Victoroff, Founder and CTO of Indico. Thank you for listening everyone. And remember, don’t hesitate, automate.

SLATER VICTOROFF

Founder and CTO of Indico

Slater Victoroff is the Founder and CTO of Indico, an enterprise AI solution for unstructured content that emphasizes document understanding. Indico’s framework requires 1000x less data than traditional machine learning techniques, and they regularly beat the likes of AWS, Google, Microsoft, and IBM in head-to-head bake-offs.

Slater has been building AI, machine learning, and deep learning solutions for the enterprise for the better part of the past decade, having worked with everyone from the federal government to two-person startups to the Fortune 100. Slater has educated hundreds of business users on successfully implementing deep learning through a simple framework that helps executives rapidly accelerate the adoption of the technology in their businesses.

Slater can be reached at:

LinkedIn (Personal): https://www.linkedin.com/in/slatervictoroff

LinkedIn (Indico): https://www.linkedin.com/company/indico-data-solutions/

Twitter (Personal): https://twitter.com/sl8rv?lang=en

Twitter (Indico): https://twitter.com/indicodata?lang=en

Indico: https://indico.io
