THE DATA ECONOMY PODCAST

HOSTED BY MICHAEL KRIGSMAN

Optimize Machine Learning Models for Real-Time Financial Services

Scott Zoldi, Chief Analytics Officer  / FICO

https://www.youtube.com/embed/oshYnZYnvkI

“One of the biggest challenges that always constrains us as analytic scientists and businesses that make decisions, is essentially the SLAs around latency…If the latency requirement is 20 milliseconds and your decision arrives at 25 milliseconds, it’s too late”

Scott Zoldi
Chief Analytics Officer  / FICO

Scott Zoldi

Scott Zoldi is Chief Analytics Officer at FICO responsible for the strategy and analytic development of FICO’s product and technology solutions, including the FICO™ Falcon® Fraud Manager product which protects about two thirds of the world’s payment card transactions from fraud. With over 100 patents, Scott is actively involved in the development of new analytic products utilizing AI and ML technologies. Most recently, he’s focused on applications of streaming self-learning analytics for real-time detection of Cyber Security attacks and Money Laundering. 

In this episode, Scott shares how FICO uses analytic models and data to make real-time critical decisions around payments, fraud, and credit. He offers guidance for financial companies looking to provide real-time financial services, with three tips for modernizing finserv applications: understanding the customer impact, optimizing infrastructure and machine learning models for real-time data processing, and fostering cross-team collaboration across the full product life-cycle.

Transcript

Redis – Scott Zoldi

MICHAEL KRIGSMAN: We’re speaking with Scott Zoldi, the Chief Analytics Officer of FICO. The discussion will cover real time data at huge scale, making decisions that affect all of us, our finances, our credit. Stay tuned for a fascinating discussion. Hey Scott, how are you? It’s great to see you. 

SCOTT ZOLDI: Hi, Michael, I’m doing great. How are you today? 

MICHAEL KRIGSMAN: Good, Scott, tell us about FICO. 

SCOTT ZOLDI: Yeah, so FICO is an analytics company, roughly 60 years old, who focuses on deriving analytic value out of data. And moreover, using those analytic models then to make really critical decisions based on the data that’s presented to them. Many of them in real time, many of them right at the right moment from a decisioning perspective. So it’s great to talk about what we do and also the importance of data here on this program. 

MICHAEL KRIGSMAN: Of course, we all know the FICO score, so that score is a household name. You’re Chief Analytics Officer, what does that role involve? 

SCOTT ZOLDI: So as Chief Analytics Officer at FICO, I’m responsible for our strategies when it comes to the analytics or the machine learning behind our AI solutions. And that includes purview for the types of data that we use and how we use it. 

So it would include situations such as understanding what are the right data elements that we need, when do we need them, what order do we need them in? And to ensure that we can get that all in one place so then our machine learning models can produce the scores and insights that are required to make some of these decisions, whether they be risk, or fraud, or compliance, or even marketing. 

MICHAEL KRIGSMAN: And you have a whole lot of patents in this area. 

SCOTT ZOLDI: Yeah, so I’ve been with the company now for 23 years. It’s a huge passion of mine and, you know, I was drawn to this company, to FICO, because of data. As a researcher, I was originally at Los Alamos National Lab. So I’m constantly inspired by the types of machine learning patents and data usage patents that we can generate based on solving our business problems for our customers. 

MICHAEL KRIGSMAN: Scott, I think we need to start at the highest level. So can you tell us how does data come into play in the business of FICO? 

SCOTT ZOLDI: So the data is really the blood of our business. So to drive the decisions that our customers expect us to help them with, we really need to focus on the data that we need for these models. And being a machine learning scientist, and that being a core part of FICO’s business, getting the data right and having it in the right place at the right time is 80% to 90% of the problem to make sure that we have the right data and that it flows through. 

So in some sense it powers all the decisions that we have to make at FICO. So it’s of critical importance, because if we don’t get that right it’s almost the garbage in, garbage out. We need to make sure that there’s the high quality data coming in. From there then, we can drive the value out of our analytics and consequently our decisioning software, too. 

MICHAEL KRIGSMAN: What kinds of data are you gathering, collecting, working with? 

SCOTT ZOLDI: So the biggest part of our business and the one that we’re very well known for is our fraud solutions. And so this is where we get payment card data. So imagine that you go and you want to purchase something online, there’s an authorization that occurs. And so that authorization comes into our decisioning software and then a decision gets made on, well, let’s say, the probability of fraud. 

And that’s a lion’s share of a lot of the real time data that we have and where we have these sort of constraints around having fixed real-time requirements around the use of this data. But we have other areas, such as we’ve done work in cyber security, we’ve done work in marketing analytics, we’re looking at the items that someone’s purchasing within a grocery store to make predictions around what they may be more apt to buy in the future or want to be proposed in terms of incentives. 

And so there’s a huge variety of this data that we are gathering. It’s really dictated primarily by these use cases in fraud, and risk compliance, marketing, and others. 

But I think for FICO what’s really unique is that we’ve been using this real time data at scale for more than 30 years. So it’s really part of our bread and butter. And in fact, you know you mentioned the patents, of which there’s more than 100 that I’ve authored. A lot of them are focused on the unique challenges you have in the real time environment, working with data and generating meaningful results from models in that environment. 

MICHAEL KRIGSMAN: Which of course then begs the question, what are the unique challenges that arise when you’re working at data of this scale? 

SCOTT ZOLDI: So one of the biggest challenges, I think, that always constrains us as analytic scientists and businesses that make decisions, is essentially the SLAs around latency. In today’s modern cloud compute, many people are interested in scaling behaviors. And they talk about horizontal scaling and they talk about the total number of transactions processed. 

But in a real time application, there is a latency requirement. And so if the latency requirement is 20 milliseconds and your decision arrives at 25 milliseconds, it’s too late. And so these are highly constrained: you need to get the data, you need to do whatever data checks you need. 

You need to process that model and generate a decision within, let’s say, 20 milliseconds, it’s either yes or no, you did or you didn’t. And that requires a lot of software engineering in terms of how you orchestrate working with that data and persisting what you need from that data to make the decision in those low latency environments. 

And I think that is one of the key things that are so interesting about real time versus streaming or batch, which is that one needs to always work within a very tight constraint and be very clever on how to execute machine learning models and these decisions where latency is a big, big challenge. 

The other major one, Michael, is around ordering. So in a streaming environment, when we work with streaming data, you can’t necessarily have as much control over the ordering of data. And that’s critically important. I’ll give you an example; let’s stick with fraud. If the order of the transactions gets mixed up when the model processes them, you can have a very, very different impact. 

So if we saw that you changed your email address associated with your credit card and then we saw a strange transaction occurring in Kazakhstan, that could be very different than seeing the same transaction occur and not seeing the information around changing your credentials with your account. And so order really matters when it comes to real time. It also matters for making accurate and responsible decisions based on that data. 

So those are the constraints, I’d say, for me: the latency requirement and its unique engineering, and then the ordering requirement. Because it does matter. Because you’re trying to faithfully reproduce what occurred, let’s say, with the consumer. And it’s not good enough to be approximate; you want to make sure that you have all the information in the right order so you can make the right sort of decision around the state of that customer. 
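To make the ordering point concrete, here is a minimal sketch of a reorder buffer that releases events in event-time order once a small waiting window has passed. It is an illustration under assumed names (`Event`, `ReorderBuffer`, `max_delay_ms` are all hypothetical) and is not a description of FICO's software.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    # Only the event time takes part in ordering comparisons.
    event_time_ms: int
    kind: str = field(compare=False)


class ReorderBuffer:
    """Holds events briefly so late arrivals can be slotted into event-time order."""

    def __init__(self, max_delay_ms):
        self.max_delay_ms = max_delay_ms  # assumed bound on how late an event may arrive
        self._heap = []
        self._max_seen_ms = 0

    def push(self, event):
        """Add an event; return the events that are now safe to process, in order."""
        heapq.heappush(self._heap, event)
        self._max_seen_ms = max(self._max_seen_ms, event.event_time_ms)
        watermark = self._max_seen_ms - self.max_delay_ms
        ready = []
        while self._heap and self._heap[0].event_time_ms <= watermark:
            ready.append(heapq.heappop(self._heap))
        return ready

    def flush(self):
        """Release everything still buffered, in event-time order."""
        ready = []
        while self._heap:
            ready.append(heapq.heappop(self._heap))
        return ready
```

The trade-off is visible in the parameter: every millisecond of `max_delay_ms` spent waiting for stragglers comes out of the same latency budget discussed above.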

MICHAEL KRIGSMAN: And you mentioned that your software engineering resources are heavily applied to addressing these two issues, the latency and the ordering. 

SCOTT ZOLDI: When we work with our software engineering teams, what we’re focused on is unique databases for the real time environment, basically key stores. Where we can have a key and we get a store of information back. And how we effectively update that and ensure the consistency of that is a really big part of the engineering effort that we have. 

And being an analytics team, we work and provide those requirements to our development team in terms of how we make sure that those technologies will work appropriately in the real time environment. 
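The key store idea can be sketched as follows: a key (here a card identifier) returns a small behavioral profile, and each update is applied only if nobody else changed the record in the meantime. This is a toy in-memory version under assumed names (`Profile`, `InMemoryKeyStore`, `update_profile` are hypothetical); it uses a version check, which is one common way to keep updates consistent, and is not claimed to be the approach FICO uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    version: int = 0        # bumped on every successful update
    txn_count: int = 0
    avg_amount: float = 0.0


class InMemoryKeyStore:
    """Stand-in for a real key-value store; a dict is enough for the sketch."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, Profile())

    def compare_and_set(self, key, expected_version, new_profile):
        """Write only if the stored version is still the one the caller read."""
        if self.get(key).version != expected_version:
            return False
        self._data[key] = new_profile
        return True


def update_profile(store, card_id, amount, max_retries=3):
    """Fold one transaction amount into the running profile for a card."""
    for _ in range(max_retries):
        old = store.get(card_id)
        n = old.txn_count + 1
        new = Profile(
            version=old.version + 1,
            txn_count=n,
            avg_amount=old.avg_amount + (amount - old.avg_amount) / n,
        )
        if store.compare_and_set(card_id, old.version, new):
            return new
    raise RuntimeError("profile update kept conflicting")
```

Locking the record during the update would be another option; the version check simply avoids holding a lock while the new profile is computed.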

MICHAEL KRIGSMAN: You raise an interesting point. How does the analytics team work with the software engineering team? And what kind of overlaps are there? 

SCOTT ZOLDI: For me, what I generally view is that we are part product managers in some sense. Because we have a view of what we need to meet the business objective, and part of that is to basically focus on, what is the state of the art of what we have today from an analytics execution perspective? And then what is the specialized software that we need to develop around it? So very much so, we play two roles. 

Part of the role is that we’re playing software developer roles. Because essentially analytics in this environment become very much software. We’re not going to necessarily take a commodity, open source model and be able to execute it under the constraints of real time that we need. And that’s why we have to change the way we develop our models around the constraints imposed from the software perspective. 

And then we have to advance the state of the art of how our software deals with these key stores in terms of getting back the information that we need associated with a particular, let’s say, cardholder at scale. And then at the end of the day, the model and the software become THE software. Because one of the hardest problems that we see, and many in this field see, is that operationalizing machine learning models is really hard. It’s like 10 or 15 times harder if you’re doing it in the real time environment, given the fact that some technology that’s out there might be good enough for relatively high latencies. But when you get to low latency and where you have SLAs around the performance of these models, and where it matters, that’s where we really have this very, very tight interlink where the roles of us as data scientists, product managers, and software developers all kind of get a little cloudy at times. 

Because we’re all trying to solve these problems and it gets deep into architecture which is really interesting. 

MICHAEL KRIGSMAN: And what are the latencies that you’re working with? And also, what’s the scale of the data? 

SCOTT ZOLDI: So the latency we generally will target is something in 10 to 20 milliseconds. That’s generally the bar that we’re trying to hit. Depending on the application, it could be we have a smaller window or a larger window. Very often a lot of it has to do with how the model will be instrumented, like will that part of the model be on-premise? Will it be in the cloud? Are there additional latencies into that chain? 

And so that really defines the latency windows and also the constraints for these models. We’re generally looking, in most of our environments, at having these low latency decisions made at anywhere from 1,000 to 10,000 transactions per second. And so that gives you a sense of the sort of volumes of data. Each of these, on its own, can be solved. 

But when you combine latency and the throughput, the TPS, things get really challenging. And then my challenge, as the chief analytics officer, and really what drives me to find this attractive as an area of research and development, is that we then have to put a really high-value machine learning model in that, where the decision matters. And if you think about it, it’s relatively straightforward to take a few data elements on an incoming data stream and add them and compute a relatively simplistic score. 

But to go and have like a fully loaded neural network based model, let’s say, which is the foundation of a lot of these fraud and compliance solutions out there today, along with a behavioral profile which we’re storing in that key store database, which has to be tremendously efficient and reliable, it all gets really complicated. But when it works, it works well. And we get this sort of really tremendous sort of value. 

And for a company like FICO, that’s kind of what our customers are looking for. Because each of those customers on their own may not want to invest in that level of research and development, whereas companies like FICO that have a passion for this and have a history of doing it, they’re focused on how to continually improve that while maintaining things like latency and throughput. 

But at the same time continuing to enhance the value of the decisions that get made for the downstream decisioning strategies that our customers employ. 
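A quick back-of-the-envelope calculation shows why latency and throughput are harder together than apart. By Little's law, the average number of transactions in flight equals the arrival rate times the time each one spends in the system. The sketch below just applies that formula to the figures quoted in the conversation; the function name is an illustrative choice.

```python
def in_flight_transactions(tps, latency_ms):
    """Average number of transactions being processed at once (Little's law)."""
    return tps * latency_ms / 1000.0


# Figures quoted above: 1,000 to 10,000 TPS, 10 to 20 ms per decision.
for tps in (1_000, 10_000):
    for latency_ms in (10, 20):
        print(tps, "TPS at", latency_ms, "ms:", in_flight_transactions(tps, latency_ms), "in flight")
```

At the top of both ranges, the system has to hold on the order of a couple of hundred decisions open at the same moment, each with its own profile lookup and model execution, and every one of them still has to finish inside the window.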

MICHAEL KRIGSMAN: So your customers are hiring you to make these decisions based on the machine learning models. But it’s the infrastructure that makes the magic possible. 

SCOTT ZOLDI: Correct, and so in the old days, and old days being two decades ago, when our software from FICO ran, we would have the window. And if you miss your window for computing that decision or, maybe more accurately, providing the score so they can make their decision, then they do a fallback. And a fallback is a much, much lower sort of analytic value decision. It could be the previous score. It could be a set of rules. 

And so generally the entire industry has gone to a point, in the fraud and compliance space as an example, where all this can be done in real time with proper software. Whereas in the old days, people had to make decisions. Like what are the 4%, 6%, 10% of all transactions that would be real time? And the rest would be what we call online, which would be a delayed decision. Technology has evolved. 

And so now many, many are 100% real time. But it’s absolutely correct, Michael, that companies work with companies like FICO that have this sort of specialized software to meet those requirements, given the fact that it’s so critical to things like fraud detection but also commerce in general. 
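The window-and-fallback pattern described here can be sketched in a few lines: ask the model for a score, wait no longer than the SLA, and otherwise return a lower-value fallback such as the previous score or a rule. The names below (`score_with_deadline`, `previous_score`, `slow_model`) are hypothetical, and Python threads are used only to keep the example short; a production system with a 20 millisecond window would rely on far tighter machinery.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def score_with_deadline(executor, model, fallback, txn, sla_ms):
    """Return (score, source); use the fallback if the model misses the window."""
    future = executor.submit(model, txn)
    try:
        return future.result(timeout=sla_ms / 1000.0), "model"
    except FutureTimeout:
        future.cancel()  # best effort; an already running call is not interrupted
        return fallback(txn), "fallback"


if __name__ == "__main__":
    def slow_model(txn):
        time.sleep(0.3)  # simulates a model that blows through the window
        return 0.9

    def previous_score(txn):
        return 0.5  # e.g. the last score stored for this cardholder

    with ThreadPoolExecutor(max_workers=2) as pool:
        print(score_with_deadline(pool, slow_model, previous_score, {}, sla_ms=20))
```

The caller gets an answer inside the window either way; what it loses on a miss is analytic value, not the decision itself.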

MICHAEL KRIGSMAN: And it’s interesting how earlier you described your product as being the bundle of this decisioning, together with the infrastructure or the software engineering. All of that package together is essentially the quote, software that you sell. 

SCOTT ZOLDI: That’s correct. So we want to make sure that we have an ability to bring that data in, to update the databases associated with this, to generate that score, and then once that score is generated, very often the rules and the strategies are also attached to it. So there’s that automatic decision; for example, in the payment card space there’ll be an approve or decline decision that gets made. 

And that’s why sometimes your credit card transactions may not go through if there’s something kind of new or risky associated with you. I can only imagine, if you’re at a point of sale and it takes two seconds for you to wait for your credit card to clear, no one has patience for that. And frankly that’s the larger thing that I think is super exciting, Michael, is that everybody is having an expectation of a valuable decision driven by a machine learning model in a real-time milliseconds sort of environment. 

And I just think the use cases are going to explode. I mean, fraud is one of the earliest, most successful examples that dates back for 30 years. But there’s so many more where I think this will just become so commonplace that there is going to be an expectation of real-time insight and decisioning made. And we’ll see more and more of these decisions made across the entire customer lifecycle. 

In addition to that, I think frankly, looking at how data is evolving also gets really interesting from the perspective of making sure that all the data consents are in place and that we have the controls in place in terms of the data that gets used. And so we do need more development here. It’s not a solved problem but it’s clear where the industry is moving. It’s going to be moving to a set of expectations around, I want my machine learning decision or my machine learning intelligence now, not later. 

Because I’m going to differentiate myself because I’m going to interact with Michael through the digital channels in a more effective way. Where real time and latency will matter and also the accuracy of decisions made in that small, small window to make the right decision at the right time will really matter to businesses. 

MICHAEL KRIGSMAN: Scott the decisions that you make are fraught with potential risk and liability that can have significant impact on your customers and their customers, the consumers. So can you walk us through the kind of decisioning that needs to take place in that 10 milliseconds that you were describing. 

SCOTT ZOLDI: So there are a lot of very significant sort of decisions that need to be made. One of the things that I focus on is that the analytics that we develop is for the entire lifecycle. Which means that when we develop a model we have to have full view of the fact that that same model will be in operation for some period of time. 

And as a consequence, we need to make hard decisions about what are the right data sources that we need to make a high quality decision? What are the data elements that are going to matter to that decision? And it’ll take iterations. Sometimes it will take working within the environment to understand that, basically building models and figuring out what are the right elements. 

But then persisting that through the entire chain, meaning that we’re going to have to monitor those elements, the distributions of those elements, to understand whether they and maybe even the latent features that are in our models that are driving these decisions, are they within the proper ranges? 

And so part of this is what I like to call this concept of auditable AI, where you go from the very beginning of what data is being used, and why is it being used, and how often it will come in, and what are the assumptions around the consistency of this data? What is the importance of each data element? And what the impact on the decision is, all the way through to monitoring to make sure that those elements are being used properly by the model. 

Meaning that we’ve done the ethical testing, stability testing. And we see shifts in, let’s say, how data is being presented to the model. Because data shifts, and so those are the environments we operate in, having strict controls on where we alert the users around the fact that maybe the model is degrading or maybe it’s less accurate on a certain subset of customers. 

And that’s a big part of this also, which would be this concept of essentially what we call humble AI. Which basically says that the model has an ability to say, OK, I might be a little bit off the rails here. I need to alert those that are making the decisions so they can incorporate that into the decisioning that they do. 

But this is all part of larger responsible AI sets of decisions, and conversations, and frameworks that are being discussed today. Because we need to be making sure, especially in the real time environment, when decisions are going to be made in tens of milliseconds, that we have the right sort of alerting in place if a decision needs to be rethought or maybe ignored at the current time, falling back to a safer infrastructure when those cases occur. 
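One way to picture the monitoring and alerting described here is a distribution check on a single data element. The sketch below uses the population stability index (PSI), a drift measure commonly used in credit scoring; it is swapped in for illustration and is not stated in the conversation to be FICO's method. The bin counts, the smoothing constant `eps`, and the 0.25 alert threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between a reference distribution and a live one, given matching bins."""
    exp_total = sum(expected_counts)
    act_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        p = max(e / exp_total, eps)  # reference share of this bin
        q = max(a / act_total, eps)  # live share of this bin
        psi += (q - p) * math.log(q / p)
    return psi


def drift_alert(psi, threshold=0.25):
    """True when the shift is large enough that the model's users should be told."""
    return psi > threshold
```

In the spirit of the humble AI idea above, an alert like this would not change the score by itself; it would tell the people or strategies consuming the score that a fallback may be the safer choice for now.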

MICHAEL KRIGSMAN: Before we turn to responsible AI, which I think is an extremely important topic, you mentioned cloud and on-premise. And can you just break down for us how you think about the cloud based versus the on-premise elements of your infrastructure and how the pieces fall together that way. 

SCOTT ZOLDI: Yeah, it’s an interesting sort of environment, cloud versus on-premise. Our business was started like most, on-premise. The cloud is a newer concept. And there’s a lot of great benefits that come from cloud computing. Being able to have access to scalable compute, having access to a large amount of data store, and ability to orchestrate data. And so it’s an important part and critical part of our strategy at FICO. 

At the same time, when we look at low latency environments it poses challenges. If we have to actually figure out the amount of energy or time that is spent communicating to the cloud, that is not useful time spent in many instances. If I chew up 10 milliseconds just getting to the cloud and getting back from the cloud to make the decision, where the decision is sitting in an authorization system which will not be sitting in the cloud, that adds overhead to the overall sort of value proposition. 

And so there are decisions that need to get made in terms of, is the increment to the latency acceptable? And we can work within those constraints. And sometimes that means that our customers might increase their latency requirements for a decision. In other cases, it might mean that we constrain the analytics. So there are values to the cloud compute based on the fact that we can kind of aggregate this information. 

And our clients don’t need to stand up, let’s say, on-premise sort of applications to execute models. But it really comes down to counting all that latency that gets incurred for these decisions and understanding what it means from an SLA perspective. On-premise, though, you have the ability to run it as close to decision as possible and as close to the data as possible. 

The data doesn’t sit on the cloud. The data has to get to the cloud and the decision has to get off the cloud. Whereas if the data originates at a client site and the decision is being made at a client site, or an authorization environment, having the model close there is really important. And this is where I think we are looking more and more at what we call edge solutions. 

And an edge solution would essentially mean that you have a component of the processing that is done effectively and efficiently in this on-premise environment. Then you have other types of operations that are efficiently done in the cloud environment. An example of that could be the following, in parts of our business we maintain risk profiles for merchants. So we understand where fraud is occurring at merchants. 

And so when you, Michael, make a credit card transaction we understand whether something is odd about your transactions. But we’ll also have an understanding of the broader view of who you’re transacting at. And has there been a fraudulent activity or things that are suspicious at that merchant? 

That merchant aggregation can be done very effectively in the cloud and it doesn’t need to be real time. It can be an asset that is updated in near real time or even once a day in terms of providing that incremental value. And so I think cloud and on-premise are kind of going to merge. And there is a big sort of focus now on what parts of the decisioning software or the analytic needs to be close to data and which pieces can’t? 

So I see a hybrid model is going to be really the way moving forward, particularly for these types of applications where we’re in the decisioning space and one where there’s a latency requirement and real time, or near real time, in the matter. 
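The placement question comes down to counting milliseconds. Here is a minimal sketch that checks which deployment sites can still meet an SLA once the network round trip is included. The site names and the timing figures in the usage lines are illustrative assumptions, apart from the 10 millisecond cloud round trip and 20 millisecond window mentioned in the conversation.

```python
def feasible_placements(sla_ms, model_ms, lookup_ms, round_trip_ms_by_site):
    """Sites where round trip + profile lookup + model execution fit inside the SLA."""
    return [
        site
        for site, round_trip_ms in round_trip_ms_by_site.items()
        if round_trip_ms + lookup_ms + model_ms <= sla_ms
    ]


sites = {"on_premise": 1, "cloud": 10}  # assumed round-trip times in ms
print(feasible_placements(sla_ms=20, model_ms=8, lookup_ms=3, round_trip_ms_by_site=sites))
print(feasible_placements(sla_ms=25, model_ms=8, lookup_ms=3, round_trip_ms_by_site=sites))
```

With these assumed numbers the cloud option only becomes feasible when the customer relaxes the window or the model is made cheaper, which mirrors the two levers described above. Work that has no deadline, such as the merchant aggregation, never enters this calculation and can sit wherever it is cheapest to run.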

MICHAEL KRIGSMAN: So for you, the cloud versus on-premise architecture decision is driven primarily by the latency and the efficiency of returning that result as quickly as you can, as opposed to many businesses make this cloud versus on-premise choice considering security and wanting to hold the data close to themselves. It sounds like that’s not the case for you. 

SCOTT ZOLDI: We’re pretty satisfied with the fact that the security of the cloud is sufficient. There’s been a huge amount of work that we do, but also cloud providers do around that. Each of our customers have their own preferences and their own views into the transport of data, but that’s not the primary concern. I think that is well understood and under control. 

But yeah, it’s going to be more about what is technically feasible? Kind of like when we talk about software, how we have unique requirements for software development, we have unique requirements for the environment. That’s usually the larger driver. The other aspect of this, Michael, is a lot of the cloud environments– you’re right– unless you have specialized software running there some of the commodity functionality you’d find in a cloud based environment may not be really meant for real time computing. 

And so we still see sort of REST-based calls and other things that are really not meant for real time. That’s the other sort of perspective, is sometimes you’ll go into the cloud environment and you’ll see a commodity view of what you need to work with data and to solve analytic problems. And it may work for a large, large swath of analytic problems, but whether it works for the real time decisions is sometimes questionable. 

And so that’s the other aspect: sometimes having your own software, either run in the cloud like we do or not, is quite the additional investment. But the other sort of aspect is, do we have all the sort of componentry within the cloud that would support that real time processing and decisioning? And if not, that’s where these hybrid models would make more sense. 

MICHAEL KRIGSMAN: It sounds like so many of your decisions are being made around these architectural choices driven by the scale and the real time aspect of the data. Which is to say, the type of business that you’re actually in. 

SCOTT ZOLDI: Correct, I mean, if you look at the makeup of my analytics team, you know, I have PhD data scientists that really focus their energy on the theory of databases and how to ensure that we can lock records while we update and that we have consistency. 

And so yeah, we definitely get into a lot of this sort of deep sort of architecture discussions around this. And I think that’s what makes us successful, frankly, is that we have data scientists that are really focused on that architectural aspect of how we drive a differentiated execution of these sort of models to respond to those constraints. 

MICHAEL KRIGSMAN: I love hearing how you’re focused on every part of the stack from the decisioning models all the way down to how the database functions. You’re looking at everything. 

SCOTT ZOLDI: Yeah, we look at everything. And you know, I think there’s kind of a nice analogy for what real time is. If you look at real time, think about it like you have one dollar. And you have to stretch that $1 and somehow buy an entire meal. It’s going to be very, very difficult. You make really hard decisions around what you’re going to have for dinner or how you’re going to orchestrate that. 

And I think if the smallest piece of that chain is inefficient, the whole value proposition is impacted. And I think that’s why we’re so invested in that. But I also think for a data scientist, like you know I’m 23 years at FICO, I’m continually challenged. And as I see the software development environments change and new products be available, or new capabilities in open source, it allows us to question how to improve the piece parts. 

So this incremental improvement of the processes is something we are continually doing. So I think it also helps with the data scientists really being a big, big part of the overall business success. And so for like a data scientist and having a data science team, I mean nothing could be better because it’s not one of these sort of things where you drop the model and no one can execute it or we can’t meet the SLAs. And we’re all responsible for that. And I think that’s what is really fulfilling. 

MICHAEL KRIGSMAN: Scott, earlier you used the phrase, responsible AI. You used the phrase, humble AI, so where do these ethics decisions come into play? And why is that so important in your business? Because at the heart you’re dealing with real time data. And so what’s ethical or unethical about that? 

SCOTT ZOLDI: So the ethics is a really important part of our thinking here, as is responsible AI. And one of the things that we have to focus on is the fact that in many parts of the world already this concept of being profiled, which basically means that you have a system that is generating an analytic profile of your past behavior, taking in a current transaction, and then producing a score and making a decision, can be challenged. 

And when it gets challenged, the consumer, whoever is impacted, has an opportunity to challenge that decision. And so we need to be able to produce the reasons associated with the decision so that both the analyst, who’s talking to this customer, and the customer themselves understand how the decision gets made. And so it’s critically important that we understand what drives these models. 

So we build models that are interpretable. We don’t carry around data that is not needed, and that’s good principles in general for real time systems. But they’re tremendously important when it comes to bias and ethics. We don’t want to impute bias by bringing in additional information that a model could leverage just to learn noise or maybe to bias for one subgroup versus another. 

Treating data as a liability, I think, is the best way to look at this. And to say, well, each data element I bring into a solution adds more and more liability to the decision. And treating that with a proper level of respect when we build the models means that we understand the importance of each of these data elements. We understand how those are combined by the machine learning model and we make sure that we understand whether that could be biased or not towards groups of people, or unstable. All important. 

And then driving that on the output to produce those reason codes such that the customer can then have a discussion. And in some cases, and I think this is going to be really critically important to this environment, to say I don’t agree with this decision based on this reason. And I think the data is wrong. 

Now that’s a problem, right? Because that requires that we need to have a distinct sort of lineage of where that data came from so the consumer has an opportunity to potentially rectify some of that information. And so all of that comes into when the decision gets made, what drove the decision in terms of the score, what were those reason codes, what drove those reason codes? 

And maybe eventually a discussion with the consumer gets down to what data was used and whether it is accurate or not. And that’s the responsible AI sort of focus: really getting down even to the lineage of what was used to make that decision from a data perspective, because it might be challenged by the consumer if it’s deemed to be inaccurate along the way.
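Editor’s note: The reason codes and data lineage described above can be illustrated with a small sketch. It assumes a simple additive model in which each data element contributes a weight times its value; the feature names, weights, and source tags are hypothetical and are not FICO’s actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    name: str
    value: float
    source: str  # hypothetical lineage tag: where this value came from


# Hypothetical weights for a simple additive (interpretable) model.
WEIGHTS = {"txn_amount_ratio": 2.0, "new_merchant": 1.5, "night_time": 0.5}


def score_with_reasons(elements, top_n=2):
    """Return (score, reasons); reasons are the largest positive contributions."""
    contributions = [
        (WEIGHTS[e.name] * e.value, e) for e in elements if e.name in WEIGHTS
    ]
    score = sum(c for c, _ in contributions)
    ranked = sorted(contributions, key=lambda pair: pair[0], reverse=True)
    reasons = [
        {"reason": e.name, "contribution": c, "source": e.source}
        for c, e in ranked[:top_n]
        if c > 0
    ]
    return score, reasons


elements = [
    DataElement("txn_amount_ratio", 3.0, "card_profile"),
    DataElement("new_merchant", 1.0, "merchant_feed"),
    DataElement("night_time", 1.0, "txn_timestamp"),
]
score, reasons = score_with_reasons(elements)
```

Because every reason carries the source of the data element behind it, a consumer who disputes a reason can be pointed to the specific input and where it came from.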

MICHAEL KRIGSMAN: In some cases the analysis to get to an AI generated decision can be almost opaque. How do you ensure that your decisions are lacking in bias to the extent possible, given the complexity that may lurk behind the scenes? 

SCOTT ZOLDI: Yeah, it’s an important point. One of the things that we insist on at FICO are interpretable machine learning models. Which basically means that we have an ability to get past some of the opaqueness of certain types of methods that we could use but choose not to use. 

And so very often this means that the types of machine learning algorithms you can use to build a model are going to be reduced to a subset, a subset that FICO has sanctioned as interpretable.

And then we have this responsible AI framework for what we call a model development governance blockchain, where we record how we built that model and tested that model. Many of our models take months and months to build. And so sometimes we’re seen as a dinosaur because they say, well in the cloud you could just throw your data there and flip a switch. And then in two minutes you have a model. 

That’s where you get the opaqueness and the lack of understanding. And the industry is driving down this path of applying explainable AI and other methods to try to explain those models. It’s not sufficient, where you could just build a model that’s in trouble from the get go and you have a proper process. 

So whether it be, let’s say the FICO score, or whether it be some of these real time applications that we build, we go through the steps to build it right. And to have the interpretable models in there and do the ethics testing, and the variable importance, and whatever else we need to do from a stability perspective to have all that built. But also to monitor when it is going to go wrong. 

And I really like this concept of data and models. One of the quotes that I really like essentially says that all models are wrong, but sometimes they’re useful. And I paraphrase that one. I say, well, depending on the data, the model is going to be more or less wrong and useful.

And so this concept of where the data is headed and is it within the parameters that we believe it should be, based on how we developed the model, really determines whether or not this model is one that, and the scores are ones that we’re going to have a lot of confidence in. Or to the point you made earlier around, do we step down to a humble AI or a different strategy there? It’s a big part of how we build these. 

Obviously in a real time environment, it’s even more critical. Because you’re not going to have the time to interrogate that. You need to have the tools in place to kind of call out where there might be fouls. Or we need additional sort of introspection by a human to make sure that, as we automate these decisions at scale, we’re not automating bias or bad decisions at scale also.

And that’s why it’s not just the model. It’s not just doing it quickly. It’s also having the checks and balances in there to throw up errors or warnings along the way for certain types of customers so that we make the right decisions. And we’re doing that in an ethical way. 
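Editor’s note: The checks and balances described above, where out-of-range data raises a warning and the decision steps down to a human, can be sketched as follows. This is a minimal illustration under stated assumptions: the feature names, training ranges, threshold, and action labels are all hypothetical.

```python
# Hypothetical value ranges observed for each feature during model development.
TRAINING_RANGES = {
    "txn_amount": (0.0, 5000.0),
    "txns_last_hour": (0.0, 20.0),
}


def out_of_range_features(features):
    """List features that are missing or fall outside the ranges seen in development."""
    flagged = []
    for name, (low, high) in TRAINING_RANGES.items():
        value = features.get(name)
        if value is None or not (low <= value <= high):
            flagged.append(name)
    return flagged


def decide(features, model_score, threshold=0.8):
    """Automate the decision only when the inputs look like the development data."""
    flagged = out_of_range_features(features)
    if flagged:
        # Step down: raise a warning and route the case to a human reviewer.
        return {"action": "refer_to_human", "warnings": flagged}
    action = "decline" if model_score >= threshold else "approve"
    return {"action": action, "warnings": []}


normal = decide({"txn_amount": 100.0, "txns_last_hour": 2.0}, model_score=0.1)
unusual = decide({"txn_amount": 90000.0, "txns_last_hour": 2.0}, model_score=0.1)
```

Here `normal` is decided automatically, while `unusual` is referred to a human because `txn_amount` lies outside the assumed development range, regardless of the model score.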

MICHAEL KRIGSMAN: Is there a tension that sometimes arises when you have an idea for a model that you are very confident will be more effective in one way or another, yet you can’t fully explain it? So you must be tempted to say, let’s short circuit that. Let’s just put that into production and we won’t worry about it now. 

SCOTT ZOLDI: So I’ve never been tempted in that regard. I mean my firm belief is this, our customers, and consumers, and we shouldn’t be guinea pigs of experiments. And I think that occurs far too often. And we see this in the news where models get deployed and they do terrible things and people write news articles about it. 

So that’s why we have the model governance standards that we do at FICO, so we don’t allow that to happen. What we do though, Michael, is we run extensive sort of research outside of the development. So when we talk R&D, think of research as a swim lane, and development as a swim lane.

And so before we bring in any algorithm, let’s say I think something is acceptable and I can explain it, it would go through at least a year of testing on real data that we’d have in a research environment. And only after a kind of extensive review and testing as a committee would we decide to bring a new technology in.

So very often like our customers will hear us talk about some new technology but it might take a year or more to bring it into the software. And that’s sometimes related to the fact that it’s hard to implement things in real time, let’s say in that environment. But more often it’s around we’re not going to have our customers be guinea pigs for that type of analytic. 

And that’s where we do that extensive research and testing ahead of essentially prioritizing it for use in the development of models. But nope, never tempted. I think it’s far, far too dangerous for a company like FICO or any company to just put it in there, and see what happens. 

MICHAEL KRIGSMAN: So the governance framework and the adherence to those governance frameworks is central to the DNA of how you run the business. 

SCOTT ZOLDI: Exactly, right. I think it takes some getting used to, maybe for new scientists at FICO, but as soon as you kind of think a little bit about the impact that a bad model can have at scale. So one of the great things is FICO is tremendously successful at getting models out there at scale.

But you can imagine that if we develop a model that impacts 80% of all credit card transactions in the country and that model is bad, then you have a huge amount of consumers that are all impacted by that set of bad decisions. And it just becomes far, far too costly. And so we focus our attention on the fact that the impact to consumers is far too great. Same with ethical considerations. 

We’re not going to throw alternative data at a problem without extensive review and ensuring that the relationships learned are effective. We just can’t do that, because AI and machine learning are tremendous. And the data that’s flying at us, it presents all these opportunities for us and for businesses. But it also presents an opportunity to be tremendously callous in decisions where people aren’t going to question the machine learning model, even if we tell them they should. And we may even warn them.

And there’s this inherent sort of challenge that we can automate bias or automate mistakes at scale. And so the same sort of grand vision for how we can drive a lot of benefit at scale can generate harm at scale. And I think that’s one of the more scary things, that when we look at it as data scientists they all very quickly understand that they don’t want to simply put out a model that has a mistake, that hasn’t gone through governance properly. And then we have something that impacts customers at a scale like that. 

So yep, the governance is core. And I think, very quickly, even the best of our research scientists at FICO understand why that’s so critically important. And as long as they see a path, Michael, that they can get their research considered for deployment and they understand what that looks like, then they’re fine with it. It’s not so draconian around, these are the rules, thou shall not. Yeah, we do have rules around that when it impacts our customers.

But there is another process for how innovation occurs. And to the point about patents, we are a phenomenal company when it comes to innovation and patents. We just want to make sure that we fold it into the development fold at the right times and safely for our customers. 

MICHAEL KRIGSMAN: Well there’s no doubt that FICO plays an important societal role. And so as a consumer, I’m thrilled to hear about the safeguards that you’ve put in place. 

But Scott, as we finish up, where is all of this going? Where is the use of real time data? And where is this headed? 

SCOTT ZOLDI: So my view right now is that we’re going to get to, hopefully within five years, a customer consent model. You know, I get very concerned about data assets. And when I talk to third parties or other companies I always ask about consent. And you know, I think what we’re going to see is essentially a tighter ownership by the consumer of their data and the types of decisions that get made with that data and how that data gets used in decisions.

And I think that will have unique ramifications for how models will operate overall, but in real time also. Because we need to make sure that the consent chains are set up and that is part of what flows. Generally what happens today is there’s not “Michael’s consent to use for this purpose” floating around in an API. It’s somehow handled in that someone has documentation of consent, hopefully, along the way, and that’s maintained by one of those data providers.

It’s going to get a lot stricter there. And I think that’s great for consumers. Because that allows each of us to be in more control of our data and how it’s used to make those decisions. So I think we’ll see a big sort of emphasis on these, that the provenance of the data and the consent chains involved. And frankly, the consumers will start to understand a little bit more that they have more control around how their data gets used, to their benefit, in the use of these models. 

And I think that will generate new types of architectures, new types of orchestration sort of that’s required here. And unique challenges that have to be solved from an API perspective around maintaining that consent. So that when the decision is presented, it’s understood that all the sort of consents are in place. And all the sort of work is there such that if somebody wanted to have a conversation about that decision we could get all the way back down to the core data that drove it. 

A lot of those frameworks will get hardened and formalized in the coming years, I think. And it’ll just be part of this sort of maturing of us all becoming much more digital than we have before. And we’re just starting this journey, frankly. We’re seeing what all the technology can do from a cloud perspective, an on-premises perspective, a real time data storage perspective.

But now the sort of frameworks around how we’re going to make sure that we have the proper controls in place, the RegTech, which is another huge passion of mine, will start to play more and more of a role. And those that kind of architect for it, back to the conversation around architecture, will be the ones that will be best positioned to do this responsibly and to meet the data sort of constraints, but also consents, that our customers will expect.
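Editor’s note: The idea of consent travelling with the data through an API, so that every decision can be traced back to consented inputs, can be sketched as below. This is a minimal illustration, not an existing standard or FICO interface; the `Consent` and `DecisionRequest` types, their fields, and the provider names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Consent:
    data_element: str
    purpose: str
    provider: str  # hypothetical: who holds the documentation of consent


@dataclass
class DecisionRequest:
    consumer_id: str
    purpose: str
    data: dict
    consents: list = field(default_factory=list)


def usable_data(request):
    """Split the payload into elements consented for this purpose and the rest."""
    consented = {
        c.data_element for c in request.consents if c.purpose == request.purpose
    }
    allowed = {k: v for k, v in request.data.items() if k in consented}
    dropped = sorted(set(request.data) - consented)
    return allowed, dropped


request = DecisionRequest(
    consumer_id="consumer-123",
    purpose="credit_decision",
    data={"income": 50000, "location": "NYC"},
    consents=[
        Consent("income", "credit_decision", "bureau_a"),
        Consent("location", "marketing", "app_b"),
    ],
)
allowed, dropped = usable_data(request)
```

In this sketch only `income` reaches the model, because the consent recorded for `location` covers a different purpose.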

MICHAEL KRIGSMAN: Transparency, explainability, consent, and the ability to redress potentially wrong decisions or poor decisions on the part of the AI. 

SCOTT ZOLDI: Exactly, exactly.

MICHAEL KRIGSMAN: Scott, what advice do you have for business leaders who want to use data more effectively in their company? 

SCOTT ZOLDI: So my advice would be to really focus for each business problem that they’re trying to solve. What is the data that they require? What is absolutely critical or necessary? And they may have to partner with their analytics teams to figure that out. But then to really focus on can we make the decision in batch? Could it be done in streaming? Would it be done in real time? 

And each of those comes with a different set of considerations, things like latency windows and things like SLAs. And that will really focus them on the technology path that they need to take with respect to data: tying the data that we have access to, to the decisions and the constraints around those decisions that need to be made. And just really laying that out and getting a framework for that.

Too often I see companies that will focus on, oh, we’re just going to use a streaming analytic, a streaming data environment, to solve a problem. But it won’t solve all problems. And so I think really focusing on those three different areas of how we work with data, as a first principle, and getting the requirements down for how and when decisions need to be made, would be the very first step in kind of really focusing their journey of how to better leverage that data.

And again, only the necessary data to limit what’s being ingested that may not be needed for certain types of decisions. 
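Editor’s note: The advice above, to start from each decision’s latency requirement and then choose batch, streaming, or real time processing, can be sketched as follows. The cutoff values are purely illustrative assumptions, not figures from the conversation; the 20 versus 25 millisecond check mirrors the SLA example quoted for this episode.

```python
from enum import Enum


class Mode(Enum):
    REAL_TIME = "real_time"
    STREAMING = "streaming"
    BATCH = "batch"


# Illustrative cutoffs only; real values depend on each business problem.
REAL_TIME_MAX_MS = 100
STREAMING_MAX_MS = 60_000


def processing_mode(latency_budget_ms):
    """Map a decision's latency budget to a processing mode."""
    if latency_budget_ms <= REAL_TIME_MAX_MS:
        return Mode.REAL_TIME
    if latency_budget_ms <= STREAMING_MAX_MS:
        return Mode.STREAMING
    return Mode.BATCH


def meets_sla(elapsed_ms, latency_budget_ms):
    """A decision that arrives after its latency budget is too late."""
    return elapsed_ms <= latency_budget_ms


fraud_mode = processing_mode(20)            # card authorization style budget
overnight_mode = processing_mode(3_600_000)  # an hour is fine for this decision
late = not meets_sla(elapsed_ms=25, latency_budget_ms=20)
```

Writing the budget down per decision first makes it clear that a single streaming environment will not fit every problem.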

MICHAEL KRIGSMAN: Great advice. Being clear about the business goal and the types of data that you actually need to solve that problem, as opposed to getting inundated with everything and the kitchen sink.

SCOTT ZOLDI: Exactly, yeah, making sure that we understand the decision framework and what’s needed to support that. 

MICHAEL KRIGSMAN: Scott Zoldi, Chief Analytics Officer of FICO, thank you so much for taking the time to speak with us in this very fascinating conversation. 

SCOTT ZOLDI: Michael, my pleasure. I think this topic is tremendously exciting. So I’m pleased to share a little bit of what we’ve done in this area and will continue to do. Thank you. 

MICHAEL KRIGSMAN: Thank you. A huge thank you to Redis for making this conversation possible. Thank you, Redis.
