Online Payments Risk Management
Part II. Organization and People
The Goals and Functions of a Payments Risk Management Team
What Does a Payments Risk Management Organization Do?
Every organization is different in its setup, the types of people it has, and its historical constraints. This is why I’m focusing on the functions that the RMP team should own, and on the relationships between those functions, rather than on who reports to whom. In describing these functions, I have in mind an RMP team that deals with a wide array of problems: merchant on-boarding, fraud, abuse, credit issues, and more. If you focus on specific domains, you can trim some of these functions; however, all of them should exist to some degree, even if your problem domain is more limited.
An RMP team needs to be able to quickly identify threats and loss drivers, quantify them and understand their origins, and use or develop tools that allow it to manage those threats while monitoring them. It needs to serve its own needs fairly independently, both because those needs are unique and because standard development cycles don’t work well with the ever-changing landscape of user behavior. Finally, it needs to grow and foster domain expertise, making sure that it is documented and shared across the organization.
In its broadest form, an RMP team should contain four functions: Operations, Decision Automation, Analytics, and Product.
Payment Risk Operations: Making Sure You Run Smoothly
Operations is the team in charge of day-to-day work. It is the one dealing with customers, using a set of tools to detect trends and provide stop-gap solutions. Ops is the gateway to your organization. This is the team into which you can hire inexperienced employees, train them, test them on the job, and promote them if they show talent. This is where domain experts grow, but the bulk of its work is making sure that behaviors that get past your automatic systems are detected, reviewed, understood, and dealt with. Its main deliverable is loss and false-positive prevention through manual decisions (since we’re trying to predict and prevent a loss-causing event, a false positive here is an application wrongly classified as “bad” and rejected). Accordingly, it is measured not only by prevented losses but also by decision accuracy, speed, efficiency per decision analyst, and response time to detected trends.
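As a rough illustration of how some of these measurements could be computed, here is a minimal Python sketch. The ReviewedCase record, its field names, and the assumption that the true outcome of every reviewed case eventually becomes known are all hypothetical; in practice, the outcome of a rejected application is often never observed and has to be estimated.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCase:
    """One manually reviewed case (hypothetical record layout)."""
    rejected: bool       # the reviewer's manual decision
    actually_bad: bool   # outcome learned later, e.g. from a chargeback (assumed known)
    amount: float        # money at stake in the case


def ops_metrics(cases: list[ReviewedCase]) -> dict[str, float]:
    """Summarize manual decisions: accuracy, false positive rate, prevented loss."""
    if not cases:
        raise ValueError("need at least one reviewed case")
    good = [c for c in cases if not c.actually_bad]
    # A false positive is a good case that was wrongly rejected.
    false_positives = [c for c in good if c.rejected]
    # A decision is correct when bad cases are rejected and good ones approved.
    correct = [c for c in cases if c.rejected == c.actually_bad]
    return {
        "accuracy": len(correct) / len(cases),
        "false_positive_rate": len(false_positives) / len(good) if good else 0.0,
        "prevented_loss": sum(c.amount for c in cases if c.rejected and c.actually_bad),
    }
```

Speed, efficiency per analyst, and response time would need timestamps and reviewer identifiers, which this sketch leaves out.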
Operations contains various subfunctions that have different expertise:
Consumer and Merchant Risk Ops are the two functions in charge of tracking and making manual decisions regarding your customers: underwriting, placing and releasing limitations on their activities, etc. These functions provide insightful root cause analysis and serve as domain experts for automation purposes. They operate detection tools and respond to alerts, both at application time (purchase, on-boarding) and throughout the customer’s lifecycle.
Fraud Prevention Support serves as RMP’s link to the world: answering customers’ questions about possible and actual fraud cases, working with your company’s legal department and law enforcement agencies (submitting Suspicious Activity Reports in the US or their equivalents worldwide, enforcing Anti-Money Laundering controls), and investigating fraud rings. This is not a customer-service function, although it may serve as a place to which senior customer-service agents can advance. At times, its members should serve as third- or fourth-line support for incoming inquiries, since this function’s expertise is understanding and communicating fraud activity and getting external stakeholders to assist your company in preventing it.
The Recovery Team (in consumer lending, sometimes called Collections or Credit Operations) is involved with recovering losses from customers who don’t pay. A lot of discussion about RMP focuses on quick and accurate decisions at the time of purchase or when on-boarding a merchant, yet much can still be recovered afterward: strong chargeback management firms demonstrate 60%–70% success rates in recovering chargebacks after those have been filed, and Collections teams report success rates (in canceling chargebacks) of up to 95%, depending on geography. That recovery is this function’s responsibility. By calling, emailing, and mailing customers, by challenging chargebacks with the acquirer, and sometimes by taking legal action, it gets you money that you thought was lost, and it is measured by its ability to do so.
Decision Automation: Allowing You to Scale
The Decision Automation function deals with creating, maintaining, and improving automated decision and decision-support systems. This includes models, feature engineering (the process of hypothesizing, designing, testing, and using indicators, or features, for the type of behavior we want to detect), and additional detection systems. In addition to modeling, this function should be in charge of most of the team’s prototyping activity: new tools for Ops, linking and velocity systems, and alternative data sources. It is also expected to be actively involved in data infrastructure and delivery, whether by working with DBAs (database administrators, who manage your data infrastructure and make sure it is up and working properly) or by implementing a data warehouse (a database optimized for analysis and reporting rather than for the read and write speed that production databases commonly require). As such, it must have some rudimentary engineering capability of its own in addition to working closely with (or including) the engineers in charge of RMP’s production code. This function has the biggest impact on your team’s major KPIs, losses and rejection rate, and should be measured accordingly.
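To make the idea of a velocity system concrete, here is a minimal Python sketch of one possible velocity feature: the number of events seen for a key (a card, an email address, a device) within a sliding time window. The class name, the choice of key, and the assumption that events arrive in time order are illustrative, not a description of any particular production system.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class VelocityCounter:
    """Counts events per key inside a sliding time window.

    Assumes events for each key are recorded in chronological order.
    """

    def __init__(self, window: timedelta):
        self.window = window
        self._events = defaultdict(deque)  # key -> timestamps still inside the window

    def record(self, key: str, when: datetime) -> int:
        """Record an event; return the key's event count in the window, including this one."""
        events = self._events[key]
        events.append(when)
        cutoff = when - self.window
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)
```

A rule or model could then use the returned count as a feature, for example flagging a card for review when it is used more than a chosen number of times in an hour. The right window and threshold depend on your business and would have to be tested on your own data.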
Analytics: Making Sure You Know What’s Going On
The Analytics function is your top-down eye on what’s happening in your portfolio, looking at it from various dimensions. This is the function into which you’d hire Finance folks and MBAs, and indeed, in some organizations, it sits within Finance. Analytics is the function that measures, analyzes, and presents your KPIs and performance, and it’s expected to identify trends and their drivers in an accurate and timely manner while making correct projections. Its major responsibilities are reporting current performance, forecasting future performance (and adjusting your provision, the funds you reserve on your income statement to offset future losses), and anticipating how the portfolio’s behavior will develop based on purchase inflow.
Product Management: Bridging a Rather Narrow Gap
The Product function is a rather established one and doesn’t require introduction. What’s important to understand is where it fits into the RMP team. Domain experts working with engineers on automation serve as product owners of sorts and do not require product support. Product managers are required, though, and they should focus on three areas:
Internal tools: Due to the complexity of the job and general lack of suitable tools, product is required to lead the work to develop new tools or adapt off-the-shelf tools, according to ever-changing needs. Review tools tend to have many distinct use cases, and they change rapidly; being able to generalize and use development resources wisely is important.
Data and decision infrastructure: Whether off-the-shelf (rare) or home-grown, data delivery (fetching and summarizing external and internal data for decision use) and decision infrastructure (real-time models) need to be developed and integrated. While this seems a straightforward task, it is often complicated by performance (on the decision side) and regulation (privacy, access control, and discrimination on the data-source side). As a result, being properly integrated with data sources and decision systems that work well for various countries is a strong competitive advantage and a barrier to new entrants.
Customer interaction: RMP teams often focus on real-time detection and some after-the-fact processing of claims. There is a whole world of opportunity in customer interaction that can help curb losses and improve customer satisfaction. From front-end authentication flows that challenge suspicious users to prove their identity to automated dispute flows, a savvy product team has to tackle the design and optimization of user interaction.
Hiring for Your RMP Team
How do you start an RMP team? How big should it be? When do you hire engineers, statisticians, and others? These are very common questions when approaching RMP for the first time. Even for experienced risk managers, the question of the ideal employee profile still stands. How do you build the team?
When discussing hiring, I constantly emphasize the need for domain experts. By that, I refer to people who understand the customers’ behaviors, needs, and their results both generally for the industry and specifically for your business. They have deep understanding of what this book is about — detecting, analyzing, and solving loss-causing problems. Most importantly, their understanding is anchored in actual work experience, having looked at and solved a large number of problems over a long period of time. Domain experts can be hired with experience, but a large number of them will grow in your organization, as they learn through operating your product and talking to customers.
Some Important Comparison Points
Every team is slightly different, but the following metrics are the most common and should drive your team composition:
Your eventual loss rate should be determined by your business model. For most online businesses accepting credit cards, that’s under 1% across your portfolio.
Your review rate (the percentage of purchases your team manually reviews) should be under 1% for mature segments and markets, and no more than 30% for new ones. If you aren’t close to these numbers, your automation effort is falling behind.
An individual reviewer should reach 100–200 reviews a day, depending on the type of cases they review. If you are not close to this number, your tools and procedures are lacking.
One decision-automation analyst can process insights from four or five review agents. You may need more analysts to support your general analytics activities. If you need more analysts than this ratio implies, either your data availability is bad and analysts must do a lot of manual work to gain access, or your analysts are not technical enough to overcome simple technical and analysis-automation issues (which could be as simple as removing all non-alphanumeric characters from a long list of phone numbers or splitting email addresses into username and domain).
One analyst can provide insights and feature requests to keep two or three engineers busy. If more engineers are needed, you may have an engineering efficiency issue. Since risk code is often in the core of any legacy code, this is quite common.
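The two simple cleanup tasks mentioned in the list above can each be done in a few lines. The following Python sketch is one way an analyst might do it; the function names are made up for illustration, and the email split only cuts at the last “@” without validating the address any further.

```python
import re


def normalize_phone(raw: str) -> str:
    """Remove every character that is not an ASCII letter or digit."""
    return re.sub(r"[^0-9A-Za-z]", "", raw)


def split_email(address: str) -> tuple[str, str]:
    """Split an address at its last '@' into (username, lowercased domain)."""
    username, sep, domain = address.strip().rpartition("@")
    if not sep or not username or not domain:
        raise ValueError(f"not a usable email address: {address!r}")
    return username, domain.lower()


# Applying the helpers to a whole list is a one-liner each.
phones = ["+1 (415) 555-0100", "415.555.0101"]
cleaned = [normalize_phone(p) for p in phones]  # ["14155550100", "4155550101"]
```

An analyst who can write and reuse helpers like these does not need an engineer for routine data preparation.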
If you run an established team and find that you are operating at completely different levels, you may have a unique business model and way of operation; however, most likely you are suffering from one of the issues I noted above or others. Fix them before throwing more bodies at the problem, as more people compound operational and infrastructure problems in the long run rather than solve them.
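To compare your own team against these comparison points, you can turn them into a back-of-the-envelope calculation. The Python sketch below does that. The function name and its defaults are assumptions: the defaults are simply the midpoints of the ranges given above (100–200 reviews a day, four or five agents per analyst, two or three engineers per analyst), and the result is a rough reference point, not a staffing plan.

```python
import math


def estimate_team(
    daily_purchases: int,
    review_rate: float,
    reviews_per_agent_per_day: float = 150,  # midpoint of 100-200
    agents_per_analyst: float = 4.5,         # midpoint of four or five
    engineers_per_analyst: float = 2.5,      # midpoint of two or three
) -> dict[str, float]:
    """Rough headcount implied by purchase volume and manual review rate."""
    daily_reviews = daily_purchases * review_rate
    agents = math.ceil(daily_reviews / reviews_per_agent_per_day)
    analysts = math.ceil(agents / agents_per_analyst)
    engineers = math.ceil(analysts * engineers_per_analyst)
    return {
        "daily_reviews": daily_reviews,
        "review_agents": agents,
        "analysts": analysts,
        "engineers": engineers,
    }
```

For example, estimate_team(50_000, 0.01) works out to about 500 reviews a day, which implies 4 review agents, 1 analyst, and 3 engineers. If your actual team is several times larger than such an estimate, that is a cue to look for the tooling, data, or engineering issues described above before hiring more people.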
What If I Don’t Have Anyone?
If you are just starting, you should start with two people: an operator and an engineer. The operator’s role is to be the first of your review staff, and one who will evolve into a domain expert and an analyst with time. Taking this approach ensures that the person in charge of automation will understand the ins and outs of your business before moving on to develop rules and flows for it. The operator needs to work for about six months at full capacity (hire additional people if needed) before you hire the first engineer. The engineer doesn’t have to be a machine learning expert (in fact, it is preferable not to hire one at this stage), because most of the work is going to be infrastructure and incremental work on developing features for detection. What this person does need is a good understanding of data and how to structure and store it for future analysis, as well as enough product sense to work with the operators, who will have a hard time articulating their needs at first. With time, the operations team will grow while its more senior people become decision-automation analysts, and the engineering team will expand to additional practices. Most teams do not need statisticians and machine learning experts before a whole year has passed.
Hiring Your First Operator
While hiring engineers is a highly debated practice, hiring operators is much less so. There are some experienced operators in payments, but the requirement to grow into analysts and participate in decision automation is often beyond what most operational people are willing to take on. How do you find the right people for this team?
No other area is as ripe for hiring young talent and giving them on-the-job training as RMP. You can provide these people with a structured development process that will benefit your whole team, while reducing your need for experienced talent.
Look for people with basic technical understanding and some statistical intuition; some, but probably not all, of your customer-care team members demonstrate this capability, and some can grow into Recovery/Fraud Support roles. Your domain experts must be able to generalize on trends they identify and translate them into features and rules at a reasonable level; you can test for that by asking candidates to devise ways to steal from you. The answers may surprise you. Balance well between quants and team members from diverse backgrounds. Quants do a great job building models but often not a good job in identifying and articulating phenomena, especially in corner cases or small numbers of occurrences that require a lot of assumptions and some intuition to connect the dots and explain a customer’s behavior. Social sciences and Humanities graduates will contribute to this aspect of your team. Some of my best hires were majors in music, theatre, psychology, and biology; that is what allowed us to, for example, build a repeat-offender system modeled after virus behavior and an analytic method based on sociological and criminological elements.
Try to hire engineering talent into your operational team, and do so early; hire ex-engineers looking to make a change into operational roles in the Consumer/Merchant Risk Ops teams.