Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

Session: FinTech


This session focused on current uses of machine learning in finance, specifically credit lending and credit card fraud prevention. We heard from two startups based in the San Francisco Bay Area: Upstart Network and Sift Science. Upstart Network is a lending platform that extends credit to young individuals with low FICO scores or nonexistent credit histories, using a range of fringe data parameters specific to each applicant. Sift Science uses machine learning algorithms to help businesses reduce credit card fraud and abuse.


Upstart Network: Jonathan Eng and Paul Gu

Upstart Network is an online lending platform that goes beyond the FICO score to finance people based on signals of their potential, including schools attended, area of study, academic performance, and work history. Their proprietary underwriting model identifies high quality borrowers despite limited credit and employment experience. Upstart offers 3-year fixed interest loans. Funds can be used for almost anything, including starting a business, paying for a coding bootcamp, eliminating student debt, or paying off credit cards.

Jonathan Eng: CTO at Upstart Network

Previous: Google Knowledge Graph, Brotankery

Jonathan laid out some of the interesting technical aspects of how Upstart evaluates the creditworthiness of an individual. Upstart uses machine learning techniques to train models that predict an individual's capacity to pay back a loan. While traditional financial institutions rely mainly on credit scores and cash flow, Upstart looks for alternative predictor variables, such as SAT scores, sign-up-to-application time, and link source, to better model creditworthiness.
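As a toy illustration of how alternative variables might feed a creditworthiness model, consider a logistic-regression-style scorer. The feature names and weights below are entirely hypothetical, invented for this sketch, and are not Upstart's actual model:

```python
import math

# Hypothetical feature weights -- purely illustrative, not Upstart's model.
WEIGHTS = {
    "sat_score_standardized": 1.2,        # standardized SAT score
    "signup_to_application_hours": -0.4,  # very fast applications may be a risk signal
    "trusted_link_source": 0.3,           # arrived via a trusted referral link
}
BIAS = 0.5

def repayment_probability(features):
    """Logistic-regression-style score: P(repay) = sigmoid(w . x + b)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

applicant = {
    "sat_score_standardized": 1.1,
    "signup_to_application_hours": 0.2,
    "trusted_link_source": 1.0,
}
print(round(repayment_probability(applicant), 3))  # ≈ 0.885
```

In practice the weights would be learned from repayment outcomes rather than hand-set, but the structure shows how non-traditional signals can enter the same kind of model that traditional underwriting uses.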

The way Jonathan sees it, Upstart is essentially predicting an individual's future income, which is highly correlated with loan repayment. Many young individuals either have no credit score or too short a credit history, and for lack of money they make short-term career optimizations. If an individual can instead use loan money to invest in themselves or develop a skill, Upstart is providing an opportunity that individual would otherwise never have.

Paul Gu: Co-Founder and Head of Product at Upstart Network

Previous: 404Market, DE Shaw & Co., Arizona Microcredit Initiative

"Algorithms are cheap but data is expensive"
Paul believes that big, established financial institutions are too conservative with their risk models, restricting themselves to models that only take into account traditional metrics such as credit scores. He sees these institutions as a classic example of the innovator's dilemma: novel approaches to modelling risk are turned down because members of review boards, who were the innovators of their time, believe that little can be done to improve on the systems currently in place. As a result, prospective borrowers without credit histories, such as recent college graduates, are (perhaps unjustly) viewed as high-risk simply because the current predictors have no explanatory power for them. Upstart hopes to fill this void of uncertainty by aggregating unconventional data sources such as education, area of study, and job history. Despite the claims about "massive amounts of data" being available, finding and joining usable data sets has been their biggest challenge.

Sift Science: Jason Tan

Sift Science has built an advanced fraud detection system. Using fast, large-scale machine learning to predict fraudulent behavior, Sift Science leverages a global network of fraud data. Clients can catch the fraud unique to their business and train a customized model to stop fraudsters in real time. This flexible, adaptive, and automated solution helps businesses of all sizes detect and prevent fraud before it happens.

Jason Tan: CEO and Co-Founder at Sift Science

Previous: BuzzLabs, Optify - Marketing in Real-Time, Zillow

Sift Science provides JavaScript snippets that its clients embed on their websites to track user activity. Favoring real-time solutions, Sift uses simple but powerful machine learning methods, such as naive Bayes classifiers and logistic regression, to evaluate the behavior of credit card users and classify transactions into three classes: genuine, ambiguous, and fraud. One kind of fraud they still cannot handle well is friendly fraud, also known as chargeback fraud, where the abuser is related to the person whose credit card is being abused. To tackle this multi-faceted problem, Sift Science uses a two-layer approach, combining one network-wide model with a per-customer model to rate every transaction with a "fraud score." Business logic then determines how to react to the fraud score (e.g. auto-accept, manual investigation, auto-reject).
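The two-layer scoring plus business logic that Jason described might be sketched as follows. The blending weight and decision thresholds are invented for illustration and are not Sift's actual parameters:

```python
def fraud_score(global_score, customer_score, w_global=0.6):
    """Blend a network-wide model's score with a per-customer model's score.
    Both inputs are fraud probabilities in [0, 1]; the weight is illustrative."""
    return w_global * global_score + (1 - w_global) * customer_score

def decide(score, accept_below=0.2, reject_above=0.8):
    """Business logic mapping a fraud score to an action (example thresholds)."""
    if score < accept_below:
        return "auto-accept"
    if score > reject_above:
        return "auto-reject"
    return "manual investigation"

print(decide(fraud_score(0.05, 0.10)))  # auto-accept
print(decide(fraud_score(0.50, 0.60)))  # manual investigation
print(decide(fraud_score(0.90, 0.95)))  # auto-reject
```

Separating the learned score from the business logic lets each client tune its own risk tolerance (e.g. a gift card merchant might reject more aggressively) without retraining any model.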

As Sift is one of the many companies in the machine learning/big data revolution, it was interesting to hear Jason's thoughts on the current state of machine learning in the tech industry. Jason said that Sift Science subscribes to the "machine learning for money" philosophy: they prefer simple, very fast algorithms over more accurate but slower models. He reasons that many state-of-the-art techniques, such as deep neural nets, offer an accuracy gain that is marginal compared to the increase in latency. He added that in modern industry machine learning, algorithms matter least while data quality and feature engineering matter most. This resonates with Paul's remark that algorithms are cheap and data is expensive. It is an interesting prospect: an increasingly data-centric world suggests enormous possibilities for machine learning in the future.


Fair Lending Practices

Do Alternative Credit Scores really help to prevent Financial Exclusion?

Credit scores matter greatly when individuals apply for personal loans to afford houses, automobiles, or other expensive items that may be substantial necessities for them and their families. However, credit scores can be a big hurdle for those who either lack a traditional credit history or have a blemished record due to circumstances beyond their control. Below we outline the alternative credit scoring techniques emerging alongside traditional credit scores, discuss their implications for socioeconomic issues, and argue that the Fair Credit Reporting Act (FCRA) should be broadened in scope to ensure that these emerging practices do not violate the law.
Traditional credit scores (e.g. FICO) consider only baseline credit data. Alternative credit scores break down into mainstream alternative and fringe alternative credit scores. Mainstream alternative credit score models use at least some mainstream alternative data, such as monthly utility bills, phone bills, and other regular payments. Fringe alternative credit score models, often labeled alternative credit decisioning tools (ACDTs) in industry, use at least some fringe alternative data, such as government records, shopping habits, social media profiles, location data, and analysis from web tracking. Several large companies, including the three major credit reporting agencies (CRAs), offer ACDTs that include data inputs such as criminal history, employment, tax records, property assets, and residential stability.

Various start-ups are also using ACDTs to make their own lending decisions and, in some cases, to provide underwriting for other lenders. One of these start-ups, ZestFinance, has garnered significant media attention and over $100 million in investment. According to ZestFinance’s founder, his vision in founding the company was “to make lower price [lending] products possible by turning the world of underwriting on its head, with Google-style big data analysis.” ZestFinance uses data that includes “everything from financial information to technology use” to generate credit risk assessments for use in its own lending as well as by third-party lenders.
The FCRA regulates credit bureaus and other entities that are “assembling or evaluating consumer credit information or other information on consumers for the purpose of furnishing consumer reports to third parties.” These consumer reporting agencies, which sell data for specific decision-making purposes, are required to maintain relevant, accurate data and to ensure that such data is used only for purposes permissible under the law. The Equal Credit Opportunity Act (ECOA) can be summarized as follows:

  1. Lenders cannot discriminate against protected class (e.g. race, gender)
  2. Lenders cannot intentionally use proxies for protected class (e.g. zip code as a proxy for race)
  3. Lenders cannot, through use of variables, have disparate impact on protected class unless there's a valid business reason.

While the first two points are fairly clear-cut, the third seems rather nebulous; indeed, the wording was left ambiguous so that the Federal Trade Commission could tune the parameters as needed to strike a balance between consumer protection and free-market values.
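One common heuristic for testing the third point is the "four-fifths rule" borrowed from employment discrimination law: if a protected group's approval rate falls below roughly 80% of the reference group's, the practice may be flagged for disparate impact. A minimal sketch with synthetic data (the data and the 0.8 threshold are illustrative, not from the session):

```python
def approval_rate(decisions):
    """Fraction of approvals, where 1 = loan approved and 0 = denied."""
    return sum(decisions) / len(decisions)

def adverse_impact_ratio(protected_decisions, reference_decisions):
    """Ratio of approval rates; values below ~0.8 (the 'four-fifths'
    heuristic) are often treated as evidence of disparate impact."""
    return approval_rate(protected_decisions) / approval_rate(reference_decisions)

# Synthetic illustrative data
protected = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]  # 30% approved
reference = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 70% approved

ratio = adverse_impact_ratio(protected, reference)
print(round(ratio, 2))  # 0.43 -- well below the 0.8 heuristic
```

A lender could still defend such a disparity with a valid business reason under point 3, which is exactly why the regulation's vagueness matters: the threshold is a screening heuristic, not a bright-line rule.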

Section 607(b) of the FCRA, 15 U.S.C. §1681e(b), requires consumer reporting agencies to follow reasonable procedures to assure maximum possible accuracy of the information in a report about an individual. It seems questionable that companies like ZestFinance can make reasonable and accurate predictions about an individual from noisy data and unproven assumptions. There may be thousands of parameters to consider, and non-parametric, machine-learning-based approaches to data mining yield general accuracies that might not hold, with margins of error that are not acceptable for producing fair credit scores. Additionally, because of the opacity of the models, ACDTs may not be able to provide FCRA-compliant accuracy-dispute procedures. Section 15 U.S.C. §1681q prohibits obtaining information on a consumer from a consumer reporting agency under false pretenses. These provisions could be violated if lending start-ups obtain non-essential, perhaps even non-permissible, information from consumer reporting agencies, such as whether people capitalize letters in forms or how long an individual spends reading the terms of a service, and use that information in credit score models to analyze the risk of investing in such individuals.
The use of traditional credit scores traps consumers without credit histories by requiring them to build credit in order to gain access to the very products they need to build it. Alternative data in credit scoring therefore has the potential to extend financial services to people who have not previously had access. However, alternative credit data, in particular the non-credit-related data used in fringe alternative models like ZestFinance's, could also discriminate against consumers and perpetuate inequities in the financial system. Not only could ACDTs run afoul of the FCRA, they also raise important policy issues. Racial profiling has the power to take myths about race and turn racial differences in criminality into reality; similarly, if members of minority groups are assessed as higher risk, they will be more likely to receive high-interest loans and thus more likely to struggle to repay them. This interplay does not even consider other areas of discrimination, such as employment, education, and housing. Because of these decision support systems and their unintended consequences, there is an urgent need for preemptive regulation to identify and address such consequences before significant harms occur. We should question the use of these systems at several levels: the data may be wrong, the model may be wrong, or the data may in fact be correlated (due to historical discrimination) without any proof of causality.
While ACDTs do have the potential to provide financial services to people excluded from lending based on traditional credit scores, they must be thoroughly assessed to identify legal and policy issues. Even though ACDTs may not directly use variables widely viewed as discriminatory, many of the variables in their models may serve as proxies for race or other unfair factors. ZestFinance CEO Doug Merrill has said himself that he has no idea why people who fill out forms in all capital letters are higher risk (though one can imagine this may relate to lower education levels or to having filled out few forms due to limited internet or computer access). The models’ inscrutability, even to their developers, makes it difficult for consumers, lenders, and regulators to understand the risk scores they generate. The FTC should take a proactive approach to monitoring ACDTs for FCRA and ECOA violations, and should consider expanding the coverage of these regulations given the changing climate of credit risk assessment.

Thought Experiment: The Oracle Lender

Consider a situation in which you had all the data in the world and a perfect model. What would be the fairest interest rate for people you knew would not default? Upstart co-founder Paul Gu suggests that these people deserve a rate as close to the federal funds rate as possible. Based on his current data, he estimates that roughly 83% of people fall into this category. The question remains: what is the most ethical interest rate to charge the others? Do they at least deserve a loan with a high interest rate? Or should people who, based on present data, are known to default on future loans be denied them, with the onus on them to change their circumstances and prove their worthiness otherwise?
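The oracle-lender framing can be made concrete with a simple break-even calculation. Assuming total loss on default, no servicing costs, and a funding cost f (e.g. the federal funds rate), a lender breaks even on a one-period loan when (1 - p)(1 + r) = 1 + f, giving r = (1 + f)/(1 - p) - 1. This toy model and the 0.25% funding rate are our own illustration, not Upstart's pricing:

```python
def break_even_rate(default_prob, funding_rate):
    """Interest rate at which expected repayment just covers the cost of
    funds, assuming total loss on default and no other costs:
        (1 - p) * (1 + r) = 1 + f   =>   r = (1 + f) / (1 - p) - 1
    """
    return (1 + funding_rate) / (1 - default_prob) - 1

# A borrower known never to default: the fair rate collapses to the funding rate.
print(round(break_even_rate(0.00, 0.0025), 4))  # 0.0025
# A borrower with a 20% default probability needs a much higher rate to break even.
print(round(break_even_rate(0.20, 0.0025), 4))  # 0.2531
```

This makes the ethical tension explicit: with perfect knowledge, the "fair" rate for sure repayers is essentially the funding rate, while the break-even rate for likely defaulters grows without bound as p approaches 1, at which point pricing the loan at all becomes a denial in disguise.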

Fraud Detection

Fraud detection encompasses the techniques used to identify individuals or groups who intentionally and secretly deprive others of something of value. Traditional methods of data analysis have long been used to detect fraud; they require complex and time-consuming investigations spanning domains such as finance, economics, business practices, and law. Fraud often consists of many instances or incidents involving repeated transgressions using the same method; fraud instances can be similar in content and appearance but are usually not identical. Sift Science CEO Jason Tan estimates that 70% of fraud is organized crime.

Examples of Identifying Features of Fraudsters

Fraudsters tend to work at a large scale; therefore, while they try to appear as normal as possible, in the long run their activity shows trends uncharacteristic of everyday users. Scaling their operations often leads to recognizable patterns such as:
  • Claiming to live in Alaska (AK is the first option in most drop-down menus)
  • Claiming to be born on January 1st
  • Having more numbers in their username
  • Having strange word combinations in their username
  • Navigating straight to the checkout cart
  • Not capitalizing items in a form
  • Having a history of fraud-like behavior fingerprinted on their machines
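A naive way to combine red flags like those above is a weighted rule score. Real systems such as Sift's learn their weights from labeled data; the weights below are made up for illustration:

```python
# Illustrative weights for the red flags listed above; real systems learn these.
SIGNALS = {
    "state_is_alaska": 1.0,
    "dob_january_first": 1.0,
    "many_digits_in_username": 0.5,
    "odd_word_combo_in_username": 0.5,
    "straight_to_checkout": 1.5,
    "no_capitalization_in_form": 0.5,
    "device_fraud_fingerprint": 3.0,   # strongest signal: prior fraud on this machine
}

def red_flag_score(user_signals):
    """Sum the weights of the red-flag signals present for this user."""
    return sum(SIGNALS[s] for s in user_signals if s in SIGNALS)

suspicious = red_flag_score(
    {"state_is_alaska", "dob_january_first", "straight_to_checkout"}
)
print(suspicious)  # 3.5
```

No single flag is damning on its own (plenty of legitimate users live in Alaska), which is why such signals are combined, and why learned models that weigh hundreds of features outperform hand-tuned rules like this one.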

Fraudsters also tend to visit some websites and apps more than others:
  • Charity websites - due to the ease of donating, fraudsters use these sites to check whether or not a credit card is still active
  • Gift card pages - cashing out gift cards is fast and often hard to trace
  • Dating sites - fraudsters prey on love-seekers to use as reshipping mules
  • Referral pages - fraudsters use their stolen identities to game referral systems