Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

Tuesday, October 13

Class 3: Identity and Trust

• 3:30 Reddit - Steve Huffman (co-founder and CEO)
• 4:30 Uber - Silvanus Lee (in charge of product analytics)
• 5:30 Airbnb - Alok Gupta (head of Trust and Safety) and Riley Newman (Airbnb's first data scientist, who joined in early 2010 when the company was still based in Brian and Joe's apartment. In those five years his role has broadened from analyzing the business to building the data science and data engineering teams. See his op-ed about what he’s learned through this experience.)

Part A - Reddit


Reddit was created as the "front page of the internet", essentially combining the mechanics of Delicious (which, interestingly, popularized the concept of tagging) with the content and community of Slashdot.

[Images: screenshots of Delicious and Slashdot circa 2005, two of the major influences on reddit.]

Originally, issues of identity and data were not a concern in the founders' minds. In the early days, reddit took an attitude of "don't ask for info we don't need". This worked well, since email addresses did not provide reddit with much value. It also led to two early decisions that would define reddit's future: the choice not to require email addresses, and the choice to use usernames instead of real names.

Personas and Identity

Because an email address is not required to register an account, and accounts do not have to correspond to your real-life identity, any single person can create a multitude of identities. For example, as reddit was growing, three of the top five redditors were the same person. But from looking at each username's posts you would never know, as each username had its own unique persona and identity.

Because this is possible, many people's real-life and reddit identities are vastly different. And because there is some level of anonymity (but not complete anonymity), conversations are often more genuine. This creates a healthy dynamic: there are few consequences in your real life, so people become more authentic, but there are still some consequences, since you are not completely anonymous (as you are on 4chan, for example).

Flexible Identity

As mentioned before, reddit does not force you to use your real name, so your identity is a lot more flexible. Your identity is defined simply by the age of your account and your posting history. Furthermore, because creating a new username is easy (and does not require you to give additional information), throwaway accounts are popular: they let you disconnect some of your more revealing thoughts from the reputation and identity of your more serious reddit accounts.
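
As a rough illustration of identity defined only by account age and posting history, here is a minimal Python sketch. The `Account` class, the `identity` helper, and the example usernames are hypothetical and were not shown in the talk; this is not how reddit actually stores accounts.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Account:
    """Hypothetical pseudonymous account: no email, no real name."""
    username: str
    created: date
    posts: list = field(default_factory=list)

    def age_days(self, today: date) -> int:
        # Account age is one half of the identity described in the notes.
        return (today - self.created).days


def identity(account: Account, today: date) -> dict:
    # The other half is the posting history; nothing here links to a person.
    return {
        "username": account.username,
        "age_days": account.age_days(today),
        "post_count": len(account.posts),
    }


# One person can hold both accounts; the data gives no way to tell.
main = Account("longtime_user", date(2010, 1, 1), ["post a", "post b", "post c"])
throwaway = Account("throwaway_123", date(2015, 10, 12), ["a revealing post"])

today = date(2015, 10, 13)
print(identity(main, today))
print(identity(throwaway, today))
```

Because no field ties the two accounts together, the throwaway starts with no reputation, and the main account's reputation is unaffected by what the throwaway posts.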

Part B - Uber



Silvanus Lee, Head of Product Analytics

Driving Data Science @ Uber

Data that influences decisions made in Uber:
1) The Network
2) The Traffic Patterns
3) The Demand
4) The Supply
5) The Pricing

Using Data to Drive Innovation

1) Allows Uber to utilize resources more efficiently
2) Drives price & costs down while driving value up
3) Creates opportunities for more Uber partners
4) Improves transportation for more Uber customers

Rating System

Uber builds trust between drivers and passengers through a rating system. After each ride, the passenger rates the driver out of five stars and the driver rates the passenger. These ratings build trust and let both drivers and passengers decide whom to share a car with.
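
A two-sided rating system like the one described can be sketched in a few lines of Python. This is an illustration only: the `RatingBook` class, the 1-to-5 bounds check, the use of a plain average, and the example names are all assumptions, not Uber's actual implementation.

```python
from collections import defaultdict
from statistics import mean


class RatingBook:
    """Hypothetical two-sided rating store; not Uber's actual system."""

    def __init__(self):
        # Maps a person (driver or passenger) to the stars they received.
        self._stars = defaultdict(list)

    def rate(self, person: str, stars: int) -> None:
        # Assumption: whole stars from 1 to 5.
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._stars[person].append(stars)

    def average(self, person: str):
        # None means "no ratings yet", which differs from a low rating.
        ratings = self._stars.get(person)
        return mean(ratings) if ratings else None


book = RatingBook()
# After the first ride, each side rates the other.
book.rate("driver_dana", 5)
book.rate("rider_raj", 4)
# After a second ride with a different passenger.
book.rate("driver_dana", 4)
book.rate("rider_sam", 5)

print(book.average("driver_dana"))  # 4.5
print(book.average("rider_new"))    # None
```

Returning `None` for someone with no ratings keeps a brand-new driver or passenger distinct from a poorly rated one.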

Part C - Airbnb

  • Humble Beginnings:
    • Chicken & the Egg Problem: You need demand from guests, but you also need a supply from hosts. How can you get people to be willing to host others in their own homes?
    • Domino effect in markets: Once Airbnb won the bigger markets, smaller markets saw the potential and followed quite easily
  • How do you get two strangers to trust each other when one is staying in the other's home?
    • Early adopters were far more adventurous, but Airbnb needed a way to appeal to the general public.
    • In the beginning, they trained models on people's past behavior to see what they would look for in future bookings. Through this, they could predict the more popular neighborhoods. For example, when someone books a home in San Francisco, the team knew to direct their searches to the Mission rather than to the Tenderloin
    • While it seems that they should focus only on guest happiness, they also needed a way to prevent established hosts from getting all the bookings while newer hosts got none
  • Airbnb thinks of itself as a trust company rather than a travel company
    • How to categorize trust? If trust could be measured via a graph, would it be a property of a node, or an edge?
    • Trust in Airbnb:
      • 1. New users need to trust in the Airbnb platform (first impression given by the website)
      • 2. Guests need to trust the host and their space
      • 3. Hosts need to trust the guest. This is perhaps the most precious piece of trust, as they are welcoming a total stranger into their own home/sanctuary
    • New users need to be verified and vetted:
      • Verified: Does their offline profile match their online profile?
      • Vetted: Can we trust this person in the Airbnb community? Is he/she a safe person?
      • Four sources of information: reviews and history of interaction, social networks, web searches, and private data stores (e.g. government records)
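
The question raised above, whether trust is a property of a node or of an edge, can be made concrete with a small Python sketch. The scores, usernames, and the averaging rule in `node_trust` are invented for illustration; the talk did not say how Airbnb represents trust.

```python
# Edge view: trust is directed and specific to a pair of users.
# (truster, trusted) -> score between 0 and 1 (made-up values).
edge_trust = {
    ("guest_ana", "host_ben"): 0.9,
    ("host_ben", "guest_ana"): 0.7,
    ("guest_cy", "host_ben"): 0.4,
}


def node_trust(edges: dict, user: str):
    """Node view: collapse incoming edges into one score per user.

    Averaging is an assumption; any aggregation would illustrate the point.
    """
    incoming = [w for (_, target), w in edges.items() if target == user]
    return sum(incoming) / len(incoming) if incoming else None


# Edge view keeps the asymmetry between the two directions.
print(edge_trust[("guest_ana", "host_ben")])
print(edge_trust[("host_ben", "guest_ana")])

# Node view gives one reputation per user, or None if nobody has rated them.
print(node_trust(edge_trust, "host_ben"))
print(node_trust(edge_trust, "guest_cy"))
```

The edge view preserves who trusts whom, while the node view yields a single reputation that a newcomer with no incoming edges does not yet have. That gap is what the verification and vetting steps are meant to fill.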