Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

Monetizing Data


This second class was divided into three thematic parts:
Part A - Data and Decisions (video) (audio)
Part B - Social Data Business Models (video) (audio)
Part C - Data Ownership (video) (audio)

Part A - DATA AND DECISIONS


For something to be monetized, it obviously has to offer value to potential purchasers. Data only has value when it impacts your decision-making; otherwise, it's a mere curiosity.







Social data can impact decisions that we make on a regular basis.

  • Which cafe is less crowded? (Yelp, Foursquare)
    • Other than voluntarily shared geolocation data, credit card swipe data (MasterCard) can also reveal this information


  • Which company should I work at, and how can I pass the interviews there? (Glassdoor)
    • Is it ethical or permitted to share specific interview questions?


  • What should I buy? (Amazon, Airbnb, Zillow)
    • Real estate deals used to be predicated on asymmetric information controlled by agents; now the balance has shifted
    • "Sponsored" reviews may induce bias


  • Which news-site/blog/paper should I read? (Twitter, Facebook)
    • Your online friends may put you into an ideological/partisan echo chamber


  • Which clothes should I wear?
    • No real "social appraisal" based on data, for now. Could there be one?
    • Style is subjective; how do you know/decide what looks good?
    • Fashion changes quickly. Branding used to be fashionable, now not as much.


  • Should I obey the speed limit? (Waze)
    • In other words, will I get caught if I speed?
    • Is it fair to require everyone to obey the same speed limit, regardless of driver skill/focus and car capability?
    • Would it be practical to determine an individual speed limit for each car based on OBD-II and GPS data?
      • Similar to dynamic auto insurance pricing based on some "safe driving" metrics like sudden stops
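A "safe driving" metric such as sudden stops could be derived from timestamped speed readings. The sketch below is only an illustration: the 3.0 m/s^2 threshold, the function names, and the sample format are assumptions, not something presented in class or used by any insurer.

```python
from typing import List, Tuple

# Hypothetical threshold: deceleration above this (in m/s^2) counts as a sudden stop.
HARD_BRAKE_MS2 = 3.0


def count_hard_brakes(samples: List[Tuple[float, float]],
                      threshold: float = HARD_BRAKE_MS2) -> int:
    """Count sudden-stop events in (time_s, speed_m_per_s) samples.

    Consecutive intervals above the threshold are merged into one event.
    """
    events = 0
    in_event = False
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order readings
        decel = (v0 - v1) / dt
        if decel > threshold:
            if not in_event:
                events += 1
            in_event = True
        else:
            in_event = False
    return events


def hard_brakes_per_100km(samples: List[Tuple[float, float]],
                          distance_km: float) -> float:
    """Normalize the event count by distance so drivers can be compared."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return count_hard_brakes(samples) / distance_km * 100


# Made-up trip: two separate braking episodes.
trip = [(0, 20), (1, 20), (2, 15), (3, 10), (4, 10), (5, 10), (6, 6), (7, 6)]
print(count_hard_brakes(trip))          # 2
print(hard_brakes_per_100km(trip, 50))  # 4.0
```

A real pricing model would need far more than one metric, and the fairness question raised above still applies to whatever threshold is chosen.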

external image tornadoguard.png


Given the massive amount of social data in the world, computers may be better than us at making key decisions.

  • People are very good at seeing patterns...unfortunately, even those that don’t exist.
  • Traditional product appraisal is becoming obsolete because modern machine learning models are so effective on huge social data sets.
    • It's hard for individual consumers to compare two similar products on Amazon with equal review averages, but machine learning analysis can quickly reveal key differences between the products. See comic above for an extreme example.
  • Social data has shifted the balance of power between individuals and governments, employees and employers, patients and doctors, students and teachers, children and parents, etc.
    • The Chinese government only began to issue an accurate daily pollution index report after U.S.-embassy-sourced air quality warnings previously dismissed as "propaganda" were independently verified by individuals sharing DIY air pollution readings on social media (WeChat, Weibo, etc.).


  • "Nudge": We like the illusion of control. The world might be shaped by how Amazon/Facebook/Google and other data refineries want us to see it through metrics that they control, but we like thinking that we can shape our world ourselves.

    • A bit more on nudging: you encounter choice architecture every day without realizing it, from where a shampoo sits on the store shelf relative to other products to which link you click first when googling something. In recent years, the literature on choice architecture has been taken more seriously. Some cool stuff:
      • The UK government implemented a "Nudge Unit" to help people make better decisions.
      • The Obama administration recently started an initiative to create a similar nudge team in the US.
      • The Singapore government got more people to pay their bills on time simply by printing the bills on pink paper rather than the usual white.


  • About 30 years ago, we literally entrusted our lives to a circuit in the form of anti-lock brakes (ABS): we trusted engineers we had never met to put a system into our five-thousand-pound killing machines, making it easier for the average driver to stop suddenly without locking the wheels and lessening the likelihood of a deadly accident on the freeway. It is not unrealistic to imagine a day when data is used to make our lives better without the stigma and aversion associated with it today.
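To illustrate the earlier point about two products with equal review averages, here is a minimal sketch using plain descriptive statistics rather than machine learning. The ratings are made up and the `summarize` helper is hypothetical; the point is only that a single average hides differences that even simple analysis can surface.

```python
from collections import Counter
from statistics import mean, pstdev


def summarize(ratings):
    """Return the average plus two signals the average alone hides."""
    counts = Counter(ratings)
    return {
        "mean": mean(ratings),
        "spread": pstdev(ratings),                        # how much reviewers disagree
        "share_one_star": counts[1] / len(ratings),       # fraction of 1-star reviews
    }


# Made-up ratings: both products average exactly 4 stars.
product_a = [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
product_b = [5, 5, 5, 5, 5, 5, 5, 1, 1, 3]

for name, ratings in (("A", product_a), ("B", product_b)):
    print(name, summarize(ratings))
```

Product B's one-star reviews are exactly the ones worth reading: they may describe a failure that the satisfied majority never encountered.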

Spam


  • It’s easy to make spam look like a regular non-spam message on an individual level. Many people subscribe to commercial e-mail lists (e.g. for coupons), after all.
  • But at a social graph level, you can see how many similar messages are sent across a network. How many other people are receiving a similar or identical message, and how do these people react to the message? Do they delete it instantly or read it?
  • “We are at the mercy of refineries” with regard to which messages they choose to filter out and which they choose to deliver to our inboxes.
    • What if Google delayed the delivery of all Apple e-mails to Gmail inboxes?
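The network-level view of spam described above can be sketched as follows. This is a simplified illustration, not how any real mail provider works: the `Delivery` record, the exact-match fingerprint, and both thresholds are assumptions made for the example.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Delivery:
    recipient: str
    body: str
    deleted_unread: bool  # how the recipient reacted


def fingerprint(body: str) -> str:
    """Hash of the body after lowercasing and collapsing punctuation/whitespace."""
    normalized = re.sub(r"\W+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def flag_campaigns(deliveries, min_recipients=3, min_delete_rate=0.8):
    """Flag messages that reach many people who mostly delete them unread."""
    groups = defaultdict(list)
    for d in deliveries:
        groups[fingerprint(d.body)].append(d)

    flagged = {}
    for fp, group in groups.items():
        recipients = {d.recipient for d in group}
        delete_rate = sum(d.deleted_unread for d in group) / len(group)
        if len(recipients) >= min_recipients and delete_rate >= min_delete_rate:
            flagged[fp] = (len(recipients), delete_rate)
    return flagged
```

Note that a coupon newsletter sent to just as many people would not be flagged if most recipients open it: the reaction of the network, not the content alone, separates the two cases.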

Part B - SOCIAL DATA BUSINESS MODELS

Everyone makes money somehow. There are 4 types of money makers:
1. Prostitutes: those who sell their time (e.g. consultants).

2. Pimps: those who sell other people’s time (e.g. hedge fund managers).



3. Product people: people who sell a product. Two subtypes:
  • Pornographers: high production cost (performers), low distribution cost per unit sold (digital distribution)
  • Drug dealers: low production cost (plastic baggies), high distribution cost per unit sold (more drugs)

4. Gamblers: people who buy and sell risk (e.g. insurance companies).


What is Andreas selling? Knowledge, the future, hope? Is there a fifth type of business: professors? In a way, professors resemble priests, in that they profess something to instill knowledge or hope for the future in us. Or are they a subset of type 1 (prostitutes), because professors sell their time?

Which one is Andreas?




Discussion:
  • If you had all the data at Amazon, LinkedIn, Google, or Facebook, what would you do with it, and in what new ways would you monetize it?
    • What hasn't already been done? Probably anything that would require new labor-intensive work from these refineries; any new venture needs economies of scale and has to dovetail with their core business models.
    • What rules must these companies abide by to be compliant with legislation? (What is stopping them from taking over the world?)
  • Would new ways of selling data to other services change the behaviors of users?
    • Privacy concerns rear their head once again!
    • What is the line between privacy and convenience? For example, targeted ads can make your life easier (it's like having a personal assistant who only presents you things that you would want! For free!), but at what cost? Do you feel comfortable having that personal assistant know everything about you? Definitely, the more s/he knows about you, the better s/he can do his/her job. What level of privacy are you willing to live with, though?
  • Should we view data as a product or an ingredient? It's a continuous cycle: data always spawns more data.
    • Is data itself the core revenue generator, or does the revenue flow from the application or analysis of the data? Both: compare Facebook and Amazon, respectively.
    • There is now so much data ("big data") that the focus is not only on the data itself but also on metadata.
Group presentation:
  • Amazon: Repackage the data they have and sell it to other companies that could make use of it (e.g. product manufacturers). Or sell financial-market signals, such as predictions of future stock prices based on current search trends (is this illegal?).
  • Facebook: Reorganize into a larger parent company, with Facebook as a subsidiary. The parent company would branch into many areas, such as peer-to-peer insurance and finance, because it has abundant data on social interactions. (Andreas: “friendsurance.” The kind of people you’re friends with on Facebook might one day affect your insurance premium and credit score. Not an unimaginable policy, considering it is often said that one is the sum of one's five closest friends!)

  • LinkedIn: Job application through LinkedIn. Incentives for recruiters and individuals to gain reputation for their hiring/referral actions.

  • Google: Using stored location data (Maps) in conjunction with newly collected data from workers of some corporate client moving about production facilities (Project Tango) to determine supply-chain inefficiencies for the client to act on

Joke: How do you make money on the internet? Gather 'round, youngins: hire a freelancer in Russia to build a website, then go to China and get millions of people to click on said website, and finally call Coca-Cola to advertise on it. A circle of outsourcing and passive income!

Part C - DATA OWNERSHIP

What is the difference between data and a banana? Data is not consumable or finite; I can share it with you and not have any less of it, and it never disappears. However, it only has value in large quantities (sample sizes!). Also, you can do almost anything you want with a banana, but you can only read or write data and have to stay in compliance with privacy laws.



Privacy is consumable, however. The more data is compiled about you, the less privacy you have; call this your "privacy burn rate."

  • Your search history reveals one aspect about you, your location history reveals another, your purchase history another, and so on.
  • The less privacy you have, the more valuable your individual social data contribution is.
    • Ad targeting can become eerily specific!
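One way to picture the "privacy burn rate" is to watch how quickly a crowd shrinks as attributes are combined. The sketch below uses a made-up population of six people and hypothetical attribute names; it only illustrates the idea that each additional data source narrows down who you could be.

```python
def anonymity_set_sizes(population, target, attributes):
    """Return how many people still match the target after each attribute is revealed."""
    candidates = list(population)
    sizes = [len(candidates)]
    for attr in attributes:
        candidates = [p for p in candidates if p[attr] == target[attr]]
        sizes.append(len(candidates))
    return sizes


# Made-up toy population; attribute names are illustrative only.
people = [
    {"name": "A", "home_area": "north", "search_topic": "camping", "last_purchase": "tent"},
    {"name": "B", "home_area": "north", "search_topic": "camping", "last_purchase": "novel"},
    {"name": "C", "home_area": "north", "search_topic": "recipes", "last_purchase": "tent"},
    {"name": "D", "home_area": "south", "search_topic": "camping", "last_purchase": "tent"},
    {"name": "E", "home_area": "south", "search_topic": "recipes", "last_purchase": "novel"},
    {"name": "F", "home_area": "south", "search_topic": "recipes", "last_purchase": "tent"},
]

revealed = ["home_area", "search_topic", "last_purchase"]
print(anonymity_set_sizes(people, people[0], revealed))  # [6, 3, 2, 1]
```

No single attribute identifies person A, but the combination of location, search, and purchase history does.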


Is data an individual good or a social good? Who owns it--the producer or the refinery? What does this ownership mean?


  • Presumably, if you have access to the data and control over who sees it, you own it. But data is always co-created, so the answer is subjective.
  • What are the relative limits of ownership, though?
    • Suppose that someone creates a Facebook post. You comment on it. They delete their original post; what happens to your comment? Is it deleted too? Or is it just left bodiless, sitting there all alone?



  • Depends on the situation:
    • One employee may be unwilling to share your data with you; another may be happy to share it.
    • You may feel that you own all the images you post on Instagram, as you control how they are distributed and attributed...unless Instagram decides to remove one of your images on the grounds that it contained "objectionable content."
      • Recent controversies - breastfeeding photos, etc.
  • Schrödinger's cat analogy: perhaps data ownership is not definitively determined until it is tested?


  • Data ownership is not an either/or question; rather, it is multidimensional.

TAKEAWAYS:
  • Decisions in the modern world all depend on social data.
  • Today, a company's business model can be based on social data. We’ve moved from data being used for optimization to data being used as the product itself (Facebook feed, for example).
  • The life cycle of data: Produce -> Refine -> Distribute -> Consume





Administrivia:

  • HW2, due Sunday, October 11 at 5pm. One assignment, two choices:
    • Formulate five non-trivial hypotheses about problems in the area of identity and trust. Describe what experiment you would design to test each hypothesis.
    • Formulate five non-trivial takeaways from the data safari that take into account your learnings and surprises.

Contributors:

Abhi Sharma, Chau Nguyen, Yuan Yuan