Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

Safari Info


Link to the Agenda:
https://docs.google.com/spreadsheets/d/12HBwjPVmmlgrxCdxVeRwnmUEJ4ChX5ao8_VI5UpXt7M/edit#gid=0

The discussions below follow the PHAME framework:

PHAME (copied from HW1 plus the email from Andreas)

The last century was all about the physical sciences, running experiments on particles to see how they interact. This century is about the social sciences. More and more we will now be running experiments on, and then instrumenting interactions between, humans. PHAME is a mnemonic for remembering the steps:
  • P - Problem statement:
    • what the Problem is and why (s)he should care
  • H - Hypothesis:
    • their Hypothesis, i.e. the key idea behind the solution
  • A - Action:
    • the action they took
  • M - Metric:
    • how you Measure whether it is any good
  • E - Evaluate:
    • the actual Evaluation (ideally with some surprising stories).

AME Cloud Ventures

On the bus, Jeff Chung shared some exciting stories from AME Cloud Ventures. Through these stories and AME's recent history, we can see how venture capital shapes the tech frontier. Here we go!

What does AME look at?

As a venture capital firm, AME works at the tech frontier. Because Yahoo started in search, AME, founded by Yahoo co-founder Jerry Yang, inherited a data-centric mindset. In 2012 the firm was looking at cloud infrastructure companies; some of those have since become market leaders with great impact on the industry.

Hypothesis

Accurate and actionable data will be vital to the future economy, and the value creation process will be largely driven by machine learning.

Action

They focus on seed- to later-stage companies devoted to building infrastructure and value chains around data. They have now invested in more than 57 companies.

Metric

The progress of these startups.

Evaluation

The impact brought by the startups.


Intel

At Intel, we explored Bob's strategic perspective, the opportunities and challenges of the IoT trend, automobiles, and issues in healthcare. Now, let's go into more detail.

What does Intel care about when it comes to data?

Intel's core business is integrated circuits, and the shift to a software-defined economy is now hitting the giant. Optimization, for example, is no longer only about manufacturing but also about getting better performance for specific applications that run on its chips. Since Intel provides the computational power for a large portion of the IT industry, the problem is how Intel can develop a predictive strategy and offer proactive services that help both existing corporations and emerging startups harness the power of data [1].


How do they formulate the hypothesis?

As mentioned in the talk, if Intel can help connect the 85% of smart devices that are currently unconnected, the resulting data can be used to create huge value for people. This value creation also benefits Intel. Intel faces strong competition from ARM in mobile devices; however, more data means more demand for cloud infrastructure, a market where Intel holds a dominant, near-monopoly position.

What do they do to verify the hypothesis?

They built an open-source platform, the Trusted Analytics Platform (TAP), which emphasizes high performance and security and enables software engineers to create value with data-centric methods. They are also investing effort in healthcare devices and services built around their smart wearables. Because healthcare data tends to be long-term and streaming, handling its volume will require more computational power.
[Figure: Intel Trusted Analytics Platform (TAP) diagram]

What's the metric?

Because the growth of the mobile device market keeps reducing demand for PCs, Intel's challenge is to strengthen its leadership in the server market [2]. Server market share, along with the uptake of its data services, will therefore be an important indicator of how its investment in innovation is paying off.

What is the actual Evaluation?

A heartbeat response test used to take a long time and a lot of effort to complete. Today, people can collect the same data with wearable devices. The richer dataset can support better treatment decisions, and long-term monitoring can help people understand the relationship between specific syndromes and metadata such as age or behaviours.

References
[1] Big Data in the Cloud: Converging Technologies
[2] Intel: Semi-Custom CPU Can Defend Its Server Market Share


Google

Host/Speaker: Qing Wu, Senior Economist at Google.

Google is arguably the biggest name when it comes to social data. Its ubiquity among Millennials allows it to collect enormous amounts of social data, which it uses for various purposes. That, paired with its ability to sift through petabytes of data within milliseconds, means that it can incorporate data into every facet of its business. Data scientists and statisticians at Google work across all divisions to ensure that this happens. According to Wu, this takes place at two levels:
  • Product Level
  • Non-Product Level

1. Product Level: As most people would anticipate, Google uses its data in its products, with Google Search being the most obvious example. At this level, software engineering meets data science. Data scientists work closely with engineers to build big data algorithms into products; for instance, Google Search uses indicators such as user search patterns, history, and Google+ account behavior to anticipate a user’s search query. Naturally, the more data that is available, the better the prediction algorithms perform. Hence it is paramount that Google uses as much data as possible: data is to Google what fodder is to cattle rearers.
In-depth look at Google Search (Infographic | Source: VerticalMeasures)

2. Non-product/Business Level: This involves working with the Sales, Marketing, and Legal divisions by providing them with data to improve their performance. Since many companies rely on their online presence to grow their businesses, Google has the perfect vantage point to help them by sharing data about website traffic, ad responses, etc. For example, Google held a ‘Google: Ignite Real Estate’ event in September 2015 to help real estate agents and businesses that use Google’s APIs maximize their profits, offering pointers on building online services that bring in more buyers. [Articles discussing key takeaways can be found here and here ]
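
The Product Level point above, that Search anticipates a query from past search behavior, can be illustrated with a toy example. This is only a minimal sketch under our own assumptions, not how Google Search works: the function name `suggest`, the sample `history` list, and the frequency-based ranking are all hypothetical and chosen for illustration.

```python
from collections import Counter


def suggest(prefix, history, k=3):
    """Return up to k past queries that start with prefix, most frequent first.

    Hypothetical illustration only: ranks by how often a query appears
    in the given history, ignoring case.
    """
    prefix = prefix.lower()
    # Count how often each matching query occurred in the history.
    counts = Counter(q.lower() for q in history if q.lower().startswith(prefix))
    # Sort by descending frequency, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in ranked][:k]


# Made-up search history for one user.
history = [
    "weather berkeley",
    "weather san francisco",
    "weather berkeley",
    "web design",
    "intel server",
]

print(suggest("we", history))
```

With a longer history the counts become more informative, which is the sense in which more data yields better predictions.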