Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

Social Data Revolution

INFO 290A-03, CCN 41638
Fall 2015 (First class: September 22)
202 South Hall, School of Information, UC Berkeley

This course is about the use, the importance, and the future of data. It is taught by Andreas Weigend, former chief scientist at Amazon.

Info Session

The info session was held on April 20, 2015.

Course Description

Welcome to the Social Data Revolution: Data of the People, by the People, for the People! This course will reveal the implications of the emergence of data, with particular emphasis on the future of health, work, finance, and education. The class welcomes a diverse group of students: both technical and non-technical students can learn from it, as long as they have a passion for data and are willing to put in the required work. In fact, people with diverse backgrounds are encouraged to take the course to bring different perspectives to the topics covered in class. This is not an easy one-unit course: there will be challenging assignments that help students gain a greater understanding of data science. There are five homeworks and one wiki this semester. The course will expose you to the many fields that data influence now and will influence in the future, so join us on this journey through the Social Data Revolution! At a glance, here is the list of class sessions:

The class meets in 202 South Hall (School of Information), University of California, Berkeley.
The classroom opens at 3:30pm. We start sharp at 3:40pm. There will be a 10-minute break (or in some classes a 30-minute breakout session instead). Class ends at 6:30pm.

If you are early, spend some time at Caffe Strada, 2680 Bancroft Way, Berkeley, CA 94704.
It takes about 10 min from the café to the classroom (walk towards the Campanile).

Wiki Contribution

The course wiki is an important element of the class. In the rapidly changing world of social data, there are no good textbooks that are the result of teaching and tweaking a course over years. So, every year, the students collaboratively create a snapshot of what we learned. At the beginning of the class, the wiki is quite barren. Two good examples for what the wiki pages will grow into during the term are
Please note that the wiki pages are not supposed to be transcripts. Also, please keep the underlying wiki markup clean. Text pasted from Word tends to contain many <span…, which makes it quite hard to fix things in the wikitext editor.
Students will form teams of 3 people, and each team will be responsible for creating a wiki page covering a specific topic from class. The team that signs up for a given week's topic needs to complete the wiki page by Thursday 9pm, two days after the class. One of our teaching assistants, Andrew Ho, will help you create a wonderful wiki page and will be there to answer any questions. It's truly a collaborative effort that enables you to internalize and share what you learn in class. We hope you will love it too!



Today, in a single day, humans create and record more data than all of mankind managed to produce from its beginnings to the year 2000. The early generation of Internet companies such as Amazon and Google pioneered algorithms including item-based collaborative filtering and PageRank to refine these data to help users make better decisions, changing how a billion people buy items and find information. In this class, we will examine the birth of the Social Data Revolution in the cradle of e-commerce and online advertising, and consider its present and future.
The physical world today is permeated with sensors: mobile phones, wireless routers, payment systems, traffic cameras, electronic door keys. The class focuses on what can be learned about people through the massive amounts of data these connected devices are feeding into the cloud, allowing the same analytical techniques built for a digital world to be applied to our physical world, turning academic exercises into daily reality. We look into the ways in which data have entered, been created by, and changed our lives, and consider a future filled with networked sensors.
Greg Tanaka, CEO of Bay Sensors, will share how physical stores are using cameras, microphones, and other sensors for decisions ranging from staffing (how many people are needed in the store, taking into account everything from the scheduled national TV advertising campaign to the weather forecast and the football schedule) to planograms (where to put what on the shelves) and pricing. If we had all the data available, how would you use the micro-emotions on the customer's face that the camera is picking up?

When we think of using data to make better business decisions, we think of social networks and the promise of online services customized for every individual. It is easy to overlook many other forms of data that are already being constantly generated through the course of doing business. Consider MasterCard, a company with data on billions of credit card transactions. How can we apply new techniques and ideas to these existing stockpiles of data to help people make better decisions, whether in combating fraud or determining retail trends? We look into unleashing the hidden potential of existing information silos by transforming them into data warehouses.

The nature of the relationship between employers and employees has remained largely unchanged since the Industrial Revolution. Corporate employment is an ingrained part of the cultures of many developed countries. However, the Internet is creating new opportunities and redefining work in the post-industrial economy. In the online outsourcing model demonstrated by companies like oDesk and the flexible work schedules of Silicon Valley tech workers, we see the physical constraints of employment being eroded in various ways. At the same time, availability of new forms of data is transforming how workers are evaluated and how they evaluate their employers. Are all forms of data fair game for the employer, or do we still need time to establish new norms of what personal data belonging to the workers should be protected? We consider the balance of power between the two sides and how that is shifting in ways that are enabled by data.

Learning --> Collaboration (People Analytics)
In many ways, our current education system evolved to answer the needs of the post-Industrial Revolution economy – to train disciplined workers adept in a variety of standard skills. But just as corporate employment is being transformed by the Social Data Revolution, so will education. The present Massive Open Online Courses (MOOCs) are still not too far removed from the industrial model of mass-produced unidirectional teaching. When the goal of making all course materials freely available online has finally been realized, what comes after? With all the personalized data we have, how can we make education more effective? Is it possible to objectively measure the potential of a student, and if it were possible, should we do it? How will social data transform the relationships between students and teachers, and between students and students? Which functions of the institution of college will remain, which will be transformed, and which will cease to exist?

Our well-being as humans is being augmented by data. Take a look at Kickstarter and you will find dozens of projects proposing wearables and sensors that will help us live better lives. Companies like 23andMe are applying analytics to understanding our genes and making the results shareable and social. Even the healthcare insurance system, cumbersome and full of legacy baggage, is gradually succumbing to the avalanche of data-driven business decisions that will transform the industry. As wearable sensors provide a deeper look into our daily routines, will understanding data better help us live healthier lives? How do we tell the difference between data that are easy to collect and data that actually help us understand the state of our health better? Is some information better than no information?

The advancement of human civilization can be correlated to technology improvements that make us more mobile. With the data that we collect, can we build smarter cities with connected communities? We learned how to optimize digital networks, but with the rise of self-driving cars and smart sensors, we now have the opportunity to do the same to our physical environments. Companies like Uber and Airbnb are attempting to improve the utilization of existing resources, while at the same time data-powered applications are helping people make more informed decisions about their travel plans. From flight loyalty programs to logistics networks, we see how the application of data is changing the way we get around and helping us travel more intelligently.

Data Ownership and the Future of Data
At the heart of the future direction of the Social Data Revolution lies the critical concept of data ownership. Who is in control of the data that we produce in our digital wake? And in our understanding of privacy as it relates to our digital data, what kind of implicit trade-offs are we really making? How much are we willing to pay for privacy? We also consider copyright and its manifestation in the digital world. Do we own the data that we inevitably create in our daily lives? In what sense can we exercise meaningful control over what we produce, both intentionally and incidentally, online? In asking these questions, we look towards the future of data.

Before each class, I hold office hours from 2 to 3pm in South Hall, Room 302.
After each class, we will go out for dinner with ~6-8 students (and guest speaker, if there is one).


I will briefly discuss the syllabus in the first class. Do come prepared with any questions you might have!


There is a total of 100 points.
50 points total for homeworks 0-4 (10 points each)
20 points for homework 5
30 points for wikispace contribution
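
As a quick check of the point breakdown above, the following minimal Python sketch adds up the components and confirms that they sum to 100. The constant names and the `total_points` helper are hypothetical and introduced only for illustration; this is not an official grading tool, and it makes no assumptions about how individual assignments are scored beyond the maximums listed above.

```python
# Maximum points per component, taken from the breakdown above.
HOMEWORK_0_TO_4_MAX = 10  # each of homeworks 0-4
HOMEWORK_5_MAX = 20
WIKI_MAX = 30

# Five 10-point homeworks, plus homework 5, plus the wiki contribution.
MAX_TOTAL = 5 * HOMEWORK_0_TO_4_MAX + HOMEWORK_5_MAX + WIKI_MAX


def total_points(hw_0_to_4, hw5, wiki):
    """Sum one student's points (hypothetical helper for illustration)."""
    if len(hw_0_to_4) != 5:
        raise ValueError("expected five scores, one for each of homeworks 0-4")
    for score in hw_0_to_4:
        if not 0 <= score <= HOMEWORK_0_TO_4_MAX:
            raise ValueError("homework 0-4 scores must be between 0 and 10")
    if not 0 <= hw5 <= HOMEWORK_5_MAX:
        raise ValueError("homework 5 score must be between 0 and 20")
    if not 0 <= wiki <= WIKI_MAX:
        raise ValueError("wiki score must be between 0 and 30")
    return sum(hw_0_to_4) + hw5 + wiki


if __name__ == "__main__":
    print("Maximum total:", MAX_TOTAL)
    # Made-up example scores, not real student data.
    print("Example total:", total_points([10, 8, 9, 7, 10], 15, 25))
```

The example scores in the last line are invented purely to show how the helper is called.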


Videos are uploaded within one week of class to
Audio recordings (mp3) and corresponding transcripts (docx) are at


I am excited to teach this class! It builds on my experiences as the chief scientist at Amazon. Like Google and Facebook, Amazon at its heart is a data company. I worked with Jeff Bezos on topics ranging from data strategy (how do we get customers to write reviews and add photos when useful?) and customer behavior (what experiments can we run to understand how people make purchasing decisions?) to discussing with each team their "fitness functions" (including, of course, the question of how a recommendation algorithm should be evaluated).

I left Amazon in 2004 and came back to the Bay Area to work with some startups and to teach: this course at Berkeley in the fall, and at Stanford (where I also got my PhD in physics) in the spring. My thesis was actually more in the area of what is now called Data Science: I developed neural networks for time series analysis and prediction. Between my postdoc at Xerox PARC and joining Amazon, I was full-time faculty, first as an assistant professor in Computer Science and Cognitive Science at CU Boulder, and then as an associate professor in Information Systems at NYU's Stern School of Business.

I am very fortunate to be an advisor to some great startups (including RocketFuel, see the panel Sketchy Data with its CEO before their IPO), and to work as an independent consultant with innovative companies including Alibaba, Lufthansa, and MasterCard. Five years ago, I founded the Social Data Lab. If you take this course, you will learn about some of the exciting problems we are working on. And in the last class, we will have some outside guests join us for the Fall 2015 Social Data Summit.

In recent years, every company seems to want to have its Big Data Day: I spoke at AT&T's in Dallas, Walmart's in Bentonville, and Tencent’s in China, and at top international conferences, such as the World Business Forum. Big Data has become the topic of highest interest to most delegates. If you want to know what I talk about, view the slides of a keynote, Transforming Big Data into Decisions, at an IBM event last June. And if you are interested in how I talk about it, you can listen to the audio recording of my talk Data is the New Oil.