Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

1) Intro from Andreas Weigend
2) Future of Retail Analytics from George Shaw
(Former VP of R&D at RetailNext)
3) Harnessing data to optimize the future of physical retail from Greg Tanaka
(Founder and CEO of Percolata)

1) Internet of Things
Many ordinary objects are now being connected to the internet through embedded sensors, forming a network called the "internet of things". Although these objects range from home temperature systems to Wi-Fi-based washers and dryers, the majority of sensors are located in our cell phones. With a rough estimate of one billion mobile phones globally and 10 sensors on each phone, we arrive at a base estimate of at least 10 billion sensors. By 2020, Gartner predicts that the world will have over 13 billion sensors.
Body cam case study
Although it is indisputable that these sensors generate a vast amount of information, the question remains: will the new sensors affect the current balance of power? Bodycams, personal cameras that record police interactions, are commonly proposed as a way to increase police accountability (1). A study published in the Journal of Quantitative Criminology found that "use of force was reduced by roughly 50%" in Rialto, California during a body camera trial (2). Some of these videos are viewable on /r/bodycam and range from routine traffic stops to shocking incidents where a suspect is shot on camera. However, for every public video of a suspect shot on camera, there are even more that police departments refuse to release. Los Angeles police denied access to footage of a homeless man being shot on camera, and subsequently adopted a new policy that never allows any footage to be released (3). Critics argue that body cameras cannot generate police accountability, because police departments still control the footage and can restrict public access. In other words, there exists "asymmetric information", a situation where one party (the police departments) has more information (the body cam videos) than other parties (the general public).
Individual recordings
On the other hand, we may not feel the same way about recordings made by entities other than police bodycams. During class, Andreas explored our uneasiness with being recorded by wearing Google Glass and making intense eye contact with two students. Some of us were concerned with the persistence of the data, others did not like the possibility that Google Glass wearers might not be paying attention, and still others were concerned with information asymmetry, because modern cameras can even detect heart rate.
Paradoxically, governments (and their agents) share the same aversion to being recorded, despite the fact that they are already recording us. As an example of the power of information, Chinese netizens successfully proved corruption charges against a food safety official because the official wore different Rolexes in different newspaper pictures (5). In this case, the official agreed to be photographed, and so essentially consented to the evidence that proved his guilt.
Next, we come to the issue of intrusiveness. Are audio recordings of us at grocery stores more or less intrusive than soundless video recordings? Audio initially appears to be less intrusive, because lip readers can generate transcripts from soundless video, whereas we cannot recreate movement or actions from audio. In actuality, audio can be used to reverse engineer geolocation if there is a network of microphones. In Oakland, California, the police department placed microphones on building roofs in order to detect gunshots rapidly, without relying on calls from residents. The technology works by "measuring the difference between the times that the sounds arrive at the sensors."
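
The time-difference idea quoted above can be sketched in a few lines of Python. This is only an illustration, not the method used by the Oakland system: the sensor layout, the 1000 m search area, the 10 m grid, and the constant speed of sound are all assumptions made for the example. The sketch tries every grid point and keeps the one whose predicted arrival-time differences best match the observed ones.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; assumed constant for this sketch


def arrival_times(source, sensors, emitted_at=0.0):
    """Time at which a sound emitted at `source` reaches each sensor."""
    return [emitted_at + math.dist(source, s) / SPEED_OF_SOUND for s in sensors]


def tdoa_error(candidate, sensors, times):
    """Squared mismatch between observed and predicted arrival-time
    differences, each measured relative to sensor 0."""
    travel = [math.dist(candidate, s) / SPEED_OF_SOUND for s in sensors]
    return sum(
        ((times[i] - times[0]) - (travel[i] - travel[0])) ** 2
        for i in range(1, len(sensors))
    )


def locate(sensors, times, extent=1000, step=10):
    """Brute-force search over a square grid (hypothetical search area)."""
    best_point, best_error = None, math.inf
    for x in range(0, extent + 1, step):
        for y in range(0, extent + 1, step):
            error = tdoa_error((x, y), sensors, times)
            if error < best_error:
                best_point, best_error = (x, y), error
    return best_point


if __name__ == "__main__":
    # Hypothetical rooftop microphones at the corners of a 1 km square.
    sensors = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
    # The emission time is unknown to the locator; only differences matter.
    times = arrival_times((300, 700), sensors, emitted_at=5.0)
    print(locate(sensors, times))
```

Because only the differences between arrival times are used, the locator never needs to know when the sound was actually emitted, which is why a passive network of microphones is enough to place a sound on a map.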

One student also pointed out that politicians' speeches are taken out of context and used as propaganda, which raises another concern with recordings: authenticity. In fact, even a combination of recording sources is insufficient to guarantee truthfulness. A woman named Zilla van den Born conducted an experiment in which she pretended to go on a month-long vacation to Southeast Asia but never actually left the country (6). She convinced friends and family by staging pictures of Thai food and Buddhist statues at locations in Amsterdam, and by conducting Skype sessions in front of a staged backdrop in her own apartment. In this situation, her friends and family had only one news source (Zilla), and they had no reason to distrust it.

Now we move to another question: "In this digital age, is it harder or easier for a dictator to commit genocide?" Some thought that since there are more sources of information, it is harder to prevent information from leaking. Governments are unable to "put the genie back in the bottle", because once information goes on the internet, the past cannot be rewritten. Not only do dictators have to stop individuals from posting things from their phones, but they also have to prevent satellites from recording the atrocities.

On the other hand, others pointed out that geolocation data from the same phones could allow governments to target specific groups of people. Governments can also overwhelm individuals with different news stories until citizens forget about important issues. For example, on Facebook in 2012, activists ran a campaign to get the world to care about the invisible children in Africa. It failed once the next big news story came along, a phenomenon known as the "24 hour news cycle". This incident is also an example of Plato's allegory of the cave: just as the cave dwellers understood the world through shadows on the wall, people learn about issues through the filter of Facebook and Google. Roughly half of all millennials use Facebook as a news source (7).
After considering all of these issues involving information, we asked: "What do we want in society for the future?" Fundamentally, we are no longer able to block others from recording us, so the only thing we can demand in return is mutual data access. If we demand the freedom to record back, this serves as a check against government agencies. Andreas proposed that in the future, we each keep our "recording preferences" on a device, and a conversation between two people can only happen if their preferences overlap. If both people are willing to have audio recordings but no video, then that is the type of conversation that happens. In contrast, if one individual wants full recordings (audio and video) but the other wants no recordings, then no communication can happen, since there are no overlapping preferences.
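
As a rough sketch of how such a preference check might work (this is our own illustration, not something Andreas specified), each person can list the recording setups they would accept, and a conversation goes ahead only if at least one setup appears on both lists. The names `setup` and `negotiate` are hypothetical, and so is the tie-breaking rule of picking the shared setup that records the least.

```python
from typing import FrozenSet, Optional, Set

# A "setup" is the set of media that would be recorded, e.g. {"audio"}.
Setup = FrozenSet[str]


def setup(*media: str) -> Setup:
    """Helper to write a recording setup; setup() means 'no recording'."""
    return frozenset(media)


def negotiate(mine: Set[Setup], yours: Set[Setup]) -> Optional[Setup]:
    """Return a setup both people accept, or None if there is no overlap.

    Assumption for this sketch: when several setups overlap, prefer the
    one that records the fewest media.
    """
    shared = mine & yours
    if not shared:
        return None  # no overlapping preferences, so no conversation
    return min(shared, key=lambda s: (len(s), sorted(s)))


if __name__ == "__main__":
    # Both accept audio-only recording: the conversation is audio-recorded.
    print(negotiate({setup("audio")}, {setup("audio"), setup()}))
    # One insists on full recording, the other on none: no conversation.
    print(negotiate({setup("audio", "video")}, {setup()}))
```

Modeling preferences as sets of whole setups, rather than as sets of individual media, is what lets "I insist on full recording" and "I insist on no recording" come out as incompatible instead of quietly defaulting to an unrecorded conversation.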

2) Future of Retail Analytics
George Shaw, VP of Research and Development at RetailNext, kicked off the second part of the class by showing the faces of eight shoppers in surprisingly close detail, considering that the images came from cameras installed in stores. This demonstrates the changing landscape of physical retail stores, which need extensive analytics to remain competitive with online retailers like Amazon (although physical stores still account for 92% of sales).

Magic Metrics
Metrics have evolved from rudimentary measures, like conversion rates and average transaction value, to tracing shoppers precisely throughout the store (using stereoscopic overhead cameras).
Detailed tracking allows for a multitude of experiments, from measuring why certain customers spend more time at certain shelves to tracking the efficiency of employees (for example, time spent going back and forth between breakrooms and bathrooms).

George used a children’s Ralph Lauren store to illustrate his points, mentioning how his company was able to see that purchasers spent more than double the time interacting with others in the store compared with non-purchasers (372 vs. 161 seconds), and had five times the interactions (15 vs. 3). RetailNext and similar companies are unlocking profits that retailers would otherwise not have access to: the hardware they supply (cameras) is a commodity, but the intellectual property they hold (understanding the data) is invaluable.

Data Visualization is Key
Another interesting thing George pointed out is that he is able to track the most profitable path through the store for the retailer. He mentioned a watch store that boosted profits by spreading inventory throughout the store and reducing the display of expensive watches that customers encountered on entering.

As an example, George presented path visualizations for different types of customers (e.g., female vs. male, people who bought vs. people who did not) and discussed how analyzing these paths can lead to interesting discoveries. One of the methods he discussed was building a connected probability graph of different positions within a store. The graph encodes the likelihood of a customer moving from one point in the store to another, and is updated as information from new customers arrives. Building on this discussion, Andreas pointed out that a metric like entropy, formulated long ago, can be used for cutting-edge analytics, which made an interesting point about combining well-worn methods with modern data solutions. It is easy to underestimate the value of traditional statistical methods.
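
To make the probability graph and the entropy remark concrete, here is a minimal sketch. It is our own illustration rather than RetailNext's actual method: the zone names and paths are made up, the class name `PathGraph` is hypothetical, and entropy is used here simply to measure how unpredictable a shopper's next move is from a given zone.

```python
import math
from collections import defaultdict


class PathGraph:
    """Counts zone-to-zone moves and turns them into probabilities."""

    def __init__(self):
        # counts[src][dst] = number of observed moves from src to dst
        self.counts = defaultdict(lambda: defaultdict(int))

    def add_path(self, path):
        """Update the graph with one shopper's sequence of zones."""
        for src, dst in zip(path, path[1:]):
            self.counts[src][dst] += 1

    def probabilities(self, src):
        """Estimated probability of each next zone, given the current zone."""
        moves = self.counts.get(src, {})
        total = sum(moves.values())
        return {dst: n / total for dst, n in moves.items()} if total else {}

    def entropy(self, src):
        """Shannon entropy (in bits) of the next-zone distribution."""
        return -sum(p * math.log2(p) for p in self.probabilities(src).values())


if __name__ == "__main__":
    graph = PathGraph()
    # Hypothetical shopper paths through a store.
    for path in [
        ["entrance", "shoes", "checkout"],
        ["entrance", "toys", "exit"],
        ["entrance", "shoes", "exit"],
        ["entrance", "shoes", "checkout"],
    ]:
        graph.add_path(path)
    print(graph.probabilities("entrance"))
    print(round(graph.entropy("entrance"), 3))
```

A zone with entropy near zero is one where almost everyone goes to the same place next; a zone with high entropy is one where shoppers scatter, which might be worth a closer look when planning layout.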

Finally, George emphasized that every metric has a trade-off, so metrics must be used in combination, but that analytics ultimately provide a wide and incredibly useful dataset.

3) Greg Tanaka, Percolata
Greg Tanaka, CEO and Founder of Percolata, started his presentation by raising a question: "What's the primary function of retailing?" The answer he gave is "Retailers act as a link between the manufacturer and the consumer." So far there are two kinds of retailing: web and real-world. Greg ran an informal experiment in class, asking how many of us shop only online or only offline. Nobody does, which backed up his point that physical and online stores will not be separated even in the future.

Later on, Greg talked about two different kinds of data sources. Active data sources require active user participation, while passive data sources are collected automatically by sensors in the environment. Both have pros and cons, which are listed in detail in the slides, but the key point is that we need to use them well and combine them to get more accurate results. For example, as a student mentioned in class, we can use the Facebook login location as an indication of where customers come from, and then use pressure sensors hidden in the store floor to detect their paths through the store.

On the question of how to determine where customers come from, Andreas suggested that stores could make a deal with credit card companies, which could provide detailed information about where customers used their cards before entering the store. This leaves an open question: would credit card companies share this data? We don’t know the answer, but the idea is worth trying.