Andreas Weigend | Social Data Revolution | Fall 2015
School of Information | University of California at Berkeley | INFO 290A-03

HW1 link:

Top HW1 Experiments:


1. The researchers at the Facebook R&D department wish to see whether the medium users choose for their posts is affected by the types of posts shown in their News Feed. For example, would the average (Digital) Joe post more photos to his timeline if his News Feed were flooded with his friends' and acquaintances' photos? What about shared links, or even status updates? This is very similar to a study carried out by psychologists from Facebook and the UCs that looked at the ability of Facebook status updates to cause emotional contagion on a massive scale (results of the study available here: ). While that study only looked at the emotions of users, this experiment focuses on the medium of the post (photos, status updates, questions, comments, notes and the like) and how it affects users. This study could point towards ways Facebook could effect an increase in a particular medium of post. For instance, if photo posts are lacking on a particular day, Facebook could alter people's News Feeds to show more photos from their friends.

2. Facebook increasingly creates a space for debate around current events. As people share articles, "like" the views of others, and start extensive back-and-forth in the comments on controversial topics, there is an opportunity to gather new levels of data on how people engage in these conversations and under what conditions different types of conversations spring up. This is also profoundly affected by the algorithms that determine which posts rise to the top of most users' feeds. When you click on someone's articles and like their posts, you are more likely to get more of their views later. Thus, Facebook creates a space for you with little ideological diversity. In this experiment, I would want to see how the range of ideas that users see in their feeds affects the types of behavior they engage in when it comes to sharing their views and potentially absorbing information. I would use a particular example of a highly charged current event, the debate over defunding Planned Parenthood, and alter the ways in which those posts are promoted. For some users, posts with views similar to their own would be promoted; for others, posts with opposing views would rise to the top.

3. There is research indicating the echo-chamber effect on Facebook is self-selected. Some sizeable subset of users are likely to have changed their opinion on various issues over the years. When a user prepares a post on a topic, if the user has previously addressed that topic with the opposite stance, the post could enter our study. Some of those instances would be controls and others would be tested with an intervention. The intervention would simply notify the user of the previous post with wording akin to: "Previously you posted: [insert post]. Would you like to edit your current post or link to your previous post?" This may prompt the user to change the wording of the post or add a link. The changed wording or the addition of a link may create higher quality discourse on Facebook. You should care because there is a great amount of political discourse on Facebook and this may improve the quality of that discourse.

4. It would be interesting to see if the analysis of your likes, page views, events subscribed to, friends, and messages can predict your life events in the near future. By life events, I mean where you will work, where you will live, and who you will be in a relationship with. Although there is some bias from people who post all their life events on Facebook, it would be interesting to see the results for those who have a distinct difference between their real-life decisions and what they display on Facebook. It's also a cool inference problem, as you are making a prediction based on causal dependencies between nodes in the social graph. I think it would be awesome to visualize which nodes in the social graph had the most influence on the prediction and how those influences change over time.

5. I have read that people tend to make "social comparisons" when they're on Facebook, and when they see any signs that peers are having a good time, they might get depressed (of course some people are more at risk than others). I want to run an experiment that uses people's Facebook habits to pinpoint those at high risk for depression and other mental illnesses, and find ways to help: for example, change the algorithm slightly so that these at-risk people see less of what might depress them (if a family member just passed away, do they want to see EVERYONE's condolences? if their pet just died, do they want to see everyone else's pets?) and more of other content.

6. I would like to see if there is a connection between success and certain actions on Facebook. Specifically, I would like to see if people with large social circles on Facebook are wealthier than those with small ones, as well as if they are happier.

7. After finding out what type of ideological views a user has, Facebook can randomly scatter articles from different points of view on News Feeds, and see if users will click on them, read them, mark them as not relevant, or block the news source. I want to see how biased Facebook is as a media source, since Facebook has moved beyond being just a social platform and become a news platform as well. According to a recent survey, "88% regularly use Facebook as a news source", and 70% of Facebook users will click on news stories posted by other people or suggested in the News Feed (api). Of the millennials surveyed who use Facebook, 70% believe that their News Feed is an "even mix of similar and different opinions", but I want to verify this claim.

8. Facebook users are sometimes hacked, sometimes by friends, but sometimes by people with more nefarious goals. Best case: the hacker posts a stupid status and logs out. Worst case: the hacker steals the user’s identity. With the new payment system being put in, a hacked user could also end up with an empty bank account. In this experiment, I would test whether locking accounts after suspicious or unusual activity catches and deters hackers. More explanation: if a user sets off certain triggers (listed below), lock the account and send the original user a text/email alerting them to the hacking and giving them a temporary password that they can use once to set up a new password. The user can also report whether it was actually them who triggered the hacking response. Possible triggers: more than three tries to enter the correct password (especially if the user typically gets it right on the first try); user logged in for a while (20 minutes?) with no activity; typing pattern differs from the user’s usual pattern (see article about each person’s typing “fingerprint” being different); main email is changed; main phone number is changed; large payment is attempted. Each of the triggers would have a different weight. Facebook can then sum the weights of detected triggers and lock the account when the sum gets over a certain level.
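
The weighted-trigger rule in experiment 8 can be sketched in a few lines of Python. This is only an illustration: the trigger names, the individual weights, and the lock threshold below are all made-up assumptions, since the proposal says only that each trigger has a different weight and that the account is locked once the sum passes some level.

```python
# Hypothetical trigger names and weights (assumptions, not from the proposal).
TRIGGER_WEIGHTS = {
    "failed_password_attempts": 3.0,   # more than three wrong password tries
    "idle_session": 1.0,               # logged in ~20 minutes with no activity
    "typing_pattern_mismatch": 2.0,    # typing "fingerprint" differs from usual
    "email_changed": 4.0,              # main email is changed
    "phone_changed": 4.0,              # main phone number is changed
    "large_payment": 5.0,              # large payment is attempted
}

# Hypothetical level above which the account is locked.
LOCK_THRESHOLD = 6.0


def risk_score(detected_triggers):
    """Sum the weights of the distinct triggers detected in a session."""
    return sum(TRIGGER_WEIGHTS[name] for name in set(detected_triggers))


def should_lock(detected_triggers, threshold=LOCK_THRESHOLD):
    """Lock only when the summed weight gets over the threshold."""
    return risk_score(detected_triggers) > threshold


if __name__ == "__main__":
    session = ["email_changed", "large_payment"]
    print(risk_score(session), should_lock(session))
```

In an actual experiment the weights and threshold would presumably be tuned using the users' own reports of whether a lock was a false alarm.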


1. Google is now beginning to tap into its massive repository of user-linked search history, indexed e-mails, location information, etc. to provide unsettlingly relevant information through Google Now (e.g. flight times, sports scores, etc.). The prediction model that I see reflected in my Google Now cards seems to be largely deterministic, based on my own user data, though. What if a probabilistic model, based on the activity of "similar users" (similar searches? locations? e-mails?), were used to attempt to predict what information I would want next?

2. I would like to see how search data leading up to elections can be used as a factor for predicting the outcome. I want to do this for federal and high state offices at all levels. You should care because Google could be used to predict outcomes of all elections, not just presidential ones (which we are already pretty good at).

3. I want to run an experiment on Google to see if we can create an algorithm that measures "daily demand" for certain products. The experiment will consist of monitoring people's daily searches and looking for trends that could feed such an algorithm. For example, when a new product is released there are a lot of searches online for it, showing heightened demand.

4. (1) Show a random sample of Google users a list of alternatives to the current dangerous site when the site has a malware/phishing warning. (2) Try showing users a cached copy of the compromised site, so they can see for themselves that the site is faulty or dangerous without having to visit the actual site. There are many dangerous sites on the internet (infected with malware, running phishing scams, etc.) and Google attempts to warn us with "safe browsing". However, the click-through rate (CTR) is still higher than ideal (Yahoo Tech). I want to examine what kinds of reasons would make someone ignore the warning and visit the site anyway. If Google can perfect the quarantining of dangerous websites, then more users will be safe.

5. Which ads do better for certain industries? Ads contribute to consumers' choices to buy certain products, and knowing which ads are popular could mean the difference between one level of profit and another.

6. If I were an outside developer and I had access to all of Google, I would try to reverse engineer their proprietary search algorithm. Websites frequently focus on search engine optimization, and while this does work, I believe that there is an easier method. Understanding the algorithm and how its various parts interact would allow outside parties to waste less time on optimizing their words and focus more on producing great content. Without worrying about SEO, sites could create longer-form content that is much more informative than a 200-word blog post that is perfectly optimized, and the world would be better off.

7. Google knows enough about me that in a not insignificant number of cases it can likely create pages of search results before I start searching. Users are, in some cases, now wasting time typing into the search bar. Creating a new button that links users to a dashboard of highly personalized resources may increase users' engagement with the information and reduce the amount of time and cognitive effort that goes into foraging for that information. This page, and truly anticipatory search, could also create a "now trending" or newsfeed-style platform for interaction.

8. What are the things that motivate people from traditionally underprivileged groups, and how do they differ from the things that motivate people from traditionally privileged groups? Do the habits of these two groups differ? Is the distinction between underprivileged and privileged binary, or is it a spectrum? Ultimately, do people really have the power to choose their future (as evidenced by the results of their habits or motivations), or is their future already chosen for them by situational circumstances? You should care because we're taking the traditional biological debate of nature vs nurture to data scale. This is no longer evolution of genes, but the fine tuning of humans as a species.

9. Could someone's search history accurately predict an attempted crime? For example, if someone's recent searches include a gun, dark clothing, police activity, etc., could that accurately predict whether the person would soon commit a crime?
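
For the "daily demand" idea in Google experiment 3, one very simple way to flag heightened demand is to compare each day's search count for a product against its own recent baseline. The sketch below is an assumption about how that might be done, not the proposal's method: the function name, the 7-day window, and the z-score threshold of 3 are all hypothetical choices, and the counts in the example are invented.

```python
from statistics import mean, stdev


def demand_spikes(daily_counts, window=7, z_threshold=3.0):
    """Return indices of days whose search count is far above the trailing baseline.

    daily_counts: list of search counts for one product, one entry per day.
    window: number of preceding days used as the baseline (hypothetical default).
    z_threshold: standard deviations above the baseline mean that count as a
        spike (hypothetical default).
    """
    spikes = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any increase at all stands out.
            if daily_counts[day] > mu:
                spikes.append(day)
            continue
        if (daily_counts[day] - mu) / sigma > z_threshold:
            spikes.append(day)
    return spikes


if __name__ == "__main__":
    # Invented counts: a steady week, then a launch-day surge.
    counts = [100, 102, 98, 101, 99, 100, 103, 400, 101]
    print(demand_spikes(counts))
```

A trailing baseline like this ignores weekly and seasonal patterns, so a real demand measure would likely need to account for those as well.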