Please note that this page covers two companies (please read the email).

[Image: Reddit logo]

Reddit is a platform where users submit content, such as text posts or direct links to websites. Such content can be factual in nature, or it can be a user’s thoughts on a specific topic (in which case he or she will comment on a thread or start one). Because Reddit only requires a username and password to create an account, with no email address or any other form of personal identification, accounts are essentially anonymous.

That anonymity makes the comparison Steve Huffman has drawn between Reddit and true human reflection a natural one. When an event is recorded and uploaded to the Internet, that footage is preserved forever, and our interactions on the Internet, such as on LinkedIn or Facebook, are preserved forever as well. Keep in mind that those communities tie user accounts to actual personal identities. To maintain a positive image, many of us are therefore very restrictive about our behavior online, and our online presence ends up being an inaccurate reflection of our true beliefs and selves. Thus, many believe Reddit to be a much more accurate representation of the current worldview.

[Image: sentiment analysis]

Problem/Measure: How can we use Reddit to gauge the general public’s consensus on events that impact it, such as an educational institution (like Cal) passing a new administrative policy, the US government passing a new law, or a social movement (#blacklivesmatter)? This would give us another angle of data supporting (or contradicting) an event, beyond traditional methods of gathering opinions such as polling.

Hypothesis (Solution): When an event occurs, we can build a system that scans Reddit for new threads about the event. There is far too much data to scan historical posts, and past sentiment on similar matters may differ from today’s anyway. Before applying text-based sentiment analysis, we must first filter spam using various anti-spam detection algorithms (including, but not limited to, checking whether a poster is a known spammer).
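The spam-then-sentiment pipeline above can be sketched in a few lines. This is a toy illustration only: the blocklist, the word lists, and the posts are made up, and a real system would use the official Reddit API and a trained sentiment classifier rather than a tiny lexicon.

```python
# Toy sketch: drop posts from known spammers, then score the rest with a
# tiny sentiment lexicon. All names and data here are hypothetical.

KNOWN_SPAMMERS = {"buy_cheap_meds", "free_giftcards"}  # assumed blocklist

POSITIVE = {"support", "great", "agree", "good", "love"}
NEGATIVE = {"oppose", "terrible", "disagree", "bad", "hate"}

def is_spam(post):
    """Very rough spam check: is the author on the known-spammer list?"""
    return post["author"] in KNOWN_SPAMMERS

def sentiment_score(text):
    """Score in [-1, 1]: (#positive - #negative) / #matched words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched

posts = [
    {"author": "student_a", "text": "I support the new policy, great move"},
    {"author": "buy_cheap_meds", "text": "great deals click here"},
    {"author": "student_b", "text": "I oppose this, terrible decision"},
]

clean = [p for p in posts if not is_spam(p)]
scores = [sentiment_score(p["text"]) for p in clean]
print(scores)  # [1.0, -1.0]
```

In practice the spam check would combine many signals (account age, posting rate, link patterns), and the lexicon would be replaced by a model trained on labeled Reddit data.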

Action/Evaluation: This data can be analyzed in various ways, for example by charting sentiment against time to see how it progressed as the event moved along (maybe public sentiment changed over time?). We can also analyze specific micro-sentiments (maybe a specific group agrees heavily with 75% of a policy change, but not the remaining 25%). This data can be used as feedback for the specific agency or organization sponsoring the event or movement. By providing another outlook on what the people believe, we provide another method to stem the influence of big corporations (e.g., the NRA) in today’s politics. How will this new data improve the end-user experience? It depends on the situation: in some cases, not at all; in others, it could dramatically provide another check against big corporations dominating today’s politics. If we can combat spam and irrelevant posts with a small error margin, we can really monitor how the nation feels. Polls are becoming increasingly irrelevant as the younger generation (i.e., the future) participates less and less. This method doesn’t require anyone to go out of his or her way to speak up, and it guarantees anonymity, which encourages honesty.
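The sentiment-over-time chart described above boils down to bucketing timestamped scores into windows and averaging each window. A minimal sketch, with made-up timestamps and scores (here sentiment sours between day one and day two of a hypothetical policy announcement):

```python
# Toy sketch: bucket (timestamp, score) pairs into UTC days and average
# each bucket, giving the series one would plot against time.
from collections import defaultdict
from datetime import datetime, timezone

def daily_average(scored_posts):
    """scored_posts: iterable of (unix_timestamp, score) -> {date: mean score}."""
    buckets = defaultdict(list)
    for ts, score in scored_posts:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        buckets[day].append(score)
    return {day: sum(v) / len(v) for day, v in sorted(buckets.items())}

# Hypothetical data: two posts per day over two days.
posts = [
    (1_699_990_000, 0.8), (1_699_995_000, 0.4),   # day 1, mean ~0.6
    (1_700_050_000, -0.2), (1_700_060_000, -0.6), # day 2, mean ~-0.4
]
print(daily_average(posts))
```

The same bucketing works for the micro-sentiment idea: group by user cohort or by the sub-topic a comment addresses instead of (or in addition to) the day.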

Measure: We are directly measuring sentiment.

[Image: Palantir logo]
The purpose of Palantir’s products is essentially to consume enormous collections of raw data and draw conclusions that people would not otherwise draw from such data. Palantir’s founders started with an idea from PayPal: figuring out how computers could spot suspicious activity (e.g., a series of random payments to a brand-new account) at global scale and flag it for human employees to investigate. The founders then reasoned that the same approach would work for national security. Fast forward a few years, and the company has taken on cybersecurity work for the CIA, as well as fraud detection and other work for major banks such as JPMorgan Chase.
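The PayPal-style rule described above can be sketched as a simple filter: flag brand-new accounts that suddenly receive payments from many distinct senders, and queue them for human review. Everything here (the account metadata, the thresholds, the payments) is hypothetical; real fraud detection combines far more signals.

```python
# Toy sketch: flag new accounts receiving payments from many distinct
# senders for human review. All data and thresholds are made up.
from collections import defaultdict

ACCOUNT_AGE_DAYS = {"acct_1": 900, "acct_2": 2}  # assumed account metadata
NEW_ACCOUNT_DAYS = 30
MIN_DISTINCT_SENDERS = 3

def flag_for_review(payments):
    """payments: (sender, receiver) pairs -> receivers needing human review."""
    senders = defaultdict(set)
    for sender, receiver in payments:
        senders[receiver].add(sender)
    return [
        acct for acct, s in senders.items()
        if ACCOUNT_AGE_DAYS.get(acct, 0) <= NEW_ACCOUNT_DAYS
        and len(s) >= MIN_DISTINCT_SENDERS
    ]

payments = [("a", "acct_1"), ("b", "acct_2"), ("c", "acct_2"), ("d", "acct_2")]
print(flag_for_review(payments))  # ['acct_2']
```

The key design choice, as the paragraph notes, is that the computer only flags; a human investigates, which keeps false positives from becoming automatic account freezes.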

In practice, Palantir’s software gives users tools to explore connected data and tries to visualize the information (e.g., maps that track people’s thinking). Think of it as an analysis tool that lets users act on their own insights about suspicious activity in vast piles of data rather than wait for automated systems to discover it. It’s like a human-computer symbiotic relationship.
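At its core, "exploring connected data" means treating people, accounts, phone numbers, and so on as nodes in a graph and surfacing the chains that link them, leaving the judgment about whether a link is suspicious to the analyst. A minimal sketch with an invented toy graph (this is a generic breadth-first search, not Palantir’s actual algorithm):

```python
# Toy sketch: find the shortest chain of linked entities between two
# nodes in a small, made-up entity graph.
from collections import deque

LINKS = {
    "person:alice": {"acct:1122", "phone:555-0101"},
    "acct:1122": {"person:alice", "acct:9934"},
    "acct:9934": {"acct:1122", "person:bob"},
    "person:bob": {"acct:9934", "phone:555-0101"},
    "phone:555-0101": {"person:alice", "person:bob"},
}

def connection_path(start, goal):
    """Shortest chain of linked entities from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in LINKS.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(connection_path("person:alice", "person:bob"))
# ['person:alice', 'phone:555-0101', 'person:bob']
```

Here the tool’s output is just "Alice and Bob share a phone number"; whether that matters is exactly the human half of the symbiosis.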

Palantir instances have been used to help solve many crime cases, find terror suspects, and handle other classified cases. As a nation, America values human freedoms, and living here grants us the ability to practice many natural freedoms that people living elsewhere (e.g., North Korea or ISIS-controlled territory) cannot. Many things in the world, such as terrorism spawning from the Middle East, were caused by small Western acts that later blew up. For example, there have been allegations that the CIA trained Osama bin Laden and his fighters for a specific mission, and these same people ended up becoming the founders of al-Qaeda. It would be interesting to see a Palantir instance used to analyze the complete history of the United States (or even the world, if we had enough processing power) and mark specific trends and the causes of those trends. The US could then run both major and minor policy decisions through it for a “data-driven” analysis of any possible future ramifications of such decisions.

Much of Palantir’s software, architecture internals, and specific use cases are classified (the company is a private company under contract with various government agencies). Thus, it is very difficult to create a proper PHAME analysis. Even the data sources are uncertain, as Palantir instances are often customized to a customer’s needs and deployed against the customer’s existing data sources.