Cambridge Analytica: A Case Study

Objectives: This study discusses how Facebook users' data was harvested and used to build an algorithm that inferred users' personality traits, and how that process was in turn used to influence the outcome of the US Presidential Election. Method: A quiz application was developed to collect users' data. Their social media activity was analyzed, patterns were detected, OCEAN scores were assigned, and users were grouped on the basis of their political affiliation. Finally, they were targeted with tailored ads and news to achieve the desired results. Findings/Application: Internet use is increasing day by day, and so are our digital footprints. Companies such as Netflix have started using these data to understand their customers' behaviour and to improve their customer-oriented marketing strategies. Activity on social media reveals a great deal about users, and companies sometimes harvest and exploit these data in unethical ways.
Keywords: Cambridge Analytica, Facebook, OCEAN score, US Elections


Introduction
Manipulating people by exploiting what one knows about their beliefs and intentions is an old idea, rooted in the ability referred to as 'theory of mind' [1][2]. Although this ability has many uses, one important use in today's socially connected world is deception. In some animals deception is an evolved trait (e.g. camouflage), whereas in humans it is acquired gradually through learning. Combined with human intelligence, this makes deception sophisticated, even alarming.
This study examines a recent scandal involving Cambridge Analytica, Facebook and the US elections, in which harvested personal information was used to deceive and manipulate people in order to change their political views and votes.
Cambridge Analytica was found to be using Facebook data, sourced from a Cambridge University researcher, in its work for a US presidential candidate. This malpractice was exposed by Christopher 'Chris' Wylie, the former research director of Cambridge Analytica. Estimates of the number of affected profiles range from 30 million to 80 million.

Literature Survey
According to [3], a Cambridge University researcher built a Facebook quiz app that exploited a loophole in the Facebook API, which allowed the app to collect data not only from its users but also from their friends. Although Facebook prohibited the sale of user data collected this way, the data were sold and misused.
A case study published on Medium explains how psychology and behavioral science, combined with propaganda, can manipulate people into doing things, knowingly or otherwise. It links this to the Cambridge Analytica scandal by showing how data-driven marketing techniques can change behavior in target populations regardless of the domain.
The study in [4] describes what the data actually looked like and explains the whole process, from data collection to prediction. It also discusses the use of machine learning, the role of AggregateIQ and the OCEAN score. The Next Web, in its article, explains how Cambridge Analytica used profiling to swing neutral voters in favor of a particular candidate [5].
The Guardian noted that a 2013 scientific study found that liking curly fries on Facebook gave clues about intelligence; similarly, liking Hello Kitty gave clues about political views [6-7].

Data Collection/Mining
A Cambridge researcher created a Facebook quiz app to study users' psychology. Facebook's terms of service at the time permitted the developer to collect user data for research purposes, but did not explicitly state whether the developer could also collect data about users' friends. Users who signed up for the study were given a survey to complete. The survey contained a Facebook Login button, which the user clicked to log in to the app and take the survey.
On logging in, users had to authorize the app to access their data. Although the authorization appeared to cover only their own data, they inadvertently authorized the collection of their friends' data as well.
The data collected included users' names, gender, location, ethnicity, education level, the pages they liked and the clothing brands they wore, among other attributes.
Starting from 250,000 app users (nodes in the social graph), the researcher and Cambridge Analytica were able to collect data on approximately 75 million nodes.
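The fan-out described above can be illustrated as a one-hop expansion over a social graph: each app user (seed) exposes their own profile plus those of their direct friends, but friends of friends are not reached. The sketch below is a minimal illustration using a hypothetical in-memory adjacency map, not Facebook's actual API; the function name `harvested_nodes` and the toy graph are invented for this example. The last line shows the arithmetic implied by the figures in the text: 75 million profiles from 250,000 app users is an average of 300 unique profiles per app user.

```python
from typing import Dict, Set


def harvested_nodes(seeds: Set[str], friends: Dict[str, Set[str]]) -> Set[str]:
    """Return the seed users plus every direct friend of a seed (one hop only)."""
    reached = set(seeds)
    for user in seeds:
        reached |= friends.get(user, set())
    return reached


# Hypothetical toy graph: user -> set of friends.
graph = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"carol", "frank"},
    "bob": {"alice", "grace"},  # bob never installed the app, so grace is not reached
}
seeds = {"alice", "erin"}  # the app users

print(sorted(harvested_nodes(seeds, graph)))
# ['alice', 'bob', 'carol', 'dave', 'erin', 'frank']

# Average unique profiles reached per app user, from the figures in the text.
print(75_000_000 / 250_000)  # 300.0
```

Note that overlapping friend lists (carol here) are counted once, which is why the per-user average is of unique profiles.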

Data Management (Categorization and User Profile Creation)
A profile was created for each person from the collected data and then categorized. Profiles were first shaped by the geographic area of the US in which the user lived. For example, users near border areas were concerned with immigration, so they were grouped into anti-immigration voter profiles. Users in the hinterland were concerned with the loss of manufacturing jobs and with the construction of oil and gas pipelines through Native American lands, so they were grouped together during profile creation. For ultra-high-net-worth individuals living in affluent suburbs or downtown areas, tax breaks were the key concern and therefore the key factor in their profiles.
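The region-based grouping described above amounts to a set of hand-written rules that map each user to a profile class. The sketch below assumes hypothetical region labels (`"border"`, `"hinterland"`, `"suburb"`, `"downtown"`), a hypothetical `ultra_high_net_worth` flag and invented class names; the text does not say how regions or wealth were actually encoded.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class User:
    user_id: str
    region: str  # hypothetical labels: "border", "hinterland", "suburb", "downtown"
    ultra_high_net_worth: bool = False


def profile_class(user: User) -> str:
    """Map a user to a profile class using the region-based rules described in the text."""
    if user.region == "border":
        return "anti-immigration"
    if user.region == "hinterland":
        return "jobs-and-pipelines"
    if user.ultra_high_net_worth and user.region in ("suburb", "downtown"):
        return "tax-breaks"
    return "unclassified"


def group_profiles(users: List[User]) -> Dict[str, List[str]]:
    """Group user IDs by profile class, preserving first-seen class order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for u in users:
        groups[profile_class(u)].append(u.user_id)
    return dict(groups)


users = [
    User("u1", "border"),
    User("u2", "hinterland"),
    User("u3", "downtown", ultra_high_net_worth=True),
    User("u4", "downtown"),
]
print(group_profiles(users))
# {'anti-immigration': ['u1'], 'jobs-and-pipelines': ['u2'], 'tax-breaks': ['u3'], 'unclassified': ['u4']}
```

Rules are checked in order, so a user matching several conditions gets the first matching class; a real system would need an explicit policy for such overlaps.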
Users' presumed political affiliations were also derived from the clothing brands they wore. For example, brands such as Wrangler and L.L. Bean have historically been associated with conservative voters, while the fashion label Kenzo is associated with liberal voters. The user profiles were therefore further refined by political leaning, i.e. conservative or liberal (see Figure 1).
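The text does not say how individual brand signals were combined. A minimal sketch, assuming a simple additive vote in which each brand named above contributes +1 (conservative) or -1 (liberal), could look like this; the weights, the `BRAND_LEAN` table and the `refine_lean` function are hypothetical.

```python
from typing import Iterable

# Hypothetical signal weights: +1 leans conservative, -1 leans liberal.
BRAND_LEAN = {"Wrangler": 1, "L.L. Bean": 1, "Kenzo": -1}


def refine_lean(brands: Iterable[str]) -> str:
    """Sum brand signals and return 'conservative', 'liberal' or 'undetermined'."""
    score = sum(BRAND_LEAN.get(b, 0) for b in brands)  # unknown brands contribute 0
    if score > 0:
        return "conservative"
    if score < 0:
        return "liberal"
    return "undetermined"


print(refine_lean(["Wrangler", "L.L. Bean"]))  # conservative
print(refine_lean(["Kenzo"]))                  # liberal
print(refine_lean(["Wrangler", "Kenzo"]))      # undetermined
```

An additive vote is only one possible choice; in practice such signals would more likely be fed as features into a statistical model than thresholded on their own.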
Classification methods were used to categorize user profiles by OCEAN score. The OCEAN score measures the user's degree of Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism (see the …).
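The text does not name the classification method. As one simple possibility, the sketch below uses a nearest-centroid classifier: each user's five OCEAN scores form a vector, and the user is assigned to the segment whose centroid is closest in Euclidean distance. The 0-to-1 score scale, the segment names and the centroid values are all hypothetical.

```python
import math
from typing import Dict, Tuple

# (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism), each assumed in [0, 1].
OceanVector = Tuple[float, float, float, float, float]

# Hypothetical segment centroids; the real segments are not described in the text.
CENTROIDS: Dict[str, OceanVector] = {
    "open-extravert": (0.8, 0.5, 0.8, 0.6, 0.3),
    "conscientious-reserved": (0.3, 0.8, 0.3, 0.5, 0.3),
    "anxious": (0.5, 0.4, 0.4, 0.4, 0.8),
}


def classify(scores: OceanVector) -> str:
    """Assign the segment whose centroid is nearest in Euclidean distance."""
    return min(CENTROIDS, key=lambda name: math.dist(scores, CENTROIDS[name]))


print(classify((0.75, 0.45, 0.85, 0.6, 0.2)))  # open-extravert
```

`math.dist` requires Python 3.8 or later.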

Machine Learning
Regression was used to train a machine learning model on the available data sets and to predict a new user's political affiliation.
The user's profile with psychographic data, their activity on Facebook (likes, posts, shares) and their OCEAN scores were all combined to predict their political affiliation. The model also took into account the user's concerns about particular policies and the changes the user wanted the government to implement.
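The text says only that "regression" was used. Since political affiliation is categorical, a natural reading is binary logistic regression. The sketch below assumes that, and fits it with plain batch gradient descent on log-loss so that it runs with the standard library alone. The feature layout (two page likes, an openness score, an immigration-concern flag) and all labels are synthetic and hypothetical; in practice a library implementation such as scikit-learn's `LogisticRegression` would typically replace the hand-written loop.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of class 1; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic-regression weights by batch gradient descent on mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical features per user: [liked_page_a, liked_page_b, openness, concern_immigration]
X = [
    [1, 0, 0.2, 1],
    [1, 0, 0.3, 1],
    [0, 1, 0.8, 0],
    [0, 1, 0.9, 0],
    [1, 1, 0.4, 1],
    [0, 0, 0.7, 0],
]
y = [1, 1, 0, 0, 1, 0]  # synthetic labels: 1 = supporter of party X, 0 = other

w = train_logistic(X, y)
new_user = [1, 0, 0.25, 1]
print(round(predict_proba(w, new_user), 2))  # predicted probability of supporting party X
```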

Micro-targeting
The model's output is used to design a targeted campaign.
Users are shown advertisements chosen to match their political views. Psychographically micro-targeted advertisements are designed either to strengthen a user's existing beliefs or to reinforce their preconceived notions about a party. For example, a user concerned with immigration policy (anti-immigration profile class) is targeted with ads explaining what a particular party is doing in that area, while wealthy users are targeted with ads about how well the party is handling economic policy.
The ads are customized to fit each user's psychographic profile in order to maximize their effect.
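The text does not describe how an ad variant was chosen for a given user. A minimal sketch, assuming a lookup keyed by the user's profile class and a message framing derived from two OCEAN traits, might look like the following. The ad catalogue, the ad identifiers and the neuroticism-versus-openness framing rule are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical ad catalogue keyed by (profile class, message framing).
ADS: Dict[Tuple[str, str], str] = {
    ("anti-immigration", "security"): "ad_immigration_security",
    ("anti-immigration", "aspiration"): "ad_immigration_opportunity",
    ("tax-breaks", "security"): "ad_economy_stability",
    ("tax-breaks", "aspiration"): "ad_economy_growth",
}


def framing(neuroticism: float, openness: float) -> str:
    """Hypothetical rule: pick a framing from two OCEAN traits scaled to [0, 1]."""
    return "security" if neuroticism > openness else "aspiration"


def select_ad(profile_class: str, neuroticism: float, openness: float,
              default: str = "ad_generic") -> str:
    """Return the ad matching the profile class and framing, or a generic fallback."""
    return ADS.get((profile_class, framing(neuroticism, openness)), default)


print(select_ad("anti-immigration", neuroticism=0.8, openness=0.3))  # ad_immigration_security
print(select_ad("tax-breaks", neuroticism=0.2, openness=0.7))        # ad_economy_growth
print(select_ad("unclassified", neuroticism=0.5, openness=0.5))      # ad_generic
```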

Observations
The above analysis shows that social media platforms such as Facebook, analytics companies such as Cambridge Analytica, and unethical developers can use simple, seemingly trivial signals such as posts, comments, likes and shares to infer sensitive information about a user, such as race, gender and sexual orientation, as well as hints about the person's political affiliation. A handful of seemingly random likes can form the basis of a surprisingly detailed character assessment.