Thursday, September 24, 2020

The societal implications of AI: Seeing is believing, or is it?

We are used to believing what we see. When we see a leader on TV stating that their country will engage in a trade war with another country, or when a well-known company spokesperson announces an important business decision, we tend to trust the statement more than if we had read about it second-hand in a news story written by someone else.

Similarly, when we see photo evidence from a crime scene or from a demonstration of a new tech gadget, we put more weight on the evidence than on a written report explaining how things look.

Of course, we are aware of the possibility of fabricating fake evidence. By photoshopping, people can be put in places they never visited, with people they never met. It is also possible to change the way things look simply by adjusting the lighting or pulling one’s stomach in, as in the cheap before-and-after shots that advertise the latest diet pill.

AI is taking the possibilities of fabricating evidence to a whole new level:

Face2Face is a system capable of identifying the facial expressions of a person and putting them on another person’s face in a YouTube video.

Lyrebird is a tool for automatic imitation of a person’s voice from a few minutes of sample recording. While the generated audio still has a noticeable robotic tone, it makes a pretty good impression.

Changing notions of privacy

It has long been known that technology companies collect a lot of information about their users. Earlier it was mainly grocery stores and other retailers that collected buying data by giving their customers loyalty cards that enable the store to associate purchases with individual customers.

The accuracy of the data that tech companies such as Facebook, Google, Amazon and many others collect is way beyond the purchase data collected by conventional stores: in principle, it is possible to record every click, every page scroll, and the time you spend viewing any content. Websites can even access your browsing history, so that unless you use the incognito mode (or the like), after browsing for flights to Barcelona on one site you will likely get advertisements for hotels in Barcelona.

However, this kind of data logging is not, as such, AI. The use of AI leads to new kinds of threats to our privacy, which may be harder to avoid even if you are careful about revealing your identity.

Using data analysis to identify individuals

A good example of a hard-to-avoid issue is de-anonymization: breaking the anonymity of data that we may have thought to be safe. The basic problem is that when we report the results of an analysis, the results may be so specific that they make it possible to learn something about individual users whose data is included in the analysis. A classic example is asking for the average salary of people born in a given year and living in a specific zip code. In many cases, this could be a very small group of people, often only one person, so you’d potentially be giving out data about a single person’s salary.
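The salary example can be made concrete with a small sketch. The records, the function name average_salary and the min_group_size parameter below are all made up for illustration; they are not from any real system.

```python
from statistics import mean

# Hypothetical toy records: (birth_year, zip_code, salary). All values are made up.
RECORDS = [
    (1980, "00100", 52000),
    (1980, "00100", 61000),
    (1980, "00100", 47000),
    (1975, "00200", 88000),  # the only person born in 1975 living in zip code 00200
]

def average_salary(records, birth_year, zip_code, min_group_size=1):
    """Average salary of the matching group, or None if the group is too small."""
    salaries = [s for (y, z, s) in records if y == birth_year and z == zip_code]
    if not salaries or len(salaries) < min_group_size:
        return None  # refuse to answer rather than describe a tiny group
    return mean(salaries)

# With no safeguard, the "average" of a one-person group is that person's salary.
leaked = average_salary(RECORDS, 1975, "00200")

# Requiring at least three people in the group suppresses the answer.
guarded = average_salary(RECORDS, 1975, "00200", min_group_size=3)
```

A minimum group size is only a partial safeguard: someone who can ask many overlapping questions may still be able to subtract one answer from another and isolate an individual.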

An interesting example of a more subtle issue was pointed out by researchers at the University of Texas at Austin. They studied a public dataset made available by Netflix containing about 100 million movie ratings by some 500,000 anonymous users, and showed that many of the Netflix users can actually be linked to user accounts on the Internet Movie Database because they had rated several movies on both services. Thus the researchers were able to de-anonymize the Netflix data. While you may not think it's a big deal whether someone else knows how you rated the latest Star Wars movie, some movies may reveal aspects of our lives (such as politics or sexuality) which we should be entitled to keep private.
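The idea of linking accounts across two services can be sketched in a few lines. This is a deliberately simplified illustration, not the researchers' actual method: it just counts identical (movie, rating) pairs, and all user names, movie titles and the min_overlap threshold are hypothetical.

```python
# Hypothetical toy data: an "anonymous" dataset keyed by pseudonymous ids,
# and a public site where ratings appear under user names.
anonymous = {
    "user_17": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4), ("Movie D", 1)},
    "user_42": {("Movie A", 3), ("Movie E", 5), ("Movie F", 4)},
}
public = {
    "alice": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)},
    "bob": {("Movie E", 5), ("Movie F", 4), ("Movie G", 2)},
}

def link_accounts(anonymous, public, min_overlap=2):
    """Map each anonymous id to the public name sharing the most identical ratings."""
    links = {}
    for anon_id, anon_ratings in anonymous.items():
        best_name, best_overlap = None, 0
        for name, public_ratings in public.items():
            overlap = len(anon_ratings & public_ratings)  # shared (movie, rating) pairs
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_overlap >= min_overlap:  # only link when the match is strong enough
            links[anon_id] = best_name
    return links

links = link_accounts(anonymous, public)
```

Once a link is made, every other rating in the anonymous record, including the ones the person never posted publicly, becomes attached to a name.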

Other methods of identification

A similar approach could in principle be used to match user accounts in almost any service that collects detailed data about user behavior. Another example is typing patterns. Researchers at the University of Helsinki have demonstrated that users can be identified based on their typing patterns: the short intervals between specific keystrokes when typing text. This can mean that if someone has access to data on your typing pattern (maybe you have used their website and registered by entering your name), they can identify you the next time you use their service even if you refuse to identify yourself explicitly. They can also sell this information to whoever wants to buy it.
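As a rough illustration of how typing patterns could identify someone, the sketch below compares a new typing sample against stored profiles and picks the closest one. It is a toy nearest-profile comparison, not the method used in the Helsinki research; the profiles, the timing values (in milliseconds) and the function names are all assumptions made for this example.

```python
def profile_distance(a, b):
    """Mean absolute difference in timing over the key pairs both profiles contain."""
    shared = a.keys() & b.keys()
    if not shared:
        return float("inf")  # nothing to compare
    return sum(abs(a[k] - b[k]) for k in shared) / len(shared)

def identify(sample, known_profiles):
    """Return the name of the stored profile closest to the new sample."""
    return min(known_profiles, key=lambda name: profile_distance(sample, known_profiles[name]))

# Hypothetical stored profiles: average delay between two consecutive keys, in ms.
known_profiles = {
    "alice": {"th": 95.0, "he": 110.0, "in": 130.0},
    "bob": {"th": 160.0, "he": 150.0, "in": 90.0},
}

# A new, unlabeled typing sample from a visitor who did not log in.
sample = {"th": 100.0, "he": 105.0, "in": 125.0}
guess = identify(sample, known_profiles)
```

A realistic system would need many more key pairs and some way to say "no confident match", but the principle is the same: the rhythm of typing acts as a fingerprint.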

While many of the above examples have come at least in part as surprises (otherwise they could have been avoided), there is a lot of ongoing research trying to address them. In particular, an area called differential privacy aims to develop machine learning algorithms that can guarantee that the results are sufficiently coarse to prevent reverse engineering specific data points that went into them.
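One common building block in differential privacy is to add carefully scaled random noise to a result before publishing it. The sketch below shows this for a simple count, using noise from the Laplace distribution. It is a minimal illustration under stated assumptions: the function names, the salary values and the choice of epsilon (the privacy parameter, where smaller means more noise and more privacy) are made up for this example.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng=None):
    """Count the values matching predicate, then add noise before releasing the result.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical salaries; the question is "how many people earn more than 50,000?"
salaries = [52000, 61000, 47000, 88000]
noisy = private_count(salaries, lambda s: s > 50000, epsilon=0.5, rng=random.Random(0))
```

The published number is close to the true count on average, but any single answer is blurred enough that it reveals little about whether one particular person is in the data.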



