The case for decentralising AI computation and saving privacy

Utkarsh Saxena October 29, 2018 11 min

From the Hellenistic period up until the First World War, land was the most valuable asset in Europe. Powers would fight each other to capture more and more of it. More land meant more power.

In the Industrial Age, after the First World War, machinery became a source of power. By the Second World War, machinery and the oil that drives it had become the most valuable assets. They say that data is the new oil. Since data drives the new industries of the information age, it has become the new asset that is subject to a land grab.

The new economy is driven by providing value in the digital space. The five most valuable companies in the world today are tech companies. Companies like Facebook and Google have lots of data on you, whether you use their services or not.

They need this data to serve their personalised advertisements, and these companies control a majority of the internet today. Naturally, questions arise about the misuse of data, both in terms of privacy and in terms of manipulation.

A recent example is the dystopian display of manipulation pulled off by Cambridge Analytica, which used Facebook data on individual users to manipulate two important votes, the American presidential election and the Brexit vote, by pushing propaganda at a micro level. This misuse of data is what we need to worry about at the moment.

But for a moment, let’s forgive Facebook’s careless handling of its users’ data, let’s forget the immediate danger posed by political manipulation, and peek into the future. The next big problem is corporate manipulation.

Targeted ads can indeed be manipulated to influence our decisions on what to buy and where to spend our money. When corporations start to tell us what the right investment is, we get ourselves a world without choices (or a world with forced choices).

In this ‘world without choices,’ we are told what the best educational course to buy is, where to shop, where to get our food from and what services to use, without ever knowing that there might be better alternatives that provide more value. Arguably, the small businesses offering those alternatives will not have enough purchasing power in this targeted ad space. Not only does this create inequality, it also breaks the spirit that the internet was founded upon, because it gives more and more power to the already powerful entities.

Now, let’s take a step back and focus on the bigger problem, which is quite political in nature. Cases of mass surveillance powered by AI are increasing, China being the leading example. AI gives us the power to process large amounts of unstructured data and gain valuable insights from it. This makes it easier to build a psychological profile of an individual using all the information she or he generates online. Such profiling can be used to censor dissent, and to subtly push propaganda at individuals by controlling the information they see. This is called big nudging.

Centralising people’s data in one place is hazardous and bodes ill for the future of our societies. Yet the internet monopolies need our data to provide the services they offer for free. AI makes many more services possible that keep our lives efficient and organised, but in the current paradigm most of the data used for AI is stored on servers. Storing all this data on centralised servers owned by near monopolies puts us at risk of an authoritarian future. At the same time, the benefits of AI are far too good to give up. So what’s the solution? Can we find a middle ground? Before we get to that question, let’s look at what the term ‘Democratised AI’ means and how it is being misused by big companies.

What Democratisation of AI means today and what it should mean

Many organisations have promised to democratise AI. Google and Microsoft are at the forefront, though they mean different things by it. Both companies build AI into their services to make them better and more personalised for each user. Google takes it a step further by publishing most of its AI research online; it also introduced TensorFlow (a deep learning library). Keep in mind that TensorFlow was largely made efficient by the open source community.

However, this is not true or complete democratisation. They publish their AI research, but the data needed to turn that research into something useful is Google’s property. So the training data is not democratised; it is accessible only to the company. The same goes for computing power. Only these large organisations have the compute to experiment with AI and make it strong and useful for people. And most of the AI talent acquired by these companies is focused on solving tasks specific to the company’s interests.

The openness policies on AI research give these companies good brand value, both in the media and in the research community, but they do not actually pose a risk to these companies, because AI research today suffers from a reproducibility crisis. A majority of AI research is not reproducible.

If an independent researcher were to follow the guidelines and methods described in a paper, they would often be unable to get the same results as those reported. The results are generally poorer, even when the independent effort is collaborative, as is usually seen on GitHub.

Smart, independent people coming together to reproduce a research paper on open source platforms usually fail to get impressive results. There are multiple factors behind this, but it is why companies can afford to let their research out in the open: it is hard for anyone outside these powerful labs to transform research into efficiently working products.

So the idea of democratising AI as put forth by these giants is not true democratisation. It doesn’t give any individual in any part of the world the ability to use AI and design solutions. And this also ensures that most of the AI being actively developed and worked upon is used in the products and services of these giants, so that they can keep their users engaged. AI is too powerful a technology to be used only to make digital services more efficient.

Let’s come back to the question of user privacy. How can we ensure complete privacy of our data? The current internet is not designed to keep data private. The efforts towards Web 3.0 advocate true ownership of personal data. In such a paradigm, data is stored not in the cloud but on the owner’s device. No central authority, corporation or Cambridge Analytica can pull this data together and do whatever it wants with it. The smartphones and tablets we use are getting more powerful every day, which is why the ideas of Web 3.0 revolve around decentralisation. All applications and their computations will run on the devices, with no central authority pushing or pulling the data. User-generated data doesn’t leave the device. This ensures complete privacy of the user’s data.

But we need this data. Millions of people own and generate data that holds the secrets to curing diseases, making life better, personalising and enhancing education, distributing resources better and otherwise pushing humanity in the right direction. If we keep the data private, how will research organisations around the world access it? Moreover, most user-generated data online is given to the collector involuntarily, so the insights gained from it are usually limited to the digital space: targeted ad campaigns and personalisation of services that benefit corporations and internet giants, but not necessarily the people. For example, data available on Twitter can be analysed to build psychological models of users’ likes and preferences and to predict what they are likely to buy if advertised to. But it is hard to find data on users’ sleeping patterns, diet and daily movements (from geolocation) with which to build AI models that would help these users understand their bodies and habits better.

This is primarily because users need some incentive to provide this kind of data, and because the data is sensitive in nature, which makes it hard for people across the world to share it and benefit from the information processing that AI can do. If this data were available publicly, any well-meaning research scientist could apply her or his skill set to build an AI model.

So, how do we glean insights while protecting users’ privacy?

Decentralising Data and computation for Artificial Intelligence

The solution lies in decentralising the AI computation. AI solutions today are built by first collecting data in a silo owned by a central authority. These data points are then fed into an AI model in batches. The model learns and becomes more and more intelligent as more data flows through it, a process overseen by a data scientist. These computations are quite expensive to perform on cloud services such as Google Cloud or Amazon Web Services, and buying the hardware to perform them is even more expensive. And even if the computation is made affordable, privacy is compromised by default because the data storage is centralised. So how do you decentralise AI?

Google introduced the idea of Federated Learning: instead of storing all the data on a central server and performing the computations there, we send the machine learning model to individual users’ devices, where the locally stored data is used to train the model using the device’s own compute. The resulting model updates are then sent back to a central server, where updates from multiple devices are combined to improve the global model. The improved model is then broadcast to more users, who train it on their local devices with their local data. The cycle keeps going until the AI model meets the desired criteria for intelligence.
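
To make the cycle concrete, here is a minimal sketch of one common way to combine device updates, often called federated averaging. It is an illustration under stated assumptions, not Google's implementation: the model is a tiny linear one (y = w*x + b), the "devices" are simulated in one process with synthetic data, and every function name below is hypothetical.

```python
import random

def local_update(weights, data, lr=0.05, epochs=5):
    """Train a copy of the global model on one device's private data (plain SGD)."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one local example
            w -= lr * err * x
            b -= lr * err
    return (w, b)                   # only the model leaves the device, never the data

def federated_average(updates, sizes):
    """Server step: average device models, weighted by local example counts."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_device_data(rng, n, true_w=2.0, true_b=1.0):
    """Synthetic stand-in for private data: noisy samples of y = 2x + 1 (assumption)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, true_w * x + true_b + rng.gauss(0, 0.05)))
    return data

def run_federated_training(num_devices=10, rounds=30, seed=0):
    rng = random.Random(seed)
    devices = [make_device_data(rng, rng.randint(20, 40)) for _ in range(num_devices)]
    global_model = (0.0, 0.0)
    for _ in range(rounds):
        chosen = rng.sample(devices, k=5)                           # broadcast to a subset
        updates = [local_update(global_model, d) for d in chosen]   # on-device training
        global_model = federated_average(updates, [len(d) for d in chosen])
    return global_model

if __name__ == "__main__":
    print(run_federated_training())
```

In this toy setting every device draws data from the same underlying pattern, which is the easy case. A real deployment would also have to deal with devices whose data differs widely, with unreliable connections, and with the fact that model updates themselves can leak information unless further protections are added.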

With this paradigm of AI model training, we remove the central authority for computation and data storage and decentralise the AI computations. Each individual contributes their computation capacity and their data to a central intelligence. There are many benefits to this.

First, the data remains on the user’s device, enabling privacy by default. Second, training the AI model in a decentralised manner can be significantly cheaper for data scientists than the centralised way, since they do not need to buy expensive hardware or rent expensive cloud servers. People offer just enough compute power to make their small contribution to the model’s intelligence, and modern devices are capable of running these computations without harm. Third, the intelligence gained this way can come close to what centralised computation achieves, although this is not guaranteed and depends on how the data is spread across devices; for some use cases and data science methods, decentralised training may even turn out more accurate than centralised training.

If people are assured that their privacy will be maintained, and are given some incentive to contribute, they are likely to allow AI models to access their data. We can extract intelligence from even the most sensitive data this way, without infringing on their privacy. This opens new doors for the kind of intelligence we can extract: we can unearth new kinds of data and solve bigger and truly important problems. The cost of AI computations will fall, and the applications of AI will be bounded only by the limits of our imagination.

In this paradigm, a data scientist can come up with a design for an AI that personalizes educational content according to the attention span and the learning ability of an individual, have multiple people all around the world contribute to the intelligence of the AI by providing their personal data and compute, and this intelligence can then be used for the good of humanity. This AI model can remain public to be used by various educators around the globe. This will be true democratisation of AI: Built for the people, by the people.

About the Author: Utkarsh is a self-taught AI engineer and an independent researcher. He has a keen interest in information theory and building creative AI solutions. He dropped out of college to pursue his interests in AI. He has previously designed models for speech and art. He’s currently building a decentralised AI platform at Eder.