Andrew Trask, founder of the OpenMined project, will be talking about homomorphic encryption and deep learning at Newsweek's AI and data science conference in New York.
People don't know it, but they are producing the most valuable naturally occurring resource on the planet – data. This value isn't realised, either from a personal perspective, or in terms of the vast societal benefits which data can deliver in the future.
We perhaps like to think this is all a big conspiracy pulled off by technology platforms like Google, but there are in fact a clutch of structural obstacles preventing data democracy and utility.
OpenMined, which combines artificial intelligence with homomorphic encryption, multi-party computation and blockchain, is one of a handful of new projects (see also Numerai and Ocean) working at the technological cutting edge to solve this problem.
In order to work effectively, artificial intelligence needs lots of data. The standard machine learning business model takes data from individuals, one way or another, and aggregates it under the ownership of a single company, which then trains a model and sells that model.
All the data assets must be held by a single entity. Machine learning startups working in healthcare, for instance, will buy patient data, such as MRI scans, and aggregate it in one location, where a data scientist trains a model that is then sold back to the hospital system. In Silicon Valley, firms acquire data, train a model and sell the use of that model in the form of an application.
Andrew Trask, founder of OpenMined, has been looking at ways to decouple ownership of the data and ownership of the model. Earlier this year, he blogged about how to train a neural network using homomorphic encryption – a special type of encryption that allows someone to modify the encrypted information in specific ways without being able to read it – thus protecting the data and fundamentally altering the AI ownership power balance.
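To make the idea of modifying encrypted information concrete, here is a minimal Python sketch of the Paillier cryptosystem, a classic additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the two plaintexts, so a third party can add numbers it cannot read. This is an illustrative swap-in, not necessarily the scheme Trask used in his blog post or the one OpenMined ships; the function names are hypothetical, and the tiny hard-coded primes make it completely insecure.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic).
# Illustration only: these primes are far too small to be secure.

def keygen(p=1009, q=1013):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed at g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    # Plaintexts must lie in the range 0 <= m < n.
    n_sq = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(1, n)
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return ((1 + m * n) * pow(r, n, n_sq)) % n_sq

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the hidden plaintexts (mod n).
    return (c1 * c2) % (n * n)

n, priv = keygen()
a, b = encrypt(n, 20), encrypt(n, 22)
total = add_encrypted(n, a, b)  # computed without ever decrypting a or b
print(decrypt(n, priv, total))  # 42
```

Paillier supports only addition and multiplication by a plaintext constant. Training a neural network also requires multiplying encrypted values together, which calls for heavier schemes than this one.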
"The interesting thing about hiding the model in these various different ways is that it gives me the ability to allow you to have limited access to add value to the model, without you being able to steal it," said Trask.
There are a few rather complex moving parts in the OpenMined model, but where it leads is simple and unprecedented. This combination of technologies will allow anyone to rent access to their data with the potential to create a revenue stream in return.
"The revenue stream for data is some sort of statistical aggregation needing machine learning; it might be a simple statistical model, it might be an advanced neural network. But that's where the value is being created. Data is like the oil, the natural resource that's going into it.
"Now for the first time we are within spitting distance of private machine learning being real, where people could actually reap that level of revenue without ever sacrificing anything.
"I think that it's a big enough idea to power a new kind of internet and a new level of expectation of consumers from the products that they want to be able to receive."
Trask said he would love for OpenMined to facilitate a universal basic income-style revenue stream, where people can be compensated for this immensely valuable natural resource they are generating without ever having to give up their privacy.
He said the truth is that right now people don't know that their data is that valuable and they don't have the ability to capitalise on it. "That's one of the things that OpenMined exists to change. The biggest hurdle is not technology, it's actually convincing consumers to do it, to change their ways, to aggregate their data into one spot, and train a model."
There has been some tailwind in US healthcare, where every individual consumer has the right to demand access to their data from any healthcare provider. This data becomes significantly more valuable as more information about the same person is aggregated: add your location data, what food you are eating, whether you are sick or healthy. The more of a 360-degree view, the more amazing things you can do with machine learning, said Trask.
The problem is that no hospital network wants to give up its most precious asset to another hospital network. What they lack is the technological expertise to aggregate their data and a platform with which to take advantage of it.
Solving this problem paves the way for a whole new wave of possibilities that were previously too personal for anyone to even dabble in, such as training a classifier to predict things like the onset or recurrence of mental illness, or the likelihood an individual might attempt suicide.
"In theory, using things like homomorphic encryption or multi-party computation, we can protect the model and send it out in the wild," said Trask.
"So a big healthcare network could try to take various aspects of your life and predict whether it's increasing your chances of cancer: everything from where you go, to where you eat, what part of town you live, how far you live from a gas power plant – all these personal things, lifestyle issues, that could be predictors."
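The idea of protecting the model and sending it out in the wild can be sketched for the simplest case, a linear model, using Paillier-style additive homomorphic encryption. In this hypothetical setup, the model owner encrypts integer weights and a bias and ships only the ciphertexts; a data holder combines them with its own plaintext features to produce an encrypted score, which only the model owner can decrypt. The weights, features and function names are invented for illustration, real models would need fixed-point encoding of fractional weights, and nothing here is taken from OpenMined's code.

```python
import math
import random

# Minimal toy Paillier (insecure key size, illustration only).
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed at g = N + 1

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    # Negative integers are encoded as m mod N.
    return ((1 + (m % N) * N) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    m = ((pow(c, LAM, N_SQ) - 1) // N) * MU % N
    return m - N if m > N // 2 else m  # map back to signed integers

# Model owner (hypothetical): encrypts weights and bias before sharing them.
weights = [3, -2, 5]
bias = 7
enc_weights = [encrypt(w) for w in weights]
enc_bias = encrypt(bias)

# Data holder: sees only ciphertexts of the model, keeps its features private.
def encrypted_linear_score(enc_w, enc_b, features):
    acc = enc_b
    for c, x in zip(enc_w, features):
        # c^x encrypts w*x; multiplying ciphertexts adds plaintexts.
        acc = (acc * pow(c, x % N, N_SQ)) % N_SQ
    return acc

features = [4, 1, 2]
enc_score = encrypted_linear_score(enc_weights, enc_bias, features)

# Model owner decrypts only the final score, never the individual features.
print(decrypt(enc_score))  # 3*4 - 2*1 + 5*2 + 7 = 27
```

For logistic regression, only the linear part is computed under encryption here; the sigmoid would be applied after decryption or replaced by an approximation that the scheme can evaluate.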
An essential tool has presented itself in the blockchain: a type of immutable, shared database with no central governing entity, which offers a decentralised way of storing value and providing incentives. In this case, the vision is decentralised ownership of data and intelligence; by pushing control of the primary resource for AI into a decentralised format, the benefits will propagate in a decentralised manner.
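One ingredient of that immutability is hash-linking: each block records a cryptographic hash of the block before it, so quietly rewriting an old entry breaks the link that follows. The Python sketch below shows only that property; it leaves out consensus, networking and smart contracts entirely, and the ledger entries are hypothetical rather than anything OpenMined records.

```python
import hashlib
import json

def block_hash(block):
    # Hash a canonical JSON encoding of the block's contents.
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_block(chain, data):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "data": data})
    return chain

def verify(chain):
    # Every block must commit to the exact contents of the block before it.
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
append_block(chain, {"event": "model registered", "owner": "alice"})
append_block(chain, {"event": "payment", "to": "alice", "amount": 5})
print(verify(chain))  # True

chain[0]["data"]["owner"] = "mallory"  # attempt to rewrite history
print(verify(chain))  # False
```

This check only catches edits to blocks that have a successor. In a real network, nodes also agree by consensus on the hash of the latest block, which protects the head of the chain as well.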
"It's interesting," mused Trask, "we didn't start by looking into blockchain. It surprised a lot of people; like we are not looking to ICO, we are not doing any of the kind of normal blockchain things.
"The core technology for us is machine learning. But blockchain is our way of avoiding an awkward situation. We are interested in decentralisation because we are attempting to aggregate the most valuable digital asset ever.
"We need to implement hard structures that ensure it is owned and leveraged in a decentralised manner, so that every individual person is sort of voting for their own benefit, and maybe of their local community.
"So perhaps 50 people can train a model that does something useful and a whole other marketplace can come; the blockchain can host it and remunerate the individual who owns that model in a fully transparent way."
In terms of blockchain flavours, OpenMined is planning to dual-release its alpha version on both the Ethereum and Tendermint testnets. The platform needs high transaction throughput, and OpenMined has formed a useful partnership with Tendermint and its proof-of-stake implementation. OpenMined is using the Ethermint protocol to run Ethereum smart contracts (written in Solidity) on the proof-of-stake Tendermint blockchain. Trask said this setup can handle around 2,000 transactions per second.
On the subject of performance, homomorphic encryption is known to be computationally onerous. Trask said: "We are looking at two technologies – homomorphic encryption and multi-party computation; each better in different scenarios.
"Homomorphic encryption is great if you've got small models; most data scientists use models that are actually pretty small on structured data, like linear models, SVMs, logistic regression, that kind of thing.
"However when you get to the big models, it's a lot more feasible to use something called multiparty computation, which basically trades off the computational complexity for network overhead. We know certain things about our participants and that allows us to have a really low number of people who are doing the sharing in multiparty computation, which allows it to be extremely performant."
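Multi-party computation covers many protocols, and the article does not say which one OpenMined uses. A common building block is additive secret sharing, sketched below in Python: a value is split into random shares, any subset short of the full set reveals nothing about it, each participant holds one share, and additions can be done locally on the shares. The modulus, party count, values and function names are arbitrary choices for illustration.

```python
import random

PRIME = 2**61 - 1  # modulus for share arithmetic (a Mersenne prime)

def share(secret, n_parties):
    # n-1 uniformly random shares, plus one chosen so all shares sum to the secret.
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

def add_shared(a_shares, b_shares):
    # Each party adds the two shares it holds; no messages are exchanged.
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]

def scale_shared(k, shares):
    # Multiplying by a public constant is also purely local.
    return [(k * s) % PRIME for s in shares]

# Three hypothetical parties jointly hold two private values.
a = share(120, 3)
b = share(45, 3)
print(reconstruct(add_shared(a, b)))  # 165
print(reconstruct(scale_shared(2, a)))  # 240
```

Addition needs no communication, but multiplying two shared values requires the parties to exchange messages (for example using precomputed Beaver triples), and that traffic grows with the number of parties. This is the computation-for-network trade-off Trask describes, and why keeping the number of sharing participants low helps performance.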