This MIT Technology Review article on “open source AI” argues that Chinese companies, with governmental backing, are embracing an open source approach to AI. They don’t, it is typically only open models that are being provided, not full-blown open source AIs. However, the story is appealing. Still the underdog when compared with the US, China is leveraging an open model strategy to unseat the incumbent, breakdown moats, win hearts and minds, and take the lead.
In software, open-source software played well by companies who know what they are doing can achieve this. Open-source software is a strategic weapon: Open-source alternatives keep the incumbent’s pricing power in check, increase pricing range through lowered costs to reach more customers, and establish new ecosystems that grow a company’s addressable market. Trying to transfer this to AI models won’t work if these aren’t true open source AIs but only open models, because you won’t establish the (commercial) community needed to carry this forward.
That said, given the high cost of foundation model work, many are only too eager to use a cheap Chinese alternative to US work, much like some people like to use cheap Chinese EVs or Laptops or mobile phones. Cheap does not mean inferior quality. Cost savings serve to overcome the mistrust of products by companies under the thumb of a government that has little good in store for the Western world.
So far, mistrust of Chinese AIs appears to focus mostly on code AIs and the worry that these might introduce backdoors into software being developed with their help. I’m not privy to insights as to how serious this threat is today, but it is a believable threat to me.
There is another threat lurking in open models created by adversaries: Cultural colonialism. As foundation models get trained on data, that data and their processing informs the worldview that gets embedded in the models, open or not. This in turn then informs any answers that an AI gives as part of processing their tasks. You may remember the automatic soap dispensers that worked only for white-skinned hands and not for black-skinned ones? Make a guess as to what data they have been trained on. A quick search for Chinese-specific health concerns provides a long list of issues that are less common in the West. Imagine a health AI trained primarily on data that caters to people of Chinese descent. Will a Western person get good advice?
Beyond biased products, where product management and testing should capture such inadequacies, cultural colonialism goes even further. Powerful government and players already require that political worldviews (subtly or not so subtly) be embedded in the foundation models of companies they hold sway over. In response to the success of Wikipedia, Elon Musk announced Grokipedia, apparently as an “alternative reality” to the alleged wokeness of Wikipedia. Chinese models (pre)tend not to know about the 1989 Tiananmen Square protests and massacre, even though it has proved to be hard to completely remove this from the training data. I suspect we are only at the beginning of understanding how to imbue foundation models with a world view.
Ironically, this is a chance for Europe. With mistrust at an all time high as to the intentions of the superpowers, Europe’s democratic reputation may save the day. If there were three equally capable foundation models, one from the US, one from China, and one from Europe, I know which one I would choose for work, and not just because I’m a European.
Sadly now, there aren’t many such models. Some, but not a lot. And it is a high-stakes, high-capital-requirement, arms race.
Which brings me back to true open source AI, not just open weights, but all components that allow anyone to fully read, modify, run, and distribute the work. Assuming there is little chance for individual companies of catching up to US-based and Chinese companies any time soon, European companies should join forces to develop such foundation models. The members of such an open source AI consortium may not make money on the models directly, but they can take the air out of the money the US and China are making, much like Intel pitted Linux against Windows, shifting revenues to adjacent layers and components.
Any such consortium can only use open data for the open source AI work and hence will already be at a disadvantage to the Anthropics and Googles of the world who buy your facial, identity, and health data to train their AIs, something which an open source AI consortium can’t do. For this reason, the license of an open source AI must be a permissive one, allowing individual companies to build their own proprietary extensions, reprehensible or not.
If you think that these European companies would be serving the world and not themselves, losing even further in the battle for AI supremacy, think again. A second-order, often overlooked or at least undervalued benefit of open source consortia with a geographical focus is that collaboration and its employees (a) tend to stay in that geographical area (b) yet move between employers, diffusing the gained knowledge faster and better than any educational outreach could do. Such skills build-up on all practical levels is critical if European companies ever want to catch up.
The EU Parliament plays a role here, but it is not to fund the development. European companies need to solve that problem for themselves; if the EU was to offer money it would turn honest problem solving into a opportunity to siphon of money from the government with no relevant outcome. Instead of funding such development, the EU could do what it does best: Regulate to protect its citizens. How about a test suite or a certification requirement that AIs used in products do not promote a superpower’s cultural values? Not harm citizens because of incompatible built-in world views?
The world already has a common denominator where opinions and biases and points of view are discussed until they are settled: Wikipedia. I would try to use Wikipedia as a ground truth to develop a comprehensive test suite for foundation model AIs to see whether they have been imbued with opinions and cultural values that run counter to the outcome of the continuous vetting processes of Wikipedia. If it does, we can safely assume that the model has been tampered with for ulterior motives.









Leave a Reply