There’s no doubt that those of us with the privilege of internet access, the English language, and safety from the horrors unfolding in different parts of the world are currently witnessing a technological revolution unlike any seen before. Artificial Intelligence, emerging from the decades-long effort of Information Technology, is sweeping over our understanding of what it means to be human, what our education models mean for the future, and how economies get reshaped in the face of such massive change. It is set to pivot the general direction of human history as we know it.
This kind of sweep happens once every few hundred years, perhaps once every few thousand.
Consider a few examples. The move from oral storytelling to the written word was a new technology, and all kinds of tablets and scrolls were just tweaks from there. The printing press was a new technology because it changed how the written word was distributed: from small clusters and villages to a national and regional scale. After that, every newspaper, tabloid, and book format was a tweak. Video was a groundbreaking technology; movies and reels are all extensions of it. Sound was eventually added to it, and it changed how we consumed a story (in many ways, returning us to our more instinctive forms of communication). Then came the digital sphere, a whole different beast because it did what the printing press did, but at a far vaster scale, taking distribution from national to international territory.
With every step forward into a new technology, what has changed is how much long-term data is available, and in how many hands, at any given point in time. What do I mean by that? When oral stories were the technology, data (that is, the stories) could mostly be held authentically by only a few people at a time before the chain of transmission altered and diluted it.
Then, with writing, you could increase that number to a few hundred. The first pieces of information written down were handwritten, so the speed of data conversion (from speech to text) was incredibly slow, as well as labour-heavy. A vast chunk of knowledge must certainly have been lost in this data transfer. Then came printed books, and suddenly we were writing down a lot more and sharing it more widely. Printing was a faster but more expensive technology to begin with, so who wrote the stories in books changed, as did who had the capital and the vision to distribute them to the world. What got written got believed (unlike oral storytelling). We began to prioritise the written word over hearsay, shifting the dynamics of power in the regional, national, and international order. Because the printing press arrived alongside the age of explorers and then colonial oppression, and because Europe pioneered those efforts, the stories that got told at scale were, and have been, largely Eurocentric.
When the printing press was gaining traction in Europe, north India was under Mughal rule, with the south and east functioning as separate units. They all continued to resist its use and hand-wrote their stories into books and paintings. For at least 100 years, the printing press did not gain widespread traction in the Indian region, and even when it did arrive, in 1556, it came through Christian missionaries in Portuguese Goa. This is not about religion, but the stories that got printed were largely to the benefit of European masters, at a time when religion and the State were not separated.
It has long been established that one of the things India as a region lacks the most is documentation. What systems we have were introduced by the British, and in mass culture at a global scale, it was their narrative that got accepted. What headway we did make as a country came when we adopted these technologies in the fight for freedom. We suddenly had our own newspapers, magazines, and pamphlets. But I think it was too little, too late.
Even in literature, we took hundreds of years to write down our stories. Our culture of oral storytelling, while rich, did not suit the demands of the day: mass production and distribution. Because we had a general dearth of written content and non-standard systems for organising it as a country, we were at a HUGE loss when the next wave of technology was being built.
It was long enough ago that it is easy to forget, but the early internet was built out of the millions of media assets from American archives of documents and images that were being digitised and uploaded. Goodreads did not build a multi-million-dollar database through magic; it took years of patient effort, cataloguing and using existing written content, to become part of what we now know as ‘the vastness of the internet.’ The early internet, and large parts of it today, was also in English, the lingua franca of the world. So, as a double disadvantage, knowledge in Indian languages remained un-datafied. Think about it: you have this whole system of knowledge that holds none of your own stories. Slowly and surely, you become invisible in the grander scheme of things.
Architecture, medicine, music, arts, literature, maths, food: everything that gets recorded and studied has been, by and large, Eurocentric, and now America-centric (which is largely an extension of Eurocentrism).
Over the last 10 years, that has been changing. You see a lot more scholarship on indigenous cultures and peoples. For instance, we developed Indic fonts for the web and advanced OCR technology for non-Roman scripts, recorded pieces of our tangible and intangible heritage on social media, told our stories, and shared the vastness of our histories. But just as it seemed that we were catching up, along came a new technology.
The big beast of the moment. AI.
And we stand the risk of repeating every mistake we have made in the past.
When you don’t write the rules of a new technology, or don’t use it enough, you write yourselves out of history.
In the field of Digital Humanities, we learnt an eternal truth: creators of technology write their biases into the tech they make. AI was developed over decades mostly by white men. It is trained on vast amounts of data. That data is more diverse today, but not diverse enough.
Yet the default language of every interface is still English. Unless prompted, it will not even bother to look at documents in other languages. Unless specified, it will use whiteness as a default. It does not take on the burden of fact-checking data about India, and it does not know the difference between a stereotype and true cultural representation; it needs to be prompted to get that information out. And how many of us are actively doing that?

This means that who and what gets written out of history tomorrow depends on how we build and use AI today.
Already, studies are showing that women are widely underrepresented in AI spaces. “While having more women and underrepresented groups in AI won’t automatically fix existing biased data sets, it will drive a push for more diverse and representative data collection over time.” (Source)
Today, the cost of AI can be afforded only by a privileged few among the minority. And among them, how many of us are really putting out OUR stories and knowledge? How much data in non-English (and, to an extent, non-Hindi) languages is being captured? I don’t know, and I am happy to be corrected. “It is important for the Global South to take an active role in managing its data and narrative – through collaboration, standardization, and investment,” said Dima Al-Khatib, Director of the United Nations Office for South-South Cooperation. (Source)
If you try to generate AI voices and videos today, you will know what I mean. The aesthetic, the vocal texture, the accents: everything is hyper-anglicised. All software that helps you with writing will flatten your voice to a global standard, one set by white people. Speech-to-text has improved a lot over the years, but it’s still not as seamless for the rest of us as it is for an American speaker. I don’t know how well it works on, say, a heavy Bengali accent or a heavy Malayalam one. If you have insights, I would love to know.
***
The thing is, AI raises definite ethical concerns, the most universal of them all being the environment. But why must the burden of ethical workflows fall on the shoulders of the already marginalised? Why are our stories an afterthought in data collection? Who takes responsibility for digitising our stories for future data training? And why must we be shy of doing that?
We need to be using AI more. We need to be building our own software that recognises us. But first, we definitely need to be building a lot more truly representative datasets. As history proves, legacy belongs to those who can write their stories into a new piece of technology: those who can translate the stories, culture, and languages of the land into the most widespread medium in use.
Today, and for what seems like a long time to come, that medium is the screen, and that technology is AI.
Disclaimers
I write only from the perspective of being Indian and a woman, because that’s what I understand and know. That’s my lens. It is not meant to invalidate other intersectional minorities and their representation challenges.
In this piece, I explore the more civilisational effects of how technology moves communities into the next generation. An early disclaimer: this generalises a lot. But I am tracking trends at the scale of thousands of years, so zooming in on specific cases and outliers is not easy.
This is by no means an exhaustive analysis; that is the work of many years. I will continue to write and build upon it. But this is a call to action: to reclaim our cultural and linguistic identities online, and to build usable datasets.