FLAME University

The Intersection of Data Privacy, Multilingual Accessibility, and Economic Strategy in India

www.cxotoday.com | February 13, 2025

India’s AI Mission

In March 2024, the Indian government unveiled the ambitious IndiaAI Mission, allocating ₹10,372 crore over five years to bolster the nation’s artificial intelligence (AI) capabilities. A significant portion of this budget (₹4,563.36 crore, or approximately 44%) was earmarked for developing compute capacity, specifically through the establishment of a high-end AI computing ecosystem comprising over 10,000 Graphics Processing Units (GPUs). However, recent developments have cast doubt on the government’s commitment to this mission. The revised estimates for the 2024-2025 fiscal year show a drastic 69% reduction in the allocation for the IndiaAI Mission, from ₹551.75 crore to ₹173 crore. By contrast, the budget for the following fiscal year, 2025-2026, has been raised sharply to ₹2,000 crore.

This shift in funding raises critical questions about the strategic priorities of India’s AI development. The initial heavy investment in compute infrastructure suggests a belief that computational power is the primary bottleneck in AI advancement. Yet experts argue that while compute capacity is essential, the true linchpin of effective AI systems is high-quality, diverse datasets. The recent emergence of DeepSeek, a Chinese AI startup whose models upended the assumption that AI development requires very large investments, has raised questions about how open-source large language models (LLMs) can be developed. Data is the bedrock on which AI models are trained and refined. Without access to robust and representative datasets, even the most advanced computational resources cannot yield meaningful AI solutions. In this context, the IndiaAI Mission’s allocation of a mere ₹199.55 crore (under 2% of the total outlay) to the Datasets Platform appears disproportionately low compared to the substantial investment in compute capacity.

Moreover, the ethical implications of data privatization in India’s digital landscape cannot be overlooked. The government’s plan to develop a unified data platform aims to provide seamless access to non-personal datasets for Indian startups and researchers. However, the specifics of how data quality will be ensured, the processes for curating or anonymizing datasets, and measures to prevent potential privacy violations remain unclear.

Given these considerations, it is imperative to reassess the allocation strategy within the IndiaAI Mission. A more balanced approach that prioritizes the development of high-quality datasets, investment in skilling initiatives, and the fostering of innovative algorithms could yield more impactful outcomes for India’s AI ecosystem. Such a strategy would not only address the ethical concerns surrounding data use but also ensure a more sustainable and inclusive growth trajectory for AI development in the country. 

Infrastructure or Data 

In conclusion, while the establishment of robust compute infrastructure is undoubtedly a critical component of AI advancement, it should not overshadow the equally important need for quality data and skilled human resources. A recalibration of priorities, with a focus on ethical data practices and comprehensive skill development, will better position India to harness the transformative potential of AI in a responsible and impactful manner.

In today’s digital landscape, as information becomes more abundant and accessible, our vulnerability to data exploitation also intensifies. As explorers of digital humanities, we are positioned at the convergence of technological innovation and responsible data governance. Engaging with the recent draft formulations of the DPDP helps us gain a deeper understanding of the ethical implications of data usage and advocate for transparency in data-related decision-making. As AI takes over our phones, cars, homes, and everything else that is an important part of our daily routine, safeguarding our information becomes a fundamental responsibility.

AI systems thrive on what we create. We contribute so-called ‘user-generated’ content, and everything we do on the internet is tracked. The ads you see on Instagram are not a coincidence. Algorithms continuously analyze our behavior and preferences to show us what we want to see. Platforms like YouTube and Meta continuously capture snippets of our digital footprints and tailor personalized content (don’t be surprised if you see another article on data privatization tomorrow without searching for it). This points to a deeper problem. In 2023, Meta announced an ad-free subscription version of Facebook and Instagram in Europe, under which users pay a monthly fee to avoid being tracked and targeted with personalized ads. In effect, users were forced to pay with either money or privacy (TechCrunch). There was extensive debate on whether Meta’s approach meets the EU’s legal standards. Ireland’s Data Protection Commission, Meta’s lead privacy regulator in the EU, also fined Meta $1.3 billion (€1.2 billion) for improper data transfers from the European Economic Area to the United States in violation of the EU’s General Data Protection Regulation (GDPR). (Mega Fine for Meta: $1.3 Billion Penalty Imposed for Data Privacy Violations)

Ethics, Language, and Law: DPDP 

Meta AI holds the largest market share in India, and usage has been growing rapidly, with billions of queries asked within two months of its launch in June 2024. Given the size of India’s population, the threat of data leakage is correspondingly greater. Data generated by Indian users is currently processed in Singapore, which makes it essential that we work on laws for inter-country data movement alongside privacy laws. The Digital Personal Data Protection Act, or DPDP Act, passed in August 2023, is Indian legislation that seeks to balance the right of individuals to protect their personal data with the need to process such data for lawful purposes. Meta also plans to use posts and captions to further train its generative AI models under its new privacy policy. The billion users of Instagram never signed up for this. AI is taking over the world and, with it, our right to privacy. Companies extract large amounts of personal data, including sensitive information, to profit from targeted ads and to maximize engagement and revenue. A “data cow” refers to a country, like India, that produces high volumes of data used by tech companies for profit, without giving any benefits or protection to its users, raising concerns about exploitation and unequal gains from digital activity (The Secretariat).

When issues regarding data privacy arise, we must understand the importance of local languages in communicating informed consent and related laws. A user’s right to privacy is reduced when AI systems cannot even understand their language, leaving them unable to control how their data is processed. The IndicNLP research introduced IndicCorp, a dataset of 8.8 billion tokens from 11 Indian languages, essential for developing advanced NLP models like IndicBERT and IndicFT. These models enable cross-lingual transfer learning, allowing advancements in resource-rich languages like Hindi to benefit less-resourced languages like Assamese and Odia. Pre-trained multilingual embeddings support applications like multilingual search engines, virtual assistants, and chatbots, as well as government, education, and healthcare tools for non-English speakers. This research fosters linguistic inclusivity, ensuring AI-driven systems cater to millions of native language speakers, bridging the language gap for equal access to technology and privacy. 

As the intersection of privacy and technology becomes increasingly critical for individuals, organizations, and governments, it is essential to recognize its impact on our lives. In a world driven by data, the right to consent and data privacy are fundamental. Protecting our data is not just a necessity but a responsibility. Learning about data privacy is the first step towards data literacy and is integral to empowering individuals to navigate the digital landscape securely and ethically.

Authors: Prof. Maya Dodd, Faculty of Humanities & Languages, FLAME University, and Mannat Mehra, Undergraduate Student, FLAME University.


(Source: https://cxotoday.com/story/the-intersection-of-data-privacy-multilingual-accessibility-and-economic-strategy-in-india/)