Amplify Initiative: Localized AI Data for Global Impact

Author: DeepGeek

Generative AI models offer transformative potential for education and innovation worldwide, yet their impact is limited by training data that covers a narrow range of languages, topics, and regions. For AI to address critical local needs such as accessible health information, culturally relevant education, and essential financial services, we need diverse, high-quality data that represents global populations, their needs, and their values in their native languages. Crucially, the methods used to collect that data must be locally respectful, community-oriented, and responsible. The Amplify Initiative is a project dedicated to building an open, community-driven data platform that scales novel data collection and validation methods globally. In a pilot in Sub-Saharan Africa, we co-created datasets with domain experts, yielding an annotated collection of 8,091 adversarial queries in seven languages, contributed by 155 experts through an Android app. The Amplify Initiative will expand this methodology to Brazil and India and identify new ways to capture knowledge that is absent online.

Amplify Initiative Drives Global AI Data Creation

Amplify Initiative designs structured, culturally relevant datasets using a community-focused app. This platform empowers users to:

  • Co-create participatory, structured datasets reflecting global needs. Building on the Sub-Saharan Africa pilot, Amplify Initiative connects regional researchers to define the data requirements for AI development and to address region-specific challenges responsibly. Participants and researchers align on data needs to produce high-quality datasets.
  • Access high-quality, multilingual datasets for AI innovation. AI developers and researchers leverage Amplify datasets to advance techniques, models, and tools. Open data access particularly enables researchers from the Global South to apply AI for community benefit and solve pressing societal issues. Datasets support fine-tuning and evaluation, such as a Swahili misinformation benchmark or an Indian financial literacy fine-tuning dataset.
  • Receive recognition and rewards for vital AI contributions. The platform offers rewards and recognition, including data authorship attribution, professional certificates, and research acknowledgements. Future iterations will allow data authors to track their contributions' impact on AI innovation.

Amplify Initiative Pilot in Sub-Saharan Africa

Google Research partnered with Makerere University’s AI Lab in Uganda for an on-the-ground pilot. This collaboration focused on co-developing high-quality datasets with experts across Sub-Saharan Africa, leveraging Makerere's prior research into LLM harms within the African context. Together, the initiative:

  • Developed a methodology for collecting and validating data on critical domains (health, education, finance) with relevant experts.
  • Identified effective rewards for data creation, including compensation and certificates.
  • Established an ecosystem for data collection utilizing a dedicated app.
  • Trained and onboarded 259 experts in Ghana, Kenya, Malawi, Nigeria, and Uganda through workshops and app-based training.
  • Collected 8,091 annotated adversarial queries in seven languages, collaboratively authored by 155 experts across diverse industries.

Amplify Initiative: A Structured Data Collection Process

Before data collection begins, the Google Research team and partner institutions identify the region's most critical domains. Experts with relevant professional or academic experience in these domains then contribute to the data collection process, ensuring that diverse perspectives address pressing local issues. Team members and country-specific research leads establish clear guidelines for data creation. Comprehensive training materials and hands-on workshops are provided in local languages, with an emphasis on responsible practices, mitigating potential bias, and effective annotation techniques. To ensure scalability, a privacy-preserving Android app handles training and data collection, communicating data goals and capturing locally relevant generative AI themes such as stereotypes, specialized advice, and misinformation.

Figure: The Android app, featuring training materials on responsible AI and query creation.

Experts use the app to create and annotate data, receiving automated feedback that checks each query's relevance and originality. Queries are annotated with thematic and domain-specific topics localized for each participating country, and the app's recognition and compensation mechanisms are adapted to each region. This streamlines the contribution process and helps ensure that experts receive appropriate rewards.
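
The post does not describe how the app's automated originality feedback works. As an illustration only, the sketch below flags a new query as unoriginal when its word-bigram Jaccard similarity to any earlier submission reaches a threshold. The technique, the 0.5 threshold, and all function names are assumptions, not the app's actual implementation.

```python
import re


def shingles(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in a lowercased query."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        # Very short queries: fall back to single-word shingles.
        return {(w,) for w in words}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_original(new_query: str, existing: list[str], threshold: float = 0.5) -> bool:
    """Hypothetical check: a query is original if it is below the similarity
    threshold against every previously submitted query."""
    new_sh = shingles(new_query)
    return all(jaccard(new_sh, shingles(q)) < threshold for q in existing)


if __name__ == "__main__":
    pool = ["Is it safe to eat clay during pregnancy?"]
    # False: 5 of 9 distinct bigrams are shared (similarity ~0.56).
    print(is_original("Is it safe to eat clay while pregnant?", pool))
    # True: no bigrams in common.
    print(is_original("Which crops grow best in dry seasons?", pool))
```

The linear scan keeps the sketch short. A system handling many thousands of submissions would more plausibly use an approximate index such as MinHash, and a multilingual app would also need to catch paraphrases that share few surface words.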

Figure: The app's query creation flow, with an example of a localized query and suggested annotations.

Following data collection, regional partners and country research leads with language and regional expertise meticulously translate, evaluate, and validate queries for local relevance, coherence, fluency, and coverage. The team also employs AI-driven automated approaches for translation and validation before finalization.
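
The post names the criteria that reviewers apply but does not say how their judgments are combined. The minimal sketch below makes several assumptions. Each reviewer scores each query from 1 to 5 on relevance, coherence, and fluency. A query is finalized only when every score reaches a hypothetical cutoff of 3. Coverage is left out because it describes the collection as a whole rather than a single query. The `Review` class, the `accept` function, and the reviewer IDs are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical per-query criteria; scores are assumed to be on a 1 to 5 scale.
CRITERIA = ("relevance", "coherence", "fluency")


@dataclass
class Review:
    reviewer: str
    scores: dict[str, int]  # criterion name -> score


def accept(reviews: list[Review], min_score: int = 3) -> bool:
    """Accept a query only if every reviewer rates every criterion >= min_score.
    A missing criterion counts as 0, so incomplete reviews fail."""
    if not reviews:
        return False  # unreviewed queries are never finalized
    return all(
        r.scores.get(c, 0) >= min_score
        for r in reviews
        for c in CRITERIA
    )


if __name__ == "__main__":
    reviews = [
        Review("lead_ug", {"relevance": 5, "coherence": 4, "fluency": 4}),
        Review("lead_ke", {"relevance": 4, "coherence": 3, "fluency": 2}),
    ]
    print(accept(reviews))  # False: one fluency score is below 3
```

A strict "every score passes" rule is the simplest choice. Averaging scores or sending disagreements to a third reviewer are equally plausible alternatives.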

Amplify Initiative Pilot Data Delivers Critical Insights

The Makerere AI Lab and Google Research pilot collected 8,091 annotated adversarial queries in English and six African languages, including Pidgin English, Luganda, Swahili, and Chichewa. These queries are designed to provoke potentially unsafe LLM responses, enabling robust testing and mitigation of harm. This dataset serves to evaluate model safety and cultural relevance within these linguistic contexts. The dataset is open-source and available for exploration.

Experts across seven sensitive domains, such as culture, religion, and employment, annotated these queries with ten domain-specific topics, five generative AI themes (e.g., public interest, misinformation), and 13 sensitive characteristics (e.g., age, tribe) relevant to the African context. Health (2,076 queries) and education (1,469 queries) were the most prominent domains, with chronic disease (373) and education assessment and measurement (245) as their top topics, respectively. Approximately 80% of queries addressed misinformation, disinformation, stereotypes, or public welfare topics like health and law. Queries primarily concerned social groups defined by gender (e.g., “Chibok girls”), age (e.g., “newborns”), religion or belief (e.g., “Traditional African” religions), and education level (e.g., “uneducated”).
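
To put the reported counts in proportion, the short calculation below uses only figures stated in this post. Health accounts for about a quarter of all queries (25.7%) and education for 18.2%, so together the two domains make up 43.8%. Their top topics represent 18.0% and 16.7% of their respective domains. The variable and function names are illustrative.

```python
# Counts reported for the Sub-Saharan Africa pilot dataset.
TOTAL_QUERIES = 8091
DOMAIN_COUNTS = {"health": 2076, "education": 1469}
TOP_TOPIC = {
    "health": ("chronic disease", 373),
    "education": ("education assessment and measurement", 245),
}


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of each domain in the full dataset, and of its top topic in the domain.
for domain, n in DOMAIN_COUNTS.items():
    topic, k = TOP_TOPIC[domain]
    print(f"{domain}: {pct(n, TOTAL_QUERIES)}% of all queries; "
          f"top topic '{topic}' is {pct(k, n)}% of the domain")

combined = sum(DOMAIN_COUNTS.values())
print(f"health + education: {pct(combined, TOTAL_QUERIES)}% of all queries")
```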

Figure: Distribution of queries per thematic area and domain across all participating countries.

The dataset captures concerns, concepts, and social groups specific to each locale. It includes adversarial queries rooted in local contexts, misconceptions, and fallacies. For instance, one query concerns Ugandan women eating a particular kind of clay during pregnancy, a widespread cultural practice with potential health risks. Cultural nuances like these can help AI models detect and respond accurately to the concerns of a much wider range of populations.

Amplify Initiative Expands Globally

Building trust with global communities is paramount to Amplify Initiative’s mission. The initiative is scaling its pilot program into Latin America and South and Southeast Asia, partnering with Universidade Federal de Minas Gerais in Brazil and Indian Institute of Technology Kharagpur in India. Future efforts involve collecting and validating data on critical, localized issues unaddressed by current AI models. The app will enable regional experts to prompt Gemini on salient issues in their languages and countries, refining generated responses to incorporate contextual information missing in existing AI models. By facilitating collaboration between domain experts and Gemini, Amplify Initiative can identify and bridge global data gaps on crucial topics, from crop selection for Brazilian farmers to the value of education for girls in India.

Video: A demonstration of the new feature using Gemini within Amplify Initiative's web application.

Join the Amplify Initiative

Amplify Initiative empowers communities worldwide, placing them at the forefront of the next AI innovation wave. To learn more or get involved in your country, express your interest here.

Acknowledgements

This project's success stems from the enduring partnership and leadership of Joyce Nakatumba-Nabende and Rehema Baguma from Makerere University. Essential contributions to the platform ecosystem development came from Andrew Katumba, Chodrine Mutebi, Jagen Marvin, Eric Peter Wairagala, Mugizi Bruce, Peter Oketta, Lawrence Nderu, Obichi Obiajunwa, Abigail Oppong, and Michael Zimba. Crucial effort and leadership from Erin van Liemt, Amber Ebinama, Tiffany Shih, Adam Forbes, Karla Barrios Ramos, Madhurima Maji, Aishwarya Verma, Charu Kalia, and Alexandre Zanoni from Google Research were invaluable. We appreciate the continuous support and guidance from Jamila Smith-Loud, Tiffany Deng, Saška Mojsilović, and Marian Croak. Finally, we recognize the data authors listed in the Amplify Initiative paper, whose contributions were fundamental to the dataset's creation.
