The Tower of Babel No More: Multilingual AI's Role in Code Generation

Andrej Karpathy recently said that "the hottest new programming language is English" [1,2]. His statement pertains specifically to how we interact with large language models and other generative AI systems, and it has led me to ponder its implications. English has long been the internet's dominant language, with more accumulated content than any other. This vast reservoir of data has been instrumental in shaping generative AI models, enabling them to excel at tasks such as summarization, question answering, and even code generation.

Many may question so bold a statement, yet prompt engineering is growing in popularity, and numerous examples show how English can be used to produce code in various programming languages [3,4]. Setting aside debates about the quality and feasibility of such code, the proficiency of these generative AI models is clearly advancing at a remarkable pace. Moreover, a growing number of generative AI models [5] keep entering the field, expanding what can be achieved.

For my part, I have invested significant time in exploring how humans can communicate with computers in natural languages, including our native tongues. To dive deeper into this subject, it's crucial to differentiate between programming languages [6] and human, or natural, languages [7]. I view programming languages as constructed languages crafted to simplify communication with machines. Interacting with a machine directly requires bits, bytes, and machine operation codes. Programming languages add a layer of abstraction by providing English-like keywords such as "if," "while," and "do," which greatly ease that interaction [8]. Compilers and interpreters then translate the human-readable program into instructions the machine can execute. Human languages, in their oral, signed, and written forms, are in turn just one of the many ways humans communicate with each other.
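To make that layer of abstraction concrete, here is a minimal sketch in Python (my choice for illustration only). It picks out a few of the English-like keywords the language reserves, then uses the standard `dis` module to list the lower-level instructions a small function is translated into. Two caveats: CPython produces bytecode for its own virtual machine rather than native machine code, and the exact instruction names differ between Python versions. The function `is_positive` is a made-up example.

```python
import dis
import keyword

# A few of the English-like keywords reserved by the Python language itself.
english_keywords = [kw for kw in keyword.kwlist if kw in ("if", "while", "else", "return")]
print(english_keywords)


def is_positive(n):
    """Hypothetical example function used only for illustration."""
    if n > 0:
        return True
    return False


# Collect the names of the bytecode instructions generated for the function.
# These target CPython's virtual machine, not the processor directly,
# and the names vary from one Python version to another.
instruction_names = [ins.opname for ins in dis.get_instructions(is_positive)]
print(instruction_names)
```

Even for a three-line function, the instruction list is noticeably longer and less readable than the source, which is the gap that keywords like "if" and "return" are there to hide.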

Over time, efforts have been made to streamline programming languages, aiming to make machine communication more accessible to a broader audience [9,10]. Various programming paradigms [11] have been devised to clarify programming concepts. However, the ultimate aspiration in programming language and artificial language research has been to enable communication with machines in a manner closely resembling human languages [12,13,14].
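One concrete result of those efforts is visible in Python 3, which accepts Unicode letters in identifiers, so function and variable names can be written in many scripts. The sketch below is only an illustration, and the names `área_del_círculo`, `radio`, and `合計` are made up for it. Notice that the keywords `def` and `return` still have to be written in English, which is exactly the gap that remains.

```python
import keyword
import math


def área_del_círculo(radio):
    """Area of a circle, written with Spanish identifiers (hypothetical example)."""
    return math.pi * radio ** 2


# A Japanese variable name meaning "total".
合計 = sum([1, 2, 3])

# Identifiers may be written in non-English scripts...
print("合計".isidentifier())

# ...but the reserved keywords are a fixed set of English words.
print(keyword.iskeyword("def"), keyword.iskeyword("definir"))
```

In other words, Unicode support lets the names in a program be local, while the grammar around them stays English.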

Multilingual Generative AI Models

With the advent of generative AI models, it seems we've collectively set aside the rigid confines of grammar and traditional programming languages. Within months of the public release of ChatGPT [15], we gained the ability to converse with AI models in English and ask them to generate code for intricate tasks across a spectrum of programming languages. In the blink of an eye, we've moved from meticulously crafting code in programming languages to simply conversing with AI models.

Fortunately, large language models have also benefited languages other than English, and multilingual support is expanding rapidly. ChatGPT, Bard, and Meta's No Language Left Behind project now cover a wide range of human languages, with the broadest efforts reaching into the hundreds [16,17,18]. I've been actively experimenting with some of these languages to assess their code generation capabilities. So far, I'm rather happy with the results, especially considering that we're still in the early stages of multilingual large language models. However, our mission is far from complete. As we celebrate the International Decade of Indigenous Languages [19], it's important to keep in mind that there are roughly 7,000 human languages [20], many with their own dialects. Our work remains unfinished until every individual on this planet can communicate with computers in their native tongue.

Conclusion

In closing, the rise of multilingual AI models marks a significant milestone in the evolution of programming. As these models expand their language capabilities, they bring us closer to a world where anyone can communicate with machines in their native tongue. The journey ahead, especially as we celebrate the International Decade of Indigenous Languages, promises to be both challenging and rewarding. It's a journey towards a more inclusive and linguistically diverse digital future, where technology speaks every human language.

References

  1. The hottest new programming language is English
  2. Tech’s hottest new job: AI whisperer. No coding required.
  3. The next programming language is English
  4. Introducing English as the New Programming Language for Apache Spark
  5. Open LLM Leaderboard
  6. Programming language
  7. Language
  8. Coding Is for Everyone—as Long as You Speak English
  9. Unicode
  10. gettext
  11. Programming Paradigm
  12. Language-independent specification
  13. Non-English-based programming languages
  14. Natural Language Programming
  15. Introducing ChatGPT
  16. ChatGPT language support - alpha (web)
  17. Bard’s latest update: more features, languages and countries
  18. No Language Left Behind
  19. Indigenous Languages Decade (2022-2032)
  20. Languages of the world - Interesting facts about languages