The Tower of Babel No More: Multilingual AI's Role in Code Generation
Andrej Karpathy recently said [1,2], "The hottest new programming language is English." His statement refers specifically to how we interact with large language models and other generative AI systems, and it has led me to ponder its implications. English has long been the internet's dominant language, with more accumulated content than any other. This vast reservoir of data has been instrumental in shaping the capabilities of generative AI models, enabling them to excel at tasks such as summarization, question answering, and even code generation.
Some may question how bold his statement is, yet prompt engineering is becoming increasingly popular. Numerous examples show how English can be used to produce code in various programming languages [3,4]. Setting aside debates about the quality and feasibility of such code, it is evident that the proficiency of these generative AI models has been advancing at a remarkable pace. Moreover, a growing number of generative AI models [5] are being introduced, expanding what can be achieved.
For my part, I have invested significant time in exploring the potential for human-computer communication in natural languages, including our native tongues. To dive deeper into this subject, it is crucial to differentiate between programming languages [6] and human or natural languages [7]. I view programming languages as constructed languages specifically crafted to simplify communication with machines. Interacting with machines directly requires bits, bytes, and machine operation codes. Programming languages introduce a layer of abstraction by providing English-like keywords such as "if," "while," and "do," which greatly facilitates our interaction [8]. Compilers and interpreters then play a vital role by translating this human-like programming language into machine code that machines can execute. Human languages, whether spoken, signed, or written, are in turn one of the many ways humans communicate with each other.
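To make that abstraction layer concrete, here is a minimal sketch in Python using the standard-library `dis` module. The function `count_down` is a hypothetical example written only for this illustration. Note one assumption in the analogy: CPython translates source code into bytecode for its own virtual machine rather than into native machine code, so the instructions listed are a stand-in for the "operation codes" described above, not the real thing.

```python
import dis


def count_down(n):
    """Count the steps needed to reach zero, halving even numbers."""
    steps = 0
    while n > 0:          # English-like keyword: "while"
        if n % 2 == 0:    # English-like keyword: "if"
            n //= 2
        else:
            n -= 1
        steps += 1
    return steps


# Collect the names of the low-level instructions the interpreter runs
# for the readable function above. The exact names vary by Python version.
opnames = [ins.opname for ins in dis.get_instructions(count_down)]

print(count_down(6))
print(len(opnames), "bytecode instructions, starting with:", opnames[:5])
```

A few readable lines built from "while" and "if" expand into a much longer list of terse instructions, which is the gap that compilers and interpreters bridge on our behalf.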
Over time, efforts have been made to streamline programming languages, aiming to make machine communication more accessible to a broader audience [9,10]. Various programming paradigms [11] have been devised to clarify programming concepts. However, the ultimate aspiration in programming language and artificial language research has been to enable communication with machines in a manner closely resembling human languages [12,13,14].