Efforts to make mobile and web applications accessible in users' local languages are on the rise. These endeavors often involve translating user interfaces and essential documentation into multiple human languages. In today's world, where internet usage and mobile device adoption continue to surge, digital literacy has become a necessity. It demands that we, as users, not only understand and use our digital devices but also communicate with them effectively.

However, a fundamental question persists: What about the individuals responsible for building these devices and crafting the software that powers them? Can they develop using their local languages? Is it conceivable to create a programming language that empowers developers to communicate with machines in any human language?

This inquiry delves into the evolving landscape of technology and programming, exploring the challenges and opportunities presented by the quest for multilingual programming and the potential for a future where language barriers in software development may be overcome.

Introduction

Language literacy typically encompasses the ability to read, write, and articulate one's emotions or requirements in one or more human languages in order to communicate with other people. Mobile phones and computers have become indispensable in our daily lives, internet usage is rising globally, and a growing share of everyday activity now takes place online. Human languages enable us to communicate with each other, while developers employ programming languages to interact with machines.

Understanding the various methods through which human users communicate with machines, particularly computers, is crucial. Graphical user interfaces (GUIs) and command-line interfaces are among the most commonly employed methods. GUIs offer users a menu of available options, allowing them to select their preferred option using either a mouse or keyboard shortcuts. Recognizing that many software products, including mobile and web applications, are used by non-English-speaking users, developers have begun exploring ways to make their applications accessible in users' local languages.

Localization and internationalization

There is currently a noticeable trend among application developers and service providers to offer user interfaces in local languages, enabling users to engage with applications in their native tongues. Localization and internationalization efforts have gained significant importance in application development. It is important to note, however, that the number of supported languages is small compared to the roughly 7,000 human languages in existence. Translations are typically available for languages with large numbers of speakers, leaving thousands of less widely spoken languages underserved on the internet.

In 2019, we celebrated the International Year of Indigenous Languages, which elevated the discussion of indigenous languages to an international level. This year-long campaign drew attention to the pressing need to protect the many languages that are on the brink of extinction. It's imperative for both digital and literacy campaigns to recognize that no language should face extinction in this digital age.

Significant progress has been made in enhancing the internationalization and localization of applications. Text encoding standards have evolved from ASCII to Unicode, enabling support for a far wider array of the world's writing systems. Built-in localization libraries now help present information such as dates, numbers, and currencies in the formats that users expect.
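The difference between ASCII and Unicode can be seen in a few lines of Python. The following is a minimal sketch; the sample greetings and the helper name `can_encode` are illustrative assumptions, not part of any standard.

```python
# Sample greetings chosen for illustration only.
SAMPLES = {"English": "hello", "Hindi": "नमस्ते", "Tamil": "வணக்கம்"}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        # ASCII covers only unaccented Latin letters, digits, and basic
        # punctuation; UTF-8 can represent any Unicode text.
        print(language, can_encode(word, "ascii"), can_encode(word, "utf-8"))
```

Only the English sample fits in ASCII, while UTF-8, a Unicode encoding, can represent all three.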

Creating a multilingual user experience involves two major aspects: the human-computer interface and the programming language. Achieving a multilingual experience through the human-computer interface typically entails expressing information in multiple human languages. This means that application developers must provide the user interface in both their native language and other languages spoken by their users.

The standardization of internationalization and localization practices has made it easier to translate user interfaces into one or more languages, sometimes without the need for specialized software development expertise. However, the importance of the second aspect, the programming language, is often underestimated.
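One common way to separate translation from programming is a message catalog: the code refers to stable message identifiers, and translators edit only the strings. The sketch below assumes a hypothetical in-memory catalog (`CATALOGS`) and a hypothetical helper (`translate`); real projects usually rely on an established localization library instead.

```python
# Hypothetical catalogs: message IDs mapped to translated strings per language code.
CATALOGS = {
    "en": {"greeting": "Welcome, {name}!", "save": "Save"},
    "fr": {"greeting": "Bienvenue, {name} !", "save": "Enregistrer"},
    "sw": {"greeting": "Karibu, {name}!"},  # "save" is deliberately left untranslated
}


def translate(message_id: str, lang: str, default_lang: str = "en", **params) -> str:
    """Look up a message in `lang`, falling back to `default_lang`, then to the ID itself."""
    catalog = CATALOGS.get(lang, {})
    template = catalog.get(message_id)
    if template is None:
        template = CATALOGS[default_lang].get(message_id, message_id)
    return template.format(**params)


if __name__ == "__main__":
    print(translate("greeting", "sw", name="Asha"))
    print(translate("save", "sw"))  # falls back to the English string
```

The fallback matters for less widely spoken languages, where a catalog is often only partially translated.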

Translation efforts for user interfaces typically focus on making software and services accessible to their users, but what about the developers behind this software and its interfaces? It has been observed that many programming languages use keywords that resemble English [1]. Even the command line [2], with its subcommands and associated parameters, relies heavily on English words, abbreviations, and mnemonics constructed from English words. This can pose challenges for developers whose primary language is not English, as they may struggle to grasp the rationale behind certain abbreviations. For example, remembering that "ls" stands for "list," "cd" stands for "change directory," and "-o" stands for "output" is more straightforward for English speakers, who can easily connect these abbreviations to the words behind them.

Towards Multilingual Natural Language Programming

When we look at manuals, blogs, and documentation on programming languages, it's evident that more content developers and programmers are producing informative articles in multiple languages. However, this is distinct from programming in a local human language, as abbreviations and keywords are not typically expressed in these local languages. Instead, developers mentally translate their ideas using the available constructs within a given programming language.

Recognizing this challenge, some programmers have proposed and created new programming languages that use keywords from their local languages, referred to as non-English-based programming languages [3]. This approach requires that the programming language support keywords, user messages, and variable, class, and function names in the local language. Developers who use these non-English-based programming languages can then program comfortably in their native language.
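As a rough illustration of the idea, localized keywords can be mapped onto an existing language before execution. The sketch below uses Python's standard `tokenize` module; the Spanish keyword table `KEYWORDS_ES` and the function `to_python` are hypothetical and do not correspond to any existing non-English-based language.

```python
import io
import tokenize

# Hypothetical Spanish-to-Python keyword table, for illustration only.
KEYWORDS_ES = {
    "si": "if",
    "sino": "else",
    "mientras": "while",
    "definir": "def",
    "retornar": "return",
}


def to_python(source: str, keywords: dict[str, str]) -> str:
    """Replace localized keywords with Python keywords, token by token."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        # Only NAME tokens are rewritten, so string literals and comments
        # that happen to contain a keyword are left alone.
        if tok.type == tokenize.NAME and text in keywords:
            text = keywords[text]
        pairs.append((tok.type, text))
    # With (type, string) pairs, untokenize returns equivalent code whose
    # spacing may differ from the input.
    return tokenize.untokenize(pairs)


if __name__ == "__main__":
    programa = "definir doble(x):\n    retornar x * 2\n"
    namespace = {}
    exec(to_python(programa, KEYWORDS_ES), namespace)
    print(namespace["doble"](21))
```

A preprocessor like this covers keywords only; error messages, standard library names, and documentation would still be in English, which is part of why dedicated languages have been created.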

However, a critical issue arises here. Software development involves not only writing code but also maintaining it. If the people reading code they did not write cannot understand the human language it is written in, it can be quite challenging for them to identify and correct bugs or make modifications. This highlights the importance of striking a balance between linguistic accessibility and code maintainability in software development.

An alternative approach could involve a programming language whose keywords are independent of any human language [4], potentially in the form of numerical identifiers. However, it's important to recognize that a program is essentially another form of text, in which the writer conveys their logic concisely in a form that they can understand and that can be translated into the underlying machine code.

Using numerical identifiers for keywords, variables, functions, or class names may enhance machine readability, but it often results in code that is cryptic and incomprehensible to humans. To address this challenge, we might consider adding another layer, akin to internationalization efforts, in which every numerical identifier is translated into a keyword in the reader's local language, or replaced with an understandable name for the variable, function, or class it denotes. This would bridge the gap between machine readability and human comprehension, offering a potential solution to the issue of language barriers in programming.
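The layered approach described above can be sketched as follows. Everything here is hypothetical: the numeric IDs, the tables `KEYWORD_NAMES` and `SYMBOL_NAMES`, the token format, and the function `render` are assumptions made for illustration, not an existing specification.

```python
# Hypothetical numeric IDs for keywords, with per-language display names.
KEYWORD_NAMES = {
    "en": {1: "if", 2: "return", 3: "function"},
    "es": {1: "si", 2: "retornar", 3: "función"},
}

# Hypothetical numeric IDs for program-specific names (variables, functions).
SYMBOL_NAMES = {
    "en": {100: "double", 101: "value"},
    "es": {100: "doble", 101: "valor"},
}

# A program stored as language-neutral tokens:
# ("kw", id) for keywords, ("sym", id) for names, ("lit", text) for everything else.
PROGRAM = [
    ("kw", 3), ("sym", 100), ("lit", "("), ("sym", 101), ("lit", ")"), ("lit", ":"),
    ("kw", 2), ("sym", 101), ("lit", "*"), ("lit", "2"),
]


def render(program, lang: str) -> str:
    """Render the language-neutral token list using names from `lang`."""
    tables = {"kw": KEYWORD_NAMES[lang], "sym": SYMBOL_NAMES[lang]}
    words = []
    for kind, value in program:
        words.append(value if kind == "lit" else tables[kind][value])
    return " ".join(words)


if __name__ == "__main__":
    print(render(PROGRAM, "en"))
    print(render(PROGRAM, "es"))
```

The stored program never changes; only the display tables do, much as a message catalog separates interface text from application logic. The cost is that someone must still maintain a name table for every identifier in every supported language.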

The idea of natural language programming [5, 6, 7], where users can write code in their local language without needing to understand traditional programming keywords, has been a long-standing aspiration in the field of computer science. With the emergence of generative AI models [8], such as large language models [9], we've made significant progress toward this vision. These models have the potential to translate human languages into machine code or existing programming languages, making programming more accessible to individuals who are not proficient in traditional programming languages.

In this scenario, users could instruct the machine in their local language to generate code for specific tasks. The machine could then either execute the task directly or produce executable code that the user can run later. This approach could empower new users to work with machine-generated code, even if they don't understand the language in which the original code was written. They could ask the AI model to help them comprehend the code's purpose, debug errors, or modify it to suit their needs.

While substantial progress has been made in the development of generative AI models, it's essential to note that not all large language models are multilingual [10, 11, 12]. Some models are trained primarily on data in a few specific languages and may not provide robust support for every language in the world. However, ongoing research and advancements in AI continue to move us closer to realizing the vision of natural language programming that can bridge language barriers in coding.

Conclusion

In our increasingly digital world, communication isn't limited to interactions among humans; it extends to our interactions with machines. Historically, developers have used programming languages as the bridge between human intent and machine execution. However, many of these programming languages require some familiarity with English, either to start programming or to understand the rationale behind the words and keywords chosen by the language's designers.

While creating non-English-based programming languages is a potential solution, it introduces challenges in terms of code maintenance, especially for programmers who are not proficient in the human language on which the programming language is based.

The emergence of generative AI models, particularly large language models, offers a glimmer of hope for enabling communication with machines in our primary languages. However, it's crucial to acknowledge that not all large language models are entirely multilingual; that is, they may not comprehensively support all human languages.

As we navigate this evolving landscape of multilingual programming, it's clear that the pursuit of more accessible and inclusive technology continues. The future may hold innovative solutions that further bridge the gap between human languages and machine communication, fostering a more inclusive and diverse digital world.

References

  1. Coding Is for Everyone—as Long as You Speak English
  2. Rethinking the command-line
  3. Non-English-based programming languages
  4. Language-independent specification
  5. Natural Language Programming
  6. van der Storm, Tijs, and Jurgen J. Vinju. “Towards Multilingual Programming Environments.” Science of Computer Programming, vol. 97, Jan. 2015, pp. 143–49. ScienceDirect, doi:10.1016/j.scico.2013.11.041
  7. The transition to multilingual programming
  8. Generative artificial intelligence
  9. Large language model
  10. Introducing The World's Largest Open Multilingual Language Model: BLOOM
  11. Introducing LLaMA: A foundational, 65-billion-parameter large language model
  12. List of languages supported by ChatGPT