Efforts to make mobile and web applications accessible in users' local languages are on the rise.
These endeavors often involve translating user interfaces and essential documentation into multiple
human languages. In today's world, where internet usage and mobile device adoption continue to
surge, digital literacy has become a necessity. It demands that we, as users, not only understand
and use our digital devices but also communicate with them effectively.
However, a fundamental question persists: What about the individuals responsible for building these
devices and crafting the software that powers them? Can they develop using their local languages? Is
it conceivable to create a programming language that empowers developers to communicate with
machines in any human language?
This article examines the evolving landscape of technology and programming, exploring the
challenges and opportunities of multilingual programming and the potential for a future in which
language barriers in software development are overcome.
Introduction
Language literacy typically encompasses the ability to read, write, and articulate one's emotions or
requirements in one or more human languages in order to communicate with other people.
With the increasing prevalence of mobile phones and computers, these devices have become
indispensable in our daily lives.
Internet usage is rising globally, and a growing share of everyday activity now takes place online.
Human languages enable us to communicate with each other, while developers use programming
languages to interact with machines.
Understanding the various methods through which human users communicate with machines, particularly
computers, is crucial.
Graphical user interfaces (GUIs) and command-line interfaces are among the most commonly used
methods.
GUIs offer users menus of available options, which they select using a mouse or keyboard shortcuts.
Recognizing that much software, including mobile and web applications, is used by
non-English-speaking users, developers have begun exploring ways to make their applications
accessible in users' local languages.
Localization and Internationalization
Application developers and service providers increasingly offer user interfaces in local languages,
enabling users to engage with applications in their native tongues.
Localization and internationalization efforts have therefore gained significant importance in
application development.
However, the number of supported languages is small compared with the nearly 7,000 human languages
in existence.
Translations are typically available only for languages with large numbers of speakers, leaving
thousands of less widely spoken languages underserved on the internet.
In 2019, we celebrated the International Year of Indigenous Languages, which elevated the discussion
of indigenous languages to an international level.
This year-long campaign drew attention to the pressing need to save many languages from extinction.
It is imperative for both digital and literacy campaigns to recognize that no language should face
extinction in this digital age.
Significant progress has been made in the internationalization and localization of applications.
Text encoding standards have evolved from ASCII to Unicode, enabling support for a much wider range
of the world's writing systems. Built-in localization libraries now help present information, such
as dates and numbers, in a format that users can readily understand.
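The difference between ASCII and Unicode can be checked directly. The following minimal Python sketch uses illustrative sample words (not taken from any particular application) to test which of them can be represented in ASCII and which need a Unicode encoding such as UTF-8. The helper name `can_encode` is our own.

```python
# Illustrative sample words in several scripts (chosen only for this sketch).
SAMPLES = {
    "English": "hello",
    "French": "café",
    "Hindi": "नमस्ते",
    "Japanese": "こんにちは",
}


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for language, word in SAMPLES.items():
    # Print only ASCII-safe facts so the output does not depend on the terminal.
    print(
        language,
        "| ASCII:", can_encode(word, "ascii"),
        "| UTF-8:", can_encode(word, "utf-8"),
        "| UTF-8 bytes:", len(word.encode("utf-8")),
    )
```

Only the English sample fits in ASCII, while every sample can be encoded in UTF-8.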
Creating a multilingual user experience involves two major aspects: the human-computer interface and
the programming language. Achieving a multilingual experience through the human-computer interface
typically entails expressing information in multiple human languages. This means that application
developers must provide the user interface in both their native language and other languages spoken
by their users.
The standardization of internationalization and localization has made it easier to translate user
interfaces into one or more languages, sometimes without the need for specialized software
development expertise. However, the importance of addressing the second aspect, the programming
language, is often underestimated.
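One reason translators can work without programming expertise is that user-facing messages are usually kept apart from the program logic. The sketch below illustrates that separation with a hypothetical in-memory catalog. The names `CATALOGS`, `DEFAULT_LANGUAGE`, and `translate` are assumptions made for this example, and real projects typically rely on established tools such as gettext and on catalog files maintained by translators.

```python
# Hypothetical message catalogs. In practice these would live in separate
# files that translators can edit without touching the program logic.
CATALOGS = {
    "en": {"greeting": "Welcome, {name}", "save": "Save"},
    "fr": {"greeting": "Bienvenue, {name}", "save": "Enregistrer"},
    "es": {"greeting": "Bienvenido, {name}"},  # "save" is not translated yet
}
DEFAULT_LANGUAGE = "en"


def translate(key: str, language: str, **params: str) -> str:
    """Look up `key` in the catalog for `language`, falling back to English."""
    catalog = CATALOGS.get(language, {})
    template = catalog.get(key, CATALOGS[DEFAULT_LANGUAGE][key])
    return template.format(**params)


print(translate("greeting", "fr", name="Ada"))
print(translate("save", "es"))  # falls back to the English text
```

Adding a language in this design means adding a catalog, not changing the code. The fallback shows what users of partially supported languages typically see.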
Translation efforts for user interfaces typically focus on making software and services accessible
to their users, but what about the developers behind this software and its interfaces? Many
programming languages use keywords that are English words or closely resemble them [1]. Even the
command line [2], with its subcommands and associated parameters, relies heavily on English words,
abbreviations, and mnemonics constructed from English words. This can pose challenges for developers
whose primary language is not English, as they may struggle to grasp the rationale behind certain
abbreviations. For example, the fact that "ls" stands for "list," "cd" stands for "change
directory," and "-o" stands for "output" is easier for English speakers to infer and memorize.
Towards Multilingual Natural Language Programming
When we look at manuals, blogs, and documentation on programming languages, it is evident that more
authors and programmers are producing informative articles in multiple human languages. However,
this is distinct from programming in a local human language, as abbreviations and keywords are not
typically expressed in these local languages. Instead, developers mentally translate their ideas
into the constructs available within a given programming language.
Recognizing this challenge, some programmers have proposed and created new programming languages
that use keywords in their local languages, referred to as non-English-based programming languages
[3]. This approach requires the programming language to support keywords, user messages, variables,
classes, and function names in the local language. Developers using these non-English-based
programming languages can comfortably program in their native language.
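A partial step in this direction is already possible in some mainstream languages. Python 3, for instance, accepts non-ASCII characters in identifiers, so variables and functions can be named in a local language even though the keywords stay in English. The sketch below uses French names chosen purely for illustration. It is not an example of a non-English-based programming language in the sense described above.

```python
import math

# Identifiers are written in French. The keywords (def, return) remain
# English because Python does not allow keywords to be renamed.


def aire_du_cercle(rayon: float) -> float:
    """Return the area of a circle (aire = area, rayon = radius)."""
    return math.pi * rayon ** 2


def périmètre_du_carré(côté: float) -> float:
    """Return the perimeter of a square (côté = side)."""
    return 4 * côté


températures = [18.5, 21.0, 19.5]
moyenne = sum(températures) / len(températures)  # moyenne = mean

print(aire_du_cercle(1.0))
print(périmètre_du_carré(2.0))
print(moyenne)
```

A reader who does not know French can still follow the control flow here, but has to guess what each name means.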
However, a critical issue arises here. Software development involves not only writing code but also
maintaining it. If the people reading code written by others cannot understand the human language
used in it, they will find it difficult to identify and correct bugs or make modifications. This
highlights the importance of striking a balance between linguistic accessibility and code
maintainability in software development.
An alternative approach could involve a programming language that uses human-language-agnostic [4]
keywords, potentially in the form of numerical identifiers. However, it is important to recognize
that a program is essentially another form of text, in which the writer conveys their logic
concisely in a form that they can understand and that can be translated into the underlying machine
code.
Using numerical identifiers for keywords, variables, functions, or class names may enhance machine
readability, but it often results in code that is cryptic and incomprehensible to humans. To address
this challenge, we might consider adding another layer, akin to internationalization efforts, in
which every numerical identifier is rendered as a keyword in the reader's local language or as an
understandable name for a variable, function, or class. This would bridge the gap between machine
readability and human comprehension, offering a potential solution to the issue of language barriers
in programming.
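The layered idea above can be sketched in a few lines of Python. Everything in this example is hypothetical: the numeric identifiers, the toy token format, the tables `KEYWORDS` and `NAMES`, and the local-language spellings are assumptions made for illustration and do not describe any existing language.

```python
# Hypothetical numeric keyword identifiers and their per-language spellings.
KEYWORDS = {
    1: {"en": "if", "fr": "si", "es": "si"},
    2: {"en": "then", "fr": "alors", "es": "entonces"},
    3: {"en": "return", "fr": "retourner", "es": "devolver"},
}

# Hypothetical numeric identifiers for user-defined names.
NAMES = {
    100: {"en": "age", "fr": "âge", "es": "edad"},
    101: {"en": "adult", "fr": "adulte", "es": "adulto"},
}

# A stored program: only numbers, symbols, and literals, no human-language words.
PROGRAM = [
    ("kw", 1), ("name", 100), ("sym", ">="), ("lit", 18),
    ("kw", 2), ("kw", 3), ("name", 101),
]


def render(program: list, language: str) -> str:
    """Render the numeric program as text in the reader's language."""
    words = []
    for kind, value in program:
        if kind == "kw":
            words.append(KEYWORDS[value][language])
        elif kind == "name":
            words.append(NAMES[value][language])
        else:  # symbols and literals are language-neutral
            words.append(str(value))
    return " ".join(words)


for language in ("en", "fr", "es"):
    print(render(PROGRAM, language))
```

The stored program is identical for every reader. Only the rendering tables differ, which is the same division of labor that message catalogs provide for user interfaces.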
The idea of natural language programming [5, 6, 7], where users can write code in their local
language without needing to learn traditional programming keywords, has been a long-standing
aspiration in the field of computer science. With the emergence of generative AI models [8], such as
large language models [9], significant progress has been made toward this vision. These models have
the potential to translate human languages into machine code or existing programming languages,
making programming more accessible to individuals who are not proficient in traditional programming
languages.
In this scenario, users could instruct the machine in their local language to generate code for
specific tasks. The machine could then either execute the task directly or produce executable
machine code that the user can run later. This approach could empower new users to work with
machine-generated code, even if they don't understand the language in which the original code was
written. They could ask the AI model for help in comprehending the code's purpose, debugging errors,
or modifying it to suit their needs.
While substantial progress has been made in the development of generative AI models, it's essential
to note that not all large language models are multilingual [10, 11, 12]. Some models are trained
primarily on specific languages and may not provide robust support for every language in the world.
However, ongoing research and advancements in AI continue to move us closer to realizing the vision
of natural language programming that can bridge language barriers in coding.
Conclusion
In our increasingly digital world, communication isn't limited to interactions among humans; it
extends to our interactions with machines. Historically, developers have used programming languages
as the bridge between human intent and machine execution. However, many of these programming
languages require some familiarity with English, either to start programming or to understand the
rationale behind the words and keywords chosen by the language's developers.
While creating non-English-based programming languages is a potential solution, it introduces
challenges in terms of code maintenance, especially for programmers who lack proficiency in the
human language on which the programming language is based.
The emergence of generative AI models, particularly large language models, offers a glimmer of hope
for enabling communication with machines in our primary languages. However, it's crucial to
acknowledge that not all large language models are entirely multilingual; that is, they may not
comprehensively support all human languages.
As we navigate this evolving landscape of multilingual programming, it's clear that the pursuit of
more accessible and inclusive technology continues. The future may hold innovative solutions that
further bridge the gap between human languages and machine communication, fostering a more inclusive
and diverse digital world.
References
- Coding Is for Everyone—as Long as You Speak English
- Rethinking the command-line
- Non-English-based programming languages
- Language-independent specification
- Natural Language Programming
- van der Storm, Tijs, and Jurgen J. Vinju. "Towards Multilingual Programming Environments." Science of Computer Programming, vol. 97, Jan. 2015, pp. 143–49. ScienceDirect, doi:10.1016/j.scico.2013.11.041
- The transition to multilingual programming
- Generative artificial intelligence
- Large language model
- Introducing The World's Largest Open Multilingual Language Model: BLOOM
- Introducing LLaMA: A foundational, 65-billion-parameter large language model
- List of languages supported by ChatGPT