This is the transcript of the talk Building a Multilingual Command Line prepared for DebConf 2020 on 25th August, 2020

DebConf 2020

Good day/Good afternoon/Good evening to all depending on your timezone.

First of all, I am thankful to the organizers for giving me an opportunity to present this talk for DebConf 2020. I would also like to acknowledge the team for incorporating the rainbow colors to this year's DebConf logo.

I am John Samuel, and today I would like to talk about Building a Multilingual Command Line.

The goal of my talk is to look at the current implementations of the command line and explore one way to simplify it, focusing on a natural language interface with multilingual support.

Before continuing, I would like to ask the first question: is the command line simple?

I would like you to think about the first few days when you started using Linux or other Unix based systems when your instructors or professors told you to memorize some mnemonics or commands to interact with the command line. In the next couple of slides, I would like to show a very small subset of simple and commonly used commands to perform some basic functions for handling files, processes or networks.

Starting with files, I am very sure that all of us listening to this talk use ls at least once a day. But what is ls? It's a command to list files. Or possibly a mnemonic to list. This was fine. The first command memorized.

$ ls : list files

Now, let's move on to the second command. How can I create a blank file? There are several ways, of course. However, one that's often suggested is touch. How can a newcomer ever remember touch?

$ touch : create a blank file

Moving on to the directories, the first command you may have learned was to create a directory. Here comes mkdir.

$ mkdir : create a directory

This looks fine, coming from two words make and directory, hence the mnemonic mkdir.

Now comes the command to change a directory. Going by the logic of the command mkdir, it should have been chdir or cgdir. But no! It's

$ cd : change a directory

I wonder how many repetitions it took me to memorize this command.

Enough with files and directories, now let's move on to the processes. How can I see all the processes? It's simple. It should be lsp or lspr. But once again I am disappointed. It's ps. Ok Fine! ps and process are pretty close.

$ ps : list processes

But you do learn that though ps is interesting, the command that may be quite useful is top, which can give you a better view of the processes.

$ top : list processes

The man page of top says it 'displays processes'. One may try to associate it with 'top running processes' to memorize this command.

Moving on from processes, I wondered whether lsnet would be the command to see all the network connections. Again, I was wrong. It is

$ netstat : list network connections

However, I think I was lucky to guess the command for listing all the hardware. Indeed it is

$ lshw : list hardware

Again for CPU, the guess was correct, it is

$ lscpu : display information about cpu architecture

But what about getting the time and date? Here I was not disappointed, and I could memorize this simple command.

$ date : display or set date

I think that's enough of these different commands. I would now like to show you a summary of the commands that we just saw. Is there any pattern that could reduce the learning curve for newcomers? We do see that some commands use ls, which can simplify our learning process, but this is not true for all the commands.

ls: list files
touch: create a blank file
mkdir: create a directory
cd: change a directory
ps: list processes
top: list processes
netstat: list network connections
lshw: list hardware
lscpu: display information about cpu architecture
date: print or set date
...:

As if remembering the commands was not enough, the professors then told us about the options.

Options : What about them?

I want to give examples of some very commonly used options of the command line.

-v : Is it verbose or is it version?

If I type -v with python or tar, what would be the possible output?

What about

-r : Is it reverse or is it recursive?

And what about - or --?

help : Is it -h or is it --h? Is it -help?

A lot has changed since these commands were initially developed. The problems of -h or --h are now largely handled thanks to good support from standard libraries. This has helped the standardization of commands.

I wish to give a couple of examples of the support from standard libraries:

  • C: getopt(), getopt_long()
  • Python: the argparse module

These functions and modules make it easy to follow the convention of a single hyphen for the short form of an option, i.e., -h, and two hyphens for the long form, i.e., --help.

For example, in C, the getopt() and getopt_long() functions can be used to easily create commands with

  • short and long options
  • optional arguments

And in Python, with the argparse module, developers can easily create commands with

  • short and long options
  • one or more optional arguments
  • data types for arguments
  • subcommands
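The same short and long option convention can also be followed in a shell script without any library. The sketch below is a hand-written parser and is not part of the original talk material; the function name parse_options and the --name option are hypothetical and only illustrate the pattern.

```shell
# Hypothetical option parser: -h/--help, -v/--verbose and -n/--name VALUE.
parse_options() {
  verbose=0
  name=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -h|--help)
        echo "usage: parse_options [-h|--help] [-v|--verbose] [-n|--name NAME]"
        return 0
        ;;
      -v|--verbose)
        verbose=1
        ;;
      -n|--name)
        # This option needs a value; fail if none was given.
        if [ "$#" -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        name="$2"
        shift
        ;;
      --)
        # Conventional end-of-options marker.
        shift
        break
        ;;
      -*)
        echo "unknown option: $1" >&2
        return 2
        ;;
      *)
        # First non-option argument: stop parsing.
        break
        ;;
    esac
    shift
  done
}
```

Here the short and long forms share one branch each, so -v and --verbose can never drift apart in meaning.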

There are many other ways to enhance the command line. Most of you may have heard about bash, ksh, zsh etc. There is another shell called fish that brings colors to the command line, making a clear distinction between commands and their options.

Yet, we still need work on progress bars that visualize the progress of a running command. Currently, this is mostly handled by each command itself.

Multilingual documentation has been suggested to help improve the learning curve of command-line users. Unfortunately, this is mostly limited to a couple of languages and very few commands.

This brings me to the second part of my talk. Instead of focusing only on multilingual documentation, what if we ask: why not multilingual commands and options? What if native-language support is brought to the command line, so that people can write commands in their native languages?

What are multilingual commands? Taking the example of three languages, English, French and Malayalam, I want to illustrate multilingual commands, or imperative sentences, to list all the processes.

  • list all the processes
  • affiche tous les processus
  • പ്രക്രിയകൾ കാണിക്കുക

Please note that I have highlighted the action verbs in these three sentences. Note their position. Note that the action verb may not come in the first position. Look at the third example.

Taking the example of Bash, a shell and command language for Unix/Linux based systems, let's explore how a shell can be extended to give a multilingual interface.

  1. Solution 1: Ask the developers and maintainers to support their tools, applications or commands in multiple languages and release executables in multiple languages.
  2. Solution 2: Modify the shell (e.g., Bash) source code to allow accepting translations of existing commands, in such a manner that when a user types a command in their native language, the shell searches for this command in all the translations and if it finds one, checks whether the translation is mapped to any existing command.
  3. Solution 3: Extend the shell (e.g., Bash) in a transparent manner that requires neither a modification to the shell nor a modification to the individual applications.

Solutions 1 and 2 require a tremendous amount of work for translating all the existing commands, their arguments (or options), etc. In real life, however, we do not use all the available commands. Why not take a look at the regularly used commands and build a transparent multilingual solution using them? This is what brings us to Solution 3, which I am going to talk about in the rest of my talk.

Firstly, I would like to explore how we can get some inspiration from other domains and their approaches in reducing the learning curve of the newcomers.

Let's consider the REST API (Application Programming Interface) of web services. The REST architectural style has greatly simplified the creation of new services and the way we manipulate resources. When I say 'resource', I mean online resources like user profiles and associated data on social networking sites, as well as resources on a machine like processes, network connections, memory and files. Interestingly, these are some of the actions, or vocabulary, popularised by REST web services.

  • Create C
  • Read R
  • Update U
  • Delete D
  • List L

Continuing with the example of the command line, let us see what some commands may look like in English:

action resource
create file
read file
update file
delete file
list file
(action)... file

Let us now see what the same commands may look like in French:

action resource
créer fichier
afficher fichier
modifier fichier
supprimer fichier
lister fichier
(action)... fichier

In the case of English and French, imperatives (or commands) are of the form Verb + Subject + Object, Verb + Object or just Verb. However, imperative sentences follow a different order in languages like Malayalam, as we saw in the previous slide, where the resource comes before the verb. Thus, the word order cannot be assumed, and it's important to be flexible. Figure 1 shows the first possibility, where the verb comes first and the resource(s) come second.

Fig 1: Command to list files, directories, processes or network connections

Another possibility is shown in Figure 2, where the resource comes first.

Fig 2: Command to create, show, delete and list files. Object comes in the first position.

While working with the command-line, taking a look at some of the commands that people normally use, we can obtain some possible objects and the actions on them as shown in Figure 3.

Fig 3: Command to list files, directories, processes or network connections. Object comes in the first position.

For those of you who are further interested in this topic, I suggest you take a look at Firefox Ubiquity.

So to conclude, we need to have the following groups of multilingual commands. In the first group

Multilingual command: action + resource

i.e., the action verb comes first before the resource.

In the second group,

Multilingual command: resource + action

i.e., the action verb comes after the resource.

Finally, in the third group

Multilingual command: action + resource + options

Multilingual command: resource + action + options

i.e., resource or action in any order, but with options.

With these groups in mind, we can now move to the third part of this talk, i.e., the Development of these ideas.

Let's explore a very simple way to implement this idea. We want to implement commands as shown in Figure 1, i.e., the action verb comes first and the object comes in the second place.

The first is a very basic solution, i.e., using aliases. To test these commands, you just have to add them to the .bashrc file in your home directory. The idea is to create a new word combining action words and resources.

alias listfile="ls"
alias createfile="touch"
alias deletefile="rm"
alias showfile="cat"

I am sure many of you listening to this talk already have many complex commands defined as aliases. The idea here is that the user needs to remember only the action verbs: list, create, delete and show. They can then concatenate the resource with these verbs.

Thus with this basic solution of using aliases, the user wishing to manipulate directories can simply write listdirectory, createdirectory, deletedirectory, showdirectory etc.

alias listdirectory="ls"
alias createdirectory="mkdir"
alias deletedirectory="rmdir"
alias showdirectory="ls"
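Since every alias name above is simply an action word joined to a resource word, the definitions could also be generated from a small table. The following is only a sketch under that assumption; the function name define_compound_aliases and the table layout are hypothetical.

```shell
# Sketch: generate "<action><resource>" aliases from a small table.
# Each table row is: action resource existing-command
define_compound_aliases() {
  while read -r action resource target; do
    # The row "list file ls" defines the alias listfile="ls".
    alias "${action}${resource}=${target}"
  done <<'EOF'
list file ls
create file touch
delete file rm
show file cat
list directory ls
create directory mkdir
delete directory rmdir
show directory ls
EOF
}

define_compound_aliases
```

Adding a new resource then means adding rows to the table rather than writing new alias lines by hand. Note that Bash expands aliases in interactive shells by default; inside a script, it does so only after shopt -s expand_aliases.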

However, some may find long commands like listdirectory or createfile unappealing, since these words do not exist in any dictionary. We wish to create commands that are very close to human language.

So, the second approach is to separate action words and resources, so that the user can run the following type of commands.

  • create directory dir1
  • show directory dir1
  • delete directory dir1
  • create file file1
  • show file file1
  • ...

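So the next solution is based on functions.

function deleteaction() {
  # Number of arguments passed by the user
  count=$#
  if [[ $1 == "file" ]]
  then
    shift
    # Quote "$@" so that names containing spaces stay intact
    rm "$@"
  elif [[ $1 == "directory" ]]
  then
    shift
    rmdir "$@"
  fi
}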

alias delete="deleteaction"

The above code shows how an alias has been used to call a function. Though it shows a way to delete files and directories, the same pattern can be used for working with network connections, processes etc. Now we can run the following commands:

  • delete directory dir1
  • delete file f1
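The other action words can follow the same pattern. As a sketch only, a create counterpart might look like the following; the name createaction is an assumption made here by analogy with deleteaction, and it reuses the touch and mkdir mappings from the alias examples. It is written with a case statement, which behaves like the if/elif chain above.

```shell
# Hypothetical counterpart of deleteaction for the "create" action word.
createaction() {
  case "$1" in
    file)
      shift
      touch "$@"
      ;;
    directory)
      shift
      mkdir "$@"
      ;;
    *)
      # Unlike the function above, report an unknown resource.
      echo "create: unknown resource: $1" >&2
      return 1
      ;;
  esac
}

alias create="createaction"
```

With this in place, create file file1 and create directory dir1 would work alongside the delete commands.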

But what if we want to repeat this for the French language?

function supprimeraction() {
  count=$#
  if [[ $1 == "fichier" ]]
  then
    shift
    rm "$@"
  elif [[ $1 == "répertoire" ]]
  then
    shift
    rmdir "$@"
  fi
}

alias supprimer="supprimeraction"

As you can see, the code is similar to the English version; we have only replaced the action words and resources by their French translations. Now we can run the following commands in French.

  • supprimer répertoire rép1
  • supprimer fichier f1

But what if we want to repeat this for Malayalam? If you remember, in the case of Malayalam, the resource comes in the first position. Hence there is a slight change in the code. Notice that we have created a function for manipulating directories: creation and deletion using mkdir and rmdir.

function ഡയറക്ടറിപ്രവർത്തനങ്ങൾ() {
  count=$#
  if [[ $1 == "സൃഷ്ടിക്കുക" ]]
  then
    shift
    mkdir "$@"
  elif [[ $1 == "ഇല്ലാതാക്കുക" ]]
  then
    shift
    rmdir "$@"
  fi
}

alias ഡയറക്ടറി="ഡയറക്ടറിപ്രവർത്തനങ്ങൾ"

With these functions, we have succeeded in creating multilingual commands transparently using existing commands.

  • supprimer répertoire rép1
  • supprimer fichier f1
  • ഡയറക്ടറി സൃഷ്ടിക്കുക ഡ1
  • ഡയറക്ടറി ഇല്ലാതാക്കുക ഡ1
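The per-language functions above repeat the same logic with different words. One possible generalization, shown here only as a sketch, is to translate each word to a canonical action or resource and to accept the two words in either order. The function names canonical_word, resolve_command and run_multilingual are hypothetical, and the word list is limited to the examples from this talk.

```shell
# Map a word in any supported language to a canonical token.
canonical_word() {
  case "$1" in
    create|créer|സൃഷ്ടിക്കുക) echo "action:create" ;;
    delete|supprimer|ഇല്ലാതാക്കുക) echo "action:delete" ;;
    file|fichier) echo "resource:file" ;;
    directory|répertoire|ഡയറക്ടറി) echo "resource:directory" ;;
    *) echo "unknown" ;;
  esac
}

# Print the existing command for two words given in either order.
resolve_command() {
  first=$(canonical_word "$1")
  second=$(canonical_word "$2")
  case "$first $second" in
    "action:"*" resource:"*) pair="${first#action:} ${second#resource:}" ;;
    "resource:"*" action:"*) pair="${second#action:} ${first#resource:}" ;;
    *) return 1 ;;
  esac
  case "$pair" in
    "create file") echo "touch" ;;
    "delete file") echo "rm" ;;
    "create directory") echo "mkdir" ;;
    "delete directory") echo "rmdir" ;;
    *) return 1 ;;
  esac
}

# Run the resolved command on the remaining arguments.
run_multilingual() {
  command_name=$(resolve_command "$1" "$2") || {
    echo "unrecognized command: $1 $2" >&2
    return 1
  }
  shift 2
  "$command_name" "$@"
}
```

Supporting another language would then only mean adding its words to canonical_word, while the mapping to existing commands stays in one place.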

You may say that these are very long commands. So what about shorter commands and options?

As you can see below, the function has been slightly changed to support shorter options. And what if you want more options? Yes, it's possible. You can make use of the count variable to get the number of options or arguments passed by the user, and of $1, $2, $3 etc. to handle each argument.

function supprimeraction() {
  count=$#
  if [[ $1 == "f" ]]
  then
    shift
    rm "$@"
  elif [[ $1 == "r" ]]
  then
    shift
    rmdir "$@"
  fi
}

alias s="supprimeraction"

Here we are with shorter commands for French users, who can easily associate s with supprimer (or delete in English), f with fichier (or file in English) and r with répertoire (or directory in English).

  • s r rép1
  • s f f1
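As mentioned above, the count variable can be used to check how many arguments were passed before acting on them. The following variant is only a sketch of that idea; the usage message and the return codes are assumptions and are not part of the original function.

```shell
# Sketch: supprimeraction with an argument-count check.
supprimeraction() {
  count=$#
  # At least a resource letter and one name are required.
  if [ "$count" -lt 2 ]; then
    echo "usage: s f|r name..." >&2
    return 2
  fi
  case "$1" in
    f) shift; rm "$@" ;;
    r) shift; rmdir "$@" ;;
    *) echo "unknown resource: $1" >&2; return 2 ;;
  esac
}
```

With this check, typing s f without a file name prints the usage line instead of calling rm with no arguments.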

Finally, I want to come to the final part of my talk. What can be the possible role of the Debian Community in the future?

There are 7,000 languages, and even if we decide to focus only on the 100 to 300 languages with the most speakers, it is difficult to ensure the development and translation of all commands, tools and documentation. Sharing and collaboration are the way to ensure greater availability of a multilingual command line to users. Here are some possibilities:

  1. The users can develop command-line configuration files for their language or languages by making use of the existing commands as discussed in this presentation.
  2. They can share these multilingual configuration files with other users using open source licenses.
  3. Debian community can further imagine collaborating towards a flexible solution which not only promotes multiple commands and tools, but also ensures the flexibility to the users in choosing their favorite commands.
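One way such shared configuration files could be loaded is to pick a file based on the user's locale. This is a sketch under several assumptions: the directory ~/.config/multilingual-cli, the MULTILINGUAL_CLI_DIR variable and the file naming scheme (language code followed by .sh) are all hypothetical, and the language is taken from the LANG environment variable.

```shell
# Extract the language code from a locale name such as "fr_FR.UTF-8".
language_code() {
  locale_name=${1:-${LANG:-C}}
  locale_name=${locale_name%%.*}   # drop the encoding, e.g. ".UTF-8"
  locale_name=${locale_name%%@*}   # drop a modifier, e.g. "@euro"
  echo "${locale_name%%_*}"        # drop the territory, e.g. "_FR"
}

# Source the matching command file if it exists (hypothetical location).
load_language_commands() {
  config_dir=${MULTILINGUAL_CLI_DIR:-$HOME/.config/multilingual-cli}
  config_file="$config_dir/$(language_code "${1:-}").sh"
  if [ -r "$config_file" ]; then
    . "$config_file"
  fi
}
```

A call to load_language_commands in .bashrc would then pick up, for instance, a shared fr.sh file for a user whose locale is fr_FR.UTF-8, and do nothing if no file exists for that language.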

There are several possible ways by which a multilingual command line can be built. This talk discussed some possible solutions, but focused in detail on one particular solution, where existing commands need not be modified or translated. This transparent solution may help reduce the learning curve of students who are new to the command line. However, a detailed user evaluation is still required to understand whether these changes are indeed helpful to non-English-speaking users. There are efforts like osquery that aim to simplify the command line using SQL-like queries. Some researchers have been exploring a natural language interface for the command line. However, these works still focus on the English language and do not consider the linguistic diversity of the world. Finally, future open source solutions need to be multilingual by design.

References

These are some of the references for this talk. You can also take a look at the dotfiles repository that I recently created. If you are interested, you can contribute to it or you can share your own to the open-source community.

  1. dotfiles https://github.com/johnsamuelwrites/dotfiles
  2. Command Line Interface https://en.wikipedia.org/wiki/Command-line_interface
  3. Python Argparse https://docs.python.org/3/library/argparse.html
  4. Rethinking the command line, John Samuel, Capitole du Libre, Toulouse, France, November 19, 2017
  5. .bashrc https://linux.die.net/man/1/bash
  6. Bash Startup Files http://www.gnu.org/software/bash/manual/html_node/Bash-Startup-Files.html
  7. Ubiquity https://mozillalabs.com/ubiquity/
  8. FEATURE: The linguistic command line Aza Raskin, Interactions - Toward a model of innovation, Volume 15 Issue 1, January + February 2008, Pages 19-22
  9. Ubiquity: Designing a Multilingual Natural Language Interface Michael Yoshitaka Erlewine, SIGIR Workshop on Information Access in a Multilingual World, July 23, 2009

Thank you once again for listening to my talk and giving me this opportunity. I hope that this talk may inspire current and future open-source contributors, especially bilingual or multilingual contributors to explore multilingual solutions.

If you have any questions or remarks, or if you want to point to other interesting works, please do not hesitate to contact me.