Command-Line Interfaces (CLIs), argparse.ArgumentParser and some of my tricks.

Command-Line Interfaces (CLIs) are one of the best ways of providing your programs with parameters that customize their execution. If you are not familiar with CLIs, this blog post will introduce them. Let’s say that you have a program that reads a file, computes something, and then writes the results into another file. The simplest way of providing those arguments would be:

$ python mycode.py my/inputFile my/outputFile
### mycode.py ###
import sys

def doSomething(inputFilename):
    # Count the number of lines in the input file
    with open(inputFilename) as f:
        return len(f.readlines())

if __name__ == "__main__":
    # Notice that the order of the arguments is important
    inputFilename = sys.argv[1]
    outputFilename = sys.argv[2]

    with open(outputFilename, "w") as f:
        # write() expects a string, so convert the line count
        f.write(str(doSomething(inputFilename)))

Although the previous code is perfectly fine for simple cases, as soon as your programs become more complicated they will require more arguments, and at some point users will struggle to remember all the possible options and the order in which they should be provided. Getting the order wrong could have catastrophic consequences (e.g. overwriting an important file). Moreover, defining optional arguments becomes almost impossible, as each argument must occupy a fixed position on the command line.

For these reasons, it is always a good idea to employ an argument parser that allows defining named arguments, so that users can provide them in any order. Moreover, most argument parsers also allow setting default values for optional arguments, so that the user does not need to care about them. Finally, virtually all argument parsers provide a help option that explains what each argument does.
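As a minimal sketch of these ideas, the first example could be rewritten with named arguments using Python's built-in argparse module (introduced below). The flag names --input and --output, the default result.txt and the helper build_parser are illustrative choices, not part of the original script.

```python
import argparse


def build_parser():
    # Hypothetical parser for the line-counting example; names are illustrative.
    parser = argparse.ArgumentParser(description="Count the lines of a file")
    parser.add_argument("--input", required=True, help="file to read")
    # Optional argument with a default, so the user may omit it.
    parser.add_argument("--output", default="result.txt", help="file to write")
    return parser


parser = build_parser()

# Named arguments can be given in any order and yield the same result.
a = parser.parse_args(["--output", "out.txt", "--input", "in.txt"])
b = parser.parse_args(["--input", "in.txt", "--output", "out.txt"])
print(a == b)      # True

# When --output is omitted, the default value is used.
c = parser.parse_args(["--input", "in.txt"])
print(c.output)    # result.txt
```

Passing an explicit list to parse_args, as done here, is only a convenience for experimenting; in a real script you would call parse_args() with no arguments so that it reads the command line.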

For Python programs, there are several libraries that can be used to build a CLI with ease, argparse being the most popular one. If you are curious about other alternatives, you can check Fergus Boyles' previous post on docopt.

There is plenty of documentation and examples about argparse, starting from the official page, so I won’t cover it extensively. Instead, I will just illustrate it with a couple of examples and then I will comment on some of the less known features that I find extremely useful.

The argparse.ArgumentParser is the class employed to implement the parser. Once you have instantiated the parser object (1), you just need to add the arguments that the parser should be expecting (2) and finally, parse the arguments from the command line (3) and provide them to the function (4). The following code can be considered as a prototypical implementation:

def main(db_files, output_directory, threshold, slow_mode):
    # DO SOMETHING with the arguments and return the result
    something = None  # placeholder for the actual result
    return something

if __name__ == "__main__":
    import argparse
    # 1) Instantiate an argument parser with a given description and epilog for help
    parser = argparse.ArgumentParser(description="A database searcher engine", epilog='''
    Example
    python -m scripts.run_database_query -t 0.1 -f ~/db_file.sqlite -o data/example_folder''')

    # 2) add arguments to the parser
    parser.add_argument('-t', '--threshold', help='the search threshold', type=float, required=False, default=0.1)
    parser.add_argument('-f', '--db-files', nargs='+', type=str, help='the database file', required=True)
    parser.add_argument('-o', '--output_directory', help='the directory to write the output', nargs=None, required=True)
    parser.add_argument('-s', '--slow-mode', action="store_true", help='activate the slow mode')

    # 3) Parse arguments from the command line
    args = parser.parse_args()

    # 4) Call the main function with the parsed arguments
    main(db_files=args.db_files, output_directory=args.output_directory, threshold=args.threshold, slow_mode=args.slow_mode)

That could be executed using the following command:

$ python mycode2.py --db-files ~/db_file1.sqlite ~/db_file2.sqlite -o ./out --threshold 0.1

The full description of the program can be obtained from the command line:

$ python mycode2.py --help
usage: mycode2.py [-h] [-t THRESHOLD] -f DB_FILES [DB_FILES ...] -o
                  OUTPUT_DIRECTORY [-s]

A database searcher engine

optional arguments:
  -h, --help            show this help message and exit
  -t THRESHOLD, --threshold THRESHOLD
                        the search threshold
  -f DB_FILES [DB_FILES ...], --db-files DB_FILES [DB_FILES ...]
                        the database file
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        the directory to write the output
  -s, --slow-mode       activate the slow mode

Example python -m scripts.run_database_query -t 0.1 -f ~/db_file.sqlite -o
data/example_folder

In section 1) an argparse.ArgumentParser object is created providing some descriptions that will be used when displaying the help.

In section 2), different arguments are added to the parser.

  • The argument threshold is an optional argument of type float that the user sets with the flag --threshold VALUE. Since it is not a required argument, if the user does not provide it, the parser will use the default value, in this case 0.1.
  • The argument db-files, of type string, is required and can be one or several strings (nargs="+").
  • output_directory is also a string argument, but in this case only one string can be provided (nargs=None). Notice that although type=str has not been provided, the default behaviour is to consider it as a str.
  • Finally, the argument slow-mode represents a boolean flag. Notice how in this case, instead of a type keyword, action="store_true" is used. That way, when the user provides the --slow-mode option, the parser stores the value True in slow_mode; otherwise it stores False. There is also action="store_false", which can be used when providing the flag should set the value to False.

In section 3, the arguments provided by the user on the command line (not through stdin) are parsed, and the resulting values are stored in a Namespace object that I called args. Accessing the values associated with the arguments in the Namespace is as easy as using the dot notation (args.threshold).
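To see what the Namespace looks like, here is a small sketch that rebuilds the same four arguments (help texts omitted for brevity) and parses an explicit list instead of the real command line. The file names a.sqlite and b.sqlite are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="A database searcher engine")
parser.add_argument('-t', '--threshold', type=float, default=0.1)
parser.add_argument('-f', '--db-files', nargs='+', type=str, required=True)
parser.add_argument('-o', '--output_directory', required=True)
parser.add_argument('-s', '--slow-mode', action="store_true")

# parse_args accepts an explicit list, handy for experimenting without a shell.
args = parser.parse_args(['-f', 'a.sqlite', 'b.sqlite', '-o', './out'])

print(args.threshold)         # 0.1, the default, since -t was not given
print(args.db_files)          # ['a.sqlite', 'b.sqlite']
print(args.output_directory)  # ./out
print(args.slow_mode)         # False, since -s was not given
```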

Finally, in section 4, the arguments collected in args are used to call the main function of the program.

There are a few important details that you may not have noticed. One of them is that people tend to employ argument names that contain hyphens, but Python variable names cannot contain them. So, by convention, the argument name is displayed as written on the command line, but when it is stored in the args Namespace, the hyphens are replaced by underscores (e.g. --slow-mode is stored as slow_mode in args).
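A quick way to convince yourself of the hyphen-to-underscore conversion is the following sketch, which parses an explicit list rather than the real command line:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--slow-mode', action='store_true')

args = parser.parse_args(['--slow-mode'])

# The flag is written with a hyphen, but the attribute uses an underscore.
print(args.slow_mode)                # True
print(hasattr(args, 'slow-mode'))    # False: no attribute with a hyphen exists
```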

Another tricky detail is that if you want an argument that holds a single value, you should not use nargs=1 (use nargs=None or leave it as the default). Otherwise, when accessing args.argument_name you will obtain a list with a single element rather than the value itself.
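The difference is easy to check with a small sketch; the argument names --single and --wrapped are hypothetical and only serve the comparison:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--single')            # nargs=None (default): a plain value
parser.add_argument('--wrapped', nargs=1)  # nargs=1: a list with one element

args = parser.parse_args(['--single', 'a.txt', '--wrapped', 'b.txt'])

print(args.single)    # a.txt
print(args.wrapped)   # ['b.txt']
```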

With respect to some of my tricks when using argparse, it is worth mentioning type=argparse.FileType('r'), which is generally a better option than type=str when some of your arguments are files. This argument type makes args.argument_name store an opened file object instead of a string. Although this may not seem useful at first glance, its true power is that it allows your program to use STDIN (standard input) directly as the source of the file, thus enabling Linux pipes. A typical use case is a program that processes text files line by line when we are only interested in a subset of the lines. Although we could modify the program to select the lines internally, it is easier to use standard command-line tools such as head, grep, etc., and feed our program with their output without storing temporary files. Notice that when executing 'python sc.py --myFile -' from the command line, a hyphen is provided instead of a filename to indicate that the input should be read from STDIN.

# Take the 20 first lines and process them with sc.py
$ head -n 20 myFil.txt | python sc.py --myFile -  # '-' to read from  stdin

# Take lines that don't contain the word fake and process them with sc.py
$ grep -v "fake"  myFil.txt | python sc.py --myFile -

My last piece of advice refers to section 4 of the code, the moment in which the parsed arguments are passed to the main function: main(db_files=args.db_files, output_directory=args.output_directory, threshold=args.threshold, slow_mode=args.slow_mode). While this is correct, there is an easier way of doing it by exploiting the **kwargs argument expansion. Basically, when using **dict within a function call, the key-value pairs of the dictionary are used as keyword arguments for the function call. The following function calls are equivalent.

main(x=1, y=2, z=3)
main( **{"x":1, "y":2, "z":3}) 
main( x=1, **{"y":2, "z":3})

Thus, in order to reduce the call to a simple main(**args_dict), we just need to convert the args Namespace into a dictionary. This can be easily done using the built-in function vars(), so the main function call becomes main(**vars(args)). Note that this only works when the names stored in args match the parameter names of main.
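Putting it together, here is a minimal sketch using the same four arguments as in the prototypical example. The body of main is a placeholder that simply reports what it received, and the argument values are made up.

```python
import argparse


def main(db_files, output_directory, threshold, slow_mode):
    # Placeholder body: just report what was received.
    return {"n_files": len(db_files), "out": output_directory,
            "threshold": threshold, "slow": slow_mode}


parser = argparse.ArgumentParser()
parser.add_argument('-t', '--threshold', type=float, default=0.1)
parser.add_argument('-f', '--db-files', nargs='+', required=True)
parser.add_argument('-o', '--output_directory', required=True)
parser.add_argument('-s', '--slow-mode', action="store_true")

args = parser.parse_args(['-f', 'a.sqlite', '-o', './out', '-s'])

# vars() turns the Namespace into a plain dict keyed by argument name.
args_dict = vars(args)
print(sorted(args_dict))   # ['db_files', 'output_directory', 'slow_mode', 'threshold']

# The dict keys match main's parameter names, so ** expansion just works.
result = main(**args_dict)
print(result)
```

If main had a parameter that the parser does not define (or the other way round), the call would raise a TypeError, which is a handy early warning that the two have drifted apart.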

Bonus track: I have wrapped a bunch of scripts I’ve been using for a while into a package that automatically generates argument parsers from type hints and documentation. This package deserves its own post, so for the moment, I will just share the link https://github.com/rsanchezgarc/argParseFromDoc.
