Phones-to-Words Challenge III: Comparing runtimes of Bash and Emacs Lisp

So after having solved the study's challenge in both Emacs Lisp and Bash, how do they compare in speed?

Lispers' versions, back in the year 2000, took between 19s and 78s. I can't easily emulate the year 2000, but I can reach for an old machine and at least compare my solutions among themselves.

All of my Bash ones work — but boy, how slow they are! And how much faster the Elisp solution is.

As much as Bash pleases me, Elisp is the clear winner in speed. Not that I expected otherwise, but the magnitude of the difference surprised me quite a bit.

Here's my simple benchmark script:

#!/usr/bin/env bash
#
# SPDX-FileCopyrightText:  © flandrew
# SPDX-License-Identifier: GPL-3.0-or-later
#
# Benchmark script for Bash solutions.

# Note: default TIMEFORMAT is: $'\nreal\t%3lR\nuser\t%3lU\nsys\t%3lS'

sep()      { printf '%s\n' '----------------------------------------' ;}  # visual separator

bench1()   { display="$2"
             set -- "./$1" "${3:-in/sample-input}" "${4:-in/sample-dict}"
             case "$display" in
               # Show(display), Count(lines), Nothing(no display)
               -s) sep; echo -e "[${1:2}]"
                   TIMEFORMAT="[Time: %R]"; time "$@"                       ;;
               -c) TIMEFORMAT="${1:2}  %R"; time "$@" | wc -l | tr '\n' '\t';;
               -n) TIMEFORMAT="${1:2}  %R"; time "$@" >/dev/null            ;;
               *)  echo "Must have an option" >&2; exit 1
             esac ;}

benchAll() { for script in p2w-*sh
             do  bench1 "$script" "$@"
             done ;}

benchAll "$@"
exit 0

Having tested them before, I ran this:

bench.sh -s in/sample-input     in/sample-dict     # All good.
bench.sh -c in/nothing          in/dictionary.txt
bench.sh -c in/sample-input     in/dictionary.txt
bench.sh -c in/sample-input-5x  in/dictionary.txt
bench.sh -c in/input.txt        in/dictionary.txt

(File sizes, for reference)

File              Lines
nothing               0
sample-input          8
sample-input-5x      40
input              1000
sample-dict          23
dictionary        73113

And from that, I built this table:

           Input ==>    sample-in  nothing  sample-in  sample-x5     in     in      in     in
            Dict ==>  sample-dict     dict       dict       dict   dict   dict    dict   dict
            Hits ==>           12        0         16         80    262    262     262    262
Lang  Unit          <-------------------- seconds -------------------->  norm.  ms/ph#  norm.
.elc  hash-table           0.03    15.96      16.32      16.39  17.27   1.00     1.3   1.00
.sh   aarray-many          1.08     6.38       8.92      16.95    776     45     770    588
      aarray               0.97     6.80      10.38      23.31   1122     65    1115    852
      fsystem-ascii       47.65    46.81      48.74      51.48    465     27     418    319
      grep                 1.23     1.16       9.87      45.34   6618    383    6617   5055
      look-parallel        1.24     1.36       2.24       5.36    326     19     325    248
      look                 1.10     1.09       2.49       8.08    672     39     671    513

Notes:

  1. For proper comparison, times for fsystem-ascii are considering the re-creation of the directories each time. Although it takes less than a minute, deleting takes much longer, and you'd probably prefer to do it once. The code detects whether it already exists.
  2. aarray-many is running with its default of 10 associative arrays, whereas aarray uses only one.

So my Elisp solution is 19× as fast as my fastest Bash solution (look+parallel): 17s vs. 326s.

But this is taking into account the initial storage in a hash table (or array, or filesystem, or variable, or whatever). The storage time shows up in the "nothing" column: since the input there is empty, all the time spent is storing time, and everything beyond it is actually looking up the numbers. If we subtract that initial cost and divide by the number of phones to get the time per phone... wow. With the dictionary's hash table stored in a variable, Elisp runs at about 250× the speed of Bash using look+parallel (1.3 vs. 325 milliseconds per phone number).
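That arithmetic can be checked directly. A minimal sketch (the perphone helper is mine, not part of the benchmark script): subtract the storage time measured in the "nothing" column, then divide by the 1000 phones of input.txt.

```shell
# Per-phone time in ms: (total - storage) / phones.
# perphone is a hypothetical helper, not from the benchmark script.
perphone() { awk -v t="$1" -v s="$2" -v n="$3" \
                 'BEGIN { printf "%.1f\n", (t - s) * 1000 / n }' ;}

perphone 17.27 15.96 1000   # Elisp hash table -> 1.3
perphone 326   1.36  1000   # look-parallel    -> 324.6
```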

By the way, note that the 2nd place in speed goes to the crazy I-know-I-know:-let's-use-a-zillion-subdirectories-as-a-hashtable! solution. Without parallelizing look, it'd've been the fastest Bash solution. Perhaps not so crazy, after all.
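The kernel of that idea fits in a few lines. This is a sketch only, with hypothetical store/lookup helpers (the real p2w-fsystem-ascii solution differs in detail): each key becomes a directory, each word a file inside it, and the filesystem does the hashing.

```shell
# Sketch: the filesystem as a hash table (hypothetical helpers).
cd "$(mktemp -d)" || exit 1                        # scratch area

store()  { mkdir -p "db/$1" && : > "db/$1/$2" ;}   # store KEY WORD
lookup() { [ -d "db/$1" ] && ls "db/$1" ;}         # lookup KEY -> words

store 228 act
store 228 cat
lookup 228                                         # act and cat, one per line
```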

Closing thoughts

Part of Bash's slowness is truly a... well-known bug of the language:

man bash | grep -A1 ^BUGS
BUGS
       It's too big and too slow.

And yet, in a way, the comparison in this particular case is a bit unfair, because we aren't comparing the same algorithms. So on top of whatever speed problems Bash may have, its arrays and greps and even binary search with look can't quite compete with Elisp's hash table. (Although strictly speaking grep and look aren't really Bash, so this is already a concession.) And Elisp can be compiled, whereas Bash is interpreted.

But then again, it is fair, because if Bash doesn't have something to compete with hash tables for a problem that asks for hash tables, then... well.

But maybe you are not too interested in speed. If you are going to run this 1000-phone list only once, then who cares if it takes 20 seconds or 20 minutes? If you are going to run it once in a while over a dozen phones, then also, who cares? Your considerations are then different. In that case, pick whatever language meets whatever other requirements you have, one of which may be how much you enjoy using it.

Here I enjoyed both, each in its own particular way.

Bash's speed issues truly put a cap on what can be done with the language. Nevertheless, it offers more flexibility than people realize. And conciseness, if you have some idea of what you're doing.

But it's all strings, so it's not for the faint of heart. Some would probably be horrified by my abuse of brace expansion to deal with trees (I'm looking at you, Haskeller friends!). And they have a point: it's not the "proper" way. But sometimes the improper way does what you need. Not proper, but still suitable.
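For the curious, the abuse in question rests on one property: juxtaposed brace expansions produce a Cartesian product, so each digit's letters branch like a tree. An illustrative fragment only (the actual solution is in the earlier post):

```shell
# Digit 2 maps to abc and digit 3 to def; juxtaposing the braces
# expands to every letter combination, i.e. all leaves of the tree.
echo {a,b,c}{d,e,f}    # -> ad ae af bd be bf cd ce cf
```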

Another point is that dealing with "lesser" tools exercises your creativity. I was staring at Bash with curiosity: "Ok, Bash, you don't have hash tables. What have you got? If I had nothing but you, what would you recommend?" So rather than feeling that I was wasting my time with a "lesser tool", I enjoyed working with the constraints it imposed on me. I also found the code there beautiful, in its own Bash-y way.

And this concludes my analysis. Try the exercise in your favorite languages and tell us what you find out.