Binary search in two dozen lines of Bash: concise but much too slow
After finding myself unable to use look
without it pointing to an actual dictionary file (in my Solution 5 to Phones-to-Words Challenge II), I went ahead and tried to write binary search in Bash. I did it for the exercise of doing it without looking up the algorithm, just from my idea of what a binary search should be doing.
It works. It's concise. And, as predicted, it's sloooow. It doesn't even outperform regular grep's linear search. I suppose the many necessary tests make it prohibitively costly.
It might have been faster had I used arrays, but given the general slowness even when the hits are few, I didn't want to spend time further improving it. Still, I decided to post it because some of you may like to see it for the algorithm itself or for insights into Bash's syntax.
#!/usr/bin/env bash # # SPDX-FileCopyrightText: © flandrew # SPDX-License-Identifier: GPL-3.0-or-later # # Binary search: this tries to roughly mimic look's behavior. LC_ALL=C matchp() [[ "$(lnkey "$2" "$3")" == "$1"* ]] greatp() [[ "$(lnkey "$2" "$3")" < "$1" ]] lnkey() { lnprn "$1" "$2" ;} lnprn() { sed -n "$2"p <<<"${!1}" ;} binsch() { sz="$(wc -l <<<"${!2}")"; ((min=1, max=sz, pln=min, cln=max/2)) # str="$1" dataname="$2" while ((cln>min)) && ((cln<max)); do if matchp "$@" "$cln"; then ((pln=cln)); while matchp "$@" "$((--cln))"; do :; done lnprn "$2" "$((cln+1)),$pln"; ((cln=pln)) while matchp "$@" "$((++cln))"; do lnprn "$2" "$cln"; done break elif greatp "$@" "$cln"; then ((cln>pln)) && ((d=max-cln, cln=cln+d/2, pln=cln-d/2, min=pln)) || ((d=pln-cln, cln=cln+d/2, pln=cln-d/2, min=pln)) else ((cln>pln)) && ((d=cln-pln, cln=cln-d/2, pln=cln+d/2, max=pln)) || ((d=cln-min, cln=cln-d/2, pln=cln+d/2, max=pln)) fi done ;} # The above is all. The below benchmarks it against look and grep. main() { fil="/usr/share/dict/american-english" data="$(< "$fil")" str="$1" l() { look "$str" "$fil" | tr -d "'" | xargs ;} g() { grep "^$str" <<< "$data" | tr -d "'" | xargs ;} b() { binsch "$str" "data" | tr -d "'" | xargs ;} echo; TIMEFORMAT="Look %R"; time l "$1" echo; TIMEFORMAT="Grep %R"; time g "$1" echo; TIMEFORMAT="This %R"; time b "$1" ;} main "$@" exit 0
Note that it isn't optimized for multiple hits — that is, it assumes there will be few hits.
Here are some results, run on an old machine I used for testing.
It looks up in a standard English dictionary words starting with "shell" and with "slow".
./binsch.sh shell
shell shells shellac shellacs shellacked shellacking shellacs shelled sheller shellfish shellfishs shellfishes shelling shells Look 0.009 shell shells shellac shellacs shellacked shellacking shellacs shelled sheller shellfish shellfishs shellfishes shelling shells Grep 0.038 shell shells shellac shellacs shellacked shellacking shellacs shelled sheller shellfish shellfishs shellfishes shelling shells This 1.996
and:
./binsch.sh slow
slow slowdown slowdowns slowdowns slowed slower slowest slowing slowly slowness slownesss slowpoke slowpokes slowpokes slows Look 0.008 slow slowdown slowdowns slowdowns slowed slower slowest slowing slowly slowness slownesss slowpoke slowpokes slowpokes slows Grep 0.050 slow slowdown slowdowns slowdowns slowed slower slowest slowing slowly slowness slownesss slowpoke slowpokes slowpokes slows This 2.742
Conclusion: don't.