Embracing Emacs Lisp hash tables: introducing XHT
I've "finished" writing XHT, the extensive hash table library for Emacs. It aims to cover pretty much anything you may want to do with hash tables there. Below I add a couple of things that didn't quite fit the README, namely:
- Some longer observations about hash table speeds in Emacs Lisp.
- Why you don't see hash tables being used in Emacs Lisp as often as it seems warranted by their merits.
Emacs Lisp hash table speeds
They're fast. In fact, faster than alists and plists when beyond a few tens of items.
I found the observation above a few times: in Adam Porter's emacs-package-dev-handbook, which benchmarked them; in the README of Wilfred Hughes's ht library; and in the Emacs Lisp Manual itself, which says:
A hash table is a very fast kind of lookup table, somewhat like an alist in that it maps keys to corresponding values.
and
Lookup in a hash table is extremely fast for large tables — in fact, the time required is essentially independent of how many elements are stored in the table. For smaller tables (a few tens of elements) alists may still be faster because hash tables have a more-or-less constant overhead.
This raises the question: how many "tens"?
According to Adam Porter's benchmarking, hash tables are already faster than plists and alists at 26 pairs, and the difference only gets larger beyond that. But... what about for fewer items?
I gave it a look. Below I use bench-multi-lets
, which can be found in emacs-package-dev-handbook.
(bench-multi-lets :times 10000 :ensure-equal t :lets (("with 5 pairs" ((numseq (number-sequence 1 5)) (strings (--map (format "prefix-%s" it) numseq)) (strings-alist (--map (cons it it) strings)) (strings-ht (h<-alist strings-alist)) ;or use ht, same here (strings-plist (h->plist strings-ht))))) ;or use ht, same here :forms (("strings/alist" (let (r) (dolist (string strings r) (push (cdr (assoc string strings-alist)) r)))) ("strings/plist" (let (r) (dolist (string strings r) (push (plist-get strings-plist string) r)))) ("strings/hash-table/equal" (let (r) (dolist (string strings r) (push (gethash string strings-ht) r))))))
Form | x fastest | Total runtime | # of GCs | Total GC runtime |
---|---|---|---|---|
with 5 pairs: strings/plist | fastest | 0.015930 | 0 | 0 |
with 5 pairs: strings/hash-table/equal | 1.23 | 0.019618 | 0 | 0 |
with 5 pairs: strings/alist | 1.30 | 0.020658 | 0 | 0 |
And since we're on it, I repeated it with other numbers. Results:
Form | x fastest | Total runtime | # of GCs | Total GC runtime |
---|---|---|---|---|
with 10 pairs: strings/plist | fastest | 0.028445 | 0 | 0 |
with 10 pairs: strings/hash-table/equal | 1.24 | 0.035319 | 0 | 0 |
with 10 pairs: strings/alist | 1.60 | 0.045407 | 0 | 0 |
with 20 pairs: strings/plist | fastest | 0.056012 | 0 | 0 |
with 20 pairs: strings/hash-table/equal | 1.22 | 0.068329 | 0 | 0 |
with 20 pairs: strings/alist | 2.02 | 0.112923 | 0 | 0 |
with 30 pairs: strings/plist | fastest | 0.091255 | 0 | 0 |
with 30 pairs: strings/hash-table/equal | 1.09 | 0.099372 | 0 | 0 |
with 30 pairs: strings/alist | 2.48 | 0.226039 | 0 | 0 |
with 40 pairs: strings/plist | fastest | 0.130532 | 0 | 0 |
with 40 pairs: strings/hash-table/equal | 1.00 | 0.130891 | 0 | 0 |
with 40 pairs: strings/alist | 2.93 | 0.382924 | 0 | 0 |
with 50 pairs: strings/hash-table/equal | fastest | 0.165664 | 0 | 0 |
with 50 pairs: strings/plist | 1.08 | 0.178832 | 0 | 0 |
with 50 pairs: strings/alist | 3.67 | 0.608449 | 0 | 0 |
with 100 pairs: strings/hash-table/equal | fastest | 0.776694 | 1 | 0.455533 |
with 100 pairs: strings/plist | 1.15 | 0.892707 | 1 | 0.455061 |
with 100 pairs: strings/alist | 3.41 | 2.649609 | 1 | 0.457006 |
with 500 pairs: strings/hash-table/equal | fastest | 3.885438 | 5 | 2.266520 |
with 500 pairs: strings/plist | 1.95 | 7.562806 | 5 | 2.268269 |
with 500 pairs: strings/alist | 12.73 | 49.476486 | 5 | 2.264634 |
Conclusion? For strings, lookups in:
- alists are way slower than plists and hash tables and it gets worse with size (see the runtime change from 100 to 500?);
- plists are about 20% faster than hash tables for up to 20 pairs; this drops to 10% at about 30 pairs; then at about ~40 pairs hash tables catch up — and from there on they outperform both alists and plists (by ever-widening margins).
Should you then opt for plists when below 40 items? I think it depends:
- If the number of lookups is large, you do it repeatedly, the dataset is unlikely to grow beyond 40 items, and the thing is performance-critical, then I suppose so.
- Otherwise, it makes no difference. We're talking about a few microseconds per lookup. Use whatever you like.
Yes, whatever you like. But consider this: if you have to pick one to learn well, hash tables have this advantage, then: fastest option beyond 40 items; and no more than about 24% slower below that — which will likely only matter at all if you'll have to look up this less-than-40-item table thousands of times to get you out of the microseconds scale (and even that will get you to milliseconds: do you care?).
Moreover, it seems that the advantages in speed are asymmetrically larger than the disadvantages in speed. The former can be considerable, whereas it'd rarely be the case where the latter would matter: hash tables' speed seems to be capped at about ~20% slower than plists for 5 items, but already faster than alists.
But we're left with this "whatever you like", and then some questions come up.
- Are Elisp hash tables any... less-likable than these other two fellows?
- And if they're "better", then why won't Elispers use them more?
Let's try to answer them.
Barriers for using hash tables in Emacs Lisp
Here are some conjectures about why hash tables might not currently be your first choice for key–value collections in Elisp.
Not knowing they exist
Good enough reason.
Now you do.
Not knowing about their advantage in speed
Now you do.
"Lisp is about lists"
Following this logic, alist
and plist
sound more fitting to use, right? (Even though they could also be said to be "tables", which likewise store key–value pairs.)
This might also explain why vectors are far less popular than lists despite the fact that lookups happen in constant time (as with hash tables) instead of linear time. And, you know, square brackets? Ugh, so unlispy! Gimme parentheses! (Well, vectors are also rigid structures, and so not as amenable to consing.)
But wait a minute. The Emacs Lisp Manual says:
Hash tables have a special printed representation, which consists of ‘#s’ followed by a list specifying the hash table properties and contents.
So, a hash table is represented by... a list? With just two tiny little extra characters before it?
This sounds terribly list-y. It's certainly list-y enough to me. In fact, if you started calling then "hash lists" (hlist
?) I'd be totally ok with it.
If it's list-y, it's Lisp-y, therefore you can use them alright. Problem solved.
You see it as being only for large datasets
People who know hash tables might implicitly believe that a hash table is something you only launch when faced with large datasets — something special, more complicated, which isn't necessary for dealing with smaller things where a list would do.
Fair enough, not necessary. But are they a problem? Only if:
- there's a performance cost in using them (we covered that already: see the last #2 above), or
- dealing with them is bothersome and ugly and complicated.
The last point is a bit more defensible. XHT attempts to fix that. I currently find dealing with hash tables pleasant, and usually more so than the alternatives.
But wait, did I mention "ugly"? What am I talking about?
Noisy representation
Let's compare. You have two key–value pairs:
:a
and"Alice"
:b
and two more pairs:
2.1:ba
and"Barbara"
2.2:bo
and"Bob"
How to represent this in Emacs Lisp? Either with a plist:
'(:a "Alice" :b (:ba "Barbara" :bo "Bob"))
or an alist:
'((:a . "Alice") (:b . ((:ba . "Barbara") (:bo . "Bob")))) ;; (which, since the dot represents consing, can be simplified into) '((:a . "Alice") (:b (:ba . "Barbara") (:bo . "Bob")))
(Already a bit noisier than the plist, right?)
or a hash table...
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:a "Alice" :b #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:ba "Barbara" :bo "Bob"))))
Good grief! Who wants to deal with that mess?
(And I formatted it a bit — they usually come jumbled in a single line.)
First of all, you see that #s
thing and become a bit uncomfortable. You think:
"What is that? This is clearly not a regular list. Can I edit it directly?
And these other parameters: should I change them? Leave them?
Too complicated! Back to plists."
Fair enough.
Let's see another issue: how do you create these structures when you're given the items?
To generate the plist you would:
(list :a "Alice" :b (list :ba "Barbara" :bo "Bob"))
Simple.
To generate the alist you would:
(list (cons :a "Alice") (list :b (cons :ba "Barbara") (cons :bo "Bob")))
Eh. Not as nice.
For a hash table... you'd have to go through this:
(let ((tbl1 (make-hash-table :test #'equal)) (tbl2 (make-hash-table :test #'equal))) (puthash :a "Alice" tbl1) (puthash :ba "Barbara" tbl2) (puthash :bo "Bob" tbl2) (puthash :b tbl2 tbl1) tbl1)
Bored already? No wonder.
See, you can't even create one on-the-fly when given the items — you must create an empty one, give it a name, then add the items one by one, then return it. And you have to do it for the nested ones as well.
Or you could — but with some hack such as this:
(read (format "#s(hash-table test equal data (%S %S %S %S))" :a "Alice" :b (read (format "#s(hash-table test equal data (%S %S %S %S))" :ba "Barbara" :bo "Bob"))))
which does create the table, but is ugly and awkward.
Enter ht
Then came Wilfred Hughes's ht. With it, part of that noise was gone.
Want to create that hash table? No problem:
(ht (:a "Alice") (:b (ht (:ba "Barbara") (:bo "Bob"))))
Ah, that's better!
Want to create an empty one? Instead of:
(make-hash-table :test 'equal)
just
(ht-create)
So these details add up and it becomes more pleasant to deal with the thing. The >1M downloads of ht
from MELPA suggests that:
- People do care about hash tables, and
- The basic, half a dozen hash table functions that come with Emacs are not enough.
However, some areas for further development remained, in my opinion. Here are two:
- After you create a hash table with
#'ht
, if you want to look at the table you're still stuck with its noisy representation (unless you export it to a list withht-items
and reverse it). - Other than
ht-get*
, it has no tools for dealing with nested tables. Examples:
(ht-items (ht (:a "Alice") (:b (ht (:ba "Barbara") (:bo "Bob")))))
returns
'((:b #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:ba "Barbara" :bo "Bob"))) (:a "Alice"))
and conversion to alist doesn't make the expected nested alist, so
(ht->alist (ht (:a "Alice") (:b (ht (:ba "Barbara") (:bo "Bob")))))
returns
((:b . #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:ba "Barbara" :bo "Bob"))) (:a . "Alice"))
and vice-versa from nested alist.
Equality used to fail when nested as well.
So I started writing a few functions to complement and extend the paths that had already been opened with this library, and...
Enter xht
...the "few functions" evolved into a full-fledged, systematically-structured library containing more than 200 functions (passing more than 1800 regression tests) for doing all sorts of things with hash tables: XHT.
I'm satisfied with the results.
I wish that with it more people adopt hash tables as their go-to key–value data structure when coding in Emacs Lisp. And I wish that those who already like and use them can now further extend their use and possibly also simplify their code.
Usage is fully documented in the README.org
. It's all there. Here's a tiny sample:
h*
Alternative way of creating, similar to ht
, but plist-like and therefore still less noisy.
This creates the same table as the previous example:
(h* :a "Alice" :b (h* :ba "Barbara" :bo "Bob"))
h-let (dot-binding)
Inspired by, and directly adapted from, let-alist
.
(h-let (h* :a "Alice" :b (h* :ba "Barbara" :bo "Bob")) .b.ba) => "Barbara"
And unlike let-alist
, it also works with string keys:
(h-let (h* "a" "Alice" "b" (h* "ba" "Barbara" "bo" "Bob")) .b.ba) => "Barbara"
h--hmap
Mapping from hash table to hash table.
Here's the anaphoric, nesting-aware, non-destructive function:
(h--hmap* (intern (upcase key)) (concat key "--" (downcase value)) (h* "a" "Alice" "b" (h* "ba" "Barbara" "bo" "Bob"))) H=> (h* 'A "a--alice" 'B (h* 'BA "ba--barbara" 'BO "bo--bob"))
All eight combinations exist as separate functions:
- (regular, anaphoric) × (non-nesting, nesting) × (non-destructive, destructive)
Their names:
echo h-{,-}hmap{,*}{,\!}
h-hmap h-hmap! h-hmap* h-hmap*! h--hmap h--hmap! h--hmap* h--hmap*!
The naming scheme is consistent and systematic across the library.
xht-do
This is XHT's dispatcher, a command to display and convert all sorts of things from and to hash tables.
It has several choices of target destination (insert at point, insert in a different buffer, write it to file, replace source...). It can also autodetect the source, DWIM-style.
Remember that nested hash table in its nested natural representation?
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:a "Alice" :b #s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:ba "Barbara" :bo "Bob"))))
You can put point inside it (or after it — both work), press three keys, and replace it with:
(h* :a "Alice" :b (h* :ba "Barbara" :bo "Bob"))
Or the (ht…)
representation if you like it better — also available.
Or insert after point if you prefer it.
You can as easily convert to/from hash table:
- alist
- plist
- list of lists
- org table
- JSON
- TSV
and a few others.
Three key presses over this org table:
| id | name | age | |----+-------+-----| | b | bob | 30 | | e | emily | 21 |
can create this:
(h* "b" (h* "id" "b" "name" "bob" "age" "30") "e" (h* "id" "e" "name" "emily" "age" "21"))
or vice-versa, if you want it.
And so on.
What next?
Having finished this introduction, I invite you to check XHT's README.
Explore, use, enjoy!