Hash tables look better in Emacs 30

Emacs 30 brings changes to hash tables. From the NEWS file:

 ** Hash tables

 *** ':rehash-size' and ':rehash-threshold' args no longer have any effect.
These keyword arguments are now ignored by 'make-hash-table'.  Emacs
manages the memory for all hash table objects in the same way.
The functions 'hash-table-rehash-size' and 'hash-table-rehash-threshold'
remain for compatibility but now always return the old default values.

 *** The printed representation has been shrunk and simplified.
The 'test' parameter is omitted if it is 'eql' (the default), as is
'data' if empty.  'rehash-size', 'rehash-threshold' and 'size' are
always omitted, and ignored if present when the object is read back in.

As someone who has called the native hash table format "ungainly", I much appreciate these changes.

Let's have a look at some before-and-after examples.

Emacs 26 to 29

Expression Size Test Evaluation in Emacs 26 to 29
(make-hash-table) 65 eql #s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
(make-hash-table :size 0) 1 eql #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
(make-hash-table :size 7) 7 eql #s(hash-table size 7 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
#s(hash-table) 65 eql #s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
#s(hash-table size 7) 7 eql #s(hash-table size 7 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
#s(hash-table data (:a 1)) 1 eql #s(hash-table size 1 test eql rehash-size 1.5 rehash-threshold 0.8125 data (:a 1))
#s(hash-table size 7 data (:a 1)) 7 eql #s(hash-table size 7 test eql rehash-size 1.5 rehash-threshold 0.8125 data (:a 1))
(h-new) 1 equal #s(hash-table size 1 test equal rehash-size 1.5 rehash-threshold 0.8125 data ())
(h-new 0) 1 equal #s(hash-table size 1 test equal rehash-size 1.5 rehash-threshold 0.8125 data ())
(h-new 7) 7 equal #s(hash-table size 7 test equal rehash-size 1.5 rehash-threshold 0.8125 data ())
(h*) 1 equal #s(hash-table size 1 test equal rehash-size 1.5 rehash-threshold 0.8125 data ())
(h-s* 7) 7 equal #s(hash-table size 7 test equal rehash-size 1.5 rehash-threshold 0.8125 data ())
(h* :a 1) 1 equal #s(hash-table size 1 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:a 1))
(h-s* 7 :a 1) 7 equal #s(hash-table size 7 test equal rehash-size 1.5 rehash-threshold 0.8125 data (:a 1))

Above we see seven expressions using native hash table format, followed by seven xht expressions. The latter roughly correspond to the former, except for the test.

The columns size and test show the values we get if we look those properties up. For example:

(hash-table-size (make-hash-table)) => 65
(hash-table-test (make-hash-table)) => 'eql

or using xht:

(h-prop (make-hash-table) 'size) => 65
(h-prop (make-hash-table) 'test) => 'eql

The fourth column shows what these expressions evaluate to. As you can see, up to Emacs 29 all properties are explicitly listed in the evaluation.

Two notes:

  • The same evaluations happened in Emacs 25 — except for the rehash-threshold, which was 0.8.
  • The examples here use xht version 2.2.2, in which (h-new) has size 1 (the minimum) in Emacs 26–29; and size 0 in Emacs 30.

Now look how these same expressions evaluate in Emacs 30.

Emacs 30

Expression Size Test Evaluation in Emacs 30
(make-hash-table) 0 eql #s(hash-table)
(make-hash-table :size 0) 0 eql #s(hash-table)
(make-hash-table :size 7) 7 eql #s(hash-table)
#s(hash-table) 0 eql #s(hash-table)
#s(hash-table size 7) 0 eql #s(hash-table)
#s(hash-table data (:a 1)) 1 eql #s(hash-table data (:a 1))
#s(hash-table size 7 data (:a 1)) 1 eql #s(hash-table data (:a 1))
(h-new) 0 equal #s(hash-table test equal)
(h-new 0) 0 equal #s(hash-table test equal)
(h-new 7) 7 equal #s(hash-table test equal)
(h*) 0 equal #s(hash-table test equal)
(h-s* 7) 7 equal #s(hash-table test equal)
(h* :a 1) 1 equal #s(hash-table test equal data (:a 1))
(h-s* 7 :a 1) 7 equal #s(hash-table test equal data (:a 1))

Isn't that much nicer? Goodbye noise!

Until here, the printed representation always showed everything:

;; Emacs 26–29

#s(hash-table)
=> #s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())

#s(hash-table size 100 test equal rehash-size 2.5 rehash-threshold 0.4242 data (:a 1 :b 2))
=> #s(hash-table size 100 test equal rehash-size 2.5 rehash-threshold 0.4242 data (:a 1 :b 2))

Now it's the opposite: it shows as little as possible — only the test, if not eql; and the data, if not nil:

;; Emacs 30

#s(hash-table)
=> #s(hash-table)

#s(hash-table size 65 test eql rehash-size 1.5 rehash-threshold 0.8125 data ())
=> #s(hash-table)

#s(hash-table size 100 test equal rehash-size 2.5 rehash-threshold 0.4242 data (:a 1 :b 2))
=> #s(hash-table test equal data (:a 1 :b 2))

The rehash parameters are now always ignored and dealt with internally. I like this decision.

The way size is dealt with is subtler

You can still set the size of your new table when you use make-hash-table:

(hash-table-size (make-hash-table))          => 0
(hash-table-size (make-hash-table :size 42)) => 42

but the value assigned is lost if you evaluate the expression:

(make-hash-table)          => #s(hash-table)
(make-hash-table :size 42) => #s(hash-table)

since:

(hash-table-size #s(hash-table)) => 0

The evaluation resizes it to fit the actual data. It assumes you're done changing the table.

And if you pass the size directly like this, it's ignored:

(hash-table-size #s(hash-table))         => 0
(hash-table-size #s(hash-table size 42)) => 0

which is something that does not happen with test, mind you:

(hash-table-test #s(hash-table))            => 'eql
(hash-table-test #s(hash-table test equal)) => 'equal

If you want to hardcode a size, assign the table to a variable

(let ((x (make-hash-table :test 'equal :size 26)))  ; with xht: (h-new 26)
  (puthash 'a 1 x)                                  ; with xht: (h-put! x 'a 1)
  (puthash 'b 2 x)
  ...
  (puthash 'z 26 x)
  (list :size (hash-table-size x)
        'x     x))
=> '(:size 26
     x #s(hash-table test equal data (a 1 b 2 ... z 26)))

When would you want to do that? When you already know how many pairs it'll have. This creates exactly one hash table of the exact size to fit your data.

Behind the scenes, xht's h* does that, too:

(hash-table-size (h*))                => 0
(hash-table-size (h* :a 1))           => 1
(hash-table-size (h* :a 1 :b 2))      => 2
(hash-table-size (h* :a 1 :b 2 :c 3)) => 3

Not specifying the size is necessarily less efficient, because it'll either create a few tables (that's what the "rehash" things do when starting from size 0); or create an oversized one (the native solution until Emacs 29, whose default size 65 had a large upfront cost when dealing with smaller tables).

When creating from scratch, notice the difference between these two:

(let ((x (make-hash-table :test 'equal)))  ; size = 0
  (puthash 'a 1 x)  ; a size-6 table is "rehashed" here to fit the pair
  (hash-table-size x))
=> 6

(let ((x (make-hash-table :test 'equal :size 1)))
  (puthash 'a 1 x)  ; size is 1 — so it already fits
  (hash-table-size x))
=> 1

Same as above but using xht:

(let ((x (h-new)))
  (h-put! x 'a 1)
  (h-prop x 'size))
=> 6

(let ((x (h-new 1)))
  (h-put! x 'a 1)
  (h-prop x 'size))
=> 1

(let ((x (h* 'a 1)))
  (h-prop x 'size))
=> 1

In the previous alphabet example, if we don't set the initial size of 26, the final size will be 96:

(let ((x (make-hash-table :test 'equal)))  ; with xht: (h-new)
  (puthash 'a  1 x)
  (puthash 'b  2 x)
  ...
  (puthash 'z 26 x)
  (hash-table-size x))
=> 96

This is because when we add:

  • 'a 1, it rehashes to size 6
  • 'g 7, it rehashes to size 24
  • 'y 25, it rehashes to size 96

So four tables are created (allocation sizes 0, 6, 24, and 96); the first three are destroyed.

Whereas if we set the size, only one (allocation size 26) is created.

Will those gains be substantial? Probably not much. The elisp manual says:

‘:size SIZE’
     This specifies a hint for how many associations you plan to
     store in the hash table.  If you know the approximate number,
     you can make things a little more efficient by specifying it
     this way but since the hash table memory is managed
     automatically, the gain in speed is rarely significant.

But it doesn't hurt.

The new default initial size benefits smaller tables

The previous default size of 65 for new hash tables seemed to assume that it'd be a bad idea to create one with fewer items than that.

The new default assumes nothing: it allocates no space. When one item is added, it rehashes to size 6, and stays like that until a seventh item is added, upon which it goes up to 24. It postpones the fixed costs of creation until it's unavoidable.

Hash table data files will be smaller

Since hash table evaluations became less noisy, elisp data files storing hash tables will be smaller.

How much smaller? It depends on the data.

If you're storing a single hash table containing thousands of pairs, the reduction in noisy metadata at the beginning won't make much of a difference. But if you're storing nested hash tables, each of which holding few pairs, it'll shrink a lot.

To illustrate that, let's suppose you want to store in pretty-printed (to avoid long lines) native format the following nested hash table, which is in (h*…) format:

(h-write-htbl-pp!
 "~/htbl-data.el"
 (h* :a 1
     :b (h* :c 3
            :d (h* :e 5
                   :f 6))))

In Emacs 26–29, the file ~/htbl-data.el would look like this:

#s(hash-table size 2 test equal rehash-size 1.5 rehash-threshold 0.8125 data
              (:a 1 :b
                  #s(hash-table size 2 test equal rehash-size 1.5 rehash-threshold 0.8125 data
                                (:c 3 :d
                                    #s(hash-table size 2 test equal rehash-size 1.5 rehash-threshold 0.8125 data
                                                  (:e 5 :f 6))))))

whereas in Emacs 30 it'd shrink to this:

#s(hash-table test equal data
              (:a 1 :b
                  #s(hash-table test equal data
                                (:c 3 :d
                                    #s(hash-table test equal data (:e 5 :f 6))))))

I welcome all these changes.