Pansymbolic: Mapping all Emacs Lisp symbols

Pansymbolic

Before anything — here's the link to what's missing.

Does any of these sound like something you could do?

Goals

Map all Emacs Lisp symbols out there (one map for each namespace).
Provide tools to search these maps.

Why?

To give you the highest confidence that the symbols you want to use for your package haven't already been used. Searching this database can inform your decisions and minimize the chance of a clash of symbols.
All sorts of other uses that people might come up with (research, statistics, etc).

Process

Overview

Have all elisp packages
Extract all symbols from all these packages
Remap: from package→symbols to symbol→packages
Have tools to query it

What do we have?

I consider #3 and #4 well addressed by my SymTree and SymTable packages.

SymTable can convert between the two input formats accepted by SymTree.
SymTree tokenizes symbols, builds a huge tree with them, and provides functions to search prefixes in that tree.

Which of the two to use, or both, depends on final choices about formats and data structure.

Comparison of both is given further down.

What is missing?

Most of what is missing can be summarized as:

Do we have a few functions that can reliably extract lists of symbols (one list per namespace) defined by any given elisp package?
Do we have comprehensive, frequently updated archives of elisp packages from which to extract these symbols?
By "comprehensive" I mean a high coverage of all public elisp packages out there.
Do we have someone who can apply the first to the second, on a regular basis, and supply results in one of the two input formats accepted by SymTree?

If all answers are "yes", then the problem is mostly solved.

Note about primitives

There are more than 1400 native Emacs functions written in C, so a "package" 'EMACS-PRIMITIVES needs to be added with its corresponding symbols.

These symbols could be extracted:

directly from the C sources (somehow); or
by loading the latest minor version of Emacs and running something like this:

(let (prims)
  (mapatoms
   (lambda (symbol)
     (and (functionp symbol)
          (subr-primitive-p (symbol-function symbol))
          (push symbol prims))))
  (sort prims #'string<))

Design notes

One of the main motivations for this project is to avoid collisions between elisp symbols. Inevitably, this brings up the topic of prefixes.

Conventional approach: try to determine what "the prefix" of a package is, store that, repeat for every package, and done.

This sounds simple, but isn't. In the end, the question "What is this package's prefix?" may admit multiple answers, all of which in principle valid; and, crucially, selecting between them depends on factors that are external to the package.

So my approach is to ignore that question altogether.

Symbols, not prefixes

Instead of asking the package a question that usually does not admit of a singular definitive answer, we simply slurp lists of symbols defined by the package, separated by namespace.

This symbol hoarding may, at first, look like an unnecessary complication. Instead, it seems to simplify things. Collection becomes straightforward, and queries can be more flexible.

Eventually, we'll ask not what "the prefix" is, but rather "does any package here has any symbol that has this (my query) as one of its prefixes?".

The prefix paradox

Sometimes, the best way to simplify a problem is by making it bigger.

By not trying to determine "the prefix" of any package in isolation, we can sooner answer questions about all prefixes of all symbols of all packages.

Data structures

Inputs and outputs

One way to store collected symbols is to put them in a map where each package is a key and each value is a list of symbols defined by that package in one namespace. Let's call this pack→syms. It's a one-dimensional map.

Another way is sym→packs. It can be produced either directly or by converting pack→syms.

Here, each symbol is a key, and the value is a list of packages that define that symbol. For almost all symbols, that list will have only one package. When not, an unintended symbol collision might be going on. We can see that pack→syms is also a one-dimensional map.

In order to query the symbols, it's cumbersome to have them be inside the lists in the values (pack→syms). We'd have to walk all the lists in all the values for every search. That's expensive.

Some alternatives to choose from are:

Make pack→syms; convert to sym→packs, then to tree:sym→packs; query that.
Make pack→syms; convert to sym→packs; query that.
Make sym→packs; convert to tree:sym→packs; query that.
Make sym→packs; query that.
Make pack→syms; convert directly to tree:sym→packs; query that.

What is tree:sym→packs? That's a tree in which the symbols have been properly tokenized and the list of packages that define each symbol is to be found at the leaves. It can provide visualization, navigation, and extremely fast "prefix" queries.

Partitioning

Given the huge size of the results (easily a few hundred thousand symbols for the functions' namespace), it seems important to horizontally partition sym→packs and/or tree:sym→packs.

Partitioning can be alphabetic, and perhaps first letter is enough, with an extra bucket (say, %) to put anything starting with [^a-zA-Z0-9]. So:

A search for "foo-this" or "f-ext" would open and read file f.
A search for "", "<", ="/", "+", or "-each" would read file %.

Storage formats

These flat maps could be stored in any way that we have available to represent one-dimensional key–value collections. Which one to pick will depend on whether we want to have it all in elisp or not.

All in elisp

Hash tables — that's my suggestion.

In this case:

SymTable can convert from pack→syms to sym→packs.
SymTree can convert from either of these two to tree:sym→packs.

Note that tree:sym→packs here isn't a "regular tree" (nested lists), but a nested hash table.

For more information, see the respective packages.

One question is whether it'd be better to have only sym→packs hash table(s) (without also the corresponding tree:sym→packs). The most critical factor would be querying speed.

Speed

To query a "foo-this" prefix:

With tree:sym→packs, we'd be sequentially looking up exact matches of, respectively, "foo", "-", "this" in a nested hash table: instantaneous.
With sym→packs, we'd be walking all keys of a large flat hash table to see which of them matches "^foo-this$\\|^foo-thisSEP-RX": clearly slower.

But how much slower, and how much would that matter?

That would need some testing. You can build a single huge sym→packs hash table with real data and then time the queries.

Speed in this case could be enhanced by aggressively partitioning into several smaller files. Having 27 files is one option (a-z%). Another option is having 27 directories, and then having one file per "first prefix". Example:

.
├── %
├── a
├── b
├── ...
├── n
└── o
    ├── ob
    ├── ...
    ├── org
    ├── ...
    └── ox

Then inside the file ob all symbols whose first prefix is "ob", from all packages, would be listed.

In the sym→packs version, all complete symbols would be keys: ob-this, ob-that, ob--that-too, etc.
```
;;  Symbols (funs)  Packages that define them
(h* "ob-this"       '(ob-this)
    "ob-that"       '(ob ob-plus)  ; collision!
    "ob--that-too"  '(ob))
```
Note:
- (h*…) is a hash table creation function from xht.
- #s(hash-table…), the native format, can be used instead.

In the tree:sym→packs version, ob would be the only key, and the other tokens branch from there:
```
(symtree-h-make
 'sp (h* "ob-this"      '(ob-this)
         "ob-that"      '(ob ob-plus)  ; collision!
         "ob--that-too" '(ob)))
H=>
(h* "ob" (h* "-" (h* "this" (h* :@ '(ob-this))
                     "that" (h* :@ '(ob ob-plus))  ; collision!
                     "-"    (h* "that" (h* "-" (h* "too" (h* :@ '(ob))))))))
```
With tree:sym→packs, the speed gains of partitioning in many files would be negligible compared to having it all in one huge nested hash table (with several keys like ob).

But it could still be worth doing, for two reasons:
- less memory (a much smaller table is loaded for each query)
- possibility of selective downloads (a user might want to fetch only the o folder or the ob file, for example, instead of the whole thing).

Advantages of each version

Working with tree:sym→packs has as advantages:

instantaneous queries
"incremental prefix navigation" (as org trees, if desired)

Working directly with sym→packs has as advantages:

simpler to maintain

Other data formats

If data formats other than elisp are fine, then flat TSV is a good choice.

Both pack→syms and sym→packs could be put into a two-column TSV each.

Since sym→packs could have hundreds of thousands of lines, it might still be a good idea to partition it alphabetically into several files.

Querying sym→packs could then be done by grepping lines by regex and then uniq-ifying the values. Naturally, this could also be coded in elisp and done from inside emacs in a platform-independent fashion.

Below is a simple solution to that, using a new function candidate for my SymTable package. Note: a KVL, which means "key–value lines", is just a headerless two-column TSV that is flexible about field separators.

;; Example of a possible new function for non-elisp data format
(defun symtable/query-pre-in-kvl (korkf prefix &optional kvl-sep)
  "Return packages that have PREFIX.
KORKF is either a KVL file or a KVL.
For the meaning of KVL, see ‘h<-kvl’.
For the meaning of PREFIX, see ‘symtable-query-pre’.
Optional argument KVL-SEP is the separator used in the KVL. If nil,
use ‘h-kvl-sep-re’, whose value is \" *= *\". Pass \\t for tabs."
  (let* ((case-fold-search nil)
         (h-kvl-sep-re (if kvl-sep kvl-sep h-kvl-sep-re))
         (kvl (if (file-exists-p korkf) (h-read-kvl korkf) korkf))
         (rx  (symtable--query-make-pre-rx prefix))
         (htb))
    (with-temp-buffer
      (save-excursion (insert kvl))
      (keep-lines rx)
      (setq htb (h<-kvl (buffer-string))))
    (h--hmap! key (s-split " " value 'no-nulls) htb)
    (symtable-query-rx htb rx)))

So that:

(let ((kvl "
foo-this          =  foo
foo--this-thing   =  foo foo-plus
global-foo-mode   =  foo
bar--that         =  bar
global-bar-mode   =  bar
baz-global-doing  =  baz
globalize-things  =  globalize
qux/mode          =  qux
global-qux/mode   =  qux-lite qux
qux/something     =  qux"))

  (symtable/query-pre-in-kvl kvl "global"))
=> '("bar" "foo" "qux" "qux-lite")

This means these four packages define at least one function that has global as one of its prefixes. Note that globalize isn't a match, because it's not a "prefix for packaging purposes".

(The above isn't the most efficient way of doing that, but it's simple enough. It could be optimized later, if needed, after exact formats have been chosen.)

Products and availability

Once ready, the maps should be:

in an open format
available for download
mirrored in different forges

Input maps could also be made available.

A proper license should be chosen for these files.

Frequency of update has to be determined.

Besides all that, people might also welcome APIs or web interfaces to search all these maps online, if someone can offer it. At the very least, this would include "Prefix as a Service".

Development

I thought about design, data structures, the overall process. I also wrote two packages to address steps #3 and #4. That's what I can offer.

I have no interest in dealing with the first two steps, namely the retrieval of packages or the extraction of symbols.

Going forward, I might still be able to fix bugs and make improvements in SymTree and SymTable, if that's needed.

Other people are needed for the rest. In particular:

Solving the first two steps, which means being able to reliably produce results in pack→syms or sym→packs format.
Testing SymTable and/or SymTree against these inputs.
Owning the ongoing process (which should be mostly automatable).

These are not necessarily all the same person — but could be.

If you write code that has a chance of providing an automated solution for #1 and #2 (great!), consider hosting it on (or at least mirroring it to) SourceHut.

Acknowledgments

I had this idea a couple years ago; I thought this database should exist. Yet I was unsure whether there'd be any interest.

Later, Jonas Bernoulli shared with me the outlines of his idea of an independent database of prefixes. Its goals were similar. This influenced my putting the time to publish SymTable, SymTree, and this document — and launch this project.

I hope this can pave the way to the first working version of a queryable database of Emacs Lisp symbols.

Version

Version: 0.2.0.1
Updated: 2025-08-08

License

This project follows the REUSE Specification (FAQ), which in turn is built upon SPDX.

Therefore, license and copyright information can be found in:

each file's comment header, or
an adjacent file of the same name with the additional extension .license, or
the .reuse/dep5 file

The full text of the licenses can be found in the LICENSES subdirectory.

📆 2025-08-08