Enjoying Lisp-like quasiquotation in Bash

Why?

Lisp has a useful syntax in which one can quote S-expressions in a way that they return everything unevaluated except any parts that are selectively unquoted. Bash has no such option — nor do most other languages, as far as I can tell. In Bash, your only option is the opposite: to allow evaluation of everything and selectively quote what you don't want to evaluate. This "quoting" is usually called "escaping", and is usually accomplished with backslashes.
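To make that concrete, here is a tiny sketch of Bash's evaluate-by-default model (the variable name is made up for the illustration):

```shell
name="world"
# Inside double quotes everything is expanded unless it is escaped:
# \$name stays literal, while $name is replaced by its value.
echo "literal: \$name, expanded: $name"
```

This prints `literal: $name, expanded: world`. Quasiquotation flips that default around.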

I wondered: how hard would it be to implement this feature in Bash?

Some Lisps, such as Clojure, use a tilde to represent unquoting, whereas Common Lisp and Emacs Lisp use a comma. I ended up deciding to use tildes to name the functions that emulate this syntax, and commas for the selective unquoting.

First, a word about heredocs

What is a here-document, you ask? You have probably seen one. It looks like this:

x="variables"

cat <<EOF
Some "stuff",
  often with 'quoting', and multiple lines
      and $x
          and formatting.
EOF

Output:

Some "stuff",
  often with 'quoting', and multiple lines
      and variables
          and formatting.

It's a little-known fact that quoting the EOF in a here-document has the effect of single-quoting its contents, so that everything is interpreted literally. The EOF (which can be any other string, by the way), in this case, can be quoted with either double or single quotes, no difference. I'd recommend single quotes as a matter of style, because elsewhere in Bash the use of double quotes means variables are interpreted, and only here it means they aren't.

So you can do this:

x=2
cat <<"HI!"
Hello $x
HI!

but this seems clearer to me:

x=2
cat <<'HI!'
Hello $x
HI!

both of which literally output:

Hello $x

Since we're here, know that the sentinel string needs to be on a line of its own to be recognized as the terminator. But if you add a dash (<<-), leading tabs are stripped from every line, including the terminator, so you can indent with tabs like this:

   { echo
     cat <<-HEY
<tab>Hello there
<tab>HEY
}

I don't like to use literal TABs in code, so I stopped using them with this syntax, but you may like it.
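If you want to try the <<- form without typing literal tabs, one option (a sketch of mine, not something the syntax requires) is to let printf generate the tabs and pipe the resulting script to a fresh bash:

```shell
# printf turns each \t into a real tab character; <<- then strips the
# leading tab from the body line and from the HEY terminator.
printf 'cat <<-HEY\n\tHello there\n\tHEY\n' | bash
```

This prints `Hello there`, with the leading tab removed.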

Anyway: quasiquoting.

The quasiquoting functions

They are applied to here-documents. Any string can be used as the delimiter (here, "A").

You must quote it. Single or double, it doesn't matter.

This:

~~ <<"A"
$unexpandedvar
,$expandedvar
A

is the same as:

~~ <<'A'
$unexpandedvar
,$expandedvar
A

As in Emacs Lisp, the comma is the unquote operator:

myvar=42
~~ <<'END'
# won't be interpreted:
This is $myvar
# will be interpreted:
This is ,$myvar
END

This produces seq 3:

~~ <<"EOC"
seq ,$((2+1))
EOC

A longer example:

variable="thing here"   #<-- outside the block: can you see why
                        # the example wouldn't work otherwise?

~~ <<"KITTENS!"
Hello world.
This $variable   is kept unexpanded.
This "$variable" is also kept unexpanded.
This '$variable' is also kept unexpanded.

$(this), "$(this)" and '$(this)' also remain unexpanded.

However, this ,$variable was actually expanded.

The only expanded things are those prefixed with the comma operator.

So this will expand:
,$(seq 3)

And this, too (although it uses obsolete Bash syntax):
,`seq 3,`

KITTENS!

What if we wanted to evaluate the results?

We can produce code by selectively quasiquoting what we want. What about evaluating it?

Let's have another function, with an extra tilde.
It is somewhat equivalent to a funcall.

Let's see how it works. If ~~ gave us seq 3 as code, ~~~ will execute that.

~~~ <<"A"
seq ,$((2+1))
A

So this:

~~ <<"END"
wor="World"
echo "Hello $wor"
seq -s: ,$((2+1))
END

produces this:

wor="World"
echo "Hello $wor"
seq -s: 3

And this:

~~~ <<"END"
wor="World"
echo "Hello $wor"
seq -s: ,$((2+1))
END

produces this:

Hello World
1:2:3

The code

I wrote a few equivalent versions.

This works:

~~()  { . <(printf "%s\n" \
                   "$(sed -E 's/[\`$]/\\&/g
                              s/,[\]//g
                            1 s/^/cat <<EOF\n/
                            $ s/$/\nEOF/')");}

This also works:

~~()  { . <(printf "%s\n" \
                   "cat <<EOF
$(sed -E 's/[\`$]/\\&/g
          s/,[\]//g')
EOF");}

And this also works:

~~()  { . <(cat <<OUTSIDE
            cat <<INSIDE
$(sed -E 's/[\`$]/\\&/g
          s/,[\]//g')
INSIDE
OUTSIDE
);}  #<-- bipolar emoticon:
     #    half sad-crying, half devilishly-winking.
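All three variants share the same sed script. To look at that stage on its own, here is a sketch with a hypothetical helper, qq_escape (my name for it, not part of the functions above), that runs only the two substitutions and skips the cat <<EOF wrapper:

```shell
# Hypothetical helper: only the sed stage of ~~, nothing is evaluated here.
qq_escape() {
    # 1st command: put a backslash before every \, ` and $
    # 2nd command: delete each ",\" pair, which un-escapes comma-prefixed items
    sed -E 's/[\`$]/\\&/g
            s/,[\]//g'
}

qq_escape <<'IN'
keep $a, expand ,$b
,`seq 3,`
IN
```

The first line comes out as `keep \$a, expand $b`: only the comma-prefixed dollar sign loses its backslash. The second line comes out as `seq 3` wrapped in plain backquotes. Each backquote is escaped separately, which is why the closing one needs its own comma in the longer example.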

The definition of the triple tilde function is simply this:

~~~() { . <(~~);}
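For what it's worth, yet another equivalent version seems possible with eval instead of sourcing a process substitution. This is only a sketch: the names qq and qqx are made up, and I am assuming eval is acceptable wherever sourcing was.

```shell
# Hypothetical variant (qq/qqx are illustrative names, not the ~~ functions):
# same sed stage, but the generated here-document is run with eval.
qq() {
    local body
    # Escape \, ` and $, then drop the ",\" pairs so comma-prefixed items expand.
    body=$(sed -E 's/[\`$]/\\&/g
                   s/,[\]//g')
    eval "cat <<EOF
$body
EOF
"
}

# Counterpart of ~~~: evaluate whatever qq produced.
qqx() { eval "$(qq)"; }

qq <<'A'
seq -s: ,$((2+1))
A

qqx <<'A'
seq -s: ,$((2+1))
A
```

The first call prints `seq -s: 3` and the second prints `1:2:3`. Like the versions above, it breaks if the input contains a line consisting only of EOF.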

Comparing it to cat <<EOF

Now, the ~~ call from the wor="World" example above is equivalent to this:

cat <<END
wor="World"
echo "Hello \$wor"
seq -s: $((2+1))
END

So why bother?

Because sometimes you will have whole blocks of text in Bash where you want everything quoted except for one or two variables. So instead of madly escaping every [`$\] you see except those ones, you add the comma to these ones and to nothing else.

You'd normally do this:

varsum=42
cat <<END
We will have \$var1, \$var2, \$var3, \$var4, and \$var5.
You can use a backslash (\\) to escape them.
They must add up to $varsum.
END

Now you can do this, which is cleaner and closer to the output:

varsum=42
~~ <<"END"
We will have $var1, $var2, $var3, $var4, and $var5.
You can use a backslash (\) to escape them.
They must add up to ,$varsum.
END
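A quick way to check that the two forms really print the same text. This sketch copies the second ~~ definition from "The code" and wraps each here-document in a throwaway function (plain_version and quasi_version are names made up for this check):

```shell
# Second ~~ definition from "The code", copied verbatim.
~~()  { . <(printf "%s\n" \
                   "cat <<EOF
$(sed -E 's/[\`$]/\\&/g
          s/,[\]//g')
EOF");}

varsum=42

# The hand-escaped here-document.
plain_version() {
cat <<END
We will have \$var1, \$var2, \$var3, \$var4, and \$var5.
You can use a backslash (\\) to escape them.
They must add up to $varsum.
END
}

# The quasiquoted here-document.
quasi_version() {
~~ <<"END"
We will have $var1, $var2, $var3, $var4, and $var5.
You can use a backslash (\) to escape them.
They must add up to ,$varsum.
END
}

# diff prints nothing and returns 0 when both outputs are identical.
diff <(plain_version) <(quasi_version) && echo "same output"
```

Note that the commas after $var1, $var2 and so on are left alone: only a comma placed directly before a $, a backquote or a backslash acts as the unquote operator.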

And the ~~~ function can be handy for executing output intended as code rather than just text.

Going meta?

So, this produces "1 2 3":

~~~ <<"A"
seq -s' ' ,$((2+1))
A

How to produce the very code above? With this:

cat <<'B'  # could have been "B", no difference.
~~~ <<"A"
seq -s' ' ,$((2+1))
A
B

And the previous one? With this:

cat <<'C'
cat <<'B'
~~~ <<"A"
seq -s' ' ,$((2+1))
A
B
C

Ad infinitum.

Alternative names

If you don't mind UTF-8-ing your Bash, you could instead (or in addition) use other names for the functions.

Such as these:

‛() { ~~ ;}  #<-- U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
‛‛() { ~~~;}  #<-- Two of them in a row

inspired by Elisp's backquote operator.

But these are somewhat easy to confuse with your regular backquote, right? And in Bash the regular backquote is already taken: `somecommand` is equivalent to $(somecommand), although the former is a legacy syntax that is awkward to nest (inner backquotes must be escaped).

So we can pick something more distinct. Say, the below.

❟()  { ~~ ;}  #<-- U+275F HEAVY LOW SINGLE COMMA QUOTATION MARK ORNAMENT
❟❟() { ~~~;}  #<-- Two of them in a row
❠() { ~~~;}  #<-- U+2760 HEAVY LOW DOUBLE COMMA QUOTATION MARK ORNAMENT

A disadvantage is that these characters will slightly mess up your fixed-width alignment. Which you won't mind because you were hankering for some "heavy low comma quotation mark ornaments" in your Bash, weren't you?

Does it work? Sure.
Load the functions above. Then:

Producing code:

❟ <<"END"
wor="World"
echo "Hello $wor"
seq -s' ' ,$((2+1))
END

Output:

wor="World"
echo "Hello $wor"
seq -s' ' 3

Executing it:

❠ <<"END"
wor="World"
echo "Hello $wor"
seq -s' ' ,$((2+1))
END

Output:

Hello World
1 2 3

Disadvantages of this naming? May be incompatible with older Bash versions(?); and may have a cost in speed(?).
So the "tilde-naming" is a safer bet.

Ok, but in practice, what is a good use for it?

Here is a simple example:

for fun in cat less ls; do
    ~~ <<'FUNS'
,${fun}2x() { ,$fun "$@" | xclip -i -selection clipboard ;}
FUNS
done

Output:

cat2x() { cat "$@" | xclip -i -selection clipboard ;}
less2x() { less "$@" | xclip -i -selection clipboard ;}
ls2x() { ls "$@" | xclip -i -selection clipboard ;}

Change it to ~~~ and you have these functions declared.

If you find interesting additional uses for these functions, or simply were glad to be able to quasiquote in Bash, I'd be happy to hear it.