Raw strings

Six months ago, while I was reading the C source of the Emacs reader, I tried to implement raw strings in Emacs. This post was supposed to be published earlier, but I had a lot of work in between, I’m still not very comfortable writing in English, and I had a hosting problem. Anyway, here it is.

A raw string is a string literal written in a special syntax where the content is taken literally (especially the character \), i.e. nothing is escaped or interpolated. Several programming languages support them, e.g.:

Python: r"aa\naa"    r"""aa\n"aa"""
Perl:   'aa\naa'     q{aa\n'aa}
C++11:  R"(aa\naa)"  R"foo(aa\n)aa)foo"
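
To make the difference concrete, here is a minimal Python sketch comparing a regular literal with its raw counterpart. Python is used only because it is one of the languages listed above, and the variable names are purely illustrative.

```python
# Regular literal: \n is an escape sequence and becomes a single newline.
regular = "aa\naa"

# Raw literal: the backslash and the n stay as two separate characters.
raw = r"aa\naa"

# A triple-quoted raw literal can also contain a double quote unescaped.
triple = r"""aa\n"aa"""

print(len(regular))      # 5
print(len(raw))          # 6
print(raw == "aa\\naa")  # True
print(triple)            # aa\n"aa
```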

Raw strings are very useful for regexes. Every time you need to match a character that also happens to be a meta-character (like + or \), you have to escape it. And since the regex is written in a string literal, you have to escape the escape character as well, because the regex syntax and the string syntax both use \ for escaping. This process can be painful and error-prone. Search for "backslash hell" or "backslashitis" to see some examples.
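
The double escaping described above can be sketched in Python, whose `re` module takes regexes as ordinary strings. This is only an illustration: Emacs regex syntax differs in its details, and the sample text and variable names are made up for the example.

```python
import re

text = r"C:\temp a+b"   # one literal backslash and one literal plus

# Regular literals: each regex backslash must be doubled for the string syntax.
plain_backslash = "\\\\"   # regex \\   -> matches one backslash
plain_plus = "a\\+b"       # regex a\+b -> matches the text a+b

# Raw literals: the source text is the regex itself.
raw_backslash = r"\\"
raw_plus = r"a\+b"

# Both spellings denote exactly the same regexes.
same_patterns = (plain_backslash == raw_backslash) and (plain_plus == raw_plus)
backslash_hits = len(re.findall(raw_backslash, text))
plus_match = re.search(raw_plus, text).group(0)

print(same_patterns, backslash_hits, plus_match)  # True 1 a+b
```

Matching a single backslash takes four backslashes in the regular literal but only two in the raw one, which is the whole appeal.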

Back to Emacs. I actually wrote a working proof of concept in the form of 2 patches to the reader function:

  • Triple-quoted strings (à la Python) (diff)
  • Custom-delimiter strings (à la Perl/sed) (diff)

The code is not very clean and may be buggy, since most of it was adapted from the code for the regular string syntax, but it works:

# Python
$ ./emacs -Q -batch --eval '(message #r"""ha"\nha""")'
ha"\nha

# Perl
$ ./emacs -Q -batch --eval '(message #r,ha"\nha,)'
ha"\nha
$ ./emacs -Q -batch --eval '(message #r~ha"\nha~)'
ha"\nha

Although the reader works, some minor parts of Emacs are broken in the presence of raw strings (sexp navigation, font-locking, C-x C-e, …). These parts of the environment need to be made aware of the new syntax, which shouldn’t be too hard to do.

At this point I posted my results to the emacs-devel mailing list, which led to an interesting discussion. There was no clear consensus, but I think most people realized that raw strings are not a satisfying solution to the regex problem. Some would rather have a way to write custom syntax readers in Lisp, which is nice but hard to implement. Others said you’re better off using rx.

rx is a macro that lets you write readable regexes in the form of s-expressions:

(rx (+ "abc") "foo" (group (or "zob" "foo")))
=> "\\(?:abc\\)+foo\\(\\(?:foo\\|zob\\)\\)"

I personally think raw strings have their uses outside of regexes and would be a nice addition to the Emacs Lisp language. As for regexes, I now write mine with rx all the time. I just wish there was a built-in way to use rx in the interactive search/replace functions. I will work on this eventually if someone hasn’t done it already.

That’s all for today.
