Skip to main content

Compile-time regex matcher using constexpr

With my growing constexpr fascination, I thought of using it for something that would be really hard using template meta-programs. How about implementing a compile-time regular expression matcher? Fortunately, a simple regular expression matcher has already been written by Rob Pike. I just rewrote it using constexpr: single return statement in functions, no modifications to the parameters, abundant ternery operators, and recursion. Here we go...

constexpr int match_c(const char *regexp, const char *text);
constexpr int matchhere_c(const char *regexp, const char *text);
constexpr int matchstar_c(int c, const char *regexp, const char *text);
constexpr int matchend_c(const char * regexp, const char * text);

constexpr int matchend_c(const char * regexp, const char * text)
{
return matchhere_c(regexp, text) ? 1 :
(*text == '\0') ? 0 : matchend_c(regexp, text+1);
}

constexpr int match_c(const char *regexp, const char *text)
{
return (regexp[0] == '^') ? matchhere_c(regexp+1, text) :
matchend_c(regexp, text);
}

/* matchhere: search for regexp at beginning of text */
constexpr int matchhere_c(const char *regexp, const char *text)
{
return (regexp[0] == '\0') ? 1 :
(regexp[1] == '*') ? matchstar_c(regexp[0], regexp+2, text) :
(regexp[0] == '$' && regexp[1] == '\0') ? (*text == '\0') :
(*text!='\0' && (regexp[0]=='.' || regexp[0]==*text)) ?
matchhere_c(regexp+1, text+1) : 0;
}

/* matchstar: search for c*regexp at beginning of text */
constexpr int matchstar_c(int c, const char * regexp, const char *text)
{
return matchhere_c(regexp, text) ? 1 :
(*text != '\0' && (*text == c || c == '.')) ?
matchstar_c(c, regexp, text+1) : 0;
}

#define TO_STR_IMPL(R) #R
#define TO_STR(R) TO_STR_IMPL(R)

int main(void)
{
static_assert(match_c(TO_STR(REGEX), TO_STR(TEXT)), "...");

return 0;
}


To compile it, as of today, you need g++ 4.6 or better. You've to pass REGEX and TEXT as #defines while compilation. For instance, -D REGEX=o$ -D TEXT=Foo It matches!

I used two macros TO_STR and To_STR_IMPL to convert the REGEX and TEXT into string literals. #R is basically using the preprocessor stringification technique. For some reason I need two separate TO_STR macros for TEXT substitution and stringification. Seems like the gcc preprocessor can't do those two things in a single macro.

Have fun!

Comments

Kev said…
Pardon me if I am being dumb here, but for this to be useful, you have to know the text to be matched at compile time as well right?

Perhaps I am missing something though.
Sumant said…
@Kev: That's right! You need to know both the strings (regex and the text) at compile-time. What good is it then?

Think of parsing XPath and SQL queries for syntactical correctness. An XML data-binding tool can check for constraints specified in xsd when XML object is created from string literals. For instace, checking a phone number has two dashes and 10 digits. You don't have to wait till you get a run-time exception. A compiler can find out a typo in a string literal!

Another example could be a library that may require (or not require) absolute path for a file and checks that at compile-time. I think it is possible to design ifstream constructor to cause compilation failure if an empty string literal is passed as a filename.
meekrosoft said…
This might be the best C++0x hack I've seen so far. Kudos.
xander345 said…
if you like c++ you can compile it online here: http://codecompiler.info/

32, 64 - windows & Linux - and more programming languages
Anonymous said…
for more information on c c++
tryWao wat a sueful information given
<a href="/examandlearning.in></a>
write a c++ program which will read a text and count all the occurence
of a particular word?
sarah said…
very informative post indeed .being enrolled in http://www.wiziq.com/course/5776-object-oriented-programming-with-c
i was looking for such articles online to assist me and your article helped me a lot. i really like that you are providing such information.
Anonymous said…
I have not tested your code, but to me it looks like it's violating the standard.
To quote the standard: "A constant-expression function cannot be called before it is
defined."
But eg. in matchend_c() you do exactly this: You call matchhere_c() before matchhere_c() is defined. You have just declared it at that point.
Sumant Tambe said…
Recursive constexpr functions are allowed. The standard even recommends a minimum depth compilers should support for recursive constexpr functions. It is 512 in C++11 public draft N3337 (Annex B).

Popular posts from this blog

Multi-dimensional arrays in C++11

What new can be said about multi-dimensional arrays in C++? As it turns out, quite a bit! With the advent of C++11, we get new standard library class std::array. We also get new language features, such as template aliases and variadic templates. So I'll talk about interesting ways in which they come together.

It all started with a simple question of how to define a multi-dimensional std::array. It is a great example of deceptively simple things. Are the following the two arrays identical except that one is native and the other one is std::array?

int native[3][4];
std::array<std::array<int, 3>, 4> arr;

No! They are not. In fact, arr is more like an int[4][3]. Note the difference in the array subscripts. The native array is an array of 3 elements where every element is itself an array of 4 integers. 3 rows and 4 columns. If you want a std::array with the same layout, what you really need is:

std::array<std::array<int, 4>, 3> arr;

That's quite annoying for two r…

Understanding Fold Expressions

C++17 has an interesting new feature called fold expressions. Fold expressions offer a compact syntax to apply a binary operation to the elements of a parameter pack. Here’s an example. template <typename... Args> auto addall(Args... args) { return (... + args); } addall(1,2,3,4,5); // returns 15. This particular example is a unary left fold. It's equivalent to ((((1+2)+3)+4)+5). It reduces/folds the parameter pack of integers into a single integer by applying the binary operator successively. It's unary because it does not explicitly specify an init (a.k.a. identity) argument. So, let add it. template <typename... Args> auto addall(Args... args) { return (0 + ... + args); } addall(1,2,3,4,5); // returns 15. This version of addall is a binary left fold. The init argument is 0 and it's redundant (in this case). That's because this fold expression is equivalent to (((((0+1)+2)+3)+4)+5). Explicit identity elements will come in handy a little la…

Folding Monadic Functions

In the previous two blog posts (Understanding Fold Expressions and Folding Functions) we looked at the basic usage of C++17 fold expressions and how simple functions can be folded to create a composite one. We’ll continue our stride and see how "embellished" functions may be composed in fold expressions.

First, let me define what I mean by embellished functions. Instead of just returning a simple value, these functions are going to return a generic container of the desired value. The choice of container is very broad but not arbitrary. There are some constraints on the container and once you select a generic container, all functions must return values of the same container. Let's begin with std::vector.
// Hide the allocator template argument of std::vector. // It causes problems and is irrelevant here. template <class T> struct Vector : std::vector<T> {}; struct Continent { }; struct Country { }; struct State { }; struct City { }; auto get_countries…