
Compile-time regex matcher using constexpr

With my growing constexpr fascination, I thought of using it for something that would be really hard to do with template metaprograms. How about implementing a compile-time regular expression matcher? Fortunately, a simple regular expression matcher has already been written by Rob Pike. I just rewrote it using constexpr: a single return statement in each function, no modifications to the parameters, abundant ternary operators, and recursion. Here we go...

constexpr int match_c(const char *regexp, const char *text);
constexpr int matchhere_c(const char *regexp, const char *text);
constexpr int matchstar_c(int c, const char *regexp, const char *text);
constexpr int matchend_c(const char *regexp, const char *text);

/* matchend: try regexp at each successive position of text */
constexpr int matchend_c(const char *regexp, const char *text)
{
    return matchhere_c(regexp, text) ? 1 :
           (*text == '\0') ? 0 : matchend_c(regexp, text+1);
}

/* match: search for regexp anywhere in text */
constexpr int match_c(const char *regexp, const char *text)
{
    return (regexp[0] == '^') ? matchhere_c(regexp+1, text) :
           matchend_c(regexp, text);
}

/* matchhere: search for regexp at beginning of text */
constexpr int matchhere_c(const char *regexp, const char *text)
{
    return (regexp[0] == '\0') ? 1 :
           (regexp[1] == '*') ? matchstar_c(regexp[0], regexp+2, text) :
           (regexp[0] == '$' && regexp[1] == '\0') ? (*text == '\0') :
           (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text)) ?
               matchhere_c(regexp+1, text+1) : 0;
}

/* matchstar: search for c*regexp at beginning of text */
constexpr int matchstar_c(int c, const char *regexp, const char *text)
{
    return matchhere_c(regexp, text) ? 1 :
           (*text != '\0' && (*text == c || c == '.')) ?
               matchstar_c(c, regexp, text+1) : 0;
}

#define TO_STR_IMPL(R) #R
#define TO_STR(R) TO_STR_IMPL(R)

int main(void)
{
    static_assert(match_c(TO_STR(REGEX), TO_STR(TEXT)), "...");

    return 0;
}


To compile it, as of today, you need g++ 4.6 or later. You have to pass REGEX and TEXT as macro definitions on the command line when compiling. For instance, -D REGEX=o$ -D TEXT=Foo (quote the regex if your shell treats $ specially). It matches!
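The matcher can also be exercised directly on string literals, without any -D definitions. Here is a minimal sketch, assuming a C++14 compiler, which relaxes the constexpr rules to allow loops and multiple return statements. The names match14, matchhere14, and matchstar14 are made up for this illustration; the structure follows Rob Pike's loop-based original rather than the recursive rewrite above.

```cpp
// Loop-based variant; assumes C++14 or later (relaxed constexpr).
constexpr bool matchhere14(const char* re, const char* text);

// matchstar14: search for c*re at the beginning of text.
constexpr bool matchstar14(char c, const char* re, const char* text) {
    do {  // a '*' matches zero or more instances of c
        if (matchhere14(re, text)) return true;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return false;
}

// matchhere14: search for re at the beginning of text.
constexpr bool matchhere14(const char* re, const char* text) {
    if (re[0] == '\0') return true;
    if (re[1] == '*') return matchstar14(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0') return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return matchhere14(re + 1, text + 1);
    return false;
}

// match14: search for re anywhere in text.
constexpr bool match14(const char* re, const char* text) {
    if (re[0] == '^') return matchhere14(re + 1, text);
    do {  // must try even if text is empty
        if (matchhere14(re, text)) return true;
    } while (*text++ != '\0');
    return false;
}

// Each check is evaluated by the compiler; a failing one stops the build.
static_assert(match14("o$", "Foo"), "text should end in 'o'");
static_assert(match14("^F.*o$", "Foo"), "anchored pattern should match");
static_assert(match14("ab*c", "xacx"), "'b*' may match zero characters");
static_assert(!match14("^o", "Foo"), "text does not start with 'o'");
```

The supported syntax is the same small subset as above: literal characters, '.', '*', '^', and '$'.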

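I used two macros, TO_STR and TO_STR_IMPL, to convert REGEX and TEXT into string literals. #R is the preprocessor's stringification operator. Two separate macros are needed because the preprocessor does not macro-expand an argument that is the operand of #: a single-level macro would produce the string "REGEX" rather than the value passed with -D. The outer TO_STR lets the argument expand first, and TO_STR_IMPL then stringifies the result. This is standard preprocessor behavior, not something specific to gcc.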
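The difference between one and two macro levels can be seen in a small sketch. The macro names STR_ONE_LEVEL, STR_TWO_LEVEL, STR_IMPL, and SAMPLE_TEXT, and the helper same(), are made up for this illustration; SAMPLE_TEXT stands in for something like -D TEXT=Foo.

```cpp
#define STR_ONE_LEVEL(R) #R            // stringifies the argument exactly as written
#define STR_IMPL(R) #R
#define STR_TWO_LEVEL(R) STR_IMPL(R)   // argument is macro-expanded before STR_IMPL sees it

#define SAMPLE_TEXT Foo                // hypothetical stand-in for -D TEXT=Foo

// Compile-time string comparison in the C++11 single-return style.
constexpr bool same(const char* a, const char* b) {
    return (*a != *b) ? false :
           (*a == '\0') ? true : same(a + 1, b + 1);
}

// One level: the operand of # is not expanded, so we get the macro's name.
static_assert(same(STR_ONE_LEVEL(SAMPLE_TEXT), "SAMPLE_TEXT"), "...");

// Two levels: SAMPLE_TEXT expands to Foo first, then gets stringified.
static_assert(same(STR_TWO_LEVEL(SAMPLE_TEXT), "Foo"), "...");
```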

Have fun!

Comments

Kev said…
Pardon me if I am being dumb here, but for this to be useful, you have to know the text to be matched at compile time as well right?

Perhaps I am missing something though.
Sumant said…
@Kev: That's right! You need to know both the strings (regex and the text) at compile-time. What good is it then?

Think of parsing XPath and SQL queries for syntactic correctness. An XML data-binding tool can check constraints specified in an XSD when an XML object is created from string literals. For instance, checking that a phone number has two dashes and 10 digits. You don't have to wait until you get a run-time exception. The compiler can catch a typo in a string literal!
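As a rough sketch of the phone number check, here are a few hypothetical constexpr helpers (none of these names come from the matcher above) that count digits and dashes in a string literal. It only checks the counts and the total length, not where the dashes sit.

```cpp
// Hypothetical helpers written in the C++11 single-return style.
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Number of decimal digits in s.
constexpr int count_digits(const char* s) {
    return (*s == '\0') ? 0 : (is_digit(*s) ? 1 : 0) + count_digits(s + 1);
}

// Number of occurrences of c in s.
constexpr int count_char(const char* s, char c) {
    return (*s == '\0') ? 0 : (*s == c ? 1 : 0) + count_char(s + 1, c);
}

// Length of s, not counting the terminator.
constexpr int length(const char* s) {
    return (*s == '\0') ? 0 : 1 + length(s + 1);
}

// Ten digits, two dashes, and nothing else.
constexpr bool looks_like_phone(const char* s) {
    return count_digits(s) == 10 && count_char(s, '-') == 2 && length(s) == 12;
}

static_assert(looks_like_phone("555-867-5309"), "malformed phone number literal");
static_assert(!looks_like_phone("555-867-530"), "nine digits should be rejected");
```

A typo in the literal then shows up as a failed static_assert during compilation instead of an exception at run time.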

Another example could be a library that requires (or forbids) an absolute path for a file and checks that at compile time. I think it is possible to design an ifstream-like constructor that causes a compilation failure if an empty string literal is passed as the filename.
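One possible shape for the empty-filename check is sketched below. checked_filename is a hypothetical wrapper type, not std::ifstream itself; it relies on the fact that a string literal binds to a reference to an array whose size includes the terminating '\0', so an empty literal has size 1.

```cpp
#include <cstddef>

// Hypothetical wrapper: accepts character arrays (such as string literals)
// and rejects the empty one at compile time.
class checked_filename {
public:
    template <std::size_t N>
    constexpr checked_filename(const char (&literal)[N]) : name_(literal) {
        // N counts the terminating '\0', so "" arrives with N == 1.
        static_assert(N > 1, "filename must not be an empty string literal");
    }

    constexpr const char* c_str() const { return name_; }

private:
    const char* name_;
};

// Compiles: the literal is non-empty.
constexpr checked_filename config_file("settings.xml");

// Would not compile, because N == 1 trips the static_assert:
// constexpr checked_filename bad_file("");
```

A stream class could take checked_filename instead of const char* in its constructor. Note that any char array binds to this constructor, not only literals, so this is a sketch of the idea rather than a complete design.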
Anonymous said…
This might be the best C++0x hack I've seen so far. Kudos.
Anonymous said…
Write a C++ program which will read a text and count all the occurrences
of a particular word?
Anonymous said…
I have not tested your code, but to me it looks like it's violating the standard.
To quote the standard: "A constant-expression function cannot be called before it is
defined."
But in matchend_c(), for example, you do exactly this: you call matchhere_c() before matchhere_c() is defined. You have only declared it at that point.
Sumant said…
Recursive constexpr functions are allowed. The standard even recommends a minimum depth compilers should support for recursive constexpr functions. It is 512 in C++11 public draft N3337 (Annex B).
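On the declaration-versus-definition question, my understanding is that what matters is whether the function is defined by the time the constant expression is actually evaluated, not at the point where the call is written inside another constexpr function's body. Here is a small sketch with two hypothetical mutually recursive functions, is_even and is_odd, which compilers such as g++ accept:

```cpp
constexpr bool is_odd(unsigned n);   // declared here, defined further down

constexpr bool is_even(unsigned n) {
    // is_odd is only declared at this point; nothing is evaluated yet.
    return (n == 0) ? true : is_odd(n - 1);
}

constexpr bool is_odd(unsigned n) {
    return (n == 0) ? false : is_even(n - 1);
}

// Both functions are defined by now, so these are valid constant expressions.
static_assert(is_even(10), "10 is even");
static_assert(is_odd(7), "7 is odd");
```

In the matcher above, the static_assert in main likewise comes after all four functions have been defined.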
