Skip to main content

Fun with Lambdas: C++14 Style (part 4)

This is part 4 in the series of Fun with Lambdas: C++14 Style. The previous posts are part 3, part 2, and part 1.

C++14 has a number of features that support functional-style design. By "functional-style" I mean heavy use of higher-order functions (functions that take other functions as arguments). Quite often arguments to the higher-order functions are lambdas (closures, to be precise). With automatic return type deduction for normal functions, writing higher-order function becomes very easy and seamless in C++14.

This time, I have chosen a "text-book" example to show you the power of C++14: Composable Data Generators

What is a Generator?

A Generator<T> produces values of type T randomly. There is already a random number generator defined in the C library: random(). It produces long ints.

We can use this basic generator to create higher-level generators, such as bool, character, floating point numbers, etc. Even random sequence and structure generators are possible.

But first, lets add some structure around the C library function so that we can compose generators.

#include <cstdlib>

struct RootRandomGen
{
  long int operator () () const 
  {
    return random();
  }
};

RootRandomGen is a very simple function-object that when called produces a random number between 0 and RAND_MAX.

Let's create a Generator template from which we can create other generators.
template <class T, class GenFunc>
class Gen 
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) 
      : genfunc(std::move(func)) 
    { } 
    
    T generate() 
    {   
      return genfunc();
    }   
};

The Gen class template allows us to pass any function-object or closure and a make a "generator" out of it. Of course, the function must not take any arguments and must produce a value.

To simplify creation of Generators from just lambdas, we create a helper factory function. This is where the power of C++14 starts becoming apparent.
template <class GenFunc>
auto make_gen_from(GenFunc&& func)
{
  return Gen<decltype(func()), GenFunc>(std::forward<GenFunc>(func));
}

make_gen_from is a higher-order function that takes a closure as an argument and creates a Gen<T> object. GenFunc is the type of the closure. The type T is deduced using decltype(func()), which is C++14 syntax to say whatever the type of the return value of func is. Rest of it is perfect-forwarding of the func argument to the Gen<T> object.

To create many more generators, such as for bool, char, string, etc, a function like make_gen<T> might be quite useful. So, let's add one.
template <class T>
auto make_gen();

template <>  
auto make_gen<long int>()
{
  return make_gen_from(RootRandomGen()); 
  //return make_gen_from([]() { return random(); }); 
}

The long int generator simply uses the "Root" generator. Alternatively, RootRandomGen can be defined in-place using a lambda as shown above. I.e., RootRandomGen is superfluous.

Let's test what we've so far.

void init_random() 
{
  time_t t;
  time(&t);
  srandom(t);
}

int main(void)
{
  init_random();
  auto gen = make_gen<long int>();
  std::cout << gen.generate(); // expect a random value.
}

We can create many more generators by explicitly specializing make_gen for a number of types. But before we do that let's observe the core properties of Gen<T>.

The Generator<T> Functor

In functional programming literature, Gen<T> is a functor, which means you can "map over it". I.e., you can write a function named map that takes a generator and a function and returns another generator that applies the function to the values generated by the argument generator. It's much easier to look at code.
template <class Gen, class Func>
auto map (Gen gt, Func func)
{
  return make_gen_from([gt, func]() { 
                          return func(gt.generate()); 
                      });
}

First, the lambda captures gt and func by value. When called, it first generates a value from gt and passes it to the function and simply returns the value produced by the function. We've already seen that make_gen_from converts any lambda (with right signature) to a generator. So we now have a very general-purpose facility to create arbitrarily many generators simply by passing functions to map.

Let's look at an example.
int main(void)
{
  init_random();
  auto gen = make_gen<long int>();
  auto boolgen = map(gen, [](long int i) { return bool(i % 2); });
  std::cout << std::boolalpha << boolgen.generate(); // expect a random boolean.
}

The only problem, however, is that it does not work.

The problem is that Gen<T> is designed to support stateful generators that might mutate state between two successive calls to generate. That's why the generate function is not const. But the lambda in the map function is by default const. Therefore, gt is also const, which prevents us from calling gt.generate() as Gen<T>::generate() is a non-const function.

The solution is to make the lambda in map function mutable. With that, the program compiles but there are more things that can be improved about map.

First, gt and func arguments are passed by value and the lambda captures them by value. That may be potentially quite wasteful. We can improve efficiency by using perfect forwarding. Adding perfect forwarding, however, adds a lot of noise to the otherwise simple map function. This noise has become my pet peeve regarding functional-style programming in C++14.
template <class Gen, class Func>
auto map (Gen&& gt, Func&& func)
{
  return make_gen_from([gt=std::forward<Gen>(gt), 
                        func=std::forward<Func>(func)]() mutable { 
                          return func(gt.generate()); 
                      });
}

I think this map function is a well-behaved citizen of the C++14 world. It's using the generalized lambda capture syntax and perfect-forwarding in combination.

Using this map function is slightly awkward because it's a free function. To support more fluent style of API, I would like to "upgrade" the map function to the Gen<T> class. As I said before, every generator supports mapping. So here's the new Get<T> template.
template <class T, class GenFunc>
class Gen 
{
    GenFunc genfunc;

  public:
    explicit Gen(GenFunc func) 
      : genfunc(std::move(func)) 
    { } 
    
    T generate() 
    {   
      return genfunc();
    }  
 
    template <class Func>
    auto map (Func&& func)
    {
      return make_gen_from([gt=*this, 
                            func=std::forward<Func>(func)]() mutable { 
                              return func(gt.generate()); 
                          });
    }
};

Note that map makes a full copy of this in the lambda so that every generator becomes self-sufficient.

We can create a number of other generators using the built-in map function. For instance, an consider Gen<int> below.
template <>  
auto make_gen<int>()
{
  return make_gen<long int>().map([](long int i) { return static_cast<int>(i); });
}

A range generator that produces a random value in the specified range may be created as follows. Like in the iterator semantics, hi is one past the desirable range.
template <class Integer>
auto make_range_gen(Integer lo, Integer hi) 
{
  return make_gen<long int>().map( 
          [lo, hi](long int x) { return static_cast<Integer>(lo + x % (hi - lo)); });
}

Using the range generator, a generator for uppercase characters is quite simple.
auto uppercase_gen = make_range_gen('A', 'Z'+1);
std::cout << uppercase_gen.generate(); // expect a random uppercase character.

Combinators

Many more helper functions can be added to the Gen<T> class that produce new generators from argument generators. In functional literature they are called combinators.

Here's the zip2 combinator: Zip works just like a zipper. It takes 2 generators and produces another generator that combines the values generated by the argument generators. To combine the values, it needs a function that accepts two arguments and return a value. The user must provide the function.

template <class T, class GenFunc>
class Gen 
{
    // ....

    template <class UGen, class Zipper2>
    auto zip2(UGen&& ugen, Zipper2&& func)
    {
      return this->map(
                [ugen=std::forward<UGen>(ugen),
                 func=std::forward<Zipper2>(func)](auto&& t) mutable {
                    return func(std::forward<decltype(t)>(t), ugen.generate());
                });
    }
};

auto uppergen = make_range_gen<char>('A', 'Z'+1);
auto lowergen = make_range_gen<char>('a', 'z'+1);
auto pairgen  = 
       uppergen.zip2(lowergen, 
                     [](char up, char low) { return std::make_pair(up, low); });

The example above shows how a pair of random characters can be produced by zipping an uppercase generator with a lowercase generator. The zipper function simply constructs the pair from two characters. Alternatively, &std::make_pair<char, char> would have been sufficient.

The zip2 function looks significantly more verbose than a comparable implementation in most other languages that support lambdas. A lot of code is devoted to perfect-forwarding of arguments, which is quite necessary for highly composable libraries such as this one. We'll see later that C++ compilers are smart enough to inline the call-chain completely.

Another example of zip is string generator. A string generator zips a bool generator and int generator where the bool value indicates whether string is empty or not and int generator determines the length of the string. Of course, string generator also needs a char generator to populate the string. Here's one way of doing it.
template <>
auto make_gen<std::string>()
{
  auto char_gen = make_range_gen(32, 127); // printable characters.
  auto length_gen = make_range_gen(1, 256);

  return make_gen<bool>().zip2(
                      length_gen,
                      [char_gen](bool empty, int length) mutable {
                        std::string str;
                        if(!empty)
                        {
                          str.reserve(length);
                          for(int i = 0; i < length; ++i)
                            str.push_back(char_gen.generate());
                        }
                        return str;
                      });
}

There are many more combinators. The single generator would always produce the same value. The oneOf generator selects one of the elements from a given array non-deterministically. Finally, the amb combinator will use of the two input combinators to produce value. Here's a couple of them.
template <class T>
auto make_single_gen(T&& t)
{
    return make_gen_from([t=std::forward<T>(t)]() { return t; });
}

template <class T>
auto make_oneof_gen(std::initializer_list<T> list)
{
    return make_range_gen(0ul, list.size()).map([list](int idx) { return *(list.begin()+idx); }); 
}

Stateful Generators

The examples we've seen so far are stateless generators. I.e., between two successive calls to generate, no state is updated. Let's look at a stateful generator: fibonacciGen. This generator must maintain at least two integers (a and b) for its computation.
auto fiboGen()
{
  int a = 0;
  int b = 1;
  return make_gen_from([a, b]() mutable {
                          int c = a;
                          a = b;
                          b = c+b;
                          return c;
                       });
}

The Cost of Functional Design

It is quite interesting how complex generators can be created from simple generators. But is there a cost to this high level of abstraction? Is the code as fast as it can be?

Here are two different algorithmically identical implementations of bool generator. The reason I chose this algorithm because I wanted make use of zip2, which in turn uses map. I wanted to include multiple levels of indirection.
extern "C" bool random_bool1()
{
  return (random()-random()) > 0;
}

extern "C" bool random_bool2()
{
  auto boolgen = 
    make_gen<long int>()
           .zip2(make_gen<long int>(),
                 [](long int i, long int j) { return (i-j) > 0; });

  return boolgen.generate();
}

The screenshot below shows the compiler's assembly output for both the functions. The amazing fact is that it is exactly identical! The compiler is able to see through the layers and layers of indirections (invocations of lambdas) and is able to produce optimal code for the random_bool functions. That's quite a remarkable feat achieved by g++ 5.1 in this case. Perhaps it is the same with other major C++ compilers.

Generator size

The performance story does not end here though. Note that producing a random boolean does not need any state. I.e., it is just a function. However, RootRandomGen take one byte because it's a class. Every object in C++ must have a unique identity. To ensure that's the case, C++ compiler gives minimal possible size to each object. As we compose higher-level generators from smaller generators, we are clearly creating objects, which have non-zero sizes. But how much memory do they need exactly? What is the size of boolgen in random_bool2?

The size of boolgen is 3 bytes on my machine. The reason for the state is lambda captures. Both map and zip combinators use lambdas with one or more captures. As higher-level generators are built from lower level generators, the state adds up. The problem is that in most generators we've seen so far, there is no real reason to maintain state between two successive calls to the generate function. I.e, the next value is completely unrelated to the previous values. In fact, as we saw before, the compiler did not refer to any state in the implementation of random_bool2. Of course, for truly stateful generators such as the the fibonacci generator, maintaining state from the prior computation is necessary.

The build-up of unnecessary state is quite fast though. For instance, the size of the string generator is whopping 28 bytes! The compiler maintains 28 bytes of state and does not serve any obvious purpose to the user! A generator of printable strings implemented as a simple function would require no persistent state at all. As the size of the generators get larger and larger, pretty soon they won't fit in the cache line and will start to degrade performance, especially if truly stateful generators are mixed with only accidently stateful generators. I hope compiler writers will figure something out about this problem.

This concludes the part 4 in the series of Fun with Lambdas: C++14 Style. I hope you enjoyed it. See Live Example.

Comments

foonathan said…
Regarding the size: You could use the Empty Base Optimization (EBO) by inheriting from the template argument when it is an emtpy class type as it is the case for non stateful allocators.
Then it can be optimized out.
Sumant said…
@foonathan I tried that for the Gen<T> template. It did not work as expected. The issue I believe is that each closure that captures something has non-zero size and the captured values are members of the closure accessible only from the operator(). In such cases, the behavior is similar to any regular class and EBO does not kick in.
Anonymous said…
Is it just me or
  return make_gen().map([](long int i) { return static_cast(i); });

is really crappy RNG where size of long is 8? I mean afaik all implementations will just drop top 4 bytes, but it seems bad to rely on that.
Paul said…
Great post. After Learning Java 8 lambda expression, reading this on C++ is just a different view.
Milan Otawa said…
Certainly with your thoughts here and that i love your blog! I’ve bookmarked it making sure that I can come back & read more in the foreseeable future.
Unknown said…
Hey, thanks for this insightful article.
I'm wondering how much of this will work in C++11. I haven't really looked at C++14 a whole lot yet and most of the code feels familiar. Have you thought about this?
Certainly with your thoughts here and that i love your blog! I’ve bookmarked it making sure that I can come back & read more in the foreseeable future.
Peterdell said…
Great article !!!


I know lambda only from java or haskell but nice to see that you can have some fun in c++ too :D
I wanted to create you the tiny note to say thank you over again with the striking solutions you’ve featured on this website. It’s certainly remarkably open-handed with you to grant publicly exactly what a lot of people could possibly have offered for sale for an electronic book to help make some profit on their own, even more so given that you might well have tried it if you considered necessary. Those pointers likewise served to become good way to be aware that many people have a similar zeal much like my own to learn a great deal more on the topic of this condition. Certainly there are numerous more pleasurable times in the future for individuals who discover your blog post.
gunmayhem said…
I wanted to create you the tiny note to say thank you over again with the striking solutions you’ve featured on this website. It’s certainly remarkably open-handed with you to grant publicly exactly what a lot of people could possibly have offered for sale for an electronic book to help make some profit on their own, even more so given that you might well have tried it if you considered necessary. Those pointers likewise served to become good way to be aware that many people have a similar zeal much like my own to learn a great deal more on the topic of this condition. Certainly there are numerous more pleasurable times in the future for individuals who discover your blog post.
Unknown said…
I just see the post i am so happy the post of information's.So I have really enjoyed and reading your blogs for these posts.Any way I’ll be subscribing to your feed and I hope you post again soon.

seo training in chennai
Yang Kuo said…
is really crappy RNG where size of long is 8? I mean afaik all implementations will just drop top 4 bytes, but it seems bad to rely on that.

www.golden-slot.com
gclub online
Anonymous said…
Thanks admin for sharing the unique content, you have done a great job
golden slot mobile
gclub

Popular Content

Unit Testing C++ Templates and Mock Injection Using Traits

Unit testing your template code comes up from time to time. (You test your templates, right?) Some templates are easy to test. No others. Sometimes it's not clear how to about injecting mock code into the template code that's under test. I've seen several reasons why code injection becomes challenging. Here I've outlined some examples below with roughly increasing code injection difficulty. Template accepts a type argument and an object of the same type by reference in constructor Template accepts a type argument. Makes a copy of the constructor argument or simply does not take one Template accepts a type argument and instantiates multiple interrelated templates without virtual functions Lets start with the easy ones. Template accepts a type argument and an object of the same type by reference in constructor This one appears straight-forward because the unit test simply instantiates the template under test with a mock type. Some assertion might be tested in

Covariance and Contravariance in C++ Standard Library

Covariance and Contravariance are concepts that come up often as you go deeper into generic programming. While designing a language that supports parametric polymorphism (e.g., templates in C++, generics in Java, C#), the language designer has a choice between Invariance, Covariance, and Contravariance when dealing with generic types. C++'s choice is "invariance". Let's look at an example. struct Vehicle {}; struct Car : Vehicle {}; std::vector<Vehicle *> vehicles; std::vector<Car *> cars; vehicles = cars; // Does not compile The above program does not compile because C++ templates are invariant. Of course, each time a C++ template is instantiated, the compiler creates a brand new type that uniquely represents that instantiation. Any other type to the same template creates another unique type that has nothing to do with the earlier one. Any two unrelated user-defined types in C++ can't be assigned to each-other by default. You have to provide a

Multi-dimensional arrays in C++11

What new can be said about multi-dimensional arrays in C++? As it turns out, quite a bit! With the advent of C++11, we get new standard library class std::array. We also get new language features, such as template aliases and variadic templates. So I'll talk about interesting ways in which they come together. It all started with a simple question of how to define a multi-dimensional std::array. It is a great example of deceptively simple things. Are the following the two arrays identical except that one is native and the other one is std::array? int native[3][4]; std::array<std::array<int, 3>, 4> arr; No! They are not. In fact, arr is more like an int[4][3]. Note the difference in the array subscripts. The native array is an array of 3 elements where every element is itself an array of 4 integers. 3 rows and 4 columns. If you want a std::array with the same layout, what you really need is: std::array<std::array<int, 4>, 3> arr; That's quite annoying for