Monday, March 24, 2014

Why we need compile-time reflection in C++1y

Programs need data. That's a no brainer. Programs are only as good as the data you provide them. Based on what kind of data is consumed, programs can be divided into two broad categories: (1) those that operate on regular data (a file), and (2) those that operate on other programs. The first kind of programs are abundant. Your browser, for instance, is showing you this page--its data. The second kind of programs are more interesting and they are called meta-programs.

Meta-programs need data too. As with the other programs, meta-programs are only as good as the data you provide them. So what do we feed them? ... Well, In C++, more important than 'what' is 'when'. (remember Morpheus?) A C++ program is just a sequence of bits the compiler is trying to understand. So, while the compiler is trying to make sense of your program, most of it gets translated (to assembly) but some of it gets executed. Quite intriguing! We're talking about compile-time meta-programming.

Coming back to the 'what'. We want to be able to feed whatever is available at compile-time: types, members, functions, arguments, namespaces, line numbers, file names, all are a fair game. Less obvious things are relationships among types: convertibility, parent/child, base/derived, container/iterator, friends, and more.

A C++ compiler already has this information but it is not in a form a meta-program can use. So we're in a soup, where we can run programs (at compile-time) but there is no data! So the next question is 'how' do we make the data available to our meta-programs? And that brings me to what I like to call the Curiously Recurring Template Meta-Programming  (CRTMP) pattern.

Curiously Recurring Template Meta-Programming Pattern

The idea is rather general and many have done it successfully before: Make data available to meta-programs without offending the compiler and do something interesting with it.

Let's look at who are the subjects (players) in this pattern. (1) the compiler, (2) the meta-program, and last but not the least is (3) the programmer itself because machines haven't taken over yet and humans still write most of the programs as of today.

The compile-time data must make sense to all three above. Today, C++ programmers, because we don't mind pain, create that data in a form that is understood by the former two. The prime examples are the traits idiom, the type_traits library, and sometimes code generators that parse C++ files and spit out relationships between classes.  For example, LEESA's gen-meta.py script generates typelists (Boost MPL vectors) for classes that contain other classes (think XML data-binding). Effectively it builds a compile-time tree of the XML node types.

When things are not auto generated, we make it palatable to the fellow programmers using macros. To many, macros are as obnoxious as the data they hide/generate but lets move on. There are many examples of super-charged too: Boost SIMD, pre-variadic Boost MPL, smart enumerations, and many more. When macros are used in a clever way (abused!) they really do look like magic. I got a first-hand experience of that while developing the RefleX library.

RefleX is a compile-time reflection-based type modeling in C++ for DDS Topics. It is open-source but you need the RTI Connext DDS to play with it. It essentially transforms your native C/C++ type into a serializable type representation called a TypeObject and marshals your data in what is called a DynamicData object. Note that both, type and data are serialized. There are systems--perhaps many we owe our modern life to--that need to distribute types and data over the network for discovery, interoperability, compatibility, and for other reasons.

Here's an example:


The RTI_ADAPT_STRUCT macro expands to about 120 lines of C++ code, which is primarily reflection information about ShapeType and it can be used at compile-time. It is based on the BOOST_FUSION_ADAPT_STRUCT macro. The macro opens the guts of the specified type to the RefleX library. The meta-programs in RefleX use this "data" to do their business. The reflection information includes member types, member names, enumerations, and other ornaments such as a "key". The point is that the same CRTMP pattern is used to "export" information about a native C++ type.

So, the last two open-source C++ libraries I wrote use the CRTMP pattern: In one, "data" is generated using a Python script and in the other using a macro. CRTMP makes C++ libraries remarkably powerful. The reality is there is nothing novel about it. It is seen everywhere.

The natural step in evolution of a idiom/pattern is first-class language support.  If something is so prevalent, the language itself should absorb it eliminate the crud involved developing and writing CRTMP-based libraries.

That brings us to the main point of this post: Compile-time Reflection. We need it. Period. It's a natural step of evolution from where C++ is now. When available, it will make vast amount of compile-time data available to C++ meta-programs. They will run faster, look nicer, and they will knock your socks off! It is mind boggling what has been achieved using template and preprocessor meta-programming. Compile-time reflection will push it two notches up. So stay tuned for C++1y.

Wednesday, March 12, 2014

Fun with Lambdas: C++14 Style (part 1)

It's common knowledge that Functional Programming is spreading like a wildfire in mainstream languages. Latest promoted languages: Java 8 and C++, both of which now support lambdas. So, let the lambdas begin! and may the fun be ever on your side. The same text is available in slides form on Slideshare. This blog post and the talk/slides are inspired by JSON inventor Douglas Crockford.

Write an Identity function that takes an argument and returns the same argument.
auto Identity = [](auto x) {
  return x;
};
Identity(3); // 3
Write 3 functions add, sub, and mul that take 2 parameters each and return their sum, difference, and product respectively.
auto add = [](auto x, auto y) {
  return x + y;
}; 
auto sub = [](auto x, auto y) {
  return x - y;
};
int mul (int x, int y) {
  return x * y;
};

Write a function, identityf, that takes an argument and returns an inner class object that returns that argument.
auto identityf = [](auto x) {
  class Inner {
    int x;
    public: Inner(int i): x(i) {}
    int operator() () { return x; }
  };
  return Inner(x);
};
identityf(5)(); // 5

Write a function, identityf, that takes an argument and returns a function that returns that argument.
auto identityf = [](auto x) {
  return [=]() { return x; };
};
identityf(5)(); // 5

Lambda != Closure
  • A lambda is just an anonymous function.
  • A closure is a function which closes over the environment in which it was defined. The line #2 above defines a closure that closes over x.
  • Not all closures are lambdas and not all lambdas are closures. 
  • Closures are just function objects in C++ 
  • C++ closures do not extend the lifetime of their context. (If you need this use shared_ptr)
Write a function that produces a function that returns values in a range.
auto fromto = [](auto start, auto finish) {    
  return [=]() mutable {      
    if(start < finish)        
      return start++;      
    else        
      throw std::runtime_error(“Complete");    
  };  
};
auto range = fromto(0, 10); 
range(); // 0
range(); // 1
Write a function that adds from two invocations.
auto addf = [](auto x) {
  return [=](auto y) { 
    return x+y; 
  };
};
addf(5)(4); // 9
Write a function swap that swaps the arguments of a binary function.
auto swap =[](auto binary) {
  return [=](auto x, auto y) {
    return binary(y, x);
  };
};
swap(sub)(3, 2); // -1
Write a function twice that takes a binary function and returns a unary function that passes its argument to the binary function twice.
auto twice =[](auto binary) {
  return [=](auto x) {
    return binary(x, x);
  };
};
twice(add)(11); // 22
Write a function that takes a binary function and makes it callable with two invocations.
auto applyf = [](auto binary) {
  return [=](auto x) { 
    return [=](auto y) {
      return binary(x, y); 
    };
  };
};
applyf(mul)(3)(4); // 12
Write a function that takes a function and an argument and returns a function that takes the second argument and applies the function.
auto curry = [](auto binary, auto x) {
  return [=](auto y) { 
    return binary(x, y);
  };
};
curry(mul, 3)(4); // 12
Currying (schönfinkeling)
  • Currying is the technique of transforming a function that takes multiple arguments in such a way that it can be called as a chain of functions, each with a single argument.
  • In lambda calculus functions take a single argument only.
  • Must know Currying to understand Haskell.
  • Currying != Partial function application
Partial Function Application
auto addFour = [](auto a, auto b, 
                  auto c, auto d) {
  return a+b+c+d;
};
auto partial = [](auto func, auto a, auto b) {
  return [=](auto c, auto d) { 
    return func(a, b, c, d);
  };
};
partial(addFour,1,2)(3,4); //10
Without creating a new function show 3 ways to create the inc function.
auto inc = curry(add, 1);
auto inc = addf(1);
auto inc = applyf(add)(1);
Write a function composeu that takes two unary functions and returns a unary function that calls them both.
auto composeu =[](auto f1, auto f2) {
  return [=](auto x) {
    return f2(f1(x));
  };
};
composeu(inc1, curry(mul, 5))(3) // 20
Write a function that returns a function that allows a binary function to be called exactly once.
auto once = [](auto binary) {    
  bool done = false;    
  return [=](auto x, auto y) mutable {
    if(!done) {        
      done = true;        
      return binary(x, y);      
    }      
    else        
      throw std::runtime_error("once!");     
  };  
};
once(add)(3,4); // 7
Write a function that takes a binary function and returns a function that takes two arguments and a callback.
auto binaryc = [](auto binary) {    
  return [=](auto x, auto y, auto callbk) {
   return callbk(binary(x,y));    
  };  
};
binaryc(mul)(5, 6, inc) // 31
binaryc(mul)(5,6,[](int a) { return a+1; });
Write 3 functions:
  • unit – same as Identityf
  • stringify – that stringifies its argument and applies unit to it
  • bind – that takes a result of unit and returns a function that takes a callback and returns the result of callback applied to the result of unit.
auto unit = [](auto x) { 
  return [=]() { return x; };
};
auto stringify = [](auto x) {    
  std::stringstream ss;
  ss << x;
  return unit(ss.str());
};

auto bind = [](auto u) {    
  return [=](auto callback) {
   return callback(u());    
  };  
}; 
Then verify.
std::cout << "Left Identity "  
          << stringify(15)() 
          << "==" 
          << bind(unit(15))(stringify)()
          << std::endl;

std::cout << "Right Identity " 
          << stringify(5)() 
          << "=="
          << bind(stringify(5))(unit)()
          << std::endl;
Why are unit and bind special? Read more about them here.

Sunday, March 09, 2014

Fun with Lambdas: C++14 Style

I am presenting at the SF Bay Area Association of C/C++ Users (ACCU) meetup on Wed, Mar 12th. Topic: Fun with Lambdas: C++14 Style. Slides and the blog will be available here so stay tuned.


Wednesday, October 16, 2013

Creating Recursive Lambdas ... and returning them too!

Ever since C++ adopted lambda expressions, many have stumbled upon the question whether they can be recursive. IMO, if an anonymous function needs to call itself why not create a named function/functor in the first place? But I'm not here today to argue whether it is a good or bad design. So here it is: the most common way to create a recursive lambda in C++11.
void test() 
{
  std::function<int(int)> fib = [&fib](int n)
  {
    return (n <= 2)? 1 : fib(n-1) + fib(n-2);
  };
}
The way it works is quite interesting. First, I created a std::function object, fib, and captured that in the lambda by reference. When the lambda captures fib, it is uninitialized because nothing has been assigned to it yet. The compiler knows the type of fib so it does not complain. It happily creates a closure with an uninitialized std::function. Right after that, it assigns the closure to the fib object so it gets initialized. Therefore, the reference inside lambda also works automatically. The lambda simply uses the captured std::function reference to call itself.

A problem with this approach is that the recursive lambda can not be returned. Why? When the function ends, so does the fib object and consequently, the reference inside the closure becomes invalid. Capture by value is also futile because in that case the closure object will end up getting a copy of the uninitialized std::function object that upon invocation throws std::bad_function_call exception.

Is there a way out? As it turns out, there is!

UPDATE: Within hours of uploading the original post, I came to understand a sleeker way to create and return recursive lambdas. Thanks to the generous people out there on reddit/r/cpp.
std::function<int(int)> create()
{
  int foo = 20;
  std::function<int(int)> f = 
    [=](int n) mutable {
         std::function<int(int)> recurse;
         recurse = [&](int n) { 
            foo = 10;
            return (n<=2)? 1 : recurse(n-1) + recurse(n-2); 
         };  
         return recurse(n);
       };  
  std::cout << f(6) << ", foo=" << foo << "\n"; // prints "8, foo=20"
  return f;
}
The technique involves creating nested lambdas. You don't have to return the inner recursive lambda, which becomes somewhat clunky as described later in this post. Instead, you return the outer lambda, which when invoked creates a recursive lambda using the same technique described at the beginning. It also invokes it and returns the result because the inner lambda does not live for too long.

Capturing the right state at the right place may become important if your recursive lambda is stateful. The outer lambda (copied in f) captures the state by value. I.e., it captures foo by value. The inner lambda captures its state by reference. However, the foo it refers is not the local foo variable in function create. Instead, it refers to the state captured by the outer lambda. So when you modify foo inside the inner lambda, it modifies the copy inside outer and the local foo remains unchanged. Further, to allow modification, the outer lambda must be mutable.

This behavior does not appear surprising at all when you consider the "context" in which the inner lambda runs. The create function might have returned long before the inner lambda ever got a chance to execute. The compiler is smart to figure that out and so the context of the capture for the inner lambda is limited to the state captured by the outer lambda.

foo is referenced inside the inner lambda only. But syntactically it lives inside the outer lambda too. And therefore, it must be captured by the outer lambda. This allows clear separation of who manages the state and who manages recursion.

Of course, there are many other combinations of capturing state for the inner and outer lambdas. Those are left to the reader as an excercise!

Finally, returning the std::function object will make a copy of the outer lambda and in turn the state captured by it. In most cases this is desirable. But if it is not, you could create a shared_ptr to the std::function object and use that to pass the stateful recursive lambda around.

std::shared_ptr<std::function<int(int)>> is not callable, however. I mean, you have to dereference it before you can call it like a function. So it cannot be passed to STL algorithms. A wrapper may be what you need to take care of that.
template <class T>
class func_wrapper // version 3
{
  std::shared_ptr<std::function<T>> func;

public:

  func_wrapper(std::function<T> f)
    : func(std::make_shared<std::function<T>>(std::move(f)))
  { }

  template <class... Args>
  auto operator () (Args&&... args) const 
    -> decltype((*func)(std::forward<Args>(args)...)) 
  {
    return (*func)(std::forward<Args>(args)...);
  }
};
Continue reading if you are interested in avoiding nested lambdas.

WARNING: Hacks ahead!
The idea is something like this: If the std::function is captured by value, we get two copies. One is uninitialized and the other is initialized. If somehow, during the initialization of the std::function object, we could go back and modify the copy inside the closure, may be we will get lucky.

So here is how I've to rewrite the above recursive lambda. Because I also intend to return it, I captured the variables by-value.
const func_wrapper<int(int)> create()
{
  func_wrapper<int(int)> fib;
  fib = [fib](int n)
  {
    return (n <= 2)? 1 : fib(n-1) + fib(n-2);
  };
  return fib;
}

The func_wrapper class does a bunch of interesting things and most of my discussion is going to center around it. Note, however, that I've separated the default initialized fib object from the assignment. It is very important to do it that way.

The reason behind separating the default construction from assignment is that copy-constructor and copy-assignment operators are called separately for the fib object. First, the fib object is default-initialized so we have something "decent" to work with. When the fib object is captured by-value, its copy-constructor is called as part of the usual capture semantics. The copy constructor is the only chance we get to "interact" with the func_wrapper inside the closure object. Later on, copy-assignment operator is called on fib with closure object as the right hand side.

These two functions give us just the right opportunity to setup some references inside func_wrapper to change the state of the captured func_wrapper. So lets look at the func_wrapper class without further ado.
template <class T>
class func_wrapper
{
  std::shared_ptr<std::function<T>> func;
  func_wrapper *captured;

public:

  func_wrapper() 
    : captured(nullptr)
  {}

  func_wrapper(const func_wrapper &) = default;
  func_wrapper(func_wrapper &&) = default;

  // non-const copy-ctor
  func_wrapper(func_wrapper & f)
    : func(f.func),
      captured(nullptr)
  {
    f.captured = this;
  }

  template <class U> 
  const func_wrapper & operator = (U&& closure)
  {
    func = std::make_shared<std::function<T>>();
    if(captured)
      captured->func = func;

    (*func) = std::forward<U>(closure);
    return *this;
  }

  template <class... Args>
  auto operator () (Args&&... args) const 
    -> decltype((*func)(std::forward<Args>(args)...)) 
  {
    return (*func)(std::forward<Args>(args)...);
  }
};

The func_wrapper class is small but quite interesting (at least, I think so!). The role of func_wrapper is to behave just like std::function but the main difference is that func_wrapper uses a shared_ptr to a std::function. This way, multiple func_wrapper objects can share the same std::function object. You know where I am going with this, right?!

The default constructor, copy-constructor (const version), and the move-constructor of func_wrapper are very typical. There is also a non-const copy-constructor: "func_wrapper(func_wrapper &)". This constructor is called only when the right-hand-side object is a non-const lvalue. I.e., this is the constructor that gets invoked when the fib object is captured by-value in the lambda.

What's unusual about this constructor is that it modifies the right-hand-side lvalue. It is allowed because the parameter is non-const. We assign this pointer of the nameless func_wrapper (inside the closure) to the named fib object (outside the closure). I.e., the named fib object now "links" to the namesless object inside the closure. The func_wrapper inside the closure links to NULL. You can think of it as a little linked-list of func_wrapper objects. The shared_ptr in both objects are just default initialized at this stage.

Right after the copy construction of the captured variables, compiler creates a closure object and assigns it to the named func_wrapper object ("fib"). The template assignment operator (with universal reference) of func_wrapper is invoked. First thing we do is to allocate an empty std::function object using make_shared. This initializes the func shared pointer in the named func_wrapper ("fib" again). We cannot use the lambda closure object to create the std::function just yet because the std::function constructor is going to make an internal copy of the closure. Our captured pointer, however, points to the one inside the closure object, which is still on the stack. We have to modify the shard_ptr inside the closure object before passing it on to the std::function. That's what we do next.

We assign the (*this).func object to the captured func object only if something has been captured. As it turns out, we have captured a pointer to the func_wrapper inside the closure object. So we assign the shared_ptr to it.

At this stage, the reference count of the empty std::function object is 2. One inside the "fib" object and other one inside the closure object.

Now we are ready to assign the closure object to initialize the empty std::function. We use std::forward to pass on the closure object to the std::function assignment operator. Thus, std::function does not really make copies of the closure but instead moves it. The last step is to return *this.

At this stage, the lambda is ready to invoke itself recursively. The first call is made, of course, using the named func_wrapper.
int main(void)
{
  auto const fib = create();
  std::cout << fib(6); // prints 8
}
The func_wrapper has overloaded function call operator just like std::function does. It simply uses std::forward to pass on the arguments to the closure held by the shared_ptr.

Note that the fib object in main is const. This ensures that no further modifications can be made to this recursive lambda wrapper. After all, this the only reference we have to an anonymous recursive lambda closure. The fib object can be passed by-value to other functions without issues because the const version of the copy-constructor gets invoked in those cases. That constructor makes no modifications to the rhs lvalue.

So, are we done? Not quite, unfortunately! There's a memory leak!

Fixing the Memory Leak

Note that the shread_ptr inside the closure object points to itself. That's a cycle! Even though the named fib object goes out of scope the reference count of the std::function never reaches zero because there is at least one shared_ptr keeping it alive. Therefore, you end up leaking memory.

So we need std::weak_ptr. And here is an improved version of the func_wrapper that does not leak.
template <class T>
class func_wrapper // version 2
{
  std::shared_ptr<std::function<T>> func;
  std::weak_ptr<std::function<T>> weak_func;
  func_wrapper *captured;

public:

  func_wrapper() 
    : captured(nullptr)
  {}

  func_wrapper(const func_wrapper &) = default;
  func_wrapper(func_wrapper &&) = default;

  func_wrapper(func_wrapper & f)
    : func(f.func),
      weak_func(f.weak_func),
      captured(nullptr)
  {
    f.captured = this;
  }

  template <class U> 
  typename 
    std::enable_if<
      is_callable<typename std::remove_reference<U>::type>::value,
      const func_wrapper &>::type
  operator = (U&& closure)
  {
    weak_func = func = std::make_shared<std::function<T>>();
    if(captured)
       captured->weak_func = func;
    (*func) = std::forward<U>(closure);
    return *this;
  }

  template <class... Args>
  auto operator () (Args&&... args) const 
    -> decltype((*func)(std::forward<Args>(args)...)) 
  {
    if(func)
      return (*func)(std::forward<Args>(args)...);
    else
      return (*weak_func.lock())(std::forward<Args>(args)...);
  }
};


The overall concept is the same except that we never initialize the shared_ptr inside the closure object. Instead we use a std::weak_ptr. std::weak_ptr does not increase the reference count. When it needs the object it is referring to, it has to create a shared_ptr object and dereference that one. If in the meanwhile the reference count drops to zero, the std::function and the closure is reclaimed as expected.

There are two ways to invoke the recursive closure from within the overloaded operator (). If there is a named reference ("fib") the shared_ptr is active in that case. So we use that one in the first condition in operator(). Within the closure, however, there is a "half-backed" func_wrapper that does not have strong reference but only a weak one. So the second condition turns the weak reference into a strong one (by calling .lock) and forwards the arguments just as usual.

Additionally, I'm using std::enable_if in the template assignment operator to make sure that what you are assigning is indeed a callable object. is_callable is a non-standard trait I found here.

So there you have it. A reusable solution in C++11 for recursive lambdas that you can return and use just like any other object. BTW, if you know a compelling use-case for using recursive lambdas, I'm all ears!

Sunday, October 06, 2013

Moving elements from STL containers and std::initializer_list


Lot has been said about effectively using move-semantics, such as return-by-value and pass sink arguments by-value to constructors (and move later). Dust seems to be settling on those issues because I see some evidence building up to support that. Not much has been said (I think) about using move-semantics for a collection of objects, however. I mean moving objects from one type of STL container to another and also moving elements from std::initializer_list.

Moving STL Containers of the same type

This one is simple. STL containers provide the necessary move operations: move-constructor and move-assign operator. So moving an entire vector<T> to another is pretty straight forward. But that is only one of several ways to construct a vector<T>. How about move constructing a std::vector from a vector with custom allocators? Or for that matter, from a std::list or a std::set?

Moving objects from one type of STL container to another

The classic way of copying objects from one type of STL container to another has been the iterator-pair idiom. All STL containers provide an iterator-pair constructor where you can pass the begin/end iterators of the source container. That will copy the objects from the source to the destination.

How about moving them?

As it turns out, our dear C++ standardization committee has already thought of this use-case and has put a solution in place: std::move_iterator. A move_iterator takes any iterator and adapts it such that the lvalue it is refereing to turns into a rvalue reference upon dereferencing. It basically applies std::move on each element in the entire range with a small difference I'll talk about later. Using std::move_iterator is very easy. There is also a helper function std::make_move_iterator.
int main(void)
{
  std::list<std::string> l(10, "C++ Truths");
  std::cout << l.size(); // prints 10
  std::vector<std::string> v(std::make_move_iterator(begin(l)), 
                             std::make_move_iterator(end(l)));
  std::cout << l.size(); // prints 10
  std::cout << v.size(); // prints 10
  
  // prints 10 empty strings separated by #. (i.e,  ##########)
  std::copy(begin(l), 
            end(l), 
            std::ostream_iterator<std::string>(std::cout, "#"); 

  // prints "C++ Truths" 10 times.
  std::copy(begin(v), 
            end(v), 
            std::ostream_iterator<std::string>(std::cout, " "); 
}
Not surprisingly, the size of the list remains unchanged after moving all the elements into the vector. After moving each string, the moved-from string objects are empty (strictly speaking, undefined but valid state). Therefore, when the list is printed again, we get 10 #s. The vector on the hand has the real strings and we see "C++ Truths" 10 times. So far so good. With std::move_iterator we can move elements in vectors, lists to other STL containers. How about std::set? Can we use std::move_iterator on std::set? As it turns out the answer is not that straight forward.

Moving objects from std::set to other STL container

The problem with std::set is that std::set has only const_iterators. I.e., set<T>::iterator and set<T>::const_iterator may not be different types at all. If they are different, you cannot modify the objects pointed to by set<T>::iterator. Why? If it were allowed, breaking std::set invariants will be way too easy. Note that std::set is an associative container and the elements are stored in sorted order. If modifications through set<T>::iterator were allowed, one could change the element in ways that breaks the sorted order. Idiomatic way to replace an element with the same key is to erase it and insert another immediately after that.
char alpha='A';
std::vector<std::string> alpha_vector(26);

// fill the vector with A,B,C,....X,Y,Z.
std::generate(begin(alpha_vector), 
              end(alpha_vector), 
              [alpha]() mutable { return alpha++; }); 

// move all the strings from the vector to the string
// no invariants broken here.
std::set<std::string> alpha_set(std::make_move_iterator(begin(alpha_vector)),
                                std::make_move_iterator(end(alpha_vector)));

alpha_vector.clear();
                                
// move each string back to the vector
for(const std::string & s : alpha_set) {
  alpha_vector.push_back(std::move(const_cast<std::string &>(s)));
}

// prints 26 #s. That means the set has 26 empty strings, which 
// by definitions breaks the invariant of uniqueness.
std::copy(begin(alpha_set),
          end(alpha_set),
          std::ostream_iterator<std::string>(std::cout, "#"));
Moving means modification (in most cases). So what happens when you use std::move_iterator with std::set::iterators? ... The program does not compile. I think this restriction is reasonable in general but too limiting in some scenarios. For instance, some class types have "key" members and other "data" members, which are not part of the key. If the "data" members are large and efficiently movable, it makes sense to copy the key members but move the data members. I can think of a couple of ways to achieve that. So here is some code that conditionally compiles to one or the other.
class Foo
{
  std::string key;
  std::vector<int> data;
public:
  Foo(std::string k, std::vector<int> d)
    : key(std::move(k)),
      data(std::move(d))
  {}
  Foo(const Foo &) = default; 
  Foo & operator = (const Foo &) = default;
#ifdef OPTION1
  Foo(Foo && f) 
    : key(f.key),             // copies key
      data(std::move(f.data)) // moves data
   {}
  Foo & operator = (Foo && f) {
    using std::swap;
    this->key = f.key;
    swap(this->data, f.data);
  }
#endif
#ifdef OPTION2
  Foo(Foo &&) = default;
  Foo & operator = (Foo &&) = default;
  Foo move() { // copy key and move data.
    return { key, std::move(data) };
  }
#endif  
};
Option #1 above has a custom implementation of the move constructor. It copies the key and moves data. The move-assignment operator is also does the same. Both of them leave the "key" part of the rvalue untouched. This allows the Foo objects to be moved out of a std::set without breaking set invariants. If you think the Foo move constructor should rather be a "strict" move (i.e., it must scoop up everything it can) Option #2 might work for you.

In Option #2, the move-constructor and move-assignment operator are default but there is an additional member function called move. I could have also called it move_from_set. The member move function moves *this object into an rvalue leaving behind the key. The data part of *this is moved out. Note that simply returning a Foo rvalue reference won't be right. That is because in option#2 the move-ctor and move-assign are default and they will scoop up everything they can. We don't want that. Lets see how to use these two options when moving members out from a std::set.
std::set<Foo> foo_set;
foo_set.insert(...);
std::vector<Foo> foo_vector;
foo_vector.reserve(foo_set.size());

for(auto iter = begin(foo_set); iter != end(foo_set);)
{
#ifdef OPTION1
  foo_vector.push_back(std::move(const_cast<Foo &>(*iter)));
#else
  foo_vector.push_back(const_cast<Foo &>(*iter).move());
#endif

  foo_set.erase(iter++);
}

The above code is still quite wordy. Mostly because we are not able to use the move_iterators. move_itrators don't work with const iterators. That is because moving from const lvalues is conceptually wrong. I was quite tempted to create const_move_iterator, which simply disregard the const qualifier and make a T&& from a const T &. But I prefer somewhat wordy const_cast because it is explicit and it is easier to search that way.

Note that I'm erasing the set elements as soon as I move them into the vector. I've to make sure that set iterators are no invalidated as a result of calling erase. Therefore, I call iter++, which moves the iterator to the next position before the element is erased from the set. This is one of the times when I prefer calling post-fix increment instead of prefix.

Moving elements from std::initializer_list

Moving from std::initializer_list gets even more interesting because std::initiliazer_list provides only const iterators. As it turns out The in<T> idiom can be used to determine lvalue/rvalue at run-time and make a decision whether to move or not based on that. Here is an example of how to use it.
std::vector<std::string> 
attempt_move(std::initializer_list<in<std::string>> list) 
{
  std::vector<std::string> result;
  result.reserve(list.size());
  for(const in<std::string> & s : list)
  {
     if(s.rvalue()) {
        // rvalue string. Just move. 
        // in<T>.rget() returns T&&
        result.push_back(s.rget());
     }
     else {
        // lvalue string. Make a copy
        // in<T>.get() returns const T &.
        result.emplace_back(s.get());
     }
  }
  return result;
}

int main(int argc, char *argv[])
{
  if(argc >= 4) 
  {
     std::string s0(argv[0]);
     std::string s1(argv[1]);
     attempt_move({ s0, std::move(s1), argv[2], argv[3] }); 
  }
}

Overloading in Overdrive: A Generic Data-Centric Messaging Library for DDS

Slides of my Silicon Valley Code Camp (2013) talk are now available. If you attended this session in person please evaluate it. I take feedback/comments seriously!

Abstract: When it comes to sending data across a network, applications send either binary or self-describing data (XML). Both approaches have merits. Data Distribution Service (DDS) combines the best of both in what’s called “data-centric messaging”. DDS shares the type description once, upfront, and later on sends binary data that meets the type description. You typically use IDL or XSD to specify the types and run them through a code generator for type-safe wrapper APIs for your application in your programming language. Simple and fast! As it turns out, however, C++11 bends the rules once again. In this presentation you will learn about a template-based C++11 messaging library that gives the DDS code generator a run for its money. The types and objects in your C++11 application are mapped to standard DDS X-Types type descriptions and serialized format, respectively, using template meta-programming. If you have never heard about SFINAE you won’t stop talking about it after you see "overloading in overdrive" in this presentation. What’s more? I will share my newfound hatred for std::vector of bool/enums. This presentation will cover DDS-XTypes, DDS_TypeCode, DDS_DynamicData, STL, type_traits, Boost Fusion, and overloading with enable_if (lots and lots of it!).

Monday, September 16, 2013

21 Ways of Passing Parameters ... Plus One!

You probably already know that there are 21 ways of passing parameters to a C++11 function! In case you lost the count somewhere, lets count one more time: 4 variations of pass-by pointer (i.e., const, volatile, const volatile, and none i.e., no cv-qualifiers), 4 variations of pass-by reference (ditto here), 4 variations of pass-by value (ditto here too), 4 ways of passing an argument as an rvalue reference (ditto here too! But seriously, you don't want to be around people who write cv-qualified pass-by rvalue reference arguments), std::initializer_list (ditto here too! and at this point it is ok to browse away from this page), and finally, pass-by universal reference (more formally, perfect forwarding ... T&& ... I think universal references deserve their own spot in this overcrowded VIP lounge because they are really special and they don't have 4 variations). So that makes it 21.

So you are still with me, hmm... You seem to tolerate me. So I'm going to submit that we need one more! Yes, 22nd way of passing parameters. And if you think the 22nd way should be cv-qualified too ... shame on you!

The Goal

Very simple: Move elements from a std::initializer_list. std::initializer_list provides a very convenient syntax to pass statically fixed number of arguments of the same type to a constructor/function. They are light-weight and easy to use. So just like regular arguments, I would like to steal the guts of the original argument when it is an rvalue.

However, there's catch! The std::initialization_list iterators always point to const elements. Why? Because the compiler could use the static read-only store to construct the elements of the initializer_list. However, I think, most objects will be an exception to that because std::initializer_list could be created using runtime variables and not just literals. For example,  { i, j, k} cannot be stored in a read-only store because i, j, and k are not known till run-time. In fact, the standard says, and I quote: "[ Note: The implementation is free to allocate the array in read-only memory if an explicit array with the same initializer could be so allocated. —end note ]". I think the note says it all. Also, according to cppreference: "The storage for std::initializer_list is unspecified (i.e. it could be automatic, temporary, or static read-only memory, depending on the situation)."

All right, so std::initializer_list iterators do not allow us to move objects out even though we could say with high confidence that the underlying array of the std::initializer_list has automatic storage lifespan. That's the problem.

Now, not all objects that are part of an initializer_list are rvalues, they could also be lvalues and const lvalues. So we need a way to carry the lvalue/rvalue-ness of the original object though the std::initializer_list. Let me give an example of what I mean.
void attempt_move(std::initializer_list<std::string> list) {
  // Copy lvalues and move rvalues in the list
}
int main(int argc, char *argv[])
{
  std::string s(argv[0]);
  if(argc >= 4)  
    attempt_move({ s, argv[1], argv[2], argv[3] }); 

  {
    // The underlying array initialization is roughly equivalent to
    std::string __list[4] = { s, 
                              std::string{argv[1]}, 
                              std::string{argv[2]}, 
                              std::string{argv[3]} };
    attempt_move(std::initializer_list<std::string>(__list, __list + 4));
    // __list[0] is an lvalue and others are rvalues
  }
}
The initializer_list of the attempt_move function contains both lvalues and rvalues. Therefore, we can't simply const_cast and move initializer_list iterators to get to the underlying std::string objects. It is dangerous because you might move an lvalue. In general, it is impossible to know which element is rvalue and which is not.

Or is it?

I think we turn out to be a bit lucky here. I've seen just the type we need here. So here I am shamelessly reproducing the original code.

The in<T> idiom
template <typename T> class in;
template <typename T> struct is_in { 
  static const bool value = false;
};
template <typename T> struct is_in<in<T>> { 
  static const bool value = true;
};
template <typename T> struct is_in<const in<T>> { 
  static const bool value = true;
};

template <typename T> 
class in
{
public:
  in (const T& l): v_ (l), rv_ (false) {}
  in (T&& r): v_ (r), rv_ (true) {}

  // Support for implicit conversion via perfect forwarding.
  //
  struct storage
  {
    storage (): created (false) {}
    ~storage () {
      if (created) reinterpret_cast<T*> (&data)->~T ();
    }

    bool created;
    typename std::aligned_storage<sizeof(T), alignof(T)>::type data;
  };

  template <typename T1,
            typename std::enable_if<
              std::is_convertible<T1, T>::value &&
              !is_in<typename std::remove_reference<T1>::type>::value,
              int>::type = 0>
  in (T1&& x, storage&& s = storage ())
      : v_ (*new (&s.data) T (std::forward<T1>(x))), 
        rv_ (true) 
  {s.created = true;}

  in (T& l): v_ (l), rv_ (false) {} // For T1&& becoming T1&.

  // Accessors.
  //
  bool lvalue () const {return !rv_;}
  bool rvalue () const {return rv_;}

  operator const T& () const {return v_;}
  const T& get () const {return v_;}
  T&& rget () const {return std::move (const_cast<T&> (v_));}

  // Return a copy if lvalue.
  //
  T move () const
  {
    if (rv_)
      return rget ();
    else
      return v_;
  }

private:
  const T& v_;
  bool rv_;
};

Except for the middle section of the in<T> class above, rest of the in class is pretty straight forward. It maintains state regarding how it was initialized. Three constructors in(const T&), in(T&&), and in(T &) create a reference to the original object and also maintain a boolean value to remember lvalue/rvalue-ness. in(T&&) is called only when the parameter is an rvalue and other constructors are called depending upon whether the passed argument is a const lvalue or non-const lvalue. We don't distinguish between the two because both cases are lvalues.

Lets see the in<T> idiom in action!
std::vector<std::string> 
attempt_move(std::initializer_list<in<std::string>> list) 
{
  std::vector<std::string> result;
  result.reserve(list.size());
  for(const in<std::string> & s : list)
  {
     if(s.rvalue()) {
        // rvalue string. Just move. 
        // in<T>.rget() returns T&&
        result.push_back(s.rget());
     }
     else {
        // lvalue string. Make a copy
        // in<T>.get() returns const T &.
        result.emplace_back(s.get());
     }
  }
  return result;
}

int main(int argc, char *argv[])
{
  std::string s(argv[0]);
  if(argc >= 4)  
    attempt_move({ s, argv[1], argv[2], argv[3] }); 
}

Using in<T> is simple. You simply ask if the object it is referring to is an lvalue or rvalue and take the appropriate action. That's what attempt_move is doing in a loop. When the referred object is an rvalue, in.rget() returns an rvalue reference, which can be readily passed to any function that is move-aware. For instance, vector.push_back.

Note that the loop variable s is a reference to const in<string>. It is required to be that way. The range-based for loop in attempt_move function is basically using the intializer_list iterators. They always point to const T. The reason is that compilers may use static read-only memory for storing the underlying array of an initializer_list.

In my view, const iterators of initializer_list are too limiting. The underlying in<std::string> array cannot possibly live in a read-only store because, AFAIK, read-only store must be initialized before main and the argv parameters are not known at that time. In fact, the C++11 standard says: "[ Note: The implementation is free to allocate the array in read-only memory if an explicit array with the same initializer could be so allocated. —end note ]". Seems like this is one case where the "explicit array with the same initializer" cannot be allocated read-only. So it probably has automatic storage.

It seems like because of the const iterators of std::initializer_list there are lost opportunities to move the std::string objects.

However, in<T> saves the day! We never modify the in<T> objects. We only query them. All member functions of in<T> are const.

Details

The middle section of in<T> is somewhat nifty. In fact, the earlier program would not compile without it. My discussion about it is going to be brief.

in<T> allows seamless conversion of c-strings and string literals to in<std::string>. It allows such a conversion only if the underlying type--std::string in this case--allows it. It creates an aligned storage and uses placement new to create the rvalue of T. Note that when an in<T> object is created while calling a function (that is the only use of in<T>), the in<T>::storage rvalue created in the constructor lives as long as the in<T> object lives. To be precise, "a temporary bound to a reference parameter in a function call persists until the completion of the full-expression containing the call." Ordinarily, if the in<T> object is created outside a function calling context, the temporary object (i.e., the in<T>::storage object) will be destroyed right after the constructor returns. That will leave the in<T> object in a dangling state, which will most likely lead to undefined behavior or program crash sometime later. Extending lifetimes in function-call context is a safety feature of C++ object lifetimes because the function almost always makes use of the temporary object. In fact, the lifetime is extended till end of the entire expression that contains the function call. That's important because nested function calls may pass a reference to the original temporary instance as return value to other function.

More Uses

The in<T> has more uses beyond just the std::initializer_list. It could be used directly to efficiently pass lvalue/rvalue arguments to functions and query them at run-time from within the function body. You don't have tilt your head to figure the most optimal strategy to pass arguments to a function: rvalue references, const lvalue references, or pass-by value. This a rather surprising side effect. Lets see some code.
struct matrix {
  matrix () {}
  matrix (matrix const&) {cout << "copy-ctor, ";}
  matrix (matrix&&) {cout << "move-ctor, ";}
  matrix & operator = (matrix const &) { cout << "copy-assign, "; return *this; }
  matrix & operator = (matrix &&) { cout << "move-assign, "; return *this; }

  matrix& operator+= (const matrix &m) {
    return *this;
  }
  std::vector<int> data;
};

#ifdef IN
inline matrix operator+ (in<matrix> x, in<matrix> y) {
  return std::move (
    x.rvalue () ? x.rget () += y :
    y.rvalue () ? y.rget () += x :
    matrix (x) += y);
}
#else
inline matrix operator+ (matrix result, matrix const& y) {
  result += y;
  return result;
}
#endif

int main()
{
    matrix s1;
    matrix s2;
    matrix m1 (matrix() + s1);        cout << endl;// (1)
    matrix m2 (matrix() + matrix());  cout << endl;// (2)
    matrix m3 (s1 + s2);              cout << endl;// (3) 
    matrix m4 (s1 + matrix());        cout << endl;// (4)
}
The code snippet above shows two implementations of operator+(). Both are extremely optimized implementations in their own right. The in<matrix> version simply uses pass-by-value and determines whether the parameter is rvalue or not at runtime. The second version is pretty nifty too. It takes the first matrix parameter by value because it has to create one copy anyhow. I found that in<matrix> version is about 1-2% faster than the one below. Hardly any difference! That's because in<T> is better in some respects but not all. To be precise, case #4 in main is optimized better using in<matrix> because it detects the right hand side rvalue. In other cases it is either equivalent or only slightly inferior. After all, the in<matrix> version has two conditionals. Not to mention the overall implementation looks more complex.

I think the opreator+() is not the best example for in<T>. It shines best when there are more parameters. With increasing number of parameters you have to resort to either using the universal references or exponential number of versions of your function--each optimized for a unique combination of arguments. In such cases in<T> may be a quick way out instead of worrying about using the 21 other ways of passing arguments. Having said that, I will encourage every self-respecting programmer to read the guidelines given by the original author of the in<T> idiom. I brought this idiom up once again because I thought it solves the problem of moving rvalues through a std::initializer_list quite elegantly. So that makes it the 22nd alternative!

Sunday, March 03, 2013

DDS Programming using Modern C++

I'm not the only one who agrees that C++ is the awesomest language for network programming. Raw network programming, however, is very low-level and leads to many distractions that reduce productivity. Networking middlewares provide higher-level network programming models while taking care of many accidental complexities that would otherwise likely to plague your distributed system. One such standards-based networking middleware is Data Distribution Service (DDS) and now there is a modern C++ binding for programming high-performance distributed real-time systems using DDS. The API is now finalized by the Object Management Group (OMG) and is known as the DDS-PSM-Cxx standard.

I have two upcoming talks on the DDS-PSM-Cxx. One is in the San Francisco Bay Area and another in Aspen (Yes, C++ Now'13!) . Find out what DDS-PSM-Cxx is about and what will be covered in the talks in this new blog post on DDS Programming using Modern C++.