Sunday, October 06, 2013

Moving elements from STL containers and std::initializer_list


Lot has been said about effectively using move-semantics, such as return-by-value and pass sink arguments by-value to constructors (and move later). Dust seems to be settling on those issues because I see some evidence building up to support that. Not much has been said (I think) about using move-semantics for a collection of objects, however. I mean moving objects from one type of STL container to another and also moving elements from std::initializer_list.

Moving STL Containers of the same type

This one is simple. STL containers provide the necessary move operations: move-constructor and move-assign operator. So moving an entire vector<T> to another is pretty straight forward. But that is only one of several ways to construct a vector<T>. How about move constructing a std::vector from a vector with custom allocators? Or for that matter, from a std::list or a std::set?

Moving objects from one type of STL container to another

The classic way of copying objects from one type of STL container to another has been the iterator-pair idiom. All STL containers provide an iterator-pair constructor where you can pass the begin/end iterators of the source container. That will copy the objects from the source to the destination.

How about moving them?

As it turns out, our dear C++ standardization committee has already thought of this use-case and has put a solution in place: std::move_iterator. A move_iterator takes any iterator and adapts it such that the lvalue it is refereing to turns into a rvalue reference upon dereferencing. It basically applies std::move on each element in the entire range with a small difference I'll talk about later. Using std::move_iterator is very easy. There is also a helper function std::make_move_iterator.
int main(void)
{
  std::list<std::string> l(10, "C++ Truths");
  std::cout << l.size(); // prints 10
  std::vector<std::string> v(std::make_move_iterator(begin(l)), 
                             std::make_move_iterator(end(l)));
  std::cout << l.size(); // prints 10
  std::cout << v.size(); // prints 10
  
  // prints 10 empty strings separated by #. (i.e,  ##########)
  std::copy(begin(l), 
            end(l), 
            std::ostream_iterator<std::string>(std::cout, "#"); 

  // prints "C++ Truths" 10 times.
  std::copy(begin(v), 
            end(v), 
            std::ostream_iterator<std::string>(std::cout, " "); 
}
Not surprisingly, the size of the list remains unchanged after moving all the elements into the vector. After moving each string, the moved-from string objects are empty (strictly speaking, undefined but valid state). Therefore, when the list is printed again, we get 10 #s. The vector on the hand has the real strings and we see "C++ Truths" 10 times. So far so good. With std::move_iterator we can move elements in vectors, lists to other STL containers. How about std::set? Can we use std::move_iterator on std::set? As it turns out the answer is not that straight forward.

Moving objects from std::set to other STL container

The problem with std::set is that std::set has only const_iterators. I.e., set<T>::iterator and set<T>::const_iterator may not be different types at all. If they are different, you cannot modify the objects pointed to by set<T>::iterator. Why? If it were allowed, breaking std::set invariants will be way too easy. Note that std::set is an associative container and the elements are stored in sorted order. If modifications through set<T>::iterator were allowed, one could change the element in ways that breaks the sorted order. Idiomatic way to replace an element with the same key is to erase it and insert another immediately after that.
char alpha='A';
std::vector<std::string> alpha_vector(26);

// fill the vector with A,B,C,....X,Y,Z.
std::generate(begin(alpha_vector), 
              end(alpha_vector), 
              [alpha]() mutable { return alpha++; }); 

// move all the strings from the vector to the string
// no invariants broken here.
std::set<std::string> alpha_set(std::make_move_iterator(begin(alpha_vector)),
                                std::make_move_iterator(end(alpha_vector)));

alpha_vector.clear();
                                
// move each string back to the vector
for(const std::string & s : alpha_set) {
  alpha_vector.push_back(std::move(const_cast<std::string &>(s)));
}

// prints 26 #s. That means the set has 26 empty strings, which 
// by definitions breaks the invariant of uniqueness.
std::copy(begin(alpha_set),
          end(alpha_set),
          std::ostream_iterator<std::string>(std::cout, "#"));
Moving means modification (in most cases). So what happens when you use std::move_iterator with std::set::iterators? ... The program does not compile. I think this restriction is reasonable in general but too limiting in some scenarios. For instance, some class types have "key" members and other "data" members, which are not part of the key. If the "data" members are large and efficiently movable, it makes sense to copy the key members but move the data members. I can think of a couple of ways to achieve that. So here is some code that conditionally compiles to one or the other.
class Foo
{
  std::string key;
  std::vector<int> data;
public:
  Foo(std::string k, std::vector<int> d)
    : key(std::move(k)),
      data(std::move(d))
  {}
  Foo(const Foo &) = default; 
  Foo & operator = (const Foo &) = default;
#ifdef OPTION1
  Foo(Foo && f) 
    : key(f.key),             // copies key
      data(std::move(f.data)) // moves data
   {}
  Foo & operator = (Foo && f) {
    using std::swap;
    this->key = f.key;
    swap(this->data, f.data);
  }
#endif
#ifdef OPTION2
  Foo(Foo &&) = default;
  Foo & operator = (Foo &&) = default;
  Foo move() { // copy key and move data.
    return { key, std::move(data) };
  }
#endif  
};
Option #1 above has a custom implementation of the move constructor. It copies the key and moves data. The move-assignment operator is also does the same. Both of them leave the "key" part of the rvalue untouched. This allows the Foo objects to be moved out of a std::set without breaking set invariants. If you think the Foo move constructor should rather be a "strict" move (i.e., it must scoop up everything it can) Option #2 might work for you.

In Option #2, the move-constructor and move-assignment operator are default but there is an additional member function called move. I could have also called it move_from_set. The member move function moves *this object into an rvalue leaving behind the key. The data part of *this is moved out. Note that simply returning a Foo rvalue reference won't be right. That is because in option#2 the move-ctor and move-assign are default and they will scoop up everything they can. We don't want that. Lets see how to use these two options when moving members out from a std::set.
std::set<Foo> foo_set;
foo_set.insert(...);
std::vector<Foo> foo_vector;
foo_vector.reserve(foo_set.size());

for(auto iter = begin(foo_set); iter != end(foo_set);)
{
#ifdef OPTION1
  foo_vector.push_back(std::move(const_cast<Foo &>(*iter)));
#else
  foo_vector.push_back(const_cast<Foo &>(*iter).move());
#endif

  foo_set.erase(iter++);
}

The above code is still quite wordy. Mostly because we are not able to use the move_iterators. move_itrators don't work with const iterators. That is because moving from const lvalues is conceptually wrong. I was quite tempted to create const_move_iterator, which simply disregard the const qualifier and make a T&& from a const T &. But I prefer somewhat wordy const_cast because it is explicit and it is easier to search that way.

Note that I'm erasing the set elements as soon as I move them into the vector. I've to make sure that set iterators are no invalidated as a result of calling erase. Therefore, I call iter++, which moves the iterator to the next position before the element is erased from the set. This is one of the times when I prefer calling post-fix increment instead of prefix.

Moving elements from std::initializer_list

Moving from std::initializer_list gets even more interesting because std::initiliazer_list provides only const iterators. As it turns out The in<T> idiom can be used to determine lvalue/rvalue at run-time and make a decision whether to move or not based on that. Here is an example of how to use it.
std::vector<std::string> 
attempt_move(std::initializer_list<in<std::string>> list) 
{
  std::vector<std::string> result;
  result.reserve(list.size());
  for(const in<std::string> & s : list)
  {
     if(s.rvalue()) {
        // rvalue string. Just move. 
        // in<T>.rget() returns T&&
        result.push_back(s.rget());
     }
     else {
        // lvalue string. Make a copy
        // in<T>.get() returns const T &.
        result.emplace_back(s.get());
     }
  }
  return result;
}

int main(int argc, char *argv[])
{
  if(argc >= 4) 
  {
     std::string s0(argv[0]);
     std::string s1(argv[1]);
     attempt_move({ s0, std::move(s1), argv[2], argv[3] }); 
  }
}

5 comments:

Unknown said...

on the last example, shouldn't it be

s = attempt_move({ argv[1], argv[2], argv[3] });


instead?

Ogunyinka Joshua said...

Let me simply answer your question "Unknown": No, it isn't s = attempt_move({argv[1], argv[2], argv[3]});
Reason: Variable "s" is a string but what's the return type of "attempt_move" --std::vector
ciao

all white said...

Fact!!! Thanks for sharing complete point…one more company has articles about Webdesign Agenter website pixo web design.

Kim said...

Your argument is excellent!! Unique points in the same, here one more website I like joomla development company.

Joshua Thai said...

I really appreciated with your content and completely agree with you, here also I was visit a company website Digital zona.