C++ rvalue, && and Move

C++ is hard, and the newer versions are even harder. This article deals with some of the hard parts of modern C++: rvalues, rvalue references (&&) and move semantics. I am going to reverse engineer (not a metaphor) these complex and related topics, so that you can understand them in one go.

First, let’s examine

What is an rvalue?

Roughly speaking, an rvalue is an expression that can only appear on the right side of an assignment (the equals sign).

Example:

int var; // too much JavaScript recently:)
var = 8; // OK! lvalue (yes, there is a lvalue) on the left

8 = var; // ERROR! rvalue on the left
(var + 1) = 8; // ERROR! rvalue on the left

Simple enough. Then let’s look at some more subtle rvalues, ones that are returned by functions:

#include <stdio.h>

int g_var = 8;

int& returnALvalue() {
    return g_var; // here we return an lvalue
}

int returnARvalue() {
    return g_var; // here we return an rvalue
}

int main() {
    printf("%d\n", returnALvalue()++); // prints 8, then g_var += 1
    printf("%d\n", returnARvalue());   // prints 9
}

Result:

8
9

Note that returning a non-const reference to a global variable, as returnALvalue() does in the example, is generally considered bad practice. Do not do that in real-world code.

Beyond the theoretical level

Whether an expression is an rvalue made a practical difference even before && was introduced in C++11.

For example, this line

const int& var = 8;

can be compiled fine while this:

int& var = 8; // use a lvalue reference for a rvalue

generates the following error:

rvalue.cc:24:6: error: non-const lvalue reference to type 'int' cannot bind to a
temporary of type 'int'

The error message means that an rvalue (here, a temporary) can bind to a const lvalue reference but not to a non-const one.

A more interesting example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("rvalue detected:%s\n", name.c_str());
}

void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}

int main() {
    std::string name = "lvalue";
    std::string rvalu = "rvalu";

    print(name);        // compiler can detect the right function for lvalue
    print(rvalu + "e"); // likewise for rvalue
}

Result:

lvalue detected:lvalue
rvalue detected:rvalue

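The difference is significant enough for the compiler to choose between overloaded functions: the lvalue binds to the non-const reference version, while the rvalue can only bind to the const reference version.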

So is an rvalue a constant value?

Not exactly. And this is where && (the rvalue reference) comes in.

Example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("const value detected:%s\n", name.c_str());
}

void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}

void print(std::string&& name) {
    printf("rvalue detected:%s\n", name.c_str());
}

int main() {
    std::string name = "lvalue";
    const std::string cname = "cvalue";
    std::string rvalu = "rvalu";

    print(name);        // picks print(std::string&)
    print(cname);       // picks print(const std::string&)
    print(rvalu + "e"); // picks print(std::string&&)
}

Result:

lvalue detected:lvalue
const value detected:cvalue
rvalue detected:rvalue

If a function is overloaded for rvalues, an rvalue argument chooses the more specific && version over the version that takes a const reference, which is compatible with both. Thus, && lets us distinguish rvalues from constant values.

Below I summarize which overloaded versions are compatible with which kinds of argument by default. You can verify the result by selectively commenting out overloads in the example above.

[Figure: compatibility of the overloaded versions with each kind of argument]

It sounds cool to differentiate rvalues from constant values, as they are indeed not exactly the same. But what is the practical value?

What problem does && solve exactly?

The problem is the unnecessary deep copy when the argument is an rvalue.

To be more specific: the && notation lets a function single out rvalue arguments, which can be used to avoid the deep copy when the rvalue 1) is passed as an argument to either a constructor or an assignment operator, and 2) belongs to a class that contains a pointer (or pointers) referring to a dynamically allocated resource (memory).

An example makes this concrete:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {
public:
    ResourceOwner(const char res[]) {
        theResource = new string(res);
    }
    ResourceOwner(const ResourceOwner& other) {
        printf("copy %s\n", other.theResource->c_str());
        theResource = new string(other.theResource->c_str());
    }
    ResourceOwner& operator=(const ResourceOwner& other) {
        ResourceOwner tmp(other);           // deep copy
        swap(theResource, tmp.theResource); // tmp now holds our old resource
        printf("assign %s\n", other.theResource->c_str());
        return *this;                       // tmp is destructed here, freeing the old resource
    }
    ~ResourceOwner() {
        if (theResource) {
            printf("destructor %s\n", theResource->c_str());
            delete theResource;
        }
    }

private:
    string* theResource;
};

void testCopy() { // case 1
    printf("=====start testCopy()=====\n");

    ResourceOwner res1("res1");
    ResourceOwner res2 = res1; // copy res1

    printf("=====destructors for stack vars, ignore=====\n");
}

void testAssign() { // case 2
    printf("=====start testAssign()=====\n");

    ResourceOwner res1("res1");
    ResourceOwner res2("res2");
    res2 = res1; // copy res1, assign res1, destructor res2

    printf("=====destructors for stack vars, ignore=====\n");
}

void testRValue() { // case 3
    printf("=====start testRValue()=====\n");

    ResourceOwner res2("res2");
    res2 = ResourceOwner("res1"); // copy res1, assign res1, destructor res2, destructor res1

    printf("=====destructors for stack vars, ignore=====\n");
}

int main() {
    testCopy();
    testAssign();
    testRValue();
}

Result:

=====start testCopy()=====
copy res1
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testAssign()=====
copy res1
assign res1
destructor res2
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testRValue()=====
copy res1
assign res1
destructor res2
destructor res1
=====destructors for stack vars, ignore=====
destructor res1

The results are all good for the first two test cases, i.e., testCopy() and testAssign(), in which the resource in res1 is copied for res2. It is reasonable to copy the resource because these are two entities that each need their own unshared resource (a string).

However, in the third case, the (deep) copy of the resource is superfluous, because the anonymous rvalue (created by ResourceOwner("res1")) is destructed right after the assignment and so does not need its resource anymore:

res2 = ResourceOwner("res1"); // Please note that the destructor res1 is called right after this line before the point where stack variables are destructed.

This is a good moment to repeat the problem statement:

The && notation lets a function single out rvalue arguments, which can be used to avoid the deep copy when the rvalue 1) is passed as an argument to either a constructor or an assignment operator, and 2) belongs to a class that contains a pointer (or pointers) referring to a dynamically allocated resource (memory).

If copying a resource that is about to disappear is not optimal, what is the right operation then? The answer is

Move

The idea is pretty straightforward: if the argument is an rvalue, we do not need to copy. Rather, we can simply “move” the resource (that is, the memory the rvalue points to) by taking over its pointer. Now let’s overload the assignment operator using the new technique:

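ResourceOwner& operator=(ResourceOwner&& other) {
    if (this != &other) {
        delete theResource;              // release what we currently own (otherwise it leaks)
        theResource = other.theResource; // take over the pointer, no deep copy
        other.theResource = NULL;        // the rvalue no longer owns anything
    }
    return *this;
}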

This new assignment operator is called a move assignment operator. And a move constructor can be programmed in a similar way.

A good way of understanding this: when you sell your old property and move to a new house, you do not have to toss all the furniture (as we did in case 3), right? Rather, you can simply move the furniture to the new home.

All good.

What is std::move?

Besides the move assignment operator and move constructor discussed above, there is one last missing piece in this puzzle, std::move.

Again, we look at the problem first:

When 1) we know that a variable is in fact an rvalue, but 2) the compiler does not, the right version of the overloaded functions cannot be called.

A common case is when we add another layer of resource owner, ResourceHolder. The relation of the three entities is given below:

holder
  |
  |----->owner
           |
           |----->resource

(N.b., in the following example, I complete the implementation of ResourceOwner’s move constructor as well)

Example:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {

public:
    ResourceOwner(const char res[]) {
        theResource = new string(res);
    }

    ResourceOwner(const ResourceOwner& other) {
        printf("copy %s\n", other.theResource->c_str());
        theResource = new string(other.theResource->c_str());
    }

    // new: move constructor
    ResourceOwner(ResourceOwner&& other) {
        printf("move cons %s\n", other.theResource->c_str());
        theResource = other.theResource;
        other.theResource = NULL;
    }

    ResourceOwner& operator=(const ResourceOwner& other) {
        ResourceOwner tmp(other);
        swap(theResource, tmp.theResource);
        printf("assign %s\n", other.theResource->c_str());
        return *this;
    }

    // new: move assignment operator
    ResourceOwner& operator=(ResourceOwner&& other) {
        printf("move assign %s\n", other.theResource->c_str());
        if (this != &other) {
            delete theResource; // release what we currently own
            theResource = other.theResource;
            other.theResource = NULL;
        }
        return *this;
    }

    ~ResourceOwner() {
        if (theResource) {
            printf("destructor %s\n", theResource->c_str());
            delete theResource;
        }
    }

private:
    string* theResource;
};

class ResourceHolder {

    // ……

    ResourceHolder& operator=(ResourceHolder&& other) {
        printf("holder move assign\n");
        resOwner = other.resOwner; // other.resOwner is an lvalue: copy assignment is called
        return *this;
    }

    // ……

private:
    ResourceOwner resOwner;
};

In ResourceHolder’s move assignment operator, we want to call ResourceOwner’s move assignment operator, since “a non-pointer member of an rvalue should be an rvalue too”. However, a named rvalue reference such as other is itself an lvalue inside the function body, and so is other.resOwner. When we simply write resOwner = other.resOwner, what gets invoked is ResourceOwner’s normal (copy) assignment operator, which, again, incurs the extra copy.

This is a good moment to repeat the problem statement again:

When 1) we know that a variable is in fact an rvalue, but 2) the compiler does not, the right version of the overloaded functions cannot be called.

As a solution, we use std::move to cast the variable to an rvalue, so that the right version of ResourceOwner’s assignment operator can be called.

ResourceHolder& operator=(ResourceHolder&& other) {
    printf("holder move assign\n");
    resOwner = std::move(other.resOwner); // cast to rvalue: move assignment is called
    return *this;
}

What is std::move exactly?

We know that a type cast is not always a mere compiler placebo telling the compiler “I know what I am doing”. Some casts effectively generate instructions that mov a value to a bigger or smaller register (e.g., %eax->%cl) to conduct the “cast”.

So what exactly does std::move do behind the scenes? I do not know myself as I am writing this paragraph, so let’s find out together.

First we modify main a bit (I tried to keep the style consistent).

Example:

int main() {
    ResourceOwner res("res1");
    asm("nop"); // remember me
    ResourceOwner && rvalue = std::move(res);
    asm("nop"); // remember me
}

Compile it, and disassemble the object file using

clang++ -g -c -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Result:

0000000000000000 <_main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 20 sub $0x20,%rsp
8: 48 8d 7d f0 lea -0x10(%rbp),%rdi
c: 48 8d 35 41 03 00 00 lea 0x341(%rip),%rsi # 354 <GCC_except_table5+0x18>
13: e8 00 00 00 00 callq 18 <_main+0x18>
18: 90 nop // remember me
19: 48 8d 75 f0 lea -0x10(%rbp),%rsi
1d: 48 89 75 f8 mov %rsi,-0x8(%rbp)
21: 48 8b 75 f8 mov -0x8(%rbp),%rsi
25: 48 89 75 e8 mov %rsi,-0x18(%rbp)
29: 90 nop // remember me
2a: 48 8d 7d f0 lea -0x10(%rbp),%rdi
2e: e8 00 00 00 00 callq 33 <_main+0x33>
33: 31 c0 xor %eax,%eax
35: 48 83 c4 20 add $0x20,%rsp
39: 5d pop %rbp
3a: c3 retq
3b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Between the two nops, we can see a few instructions generated for the move. Looking closely, they do basically nothing: they only take the address of res and store it in the stack slot of the reference. Moreover, if we turn on optimization (-O1), all of these instructions are gone.

clang++ -g -c -O1 -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Moreover, if we change the critical line to:

ResourceOwner & rvalue = res;

The assembly generated is identical.

That means std::move is pure syntactic sugar: it is a compile-time cast that tells the compiler which overload to pick, and the machine does not care at all.

To conclude,

The MACHINE thinks it irrelevant, we don’t.
-Harold Finch

That's it. Did I make a serious mistake, or miss out on anything important? Or did you simply like the read? I'd be chuffed to hear your feedback.