Handout Vectors
Containers are used to manage collections of objects of a certain kind. Every kind of
container has its own advantages and disadvantages, so having different container types
reflects different requirements for collections in programs.
Iterators are used to step through the elements of collections of objects. These collections
may be containers or subsets of containers. The major advantage of iterators is that they
offer a small but common interface for any arbitrary container type. For example, one
operation of this interface lets the iterator step to the next element in the collection. This is
done independently of the internal structure of the collection. Regardless of whether the
collection is an array, a linked list, or a tree, it works. The interface for iterators is almost the
same as for ordinary pointers. To increment an iterator, you call operator ++ . To access the
value of an iterator, you use operator * . So, you might consider an iterator a kind of a smart
pointer that translates the call "go to the next element" into whatever is appropriate.
Algorithms are used to process the elements of collections. For example, algorithms can
search, sort, modify, or simply use the elements for various purposes. Algorithms use
iterators. Thus, because the iterator interface is common for all container types,
an algorithm has to be written only once to work with arbitrary containers.
The concept of the STL is based on a separation of data and operations. The data is managed by
container classes, and the operations are defined by configurable algorithms. Iterators are the
glue between these two components. They let any algorithm interact with any container.
In a way, the STL concept contradicts the original idea of object-oriented programming: the STL
separates data and algorithms rather than combining them. However, the reason for doing so is
very important. In principle, you can combine every kind of container with every kind of algorithm,
so the result is a very flexible but still rather small framework.
Background
Static arrays are used when programs require a fixed-size sequence of elements of a given type
where the number of elements is specified at compile time. C/C++ provide the subscript operator
[] to randomly access any element of the array. Such arrays are given contiguous storage on
the stack or in static storage. When used in most expressions, the name of a static array evaluates to
a pointer to the first element of the array. Thus, the use of the subscript operator in expressions
is syntactic sugar for pointer arithmetic involving the array name and the index of the
element being accessed. Static arrays have three disadvantages:
1. Indices are not checked before accessing an array. Out-of-bounds reads and writes both cause
undefined behavior; in practice, reads tend to produce incorrect results while writes can corrupt
memory.
2. The array size must be known at compile time. For instance, static arrays will not work when
a sequence of an unknown number of integers has to be read from a file.
3. When a static array is passed to a function, the array size is lost because of the
array-name-to-pointer-to-first-element conversion. This means that we need to pass functions not
only the array (more correctly, a pointer) but also the number of elements that must be
handled by the function.
The first and third problems are solved in C++ by the introduction of the std::array type
(declared in <array>). An array object knows its size, and its member function array::at performs
runtime bounds checking on indices, albeit at the cost of diminished program performance. Here is
a simple program that illustrates the array type:
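A minimal sketch along these lines, assuming a hypothetical helper named sum (it is not part of the standard library), shows both points: the size travels with the type, and at checks each index, throwing std::out_of_range for an invalid one.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: sums the elements of a std::array of ints.
// Unlike a built-in array parameter, the size N is part of the type,
// so no separate length argument has to be passed.
template <std::size_t N>
int sum(std::array<int, N> const& a) {
  int total {0};
  for (std::size_t i {0}; i < a.size(); ++i) {
    total += a.at(i); // at() checks i against size() at run time
  }
  return total;
}
```

A call such as sum(a) for std::array<int, 4> a {1, 2, 3, 4}; deduces N as 4 from the argument.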
The standard C++ library provides the std::vector type as an alternative to dynamically
allocating arrays on the free store using operator new[] . The vector data type is an abstraction
of the dynamic array and is the simplest and most common way to represent data in C++. Since a
vector holds many values, we call it a container. A vector has the ability to expand or shrink
to hold as many elements as required. An element can be inserted at a specified position without
the programmer having to shift other elements "down". An element can be removed from a
specified position without writing code to shift other elements "up" to fill the resulting hole(s). The
management of dynamically allocated memory is completely abstracted away from users - the
appropriate memory is allocated and released automatically.
This introduction will expose you to the following basic concepts about vectors:
Defining vectors
To define objects of type vector, programmers must include the appropriate header file, as in:
#include <vector>
Since the vector data type is implemented as a class template, the element type must be specified in
angle brackets:
// t is an empty container of int values
// this is called default construction because the user has not provided
// any information about how to initialize the object
std::vector<int> t;
// u is a container of 4 float elements whose values are initialized to 0.0f
std::vector<float> u(4);
// v is a container of 5 double elements whose values are initialized to 1.1
std::vector<double> v(5, 1.1);
// w is a container of 6 int elements initialized with
// values in initialization list
std::vector<int> w {11, 21, 31, 41, 51, 1};
A vector is simply a sequence of elements that you can access by an index. For example, here is
the vector called w that was defined above:
That is, the first element has index 0, the second index 1, and so on. We refer to an element by
subscripting the name of the vector with the element's index, so here the value of w[0] is 11,
the value of w[1] is 21, and so on. Indices for a vector always start with 0 and increase by 1.
This should look familiar: the standard library vector is simply the C++ standard library's version
of an old and well-known idea called the array, except that programmers using vectors don't have to
worry about low-level details involving dynamic memory allocation and deallocation. The picture
above is drawn so as to emphasize that it "knows its size"; that is, a vector doesn't just store its
elements, it also stores its size.
The entire list of constructor functions is listed here. When a vector object goes out of scope,
the destructor function is automatically invoked - the destructor will return the dynamically
allocated memory that holds the vector's elements back to the free store.
In addition to primitive types, vectors can hold elements of other types, as long as the element
type is copy constructible and copy assignable.
Recall that static arrays are not first-class objects in either C or C++. Therefore, static arrays
cannot be specified as the element type of a vector. However, defining and accessing
two-dimensional arrays is straightforward with vector:
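As a minimal sketch, a rectangular grid can be modelled as a vector whose elements are themselves vectors. The function name make_grid and the fill pattern are illustrative assumptions, not part of the handout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a rows x cols grid in which
// element [r][c] holds the value r * cols + c.
std::vector<std::vector<int>> make_grid(std::size_t rows, std::size_t cols) {
  // the outer vector has `rows` elements, each an inner vector of `cols` zeros
  std::vector<std::vector<int>> grid(rows, std::vector<int>(cols, 0));
  for (std::size_t r {0}; r < grid.size(); ++r) {
    for (std::size_t c {0}; c < grid[r].size(); ++c) {
      grid[r][c] = static_cast<int>(r * cols + c); // two subscripts, as with a 2D array
    }
  }
  return grid;
}
```

Each row knows its own size, so the loops never need a separately maintained column count.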
Just as with static arrays, out-of-bounds reads and writes through the subscript operator result
in undefined behavior at runtime:
Member functions vector::front and vector::back return references to the first and last
elements of the vector, respectively. Here, we use the std::swap standard library function
(declared in <utility>) to swap a vector's first and last elements:
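A minimal sketch of this idea, wrapped in a hypothetical function swap_ends so that the empty case is handled explicitly (calling front or back on an empty vector is undefined behavior):

```cpp
#include <utility> // for std::swap
#include <vector>

// Hypothetical helper: exchanges the first and last elements of v.
// Does nothing for an empty vector, where front() and back() must not be called.
void swap_ends(std::vector<int>& v) {
  if (!v.empty()) {
    // front() and back() return references, so swap modifies the elements in place
    std::swap(v.front(), v.back());
  }
}
```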
We can obtain pointers to the first and last elements of a vector by applying the address-of
operator to the references returned by front and back, respectively. Here, we use these
pointers to iterate over the container in first-to-last and last-to-first order:
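The following sketch shows both directions. The function names sum_forward and reversed_copy are assumptions made for illustration. Note that the backward loop stops before decrementing past the first element, since forming a pointer before the start of the storage is not allowed.

```cpp
#include <vector>

// Hypothetical helper: sums the elements in first-to-last order through pointers.
int sum_forward(std::vector<int> const& v) {
  if (v.empty()) return 0; // front() and back() require a non-empty vector
  int total {0};
  int const* last = &v.back();
  for (int const* p = &v.front(); p <= last; ++p) {
    total += *p;
  }
  return total;
}

// Hypothetical helper: copies the elements in last-to-first order through pointers.
std::vector<int> reversed_copy(std::vector<int> const& v) {
  std::vector<int> out;
  if (v.empty()) return out;
  int const* first = &v.front();
  for (int const* p = &v.back(); ; --p) {
    out.push_back(*p);
    if (p == first) break; // stop before stepping in front of the first element
  }
  return out;
}
```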
Using a vector, we can initialize a two-dimensional array of ints and print the table using the
overloaded subscript operator:
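One way to sketch this, assuming hypothetical helpers make_table and format_table, is to build a table whose rows have different lengths and format it row by row. Writing format_table(make_table()) to std::cout produces the four-row table shown next.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: a table whose rows have 1, 2, 3 and 4 elements.
std::vector<std::vector<int>> make_table() {
  return {{1}, {11, 22}, {33, 44, 55}, {66, 77, 88, 99}};
}

// Hypothetical helper: formats each row on its own line, values separated by a space.
std::string format_table(std::vector<std::vector<int>> const& table) {
  std::ostringstream oss;
  for (std::size_t r {0}; r < table.size(); ++r) {
    for (std::size_t c {0}; c < table[r].size(); ++c) {
      if (c != 0) oss << ' ';
      oss << table[r][c]; // two subscripts: row first, then column
    }
    oss << '\n';
  }
  return oss.str();
}
```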
1
11 22
33 44 55
66 77 88 99
Using begin() and end()
We've used subscripts to access elements of a vector and individual char elements of a
string. We can get pointers to the first and last elements of vectors and strings by applying the
address-of operator to the references returned by member functions front and back, respectively.
These pointers can be used to access individual elements of vectors and strings:
The C++ standard library provides the concept of iterators that abstract the interface of ordinary
pointers. Just like pointers, iterators give us indirect access to an object. As with pointers, we can
use operators ++ and -- to step forward and backward, respectively. We can use operators ==
and != to determine whether two iterators represent the same position. We can assign iterators
using operator =. We'll study iterators in more detail at a later date in the semester. For now,
we'll use iterators to traverse strings since they simply encapsulate a char*. Likewise, we use
iterators to traverse vectors since they encapsulate an ordinary pointer.
All container classes provide the same member functions that enable them to use iterators to
navigate over their elements. The most important of these functions are as follows:
begin returns an iterator that points to the first element in the container.
end returns an iterator that points to the position after the last element of the container.
Thus, as shown in the picture, member functions begin and end define a half-open range that
includes the first element but excludes the position returned by end. A half-open range has two
advantages:
1. There is a simple criterion to terminate loops that iterate over elements of a container.
Loops simply continue as long as the iterator returned by end is not reached.
2. Empty containers do not require special handling since begin returns the same iterator as
end.
Rather than using subscripts, the following example demonstrates the use of iterators to traverse
the elements of a string and convert every alphabetic character to uppercase:
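#include <cctype> // for std::isalpha, std::islower, std::toupper
#include <string>

std::string s {"Runtime error!!!"};
for (std::string::iterator it = s.begin(); it != s.end(); ++it) {
  // <cctype> functions have undefined behavior for negative char values,
  // so convert to unsigned char first
  unsigned char ch = static_cast<unsigned char>(*it);
  if (std::isalpha(ch) && std::islower(ch)) {
    *it = static_cast<char>(std::toupper(ch));
  }
}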
The following example demonstrates the use of read-only iterators to traverse the elements of a
vector :
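A minimal sketch, assuming a hypothetical function join that concatenates the elements with a separator. It uses vector's const_iterator type together with cbegin and cend, so the elements can be read but not modified through the iterator.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins the elements of v with ", " between them.
std::string join(std::vector<std::string> const& v) {
  std::ostringstream oss;
  for (std::vector<std::string>::const_iterator it = v.cbegin(); it != v.cend(); ++it) {
    if (it != v.cbegin()) oss << ", ";
    oss << *it; // reading is fine; an assignment such as *it = "x" would not compile
  }
  return oss.str();
}
```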
The range-for will automatically iterate only over elements in a container. On the other hand,
buffer overflow errors can arise when incorrect subscripts are inadvertently used with the
overloaded operator[]. Therefore, a good way to ensure that subscripts are in range is to avoid
subscripting altogether by using a range-for whenever possible. In fact, range-for loops for a
container are implemented using the container's begin and end member functions. In general,
the following range-for loop
for (decl : coll) {
  statement
}
is equivalent to the following, if the container coll provides member functions begin and end:
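As a concrete sketch of that equivalence, the two hypothetical functions below compute the same sum: the first with a range-for, the second with the loop written out by hand using begin and end. The names pos and end_pos are illustrative only.

```cpp
#include <vector>

// Range-for version: elem plays the role of decl, the addition the role of statement.
int sum_range_for(std::vector<int> const& coll) {
  int total {0};
  for (int elem : coll) {
    total += elem;
  }
  return total;
}

// Hand-expanded version: the iterators are obtained once, then stepped until end.
int sum_expanded(std::vector<int> const& coll) {
  int total {0};
  for (auto pos = coll.begin(), end_pos = coll.end(); pos != end_pos; ++pos) {
    int elem = *pos; // decl is initialized from the dereferenced iterator
    total += elem;   // statement
  }
  return total;
}
```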
Member function vector::push_back adds an additional element to the end of object cities,
thereby increasing the size of cities by 1:
// there are two things going on. first, a temporary unnamed string object
// is constructed from string literal "Santiago"
// next, the temporary string object is used to construct element 5 of cities
cities.push_back("Santiago");

// the two steps are more explicitly specified in this push_back statement
// that creates a temporary string object to initialize element 6 of cities
cities.push_back(std::string("Stockholm"));
Member function vector::push_back is useful for filling a vector from an input stream, as in:
std::vector<std::string> cities;

// continue adding strings to vector container until end-of-file is signalled
// use CTRL-D (CTRL-Z followed by Enter on Windows) to signal end-of-file
// to standard input stream
for (std::string tmp_str; std::cin >> tmp_str; cities.push_back(tmp_str)) {
  // empty by design
}
Member function vector::pop_back removes the last element of the vector, effectively
reducing the vector's size by 1:
Calling member function pop_back on an empty vector is undefined behavior, so before calling
it on a potentially empty vector, check whether the container is empty by calling member
function vector::empty:
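A minimal sketch of such a guarded removal, using a hypothetical helper pop_back_if_any:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: removes the last element if there is one.
// Returns true if an element was removed, false if v was already empty.
bool pop_back_if_any(std::vector<std::string>& v) {
  if (v.empty()) {
    return false; // nothing to remove; calling pop_back here would be undefined behavior
  }
  v.pop_back(); // size decreases by 1; the removed element is destroyed
  return true;
}
```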
Assignments
The vector class overloads the assignment operator = :
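The sketch below, built around a hypothetical function assignment_copies, illustrates what the assignment does: the target's old elements are replaced by copies of the source's elements, and the two containers remain independent afterwards.

```cpp
#include <vector>

// Hypothetical demonstration: returns true if assignment produced an independent copy.
bool assignment_copies() {
  std::vector<int> src {1, 2, 3};
  std::vector<int> dst {9, 9};
  dst = src;     // dst's old elements are replaced by copies of 1, 2, 3
  dst[0] = 100;  // modifying the copy ...
  // ... leaves the source untouched
  return dst.size() == 3 && dst[0] == 100 && src[0] == 1 && src.size() == 3;
}
```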
Recall that C and C++ functions implement pass-by-value semantics by default - that is, the value
resulting from argument evaluation is copied and is then used to initialize the parameter. This
means that argument cities in function call print_vec_str() is used to initialize parameter
vs in the definition of function print_vec_str. We have previously learnt that copy constructor
vector::vector(vector const&) is used to initialize vs with a copy of argument cities.
We've also learnt that destructor vector::~vector() is automatically invoked when parameter
vs goes out of scope when function print_vec_str terminates. Since vectors can encapsulate
large amounts of data, making copies of arguments and their subsequent destruction can
significantly impair the program's performance. Therefore, pass-by-reference semantics should be
used when passing containers to functions to avoid unnecessary calls to constructors and
destructors:
We can still do better. In function print_vec_str , we iterate over each element of the vector
container in a read-only mode. We make this intent clear to the compiler and to clients of the
function by using pass-by-read-only-reference semantics:
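A sketch of what such a function might look like follows. The name print_vec_str and the parameter name vs come from the text; the output-stream parameter and the one-element-per-line format are assumptions added so that the destination of the output can be chosen by the caller.

```cpp
#include <iostream>
#include <sstream> // std::ostringstream, useful for capturing the output in a string
#include <string>
#include <vector>

// vs is bound to the caller's vector: no copy is made and vs cannot be modified.
// The os parameter (an assumption, defaulting to std::cout) selects the output stream.
void print_vec_str(std::vector<std::string> const& vs, std::ostream& os = std::cout) {
  for (std::string const& s : vs) { // each element is also read through a const reference
    os << s << '\n';
  }
}
```

A call such as print_vec_str(cities); then prints to standard output without copying cities.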
It makes mathematical sense that to add two vectors, both must have the same number of
elements. Proper continuation of the program is not possible if function add receives two
vectors with different sizes. Exceptions are the principal way to deal with such unexpected
situations in C++, while assertions are C's way of dealing with them. Today, we'll describe the
assert macro and leave C++ exceptions for a later date. The assert macro is defined in the
header <cassert>, the C++ counterpart of C's <assert.h>. The macro evaluates an expression and,
when the result is false, terminates the program immediately. If an assertion that the three
vectors have the same length fails, the program is terminated immediately and a diagnostic
message is printed to the standard error stream.
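A minimal sketch of such a function is shown below. The signature (two read-only inputs plus a result vector that the caller has already sized) and the element type double are assumptions; only the name add and the use of assert come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stores a[i] + b[i] in result[i]. All three vectors must have the same size;
// the caller is assumed to have sized result beforehand.
void add(std::vector<double> const& a,
         std::vector<double> const& b,
         std::vector<double>& result) {
  assert(a.size() == b.size());      // terminates the program if the sizes differ
  assert(a.size() == result.size());
  for (std::size_t i {0}; i < a.size(); ++i) {
    result[i] = a[i] + b[i];
  }
}
```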
A great advantage of assert is that we can let it disappear entirely by a simple macro
definition. Before including <cassert>, define macro NDEBUG:
#define NDEBUG
#include <cassert>
and all assertions are disabled. Alternatively, we can pass macro NDEBUG to the compiler. For
gcc, clang, g++, and clang++ compilers, we use the -D flag to define the macro, as in -DNDEBUG.
For the Microsoft C/C++ compiler, we use the /D flag, as in /DNDEBUG.
Returning vectors from functions
vectors are copyable and can be returned by functions. This allows us to use a more natural
notation. For example, the previously defined function add can make its intent clearer by
returning the resultant vector:
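A sketch of this variant, again assuming double elements and const-reference parameters, might read:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a new vector whose element i is a[i] + b[i].
// Both inputs must have the same size.
std::vector<double> add(std::vector<double> const& a, std::vector<double> const& b) {
  assert(a.size() == b.size());
  std::vector<double> sum(a.size()); // a.size() elements, each initialized to 0.0
  for (std::size_t i {0}; i < a.size(); ++i) {
    sum[i] = a[i] + b[i];
  }
  return sum; // returned by value
}
```

The call site then reads naturally, as in std::vector<double> c = add(a, b);.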
For well designed containers and types, returning by value is cheap: C++ compilers can elide
the copy of the returned object, and since C++11 the object is moved rather than copied when
elision does not apply. Both strings and vectors fall into the category of well designed types
that will not generate unnecessary copies and destructions.
Let's begin by authoring overloaded functions PRINT to print string and vector<string>
containers:
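A minimal sketch of such an overload pair is given below. The name PRINT comes from the text; the output-stream parameter and the exact formatting (a trailing newline, elements separated by a space) are assumptions.

```cpp
#include <iostream>
#include <sstream> // std::ostringstream, useful for capturing the output in a string
#include <string>
#include <vector>

// Overload for a single string: prints it followed by a newline.
void PRINT(std::string const& s, std::ostream& os = std::cout) {
  os << s << '\n';
}

// Overload for a vector of strings: prints the elements on one line.
void PRINT(std::vector<std::string> const& vs, std::ostream& os = std::cout) {
  for (std::string const& s : vs) {
    os << s << ' ';
  }
  os << '\n';
}
```

The compiler selects the overload from the type of the argument, so PRINT(city) and PRINT(cities) can share one name.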
min and max algorithms can be used to determine the smaller and larger of two values,
respectively:
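As a small sketch, the hypothetical wrappers smaller and larger below apply std::min and std::max (declared in <algorithm>) to two strings, which are compared lexicographically with operator<.

```cpp
#include <algorithm> // for std::min, std::max
#include <string>

// Hypothetical wrapper: returns a copy of the lexicographically smaller string.
std::string smaller(std::string const& a, std::string const& b) {
  return std::min(a, b);
}

// Hypothetical wrapper: returns a copy of the lexicographically larger string.
std::string larger(std::string const& a, std::string const& b) {
  return std::max(a, b);
}
```

For the strings "Seattle" and "Montreal", smaller yields "Montreal" because 'M' precedes 'S'.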
The following values are printed to standard output by the min and max algorithms on the
merged container cities:
Min of (Seattle, Montreal): Montreal
Max of (Seattle, Montreal): Seattle
Unlike the min and max algorithms that return values, min_element and max_element each return an
iterator pointing to the smallest and largest element, respectively, in a half-open range:
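The sketch below wraps both algorithms in hypothetical helpers smallest and largest. Because the algorithms return the end iterator for an empty range, the result must be compared against end before it is dereferenced; returning an empty string in that case is an assumption made for this example.

```cpp
#include <algorithm> // for std::min_element, std::max_element
#include <string>
#include <vector>

// Hypothetical helper: the smallest element of v, or an empty string if v is empty.
std::string smallest(std::vector<std::string> const& v) {
  std::vector<std::string>::const_iterator it = std::min_element(v.begin(), v.end());
  return it == v.end() ? std::string{} : *it; // never dereference the end iterator
}

// Hypothetical helper: the largest element of v, or an empty string if v is empty.
std::string largest(std::vector<std::string> const& v) {
  std::vector<std::string>::const_iterator it = std::max_element(v.begin(), v.end());
  return it == v.end() ? std::string{} : *it;
}
```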
The following values are printed to standard output by the min_element and max_element
algorithms on the merged container cities :
The swap algorithm (declared in <utility> since C++11) can be used to swap elements between two
containers:
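Two uses are sketched below with hypothetical helper names: exchanging a single element of one vector with a single element of another, and exchanging the entire contents of two vectors.

```cpp
#include <utility> // for std::swap
#include <vector>

// Hypothetical helper: exchanges a's first element with b's last element.
void swap_first_with_last(std::vector<int>& a, std::vector<int>& b) {
  if (!a.empty() && !b.empty()) {
    std::swap(a.front(), b.back());
  }
}

// Hypothetical helper: exchanges the complete contents of a and b.
void swap_contents(std::vector<int>& a, std::vector<int>& b) {
  std::swap(a, b); // the sizes may differ; each vector takes over the other's elements
}
```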
As the name implies, the reverse algorithm reverses the order of elements in a half-open range:
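A minimal sketch, using a hypothetical helper reversed that takes its argument by value so that the caller's vector is left unchanged:

```cpp
#include <algorithm> // for std::reverse
#include <vector>

// Hypothetical helper: returns a copy of v with the element order reversed.
std::vector<int> reversed(std::vector<int> v) {
  std::reverse(v.begin(), v.end()); // reverses the half-open range [begin, end) in place
  return v;
}
```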
The find algorithm returns an iterator to the first element in a half-open range equal to a search
value:
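If no element matches, find returns the end of the range, so the result should be compared with end before use. The sketch below, with a hypothetical helper index_of, turns the returned iterator into an index and uses the vector's size to signal that the value is absent.

```cpp
#include <algorithm> // for std::find
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: index of the first element equal to value,
// or v.size() if no element matches.
std::size_t index_of(std::vector<std::string> const& v, std::string const& value) {
  std::vector<std::string>::const_iterator it = std::find(v.begin(), v.end(), value);
  // for an unsuccessful search, it == v.end() and the difference equals v.size()
  return static_cast<std::size_t>(it - v.begin());
}
```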
To change the default sorting criterion from ascending to descending order, we author a
binary comparison function that accepts two values from the range as parameters and returns
true if the first parameter is greater than the second parameter:
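A minimal sketch, assuming int elements and the hypothetical names greater_than and sorted_descending, passes such a function as the third argument of std::sort:

```cpp
#include <algorithm> // for std::sort
#include <vector>

// Comparison function: true if lhs must be placed before rhs (descending order).
bool greater_than(int lhs, int rhs) {
  return lhs > rhs;
}

// Hypothetical helper: returns a copy of v sorted from largest to smallest.
std::vector<int> sorted_descending(std::vector<int> v) {
  std::sort(v.begin(), v.end(), greater_than);
  return v;
}
```

The comparison must return false for equal values; using >= instead of > would violate the ordering requirements of std::sort.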
1. Allocate a new memory block that is some multiple of the vector's current capacity. In many
implementations, the capacity increases by a factor of two each time, that is, the capacity is
doubled each time the vector must be expanded; other implementations use a smaller factor such
as 1.5.
2. Copy (or, since C++11, move) all elements from the vector's old memory into its new memory.
3. Destroy objects in the old memory.
4. Deallocate old memory.
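The four steps can be sketched with a deliberately simplified stand-in for a vector of ints. The type IntBuffer and the function grow are hypothetical and do not reflect how any real library is written; in particular, a real vector allocates raw memory and constructs elements in it, whereas this sketch uses new[] for brevity.

```cpp
#include <algorithm> // for std::copy
#include <cstddef>

// Hypothetical stand-in for the internals of a vector of ints.
struct IntBuffer {
  int* data {nullptr};
  std::size_t size {0};     // number of elements in use
  std::size_t capacity {0}; // number of elements the block can hold
};

// Grows the buffer using the four steps, assuming a growth factor of two.
void grow(IntBuffer& buf) {
  std::size_t const new_capacity = (buf.capacity == 0) ? 1 : 2 * buf.capacity;
  int* const new_data = new int[new_capacity];        // step 1: allocate a larger block
  std::copy(buf.data, buf.data + buf.size, new_data); // step 2: copy the old elements
  // step 3: destroy the old objects (nothing to do for int)
  delete[] buf.data;                                  // step 4: deallocate the old block
  buf.data = new_data;
  buf.capacity = new_capacity;
}
```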
Given all of this allocation, deallocation, copying, and destruction, it shouldn't come as a
surprise that these steps can be expensive. Naturally, we don't want to perform these steps any
more frequently than we have to. Member function vector::reserve can be used to
minimize the number of reallocations that must be performed. Before understanding how
vector::reserve can help reduce reallocations, we must understand four interrelated and
confusing member functions that manage the size and capacity of vectors. Among the standard
containers, only vector and string provide all four of these functions. Although we limit our
discussion to vectors, the principles are also relevant to the efficient and correct use of
strings.
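Assuming the four functions meant here are size, capacity, resize, and reserve, the sketch below checks how they interact. The function size_capacity_demo is hypothetical; it returns true if every property stated in its comments holds.

```cpp
#include <vector>

// Hypothetical demonstration of size, capacity, resize and reserve.
bool size_capacity_demo() {
  std::vector<int> v;
  v.reserve(100); // capacity becomes at least 100; no elements are created
  bool ok = v.size() == 0 && v.capacity() >= 100;

  v.resize(10);   // size becomes 10; the new elements are initialized to 0
  ok = ok && v.size() == 10 && v[9] == 0 && v.capacity() >= 100;

  v.resize(3);    // size shrinks to 3; the capacity is not reduced
  ok = ok && v.size() == 3 && v.capacity() >= 100;
  return ok;
}
```

In short, size counts the elements that exist, while capacity counts the elements that fit in the memory already allocated.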
From these descriptions, it should be clear that reallocations (including raw memory allocations
and deallocations, object copying and destruction) will occur whenever an element needs to be
inserted and the container's capacity is insufficient. The key to avoiding reallocations, then, is to
use vector::reserve to set a vector's capacity to a sufficiently large value as soon as possible,
ideally right after the vector container is constructed. For example, to create a vector<int>
holding the values 1 through 1000 without using vector::reserve, we might do it like this:
std::vector<int> v;
for (int i {1}; i <= 1000; ++i) {
  v.push_back(i);
}
The number of reallocations this code causes during the course of the loop depends on the growth
policy of the standard library, so g++ and Microsoft's cl give different counts. You can check
the progress of memory reallocations using this code:
std::vector<int> v;
std::vector<int>::size_type curr_cap = v.capacity();
std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";
for (int i {1}; i <= 1000; ++i) {
  v.push_back(i);
  // a change in capacity means that a reallocation has taken place
  if (v.capacity() != curr_cap) {
    curr_cap = v.capacity();
    std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";
  }
}
Calling vector::reserve before the loop avoids these reallocations:
std::vector<int> v;
v.reserve(1000); // pre-allocate memory for 1000 elements
std::vector<int>::size_type curr_cap = v.capacity();
std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";
for (int i {1}; i <= 1000; ++i) {
  v.push_back(i);
  // not expected to trigger: the reserved capacity already covers 1000 elements
  if (v.capacity() != curr_cap) {
    curr_cap = v.capacity();
    std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";
  }
}
There are two common ways to use vector::reserve to avoid unneeded reallocations. The first
is applicable when the size of the vector container is known exactly or approximately. In that
case, as shown in the code above, the appropriate amount of space can be reserved in advance.
The second way is to reserve the maximum space the program might ever need and then, once all
the data has been added, to trim off excess capacity using member function shrink_to_fit (a
non-binding request that an implementation is free to ignore):
std::vector<int> v;
v.reserve(10'000); // pre-allocate memory for 10,000 elements
std::vector<int>::size_type curr_cap = v.capacity();
std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";
for (int i {1}; i <= 1000; ++i) {
  v.push_back(i);
  if (v.capacity() != curr_cap) {
    curr_cap = v.capacity();
    std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";
  }
}
// we trim off excess capacity ...
v.shrink_to_fit();
std::cout << "size: " << v.size() << " | cap: " << v.capacity() << "\n";