<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Olivier Coudert&#039;s Blog &#187; C++</title>
	<atom:link href="http://www.ocoudert.com/blog/tag/c/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ocoudert.com/blog</link>
	<description>My take on tech --and other topics</description>
	<lastBuildDate>Sat, 21 Jan 2012 20:30:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>A practical guide to C++ serialization</title>
		<link>http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/</link>
		<comments>http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 04:52:41 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1168</guid>
		<description><![CDATA[CodeProject In a nutshell, serialization consists of writing data and objects to a medium (a file, a buffer, a socket) so that they can be reconstructed later in the memory of the same or another computing host. The reconstruction process is also known as deserialization. Serializing a primitive type like a bool, int, or float [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/">A practical guide to C++ serialization</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
In a nutshell, serialization consists of writing data and objects to a medium (a file, a buffer, a socket) so that they can be reconstructed later in the memory of the same or another computing host. The reconstruction process is also known as deserialization.</p>
<p>Serializing a primitive type like a bool, int, or float is trivial: just write the data as it is (assuming that no compression is used). Serializing a pointer is different: the object it points to must be serialized first. That way, deserializing the pointer simply consists of setting its value to the memory address at which the object has been reconstructed.</p>
<p>We can distinguish three levels of complexity in serialization, depending on how complex the pointer (and reference) graph is:</p>
<ol>
<li>The pointer graph is a <em>forest</em> (i.e., a set of <em>trees</em>). Data can simply be serialized bottom-up with a depth-first traversal of the trees.</li>
<li>The pointer graph is a <em>directed acyclic graph</em> (DAG), i.e., a graph without cycles. We can still serialize the data bottom-up, making sure that shared data are written and restored only once.</li>
<li>The pointer graph is a general graph, i.e., it may contain cycles (loops). Data must be written and restored with forward references so that cycles are handled properly.</li>
</ol>
<p>&nbsp;</p>
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_1169" class="wp-caption aligncenter" style="width: 610px;">
<dt class="wp-caption-dt"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/pointers-graph.png"><img class="size-full wp-image-1169 " title="pointers graph" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/pointers-graph.png" alt="" width="600" height="362" /></a></dt>
<dd class="wp-caption-dd">Pointer graph as a tree, a DAG, and with loops</dd>
</dl>
</div>
<p>&nbsp;</p>
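<p>The three cases can be handled with one classic technique, sketched below without boost (a hypothetical illustration, not boost's implementation): every object reached through a pointer gets an integer ID the first time it is written, and any later encounter writes only that ID. Shared nodes of a DAG are therefore written once, and a cycle simply ends with a reference to a node that is still being rebuilt. The Node type, the text format, and the saveNode/loadNode functions are assumptions made for the example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical node type: a value plus outgoing pointers (may form cycles).
struct Node {
  int value = 0;
  std::vector<Node*> next;
};

// Writes the graph reachable from n. A node gets an ID on its first visit;
// any later visit writes "ref <id>" instead of writing the node again.
void saveNode(std::ostream& os, const Node* n,
              std::unordered_map<const Node*, int>& ids) {
  if (n == nullptr) { os << "null "; return; }
  auto it = ids.find(n);
  if (it != ids.end()) { os << "ref " << it->second << ' '; return; }
  const int id = static_cast<int>(ids.size());
  ids.emplace(n, id);  // register before recursing so cycles terminate
  os << "new " << n->value << ' ' << n->next.size() << ' ';
  for (const Node* child : n->next) saveNode(os, child, ids);
}

// Rebuilds the graph. Nodes are created in the same order as IDs were
// assigned, so "ref <id>" always resolves to an already-allocated node.
Node* loadNode(std::istream& is, std::vector<std::unique_ptr<Node>>& pool) {
  std::string tag;
  is >> tag;
  if (tag == "null") return nullptr;
  if (tag == "ref") {
    std::size_t id = 0;
    is >> id;
    return pool.at(id).get();
  }
  pool.push_back(std::make_unique<Node>());
  Node* n = pool.back().get();  // address is fixed before children load
  std::size_t count = 0;
  is >> n->value >> count;
  for (std::size_t i = 0; i < count; ++i) n->next.push_back(loadNode(is, pool));
  return n;
}
```

<p>The key design choice is in loadNode: each node is allocated, and its address fixed, <em>before</em> its children are read, so a reference that closes a cycle can always be resolved.</p>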
<p>You can always serialize objects with your own custom code. However, serialization is much more complex than a simple pretty-print method. Ideally, serialization should support the following features:</p>
<ol>
<li>Serialization should be able to handle any pointer graph (including graphs with cycles).</li>
<li>Serializing a pointer or a reference should automatically trigger the serialization of the referred object.</li>
<li>Serializing an entire data model can require a lot of code, from simple scalar fields (bool, int, float), to containers (vector, list, hash table, etc.), to intricate data structures (graph, quad-tree, sparse matrices, etc.). Ideally, templates should carry most of the burden.</li>
<li>The save and load functions must always be in sync: if the ‘save’ function is modified, the ‘load’ function must be changed accordingly. Ideally, this process should be automated as much as possible.</li>
<li>One should have a way of serializing objects without changing their .hpp files. This is known as non-intrusive serialization. The reason is that in many cases one does not want to (or cannot) change the source files of existing libraries.</li>
<li>Serialization needs to support versioning. As objects evolve, data members are added or removed, and it is desirable to stay backward compatible, meaning that archives from older versions can still be deserialized into the most recent data model.</li>
<li>Serialization should be cross-platform compatible (32- and 64-bit machines, Windows, Linux, Solaris, etc.).</li>
</ol>
<p>The boost serialization library meets all the requirements above, and more:</p>
<ul>
<li>It is extremely efficient, supports versioning, and automatically serializes STL containers.</li>
<li>Serialization (the save function) and deserialization (the load function) are expressed with a single template, which reduces the size of the code and solves the synchronization problem.</li>
<li>With a little bit of help, boost serialization is also 32- and 64-bit compatible, which means that a database serialized on a 32-bit machine can be read on a 64-bit machine <em>and conversely</em>.</li>
<li>Also, boost serialization (respectively deserialization) writes to an output (respectively input) archive built on a std::ostream (respectively std::istream), meaning that the data can go to a file on disk, a buffer, or a socket. You can literally serialize your data over a network.</li>
</ul>
<p>The best way to understand how to serialize with boost is to walk through increasingly complex serialization scenarios.</p>
<h2>Basic serialization</h2>
<p>The code for serialization, as well as an example that saves and restores simple objects, is given below.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once
// File obj.hpp

// Forward declaration of class boost::serialization::access
namespace boost {
namespace serialization {
class access;
}
}

class Obj {
public:
  // Serialization expects the object to have a default constructor
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }
private:
  int  d1_;
  bool d2_;

  // Allow serialization to access non-public data members.
  friend class boost::serialization::access;

  template&lt;typename Archive&gt;
  void serialize(Archive&amp; ar, const unsigned version) {
    ar &amp; d1_ &amp; d2_;  // Simply serialize the data members of Obj
  }
};
</small></pre>
<p>The template ‘serialize’ defines both save and load. This works because the operator ‘&amp;’ acts as ‘&lt;&lt;’ (respectively ‘&gt;&gt;’) for an output (respectively input) archive. Note the friend declaration, which allows the save/load template to access the private data members of the object. Also note that serialization expects the object to have a default constructor (which can be private).</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "obj.hpp"
#include &lt;assert.h&gt;
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main() {
  const char* fileName = "saved.txt";

  // Create some objects
  const Obj o1(-2, false);
  const Obj o2;
  const Obj o3(21, true);
  const Obj* const p1 = &amp;o1;

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);
    boost::archive::text_oarchive ar(ofs);

    // Write data
    ar &amp; o1 &amp; o2 &amp; o3 &amp; p1;
  }

  // Restore data
  Obj restored_o1;
  Obj restored_o2;
  Obj restored_o3;
  Obj* restored_p1;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);

    // Load data
    ar &amp; restored_o1 &amp; restored_o2 &amp; restored_o3 &amp; restored_p1;
  }

  // Make sure we restored the data exactly as it was saved
  assert(restored_o1 == o1);
  assert(restored_o2 == o2);
  assert(restored_o3 == o3);
  assert(restored_p1 != p1);
  assert(restored_p1 == &amp;restored_o1);

  return 0;
}
</small></pre>
<p>In main.cpp, we first include the headers declaring the input and output text archives, from which objects are loaded and to which they are saved, respectively. We create an output archive (here, backed by a file on disk), and write three instances of class Obj, as well as a pointer to one of them. We then read them back and check that the data are restored exactly as they were saved. Note how the restored pointer restored_p1 points to the restored object restored_o1.</p>
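<p>The dual role of the ‘&amp;’ operator can be illustrated without boost. In the hypothetical sketch below, TextOut and TextIn are toy archives (not boost classes) whose ‘&amp;’ forwards to stream output and input respectively, so a single member template serialize() handles both directions. Real boost archives do much more (object tracking, versioning, class registration).</p>

```cpp
#include <cassert>
#include <sstream>

// Toy output archive: operator& behaves like operator<<.
class TextOut {
public:
  explicit TextOut(std::ostream& os) : os_(os) {}
  template <typename T>
  TextOut& operator&(const T& t) { os_ << t << ' '; return *this; }
private:
  std::ostream& os_;
};

// Toy input archive: operator& behaves like operator>>.
class TextIn {
public:
  explicit TextIn(std::istream& is) : is_(is) {}
  template <typename T>
  TextIn& operator&(T& t) { is_ >> t; return *this; }
private:
  std::istream& is_;
};

// Hypothetical class using one template for both save and load.
class Point {
public:
  Point() = default;
  Point(int x, bool flag) : x_(x), flag_(flag) {}
  bool operator==(const Point& o) const {
    return x_ == o.x_ && flag_ == o.flag_;
  }

  template <typename Archive>
  void serialize(Archive& ar, unsigned /*version*/) { ar & x_ & flag_; }

private:
  int x_ = -1;
  bool flag_ = false;
};
```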
<h2>More on pointer serialization</h2>
<p>Serializing a pointer (or reference) triggers, when necessary, the serialization of the object it points (or refers) to. So we do not need to explicitly serialize pointed-to objects: boost serialization makes sure that all the objects reachable in the pointer graph are serialized.</p>
<p>For instance, the code below shows that serializing the pointer p1 triggers the serialization of o1, the object it points to. When restoring the pointer restored_p1, a new copy of o1 is automatically created on the heap; the caller owns it and must delete it.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "obj.hpp"
#include &lt;assert.h&gt;
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main()
{
  const char* fileName = "saved.txt";

  // Create one object o1.
  const Obj o1(-2, false);
  const Obj* const p1 = &amp;o1;

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);
    boost::archive::text_oarchive ar(ofs);
    // Save only the pointer. This will trigger serialization
    // of the object it points to, i.e., o1.
    ar &amp; p1;
  }

  // Restore data
  Obj* restored_p1;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);
    // Load
    ar &amp; restored_p1;
  }

  // Make sure we read exactly what we saved.
  assert(restored_p1 != p1);
  assert(*restored_p1 == o1);

  // The restored object was allocated by the archive: release it.
  delete restored_p1;

  return 0;
}
</small></pre>
<p>When deserializing a pointer, the object it points to is automatically deserialized if it has not been deserialized yet. This means that one should not attempt to deserialize an object <em>after</em> a pointer to this object has been deserialized: once the pointer deserialization has forced the object deserialization, the object cannot be rebuilt at a different address. boost catches this mistake as early as the save step.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "obj.hpp"
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main()
{
  const char* fileName = "saved.txt";
  std::ofstream ofs(fileName);

  // Create one object o1 and a pointer p1 to that object.
  const Obj o1(-2, false);
  const Obj* const p1 = &amp;o1;

  // Serialize object, then pointer.
  // This works fine: after the object is deserialized, we can
  // deserialize the pointer by assigning it to the object’s address.
  {
    boost::archive::text_oarchive ar(ofs);
    ar &amp; o1 &amp; p1;
  }

  // Serialize pointer, then object.
  // This does not work: saving p1 already saved o1 through the pointer.
  // On load, o1 would be created when p1 is restored, so it could not
  // also be rebuilt as a separate object. boost detects the conflict
  // while saving and throws a 'boost::archive::archive_exception'
  // ("pointer conflict") at runtime.
  {
    boost::archive::text_oarchive ar(ofs);
    ar &amp; p1 &amp; o1;
  }

  return 0;
}
</small></pre>
<p>In the example above, the second serialization will result in a runtime error:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>ocoudert@MyMacBookPro $ a.out
terminate called after throwing an instance of 'boost::archive::archive_exception'
    what():  pointer conflict
Abort trap
ocoudert@MyMacBookPro $
</small></pre>
<p>This means that an object should never be serialized directly <em>after</em> a pointer to it has been serialized: either serialize the object before any pointer to it, or serialize it only through pointers.</p>
<h2>Explicit save and load function definitions</h2>
<p>We need an explicit definition of the save and load functions whenever they are not fully symmetric. This is typical when versioning is involved. Note the use of the macro BOOST_SERIALIZATION_SPLIT_MEMBER(), which is responsible for calling save/load when using an output/input archive.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/serialization/split_member.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

private:
  int  d1_;
  bool d2_;

  friend class boost::serialization::access;

  template&lt;class Archive&gt;
  void save(Archive &amp; ar, const unsigned int version) const {
    ar &amp; d1_ &amp; d2_;
  }

  template&lt;class Archive&gt;
  void load(Archive &amp; ar, const unsigned int version) {
    ar &amp; d1_ &amp; d2_;
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER()
};
</small></pre>
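<p>Conceptually, BOOST_SERIALIZATION_SPLIT_MEMBER() provides a serialize() member that forwards to save() or load() depending on the direction of the archive. The sketch below mimics that dispatch with C++17 if constexpr and a hypothetical is_saving constant on toy archives (OutArchive and InArchive are not boost classes); it illustrates the idea, not the macro's actual expansion.</p>

```cpp
#include <cassert>
#include <sstream>

// Toy archives (hypothetical) that advertise their direction.
struct OutArchive {
  static constexpr bool is_saving = true;
  std::ostream& os;
  template <typename T>
  OutArchive& operator&(const T& t) { os << t << ' '; return *this; }
};

struct InArchive {
  static constexpr bool is_saving = false;
  std::istream& is;
  template <typename T>
  InArchive& operator&(T& t) { is >> t; return *this; }
};

class Obj {
public:
  Obj() = default;
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj& o) const { return d1_ == o.d1_ && d2_ == o.d2_; }

  // What a "split" serialize() boils down to: pick save() or load().
  template <typename Archive>
  void serialize(Archive& ar, unsigned version) {
    if constexpr (Archive::is_saving) save(ar, version);
    else load(ar, version);
  }

private:
  template <typename Archive>
  void save(Archive& ar, unsigned /*version*/) const { ar & d1_ & d2_; }

  template <typename Archive>
  void load(Archive& ar, unsigned /*version*/) { ar & d1_ & d2_; }

  int d1_ = -1;
  bool d2_ = false;
};
```

<p>Because only the selected branch is instantiated, save() can be const while load() writes into the members, which is exactly what splitting makes possible.</p>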
<h2>Serialization of C-strings</h2>
<p>A C-string cannot be serialized directly, because that would assume a specific interpretation of a char*, namely an array of chars terminated by a null character (‘\0’). Thus C-strings must be serialized explicitly. The class below is a simple helper to serialize C-strings (note that it could be optimized by avoiding the construction of the std::string).</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once
// File SerializeCStringHelper.hpp

#include &lt;string.h&gt;  // strdup (POSIX)
#include &lt;string&gt;
#include &lt;boost/serialization/string.hpp&gt;
#include &lt;boost/serialization/split_member.hpp&gt;

class SerializeCStringHelper {
public:
  SerializeCStringHelper(char*&amp; s) : s_(s) {}
  SerializeCStringHelper(const char*&amp; s) : s_(const_cast&lt;char*&amp;&gt;(s)) {}

private:

  friend class boost::serialization::access;

  template&lt;class Archive&gt;
  void save(Archive&amp; ar, const unsigned version) const {
    bool isNull = (s_ == 0);
    ar &amp; isNull;
    if (!isNull) {
      std::string s(s_);
      ar &amp; s;
    }
  }

  template&lt;class Archive&gt;
  void load(Archive&amp; ar, const unsigned version) {
    bool isNull;
    ar &amp; isNull;
    if (!isNull) {
      std::string s;
      ar &amp; s;
      s_ = strdup(s.c_str());
    } else {
      s_ = 0;
    }
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER();

private:
  char*&amp; s_;
};
</small></pre>
<p>A simple example of its usage is as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "SerializeCStringHelper.hpp"
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;  // free
#include &lt;string.h&gt;  // strcmp
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main()
{
  const char* fileName = "saved.txt";
  const char* str = "This is an example of a C-string";

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);

    boost::archive::text_oarchive ar(ofs);
    // Save
    SerializeCStringHelper helper(str);
    ar &amp; helper;
  }

  // Restore data
  char* restored_str;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);

    // Load
    SerializeCStringHelper helper(restored_str);
    ar &amp; helper;
  }

  // Make sure we read exactly what we saved
  assert(restored_str != str);
  assert(strcmp(restored_str, str) == 0);

  // The helper allocated the restored string with strdup(): release it.
  free(restored_str);

  return 0;
}
</small></pre>
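<p>As noted above, the helper could avoid building a temporary std::string. Here is a minimal sketch of that idea on plain standard streams (not a boost archive): it writes a null flag, then the length, then the raw characters, and on load allocates exactly length + 1 bytes with new[], so the caller releases the result with delete[]. The saveCString/loadCString names and the format are assumptions.</p>

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Format: "<isNull> " or "<isNull> <length> <chars>" (chars written raw,
// so embedded spaces survive the round trip).
void saveCString(std::ostream& os, const char* s) {
  const bool isNull = (s == nullptr);
  os << isNull << ' ';
  if (!isNull) {
    const std::size_t len = std::strlen(s);
    os << len << ' ';
    os.write(s, static_cast<std::streamsize>(len));
  }
}

// Returns a new[]-allocated copy, or nullptr; the caller owns the buffer.
char* loadCString(std::istream& is) {
  bool isNull = true;
  is >> isNull;
  if (isNull) return nullptr;
  std::size_t len = 0;
  is >> len;
  is.get();  // skip the single separator written after the length
  char* s = new char[len + 1];
  is.read(s, static_cast<std::streamsize>(len));
  s[len] = '\0';
  return s;
}
```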
<h2>Non-intrusive serialization</h2>
<p>So far, the serialization code has been added to the class definition. A non-intrusive serialization, outside of the class, might be preferable. For instance, we may want to serialize a class from a library without altering the library’s .hpp file. This is easy when the data members are public:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

public:
  int  d1_;
  bool d2_;
};

namespace boost {
namespace serialization {

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

} // namespace serialization
} // namespace boost
</small></pre>
<p>If we want to protect the data members, the code is a bit more complicated because the serialization template needs to be declared as a friend. This requires a forward declaration of the template.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

//// Declaration of the template
class Obj;

namespace boost {
namespace serialization {

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version);

} // namespace serialization
} // namespace boost

//// Definition of the class
class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

private:
  int  d1_;
  bool d2_;

  // Allow serialization to access data members.
  template&lt;typename Archive&gt; friend
  void boost::serialization::serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version);
};

//// Definition of the template
namespace boost {
namespace serialization {

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

} // namespace serialization
} // namespace boost
</small></pre>
<h2>Non-intrusive explicit save and load function definitions</h2>
<p>This combines the two previous serialization styles, except that the include file and macro are different. For the sake of simplicity, we give the version for public data members.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/serialization/split_free.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

public:
  int  d1_;
  bool d2_;
};

namespace boost {
namespace serialization {

template&lt;class Archive&gt;
void save(Archive &amp; ar, const Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

template&lt;class Archive&gt;
void load(Archive &amp; ar, Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

} // namespace serialization
} // namespace boost

BOOST_SERIALIZATION_SPLIT_FREE(Obj)

</small></pre>
<h2>Serialization of STL containers</h2>
<p>The boost library comes with templates to automatically serialize STL containers, as well as some STL objects (e.g., std::string). Instead of saving/loading a vector with the following code:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>template&lt;typename Archive&gt;
void save(Archive&amp; ar, const std::vector&lt;Obj&gt;&amp; objs, const unsigned version) {
  ar &lt;&lt; objs.size();
  for (size_t i = 0; i &lt; objs.size(); ++i) {
    ar &lt;&lt; objs[i];
  }
}

template&lt;typename Archive&gt;
void load(Archive&amp; ar, std::vector&lt;Obj&gt;&amp; objs, const unsigned version) {
  size_t size;
  ar &gt;&gt; size;
  objs.resize(size);
  for (size_t i = 0; i &lt; size; ++i) {
    ar &gt;&gt; objs[i];
  }
}
</small></pre>
<p>One simply writes:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;boost/serialization/vector.hpp&gt;

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, std::vector&lt;Obj&gt;&amp; objs, const unsigned version) {
  ar &amp; objs;
}
</small></pre>
<p>All the STL containers are supported using the appropriate include files:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;boost/serialization/array.hpp&gt;
#include &lt;boost/serialization/vector.hpp&gt;
#include &lt;boost/serialization/hash_map.hpp&gt;
#include &lt;boost/serialization/hash_set.hpp&gt;
#include &lt;boost/serialization/list.hpp&gt;
#include &lt;boost/serialization/slist.hpp&gt;
#include &lt;boost/serialization/map.hpp&gt;
#include &lt;boost/serialization/set.hpp&gt;
#include &lt;boost/serialization/bitset.hpp&gt;
#include &lt;boost/serialization/string.hpp&gt;
</small></pre>
<h2>Serialization of base class</h2>
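<p>When a class inherits from another, the base class needs to be serialized as well. This is done with boost::serialization::base_object, rather than by calling the base class’s serialize() directly.</p>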
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;boost/serialization/base_object.hpp&gt;

class Base {
public:
  Base() : c_('\0') {}
  Base(char c) : c_(c) {}
  bool operator==(const Base&amp; o) const { return c_ == o.c_; }

private:
  char c_;

  friend class boost::serialization::access;

  template &lt;typename Archive&gt;
  void serialize(Archive&amp; ar, const unsigned version) {
    ar &amp; c_;
  }
};

class Obj : public Base {
private:
  typedef Base Super;
public:
  Obj() : Super(), d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : Super('a'), d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return Super::operator==(o) &amp;&amp; d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

private:
  int  d1_;
  bool d2_;

  friend class boost::serialization::access;

  template &lt;typename Archive&gt;
  void serialize(Archive&amp; ar, const unsigned version) {
    ar &amp; boost::serialization::base_object&lt;Super&gt;(*this);
    ar &amp; d1_ &amp; d2_;
  }
};
</small></pre>
<h2>Versioning</h2>
<p>We want to maintain backward compatibility when the class Obj evolves. For instance, if a new data member ‘ID_’ is added, we want to read an old archive and build a new Obj, with the missing data member taking its default value.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/serialization/split_member.hpp&gt;
#include &lt;boost/serialization/version.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false), ID_(0) {}
  Obj(int d1, bool d2, unsigned id) : d1_(d1), d2_(d2), ID_(id) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_ &amp;&amp; ID_ == o.ID_;
  }

private:
  int  d1_;
  bool d2_;
  unsigned ID_;

  friend class boost::serialization::access;

  template&lt;class Archive&gt;
  void save(Archive &amp; ar, const unsigned int version) const {
    ar &amp; d1_ &amp; d2_ &amp; ID_;
  }

  template&lt;class Archive&gt;
  void load(Archive &amp; ar, const unsigned int version) {
    ar &amp; d1_ &amp; d2_;
    // If archive’s version is 0 (i.e., is old), ID_ keeps
    // its default value from the new data model,
    // else we read ID_’s value from the archive.
    if (version &gt; 0) {
      ar &amp; ID_;
    }
  }

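  BOOST_SERIALIZATION_SPLIT_MEMBER()
};

// Declare the current version of Obj (the default is 0). Without this,
// load() always sees version 0 and never reads ID_.
BOOST_CLASS_VERSION(Obj, 1)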
</small></pre>
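<p>Under the hood, versioning works because the archive stores the class version next to the data and hands it back to load(). The hypothetical sketch below makes that explicit with plain streams: an archive written by version 0 of the data model (d1, d2) is read by version 1 (d1, d2, ID), and ID keeps its default value. The layout, with the version written first, and the saveObj/loadObj names are assumptions, not boost’s actual archive format.</p>

```cpp
#include <cassert>
#include <sstream>

struct ObjV1 {
  int d1 = -1;
  bool d2 = false;
  unsigned id = 0;  // data member added in version 1
};

constexpr unsigned kCurrentVersion = 1;

// Writes the version first, then the fields of the current data model.
void saveObj(std::ostream& os, const ObjV1& o) {
  os << kCurrentVersion << ' ' << o.d1 << ' ' << o.d2 << ' ' << o.id << ' ';
}

// Reads whatever version was stored; fields missing from old archives
// keep their default values.
ObjV1 loadObj(std::istream& is) {
  ObjV1 o;
  unsigned version = 0;
  is >> version >> o.d1 >> o.d2;
  if (version > 0) is >> o.id;
  return o;
}
```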
<h2>Serialization of const data or objects</h2>
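<p>Attempting to serialize const data or a const object with a single ‘serialize’ template triggers a long trail of error messages, because the template is also instantiated for the input archive, which cannot load into const data. The messages include something that looks like:</p>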
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>[snip]

/opt/local/include/boost/archive/detail/check.hpp:162: error:
  invalid application of ‘sizeof’ to incomplete type ‘boost::STATIC_ASSERTION_FAILURE&lt;false&gt;’

[snip]

/opt/local/include/boost/archive/basic_text_iprimitive.hpp:88: error:
  ambiguous overload for ‘operator&gt;&gt;’ in
  ‘((boost::archive::basic_text_iprimitive&lt;std::basic_istream&lt;char,
       std::char_traits&lt;char&gt; &gt; &gt;*)this)-&gt;boost::archive::basic_text_iprimitive&lt;std::basic_istream&lt;char,
         std::char_traits&lt;char&gt; &gt; &gt;::is &gt;&gt; t’
</small></pre>
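<p>This means that the input archive expects the recipient of the data to be non-const. Thus const data members must be const_cast&lt;&gt;()’ed to be serialized (strictly speaking, modifying a const member through const_cast is undefined behavior in C++, so a non-const private member is the safer design). For example:</p>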
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}

private:
  const int d1_;
  bool d2_;

  // Allow serialization to access data members.
  friend class boost::serialization::access;

  template&lt;typename A&gt;
  void serialize(A&amp; ar, const unsigned version) {
    ar &amp; const_cast&lt;int&amp;&gt;(d1_) &amp; d2_;
  }
};
</small></pre>
<h2>Text, XML, and binary archives</h2>
<p>The text archive is an ASCII file that is somewhat human readable. There are other archive types available in boost/archive/*.hpp, e.g.:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>// Text archive that defines boost::archive::text_oarchive
// and boost::archive::text_iarchive
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

// XML archive that defines boost::archive::xml_oarchive
// and boost::archive::xml_iarchive
#include &lt;boost/archive/xml_oarchive.hpp&gt;
#include &lt;boost/archive/xml_iarchive.hpp&gt;

// XML archive that uses wide characters (use for UTF-8 output),
// defines boost::archive::xml_woarchive
// and boost::archive::xml_wiarchive
#include &lt;boost/archive/xml_woarchive.hpp&gt;
#include &lt;boost/archive/xml_wiarchive.hpp&gt;

// Binary archive that defines boost::archive::binary_oarchive
// and boost::archive::binary_iarchive
#include &lt;boost/archive/binary_oarchive.hpp&gt;
#include &lt;boost/archive/binary_iarchive.hpp&gt;
</small></pre>
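<p>The text and XML archives are portable across 32- and 64-bit platforms. Note that the XML archives require every serialized item to be wrapped in a name-value pair, typically with the BOOST_SERIALIZATION_NVP macro.</p>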
<p>Having a binary archive that is portable between 32 and 64 bits is not trivial, because C++ does not fix the exact size of primitive types. For instance, a long is usually 4 bytes on a 32-bit machine, and 8 bytes on most 64-bit Unix-like systems (but still 4 bytes on 64-bit Windows). Byte order (endianness) is a further source of incompatibility. In practice, though, binary archives are fairly portable, and there is an unofficial portable binary archive.</p>
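<p>If you control the format yourself, one common way to make binary data portable is to write fixed-width integer types from &lt;cstdint&gt; in an explicit byte order, so that neither sizeof(long) nor the host’s endianness leaks into the file. The sketch below (hypothetical helper names; this is not how boost’s binary archives work) stores every long as 8 little-endian bytes, with no error handling.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes v as exactly 8 bytes, least significant byte first.
void writeU64LE(std::ostream& os, std::uint64_t v) {
  for (int i = 0; i < 8; ++i) {
    os.put(static_cast<char>((v >> (8 * i)) & 0xFF));
  }
}

// Reads 8 bytes written by writeU64LE, independent of host endianness.
std::uint64_t readU64LE(std::istream& is) {
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i) {
    const auto byte = static_cast<unsigned char>(is.get());
    v |= static_cast<std::uint64_t>(byte) << (8 * i);
  }
  return v;
}

// A long always goes through a fixed 64-bit signed representation.
void writeLong(std::ostream& os, long x) {
  writeU64LE(os, static_cast<std::uint64_t>(static_cast<std::int64_t>(x)));
}

// The value must fit in the reader's long (4 or 8 bytes).
long readLong(std::istream& is) {
  return static_cast<long>(static_cast<std::int64_t>(readU64LE(is)));
}
```

<p>The file layout no longer depends on the platform, but a reader with a 32-bit long can still only hold values that fit in 32 bits, so range checks remain the application’s job.</p>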
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to make software deterministic</title>
		<link>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/</link>
		<comments>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/#comments</comments>
		<pubDate>Mon, 30 May 2011 17:04:40 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1247</guid>
		<description><![CDATA[CodeProject A program is deterministic, or repeatable, if it produces the very same output when given the same input no matter how many times it is run. Refining this definition, we should consider whether a program produces the same result on any platform (32- and 64-bit machines, running Windows, Mac OS, Linux, Solaris, etc.). [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/">How to make software deterministic</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display:none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
A program is deterministic, or repeatable, if it produces the very same output when given the same input no matter how many times it is run.</p>
<p>Refining this definition, we should consider whether a program produces the same result on any platform (32- and 64-bit machines, running Windows, Mac OS, Linux, Solaris, etc.), and whether the program is insensitive to the form of its inputs. For example, the solution to the problem of finding the shortest route that visits all the capitals of Europe should not depend on how the map of Europe is entered, nor should it depend on which language is used to name the capitals.</p>
<p>Determinism is obviously very desirable. For the user, a non-deterministic program can be confusing and frustrating. For the developer, a non-deterministic program is extremely hard to test and debug, since bugs and specific configurations cannot be easily reproduced.</p>
<p>Repeatability looks like a given for most applications. For instance, if we add two numbers in a spreadsheet, we expect the same result no matter how many times we perform this operation and regardless of the platform we run on (PC, Mac, etc). Or if we run a spell checker several times, we expect it to flag the very same errors.</p>
<p>But it is not that obvious for more complex applications. This is especially true when there are multiple solutions to a problem, or when heuristics are used to produce a result –because an exact solution is too computationally expensive. For example, it is not uncommon to see slightly different outcomes when running the same EDA synthesis or P&amp;R tool on the same input several times.</p>
<p>Even more, a user would like to see the same result when only minor changes are applied to the input. For instances, running a P&amp;R tool on two netlists that differ only by the names of their cells should produce exactly the same result –a P&amp;R tools should produce a result that only depend on the netlist structure. But experience shows that industrial synthesis and P&amp;R tool does not meet that requirement. Closest to software, it is not uncommon to generate slightly different object codes with gcc by changing the names of a few variables.</p>
<p>Among the causes of non-deterministic behavior, we can distinguish the following types:</p>
<ol>
<li>A <a href="#random">random number</a> generator</li>
<li>Reading <a href="#uninitialized">uninitialized</a> data</li>
<li>A <a href="#race">race condition</a> on concurrent threads</li>
<li>An <a href="#unordered">unordered iteration</a> that is assumed ordered</li>
<li>A total order that depends on <a href="#memory_address">memory address</a></li>
<li>A total order that depends on <a href="#time_stamp">time stamps</a></li>
<li>A total order that depends on a <a href="#non_canonical_labelling">non-canonical labeling</a></li>
</ol>
<h4><strong><a name="random"></a>1. Random number generator</strong></h4>
<p>Many applications use stochastic processes (e.g., simulated annealing, genetic algorithms, Monte Carlo simulations), yet we would like them to be repeatable. Using a pseudorandom number generator with a known seed makes it possible to reproduce the same long sequence of seemingly random numbers over and over again.</p>
<p>Note that some applications (e.g., gaming, cryptography, statistical sampling) <em>require</em> non-deterministic behavior. In that case the seed of the random number generator must be an ever-changing value, for example the host’s current time (cryptographic uses need a far less predictable source, such as the operating system’s entropy pool).</p>
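<p>A minimal sketch of both cases, using the C++11 random number library (the seed value 42 and the function names are arbitrary, for illustration only): an <span style="font-family: courier;">std::mt19937</span> engine seeded with a fixed value replays the same sequence on every run, while seeding it from the current time gives a different sequence on each run. Time-based seeding is only appropriate for non-security uses.</p>

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <vector>

// Draws n pseudorandom integers in [0, 99] from a Mersenne Twister seeded
// with `seed`. The same seed always yields the same sequence. Note: the raw
// output of std::mt19937 is fully specified by the C++ standard, but the
// mapping done by std::uniform_int_distribution is implementation-defined,
// so the mapped values may differ between standard libraries.
std::vector<int> draw(std::uint32_t seed, int n) {
    std::mt19937 engine(seed);
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> out;
    out.reserve(n);
    for (int i = 0; i < n; ++i) {
        out.push_back(dist(engine));
    }
    return out;
}

// Repeatable: a fixed, documented seed (42 is an arbitrary choice).
std::vector<int> repeatable_run(int n) { return draw(42u, n); }

// Deliberately non-repeatable: seed from the host's current time.
std::vector<int> varying_run(int n) {
    const auto ticks =
        std::chrono::system_clock::now().time_since_epoch().count();
    return draw(static_cast<std::uint32_t>(ticks), n);
}
```

<p>Because the standard leaves the distribution algorithms to each implementation, a program that must produce identical numbers across platforms is safer comparing or mapping the raw engine output itself.</p>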
<p>There are more deliberate efforts to produce truly random values by relying on natural, chaotic events. For instance, <a title="Lavarand" href="http://www.lavarnd.org/">Lavarand</a> produces random numbers by hashing the frames of a video stream of lava lamps. <a href="http://www.fourmilab.ch/hotbits/">HotBits</a> generates random bits by timing successive pairs of radioactive decays detected by a Geiger-Müller tube interfaced to a computer. <a title="Random.org" href="http://www.random.org/">Random.org</a> uses variations in the amplitude of atmospheric noise recorded with a normal radio.</p>
<h4><strong><a name="uninitialized"></a>2. Uninitialized or random data read</strong></h4>
<p>Uninitialized data does not arise in languages that systematically assign default values and give the programmer no direct control over memory, as opposed to high-performance languages like C/C++.</p>
<p>Finding and fixing this kind of issue is relatively simple. For instance, tools like <a href="http://www-01.ibm.com/software/awdtools/purify/">Purify</a> and <a href="http://valgrind.org/">Valgrind</a> can report when C/C++ code reads arbitrary values in memory. In Purify’s terminology, such errors are UMR (Uninitialized Memory Read), ABR (Array Bound Read, i.e., reading an array outside of its bounds), and FMR (Free Memory Read). These defects all consist of reading some arbitrary value in memory. The code below illustrates them.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cstdio&gt;   // printf
#include &lt;cstring&gt;  // strlen

int main() {
  bool large;
  int size = 100;

  if (large) {                    // UMR
    size *= 10;
  }

  // Allocate an (uninitialized) array of size chars.
  char* const str    = new char[size];
  char* const middle = str + size/2;
  // Set the end-of-string.
  str[size - 1] = '\0';
  // The loop intends to fill up the string str with 'a'.
  // But this loop is faulty, because *++c is used instead of *c++.
  char* c = str;
  for (int i = 1; i &lt; size; ++i, *++c = 'a');

  printf("%c\n", str[0]);         // UMR
  printf("%c\n", str[1]);         // Ok, will print 'a'.
  printf("%c\n", str[2]);         // Ok, will print 'a'.
  printf("%c\n", str[size - 1]);  // Ok, but will print 'a'
                                  // instead of the expected '\0'.
  printf("%c\n", str[size]);      // ABR
  printf("%zu\n", strlen(str));   // UMR on str[0], then ABR, because
                                  // the loop overwrote the final '\0'.
  delete [] str;
  printf("%c\n", *middle);        // FMR

  return 0;
}
</small></pre>
<p>Note that ABW (Array Bound Write, i.e., writing outside of an array’s bounds), FMW (Free Memory Write), FNH (Freeing Non-Heap memory) and FUM (Freeing Unallocated Memory), although they are severe bugs that dynamic analysis tools also report, are not an original source of non-determinism: they consistently reproduce the same bug.</p>
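<p>For contrast, a minimal sketch of what the faulty example intends, written so that nothing is read before it is written: the flag is an explicit parameter, and the buffer is a <span style="font-family: courier;">std::string</span> whose constructor writes every character and which maintains its own terminating null. The function name make_filled is hypothetical.</p>

```cpp
#include <cstddef>
#include <string>

// Builds the string the faulty loop was meant to produce: size - 1 copies
// of 'a'. Every value read here has been written first.
std::string make_filled(bool large) {
    std::size_t size = 100;
    if (large) {  // `large` is a parameter, so it is never read uninitialized
        size *= 10;
    }
    // The constructor writes all size - 1 characters, and c_str() is
    // guaranteed to be null-terminated, so strlen-style reads stay in bounds.
    return std::string(size - 1, 'a');
}
```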
<h4><strong><a name="race"></a>3. Thread races</strong></h4>
<p>Thread races are difficult to detect, and fixing them can be very costly. A typical example is when one thread writes a value at some address while another thread reads the value at that address. Depending on which thread accesses the address first, the outcome of the program will differ. Two threads simultaneously performing a non-atomic write at the same address result in an unpredictable value.</p>
<p>One can use a mutex to prevent conflicting reads and writes for non-atomic operations. But a mutex only guarantees mutual exclusion, not order: races on which thread reads or writes first must be resolved with additional synchronization, which can be quite complicated and can also hurt performance.</p>
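<p>One common way to get a scheduling-independent result, sketched here under the assumption of C++11 <span style="font-family: courier;">std::thread</span>: each worker writes only to its own slot of a pre-sized vector, so no two threads touch the same address; <span style="font-family: courier;">join()</span> is the synchronization point; and the slots are combined in index order rather than completion order. The function name and the per-thread work (summing part of a range) are hypothetical.</p>

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Sums 1..n by splitting the values across `workers` threads (workers > 0).
// Each thread owns one slot of `partial`, so there is no shared write; the
// final combination runs in slot order after all threads have been joined,
// which makes the result independent of thread scheduling.
long long parallel_sum(long long n, std::size_t workers) {
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, w, n, workers] {
            // Worker w handles w+1, w+1+workers, w+1+2*workers, ...
            for (long long v = static_cast<long long>(w) + 1; v <= n;
                 v += static_cast<long long>(workers)) {
                partial[w] += v;
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();  // all writes to partial happen-before this returns
    }
    long long total = 0;
    for (long long p : partial) {  // fixed combination order
        total += p;
    }
    return total;
}
```

<p>With integers any combination order gives the same total; the fixed order matters for floating-point values, whose addition is not associative.</p>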
<h4><strong><a name="unordered"></a>4. Iteration on unordered data</strong></h4>
<p>Iterating over data in an arbitrary order can make a program non-repeatable. This pattern is often encountered, and is easy to fix.</p>
<p>For example, an algorithm produces a result via a visitor that assumes a total order, but the developer uses a visitor that enumerates the data in an arbitrary order, which typically depends on the hash function, the container’s internal layout, and (for pointers) the memory addresses of the data. E.g., instead of a <span style="font-family: courier;">std::set</span>, the developer uses a <span style="font-family: courier;">hash_set</span> (a non-standard extension) or a <span style="font-family: courier;">tr1::unordered_set</span>. Forcing a total order on the data fixes the problem.</p>
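<p>A minimal sketch of the fix, with hypothetical function names: values stored in a <span style="font-family: courier;">std::unordered_set</span> come back in an unspecified order, whereas copying them into a <span style="font-family: courier;">std::set</span> (or sorting a vector) restores a total order before the order-sensitive computation, here a simple concatenation.</p>

```cpp
#include <set>
#include <string>
#include <unordered_set>

// Visiting an unordered_set directly yields an unspecified order, so a
// result that depends on visitation order (here: concatenation) may change
// with the library, the bucket count, or the insertion history.
std::string concat_unordered(const std::unordered_set<std::string>& names) {
    std::string out;
    for (const std::string& n : names) out += n;
    return out;
}

// Fix: impose a total (lexicographic) order before visiting.
std::string concat_ordered(const std::unordered_set<std::string>& names) {
    const std::set<std::string> sorted(names.begin(), names.end());
    std::string out;
    for (const std::string& n : sorted) out += n;
    return out;
}
```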
<p>Note that the fix may be incomplete if it simply transforms a type (4) non-determinism into a type (5) or (6) non-determinism, which we discuss below.</p>
<h4><strong><a name="memory_address"></a>5. Ordering by pointer value</strong></h4>
<p>This type of non-determinism is extremely common. For instance, a developer uses a <span style="font-family: courier;">std::set</span> as a container of pointers, and assumes that the visitor is deterministic. It is indeed deterministic, but only with respect to the memory addresses allocated to the data, which depend on factors outside the application’s control (allocator state, address-space layout randomization, etc.).</p>
<p>One way of addressing the problem is to force a specific memory addressing scheme, but that requires very fine control over the memory allocator and is therefore complicated. A more common way is to assign a unique ID to each object when it is created. The ID can be a 32-bit unsigned integer that is incremented for every new object. A total order that is independent of the objects’ actual memory addresses is then obtained from the IDs. It is a simple solution, as long as one can afford the extra 4 bytes for every object. ID-based sorting with no memory penalty can be obtained using custom memory allocators.</p>
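<p>A minimal sketch of the ID-based ordering, assuming a hypothetical <span style="font-family: courier;">Node</span> class: a static counter hands out a 32-bit ID at construction, and a comparator orders pointers by ID instead of by address. The counter as written is not thread-safe; concurrent creation would need an atomic counter, and would then make the IDs themselves depend on scheduling.</p>

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical object type: each instance receives a unique, increasing ID
// at construction time (4 extra bytes per object).
class Node {
public:
    explicit Node(std::string name) : id_(next_id_++), name_(std::move(name)) {}
    std::uint32_t id() const { return id_; }
    const std::string& name() const { return name_; }

private:
    static std::uint32_t next_id_;  // not thread-safe as written
    std::uint32_t id_;
    std::string name_;
};

std::uint32_t Node::next_id_ = 0;

// Orders pointers by ID (i.e., creation order), not by memory address,
// so iterating the set no longer depends on the allocator.
struct ById {
    bool operator()(const Node* a, const Node* b) const {
        return a->id() < b->id();
    }
};

using NodeSet = std::set<const Node*, ById>;
```

<p>The resulting order is exactly the creation order, which is the limitation discussed next.</p>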
<h4><strong><a name="time_stamp"></a>6. Ordering by time stamps</strong></h4>
<p>Note, though, that the ID-based total order described above is exactly the order in which the objects were created, which is no different from an order based on time stamps. Therefore two inputs that contain the same data but list it in a different order will be visited in different orders, which can lead to different results. This leads us to type (7) non-determinism.</p>
<h4><strong><a name="non_canonical_labelling"></a>7. Ordering induced by a non-canonical labeling</strong></h4>
<p>Type (7) non-determinism often goes unrecognized, or is simply ignored. The idea is that as long as the same input is given to a program (possibly in a different order or form), the output should be the same. If the input can be normalized to a form that captures the notion of “same input”, then the program can be made insensitive to the format of its input, assuming of course that the run-time penalty of the normalization is acceptable.</p>
<p>This normalization process is better defined as canonization. Formally, let O be a set of objects, and let EQ be an equivalence relation that captures the notion of “same” on these objects. A function Canon maps an object onto its canonical form, and is such that for any two objects o1 and o2, o1 and o2 are the same (i.e., o1 EQ o2) if and only if Canon(o1) = Canon(o2).</p>
<p>For instance, a set of integers can be represented by many containers (a list, an array, a hash set, a binary tree, etc.). A canonical form can simply consist of the sorted integers. Two sets that are equal because they contain the very same integers, but that are initially given in different orders and forms, end up with the same canonical form. Since sorting takes O(n log n) time, this is an efficient canonization.</p>
<p>Canonization can be much more costly. A Boolean function can be represented in many ways, e.g., with a truth table, a Conjunctive Normal Form (CNF), a Disjunctive Normal Form (DNF), a decision diagram, etc. (see the figure below). Boolean function canonization is NP-hard, since it solves the satisfiability problem (SAT): a formula is satisfiable if and only if its canonical form differs from that of the constant-false function. In practice this means that all known Boolean function canonization algorithms have exponential worst-case complexity.</p>
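<p>A minimal sketch of this canonization, assuming the integers arrive as a <span style="font-family: courier;">std::vector</span> in arbitrary order and possibly with duplicates (which a set ignores): sort, then drop duplicates. The function name canon is illustrative.</p>

```cpp
#include <algorithm>
#include <vector>

// Canonical form of a set of integers: sorted, without duplicates.
// Two inputs denote the same set iff their canonical forms are equal.
// Cost: O(n log n) for the sort, plus O(n) for std::unique.
std::vector<int> canon(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    return values;
}
```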
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/Boolean-function.png"><img class="aligncenter size-full wp-image-1257" title="Boolean function" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/Boolean-function.png" alt="" width="500" /></a></p>
<p>&nbsp;</p>
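<p>Once the variable order is fixed, a truth table is a canonical form, and it makes the exponential cost concrete: a function of n variables needs 2<sup>n</sup> entries. A minimal sketch, assuming functions of at most 6 variables given as C++ callables (the type alias and function name are hypothetical): two syntactically different formulas denote the same function exactly when their truth tables, packed into a 64-bit word, are equal.</p>

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A Boolean function of n inputs, given as any callable (e.g. a CNF or DNF
// written as a lambda). Hypothetical representation, for illustration.
using BoolFn = std::function<bool(const std::vector<bool>&)>;

// Canonical form: the truth table, where bit k holds f(binary encoding of k).
// Requires n <= 6 so that 2^n bits fit in 64 bits; costs 2^n evaluations.
std::uint64_t truth_table(const BoolFn& f, unsigned n) {
    std::uint64_t table = 0;
    std::vector<bool> x(n);
    for (std::uint64_t k = 0; k < (std::uint64_t{1} << n); ++k) {
        for (unsigned i = 0; i < n; ++i) {
            x[i] = ((k >> i) & 1u) != 0;
        }
        if (f(x)) {
            table |= std::uint64_t{1} << k;
        }
    }
    return table;
}
```

<p>With this representation, satisfiability amounts to checking that the table is nonzero, which is why no canonization can be cheap in general.</p>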
<p>Canonization can also be elusive. Consider the problem of drawing a graph in some aesthetic way (e.g., such that the nodes are evenly distributed and the number of crossing edges is minimal). One would like the graph to be drawn exactly the same way regardless of its representation (adjacency list or adjacency matrix), and regardless of the order in which the adjacency information is given. For instance, the three graphs below, although they look different, are exactly the same graph, and can be drawn without any edge crossing as shown on the right side.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/graph-drawing-2.png"><img class="size-full wp-image-1259 aligncenter" title="graph drawing 2" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/graph-drawing-2.png" alt="" width="500" /></a></p>
<p>&nbsp;</p>
<p>Graph canonization is also known as canonical graph labeling. It is at least as hard as graph isomorphism, one of the rare natural problems that are in NP but are not known to be either NP-complete or in P. No polynomial-time graph canonization algorithm is known (all existing algorithms have superpolynomial worst-case complexity), although it is conjectured that graph canonization can be done in polynomial time.</p>
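<p>A brute-force sketch of canonical labeling for small undirected graphs, assuming an adjacency-matrix input with at most about 8 vertices: try all n! relabelings and keep the lexicographically smallest relabeled matrix. Practical tools such as nauty rely on far more elaborate pruning; this version only illustrates the definition and its factorial cost. All names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

using AdjMatrix = std::vector<std::vector<int>>;  // 0/1 adjacency matrix

// Returns the matrix obtained by renaming vertex v to perm[v].
AdjMatrix relabel(const AdjMatrix& g, const std::vector<int>& perm) {
    const std::size_t n = g.size();
    AdjMatrix h(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            h[perm[i]][perm[j]] = g[i][j];
    return h;
}

// Canonical form: the lexicographically smallest matrix over all n!
// relabelings. Two graphs are isomorphic iff their canonical forms match,
// because isomorphic graphs have exactly the same set of relabelings.
AdjMatrix canonical_form(const AdjMatrix& g) {
    std::vector<int> perm(g.size());
    std::iota(perm.begin(), perm.end(), 0);
    AdjMatrix best = g;  // the identity relabeling
    do {
        AdjMatrix candidate = relabel(g, perm);
        if (candidate < best) best = candidate;
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}
```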
<h4><strong>Conclusion </strong></h4>
<p>The most common cause of non-determinism is an unreliable ordering of data. The ultimate solution to make a program insensitive to the form of its input is to canonize that input as a pre-processing step. This proves to be a challenging and costly task in some cases. But whenever possible, canonization (or some imperfect normalization) goes a long way toward making an application consistently repeatable.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is software quality?</title>
		<link>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/</link>
		<comments>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/#comments</comments>
		<pubDate>Sun, 10 Apr 2011 06:21:24 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1125</guid>
		<description><![CDATA[CodeProject The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. External quality is what a user experiences when running the software in its operational mode. Internal quality refers to aspects that are code-dependent, and that are not visible to the end-user. External [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/">What is software quality?</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043" rel="tag">CodeProject</a><br />
<a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/04/qualityassurance.jpg"><img class="alignright size-medium wp-image-1156" title="qualityassurance" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/04/qualityassurance-300x200.jpg" alt="" width="300" height="200" /></a>The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. External quality is what a user experiences when running the software in its operational mode. Internal quality refers to aspects that are code-dependent, and that are not visible to the end-user. External quality is critical to the user, while internal quality is meaningful to the developer only.</p>
<p>Some quality criteria are objective and can be measured directly. Others are subjective, and can only be captured with more arbitrary measurements.</p>
<p>The table below lists the most obvious software quality criteria, as well as some lesser-known ones.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="118">Criterion</td>
<td style="text-align: center;" valign="top" width="59">User</td>
<td style="text-align: center;" valign="top" width="63">Developer</td>
<td style="text-align: center;" valign="top" width="95">Measurable</td>
</tr>
<tr>
<td colspan="4" valign="top" width="334">External quality</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#features">features</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#speed">speed</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#space">space</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#network">network usage</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#stability">stability</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#robustness">robustness</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#eou">ease-of-use</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#determinism">determinism</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#compatibility">back-compatibility</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#security">security</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">difficult</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#power">power consumption</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">difficult</td>
</tr>
<tr>
<td colspan="4" valign="top" width="334">Internal quality</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#coverage">test coverage</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#testability">testability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#portability">portability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#thread">thread-safeness</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#conciseness">conciseness</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#maintainability">maintainability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#documentation">documentation</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#legibility">legibility</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#scalability">scalability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
</tbody>
</table>
<p>By definition, internal quality (code characteristics) is a concern to the developer only, while all the external quality aspects (coming from using the software) are critical to the end user. However, the developer also has an interest in performance (speed, space, network usage) and determinism, because they make testing the software easier. Developers treat ease-of-use, back-compatibility, security, and power consumption as requirements.</p>
<p>It is important to consider how difficult it is to measure each of these criteria. Measurement can be difficult because there is no simple variable to look at, because the measurement process is costly, or because it requires a complex infrastructure. For instance, speed is objective and easy to measure. Power consumption has a simple metric (how many µW the application consumes), but it is complex to measure in practice. Security is both difficult and costly to estimate.</p>
<p><a name="features"></a> <strong>Features</strong>. This is the very reason the software is written: to provide a service. By feature we really mean the output produced by the software (e.g., a numerical result, a string, a screen shot, a web page, an audio stream), regardless of its performance (speed, memory).</p>
<p><a name="speed"></a><strong>Speed</strong>. How quickly does the application provide the service? The user experiences the actual time elapsed between the moment she requests the service and the moment the service is delivered. This elapsed time, or wall time, includes not only the CPU time but also the system time and the time spent waiting on disk and network. Thus the developer should not focus only on the CPU time (how much time the CPU actually spends executing the program). The CPU time can easily be overshadowed by disk access (a write to disk is very costly), swapping (due to an excessive virtual memory size), or time spent on the network (latency issues, or too many round trips).</p>
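<p>A minimal sketch of the difference between wall time and CPU time, using <span style="font-family: courier;">std::chrono::steady_clock</span> for elapsed time and <span style="font-family: courier;">std::clock</span> for the process’s CPU time. The sleep stands in for time spent waiting on disk or network: it adds to wall time but consumes almost no CPU. The struct and function names are hypothetical.</p>

```cpp
#include <chrono>
#include <ctime>
#include <functional>
#include <thread>

struct Timing {
    double wall_ms;  // elapsed real time: what the user experiences
    double cpu_ms;   // processor time consumed by this process
};

// Runs `work` and reports both wall time and CPU time.
// Note: on POSIX systems std::clock measures CPU time; some platforms
// (notably the Microsoft C runtime) report elapsed time instead.
Timing measure(const std::function<void()>& work) {
    const auto wall0 = std::chrono::steady_clock::now();
    const std::clock_t cpu0 = std::clock();
    work();
    const std::clock_t cpu1 = std::clock();
    const auto wall1 = std::chrono::steady_clock::now();
    const double wall =
        std::chrono::duration<double, std::milli>(wall1 - wall0).count();
    const double cpu =
        1000.0 * static_cast<double>(cpu1 - cpu0) / CLOCKS_PER_SEC;
    return {wall, cpu};
}

// Stand-in for waiting on I/O: costs wall time, not CPU time.
void wait_like_io() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```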
<p><a name="space"></a><strong>Space</strong>. How much RAM and disk space does the application use? The aggregate numbers (peak memory, virtual memory size, etc.) are important. But how often the application moves data in ways that trigger CPU cache misses or disk writes matters even more, because it has a dominant impact on speed. A mediocre data design can lead to very poor performance.</p>
<p><a name="network"></a><strong>Network usage</strong>. It is a matter of bandwidth and latency. Mismanaging sockets and channels can lead to unnecessary time spent opening and closing sockets, in handshakes, and in round trips. As with memory, caching techniques can reduce the consumption of network resources.</p>
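<p>A minimal sketch of how the access pattern interacts with the cache, assuming a matrix stored row-major in one contiguous <span style="font-family: courier;">std::vector</span> (the Grid type is hypothetical): both functions compute the same mathematical sum, but the row-wise loop walks memory sequentially, while the column-wise loop jumps a whole row at each step and, for large matrices, tends to miss the cache far more often. Actual timings depend on the hardware and are not shown.</p>

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix: element (r, c) lives at data[r * cols + c].
struct Grid {
    std::size_t rows, cols;
    std::vector<double> data;
    Grid(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 1.0) {}
};

// Sequential access: consecutive iterations touch adjacent addresses.
double sum_row_wise(const Grid& g) {
    double s = 0.0;
    for (std::size_t r = 0; r < g.rows; ++r)
        for (std::size_t c = 0; c < g.cols; ++c)
            s += g.data[r * g.cols + c];
    return s;
}

// Strided access: consecutive iterations are g.cols elements apart.
double sum_column_wise(const Grid& g) {
    double s = 0.0;
    for (std::size_t c = 0; c < g.cols; ++c)
        for (std::size_t r = 0; r < g.rows; ++r)
            s += g.data[r * g.cols + c];
    return s;
}
```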
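<p>A minimal caching sketch, assuming a hypothetical <span style="font-family: courier;">fetch_remote</span> function that costs one network round trip per call (stubbed here with a counter): repeated requests for the same key are served from memory, so only the first one costs a round trip. Real code would also need an eviction policy and a way to invalidate stale entries.</p>

```cpp
#include <string>
#include <unordered_map>

// Hypothetical remote call; the counter stands in for network round trips.
int round_trips = 0;
std::string fetch_remote(const std::string& key) {
    ++round_trips;
    return "value-of-" + key;
}

// Memoizing cache in front of the remote call. No eviction or
// invalidation: entries live as long as the cache object.
class CachedFetcher {
public:
    const std::string& get(const std::string& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, fetch_remote(key)).first;
        }
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> cache_;
};
```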
<p><a name="stability"></a><strong>Stability</strong>. How often does one need to patch the software to fix problems? For the user, this is an inconvenience. For the developer, it means that the code is fragile and might benefit from better testing or a partial rewrite.</p>
<p><a name="robustness"></a><strong>Robustness</strong>. How often does the application stall, freeze, or crash? How tolerant is it to extreme conditions: limited CPU and memory/disk/network resources, corner cases, system failures, or unresponsive 3<sup>rd</sup>-party resources? This aspect is strongly related to testability and coverage.</p>
<p><a name="eou"></a><strong>Ease-of-use</strong>. It can be a very subjective factor, hard to quantify. It includes user documentation, clarity of error messages, management of exceptions, and recovery after failure.</p>
<p><a name="determinism"></a><strong>Determinism</strong>. Also known as repeatability: does the program produce the very same result given the same input? There are many reasons why a program can exhibit non-repeatable behavior. Non-repeatable behavior is confusing and frustrating for the user. It also makes the program very difficult to test and debug. Repeatability is strongly dependent on a good data model design.</p>
<p><a name="compatibility"></a><strong>Back-compatibility</strong>. Can a new version of the application be used with an older version’s data? It is essential to the user, because a new version should not require a costly migration of the existing data.</p>
<p><a name="security"></a><strong>Security</strong>. Who is authorized to access the data? Can the data processed by the application be compromised? This is a crucial aspect of many applications, and it is getting more and more difficult to assess with the dissemination of mobile and web-based software.</p>
<p><a name="power"></a><strong>Power consumption</strong>. It is increasingly important with mobile applications, as a program may have to consider how it manages the device’s power producers and consumers (battery, cores, wireless, screen, audio) rather than relying entirely on the operating system.</p>
<p><a name="coverage"></a><strong>Test coverage</strong>. What proportion of the code is executed by some unit or regression test? This is measured by the number of lines, number of functions, and number of control branches that are exercised by the tests. Usually one expects coverage of at least 85% for any moderately complex application. In practice, high coverage can be achieved only if testability is high, which has deep implications for the architecture and development methodology.</p>
<p><a name="testability"></a><strong>Testability</strong>. An often overlooked or simply ignored aspect of code development, testability is the ability to trigger any specific line of code or branching condition. Highly testable code requires a discipline of architecture and development that is difficult to find. It is very costly to fix poorly testable software, as this requires a major redesign. This justifies a major investment in software architecture, design, and development methodologies.</p>
<p><a name="portability"></a><strong>Portability</strong>. Can the application run on 32- and 64-bit machines? Should it run on a mobile phone? Does it run on multiple OSes (e.g., Windows, Linux, Mac OS X, Solaris, iOS, Android, RIM)? Does it run smoothly on all web browsers (IE, Firefox, Chrome, Safari, Opera)?</p>
<p><a name="thread"></a><strong>Thread-safeness</strong>. Is a specific component thread-safe? Can two threads collide on non-atomic operations? Can the application get into a deadlock? As concurrency is still mostly the result of a manual process (compilers can automatically parallelize only limited cases, such as simple loops), these questions are critical to ensure that a program works correctly and performs well: it is not rare to see a program run <em>slower</em> when too many threads are available, as the cost of synchronization can become dominant.</p>
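<p>A minimal sketch of one common way to raise testability, dependency injection: a hypothetical configuration loader takes its file reader as a parameter, so a test can force the error branch without touching the file system. The names, the C++17 <span style="font-family: courier;">std::optional</span> return type, and the "default" fallback are illustrative assumptions.</p>

```cpp
#include <fstream>
#include <functional>
#include <optional>
#include <sstream>
#include <string>

// Reads a whole file; returns std::nullopt if it cannot be opened.
std::optional<std::string> read_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) return std::nullopt;
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

using Reader = std::function<std::optional<std::string>(const std::string&)>;

// The reader is injected, so both branches below can be exercised by a
// test without creating or deleting real files. Production code simply
// relies on the default, read_file.
std::string load_config(const std::string& path,
                        const Reader& reader = read_file) {
    if (const auto text = reader(path)) {
        return *text;
    }
    return "default";  // error branch: hard to reach without injection
}
```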
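<p>A classic deadlock arises when two threads lock the same two mutexes in opposite orders. A minimal sketch of a standard remedy, assuming a hypothetical <span style="font-family: courier;">Account</span> type: C++17 <span style="font-family: courier;">std::scoped_lock</span> acquires several mutexes with the same deadlock-avoidance algorithm as <span style="font-family: courier;">std::lock</span>, so transfers in both directions can run concurrently.</p>

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared resource protected by its own mutex.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Locks both accounts at once; the argument order does not matter.
// (Assumes from and to are distinct: locking one mutex twice is undefined.)
void transfer(Account& from, Account& to, long amount) {
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transferring in opposite directions: with a naive
// "lock(from) then lock(to)", this pattern can deadlock.
void ping_pong(Account& a, Account& b, int rounds) {
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
}
```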
<p><a name="conciseness"></a><strong>Conciseness</strong>. Also known as compactness. Is there any dead code, or duplicated code? Is the code shared and factored properly? Compact code usually means faster compilation and smaller binaries. Compactness also naturally leads to fewer bugs, because the number of defects per line of code has historically been roughly <a href="../2009/10/13/test-driven-design/#kloc_per_defect">constant</a>.</p>
<p><a name="maintainability"></a><strong>Maintainability</strong>. How easy is it to debug the code? How fast can a fix be provided? How quickly can a new developer understand the code? Maintainability is a very important aspect, and quite difficult to quantify. It improves with good testability and a flexible (abstract) design.</p>
<p><a name="documentation"></a><strong>Documentation</strong>. This is a pretty subjective topic. Some people claim that separate documentation written in plain English is necessary. Others state that at least 30% of the code should be comments. Finally, some argue that the code itself is the best documentation: the names of the types, classes, functions and arguments, together with plenty of assertions.</p>
<p><a name="legibility"></a><strong>Legibility</strong>. Also known as readability. This is another subjective topic. It is about how easy it is to read the code. Guidelines are established to unify the style of the code, so that a developer can easily read code written by another developer. Code guidelines abound, and they range from a small set of directives to a full set of rules that specify every syntactical aspect of the language. For example, see <a href="http://hem.passagen.se/erinyq/industrial/" rel="nofollow">Industrial Strength C++</a>, <a href="http://www.codingstandard.com/HICPPCM/index.html" rel="nofollow">High Integrity C++</a>, <a href="http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml" rel="nofollow">Google C++ Style Guide</a>, and <a href="http://www.maultech.com/chrislott/resources/cstyle/" rel="nofollow">many</a> <a href="http://www.possibility.com/Cpp/CppCodingStandard.html" rel="nofollow">more</a>.</p>
<p><a name="scalability"></a><strong>Scalability</strong>. How easy is it to extend a feature? Or to add a new one? Or to add extra cores, or increase the size of the cluster the application runs on? Again, this is all about software architecture and anticipating future needs.</p>
<p>Software quality is the result of the user experience. But software quality should not and cannot be a reactive action to external defects. Software quality is built from the ground up, with design and development methodologies, and with a special focus on testability, coverage, and flexibility.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to write abstract iterators in C++</title>
		<link>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/</link>
		<comments>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 21:18:32 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=859</guid>
		<description><![CDATA[CodeProject When developing in C++, an impeccable API is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator. An iterator is used to visit the elements of a container without exposing how the [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/">How to write abstract iterators in C++</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a></p>
<p>When developing in C++, an <a href="../2009/10/08/api-design-101/">impeccable API</a> is a must-have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that the STL made familiar to C++ developers is the iterator.</p>
<p>An iterator is used to visit the elements of a container without exposing how the container is implemented (e.g., a vector, a list, a red-black tree, a hash set, a queue, etc.). Iterators are central to generic programming because they are an interface between containers and applications. Applications need access to the elements of containers, but they usually do not need to know how elements are stored in containers. Iterators make it possible to write generic algorithms that operate on different kinds of containers.</p>
<p>For example, the following code snippet exposes the type of the container: a vector.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     void process(const std::vector&lt;E&gt;&amp; v)
     {
         for (unsigned i = 0; i &lt; v.size(); ++i) {
             process(v[i]);
         }
     }</pre>
<p>If we want to have the same function operating on a list, we have to write a separate function. Or if we later decide that a list or a hash set is more appropriate as a container, we need to rewrite the code everywhere we access the vector. This may require a lot of changes in many files. Contrast this container-specific visitation scheme to the following:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename Container&gt;
     void process(const Container&amp; c)
     {
         typename Container::const_iterator itr = c.begin();
         typename Container::const_iterator end = c.end();
         for (; itr != end; ++itr) {
             process(*itr);
         }
     }</pre>
<p>Using iterators, we can process a container ‘c’ generically, whether it is a vector, a list, a hash set, or any data structure that provides iterators in its API. Even better, we can write a generic process function that only takes an iterator range, without assuming that the container has begin() and end() methods:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename Iterator&gt;
     void process(Iterator begin, Iterator end)
     {
         for (Iterator itr = begin; itr != end; ++itr) {
             process(*itr);
         }
     }</pre>
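<p>As a quick sanity check, the sketch below applies the iterator-range version to a vector, a list, and a plain array. The element-level process(int&amp;), which doubles its argument, is a hypothetical stand-in for the per-element work that the text leaves abstract, and demoProcess() is only a usage example.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     #include &lt;cassert&gt;
     #include &lt;list&gt;
     #include &lt;vector&gt;

     // Hypothetical element-level operation: doubles an int in place.
     void process(int&amp; e) { e *= 2; }

     // The iterator-range version from the text.
     template &lt;typename Iterator&gt;
     void process(Iterator begin, Iterator end)
     {
         for (Iterator itr = begin; itr != end; ++itr) {
             process(*itr);
         }
     }

     // The same template serves a vector, a list, and a plain array.
     void demoProcess()
     {
         std::vector&lt;int&gt; v(2, 1);     // {1, 1}
         process(v.begin(), v.end());
         assert(v[0] == 2 &amp;&amp; v[1] == 2);

         std::list&lt;int&gt; l(3, 5);       // {5, 5, 5}
         process(l.begin(), l.end());
         assert(l.front() == 10 &amp;&amp; l.back() == 10);

         int a[] = {3, 4};
         process(a, a + 2);            // Pointers are iterators too
         assert(a[0] == 6 &amp;&amp; a[1] == 8);
     }</pre>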
<p>An STL iterator is a commodity that behaves as a scalar type:</p>
<ul>
<li>It can be allocated on the heap</li>
<li>It can be copied</li>
<li>It can be passed by value</li>
<li>It can be assigned to</li>
</ul>
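<p>These properties can be observed directly on a std::vector iterator. The sketch below is only an illustration; VecItr, valueAfterNext(), and demoScalarLike() are hypothetical names, not part of any library.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     #include &lt;cassert&gt;
     #include &lt;vector&gt;

     typedef std::vector&lt;int&gt;::iterator VecItr;

     // Receives an iterator by value: moving the copy leaves the caller's intact.
     int valueAfterNext(VecItr itr)
     {
         ++itr;
         return *itr;
     }

     void demoScalarLike()
     {
         std::vector&lt;int&gt; v;
         v.push_back(10);
         v.push_back(20);

         VecItr a = v.begin();            // On the stack, like an int
         VecItr b = a;                    // Copied
         assert(valueAfterNext(a) == 20); // Passed by value
         assert(*a == 10);                // a has not moved
         b = v.end();                     // Assigned to
         assert(b != a);

         VecItr* p = new VecItr(a);       // On the heap, like an int
         assert(**p == 10);
         delete p;
     }</pre>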
<p>The essence of an iterator is captured by the following API.</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename T&gt;
     class Itr {
     public:
         Itr();
         ~Itr();
         Itr(const Itr&amp; o);                   <span style="color: #ff0000;">// Copy constructor</span>
         Itr&amp; operator=(const Itr&amp; o);        <span style="color: #ff0000;">// Assignment operator</span>
         Itr&amp; operator++();                   <span style="color: #ff0000;">// Next element</span>
         T&amp;   operator*();                    <span style="color: #ff0000;">// Dereference</span>
         bool operator==(const Itr&amp; o) const; <span style="color: #ff0000;">// Comparison</span>
         bool operator!=(const Itr&amp; o) const { return !(*this == o); }
     };</pre>
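<p>To make this API concrete, here is a minimal sketch of a class that provides it, assuming the elements live in a std::vector: the iterator is just a (vector pointer, index) pair. VecIndexItr, sum(), and demoVecIndexItr() are hypothetical names. Since both data members are plain values, the copy constructor, assignment operator, and destructor generated by the compiler already give the scalar-like behavior described above.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     #include &lt;cassert&gt;
     #include &lt;cstddef&gt;
     #include &lt;vector&gt;

     template &lt;typename T&gt;
     class VecIndexItr {
     public:
         VecIndexItr() : v_(0), i_(0) {}
         VecIndexItr(std::vector&lt;T&gt;&amp; v, std::size_t i) : v_(&amp;v), i_(i) {}
         // Copy, assignment and destruction: the compiler-generated versions suffice.
         VecIndexItr&amp; operator++() { ++i_; return *this; }
         T&amp;   operator*() { return (*v_)[i_]; }
         bool operator==(const VecIndexItr&amp; o) const { return v_ == o.v_ &amp;&amp; i_ == o.i_; }
         bool operator!=(const VecIndexItr&amp; o) const { return !(*this == o); }
     private:
         std::vector&lt;T&gt;* v_;  // A pointer rather than a reference, so assignment works
         std::size_t     i_;
     };

     // Sums a range using only the operations of the API.
     template &lt;typename T&gt;
     T sum(VecIndexItr&lt;T&gt; itr, VecIndexItr&lt;T&gt; end)
     {
         T s = T();
         for (; itr != end; ++itr) {
             s += *itr;
         }
         return s;
     }

     void demoVecIndexItr()
     {
         std::vector&lt;int&gt; v(4, 3);                 // {3, 3, 3, 3}
         VecIndexItr&lt;int&gt; b(v, 0), e(v, v.size());
         assert(sum(b, e) == 12);
         VecIndexItr&lt;int&gt; c = b;                   // Copy
         ++c;
         *c = 10;                                  // Write through the copy
         assert(v[1] == 10 &amp;&amp; sum(b, e) == 19);
     }</pre>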
<p>Usually the container provides begin() and end() methods, which build the iterators that denote the container’s range. Writing these methods is an easy task if the container is derived from an STL container, if the container has a data member that is an STL container, or if the iterator is a scalar type, like a pointer or an index.</p>
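<p>For the data-member case, the container can simply forward begin() and end() to its STL member. In the sketch below, Palette, countLongNames(), and demoPalette() are hypothetical; exposing a const_iterator typedef lets Palette be used by the same template code as any STL container.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     #include &lt;cassert&gt;
     #include &lt;cstddef&gt;
     #include &lt;string&gt;
     #include &lt;vector&gt;

     // Hypothetical container whose elements are stored in a std::vector member.
     class Palette {
     public:
         typedef std::vector&lt;std::string&gt;::const_iterator const_iterator;

         void add(const std::string&amp; color) { colors_.push_back(color); }
         const_iterator begin() const { return colors_.begin(); }  // Forwarded
         const_iterator end() const { return colors_.end(); }      // Forwarded

     private:
         std::vector&lt;std::string&gt; colors_;
     };

     // Generic code in the style of process(const Container&amp;): counts the
     // names of at least minLength characters in any container of strings.
     template &lt;typename Container&gt;
     std::size_t countLongNames(const Container&amp; c, std::size_t minLength)
     {
         std::size_t n = 0;
         typename Container::const_iterator itr = c.begin();
         typename Container::const_iterator end = c.end();
         for (; itr != end; ++itr) {
             if (itr-&gt;size() &gt;= minLength) {
                 ++n;
             }
         }
         return n;
     }

     void demoPalette()
     {
         Palette p;
         p.add("red");
         p.add("green");
         p.add("blue");
         assert(countLongNames(p, 4) == 2);          // "green" and "blue"

         std::vector&lt;std::string&gt; v(p.begin(), p.end());
         assert(countLongNames(v, 4) == 2);          // Same code, STL container
     }</pre>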
<p>It is more complicated if we want iterators that dereference to the same type of object but must visit several containers, possibly of different types, or iterators that visit containers in different ways. For instance, let us assume that we have objects with some property (say, a color) stored in several containers, some of them of different types. We would like to visit all the objects, independently of the number of containers and their types, or only the objects of a given color, or only the objects that satisfy some predicate:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     class E;

     Itr&lt;E&gt; begin(); <span style="color: #ff0000;">// These give the range to visit</span>
     Itr&lt;E&gt; end();   <span style="color: #ff0000;">// all the elements of type E</span>

     Itr&lt;E&gt; begin(const Color&amp; color); <span style="color: #ff0000;">// Same as above but only for the</span>
     Itr&lt;E&gt; end(const Color&amp; color);   <span style="color: #ff0000;">// elements of the given color</span>

     class Predicate {
     public:
         bool operator()(const E&amp; e) const;
     };

     Itr&lt;E&gt; begin(Predicate&amp; p); <span style="color: #ff0000;">// Same as above but only for the</span>
     Itr&lt;E&gt; end(Predicate&amp; p);   <span style="color: #ff0000;">// elements that satisfy the predicate</span></pre>
<p>In this case the iterator is more complex than a scalar type like a pointer or an index: it needs to keep track of which container it is currently visiting, or which color or predicate it needs to check. In general, the iterator may need data members to fulfill its task. We also want to factorize the code and reuse the methods of general-purpose iterators when writing more targeted ones; for example, visiting the elements of a specific color should make use of the next-element method Itr&lt;E&gt;::operator++(). This can be done by making Itr&lt;E&gt; a polymorphic class, i.e., one with virtual methods, and having derived classes implement the different iterators. For example:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     class E {
     public:
         const Color&amp; color() const;
     };

     template &lt;typename E&gt;
     class ColoredItr : public Itr&lt;E&gt; {
     private:
         typedef Itr&lt;E&gt; _Super;
     public:
         explicit ColoredItr(const Color&amp; color) : Itr&lt;E&gt;(), color_(color) {}
         virtual ~ColoredItr() {}
         <span style="color: #ff0000;">// Move to the next element, then skip the elements of another color.
         // (The end-of-range check is omitted for brevity.)</span>
         virtual ColoredItr&amp; operator++() {
            _Super::operator++();
            while (_Super::operator*().color() != color_) {
                _Super::operator++();
            }
            return *this;
         }
     private:
         Color color_;
    };</pre>
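<p>The ColoredItr sketch does not check for the end of the range. The same skipping logic can be tried out in isolation with a small, non-virtual filter iterator that stores the end of the underlying STL range next to the current position. This is only a sketch: FilterItr, Item, IsColor, and sumIdsOfColor() are hypothetical names, and a string stands in for the Color type. Boost.Iterator's filter_iterator is built on the same idea.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     #include &lt;cassert&gt;
     #include &lt;iterator&gt;
     #include &lt;string&gt;
     #include &lt;vector&gt;

     // Visits only the elements of [cur, end) that satisfy pred.
     template &lt;typename Iterator, typename Predicate&gt;
     class FilterItr {
     public:
         FilterItr(Iterator cur, Iterator end, Predicate pred)
             : cur_(cur), end_(end), pred_(pred) { skip(); }
         FilterItr&amp; operator++() { ++cur_; skip(); return *this; }
         typename std::iterator_traits&lt;Iterator&gt;::reference operator*() const { return *cur_; }
         bool operator==(const FilterItr&amp; o) const { return cur_ == o.cur_; }
         bool operator!=(const FilterItr&amp; o) const { return !(*this == o); }
     private:
         // Advances until an element satisfies pred or the range is exhausted.
         void skip() { while (cur_ != end_ &amp;&amp; !pred_(*cur_)) ++cur_; }
         Iterator  cur_;
         Iterator  end_;
         Predicate pred_;
     };

     // Hypothetical element type with a color (a string here).
     struct Item {
         std::string color;
         int id;
     };

     // Hypothetical predicate: keeps the items of one color.
     struct IsColor {
         explicit IsColor(const std::string&amp; c) : color(c) {}
         bool operator()(const Item&amp; e) const { return e.color == color; }
         std::string color;
     };

     typedef std::vector&lt;Item&gt;::const_iterator ItemItr;

     // Sums the ids of the items of a given color.
     int sumIdsOfColor(const std::vector&lt;Item&gt;&amp; items, const std::string&amp; color)
     {
         IsColor pred(color);
         FilterItr&lt;ItemItr, IsColor&gt; itr(items.begin(), items.end(), pred);
         FilterItr&lt;ItemItr, IsColor&gt; end(items.end(), items.end(), pred);
         int s = 0;
         for (; itr != end; ++itr) {
             s += (*itr).id;
         }
         return s;
     }

     void demoFilter()
     {
         std::vector&lt;Item&gt; items;
         Item a = {"red", 1}, b = {"blue", 2}, c = {"red", 4};
         items.push_back(a);
         items.push_back(b);
         items.push_back(c);
         assert(sumIdsOfColor(items, "red") == 5);
         assert(sumIdsOfColor(items, "green") == 0);
     }</pre>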
<p>We would like a generic iterator that meets all the requirements described above:</p>
<ul>
<li>It can be allocated on the heap</li>
<li>It can be copied</li>
<li>It can be passed by value</li>
<li>It can be assigned to</li>
<li>It dereferences to the same type</li>
<li>It can visit several containers</li>
<li>It can visit containers of different types</li>
<li>It can visit containers in arbitrary manners</li>
</ul>
<p>This can be implemented as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template&lt;typename E&gt;
     class ItrBase {
     public:
         ItrBase() {}
         virtual ~ItrBase() {}
         virtual void  operator++() {}
         virtual E&amp;    operator*() const { return E(); }
         virtual ItrBase* clone() const { return new ItrBase(*this); }
         <span style="color: #ff0000;">// The == operator is non-virtual. It checks that the
         // derived objects have compatible types, then calls the
         // virtual comparison function equal.</span>
         bool operator==(const ItrBase&amp; o) const {
             return typeid(*this) == typeid(o) &amp;&amp; equal(o);
         }
     protected:
         virtual bool equal(const ItrBase&amp; o) const { return true; }
     };      

     template&lt;typename E&gt;
     class Itr {
     public:
         Itr() : itr_(0) {}
         ~Itr() { delete itr_; }
         Itr(const Itr&amp; o) : itr_(o.itr_-&gt;clone()) {}
         Itr&amp; operator=(const Itr&amp; o) {
             if (itr_ != o.itr_) { delete itr_; itr_ = o.itr_-&gt;clone(); }
             return *this;
         }
         Itr&amp;  operator++() { ++(*itr_); return *this; }
         E&amp;    operator*() const { return *(*itr_); }
         bool  operator==(const Itr&amp; o) const {
             return (itr_ == o.itr_) || (*itr_ == *o.itr_);
         }
         bool  operator!=(const Itr&amp; o) const { return !(*this == o); }      

     protected:
         ItrBase&lt;E&gt;* itr_;
     };</pre>
<p>The ItrBase class is the top class of the hierarchy. Itr is simply a wrapper on a pointer to an ItrBase, so that it can be allocated on the heap –the actual implementation of the class deriving from ItrBase can have an arbitrary size. Note how the Itr copy and assignment operators are implemented via the ItrBase::clone() method, so that Itr behaves as a scalar type. Last but not least, the (non-virtual) ItrBase::operator== equality operator first checks for type equality before calling the (virtual) equality method equal on the virtual subclass. The reason ItrBase is not a pure virtual is that it can conveniently be used to denote an empty range, i.e., the range (ItrBase(), ItrBase()) is empty.</p>
<p>Iterators on containers of elements of type E just need to derive from ItrBase&lt;E&gt;, and a factory providing the begin() and end() methods for any specialized iterator returns object of type Itr&lt;E&gt;.</p>
<p>For example, let us assume that we have a container c of E&#8217;s, and that we want an iterator to visit (1) all the elements of c, possibly with repetition; (2) all the elements of c without repetition. This can be done as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;">    class E;

    class ItrAll : public ItrBase&lt;E&gt; {
    private:
        typedef ItrAll     _Self;
        typedef ItrBase&lt;E&gt; _Super;
    public:
        ItrAll(Container&amp; c) : _Super(), c_(c) {}
        virtual ~ItrAll() {}
        virtual void  operator++() { ++itr_; }
        virtual E&amp;    operator*() const { return *itr_; }
        virtual ItrBase&lt;E&gt;* clone() const { return new _Self(*this); }
    protected:
        virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const {
            <span style="color: #ff0000;">// Casting is safe since types have been checked by _Super::operator==</span>
            const _Self&amp; o2 = static_cast&lt;const _Self&amp;&gt;(o);
            return &amp;c_ == &amp;o2.c_ &amp;&amp; itr_ == o2.itr_;
        }
    protected:
        Container&amp;          c_;
        Container::iterator itr_;
    };     

    class ItrNoRepeat : public ItrAll {
    private:
        typedef ItrNoRepeat _Self;
        typedef ItrAll      _Super;
    public:
        ItrNoRepeat (Container&amp; c) : _Super(c) {}
        virtual ~ItrNoRepeat () {}
        virtual void  operator++() {
            _Super::operator++(); <span style="color: #ff0000;">// Go to the next element then
            // look for an element that has not been visited yet.</span>
            for (; itr_ != c_.end(); _Super::operator++()) {
                E&amp; e = _Super::operator*();
                if (visited_.find(e) == visited_.end()) {
                    visited_.insert(e);
                    return;
                }
            }
        }
        virtual E&amp;    operator*() const { return _Super::operator*(); }
        virtual ItrBase&lt;E&gt;* clone() const { return new _Self(*this); }
    protected:
        virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const { return _Super::equal(o); }
    protected:
        set&lt;E&gt; visited_;
    };     

    <span style="color: #ff0000;">// Build the container’s range w/ and w/o repetition</span>
    Itr&lt;E&gt; begin(Container&amp; c, bool noRepeat = false)
    {
        Itr&lt;E&gt; o;
        if (noRepeat) {
            o.itr_ = new ItrNoRepeat(c);
        } else {
            o.itr_ = new ItrAll(c);
        }
        o.itr_-&gt;itr_ = c.begin();
        return o;
    }     

    Itr&lt;E&gt; end(Container&amp; c, bool noRepeat = false)
    {
        Itr&lt;E&gt; o;
        if (noRepeat) {
            o.itr_ = new ItrNoRepeat(c);
        } else {
            o.itr_ = new ItrAll(c);
        }
        o.itr_-&gt;itr_ = c.end();
        return o;
    }</pre>
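<p>To see the pieces working together, here is a compact, self-contained sketch of the same design that compiles on its own. It departs from the code above in a few assumed ways: ItrBase is abstract here since the empty-range trick is not needed, ItrAll and ItrNoRepeat are templates on the container type rather than using a fixed Container, and a single hypothetical factory, makeItr(), replaces begin() and end(). A vector and a list of ints both yield the same Itr&lt;int&gt;, so one non-template sum() visits either of them.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     #include &lt;cassert&gt;
     #include &lt;list&gt;
     #include &lt;set&gt;
     #include &lt;typeinfo&gt;
     #include &lt;vector&gt;

     // Abstract here: this sketch does not need the empty-range trick.
     template &lt;typename E&gt;
     class ItrBase {
     public:
         virtual ~ItrBase() {}
         virtual void operator++() = 0;
         virtual E&amp; operator*() const = 0;
         virtual ItrBase* clone() const = 0;
         bool operator==(const ItrBase&amp; o) const {
             return typeid(*this) == typeid(o) &amp;&amp; equal(o);
         }
     protected:
         virtual bool equal(const ItrBase&amp; o) const = 0;
     };

     // Value-semantics wrapper around a heap-allocated ItrBase.
     template &lt;typename E&gt;
     class Itr {
     public:
         explicit Itr(ItrBase&lt;E&gt;* p) : itr_(p) {}  // Takes ownership of p
         Itr(const Itr&amp; o) : itr_(o.itr_-&gt;clone()) {}
         ~Itr() { delete itr_; }
         Itr&amp; operator=(const Itr&amp; o) {
             ItrBase&lt;E&gt;* p = o.itr_-&gt;clone();  // Clone first: safe on self-assignment
             delete itr_;
             itr_ = p;
             return *this;
         }
         Itr&amp; operator++() { ++(*itr_); return *this; }
         E&amp; operator*() const { return **itr_; }
         bool operator==(const Itr&amp; o) const { return *itr_ == *o.itr_; }
         bool operator!=(const Itr&amp; o) const { return !(*this == o); }
     private:
         ItrBase&lt;E&gt;* itr_;
     };

     template &lt;typename E, typename Container&gt;
     class ItrAll : public ItrBase&lt;E&gt; {
     public:
         ItrAll(Container&amp; c, typename Container::iterator pos) : c_(&amp;c), pos_(pos) {}
         virtual void operator++() { ++pos_; }
         virtual E&amp; operator*() const { return *pos_; }
         virtual ItrBase&lt;E&gt;* clone() const { return new ItrAll(*this); }
     protected:
         virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const {
             // Safe: ItrBase::operator== has already checked the dynamic types.
             const ItrAll&amp; o2 = static_cast&lt;const ItrAll&amp;&gt;(o);
             return c_ == o2.c_ &amp;&amp; pos_ == o2.pos_;
         }
         Container* c_;
         typename Container::iterator pos_;
     };

     template &lt;typename E, typename Container&gt;
     class ItrNoRepeat : public ItrAll&lt;E, Container&gt; {
         typedef ItrAll&lt;E, Container&gt; Super;
     public:
         ItrNoRepeat(Container&amp; c, typename Container::iterator pos) : Super(c, pos) {
             if (this-&gt;pos_ != c.end()) seen_.insert(*this-&gt;pos_);
         }
         virtual void operator++() {  // Advance, then skip values already seen
             Super::operator++();
             while (this-&gt;pos_ != this-&gt;c_-&gt;end() &amp;&amp; !seen_.insert(*this-&gt;pos_).second)
                 Super::operator++();
         }
         virtual ItrBase&lt;E&gt;* clone() const { return new ItrNoRepeat(*this); }
     private:
         std::set&lt;E&gt; seen_;
     };

     // Hypothetical factory: wraps an ItrAll or an ItrNoRepeat in an Itr&lt;E&gt;.
     template &lt;typename E, typename Container&gt;
     Itr&lt;E&gt; makeItr(Container&amp; c, typename Container::iterator pos, bool noRepeat)
     {
         if (noRepeat) return Itr&lt;E&gt;(new ItrNoRepeat&lt;E, Container&gt;(c, pos));
         return Itr&lt;E&gt;(new ItrAll&lt;E, Container&gt;(c, pos));
     }

     // One non-template function visits any container through Itr&lt;int&gt;.
     int sum(Itr&lt;int&gt; itr, const Itr&lt;int&gt;&amp; end)
     {
         int s = 0;
         for (; itr != end; ++itr) s += *itr;
         return s;
     }
     void demoAbstractItr()
     {
         int values[] = {1, 2, 1, 3};
         std::vector&lt;int&gt; v(values, values + 4);
         std::list&lt;int&gt; l(values, values + 4);
         assert(sum(makeItr&lt;int&gt;(v, v.begin(), false), makeItr&lt;int&gt;(v, v.end(), false)) == 7);
         assert(sum(makeItr&lt;int&gt;(v, v.begin(), true), makeItr&lt;int&gt;(v, v.end(), true)) == 6);
         assert(sum(makeItr&lt;int&gt;(l, l.begin(), true), makeItr&lt;int&gt;(l, l.end(), true)) == 6);
     }</pre>
<p>The price of this flexibility is one heap allocation per iterator copy and one virtual call per step; STL iterators, by contrast, are resolved at compile time through templates.</p>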
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

