<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Olivier Coudert&#039;s Blog &#187; software</title>
	<atom:link href="http://www.ocoudert.com/blog/tag/software/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ocoudert.com/blog</link>
	<description>My take on tech --and other topics</description>
	<lastBuildDate>Sat, 21 Jan 2012 20:30:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>A practical guide to C++ serialization</title>
		<link>http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/</link>
		<comments>http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 04:52:41 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1168</guid>
		<description><![CDATA[CodeProject In a nutshell, serialization consists of writing data and objects on a support (a file, a buffer, a socket), so that they can be reconstructed later in the memory of the same or another computing host. The reconstruction process is also known as deserialization. Serializing a primitive type like a bool, int, or float, [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/">A practical guide to C++ serialization</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
In a nutshell, serialization consists of writing data and objects on a support (a file, a buffer, a socket), so that they can be reconstructed later in the memory of the same or another computing host. The reconstruction process is also known as deserialization.</p>
<p>Serializing a primitive type like a bool, int, or float, is trivial: just write the data as it is (assuming that no compression is used). Serializing a pointer is different: the object it points to must be serialized first. That way deserializing the pointer simply consists of setting its value to the memory address at which the object has been reconstructed.</p>
<p>We can distinguish three levels of complexity in serialization, depending on how complex the pointer (and reference) graph is:</p>
<ol>
<li>The pointer graph is a <em>forest</em> (i.e., a set of <em>trees</em>). Data can simply be serialized bottom up with a depth first traversal of the trees.</li>
<li>The pointer graph is a <em>directed acyclic graph</em> (DAG), i.e., a graph without loops. We can still serialize the data bottom up, making sure we write and restore shared data only once.</li>
<li>The pointer graph is a general graph, i.e., it may have loops. We need to write and restore data with forward references so that loops are handled properly.</li>
</ol>
<p>&nbsp;</p>
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_1169" class="wp-caption aligncenter" style="width: 610px;">
<dt class="wp-caption-dt"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/pointers-graph.png"><img class="size-full wp-image-1169 " title="pointers graph" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/pointers-graph.png" alt="" width="600" height="362" /></a></dt>
<dd class="wp-caption-dd">Pointer graph as a tree, a DAG, and with loops</dd>
</dl>
</div>
<p>&nbsp;</p>
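<p>To make the second case concrete, here is a minimal hand-written sketch (plain standard C++, not Boost) that saves a DAG bottom up and writes each shared node only once. The Node type, the text record format, and the function names are assumptions made for this illustration only.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;deque&gt;
#include &lt;iostream&gt;
#include &lt;map&gt;
#include &lt;sstream&gt;
#include &lt;vector&gt;

// Hypothetical node type, used only for this illustration.
struct Node {
  int value;
  std::vector&lt;Node*&gt; children;
};

// Write the sub-DAG rooted at n bottom up. 'ids' maps every node that
// has already been written to its record number, so that a shared node
// is written only once. Assumes the graph has no loops.
std::size_t saveNode(const Node* n, std::ostream&amp; os,
                     std::map&lt;const Node*, std::size_t&gt;&amp; ids) {
  std::map&lt;const Node*, std::size_t&gt;::const_iterator it = ids.find(n);
  if (it != ids.end()) return it-&gt;second;  // Shared node, already written

  std::vector&lt;std::size_t&gt; childIds;
  for (std::size_t i = 0; i &lt; n-&gt;children.size(); ++i) {
    childIds.push_back(saveNode(n-&gt;children[i], os, ids));
  }
  const std::size_t id = ids.size();  // Records are numbered in write order
  ids[n] = id;
  os &lt;&lt; n-&gt;value &lt;&lt; ' ' &lt;&lt; childIds.size();
  for (std::size_t i = 0; i &lt; childIds.size(); ++i) os &lt;&lt; ' ' &lt;&lt; childIds[i];
  os &lt;&lt; '\n';
  return id;
}

// Read the records back in write order. A child always precedes its
// parents, so its address is known by the time a parent refers to it.
// A deque keeps element addresses stable while it grows at the back.
void loadNodes(std::istream&amp; is, std::deque&lt;Node&gt;&amp; nodes) {
  int value;
  std::size_t nChildren;
  while (is &gt;&gt; value &gt;&gt; nChildren) {
    Node n;
    n.value = value;
    for (std::size_t i = 0; i &lt; nChildren; ++i) {
      std::size_t childId;
      is &gt;&gt; childId;
      n.children.push_back(&amp;nodes[childId]);
    }
    nodes.push_back(n);
  }
}

void exampleDag() {
  // Diamond: root -&gt; a, root -&gt; b, a -&gt; leaf, b -&gt; leaf (leaf is shared)
  Node leaf; leaf.value = 3;
  Node a;    a.value = 1;    a.children.push_back(&amp;leaf);
  Node b;    b.value = 2;    b.children.push_back(&amp;leaf);
  Node root; root.value = 0;
  root.children.push_back(&amp;a);
  root.children.push_back(&amp;b);

  std::stringstream ss;
  std::map&lt;const Node*, std::size_t&gt; ids;
  saveNode(&amp;root, ss, ids);
  assert(ids.size() == 4);  // The shared leaf was written once

  std::deque&lt;Node&gt; restored;
  loadNodes(ss, restored);
  assert(restored.size() == 4);
  const Node&amp; r = restored.back();  // The root is the last record
  assert(r.value == 0 &amp;&amp; r.children.size() == 2);
  // Sharing is preserved: both restored children point to the same leaf
  assert(r.children[0]-&gt;children[0] == r.children[1]-&gt;children[0]);
}
</small></pre>
<p>This scheme relies on a bottom-up order existing. With loops there is no such order, and the recursion above would never terminate, which is why the third case calls for forward references.</p>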
<p>It is always an option to serialize objects using your own customized code. However serialization is much more complex than a simple pretty-print method. One would like serialization to support the following features:</p>
<ol>
<li>Serialization should be able to handle any pointer graph (i.e., with loops).</li>
<li>Serializing a pointer or a reference should automatically trigger the serialization of the referred object.</li>
<li>Serializing an entire data model can require a lot of code –from simple scalar fields (bool, int, float), to containers (vector, list, hash table, etc), to intricate data structures (graph, quad-tree, sparse matrices, etc). One would like templates that carry most of the burden.</li>
<li>The save and load functions must always be in sync: if the ‘save’ function is modified, the ‘load’ function must be changed appropriately. One would like that process to be automated as much as possible.</li>
<li>One should have a way of serializing objects without changing their .hpp files –this is known as non-intrusive serialization. The reason is that in many cases one does not want to (or cannot) change the source files of existing libraries.</li>
<li>Serialization needs to support versioning. As objects evolve, data members are added or removed, and it is desirable to be backward compatible –meaning, one can still deserialize archives from older versions into the most recent data model.</li>
<li>Serialization should be cross-platform compatible (32 and 64 bits machines, Windows, Linux, Solaris, etc).</li>
</ol>
<p>The boost library provides a serialization that meets all the requirements above, and more:</p>
<ul>
<li>It is extremely efficient, it supports versioning, and it automatically serializes STL containers.</li>
<li>Serialization (the save function) and deserialization (the load function) are expressed with one single template, which reduces the size of the code, and resolves the synchronization problem.</li>
<li>With a little bit of help, boost serialization is also 32 and 64 bit compatible, which means that a database serialized on a 32 bit machine can be read on a 64 bit machine <em>and conversely</em>.</li>
<li>Also boost serialization (respectively deserialization) takes an output (respectively input) argument that is very similar to a std::ostream (respectively std::istream), meaning that it can be a file on a disk, a buffer, or a socket. You can literally serialize your data over a network.</li>
</ul>
<p>The best way to understand how to serialize with boost is to walk through increasingly complex serialization scenarios.</p>
<h2>Basic serialization</h2>
<p>The code for serialization, as well as an example that saves and restores simple objects, is given below.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once
// File obj.hpp

// Forward declaration of class boost::serialization::access
namespace boost {
namespace serialization {
class access;
}
}

class Obj {
public:
  // Serialization expects the object to have a default constructor
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }
private:
  int  d1_;
  bool d2_;

  // Allow serialization to access non-public data members.
  friend class boost::serialization::access;

  template&lt;typename Archive&gt;
  void serialize(Archive&amp; ar, const unsigned version) {
    ar &amp; d1_ &amp; d2_;  // Simply serialize the data members of Obj
  }
};
</small></pre>
<p>The template ‘serialize’ defines both the save and load. This is achieved because the operator ‘&amp;’ will be defined as ‘&lt;&lt;’ (respectively ‘&gt;&gt;’) for an output (respectively input) archive. Note the friend declaration to allow the save/load template to access the private data members of the objects. Also note that serialization expects the object to have a default constructor (which can be private).</p>
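<p>The mechanism can be mimicked with two toy archive classes. The sketch below is not Boost code: ToyOutputArchive, ToyInputArchive, and Point are hypothetical names, and the format is a plain space-separated text stream. It only shows how a single ‘serialize’ template can both write and read, depending on what ‘&amp;’ means for the archive it receives.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cassert&gt;
#include &lt;iostream&gt;
#include &lt;sstream&gt;

// Toy output archive: operator&amp; writes its argument to the stream.
class ToyOutputArchive {
public:
  explicit ToyOutputArchive(std::ostream&amp; os) : os_(os) {}
  template&lt;typename T&gt;
  ToyOutputArchive&amp; operator&amp;(const T&amp; t) { os_ &lt;&lt; t &lt;&lt; ' '; return *this; }
private:
  std::ostream&amp; os_;
};

// Toy input archive: operator&amp; reads its argument from the stream.
class ToyInputArchive {
public:
  explicit ToyInputArchive(std::istream&amp; is) : is_(is) {}
  template&lt;typename T&gt;
  ToyInputArchive&amp; operator&amp;(T&amp; t) { is_ &gt;&gt; t; return *this; }
private:
  std::istream&amp; is_;
};

struct Point {
  int x;
  int y;

  // One template describes both directions.
  template&lt;typename Archive&gt;
  void serialize(Archive&amp; ar) { ar &amp; x &amp; y; }
};

void exampleToyArchives() {
  std::stringstream ss;

  Point p = {3, -7};
  ToyOutputArchive out(ss);
  p.serialize(out);  // Here operator&amp; writes x and y

  Point q = {0, 0};
  ToyInputArchive in(ss);
  q.serialize(in);   // Here operator&amp; reads x and y, in the same order

  assert(q.x == 3 &amp;&amp; q.y == -7);
}
</small></pre>
<p>Since the same statement lists the members for both directions, the save and load orders cannot drift apart. The real boost archives do much more (object tracking, versioning, pointers), but the dispatch idea is the same.</p>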
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "obj.hpp"
#include &lt;assert.h&gt;
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main() {
  const char* fileName = "saved.txt";

  // Create some objects
  const Obj o1(-2, false);
  const Obj o2;
  const Obj o3(21, true);
  const Obj* const p1 = &amp;o1;

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);
    boost::archive::text_oarchive ar(ofs);

    // Write data
    ar &amp; o1 &amp; o2 &amp; o3 &amp; p1;
  }

  // Restore data
  Obj restored_o1;
  Obj restored_o2;
  Obj restored_o3;
  Obj* restored_p1;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);

    // Load data
    ar &amp; restored_o1 &amp; restored_o2 &amp; restored_o3 &amp; restored_p1;
  }

  // Make sure we restored the data exactly as it was saved
  assert(restored_o1 == o1);
  assert(restored_o2 == o2);
  assert(restored_o3 == o3);
  assert(restored_p1 != p1);
  assert(restored_p1 == &amp;restored_o1);

  return 0;
}
</small></pre>
<p>In main.cpp, we first include the files declaring the input and output text archives, where objects will be loaded from and saved to, respectively. We create an output archive (here, a file on a disk), and write three instances of class Obj, as well as a pointer to one of the instances. We then read them back and make sure we restore the data as they were. Note how the restored pointer restored_p1 points to the restored object restored_o1.</p>
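<p>Because an archive is built on top of a stream, the same round trip can go through an in-memory buffer instead of a file. The following sketch assumes only the text archives used above plus std::stringstream; the function and variable names are illustrative.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cassert&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;
#include &lt;boost/serialization/string.hpp&gt;

void exampleBuffer() {
  std::stringstream buffer;

  const int answer = 42;
  const std::string name = "boost";

  // Save data into the buffer
  {
    boost::archive::text_oarchive ar(buffer);
    ar &amp; answer &amp; name;
  }

  // Restore data from the buffer
  int restored_answer = 0;
  std::string restored_name;
  {
    boost::archive::text_iarchive ar(buffer);
    ar &amp; restored_answer &amp; restored_name;
  }

  assert(restored_answer == answer);
  assert(restored_name == name);
}
</small></pre>
<p>The content of the buffer could just as well be sent over a socket and deserialized on the receiving side.</p>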
<h2>More on pointer serialization</h2>
<p>Whenever we call serialization on a pointer (or reference), this triggers the serialization of the object it points to (or refers to) whenever necessary. So we do not need to explicitly serialize pointed objects as boost serialization will make sure the appropriate objects reached in the pointers graph are serialized.</p>
<p>For instance, the code below shows that serializing the pointer p1 triggers the serialization of o1, the object it points to. When restoring the pointer restored_p1, we automatically create a clone of the object o1.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "obj.hpp"
#include &lt;assert.h&gt;
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main()
{
  const char* fileName = "saved.txt";

  // Create one object o1.
  const Obj o1(-2, false);
  const Obj* const p1 = &amp;o1;

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);
    boost::archive::text_oarchive ar(ofs);
    // Save only the pointer. This will trigger serialization
    // of the object it points to, i.e., o1.
    ar &amp; p1;
  }

  // Restore data
  Obj* restored_p1;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);
    // Load
    ar &amp; restored_p1;
  }

  // Make sure we read exactly what we saved.
  assert(restored_p1 != p1);
  assert(*restored_p1 == o1);

  return 0;
}
</small></pre>
<p>When deserializing a pointer, the object it points to will be automatically deserialized if this object has not been deserialized yet. This means that one should not attempt to deserialize an object <em>after</em> a pointer to this object has been deserialized. The reason is that once the pointer deserialization has forced the object deserialization, one cannot rebuild this object at a different address.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "obj.hpp"
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main()
{
  const char* fileName = "saved.txt";
  std::ofstream ofs(fileName);

  // Create one object o1 and a pointer p1 to that object.
  const Obj o1(-2, false);
  const Obj* const p1 = &amp;o1;

  // Serialize object, then pointer.
  // This works fine: after the object is deserialized, we can
  // deserialize the pointer by assigning it to the object’s address.
  {
    boost::archive::text_oarchive ar(ofs);
    ar &amp; o1 &amp; p1;
  }

  // Serialize pointer, then object.
  // This does not work: serializing p1 already serializes the object
  // it points to, and on loading that object would be rebuilt at an
  // address of its own, so o1 cannot be saved separately afterwards.
  // This will throw an instance of 'boost::archive::archive_exception'
  // at runtime.
  {
    boost::archive::text_oarchive ar(ofs);
    ar &amp; p1 &amp; o1;
  }

  return 0;
}
</small></pre>
<p>In the example above, the second serialization will result in a runtime error:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>ocoudert@MyMacBookPro $ a.out
terminate called after throwing an instance of 'boost::archive::archive_exception'
    what():  pointer conflict
Abort trap
ocoudert@MyMacBookPro $
</small></pre>
<p>This means that when pointers need to be serialized, we should never explicitly serialize the objects they point to.</p>
<h2>Explicit save and load function definitions</h2>
<p>We need an explicit definition of the save and load functions whenever they are not fully symmetric. This is typical when versioning is involved. Note the use of the macro BOOST_SERIALIZATION_SPLIT_MEMBER(), which is responsible for calling save/load when using an output/input archive.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/serialization/split_member.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

private:
  int  d1_;
  bool d2_;

  friend class boost::serialization::access;

  template&lt;class Archive&gt;
  void save(Archive &amp; ar, const unsigned int version) const {
    ar &amp; d1_ &amp; d2_;
  }

  template&lt;class Archive&gt;
  void load(Archive &amp; ar, const unsigned int version) {
    ar &amp; d1_ &amp; d2_;
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER()
};
</small></pre>
<h2>Serialization of C-strings</h2>
<p>A C-string cannot be directly serialized because it assumes a specific interpretation of a char*, namely an array of char terminated by a null character (‘\0’). Thus we need to explicitly serialize C-strings. The class below is a simple helper to serialize C-strings (note that this can be optimized by avoiding the construction of the std::string).</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once
// File SerializeCStringHelper.hpp

#include &lt;string.h&gt;  // strdup
#include &lt;string&gt;
#include &lt;boost/serialization/string.hpp&gt;
#include &lt;boost/serialization/split_member.hpp&gt;

class SerializeCStringHelper {
public:
  SerializeCStringHelper(char*&amp; s) : s_(s) {}
  SerializeCStringHelper(const char*&amp; s) : s_(const_cast&lt;char*&amp;&gt;(s)) {}

private:

  friend class boost::serialization::access;

  template&lt;class Archive&gt;
  void save(Archive&amp; ar, const unsigned version) const {
    bool isNull = (s_ == 0);
    ar &amp; isNull;
    if (!isNull) {
      std::string s(s_);
      ar &amp; s;
    }
  }

  template&lt;class Archive&gt;
  void load(Archive&amp; ar, const unsigned version) {
    bool isNull;
    ar &amp; isNull;
    if (!isNull) {
      std::string s;
      ar &amp; s;
      s_ = strdup(s.c_str());
    } else {
      s_ = 0;
    }
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER();

private:
  char*&amp; s_;
};
</small></pre>
<p>A simple example of its usage is as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include "SerializeCStringHelper.hpp"
#include &lt;assert.h&gt;
#include &lt;string.h&gt;  // strcmp
#include &lt;fstream&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

int main()
{
  const char* fileName = "saved.txt";
  const char* str = "This is an example of a C-string";

  // Save data
  {
    // Create an output archive
    std::ofstream ofs(fileName);

    boost::archive::text_oarchive ar(ofs);
    // Save
    SerializeCStringHelper helper(str);
    ar &amp; helper;
  }

  // Restore data
  char* restored_str;
  {
    // Create an input archive
    std::ifstream ifs(fileName);
    boost::archive::text_iarchive ar(ifs);

    // Load
    SerializeCStringHelper helper(restored_str);
    ar &amp; helper;
  }

  // Make sure we read exactly what we saved
  assert(restored_str != str);
  assert(strcmp(restored_str, str) == 0);

  return 0;
}
</small></pre>
<h2>Non-intrusive serialization</h2>
<p>So far the serialization code is added in the class definition. A non-intrusive serialization, outside of the class, might be preferable. For instance we would like to serialize a class from a library without altering the library’s hpp file. This is easy when the data members are public:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

public:
  int  d1_;
  bool d2_;
};

namespace boost {
namespace serialization {

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

} // namespace serialization
} // namespace boost
</small></pre>
<p>If we want to protect the data members, the code is a bit more complicated because the serialization template needs to be declared as a friend. This requires a forward declaration of the template.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

//// Declaration of the template
class Obj;

namespace boost {
namespace serialization {

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version);

} // namespace serialization
} // namespace boost

//// Definition of the class
class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

private:
  int  d1_;
  bool d2_;

  // Allow serialization to access data members.
  template&lt;typename Archive&gt; friend
  void boost::serialization::serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version);
};

//// Definition of the template
namespace boost {
namespace serialization {

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

} // namespace serialization
} // namespace boost
</small></pre>
<h2>Non-intrusive explicit save and load function definitions</h2>
<p>This combines the two previous serialization styles, except that the include file and macro are different. For the sake of simplicity, we give the version for public data members.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/serialization/split_free.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

public:
  int  d1_;
  bool d2_;
};

namespace boost {
namespace serialization {

template&lt;class Archive&gt;
void save(Archive &amp; ar, const Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

template&lt;class Archive&gt;
void load(Archive &amp; ar, Obj&amp; o, const unsigned int version) {
  ar &amp; o.d1_ &amp; o.d2_;
}

} // namespace serialization
} // namespace boost

BOOST_SERIALIZATION_SPLIT_FREE(Obj)

</small></pre>
<h2>Serialization of STL containers</h2>
<p>The boost library comes with templates to automatically serialize STL containers, as well as some STL objects (e.g., std::string). Instead of saving/loading a vector with the following code:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>template&lt;typename Archive&gt;
void save(Archive&amp; ar, const std::vector&lt;Obj&gt;&amp; objs, const unsigned version) {
  ar &lt;&lt; objs.size();
  for (size_t i = 0; i &lt; objs.size(); ++i) {
    ar &lt;&lt; objs[i];
  }
}

template&lt;typename Archive&gt;
void load(Archive&amp; ar, std::vector&lt;Obj&gt;&amp; objs, const unsigned version) {
  size_t size;
  ar &gt;&gt; size;
  objs.resize(size);
  for (size_t i = 0; i &lt; size; ++i) {
    ar &gt;&gt; objs[i];
  }
}
</small></pre>
<p>One simply writes:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;boost/serialization/vector.hpp&gt;

template&lt;typename Archive&gt;
void serialize(Archive&amp; ar, std::vector&lt;Obj&gt;&amp; objs, const unsigned version) {
  ar &amp; objs;
}
</small></pre>
<p>All the STL containers are supported using the appropriate include files:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;boost/serialization/array.hpp&gt;
#include &lt;boost/serialization/vector.hpp&gt;
#include &lt;boost/serialization/hash_map.hpp&gt;
#include &lt;boost/serialization/hash_set.hpp&gt;
#include &lt;boost/serialization/list.hpp&gt;
#include &lt;boost/serialization/slist.hpp&gt;
#include &lt;boost/serialization/map.hpp&gt;
#include &lt;boost/serialization/set.hpp&gt;
#include &lt;boost/serialization/bitset.hpp&gt;
#include &lt;boost/serialization/string.hpp&gt;
</small></pre>
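<p>As a quick check of the container support, the sketch below saves and restores a vector and a map through an in-memory text archive. The function name and the sample values are illustrative; the containers are saved through const references, in line with the earlier examples where saved objects are const.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;
#include &lt;boost/serialization/map.hpp&gt;
#include &lt;boost/serialization/string.hpp&gt;
#include &lt;boost/serialization/vector.hpp&gt;

void exampleContainers() {
  std::vector&lt;int&gt; primes;
  primes.push_back(2);
  primes.push_back(3);
  primes.push_back(5);

  std::map&lt;std::string, int&gt; numbers;
  numbers["one"] = 1;
  numbers["two"] = 2;

  std::stringstream buffer;

  // Save data
  {
    boost::archive::text_oarchive ar(buffer);
    const std::vector&lt;int&gt;&amp; saved_primes = primes;
    const std::map&lt;std::string, int&gt;&amp; saved_numbers = numbers;
    ar &amp; saved_primes &amp; saved_numbers;
  }

  // Restore data
  std::vector&lt;int&gt; restored_primes;
  std::map&lt;std::string, int&gt; restored_numbers;
  {
    boost::archive::text_iarchive ar(buffer);
    ar &amp; restored_primes &amp; restored_numbers;
  }

  // Make sure we read exactly what we saved
  assert(restored_primes == primes);
  assert(restored_numbers == numbers);
}
</small></pre>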
<h2>Serialization of base class</h2>
<p>When a class inherits from another, the base class needs to be serialized as well.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;boost/serialization/base_object.hpp&gt;

class Base {
public:
  Base() : c_('\0') {}
  Base(char c) : c_(c) {}
  bool operator==(const Base&amp; o) const { return c_ == o.c_; }

private:
  char c_;

  friend class boost::serialization::access;

  template &lt;typename Archive&gt;
  void serialize(Archive&amp; ar, const unsigned version) {
    ar &amp; c_;
  }
};

class Obj : public Base {
private:
  typedef Base _Super;
public:
  Obj() : _Super(), d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : _Super('a'), d1_(d1), d2_(d2) {}
  bool operator==(const Obj&amp; o) const {
    return _Super::operator==(o) &amp;&amp; d1_ == o.d1_ &amp;&amp; d2_ == o.d2_;
  }

private:
  int  d1_;
  bool d2_;

  friend class boost::serialization::access;

  template &lt;typename Archive&gt;
  void serialize(Archive&amp; ar, const unsigned version) {
    ar &amp; boost::serialization::base_object&lt;_Super&gt;(*this);
    ar &amp; d1_ &amp; d2_;
  }
};
</small></pre>
<h2>Versioning</h2>
<p>We want to maintain backward compatibility when the class Obj evolves. For instance, if a new data member ‘ID_’ is added, we want to read an old archive and build new Obj instances, with the missing data member taking its default value.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/serialization/split_member.hpp&gt;
#include &lt;boost/serialization/version.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false), ID_(0) {}
  Obj(int d1, bool d2, unsigned id) : d1_(d1), d2_(d2), ID_(id) {}
  bool operator==(const Obj&amp; o) const {
    return d1_ == o.d1_ &amp;&amp; d2_ == o.d2_ &amp;&amp; ID_ == o.ID_;
  }

private:
  int  d1_;
  bool d2_;
  unsigned ID_;

  friend class boost::serialization::access;

  template&lt;class Archive&gt;
  void save(Archive &amp; ar, const unsigned int version) const {
    ar &amp; d1_ &amp; d2_ &amp; ID_;
  }

  template&lt;class Archive&gt;
  void load(Archive &amp; ar, const unsigned int version) {
    ar &amp; d1_ &amp; d2_;
    // If archive’s version is 0 (i.e., is old), ID_ keeps
    // its default value from the new data model,
    // else we read ID_’s value from the archive.
    if (version &gt; 0) {
      ar &amp; ID_;
    }
  }

  BOOST_SERIALIZATION_SPLIT_MEMBER()

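};

// Tag archives written from now on with class version 1.
// Without this declaration the version of Obj defaults to 0,
// and load() would never read ID_.
BOOST_CLASS_VERSION(Obj, 1)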
</small></pre>
<h2>Serialization of const data or objects</h2>
<p>Attempting to serialize const data or a const object triggers a long trail of error messages, which includes something that looks like:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>[snip]

/opt/local/include/boost/archive/detail/check.hpp:162: error:
  invalid application of ‘sizeof’ to incomplete type ‘boost::STATIC_ASSERTION_FAILURE&lt;false&gt;‘

[snip]

/opt/local/include/boost/archive/basic_text_iprimitive.hpp:88: error:
  ambiguous overload for ‘operator&gt;&gt;‘ in
  ‘((boost::archive::basic_text_iprimitive&lt;std::basic_istream&lt;char,
       std::char_traits&lt;char&gt; &gt; &gt;*)this)-&gt;boost::archive::basic_text_iprimitive&lt;std::basic_istream&lt;char,
         std::char_traits&lt;char&gt; &gt; &gt;::is &gt;&gt; t’
</small></pre>
<p>This means that the input archive expects the recipient of the data to be non-const. Thus const data members must be const_cast&lt;&gt;()’ed to be serialized. For example:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#pragma once

#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

class Obj {
public:
  Obj() : d1_(-1), d2_(false) {}
  Obj(int d1, bool d2) : d1_(d1), d2_(d2) {}

private:
  const int d1_;
  bool d2_;

  // Allow serialization to access data members.
  friend class boost::serialization::access;

  template&lt;typename A&gt;
  void serialize(A&amp; ar, const unsigned version) {
    ar &amp; const_cast&lt;int&amp;&gt;(d1_) &amp; d2_;
  }
};
</small></pre>
<h2>Text, XML, and binary archives</h2>
<p>The text archive is an ASCII file that is somewhat human readable. There are other archive types available in boost/archive/*.hpp, e.g.:</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>// Text archive that defines boost::archive::text_oarchive
// and boost::archive::text_iarchive
#include &lt;boost/archive/text_iarchive.hpp&gt;
#include &lt;boost/archive/text_oarchive.hpp&gt;

// XML archive that defines boost::archive::xml_oarchive
// and boost::archive::xml_iarchive
#include &lt;boost/archive/xml_oarchive.hpp&gt;
#include &lt;boost/archive/xml_iarchive.hpp&gt;

// XML archive which uses wide characters (use for UTF-8 output),
// defines boost::archive::xml_woarchive
// and boost::archive::xml_wiarchive
#include &lt;boost/archive/xml_woarchive.hpp&gt;
#include &lt;boost/archive/xml_wiarchive.hpp&gt;

// Binary archive that defines boost::archive::binary_oarchive
// and boost::archive::binary_iarchive
#include &lt;boost/archive/binary_oarchive.hpp&gt;
#include &lt;boost/archive/binary_iarchive.hpp&gt;
</small></pre>
<p>The text and XML archives are portable across 32- and 64-bit platforms.</p>
<p>Having a binary archive that is portable between 32- and 64-bit platforms is not trivial, because C++ does not specify the exact size of primitive types. For instance, a long is usually 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine. In practice, though, portability is achievable –there is an unofficial version of a portable binary archive.</p>
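<p>Independently of boost, a common way around the size problem is to store integers with a fixed width and a fixed byte order. The helpers below are a minimal sketch of that idea using std::uint32_t from C++11; writeUint32 and readUint32 are hypothetical names, not part of any archive class.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;iostream&gt;
#include &lt;sstream&gt;

// Write v as exactly 4 bytes, least significant byte first, whatever
// the size and byte order of the host's native integer types.
void writeUint32(std::ostream&amp; os, std::uint32_t v) {
  for (int i = 0; i &lt; 4; ++i) {
    os.put(static_cast&lt;char&gt;((v &gt;&gt; (8 * i)) &amp; 0xFF));
  }
}

// Read back 4 bytes written by writeUint32.
// Assumes the stream holds at least 4 more bytes.
std::uint32_t readUint32(std::istream&amp; is) {
  std::uint32_t v = 0;
  for (int i = 0; i &lt; 4; ++i) {
    const std::uint32_t byte = static_cast&lt;unsigned char&gt;(is.get());
    v |= byte &lt;&lt; (8 * i);
  }
  return v;
}

void exampleFixedWidth() {
  std::stringstream buffer;
  writeUint32(buffer, 0xDEADBEEFu);
  assert(buffer.str().size() == 4);            // Always 4 bytes
  assert(readUint32(buffer) == 0xDEADBEEFu);   // Same value back
}
</small></pre>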
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/07/09/a-practical-guide-to-c-serialization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to make software deterministic</title>
		<link>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/</link>
		<comments>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/#comments</comments>
		<pubDate>Mon, 30 May 2011 17:04:40 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1247</guid>
		<description><![CDATA[CodeProject A program is deterministic, or repeatable, if it produces the very same output when given the same input no matter how many times it is run. Refining this definition, we should consider whether a program produces the same result on any platform (32 and 64 bits machines, running Windows, Mac OS, Linux, Solaris, etc). [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/">How to make software deterministic</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display:none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
A program is deterministic, or repeatable, if it produces the very same output when given the same input no matter how many times it is run.</p>
<p>Refining this definition, we should consider whether a program produces the same result on any platform (32 and 64 bits machines, running Windows, Mac OS, Linux, Solaris, etc). Or whether the program is insensitive to the form of its inputs. For example, the problem of generating the shortest route to visit all the capitals of Europe should not depend on how the map of Europe is entered, nor should it depend on which language is used to name the capitals.</p>
<p>Determinism is obviously very desirable. For the user, a non-deterministic program can be confusing and frustrating. For the developer, a non-deterministic program is extremely hard to test and debug, since bugs and specific configurations cannot be easily reproduced.</p>
<p>Repeatability looks like a given for most applications. For instance, if we add two numbers in a spreadsheet, we expect the same result no matter how many times we perform this operation and regardless of the platform we run on (PC, Mac, etc). Or if we run a spell checker several times, we expect it to flag the very same errors.</p>
<p>But it is not that obvious for more complex applications. This is especially true when there are multiple solutions to a problem, or when heuristics are used to produce a result –because an exact solution is too computationally expensive. For example, it is not uncommon to see slightly different outcomes when running the same EDA synthesis or P&amp;R tool on the same input several times.</p>
<p>Even more, a user would like to see the same result when only minor changes are applied to the input. For instance, running a P&amp;R tool on two netlists that differ only by the names of their cells should produce exactly the same result –a P&amp;R tool should produce a result that depends only on the netlist structure. But experience shows that industrial synthesis and P&amp;R tools do not meet that requirement. Closer to software, it is not uncommon to generate slightly different object code with gcc by changing the names of a few variables.</p>
<p>Among the causes of non-deterministic response, we can distinguish the following types:</p>
<ol>
<li>A <a href="#random">random number</a> generator</li>
<li>Reading <a href="#uninitialized">uninitialized</a> data</li>
<li>A <a href="#race">race condition</a> on concurrent threads</li>
<li>An <a href="#unordered">unordered iteration</a> that is assumed ordered</li>
<li>A total order that depends on <a href="#memory_address">memory address</a></li>
<li>A total order that depends on <a href="#time_stamp">time stamps</a></li>
<li>A total order that depends on a <a href="#non_canonical_labelling">non-canonical labeling</a></li>
</ol>
<h4><strong><a name="random"></a>1. Random number generator</strong></h4>
<p>Many applications use stochastic processes (e.g., simulated annealing, genetic algorithms, Monte-Carlo simulations), yet we would like them to be repeatable. Using a pseudorandom number generator with a known seed makes it possible to reproduce the same long sequence of seemingly random numbers over and over again.</p>
<p>Note that some applications (e.g., gaming, cryptography, statistical sampling) <em>require</em> a non-deterministic behavior. In that case the seed of the random number generator must be an always-changing value, for example the host’s current time.</p>
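<p>The two seeding policies can be sketched as follows. This is a minimal illustration, not a recommendation for any particular application: the function names <span style="font-family: courier;">pseudo_random_sequence</span> and <span style="font-family: courier;">time_based_seed</span> are ours, and the raw output of <span style="font-family: courier;">std::mt19937</span> is used on purpose, because the standard distribution classes may produce different values from one library implementation to another.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;chrono&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;random&gt;
#include &lt;vector&gt;

// Returns the first 'count' raw outputs of a Mersenne Twister seeded
// with 'seed'. The same seed always yields the same sequence.
std::vector&lt;std::uint32_t&gt; pseudo_random_sequence(std::uint32_t seed,
                                                  std::size_t count) {
  std::mt19937 engine(seed);
  std::vector&lt;std::uint32_t&gt; values;
  values.reserve(count);
  for (std::size_t i = 0; i &lt; count; ++i) {
    values.push_back(static_cast&lt;std::uint32_t&gt;(engine()));
  }
  return values;
}

// An always-changing seed derived from the host's current time, for
// the applications that require non-repeatable behavior.
std::uint32_t time_based_seed() {
  const auto now = std::chrono::system_clock::now().time_since_epoch();
  return static_cast&lt;std::uint32_t&gt;(now.count());
}</pre>
<p>A repeatable run would call <span style="font-family: courier;">pseudo_random_sequence</span> with a fixed seed (and ideally log that seed), while a deliberately non-repeatable run would pass it the value of <span style="font-family: courier;">time_based_seed()</span>.</p>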
<p>There are more deliberate efforts to produce true random values by relying on natural, chaotic events. For instance <a title="Lavarand" href="http://www.lavarnd.org/">Lavarand</a> produces random numbers by hashing the frames of a video stream of lava lamps. <a href="http://www.fourmilab.ch/hotbits/">HotBits</a> generates random bits by timing successive pairs of radioactive decays detected by a Geiger-Müller tube interfaced to a computer. <a title="Random.org" href="http://www.random.org/">Random.org</a> uses variations in the amplitude of atmospheric noise recorded with a normal radio.</p>
<h4><strong><a name="uninitialized"></a>2. Uninitialized or random data read</strong></h4>
<p>Uninitialized data cannot occur in languages that assign systematic default values and offer no control over memory management, as opposed to high-performance languages like C/C++.</p>
<p>Finding and fixing this kind of issue is relatively simple. For instance, tools like <a href="http://www-01.ibm.com/software/awdtools/purify/">Purify</a> and <a href="http://valgrind.org/">Valgrind</a> can report when C/C++ code reads arbitrary values in memory. To use Purify’s terminology, such errors are UMR (Uninitialized Memory Read), ABR (Array Bound Read, i.e., reading an array outside of its bounds), and FMR (Free Memory Read). These defects all consist in reading some random value in memory. The code below illustrates some of these errors.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cstdio&gt;
#include &lt;cstring&gt;

int main() {
  bool large;
  int size = 100;

  if (large) {                    // UMR
    size *= 10;
  }

  char* const str    = new char[size];
  char* const middle = str + size/2;
  // Set the end-of-string.
  str[size - 1] = '\0';
  // The loop intends to fill up the string str with 'a'.
  // But this loop is faulty, because *++c is used instead of *c++.
  char* c = str;
  for (int i = 1; i &lt; size; ++i, *++c = 'a');

  printf("%c\n", str[0]);         // UMR
  printf("%c\n", str[1]);         // Ok, will print 'a'.
  printf("%c\n", str[2]);         // Ok, will print 'a'.
  printf("%c\n", str[size - 1]);  // Ok, but will print 'a'
                                  // instead of the expected '\0'.
  printf("%c\n", str[size]);      // ABR
  printf("%lu\n", strlen(str));   // UMR, because we overwrote
                                  // the final '\0' in the loop.
  delete [] str;
  printf("%c\n", *middle);        // FMR

  return 0;
}
</small></pre>
<p>Note that ABW (Array Bound Write, i.e., writing outside of an array’s bounds), FMW (Free Memory Write), FNH (Freeing Non-Heap memory), and FUM (Freeing Unallocated Memory) errors, although severe bugs that are also reported by dynamic analysis tools, are not an original source of non-determinism: they consistently reproduce the same bug.</p>
<h4><strong><a name="race"></a>3. Thread races</strong></h4>
<p>Thread races are difficult to detect, and fixing them can be very costly. A typical example is when one thread writes a value at some address, and another thread reads the value at that address. Depending on which thread accesses the address first, the outcome of the program will be different. Two threads performing a non-atomic write at the same address simultaneously result in some unpredictable value.</p>
<p>One can use a mutex to prevent conflicting reads and writes on non-atomic operations. But a mutex does not decide who reads or writes first: the order in which racing threads access the data must be resolved with synchronization, which can be quite complicated. Moreover it can hurt performance.</p>
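<p>One way to keep a multi-threaded computation repeatable, sketched here under the assumption that the work splits into independent chunks, is to give each thread its own output slot and to combine the slots in a fixed order once all threads have been joined. The function name <span style="font-family: courier;">deterministic_parallel_sum</span> is hypothetical. The fixed combination order matters because floating-point addition is not associative: adding the partial sums in the order the threads happen to finish could change the result from one run to the next.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;thread&gt;
#include &lt;vector&gt;

// Sums 'data' with 'num_threads' threads. The result does not depend
// on how the threads are scheduled.
double deterministic_parallel_sum(const std::vector&lt;double&gt;&amp; data,
                                  unsigned num_threads) {
  if (num_threads == 0) num_threads = 1;
  std::vector&lt;double&gt; partial(num_threads, 0.0);
  std::vector&lt;std::thread&gt; workers;
  const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

  for (unsigned t = 0; t &lt; num_threads; ++t) {
    workers.emplace_back([&amp;data, &amp;partial, chunk, t] {
      const std::size_t begin = std::min(data.size(), t * chunk);
      const std::size_t end   = std::min(data.size(), begin + chunk);
      double sum = 0.0;
      for (std::size_t i = begin; i &lt; end; ++i) sum += data[i];
      partial[t] = sum;  // Each thread writes only its own slot: no race.
    });
  }
  for (std::thread&amp; w : workers) w.join();

  // Combine in index order, not in completion order.
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}</pre>
<p>No mutex is needed here because no two threads ever touch the same element of <span style="font-family: courier;">partial</span>. Note that the result still depends on <span style="font-family: courier;">num_threads</span>, since changing the chunk boundaries changes the order of the floating-point additions.</p>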
<h4><strong><a name="unordered"></a>4. Iteration on unordered data</strong></h4>
<p>Iterating over data in some random order can make a program non-repeatable. This pattern is often encountered, and is easy to fix.</p>
<p>For example, an algorithm produces a result via a visitor that assumes a total order. The developer uses an incorrect visitor, which enumerates data in a random order, usually depending on the memory allocation of the data container. E.g., instead of using a <span style="font-family: courier;">std::set</span> as a container, the developer uses a <span style="font-family: courier;">hash_set</span> or a <span style="font-family: courier;">tr1::unordered_set</span>. Forcing a total order on the data fixes the problem.</p>
<p>Note that the fix may be incomplete if it simply transforms a type (4) non-determinism into a type (5) or (6) non-determinism, which we discuss below.</p>
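<p>When the hash container must be kept for performance reasons, one possible fix is to impose the total order only at the point where the data is visited. The helper below is a minimal sketch of that idea; its name, <span style="font-family: courier;">sorted_elements</span>, is ours.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;algorithm&gt;
#include &lt;string&gt;
#include &lt;unordered_set&gt;
#include &lt;vector&gt;

// Copies the elements of an unordered set into a vector sorted by
// value, so that the visiting order no longer depends on the hash
// function or on the bucket layout.
std::vector&lt;std::string&gt; sorted_elements(
    const std::unordered_set&lt;std::string&gt;&amp; items) {
  std::vector&lt;std::string&gt; result(items.begin(), items.end());
  std::sort(result.begin(), result.end());
  return result;
}</pre>
<p>This works because strings are ordered by their value. Applying the same helper to a set of pointers would sort by address, which is exactly the type (5) non-determinism.</p>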
<h4><strong><a name="memory_address"></a>5. Ordering by pointer value</strong></h4>
<p>This type of non-determinism is extremely common. For instance, a developer uses a <span style="font-family: courier;">std::set</span> as a container of pointers, and feels that the visitor is deterministic. It is indeed deterministic, but only w.r.t. the memory addresses allocated to the data, which depend on factors out of the application’s control.</p>
<p>One way of addressing the problem is to force a specific memory addressing scheme, but that requires very fine control of the memory allocator and is therefore complicated. A more common way consists in assigning a unique ID to an object at the time of its creation. The ID can be a 32-bit unsigned integer that is incremented for every new object. A total ordering, independent from the objects’ actual memory addresses, is then obtained from the IDs. It is a simple solution, as long as one can afford the extra 4 bytes for every object. ID-based sorting with no memory penalty can be obtained using custom memory allocators.</p>
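<p>The ID scheme can be sketched as follows. The class <span style="font-family: courier;">Node</span>, the comparator <span style="font-family: courier;">NodeIdLess</span>, and the alias <span style="font-family: courier;">NodeSet</span> are hypothetical names used for illustration only. The sketch assumes that objects are created from a single thread (the counter is not atomic) and that fewer than 2<sup>32</sup> objects are ever created.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;cstdint&gt;
#include &lt;set&gt;

// Every Node receives a unique ID at construction time.
class Node {
public:
  Node() : id_(next_id_++) {}
  // Copying is disabled so that two objects never share an ID.
  Node(const Node&amp;) = delete;
  Node&amp; operator=(const Node&amp;) = delete;

  std::uint32_t id() const { return id_; }

private:
  static std::uint32_t next_id_;
  const std::uint32_t id_;
};

std::uint32_t Node::next_id_ = 0;

// Orders pointers by the ID of the pointed object, not by address.
struct NodeIdLess {
  bool operator()(const Node* a, const Node* b) const {
    return a-&gt;id() &lt; b-&gt;id();
  }
};

// A set of pointers whose iteration order is independent of where
// the allocator placed the objects.
typedef std::set&lt;const Node*, NodeIdLess&gt; NodeSet;</pre>
<p>Iterating over a <span style="font-family: courier;">NodeSet</span> visits the nodes in their order of creation, whatever their addresses and whatever the order in which they were inserted.</p>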
<h4><strong><a name="time_stamp"></a>6. Ordering by time stamps</strong></h4>
<p>Note though that the total ID-based ordering described above is exactly the order of creation of the objects. It is no different from an order that depends on time stamps. Therefore two equal sets of inputs that differ only in the order in which their elements are given will be visited in different orders, which can lead to different results. This leads us to the type (7) of non-determinism.</p>
<h4><strong><a name="non_canonical_labelling"></a>7. Ordering induced by a non-canonical labeling</strong></h4>
<p>Type (7) non-determinism often goes unrecognized, or is simply ignored. The idea is that as long as the same input is given to a program (but possibly in a different order or form), the output should be the same. If the input can somehow be normalized to a form that captures the notion of “same input”, then the program can be made insensitive to the format of the input. That is of course assuming that the run-time penalty of the normalization process is not too high.</p>
<p>This normalization process is better defined as canonization. Formally, let O be a set of objects, and let EQ be an equivalence relation that captures the notion of “same” on these objects. A function Canon maps an object onto its canonical form, and is such that for any two objects o1 and o2, o1 and o2 are the same (i.e., o1 EQ o2) if and only if Canon(o1) = Canon(o2).</p>
<p>For instance, a set of integers can be represented by a number of containers (a list, an array, a hash set, a binary tree, etc). A canonical form can simply consist in sorting the integers. Two sets that are equal because they contain the very same integers, but that are initially given in different orders and forms, will end up in the same canonical form. Since sorting is an O(n log n) algorithm, this is an efficient canonization.</p>
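<p>Using the notation above, the sorting-based canonization of a set of integers can be sketched as a function <span style="font-family: courier;">canon</span> (our name) that accepts any container and returns a sorted vector without duplicates. Two containers represent the same set if and only if their canonical forms compare equal.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;algorithm&gt;
#include &lt;iterator&gt;
#include &lt;vector&gt;

// Maps any container of integers (vector, list, plain array, hash
// set, etc.) onto a canonical form: a sorted vector with no
// duplicates. The cost is dominated by the O(n log n) sort.
template &lt;class Container&gt;
std::vector&lt;int&gt; canon(const Container&amp; values) {
  std::vector&lt;int&gt; result(std::begin(values), std::end(values));
  std::sort(result.begin(), result.end());
  result.erase(std::unique(result.begin(), result.end()), result.end());
  return result;
}</pre>
<p>Duplicates are removed because the objects being compared are sets. For multisets, the call to <span style="font-family: courier;">std::unique</span> would be dropped.</p>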
<p>Canonization can be much more costly. A Boolean function can be represented in many ways, e.g., with a truth table, a Conjunctive Normal Form (CNF), a Disjunctive Normal Form (DNF), a decision diagram, etc. (see below). Boolean function canonization is NP-hard, since it solves the satisfiability problem (SAT): a function is unsatisfiable if and only if its canonical form is that of the constant false. In practice this means that all known Boolean function canonization algorithms have an exponential worst-case complexity.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/Boolean-function.png"><img class="aligncenter size-full wp-image-1257" title="Boolean function" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/Boolean-function.png" alt="" width="500" /></a></p>
<p>&nbsp;</p>
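<p>The truth table is the simplest canonical form of a Boolean function, and it makes the exponential cost tangible: a function of n variables requires 2<sup>n</sup> evaluations. The sketch below is only an illustration; the names <span style="font-family: courier;">BoolFn</span> and <span style="font-family: courier;">truth_table</span> are ours, and the encoding (bit i of the argument holds the value of variable i) is an assumption of this example.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;cstdint&gt;
#include &lt;functional&gt;
#include &lt;vector&gt;

// A Boolean function of up to 31 variables. Bit i of the argument
// is the value of variable i.
typedef std::function&lt;bool(std::uint32_t)&gt; BoolFn;

// Canonical form of 'f': its value on all 2^num_vars assignments,
// enumerated in increasing order. Requires num_vars &lt; 32.
std::vector&lt;bool&gt; truth_table(const BoolFn&amp; f, unsigned num_vars) {
  const std::uint32_t rows = std::uint32_t(1) &lt;&lt; num_vars;
  std::vector&lt;bool&gt; table;
  table.reserve(rows);
  for (std::uint32_t assignment = 0; assignment &lt; rows; ++assignment) {
    table.push_back(f(assignment));
  }
  return table;
}</pre>
<p>Two syntactically different expressions, for instance (x0 AND x1) OR (x0 AND x2) and x0 AND (x1 OR x2), yield the same table, which is what makes the form canonical. More compact canonical forms exist, but none is known that avoids the exponential blow-up in the worst case.</p>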
<p>Canonization can also be elusive. Let us consider the problem of drawing a graph in some aesthetic way (e.g., such that the nodes are evenly distributed and such that there is a minimum number of crossing edges). One would like the graph to be drawn the very same way, regardless of its representation (adjacency list or adjacency matrix), and regardless of the order in which the adjacency information is given. For instance, the three graphs below, although looking different, are exactly the same, and can be drawn without any edge crossing as shown on the right side.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/graph-drawing-2.png"><img class="size-full wp-image-1259 aligncenter" title="graph drawing 2" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/graph-drawing-2.png" alt="" width="500" /></a></p>
<p>&nbsp;</p>
<p>Graph canonization is also known as canonical graph labeling. It is at least as hard as graph isomorphism, one of those rare problems that are in NP but are not known to be either NP-complete or in P. Although all known graph canonization algorithms have a super-polynomial worst-case complexity, it is believed that graph canonization can be done in polynomial time.</p>
<h4><strong>Conclusion </strong></h4>
<p>The most common cause of non-determinism is an unreliable data order. The ultimate solution to make a program insensitive to the form of its input is to canonize its input as a pre-processing step. This proves to be a challenging and costly task in some cases. Whenever possible, canonization (or some imperfect normalization) goes a long way toward making the application consistently repeatable.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is software quality?</title>
		<link>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/</link>
		<comments>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/#comments</comments>
		<pubDate>Sun, 10 Apr 2011 06:21:24 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1125</guid>
		<description><![CDATA[CodeProject The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. External quality is what a user experiences when running the software in its operational mode. Internal quality refers to aspects that are code-dependent, and that are not visible to the end-user. External [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/">What is software quality?</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043" rel="tag">CodeProject</a><br />
<a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/04/qualityassurance.jpg"><img class="alignright size-medium wp-image-1156" title="qualityassurance" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/04/qualityassurance-300x200.jpg" alt="" width="300" height="200" /></a>The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. External quality is what a user experiences when running the software in its operational mode. Internal quality refers to aspects that are code-dependent, and that are not visible to the end-user. External quality is critical to the user, while internal quality is meaningful to the developer only.</p>
<p>Some quality criteria are objective, and can be measured accordingly. Some quality criteria are subjective, and are therefore captured with more arbitrary measurements.</p>
<p>The table below lists the most obvious software quality criteria, as well as some lesser-known ones.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="118"></td>
<td style="text-align: center;" valign="top" width="59">User</td>
<td style="text-align: center;" valign="top" width="63">Developer</td>
<td style="text-align: center;" valign="top" width="95">Measurable</td>
</tr>
<tr>
<td colspan="4" valign="top" width="334">External quality</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#features">features</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#speed">speed</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#space">space</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#network">network usage</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#stability">stability</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#robustness">robustness</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#eou">ease-of-use</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#determinism">determinism</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#compatibility">back-compatibility</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#security">security</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">difficult</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#power">power consumption</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">difficult</td>
</tr>
<tr>
<td colspan="4" valign="top" width="334">Internal quality</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#coverage">test coverage</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#testability">testability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#portability">portability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#thread">thread-safeness</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#conciseness">conciseness</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#maintainability">maintainability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#documentation">documentation</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#legibility">legibility</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#scalability">scalability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
</tbody>
</table>
<p>By definition the internal quality (code characteristics) is a concern to the developer only, while all the external quality aspects (coming from using the software) are critical to the end user. However the developer also has an interest in performance (speed, space, network usage) and determinism, because they make testing the software easier. Developers treat ease-of-use, back-compatibility, security, and power consumption as requirements.</p>
<p>It is important to consider how difficult it is to measure each of these criteria. It can be difficult because there is no simple variable to look at, or because the measurement process is costly, or because it requires a complex infrastructure. For instance, speed has an objective metric that is easy to measure. Power consumption has a simple metric (how many µW the application consumes), but it is complex to measure. Security is difficult and costly to estimate.</p>
<p><a name="features"></a> <strong>Features</strong>. This is the very reason for the software to be written: to provide a service. By feature we really mean the output produced by the software (e.g., a numerical result, a string, a screenshot, a web page, an audio clip, etc.), regardless of its performance (speed, memory).</p>
<p><a name="speed"></a><strong>Speed</strong>. How quickly does the application provide the service? The user experiences the actual time elapsed between the moment she requests the service, and the moment the service is delivered. The real elapsed time, or wall time, is the sum of the CPU time, system time, and network latency. Thus the developer should not only focus on the CPU time (how much time the CPU actually spends on executing the program). The CPU time can easily be overshadowed by disk access (a write on the disk is very costly), swapping (due to an excessive virtual memory size), or time spent on the network (latency issue, or too many round trips).</p>
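<p>The difference between wall time and CPU time can be observed with standard C++ facilities alone. The sketch below is one possible way to do it; the names <span style="font-family: courier;">Timing</span> and <span style="font-family: courier;">measure</span> are ours. It assumes a POSIX-like platform where <span style="font-family: courier;">std::clock</span> reports the processor time used by the process, which is not the case with every C library.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;chrono&gt;
#include &lt;ctime&gt;

struct Timing {
  double wall_seconds;  // Real elapsed time, as experienced by the user.
  double cpu_seconds;   // Processor time consumed by the process.
};

// Runs 'work' once and reports both its wall time and its CPU time.
template &lt;class Work&gt;
Timing measure(Work work) {
  const std::chrono::steady_clock::time_point wall_start =
      std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  work();

  const std::clock_t cpu_end = std::clock();
  const std::chrono::steady_clock::time_point wall_end =
      std::chrono::steady_clock::now();

  Timing t;
  t.wall_seconds =
      std::chrono::duration&lt;double&gt;(wall_end - wall_start).count();
  t.cpu_seconds = static_cast&lt;double&gt;(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return t;
}</pre>
<p>A workload that waits on the disk or on the network typically shows a wall time much larger than its CPU time, which is the signal that optimizing the computation itself will not help much.</p>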
<p><a name="space"></a><strong>Space</strong>. How much RAM and disk space is taken by the application? The aggregate numbers are important –peak memory, virtual memory size, etc. But even more so, how often we move data in a way that triggers a CPU cache miss or a disk write has a dominant impact on the speed of the application. A mediocre data design can lead to very poor performance.</p>
<p><a name="network"></a><strong>Network usage</strong>. It is a matter of bandwidth and latency. Mismanaging sockets and channels can lead to unnecessary extra time spent in opening and closing sockets, handshakes, and round trips. As with memory, caching techniques can be used to reduce the consumption of network resources.</p>
<p><a name="stability"></a><strong>Stability</strong>. How often does one need to patch the software to fix problems? For the user, this is an inconvenience. For the developer, it means that the code is fragile and might benefit from better testing or partial rewrite.</p>
<p><a name="robustness"></a><strong>Robustness</strong>. How often does the application stall, freeze, or crash? How tolerant is it to extreme conditions –limited CPU and memory/disk/network resources, corner cases, system failure or unresponsive 3<sup>rd</sup>-party resources? This aspect is strongly related to testability and coverage.</p>
<p><a name="eou"></a><strong>Ease-of-use</strong>. It can be a very subjective factor, hard to quantify. It includes user documentation, clarity of the error messages, management of exceptions, and recovery after failure.</p>
<p><a name="determinism"></a><strong>Determinism</strong>. Also known as repeatability: does the program produce the very same result given the same input? There are many reasons why a program can exhibit a non-repeatable behavior. A non-repeatable behavior is confusing and frustrating for the user. This also makes the program very difficult to test and debug. Repeatability is strongly dependent on a good data model design.</p>
<p><a name="compatibility"></a><strong>Back-compatibility</strong>. Can a new version of the application be used with an older version’s data? It is essential to the user, because a new version should not require a costly migration of the existing data.</p>
<p><a name="security"></a><strong>Security</strong>. Who is authorized to access the data? Can the data processed by the application be compromised? This is a crucial aspect of many applications, and it is getting more and more difficult to assess with the dissemination of mobile and web-based software.</p>
<p><a name="power"></a><strong>Power consumption</strong>. It is increasingly important with mobile applications, as a program may have to consider how it manages the device’s power producers and consumers (battery, cores, wireless, screen, audio), rather than rely entirely on the operating system.</p>
<p><a name="coverage"></a><strong>Test coverage</strong>. What is the proportion of code that is executed by some unit or regression test? This is measured by the number of lines, number of functions, and number of control branches that are exercised by the tests. Usually one expects coverage of at least 85% for any moderately complex application. In practice high coverage can be reached only if testability is high, which has deep implications on the architecture and development methodology.</p>
<p><a name="testability"></a><strong>Testability</strong>. An often overlooked or simply ignored aspect of code development, testability is the ability to trigger any specific line of code or branching condition. Highly testable code requires a discipline of architecture and development that is difficult to find. It is very costly to fix poorly testable software, as this requires major redesign. This justifies major investment in software architecture, design, and development methodologies.</p>
<p><a name="portability"></a><strong>Portability</strong>. Can the application run on 32- and 64-bit machines? Should it run on a mobile phone? Does it run on multiple operating systems (e.g., Windows, Linux, Mac OS X, Solaris, iOS, Android, RIM)? Does it run smoothly on all web browsers (IE, Firefox, Chrome, Safari, Opera)?</p>
<p><a name="thread"></a><strong>Thread-safeness</strong>. Is a specific component thread-safe? Can two threads collide on non-atomic operations? Can the application get into a deadlock? As concurrency is still mostly the result of a manual process (there is no compiler that automatically parallelizes arbitrary code), these questions are critical to ensure the good functioning of a program, as well as its performance –it is not rare to see a program running <em>slower</em> when too many threads are available, as the cost of synchronization can become dominant.</p>
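<p>As an illustration of the deadlock question, consider two threads that each need the same two mutexes but acquire them in opposite orders: each can end up holding one mutex while waiting forever for the other. The sketch below shows one standard way of avoiding this, using <span style="font-family: courier;">std::lock</span>, which acquires several mutexes without deadlocking whatever the order of its arguments. The <span style="font-family: courier;">Account</span> type and the <span style="font-family: courier;">transfer</span> function are hypothetical.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">#include &lt;mutex&gt;

struct Account {
  std::mutex mutex;
  long balance = 0;
};

// Moves 'amount' from one account to the other. Safe to call
// concurrently as transfer(a, b, x) and transfer(b, a, y).
// Returns false if the source account has insufficient funds.
bool transfer(Account&amp; from, Account&amp; to, long amount) {
  if (&amp;from == &amp;to) return true;  // Never lock the same mutex twice.

  // Acquire both mutexes with a deadlock-avoidance algorithm, then
  // hand them over to guards that release them on every exit path.
  std::lock(from.mutex, to.mutex);
  std::lock_guard&lt;std::mutex&gt; lock_from(from.mutex, std::adopt_lock);
  std::lock_guard&lt;std::mutex&gt; lock_to(to.mutex, std::adopt_lock);

  if (from.balance &lt; amount) return false;
  from.balance -= amount;
  to.balance += amount;
  return true;
}</pre>
<p>Locking <span style="font-family: courier;">from.mutex</span> and then <span style="font-family: courier;">to.mutex</span> with two separate <span style="font-family: courier;">lock()</span> calls would look equivalent, but it is precisely the pattern that can deadlock when two transfers in opposite directions run at the same time.</p>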
<p><a name="conciseness"></a><strong>Conciseness</strong>. Also known as compactness. Is there any dead code, or duplicated code? Is the code shared and factorized properly? Compact code usually means faster compilation and smaller binary size. Compactness also naturally leads to fewer bugs, because the number of bugs per line of code is historically <a href="../2009/10/13/test-driven-design/#kloc_per_defect">constant</a>.</p>
<p><a name="maintainability"></a><strong>Maintainability</strong>. How easy is it to debug the code? How fast is it to provide a fix? How quickly can a new developer understand the code? Maintainability is a very important aspect, quite difficult to quantify. Maintainability is increased with good testability and flexible (abstract) design.</p>
<p><a name="documentation"></a><strong>Documentation</strong>. This is a pretty subjective topic. Some people claim that separate documentation written in plain English is necessary. Others state that at least 30% of the code should be comments. Finally, some argue that the code itself is the best documentation –the names of the types, classes, functions and arguments, together with plenty of assertions.</p>
<p><a name="legibility"></a><strong>Legibility</strong>. Also known as readability. This is another subjective topic. It is about how easy it is to read the code. Guidelines are established to unify the style of the code, so that a developer can easily read code written by another developer. Code guidelines abound, and they go from a small set of directives, to a full set of rules that specify every syntactical aspect of the language. For example, see <a href="http://hem.passagen.se/erinyq/industrial/" rel="nofollow">Industrial Strength C++</a>, <a href="http://www.codingstandard.com/HICPPCM/index.html" rel="nofollow">High Integrity C++</a>, <a href="http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml" rel="nofollow">Google C++ Style Guide</a>, and <a href="http://www.maultech.com/chrislott/resources/cstyle/" rel="nofollow">many</a> <a href="http://www.possibility.com/Cpp/CppCodingStandard.html" rel="nofollow">more</a>.</p>
<p><a name="scalability"></a><strong>Scalability</strong>. How easy is it to extend a feature? Or to add a new one? Or to add extra cores, or increase the size of the cluster the application runs on? Again, this is all about software architecture and anticipating future needs.</p>
<p>Software quality is the result of the user experience. But software quality should not and cannot be a reactive action to external defects. Software quality is built from the ground up, with design and development methodologies, and with a special focus on testability, coverage, and flexibility.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to write abstract iterators in C++</title>
		<link>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/</link>
		<comments>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 21:18:32 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=859</guid>
		<description><![CDATA[CodeProject When developing in C++, an impeccable API is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator. An iterator is used to visit the elements of a container without exposing how the [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/">How to write abstract iterators in C++</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a></p>
<p>When developing in C++, an <a href="../2009/10/08/api-design-101/">impeccable API</a> is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator.</p>
<p>An iterator is used to visit the elements of a container without exposing how the container is implemented (e.g., a vector, a list, a red-black tree, a hash set, a queue, etc.). Iterators are central to generic programming because they are an interface between containers and applications. Applications need access to the elements of containers, but they usually do not need to know how elements are stored in containers. Iterators make it possible to write generic algorithms that operate on different kinds of containers.</p>
<p>For example, the following code snippet exposes the nature of the container –a vector.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     void process(const std::vector&lt;E&gt;&amp; v)
     {
         for (unsigned i = 0; i &lt; v.size(); ++i) {
             process(v[i]);
         }
     }</pre>
<p>If we want to have the same function operating on a list, we have to write a separate function. Or if we later decide that a list or a hash set is more appropriate as a container, we need to rewrite the code everywhere we access the vector. This may require a lot of changes in many files. Contrast this container-specific visitation scheme with the following:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename Container&gt;
     void process(const Container&amp; c)
     {
         typename Container::const_iterator itr = c.begin();
         typename Container::const_iterator end = c.end();
         for (; itr != end; ++itr) {
             process(*itr);
         }
     }</pre>
<p>Using the notion of iterator, we have a generic processing of a container ‘c’, whether it is a vector, a list, a hash set, or any data structure that provides iterators in its API. Even better, we can write a generic process function that only takes an iterator range, without assuming that the container has a begin() and end() method:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename Iterator&gt;
     void process(Iterator begin, Iterator end)
     {
         for (Iterator itr = begin; itr != end; ++itr) {
             process(*itr);
         }
     }</pre>
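<p>As a minimal sketch of what the iterator-range form buys us, the same function template below is applied to a vector, a list, and a plain array. The name sum and the choice of int elements are assumptions made for illustration; they are not part of the original code.</p>

```cpp
#include <list>
#include <vector>

// Hypothetical example: adds up the elements of any iterator range.
// Nothing here depends on the container the range comes from.
template <typename Iterator>
int sum(Iterator begin, Iterator end)
{
    int total = 0;
    for (Iterator itr = begin; itr != end; ++itr) {
        total += *itr;
    }
    return total;
}
```

<p>Calling sum(v.begin(), v.end()) on a std::vector&lt;int&gt;, sum(l.begin(), l.end()) on a std::list&lt;int&gt;, or sum(a, a + n) on a plain array all go through the same code, because a pointer already satisfies the operations the loop needs.</p>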
<p>An STL iterator is a commodity that behaves as a scalar type:</p>
<ul>
<li>It can be allocated on the heap</li>
<li>It can be copied</li>
<li>It can be passed by value</li>
<li>It can be assigned to</li>
</ul>
<p>The essence of an iterator is captured by the following API.</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename T&gt;
     class Itr {
     public:
         Itr();
         ~Itr();
         Itr(const Itr&amp; o);                   <span style="color: #ff0000;">// Copy constructor</span>
         Itr&amp; operator=(const Itr&amp; o);        <span style="color: #ff0000;">// Assignment operator</span>
         Itr&amp; operator++();                   <span style="color: #ff0000;">// Next element</span>
         T&amp;   operator*();                    <span style="color: #ff0000;">// Dereference</span>
         bool operator==(const Itr&amp; o) const; <span style="color: #ff0000;">// Comparison</span>
         bool operator!=(const Itr&amp; o) const { return !(*this == o); }
     };</pre>
<p>Usually the container will provide a begin() and end() method, which build the iterators that denote the container’s range. Writing these begin/end methods is an easy task if the container is derived from an STL container, if the container has a data member that is an STL container, or if the iterator is a scalar type, like a pointer or an index.</p>
<p>It is more complicated if we want iterators that dereference to the same type of object, but that must visit several containers, possibly of different types, or iterators that visit containers in different manners. For instance let us assume that we have objects with some property (say, a color) stored in several containers, some of them of different types. We would like to visit all the objects, independently of the number of containers and their type, or we would like to visit objects of a given color, or we would like to visit objects that satisfy some predicate:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     class E;

     Itr&lt;E&gt; begin(); <span style="color: #ff0000;">// These give the range to visit</span>
     Itr&lt;E&gt; end();   <span style="color: #ff0000;">// all the elements of type E</span>

     Itr&lt;E&gt; begin(const Color&amp; color); <span style="color: #ff0000;">// Same as above but only for the</span>
     Itr&lt;E&gt; end(const Color&amp; color);   <span style="color: #ff0000;">// elements of the given color</span>

     class Predicate {
     public:
         bool operator()(const E&amp; e);
     };      

     Itr&lt;E&gt; begin(Predicate&amp; p); <span style="color: #ff0000;">// Same as above but only for the</span>
     Itr&lt;E&gt; end(Predicate&amp; p);   <span style="color: #ff0000;">// elements that satisfy the predicate</span></pre>
<p>In this case the iterator is more complex than a scalar type like a pointer or an index: it needs to keep track of which container it is currently visiting, or which color or predicate it needs to check. In general, the iterator may have data members so that it can fulfill its task. We also want to factor the code and reuse the methods of general-purpose iterators when writing more targeted iterators –e.g., visiting elements of a specific color should make use of the next-element method Itr&lt;E&gt;::operator++(). This can be done by making Itr&lt;E&gt; a polymorphic class with virtual methods, and having derived classes implement the different iterators. For example:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     class E {
     public:
         const Color&amp; color() const;
     };

     template &lt;typename E&gt;
     class ColoredItr : public Itr&lt;E&gt; {
     private:
         typedef Itr&lt;E&gt; _Super;
     public:
         ColoredItr(const Color&amp; color) : Itr&lt;E&gt;(), color_(color) {}
         virtual ~ColoredItr() {}
         virtual ColoredItr&amp; operator++() {
            <span style="color: #ff0000;">// Advance at least once, then skip the elements of another color.
            // The end-of-range check is omitted for brevity.</span>
            do { _Super::operator++(); } while (_Super::operator*().color() != color_);
            return *this;
         }
     private:
         Color color_;
    };</pre>
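<p>The following is a minimal, self-contained sketch of the same filtering idea, with the end-of-range check included. It is not the class hierarchy discussed in this post: FilterItr, IsEven, and countEven are hypothetical names, and the iterator is restricted to a std::vector&lt;int&gt; to keep the example short.</p>

```cpp
#include <vector>

// Hypothetical filtering iterator over a std::vector<int>: it only stops
// on elements for which the predicate returns true.
template <typename Predicate>
class FilterItr {
public:
    typedef std::vector<int>::const_iterator Base;
    FilterItr(Base cur, Base end, Predicate p) : cur_(cur), end_(end), p_(p) {
        skip(); // Position on the first matching element, if any.
    }
    FilterItr& operator++() { ++cur_; skip(); return *this; }
    const int& operator*() const { return *cur_; }
    bool operator==(const FilterItr& o) const { return cur_ == o.cur_; }
    bool operator!=(const FilterItr& o) const { return !(*this == o); }
private:
    // Move past non-matching elements, never going beyond end_.
    void skip() { while (cur_ != end_ && !p_(*cur_)) ++cur_; }
    Base cur_;
    Base end_;
    Predicate p_;
};

// Example predicate (an assumption for this sketch).
struct IsEven {
    bool operator()(int x) const { return x % 2 == 0; }
};

// Counts the elements that the filtering iterator visits.
int countEven(const std::vector<int>& v)
{
    int n = 0;
    FilterItr<IsEven> itr(v.begin(), v.end(), IsEven());
    FilterItr<IsEven> end(v.end(), v.end(), IsEven());
    for (; itr != end; ++itr) ++n;
    return n;
}
```

<p>The iterator carries its own end position, which is what lets operator++ stop safely when no further element satisfies the predicate.</p>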
<p>We would like a generic iterator that meets all the requirements described above:</p>
<ul>
<li>It can be allocated on the heap</li>
<li>It can be copied</li>
<li>It can be passed by value</li>
<li>It can be assigned to</li>
<li>It dereferences to the same type</li>
<li>It can visit several containers</li>
<li>It can visit containers of different types</li>
<li>It can visit containers in arbitrary manners</li>
</ul>
<p>This can be implemented as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;">     #include &lt;stdexcept&gt; <span style="color: #ff0000;">// std::logic_error</span>
     #include &lt;typeinfo&gt;  <span style="color: #ff0000;">// typeid</span>

     template&lt;typename E&gt;
     class ItrBase {
     public:
         ItrBase() {}
         virtual ~ItrBase() {}
         virtual void  operator++() {}
         <span style="color: #ff0000;">// The base iterator denotes an empty range: there is nothing to dereference.</span>
         virtual E&amp;    operator*() const { throw std::logic_error("empty iterator"); }
         virtual ItrBase* clone() const { return new ItrBase(*this); }
         <span style="color: #ff0000;">// The == operator is non-virtual. It checks that the
         // derived objects have compatible types, then calls the
         // virtual comparison function equal.</span>
         bool operator==(const ItrBase&amp; o) const {
             return typeid(*this) == typeid(o) &amp;&amp; equal(o);
         }
     protected:
         virtual bool equal(const ItrBase&amp; o) const { return true; }
     };

     template&lt;typename E&gt;
     class Itr {
     public:
         Itr() : itr_(0) {}
         ~Itr() { delete itr_; }
         <span style="color: #ff0000;">// A default-constructed Itr holds a null pointer, so check before cloning.</span>
         Itr(const Itr&amp; o) : itr_(o.itr_ ? o.itr_-&gt;clone() : 0) {}
         Itr&amp; operator=(const Itr&amp; o) {
             if (this != &amp;o) {
                 ItrBase&lt;E&gt;* copy = o.itr_ ? o.itr_-&gt;clone() : 0;
                 delete itr_;
                 itr_ = copy;
             }
             return *this;
         }
         Itr&amp;  operator++() { ++(*itr_); return *this; }
         E&amp;    operator*() const { return *(*itr_); }
         bool  operator==(const Itr&amp; o) const {
             return (itr_ == o.itr_) || (itr_ &amp;&amp; o.itr_ &amp;&amp; *itr_ == *o.itr_);
         }
         bool  operator!=(const Itr&amp; o) const { return !(*this == o); }

     protected:
         ItrBase&lt;E&gt;* itr_;
     };</pre>
<p>The ItrBase class is the top class of the hierarchy. Itr is simply a wrapper around a pointer to an ItrBase, so that it can be allocated on the heap –the actual implementation of the class deriving from ItrBase can have an arbitrary size. Note how the Itr copy constructor and assignment operator are implemented via the ItrBase::clone() method, so that Itr behaves as a scalar type. Last but not least, the (non-virtual) ItrBase::operator== equality operator first checks for type equality before calling the (virtual) equality method equal on the derived class. The reason ItrBase is not an abstract class is that it can conveniently be used to denote an empty range, i.e., the range (ItrBase(), ItrBase()) is empty.</p>
<p>Iterators on containers of elements of type E just need to derive from ItrBase&lt;E&gt;, and a factory providing the begin() and end() methods for any specialized iterator returns objects of type Itr&lt;E&gt;.</p>
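<p>To see the wrapper-plus-clone idea in isolation, here is a compact sketch that compiles on its own. It is a simplified variant, not the code above: the names ItrImpl, Handle, VecImpl, and sumAll are hypothetical, the implementation interface is made pure virtual for brevity, and the only concrete iterator walks a std::vector&lt;int&gt; by index.</p>

```cpp
#include <cstddef>
#include <typeinfo>
#include <vector>

// Polymorphic implementation interface (pure virtual in this sketch).
template <typename E>
class ItrImpl {
public:
    virtual ~ItrImpl() {}
    virtual void next() = 0;
    virtual E& get() const = 0;
    virtual ItrImpl* clone() const = 0;
    // Non-virtual: check the dynamic types first, then compare.
    bool same(const ItrImpl& o) const {
        return typeid(*this) == typeid(o) && equal(o);
    }
protected:
    virtual bool equal(const ItrImpl& o) const = 0;
};

// Value-semantics handle: copying the handle clones the implementation.
template <typename E>
class Handle {
public:
    explicit Handle(ItrImpl<E>* impl = 0) : impl_(impl) {}
    Handle(const Handle& o) : impl_(o.impl_ ? o.impl_->clone() : 0) {}
    ~Handle() { delete impl_; }
    Handle& operator=(const Handle& o) {
        if (this != &o) {
            ItrImpl<E>* copy = o.impl_ ? o.impl_->clone() : 0;
            delete impl_;
            impl_ = copy;
        }
        return *this;
    }
    Handle& operator++() { impl_->next(); return *this; }
    E& operator*() const { return impl_->get(); }
    bool operator==(const Handle& o) const {
        if (impl_ == o.impl_) return true;
        return impl_ && o.impl_ && impl_->same(*o.impl_);
    }
    bool operator!=(const Handle& o) const { return !(*this == o); }
private:
    ItrImpl<E>* impl_;
};

// One concrete implementation: walks a std::vector<int> by index.
class VecImpl : public ItrImpl<int> {
public:
    VecImpl(std::vector<int>& v, std::size_t i) : v_(&v), i_(i) {}
    virtual void next() { ++i_; }
    virtual int& get() const { return (*v_)[i_]; }
    virtual ItrImpl<int>* clone() const { return new VecImpl(*this); }
protected:
    virtual bool equal(const ItrImpl<int>& o) const {
        // Safe: same() has already checked that the dynamic types match.
        const VecImpl& o2 = static_cast<const VecImpl&>(o);
        return v_ == o2.v_ && i_ == o2.i_;
    }
private:
    std::vector<int>* v_;
    std::size_t i_;
};

// Client code sees only Handle<int>, never VecImpl.
int sumAll(Handle<int> itr, Handle<int> end)
{
    int total = 0;
    for (; itr != end; ++itr) total += *itr;
    return total;
}
```

<p>Because sumAll takes its arguments by value, each call clones the underlying implementation; the caller's handles are left where they were, which is the scalar-like behavior the wrapper is meant to provide.</p>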
<p>For example, let us assume that we have a container c of E&#8217;s, and that we want an iterator to visit (1) all the elements of c, possibly with repetition; (2) all the elements of c without repetition. This can be done as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;">    class E;

    class ItrAll : public ItrBase&lt;E&gt; {
    private:
        typedef ItrAll     _Self;
        typedef ItrBase&lt;E&gt; _Super;
    public:
        ItrAll(Container&amp; c) : _Super(), c_(c) {}
        virtual ~ItrAll() {}
        virtual void  operator++() { ++itr_; }
        virtual E&amp;    operator*() const { return *itr_; }
        virtual ItrBase&lt;E&gt;* clone() const { return new _Self(*this); }
    protected:
        virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const {
            <span style="color: #ff0000;">// Casting is safe since types have been checked by _Super::operator==</span>
            const _Self&amp; o2 = static_cast&lt;const _Self&amp;&gt;(o);
            return &amp;c_ == &amp;o2.c_ &amp;&amp; itr_ == o2.itr_;
        }
    protected:
        Container&amp;          c_;
        Container::iterator itr_;
    };     

    class ItrNoRepeat : public ItrAll {
    private:
        typedef ItrNoRepeat _Self;
        typedef ItrAll      _Super;
    public:
        ItrNoRepeat (Container&amp; c) : _Super(c) {}
        virtual ~ItrNoRepeat () {}
        virtual void  operator++() {
            <span style="color: #ff0000;">// Mark the current element as visited, then advance to the
            // next element that has not been visited yet.</span>
            visited_.insert(_Super::operator*());
            for (_Super::operator++(); itr_ != c_.end(); _Super::operator++()) {
                if (visited_.find(_Super::operator*()) == visited_.end()) {
                    return;
                }
            }
        }
        virtual E&amp;    operator*() const { return _Super::operator*(); }
        virtual ItrBase&lt;E&gt;* clone() const { return new _Self(*this); }
    protected:
        virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const { return _Super::equal(o); }
    protected:
        std::set&lt;E&gt; visited_;
    };     

    <span style="color: #ff0000;">// Build the container’s range w/ and w/o repetition</span>
    Itr&lt;E&gt; begin(Container&amp; c, bool noRepeat = false)
    {
        <span style="color: #ff0000;">// Assumes this factory is a friend of Itr&lt;E&gt; and ItrAll,
        // since it sets their protected members.</span>
        ItrAll* impl = noRepeat ? new ItrNoRepeat(c) : new ItrAll(c);
        impl-&gt;itr_ = c.begin();
        Itr&lt;E&gt; o;
        o.itr_ = impl;
        return o;
    }

    Itr&lt;E&gt; end(Container&amp; c, bool noRepeat = false)
    {
        ItrAll* impl = noRepeat ? new ItrNoRepeat(c) : new ItrAll(c);
        impl-&gt;itr_ = c.end();
        Itr&lt;E&gt; o;
        o.itr_ = impl;
        return o;
    }</pre>
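<p>As a reference point for what a no-repetition traversal should produce, the sketch below computes the visiting order directly, without any iterator class. The function name visitNoRepeat and the use of int elements are assumptions made for this example.</p>

```cpp
#include <set>
#include <vector>

// Hypothetical helper: returns the elements of v in visiting order,
// skipping any value that has already been seen.
std::vector<int> visitNoRepeat(const std::vector<int>& v)
{
    std::vector<int> out;
    std::set<int> visited;
    for (std::vector<int>::const_iterator itr = v.begin(); itr != v.end(); ++itr) {
        // insert() reports whether the value was newly added to the set.
        if (visited.insert(*itr).second) {
            out.push_back(*itr);
        }
    }
    return out;
}
```

<p>For the input 1, 1, 2, 1, 3, 2 this yields 1, 2, 3. A no-repetition iterator over the same container should visit the elements in that order, which makes a helper of this kind a convenient oracle when testing the iterator.</p>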
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What EDA needs to change for 2020 success?</title>
		<link>http://www.ocoudert.com/blog/2009/11/06/what-eda-needs-to-change-for-2020-success/</link>
		<comments>http://www.ocoudert.com/blog/2009/11/06/what-eda-needs-to-change-for-2020-success/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 01:24:52 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ASIC]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[verification]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=504</guid>
		<description><![CDATA[ICCAD’09 was a fairly good vintage. It started Monday morning with an excellent keynote from Hamid Pirahesh about cloud computing. The same day in the afternoon, a more EDA-focused discussion was initiated by Jim Hogan and Paul McLellan (slides can be found here), asking the question “What EDA needs to change for 2020 success?” Paul [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/11/06/what-eda-needs-to-change-for-2020-success/">What EDA needs to change for 2020 success?</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/19/the-formal-verification-market-is-still-untapped/' rel='bookmark' title='The formal verification market is still untapped'>The formal verification market is still untapped</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.iccad.com/2009/index.html" target="_blank">ICCAD’09</a> was a fairly good vintage. It started Monday morning with an excellent <a href="http://www.iccad.com/events/eventdetails.aspx?id=106-100">keynote</a> from Hamid Pirahesh about cloud computing. The same day in the afternoon, a more EDA-focused discussion was initiated by Jim Hogan and Paul McLellan (slides can be found <a href="http://leepr.com/PDF/iccad09_20091030.pdf">here</a>), asking the question “What EDA needs to change for 2020 success?”</p>
<p>Paul rightly <a href="http://www.edn.com/blog/920000692/post/920050292.html">emphasized</a> three trends. The first one is well known: the continuously rising cost of IC designs, about $50M for today’s 45nm node. The second trend is that the fastest growing part of the design cost is software –more than half of the overall cost, with Paul even claiming close to 2/3. The third trend is an increasingly fragmented consumer market: the number of end products runs into the tens of billions, but these products come in many more varieties, which means that most of them ship in smaller individual volumes.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/11/units_versus_time_and_market_size.png"><img class="aligncenter size-full wp-image-506" title="units_versus_time_and_market_size" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/11/units_versus_time_and_market_size.png" alt="units_versus_time_and_market_size" width="500" /></a></p>
<p>Source: <em>Morgan Stanley, <a href="http://www.morganstanley.com/institutional/techresearch/pdfs/MS_Economy_Internet_Trends_102009_FINAL.pdf" target="_blank">Economy + Internet Trends</a>, Web 2.0 Summit, San Francisco, Oct 2009.</em></p>
<p>This is bad news for EDA as we know it: the rising cost of design can no longer be justified if the number of units does not grow fast enough (a $50M chip starts to make sense only if it is produced in volumes of 250M units or more). Also, EDA has been slow to climb up the food chain and propose solutions for software design, which dominates the overall chip design cost.</p>
<p>Rising IC design costs and smaller unit volumes are a call for FPGAs to grow even faster. Mobile applications require FPGAs to do much better in terms of power consumption, but this is a hot topic (no pun intended) drawing a lot of attention and investment, and some competitive solution will emerge in the next few years. So EDA, which makes its bread and butter on IC design, had better realign its growth strategy toward software, embedded systems, HW/SW co-design, and verification. Otherwise EDA will continue to shrink until it serves only the few that can still afford chip design.</p>
<p>The end product, as a SoC, is a puzzle where the designer mostly assembles existing cores and IPs, and decides on the tradeoff between the software and hardware parts, based on flexibility and cost factors.</p>
<p>I see two strong needs that EDA could build its growth on. One is functional validation of the whole system &#8211;software plus hardware. EDA has started to address the issue, even though it is still short of proposing a scalable and automated environment. To functional validation, I would also add <em>functional flexibility</em>: how much of the behavior can be upgraded thanks to the software part? The other need is a design navigator that would estimate the speed, area, power consumption, and cost of a SoC by exploring alternatives between cores (ARM, MIPS, etc), IPs, FPGA, and software.</p>
<p>Last but not least, there is the eternal question of an EDA industry serving a $250B semiconductor industry, but making less than $5B. The time-based license model has only served the interests of the semiconductor companies, at the expense of R&amp;D investment in EDA. Claiming a lack of innovation in the EDA industry is sometimes fair, but EDA should also innovate in business solutions instead of cannibalizing itself by cutting costs to only survive another quarter. The semiconductor industry needs a healthy EDA if it wants to address the system-level design challenges of the next 10 years. Unless, of course, a new player coming from the software world with experience of scalable systems spells the death of the EDA industry as we know it.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/19/the-formal-verification-market-is-still-untapped/' rel='bookmark' title='The formal verification market is still untapped'>The formal verification market is still untapped</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/11/06/what-eda-needs-to-change-for-2020-success/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>How can Xilinx improve its bottom line</title>
		<link>http://www.ocoudert.com/blog/2009/10/30/how-can-xilinx-improve-its-bottom-line/</link>
		<comments>http://www.ocoudert.com/blog/2009/10/30/how-can-xilinx-improve-its-bottom-line/#comments</comments>
		<pubDate>Fri, 30 Oct 2009 22:45:05 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[FPGA]]></category>
		<category><![CDATA[Altera]]></category>
		<category><![CDATA[India]]></category>
		<category><![CDATA[outsourcing]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[Xilinx]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=473</guid>
		<description><![CDATA[Last week I wrote a post discussing Xilinx and Altera Q3’09 results, and I mentioned Xilinx’ operation margin consistently trailing Altera’s by 3-4%. I had a few emails regarding that gap, and why that gap would be closed eventually. Let me address this topic with this post. Comparing the yearly fiscal exercises directly would be [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/10/30/how-can-xilinx-improve-its-bottom-line/">How can Xilinx improve its bottom line</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/15/what-to-read-in-xilinx%e2%80%99-and-altera%e2%80%99s-third-quarter-results/' rel='bookmark' title='What to read in Xilinx’ and Altera’s third quarter results'>What to read in Xilinx’ and Altera’s third quarter results</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/06/11/who-should-worry-about-xilinx-and-oasys-partnership/' rel='bookmark' title='Who should worry about Xilinx and Oasys partnership?'>Who should worry about Xilinx and Oasys partnership?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>Last week I wrote a <a href="../2009/10/15/what-to-read-in-xilinx%E2%80%99-and-altera%E2%80%99s-third-quarter-results/">post</a> discussing Xilinx and Altera Q3’09 results, and I mentioned Xilinx’ operating margin consistently trailing Altera’s by 3-4%. I had a few emails regarding that gap, and why that gap would be closed eventually. Let me address this topic with this post.</p>
<p>Comparing the fiscal years directly would be biased (Xilinx’ fiscal year ends on March 31<sup>st</sup>, and Altera’s on Dec 31<sup>st</sup>). Instead we can look at a quarter-by-quarter comparison, even though that can be too low a level. It is better to look at a ttm (trailing twelve months) comparison to smooth out the local variations.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/XLNX-ALTR-income-statements1.png"><img class="aligncenter" title="XLNX ALTR income statements" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/XLNX-ALTR-income-statements1.png" alt="" width="450" /></a>Source: <em>Yahoo! Finance. All figures in thousands.<br />
</em></p>
<p>One can see that Altera’s operating margin is overall better. Also in their respective Q3’09 revenue reports, Xilinx expects its Q4’09 gross margin to improve to 62-63%, and Altera expects its own to be 67-68%. So a 3-4% operating margin gap will remain, which is significant.</p>
<p>On the other hand, Xilinx quotes 3145 full time employees, and Altera 2760. This means that a Xilinx employee brings back revenue about 26% higher than an Altera employee! So it all boils down to the question: how can Xilinx be more cost efficient?</p>
<p>One of the differences is the way software is developed. Altera’s software is mostly done in their technology center of Penang, Malaysia, with a very small core technology group in Toronto, Canada. Xilinx’s software team is mostly in the US, and only 5% of the team is in their R&amp;D facilities in Hyderabad, India. A back-of-the-envelope calculation shows that if Xilinx had the same software team but with a US/India ratio of 1/3 to 2/3, which is a healthy ratio for a company that can leverage its India facility, Xilinx would improve its operating margin by one point.</p>
<p>If you extend the same reasoning to the whole of R&amp;D –not only software– then it is clear that Xilinx can get the upper hand. Looking at the R&amp;D job listings, it is also clear that Xilinx is moving in that direction. The question then is whether Xilinx has the structure and the drive to achieve such a transformation successfully.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/15/what-to-read-in-xilinx%e2%80%99-and-altera%e2%80%99s-third-quarter-results/' rel='bookmark' title='What to read in Xilinx’ and Altera’s third quarter results'>What to read in Xilinx’ and Altera’s third quarter results</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/06/11/who-should-worry-about-xilinx-and-oasys-partnership/' rel='bookmark' title='Who should worry about Xilinx and Oasys partnership?'>Who should worry about Xilinx and Oasys partnership?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/10/30/how-can-xilinx-improve-its-bottom-line/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Test-driven design, a methodology for low-defect software</title>
		<link>http://www.ocoudert.com/blog/2009/10/13/test-driven-design/</link>
		<comments>http://www.ocoudert.com/blog/2009/10/13/test-driven-design/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 12:29:52 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[verification]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=294</guid>
		<description><![CDATA[CodeProject I wrote earlier about the good practices in designing APIs, which is so important when developing complex software. However one usually does not have the chance to start a product from scratch. This means that more often than ever, a software manager picks up an existing tool with an existing team. Making the tool [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/10/13/test-driven-design/">Test-driven design, a methodology for low-defect software</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/08/api-design-101/' rel='bookmark' title='API design 101'>API design 101</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
I wrote <a href="http://www.ocoudert.com/blog/2009/10/08/api-design-101/" target="_blank">earlier</a> about good practices in designing APIs, which are so important when developing complex software. However, one usually does not have the chance to start a product from scratch. More often than not, a software manager picks up an existing tool with an existing team. Making the tool more efficient (better QoR, faster runtime, smaller memory footprint, more stability, new features, etc.) is made difficult by legacy code, awkward APIs, or plain wrong architecture. What to do then? We usually cannot afford to rewrite all or major parts of the product. Does that mean that we are stuck with an endless cycle of resource-intensive incremental software changes, often creating as many bugs as they are intended to fix?</p>
<p><strong>Defect rate</strong></p>
<p>First I would like to discuss the notion of software reliability and how it evolved over the past 40+ years. A defect causes an invalid behavior of a program with respect to its specification (e.g., incorrect output, performance issue, crash). One of many ways to look at software quality is to estimate its defect rate, i.e., the number of defects per line of code (loc), or more conveniently per 1,000 lines of code (kloc).</p>
<p>The first observation is that the larger the code, the higher its defect rate. It is estimated that the bug rate increases logarithmically with code size.</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/IBM_defect_study.png"><img class="aligncenter" title="IBM defect study" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/IBM_defect_study.png" alt="IBM defect study" align="middle" /></a><br />
Source: <em>Program Quality and Programmer Productivity, Capers Jones, IBM 1977</em></p>
<p>Thus the total number of defects for a specific application can be reduced by the following:</p>
<ol>
<li>Continuous code factorization (direct loc reduction).</li>
<li>Use of libraries (which have a reduced bug rate, thanks to the extensive exposure they receive due to their long lifespan and high usage).</li>
<li>Increase the expressive power of the programming language (indirect loc reduction).</li>
</ol>
<p>Since the introduction of FORTRAN in 1957, many languages and operating systems have been created and have grown more powerful and sophisticated. What could be typically coded in 10 klocs of FORTRAN can be coded today with fewer than 5 klocs of C++, and about 3-4 klocs of Java. Raising the level of abstraction of programming languages helps decrease the total number of defects because it results in smaller programs with a lower defect rate.</p>
<p>Evidently, testing reduces the defect rate. A software powerhouse like Microsoft reports about 10-20 defects/kloc before QA, and claims that the rate drops to 1/kloc in released code. Looking at long-lifespan and very critical code, statistics from the Jet Propulsion Laboratory show that spacecraft software (which is typically only 20 klocs, and must run without interruption for years) reaches 6-10 defects/kloc after 2-5 years of testing. The code developed for the shuttle program is estimated to have less than 0.1 defect/kloc.</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/JPL_defect_data.png"><img class="aligncenter" title="JPL_defect_data" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/JPL_defect_data.png" alt="JPL defect data" /></a><br />
Source: <em>Nikora, Allen P., “Error Discovery Rate by Severity Category and Time to Repair Software Failures for Three JPL Flight Projects”, Software Product Assurance Section, Jet Propulsion Laboratory, November 5, 1991</em></p>
<p><a name="kloc_per_defect"></a>Over the past 40 years, independent research from academia and the private sector has shown that on average an application has a defect rate of 5.5/kloc, regardless of the programming language and the operating system used for development. This looks counterintuitive, since increasing the abstraction level of the programming language reduces the bug rate and the actual size of one specific application. But that progress is neutralized by the ever-increasing size and complexity of the programs, made possible by better software development methodologies and powerful development environments. To put a defect rate of 5.5/kloc in perspective, consider your typical EDA place-and-route product, say 3Mlocs of C/C++, with a likely high turnover rate (i.e., percentage of locs that are modified in every release). You can expect about 16,500 defects…</p>
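<p>The estimate above is simply the defect rate multiplied by the code size expressed in klocs. A minimal sketch of that arithmetic follows; the function name expectedDefects is hypothetical.</p>

```cpp
// Hypothetical helper: expected defect count for a code base,
// given a defect rate per 1,000 lines of code (kloc).
double expectedDefects(double defectsPerKloc, double linesOfCode)
{
    return defectsPerKloc * (linesOfCode / 1000.0);
}
```

<p>With the figures quoted above, expectedDefects(5.5, 3000000.0) gives 5.5 × 3,000 = 16,500 defects.</p>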
<p><strong>Test-Driven Design</strong></p>
<p>Now I will present a method that I have used successfully for both existing and from-scratch products. It is based on the observation that, independently of the quality of the team and the maturity of the tool, the software complexity and the unpredictable evolution of the product make managing software quality quite problematic. Think EDA, where customers ask for new capabilities every week and salespeople sell features 6 or 12 months before they are actually developed. It is difficult, if not impossible, to have an upfront, clean, and frozen specification, from which an architecture and a set of APIs can be derived. One needs to change the architecture and the APIs because of new unpredicted features and unforeseen problems, or simply because the software is written in a hurry without adequate resources &#8211;I have no doubt that most readers will agree on that last point. This creates bloated code with a high defect rate, which results in applications with a larger number of bugs.</p>
<p>Test-driven design flips the traditional software development scheme upside-down. In most cases, the software development flow consists of (1) specifying the requirements in some language (e.g., English, ML, C++ or Java header files), and (2) iterating a code/test loop until the software reaches a point where it is deemed stable enough to go through a full QA regression release process. This often leads to slow iterations between the release team and the R&amp;D team before the release is fully qualified. Also the essence of the original specification may be lost because there is no concrete way (read: operational semantics) to check whether the released product actually meets its intended requirements.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/classic_vs_tdd_software_development_flow.png"><img class="aligncenter size-full wp-image-309" title="classic_vs_tdd_software_development_flow" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/classic_vs_tdd_software_development_flow.png" alt="classic_vs_tdd_software_development_flow" width="600" /></a></p>
<p style="text-align: center;"><em>Traditional vs. test-driven software development flow</em></p>
<p>Contrast this with a test-driven design approach. In that methodology, the tests are written <em>before</em> anything else. The goal is to capture the specification with a set of small (positive <em>and</em> negative) unit tests. Then some code is written and run on the unit tests. Some of the tests fail, which leads to further refinement of both the unit tests and the code. This write-test/code/test iteration continues until one cannot design a new test that would break the code. The next step, the QA regression release process, can then be carried out.</p>
<p>A few things are important to recognize in a test-driven software development methodology: (1) the spec <em>is</em> the set of unit tests; (2) therefore the release can be validated as meeting the spec; (3) the testing iteration handled by R&amp;D is closed when the unit tests <em>and</em> the code are fully stable, which leads to fewer iterations between the release and R&amp;D teams; and (4) this methodology does not assume anything about the intrinsic quality of the code and the strength of the development team. Indeed this approach can be used on very badly architected code and still lead to substantial improvements.  Also note that the unit tests can be internal, e.g., written in C++ and providing a self-testing mechanism, or more traditional with external data that are fed to the application.</p>
<p><strong>Case studies</strong></p>
<p>Let me give a few concrete examples. A tool I was in charge of contained some legacy code that performed an essential task in EDA: constant propagation (it consists of propagating logic values through a logic network, following basic computation rules, e.g., NOT(0) = 1, AND(0, 1) = 0, and AND(1, 1) = 1). The computational principles are simple, but a good constant propagation system should be lazy and incremental, support undo, be able to explain to the user why some constant occurs in some part of the network, etc.  This makes the development of the system much more challenging.</p>
<p>The legacy code produced crashes now and then. It was difficult to read, it contained suspicious pieces of code to handle corner cases (e.g., multi-driver nets, user-set constants), and it had poor test coverage (&lt;50%). I decided to go for a full rewrite with a clean API, and unit tests were developed together with the new code following a TDD methodology. This resulted in 6267 loc of C++, 40% of which are unit tests (click the screenshot of the C++ unit tests below), made of 1415 asserts. That code was released in May 2007, got 3 reported defects until November 2007, and has been without defect since then.</p>
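<p>To make the computation rules concrete, here is a minimal C++ sketch of gate evaluation with a built-in self-test. It is not the code of the tool discussed here, which is not shown in this post. The names Val, eval_not, eval_and, and self_test_constants are hypothetical, and the third value Unknown (a net not known to be constant) is an assumption added for illustration.</p>

```cpp
#include <cassert>

// Hypothetical three-valued domain: a net is constant 0, constant 1,
// or not known to be constant.
enum class Val { Zero, One, Unknown };

// NOT(0) = 1, NOT(1) = 0; an unknown input stays unknown.
Val eval_not(Val a) {
    if (a == Val::Zero) return Val::One;
    if (a == Val::One)  return Val::Zero;
    return Val::Unknown;
}

// AND(0, x) = 0 for any x (0 is the controlling value);
// AND(1, 1) = 1; every other combination is unknown.
Val eval_and(Val a, Val b) {
    if (a == Val::Zero || b == Val::Zero) return Val::Zero;
    if (a == Val::One && b == Val::One)   return Val::One;
    return Val::Unknown;
}

// Self-test: the asserts play the role of the spec.
void self_test_constants() {
    // Positive tests: constants that must be derived.
    assert(eval_not(Val::Zero) == Val::One);
    assert(eval_and(Val::Zero, Val::One) == Val::Zero);
    assert(eval_and(Val::One, Val::One) == Val::One);
    assert(eval_and(Val::Zero, Val::Unknown) == Val::Zero);
    // Negative tests: constants that must NOT be derived.
    assert(eval_and(Val::One, Val::Unknown) == Val::Unknown);
    assert(eval_not(Val::Unknown) == Val::Unknown);
}
```

<p>Calling self_test_constants() from main, or at application start-up in a debug build, gives the kind of internal self-testing mechanism mentioned earlier. The negative tests matter as much as the positive ones: deriving a constant that does not exist is a functional bug.</p>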
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_constant_annot_unit_test.png"><img class="size-full wp-image-298  aligncenter" title="screenshot_constant_annot_unit_test" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_constant_annot_unit_test.png" alt="screenshot_constant_annot_unit_test" width="300" /></a></p>
<p>Another example is a C++ templatized bitwise four-valued simulator, written to match the Verilog semantics. This was done with 8014 loc of C++, including 40% of unit tests, made of 1015 asserts (click the screenshot below: you can recognize the basic four-valued logic truth tables).  The template was self-tested with three different concrete instances of logic representation (on 2-tuples of bool, on strings made of 32 or 64 characters &#8216;0&#8217;, &#8216;1&#8217;, &#8216;x&#8217;, and &#8216;z&#8217;, and finally on an actual logic netlist).  No defect was ever found in the semantics.</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_simulator_unit_test.png"><img class="size-full wp-image-299  aligncenter" title="screenshot_simulator_unit_test" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_simulator_unit_test.png" alt="screenshot_simulator_unit_test" width="300" /></a></p>
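<p>As an illustration of what such truth-table tests can look like, here is a small C++ sketch of four-valued bitwise NOT and AND on the character representation (&#8216;0&#8217;, &#8216;1&#8217;, &#8216;x&#8217;, &#8216;z&#8217;). It is not the templatized simulator itself: the names not4, and4, and self_test_four_valued are hypothetical, and only two operators are covered.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bitwise NOT on one four-valued bit encoded as '0', '1', 'x', or 'z'.
// An unknown or high-impedance operand yields 'x'.
char not4(char a) {
    if (a == '0') return '1';
    if (a == '1') return '0';
    return 'x';
}

// Bitwise AND: 0 dominates; 1 & 1 = 1; any other combination is 'x'.
char and4(char a, char b) {
    if (a == '0' || b == '0') return '0';
    if (a == '1' && b == '1') return '1';
    return 'x';
}

// Apply and4 position by position on two vectors of equal length.
std::string and4(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    std::string r(a.size(), 'x');
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = and4(a[i], b[i]);
    return r;
}

// The truth table is the spec; rows and columns follow the order 0, 1, x, z.
void self_test_four_valued() {
    const std::string v = "01xz";
    const char* expected_and[4] = {"0000", "01xx", "0xxx", "0xxx"};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            assert(and4(v[i], v[j]) == expected_and[i][j]);
    assert(and4(std::string("01xz"), std::string("1111")) == "01xx");
    assert(not4('z') == 'x');
}
```

<p>Writing the expected table as data, rather than as sixteen separate asserts, keeps the test readable next to the Verilog truth table it encodes.</p>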
<p>In both these cases, I had the opportunity of rewriting or starting from scratch. What if one has to improve on an existing system too large to be rewritten?</p>
<p>The third example is about a complex feature (sequential clock gating) that had been released 6 months earlier. The field complained about inconsistencies and erratic behavior, so I decided to apply a TDD methodology to rectify the code. First hurdle, we established a unit test campaign, which consisted of describing the spec in terms of unit tests in plain English and sketches. This produced 49 unit tests, as shown below (click to enlarge).</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/seq_clock_gating_unit_test_campaign.png"><img class="aligncenter size-full wp-image-300" title="seq_clock_gating_unit_test_campaign" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/seq_clock_gating_unit_test_campaign.png" alt="seq_clock_gating_unit_test_campaign" width="300" /></a></p>
<p>Second hurdle, we proceeded to translate these informal unit test descriptions into elementary RTL descriptions. The idea was that if the code was compliant with the spec, we could predict exactly which optimized netlist it would produce. Third hurdle, a 3<sup>rd</sup> party reviewed these 49 RTL tests, and found that 9 of them were faulty because they did not capture what was specified in the document. Once we fixed these tests came the fourth hurdle: we ran the code.</p>
<p>The results were brutal: the code crashed on 3 tests, it synthesized a functionally incorrect netlist in 5 cases, and produced 13 suboptimal results. Overall, 21 failures out of 49 tests, a 43% defect rate! We then went through a 2-week iteration of unit test refinement and code fixing with a team that had <em>never</em> touched the initial code, to eventually converge on 72 unit tests &#8211;many more than we could think of initially&#8211; and a usable feature.</p>
<p><strong>Conclusion</strong></p>
<p>Test-driven design (TDD) aims at capturing a spec with unit tests, then having some code pass these tests. The unit tests are more important than the code itself (any code that passes the unit tests meets the spec). TDD initially requires a higher investment: writing unit tests to capture an expected behavior is a complex task, and a 3<sup>rd</sup> party review is needed to validate them. But the effort pays off: eventually the set of unit tests becomes the spec, and can even be used as documentation. Running unit tests is fast, so it dramatically reduces the R&amp;D testing time. Also, once the code passes a comprehensive set of unit tests, the risk of iterating from QA back to R&amp;D is reduced. Overall, test-driven design increases code correctness and stability dramatically, even in the presence of a deficient architecture and legacy code.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/08/api-design-101/' rel='bookmark' title='API design 101'>API design 101</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/10/13/test-driven-design/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>API design 101</title>
		<link>http://www.ocoudert.com/blog/2009/10/08/api-design-101/</link>
		<comments>http://www.ocoudert.com/blog/2009/10/08/api-design-101/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 13:06:36 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://coudert.wordpress.com/?p=247</guid>
		<description><![CDATA[I built up products from scratch several times in my professional life. Usually it starts with a very small engineering team &#8211;sometimes I was the very first member of the team. This is a great opportunity to lay strong foundations for the subsequent software development, because one is in charge of the whole process. But [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/10/08/api-design-101/">API design 101</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a>I built up products from scratch several times in my professional life. Usually it starts with a very small engineering team &#8211;sometimes I was the very first member of the team. This is a great opportunity to lay strong foundations for the subsequent software development, because one is in charge of the whole process. But one does not always have the chance to start from scratch.</p>
<p>I also worked with already established products, with larger teams and millions of lines of already existing code. The typical software management and development project always offers some cumbersome legacy code and APIs that survive year after year. The reason is not so much that people do not want to fix the problem, but that fixing the problem requires a major product architecture overhaul, which comes at a prohibitive cost. There are striking lessons in failed software architectures, and it all starts with API design. I am sharing here my practical experience with C++ projects, but most of this advice also applies to Java.</p>
<p><strong>Why is an API so important?</strong></p>
<p>An API can be a company’s greatest asset: it captures communication and exchange of services in an application. A good API will naturally lead to more reuse, simpler code, and lower maintenance cost. If the API is public, a good API will also capture customers. There are examples of Java libraries that failed to be accepted not because they were inefficient, but because they were very poorly designed.</p>
<p>An API can also be a company’s greatest liability: once the service has clients, one can no longer change the API!  Suspending or rewriting an API is very costly in terms of time and money. In the case of a public API, cost also comes in terms of reputation. A public API is forever: there is only one chance to get it right.</p>
<p><strong>What is a good API?</strong></p>
<p>In today’s object-oriented software, writing an API is providing a service. Thus instead of thinking in terms of implementation and efficiency, one must first think in terms of modules and services: determine the usage model; establish the clients’ needs; and anticipate tomorrow’s needs.</p>
<p>Besides being powerful enough to satisfy the requirements, an API should be designed with two principles in mind:</p>
<ol>
<li><strong>Keep it simple! </strong>An API must be easy to learn and use, even without documentation. The API must be hard to misuse. Functionality should be easy to explain &#8211;if it is hard to name, it is likely a bad function. Use simple, consistent naming, and the code will read like prose –Java libraries and STL are good inspirations for naming conventions. The API should be as small as possible: you can always add to an API, but you can never remove from it. A method should not take more than 3-4 parameters –else wrap the parameters in a class that can be augmented later.</li>
<li><strong>Keep it abstract!</strong> An API must allow extension for future needs. For example, it should not assume anything about the implementation. It should minimize accessibility to implementation-specific details –an API, once public, <em>will</em> be used, and you do not want to expose the ugly details of a database.</li>
</ol>
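<p>The advice on parameter counts can be sketched in C++ as follows. This is only an illustration, assuming C++11: ReportOptions and export_report are hypothetical names, not part of any real library. The point is that a new option can be added later with a default value, without touching the signature of the method or the existing callers.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical options class: settings can be added later with a default
// value, without changing the signature of export_report().
class ReportOptions {
public:
    // Chainable setters.
    ReportOptions& title(const std::string& t) { title_ = t; return *this; }
    ReportOptions& max_rows(int n) { max_rows_ = n; return *this; }
    ReportOptions& verbose(bool v) { verbose_ = v; return *this; }

    // Read-only accessors.
    const std::string& title() const { return title_; }
    int max_rows() const { return max_rows_; }
    bool verbose() const { return verbose_; }

private:
    std::string title_ = "untitled";
    int max_rows_ = 100;
    bool verbose_ = false;
};

// One parameter instead of a long positional list; returns a summary line.
std::string export_report(const ReportOptions& opt) {
    return opt.title() + ":" + std::to_string(opt.max_rows()) +
           (opt.verbose() ? ":verbose" : "");
}
```

<p>A call then reads like prose, e.g., export_report(ReportOptions().title("timing").max_rows(5)), and unspecified options keep their defaults.</p>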
<p>In theory, an API should be written before any implementation begins. Gathering requirements is the first step. Requirements must be case-driven, specific, and should be questioned relentlessly until proven to be must-have. The API should then be written in the target language (C++, Java, etc.): this will force the development team to make choices, and to keep the API simple and abstract enough –nobody wants to have too much to discuss!  Then the API should be reviewed and made final in a public forum with the two principles above in mind: keep it simple (so that it is easy to support) and abstract (so that it is easy to extend).</p>
<p>An API should be documented, but well-designed APIs are sometimes self-explanatory. An API should answer the following questions about its components.</p>
<ol>
<li>Class: what does an instance of a class represent?  Is that a singleton class?  Is there a factory?  Who owns the memory?</li>
<li>Method: what does it do?  What is the contract between the client and the instance?  Is there any precondition and post-condition?  Is there any side effect?</li>
<li>Parameters: what do they represent?  Which information do they carry?  Who owns them?</li>
<li>Exceptions: who throws exceptions?  What do they mean?  What to do when catching one?</li>
</ol>
<p><strong>API and performance</strong></p>
<p>Bad API decisions can limit performance. When designing an API, it is good to consider the following rules.</p>
<ol>
<li>Avoid mutability. If a method returns a mutable instance, that instance needs to be created somewhere, which raises the question of memory ownership. Also, mutable classes limit thread safety. Use ‘const’ whenever possible.</li>
<li>Avoid implicit calls to the copy constructor and the assignment operator. Copying is a waste of resources if you can use references. Declare the copy constructor ‘explicit’, or declare both ‘private’ (or deleted, in C++11), to catch any misuse at compile time.</li>
<li>A factory is often better than constructors. A factory has full control on how instances are created and when they should be released (shared model, garbage collection, save/restore, caching and disk mirroring, etc). A factory can return an instance of a sub-class.</li>
<li>Avoid exposing implementation details. It may prevent later improvements of a database. Never expose data members of a class, always use get/set accessors.</li>
<li>Question the thread safety of computationally intensive methods. One day the software may run on a grid or in a cloud.</li>
<li>Never compromise the rules above for a small runtime or memory improvement. For the vast majority of applications, going a few percent faster is not worth the maintenance nightmare it can imply.</li>
</ol>
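<p>Several of these rules can be combined in a short C++ sketch: a const accessor, disabled copies, hidden data members, and a factory that returns an instance of a sub-class. It assumes C++11 (for ‘= delete’ and std::unique_ptr), and the names Net, SimpleNet, and make_net are hypothetical, chosen only for illustration.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Abstract interface: clients see neither data members nor the concrete type.
class Net {
public:
    virtual ~Net() = default;
    virtual const std::string& name() const = 0;  // read-only accessor

    // Copies are disabled, so an accidental pass-by-value fails to compile.
    Net(const Net&) = delete;
    Net& operator=(const Net&) = delete;

protected:
    Net() = default;  // only sub-classes can construct
};

namespace {
// Concrete sub-class, invisible to clients of the API.
class SimpleNet : public Net {
public:
    explicit SimpleNet(const std::string& n) : name_(n) {}
    const std::string& name() const override { return name_; }

private:
    std::string name_;
};
}  // namespace

// Factory: decides which sub-class to create; the returned unique_ptr
// states that the caller owns the instance.
std::unique_ptr<Net> make_net(const std::string& name) {
    return std::unique_ptr<Net>(new SimpleNet(name));
}

// Compile-time check that assignment is really disabled.
static_assert(!std::is_copy_assignable<Net>::value,
              "Net must not be copy-assignable");
```

<p>Because clients only hold a Net obtained from make_net, the factory could later switch to a cached or shared implementation without any change on the client side. Whether the caller or the factory should own the instance is a design choice; unique_ptr is just one way to make it explicit.</p>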
<p><strong>Final word</strong></p>
<p>A good API is key to producing smaller and simpler code, which makes the product more stable and easier to maintain. Designing a good API is a collaborative effort, and a formal decision process is needed to freeze an API. A good API is hard to write: get your best people on it. And finally, a public API is forever. May these simple rules guide your next project.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/10/08/api-design-101/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Software outsourcing, a necessary evil</title>
		<link>http://www.ocoudert.com/blog/2009/09/20/software-outsourcing-a-necessary-evil/</link>
		<comments>http://www.ocoudert.com/blog/2009/09/20/software-outsourcing-a-necessary-evil/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 01:10:43 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[China]]></category>
		<category><![CDATA[India]]></category>
		<category><![CDATA[outsourcing]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://coudert.wordpress.com/?p=53</guid>
		<description><![CDATA[Here are the definitions of two words that have a bad press, especially in these harsh economic times: Outsourcing (included in dictionaries in 1979): the procuring of services or products, such as the parts used in manufacturing a motor vehicle, from an outside supplier or manufacturer in order to cut costs. Offshoring: relocation by a [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/09/20/software-outsourcing-a-necessary-evil/">Software outsourcing, a necessary evil</a></p>
No related posts.]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a>Here are the definitions of two words that have a bad press, especially in these harsh economic times:</p>
<ul>
<li><a href="http://www.tfd.com/outsourcing">Outsourcing</a> (included in dictionaries in 1979): the procuring of services or products, such as the parts used in manufacturing a motor vehicle, from an outside supplier or manufacturer in order to cut costs.</li>
<li><a href="http://encyclopedia.tfd.com/offshoring">Offshoring</a>: relocation by a company of a business process from one country to another &#8211;typically an operational process, such as manufacturing, or supporting processes, such as accounting.</li>
</ul>
<p>In many service and manufacturing industries, outsourcing implies that the 3<sup>rd</sup>-party provider is established abroad, where the cost of labor and production is lower, or where environmental laws and ethics are held to a lower standard.  A sign of the times, outsourcing is often used interchangeably with offshoring.</p>
<p>Massive offshoring started with the textile and clothing industries in the late 70’s.  Then came toys, TV, hotlines, help desks, cars and electronics in the 80’s, to be followed by software in the late 90’s.  Over the past 30 years, people have accepted the idea that the products they buy and use in everyday life can be produced and assembled on another continent, where the cost of labor is lower and the labor and business laws are less restrictive &#8211;today nobody expects a toy to be produced anywhere but in China.  Soon people will be indifferent to the idea that their software is designed and produced in India or China, especially if it is embedded in ubiquitous hardware like cell phones, game consoles, or digital cameras.  It will be even less of a question with web-based applications, where cloud computing and distributed data centers make physical location irrelevant.</p>
<p>Software offshoring results from a natural evolution of the industry.  As in so many other industries, complexity required a more organized production process.  Software development evolved from a highly specialized, hand-crafted process, to an application-driven, methodology-centric, maintenance-heavy operation.  The availability of skilled labor and software development methodologies opened the door to outsourcing, then offshoring.  Long regarded as a highly intellectual product that could only be conceived in a handful of countries in the western world, software can now be designed, produced, and maintained in any place that has access to highly educated engineers, with a relatively simple infrastructure –computers and fast internet connections.  One should rejoice at the idea of an industry that can be established anywhere innovation has the opportunity to blossom, as opposed to a monopoly held by a few companies in a couple of countries.</p>
<p>Indeed, outsourcing of intangible products (e.g., service, consulting, design) and BPO (Business Process Outsourcing), which started in the early 80’s, got a huge boost in the 90’s.  With the tech and telecommunication bubble, massive investments were made in submarine cables for intercontinental high-bandwidth communication.  With cables running from Europe to India via Egypt, hub centers in Bangalore, New Delhi, Hyderabad, Chennai, Pune, and Mumbai saw their capacity increase dramatically.  Soon real-time and reliable data exchange via the internet made high-tech outsourcing a reality.  After being a BPO bonanza, Bangalore quickly emerged as the Indian Silicon Valley.  Hundreds of software and hardware design companies set foot there, first with help centers and QA engineering, then with HW/SW supporting development teams, and finally with complete design and development entities.  Other countries have developed huge HW/SW outsourcing businesses –China, Philippines, and Malaysia, to name a few, as well as some Eastern European countries.</p>
<p>Today, the cost of a software developer in India is about a third to a fourth of the cost in the US –the figure varies depending on the industry, and it becomes cheaper as the experience and complexity requirements decrease.  In China, the cost is about a fifth to a tenth of the US figure –very dependent on the industry domain, as well as the location in China.  Major US and European high-tech companies such as Oracle, ST, Intel, Adobe, SAP, IBM, Microsoft, Google, and Yahoo! have very large R&amp;D campuses in India and China.  For many, their facilities in India are the biggest outside of the US.  For some, most of the R&amp;D growth is seen outside of the US/Europe.  It is not rare to see successful high-tech companies, created in the Silicon Valley 10+ years ago, but with 90% of their R&amp;D today outsourced to Asia.</p>
<p>As consumers, pretty much nobody complains about outsourcing: in today’s world of rapid consumption of electronic gadgets and complex software, one needs to spin new products at a rapid pace and for an ever more competitive price.  However, offshoring costs jobs at home, which eventually translates to lower disposable incomes and additional social costs, both negatively impacting the local economy.  The creation of wealth in the host country has a side effect though: increasing the disposable income abroad creates new customers for the home business, thus in the end everybody may benefit from it.  This is a more positive scenario, probably true in the long run, but the lag between the disappearance of an activity and its replacement with another comes with a significant social cost.</p>
<p>Also, we have seen offshoring displace industries entirely, and the intellectual content of the displaced industries keeps increasing:  there is virtually no textile industry in Europe and in the US, and UK’s manufacturing industry is trailing in Europe.  The usual response to these displacements is that more lucrative activities replace those that moved abroad.  London has long promoted the dismantlement of its manufacturing industry via offshoring as a chance to move to a service and finance fueled economy, which produces a higher added-value.  But with the recent economic downturn driven by the finance industry, one cannot help but question the soundness of that claim.</p>
<p>In the end, software outsourcing is here to stay: there is too much to gain for the home companies and the host countries, and the low cost of the infrastructure makes it flexible and easy to extend.  On Sand Hill it is common to hear VCs asking “What is your Indian strategy?”.  Some startups in the Bay Area even start from day one with a full software development team established in India, with just the executives, sales and support located in the US.  Needless to say, these companies could not thrive or even get started without outsourcing part of their software development.  Since they eventually contribute to the high-tech industry, one should endorse the long-term benefit.</p>
<p>Does that mean that being a SW/HW engineer in the Silicon Valley has become a high-risk job?  Software innovation still relies on individuals with bright ideas for technology and products, thus these individuals will always be in high demand.  But it has certainly become much harder for the general-purpose software developer.  The Silicon Valley has benefited from a unique combination of a highly educated engineer pool, entrepreneurs, and VC money.  As long as these three components remain, there is no threat in sight.  But if more and more VCs and entrepreneurs start to establish themselves in India, we will see a very serious competitor for the crown of the software kingdom.  Software outsourcing will not kill the Silicon Valley.  Lack of innovation will.</p>
<p>No related posts.</p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/09/20/software-outsourcing-a-necessary-evil/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

