<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Olivier Coudert&#039;s Blog &#187; quality</title>
	<atom:link href="http://www.ocoudert.com/blog/tag/quality/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ocoudert.com/blog</link>
	<description>My take on tech --and other topics</description>
	<lastBuildDate>Sat, 21 Jan 2012 20:30:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>How to make software deterministic</title>
		<link>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/</link>
		<comments>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/#comments</comments>
		<pubDate>Mon, 30 May 2011 17:04:40 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1247</guid>
		<description><![CDATA[CodeProject A program is deterministic, or repeatable, if it produces the very same output when given the same input no matter how many times it is run. Refining this definition, we should consider whether a program produces the same result on any platform (32 and 64 bits machines, running Windows, Mac OS, Linux, Solaris, etc). [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/">How to make software deterministic</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display:none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
A program is deterministic, or repeatable, if it produces the very same output when given the same input no matter how many times it is run.</p>
<p>Refining this definition, we should consider whether a program produces the same result on any platform (32- and 64-bit machines, running Windows, Mac OS, Linux, Solaris, etc.), or whether the program is insensitive to the form of its inputs. For example, the solution to the problem of generating the shortest route that visits all the capitals of Europe should not depend on how the map of Europe is entered, nor should it depend on which language is used to name the capitals.</p>
<p>Determinism is obviously very desirable. For the user, a non-deterministic program can be confusing and frustrating. For the developer, a non-deterministic program is extremely hard to test and debug, since bugs and specific configurations cannot be easily reproduced.</p>
<p>Repeatability looks like a given for most applications. For instance, if we add two numbers in a spreadsheet, we expect the same result no matter how many times we perform this operation and regardless of the platform we run on (PC, Mac, etc). Or if we run a spell checker several times, we expect it to flag the very same errors.</p>
<p>But it is not that obvious for more complex applications. This is especially true when there are multiple solutions to a problem, or when heuristics are used to produce a result because an exact solution is too computationally expensive. For example, it is not uncommon to see slightly different outcomes when running the same EDA synthesis or P&amp;R tool on the same input several times.</p>
<p>Even more, a user would like to see the same result when only minor changes are applied to the input. For instance, running a P&amp;R tool on two netlists that differ only by the names of their cells should produce exactly the same result: a P&amp;R tool should produce a result that depends only on the netlist structure. But experience shows that industrial synthesis and P&amp;R tools do not meet that requirement. Closer to software, it is not uncommon to generate slightly different object code with gcc by changing the names of a few variables.</p>
<p>Among the causes of non-deterministic response, we can distinguish the following types:</p>
<ol>
<li>A <a href="#random">random number</a> generator</li>
<li>Reading <a href="#uninitialized">uninitialized</a> data</li>
<li>A <a href="#race">race condition</a> on concurrent threads</li>
<li>An <a href="#unordered">unordered iteration</a> that is assumed ordered</li>
<li>A total order that depends on <a href="#memory_address">memory address</a></li>
<li>A total order that depends on <a href="#time_stamp">time stamps</a></li>
<li>A total order that depends on a <a href="#non_canonical_labelling">non-canonical labeling</a></li>
</ol>
<h4><strong><a name="random"></a>1. Random number generator</strong></h4>
<p>Many applications use stochastic processes (e.g., simulated annealing, genetic algorithms, Monte-Carlo simulations), yet we would like them to be repeatable. Using a pseudorandom number generator with a known seed makes it possible to reproduce the same long sequence of seemingly random numbers over and over again.</p>
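<p>As a minimal sketch of the seeded approach, assuming a C++11 compiler: the helper name <span style="font-family: courier;">draw</span> is hypothetical and not part of the original post. It reads raw values from a <span style="font-family: courier;">std::mt19937</span> engine, whose output sequence for a given seed is fixed by the C++ standard.</p>

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns `count` pseudorandom 32-bit values from a
// Mersenne Twister engine seeded with `seed`. The same seed always yields
// the same sequence, which is what makes a stochastic run repeatable.
std::vector<std::uint32_t> draw(std::uint32_t seed, int count) {
  std::mt19937 engine(seed);
  std::vector<std::uint32_t> values;
  values.reserve(count);
  for (int i = 0; i < count; ++i) {
    values.push_back(static_cast<std::uint32_t>(engine()));
  }
  return values;
}
```

<p>Calling <span style="font-family: courier;">draw(42, 5)</span> twice returns the same five values. Note that the standard distribution classes (e.g., <span style="font-family: courier;">std::uniform_int_distribution</span>) are not required to produce the same values on every standard library, so a program that must be repeatable across platforms may prefer to map the raw engine output itself.</p>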
<p>Note that some applications (e.g., gaming, cryptography, statistical sampling) <em>require</em> a non-deterministic behavior. In that case the seed of the random number generator must be an always-changing value, for example the host’s current time (cryptographic applications need a less predictable source of entropy than the clock).</p>
<p>There are more deliberate efforts to produce true random values by relying on natural, chaotic events. For instance, <a title="Lavarand" href="http://www.lavarnd.org/">Lavarand</a> produces random numbers by hashing the frames of a video stream of lava lamps. <a href="http://www.fourmilab.ch/hotbits/">HotBits</a> generates random bits by timing successive pairs of radioactive decays detected by a Geiger-Müller tube interfaced to a computer. <a title="Random.org" href="http://www.random.org/">Random.org</a> uses variations in the amplitude of atmospheric noise recorded with a normal radio.</p>
<h4><strong><a name="uninitialized"></a>2. Uninitialized or random data read</strong></h4>
<p>Uninitialized data may not exist in languages that assign systematic default values and give no control over memory management, as opposed to high-performance languages like C/C++.</p>
<p>Finding and fixing this kind of issue is relatively simple. For instance, tools like <a href="http://www-01.ibm.com/software/awdtools/purify/">Purify</a> and <a href="http://valgrind.org/">Valgrind</a> can report when C/C++ code reads arbitrary values in memory. To use Purify’s terminology, such errors are UMR (Uninitialized Memory Read), ABR (Array Bound Read, i.e., dereferencing an array outside of its bounds), and FMR (Free Memory Read). These defects all consist in reading some random value in memory. The code below illustrates some of these errors.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="c++"><small>#include &lt;cstdio&gt;
#include &lt;cstring&gt;

int main() {
  bool large;
  int size = 100;

  if (large) {                    // UMR
    size *= 10;
  }

  char* const str    = new char[size];  // Array of size chars, left uninitialized.
  char* const middle = str + size/2;
  // Set the end-of-string.
  str[size - 1] = '\0';
  // The loop intends to fill up the string str with 'a'.
  // But this loop is faulty, because *++c is used instead of *c++.
  char* c = str;
  for (int i = 1; i &lt; size; ++i, *++c = 'a');

  printf("%c\n", str[0]);         // UMR
  printf("%c\n", str[1]);         // Ok, will print 'a'.
  printf("%c\n", str[2]);         // Ok, will print 'a'.
  printf("%c\n", str[size - 1]);  // Ok, but will print 'a'
                                  // instead of the expected '\0'.
  printf("%c\n", str[size]);      // ABR
  printf("%lu\n", strlen(str));   // UMR, because we overwrote
                                  // the final '\0' in the loop.
  delete [] str;
  printf("%c\n", *middle);        // FMR

  return 0;
}
</small></pre>
<p>Note that ABW (Array Bound Write, i.e., writing outside of an array’s bounds), FMW (Free Memory Write), FNH (Freeing Non-Heap memory), and FUM (Freeing Unallocated Memory) errors, although severe bugs that dynamic analysis tools also report, are not by themselves a source of non-determinism: they consistently reproduce the same bug.</p>
<h4><strong><a name="race"></a>3. Thread races</strong></h4>
<p>Thread races are difficult to detect, and fixing them can be very costly. A typical example is when one thread writes a value at some address, and another thread reads the value at that address. Depending on which thread accesses the address first, the outcome of the program will be different. Two threads performing a non-atomic write at the same address simultaneously result in some unpredictable value.</p>
<p>One can use a mutex to prevent conflicting reads and writes for non-atomic operations. But the race itself (e.g., who reads or writes first) must be resolved with synchronization, which can be quite complicated. Moreover, it can hurt performance.</p>
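<p>The mutex part can be sketched as follows, assuming C++11 threads. The function <span style="font-family: courier;">count_with_mutex</span> is a hypothetical example, not code from the original post. The mutex makes each increment exclusive, and because additions commute, the final value is the same however the threads are scheduled. It does not decide <em>which</em> thread goes first, so an order-sensitive computation would still need explicit synchronization.</p>

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: `num_threads` threads each add 1 to a shared counter
// `increments` times. The lock makes every read-modify-write exclusive, so
// no update is lost and the final value is always num_threads * increments.
long count_with_mutex(int num_threads, int increments) {
  long counter = 0;
  std::mutex guard;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, &guard, increments] {
      for (int i = 0; i < increments; ++i) {
        std::lock_guard<std::mutex> lock(guard);
        ++counter;
      }
    });
  }
  for (std::thread& worker : workers) {
    worker.join();
  }
  return counter;
}
```

<p>Without the <span style="font-family: courier;">std::lock_guard</span> line, the unsynchronized increments would be a data race and the returned value could change from run to run.</p>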
<h4><strong><a name="unordered"></a>4. Iteration on unordered data</strong></h4>
<p>Iterating over data in some arbitrary order can make a program non-repeatable. This pattern is often encountered, and is easy to fix.</p>
<p>For example, an algorithm produces a result via a visitor that assumes a total order. The developer uses an incorrect visitor, which enumerates data in an arbitrary order, usually depending on the memory layout of the data container. E.g., instead of using a <span style="font-family: courier;">std::set</span> as a container, the developer uses a <span style="font-family: courier;">hash_set</span> or a <span style="font-family: courier;">tr1::unordered_set</span>. Forcing a total order on the data fixes the problem.</p>
<p>Note that the fix may be incomplete if it simply transforms a type (4) non-determinism into a type (5) or (6) non-determinism, which we discuss below.</p>
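<p>One way to force a total order on hashed data can be sketched as follows. This assumes the C++11 <span style="font-family: courier;">std::unordered_set</span> rather than the TR1 container, and the name <span style="font-family: courier;">sorted_view</span> is hypothetical. The elements are strings, so the order depends only on their values; a container of pointers would fall into type (5).</p>

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: copies the elements of a hash set into a vector and
// sorts them, so that the visiting order depends only on the values and not
// on the hash function, the bucket count, or the insertion history.
std::vector<std::string> sorted_view(
    const std::unordered_set<std::string>& names) {
  std::vector<std::string> ordered(names.begin(), names.end());
  std::sort(ordered.begin(), ordered.end());
  return ordered;
}
```

<p>The visitor then iterates over the returned vector instead of the hash set. The copy and the sort cost extra time and memory, which is the price of keeping the faster hashed lookups elsewhere.</p>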
<h4><strong><a name="memory_address"></a>5. Ordering by pointer value</strong></h4>
<p>This type of non-determinism is extremely common. For instance, a developer uses a <span style="font-family: courier;">std::set</span> as a container of pointers, and feels that the visitor is deterministic. It is indeed deterministic, but only w.r.t. the memory addresses allocated to the data, which depend on factors out of the application’s control.</p>
<p>One way of addressing the problem is to force a specific memory addressing scheme, but that requires very fine control of the memory allocator and is therefore complicated. A more common way consists in assigning a unique ID to an object at the time of its creation. The ID can be a 32-bit unsigned integer that is incremented for every new object. A total ordering, independent of the objects’ actual memory addresses, is then obtained from the IDs. It is a simple solution, as long as one can afford the extra 4 bytes for every object. ID-based sorting with no memory penalty can be obtained using custom memory allocators.</p>
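<p>The ID-based ordering can be sketched as follows. The names <span style="font-family: courier;">Node</span>, <span style="font-family: courier;">ById</span>, and <span style="font-family: courier;">NodeSet</span> are hypothetical, and the sketch assumes objects are created from a single thread (the counter is not thread-safe) and are never copied (a copy would duplicate the ID).</p>

```cpp
#include <cstdint>
#include <set>

// Hypothetical object type that receives a unique, increasing 32-bit ID
// at construction time.
class Node {
 public:
  Node() : id_(next_id_++) {}
  std::uint32_t id() const { return id_; }

 private:
  static std::uint32_t next_id_;
  std::uint32_t id_;
};

std::uint32_t Node::next_id_ = 0;

// Comparator that orders Node pointers by ID instead of by address.
struct ById {
  bool operator()(const Node* lhs, const Node* rhs) const {
    return lhs->id() < rhs->id();
  }
};

// A set of pointers whose iteration order is independent of the allocator.
using NodeSet = std::set<const Node*, ById>;
```

<p>Iterating over a <span style="font-family: courier;">NodeSet</span> visits the nodes in creation order, wherever the allocator happened to place them.</p>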
<h4><strong><a name="time_stamp"></a>6. Ordering by time stamps</strong></h4>
<p>Note though that the total ID-based ordering described above is exactly the order of creation of the objects. It is no different from an order that depends on time stamps. Therefore two inputs that contain the same elements but list them in a different order will be visited in a different order, which can lead to different results. This leads us to type (7) non-determinism.</p>
<h4><strong><a name="non_canonical_labelling"></a>7. Ordering induced by a non-canonical labeling</strong></h4>
<p>Type (7) non-determinism often goes unrecognized, or is simply ignored. The idea is that as long as the same input is given to a program (but possibly in a different order or form), the output should be the same. If the input can somehow be normalized to a form that captures the notion of “same input”, then the program can be made insensitive to the format of the input. That is of course assuming that the run-time penalty of the normalization process is not too high.</p>
<p>This normalization process is better defined as canonization. Formally, let O be a set of objects, and let EQ be an equivalence relation that captures the notion of “same” on these objects. A function Canon maps an object onto its canonical form, and is such that for any two objects o1 and o2, o1 and o2 are the same (i.e., o1 EQ o2) if and only if Canon(o1) = Canon(o2).</p>
<p>For instance, a set of integers can be represented by a number of containers (a list, an array, a hash set, a binary tree, etc). A canonical form can simply consist in sorting the integers. Two sets that are equal because they contain the very same integers, but that are initially given in different orders and forms, will end up in the same canonical form. Since sorting is an O(n log n) algorithm, this is an efficient canonization.</p>
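<p>For the integer-set example, a Canon function can be sketched in a few lines. The name <span style="font-family: courier;">canon</span> and the choice of a <span style="font-family: courier;">std::vector</span> as the common input form are assumptions made for this illustration; duplicates are dropped because the input denotes a set.</p>

```cpp
#include <algorithm>
#include <vector>

// Canon for a set of integers given as a sequence in any order: sort the
// values and drop duplicates. Two inputs denote the same set if and only if
// their canonical forms compare equal.
std::vector<int> canon(std::vector<int> values) {
  std::sort(values.begin(), values.end());
  values.erase(std::unique(values.begin(), values.end()), values.end());
  return values;
}
```

<p>With this definition, the inputs {3, 1, 2} and {2, 3, 1, 1} both map to the canonical form {1, 2, 3}, so any processing done on the canonical form is insensitive to the order in which the integers were given.</p>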
<p>Canonization can be much more costly. A Boolean function can be represented in many ways, e.g., with a truth table, a Conjunctive Normal Form (CNF), a Disjunctive Normal Form (DNF), a decision diagram, etc. (see below). Boolean function canonization is at least NP-hard, since comparing the canonical form of a function with that of the constant 0 solves the satisfiability problem (SAT). In practice this means that all known Boolean function canonization algorithms have an exponential worst-case complexity.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/Boolean-function.png"><img class="aligncenter size-full wp-image-1257" title="Boolean function" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/Boolean-function.png" alt="" width="500" /></a></p>
<p>&nbsp;</p>
<p>Canonization can also be elusive. Let us consider the problem of drawing a graph in some aesthetic way (e.g., such that the nodes are evenly distributed and such that there is a minimum number of crossing edges). One would like the graph to be drawn the very same way, regardless of its representation (adjacency list or adjacency matrix), and regardless of the order in which the adjacency information is given. For instance, the three graphs below, although looking different, are exactly the same, and can be drawn without any edge crossing as shown on the right side.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/graph-drawing-2.png"><img class="size-full wp-image-1259 aligncenter" title="graph drawing 2" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/05/graph-drawing-2.png" alt="" width="500" /></a></p>
<p>&nbsp;</p>
<p>Graph canonization is also known as canonical graph labeling. It is at least as hard as graph isomorphism, one of those rare problems that are in NP but are not known to be either NP-complete or in P. Although all existing graph canonization algorithms have an exponential worst-case complexity, it is believed that graph canonization can be done in polynomial time.</p>
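<p>To make the idea concrete, here is a deliberately naive sketch of graph canonization, not an algorithm one would use in practice. It assumes an undirected graph with nodes numbered 0 to n-1, and the names <span style="font-family: courier;">Edge</span> and <span style="font-family: courier;">canon_graph</span> are hypothetical. It tries every relabeling of the nodes and keeps the smallest resulting edge list, which takes n! steps and therefore only suits tiny graphs.</p>

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Brute-force canonical form of an undirected graph with nodes 0..n-1:
// apply every permutation of the node labels, normalize each edge as
// (smaller, larger), sort the edge list, and keep the lexicographically
// smallest list. Isomorphic graphs with the same n yield the same result.
std::vector<Edge> canon_graph(int n, const std::vector<Edge>& edges) {
  std::vector<int> label(n);
  std::iota(label.begin(), label.end(), 0);  // Identity labeling 0, 1, ...
  std::vector<Edge> best;
  bool first = true;
  do {
    std::vector<Edge> relabeled;
    relabeled.reserve(edges.size());
    for (const Edge& e : edges) {
      const int u = label[e.first];
      const int v = label[e.second];
      relabeled.emplace_back(std::min(u, v), std::max(u, v));
    }
    std::sort(relabeled.begin(), relabeled.end());
    if (first || relabeled < best) {
      best = relabeled;
      first = false;
    }
  } while (std::next_permutation(label.begin(), label.end()));
  return best;
}
```

<p>Two descriptions of the same three-node path, one centered on node 1 and one centered on node 0, produce the same canonical edge list, whereas a four-node path and a four-node star do not.</p>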
<h4><strong>Conclusion </strong></h4>
<p>The most common cause of non-determinism is an unreliable data order. The ultimate solution to make a program insensitive to the form of its input is to canonize its input as a pre-processing step. This proves to be a challenging and costly task in some cases. Whenever possible, canonization (or some imperfect normalization) goes a long way toward making the application consistently repeatable.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is software quality?</title>
		<link>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/</link>
		<comments>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/#comments</comments>
		<pubDate>Sun, 10 Apr 2011 06:21:24 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=1125</guid>
		<description><![CDATA[CodeProject The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. External quality is what a user experiences when running the software in its operational mode. Internal quality refers to aspects that are code-dependent, and that are not visible to the end-user. External [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/">What is software quality?</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043" rel="tag">CodeProject</a><br />
<a href="http://www.ocoudert.com/blog/wp-content/uploads/2011/04/qualityassurance.jpg"><img class="alignright size-medium wp-image-1156" title="qualityassurance" src="http://www.ocoudert.com/blog/wp-content/uploads/2011/04/qualityassurance-300x200.jpg" alt="" width="300" height="200" /></a>The quality of software is assessed by a number of variables. These variables can be divided into external and internal quality criteria. External quality is what a user experiences when running the software in its operational mode. Internal quality refers to aspects that are code-dependent, and that are not visible to the end-user. External quality is critical to the user, while internal quality is meaningful to the developer only.</p>
<p>Some quality criteria are objective, and can be measured accordingly. Some quality criteria are subjective, and are therefore captured with more arbitrary measurements.</p>
<p>The table below lists the most obvious software quality criteria, as well as some lesser-known ones.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="118"></td>
<td style="text-align: center;" valign="top" width="59">User</td>
<td style="text-align: center;" valign="top" width="63">Developer</td>
<td style="text-align: center;" valign="top" width="95">Measurable</td>
</tr>
<tr>
<td colspan="4" valign="top" width="334">External quality</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#features">features</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#speed">speed</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#space">space</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#network">network usage</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#stability">stability</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#robustness">robustness</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#eou">ease-of-use</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#determinism">determinism</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#compatibility">back-compatibility</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#security">security</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">difficult</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#power">power consumption</a></td>
<td style="text-align: center;" valign="top" width="59">x</td>
<td valign="top" width="63"></td>
<td style="text-align: center;" valign="top" width="95">difficult</td>
</tr>
<tr>
<td colspan="4" valign="top" width="334">Internal quality</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#coverage">test coverage</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">yes</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#testability">testability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#portability">portability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#thread">thread-safeness</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#conciseness">conciseness</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#maintainability">maintainability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">hard</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#documentation">documentation</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#legibility">legibility</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">subjective</td>
</tr>
<tr>
<td valign="top" width="118"><a href="#scalability">scalability</a></td>
<td valign="top" width="59"></td>
<td style="text-align: center;" valign="top" width="63">x</td>
<td style="text-align: center;" valign="top" width="95">somewhat</td>
</tr>
</tbody>
</table>
<p>By definition the internal quality (code characteristics) is a concern to the developer only, while all the external quality aspects (coming from using the software) are critical to the end user. However, the developer also has an interest in performance (speed, space, network usage) and determinism, because they make testing the software easier. Developers treat ease-of-use, back-compatibility, security, and power consumption as requirements.</p>
<p>It is important to consider how difficult it is to measure each of these criteria. It can be difficult because there is no simple variable to look at, or because the measurement process is costly, or because it requires a complex infrastructure. For instance, speed has an objective metric that is easy to measure. Power consumption has a simple metric (how many µW the application consumes), but it is complex to measure. Security is difficult and costly to estimate.</p>
<p><a name="features"></a> <strong>Features</strong>. This is the very reason for the software to be written: to provide a service. By feature we really mean the output produced by the software (e.g., a numerical result, a string, a screenshot, a web page, an audio clip), regardless of performance (speed, memory).</p>
<p><a name="speed"></a><strong>Speed</strong>. How quickly does the application provide the service? The user experiences the actual time elapsed between the moment she requests the service, and the moment the service is delivered. The real elapsed time, or wall time, is the sum of the CPU time, system time, and network latency. Thus the developer should not only focus on the CPU time (how much time the CPU actually spends on executing the program). The CPU time can easily be overshadowed by disk access (a write on the disk is very costly), swapping (due to an excessive virtual memory size), or time spent on the network (latency issues, or too many round trips).</p>
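<p>The gap between wall time and CPU time can be observed with a short sketch. The names <span style="font-family: courier;">Timing</span> and <span style="font-family: courier;">time_waiting_service</span> are hypothetical, and the sleep merely stands in for disk or network latency. The sketch assumes a POSIX-like platform where <span style="font-family: courier;">std::clock</span> reports processor time; some platforms report elapsed time instead.</p>

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
  double wall_ms;  // Real elapsed time, as experienced by the user.
  double cpu_ms;   // Processor time consumed by the process.
};

// Hypothetical helper: times a stand-in "service" that mostly waits for
// `wait_ms` milliseconds, and reports both the wall time and the CPU time.
Timing time_waiting_service(int wait_ms) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  // Stand-in for disk or network latency: the CPU is idle while we wait.
  std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timing t;
  t.wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  t.cpu_ms = 1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return t;
}
```

<p>On such a platform, the wall time is at least the requested wait while the CPU time typically stays close to zero, which is why profiling the CPU alone can miss what the user actually experiences.</p>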
<p><a name="space"></a><strong>Space</strong>. How much RAM and disk space is taken by the application? The aggregate numbers are important (peak memory, virtual memory size, etc.). But even more so, how often moving data triggers a CPU cache miss or a disk write has a dominant impact on the speed of the application. A mediocre data design can lead to very poor performance.</p>
<p><a name="network"></a><strong>Network usage</strong>. It is a matter of bandwidth and latency. Mismanaging sockets and channels can lead to unnecessary extra time spent in opening and closing sockets, handshakes, and round trips. As with memory, caching techniques can be used to reduce the consumption of network resources.</p>
<p><a name="stability"></a><strong>Stability</strong>. How often does one need to patch the software to fix problems? For the user, this is an inconvenience. For the developer, it means that the code is fragile and might benefit from better testing or partial rewrite.</p>
<p><a name="robustness"></a><strong>Robustness</strong>. How often does the application stall, freeze, or crash? How tolerant is it to extreme conditions (limited CPU and memory/disk/network resources, corner cases, system failure, or unresponsive 3<sup>rd</sup>-party resources)? This aspect is strongly related to testability and coverage.</p>
<p><a name="eou"></a><strong>Ease-of-use</strong>. It can be a very subjective factor, hard to quantify. It includes user documentation, clarity of the error messages, management of exceptions, and recovery after failure.</p>
<p><a name="determinism"></a><strong>Determinism</strong>. Also known as repeatability: does the program produce the very same result given the same input? There are many reasons for which a program can exhibit a non-repeatable behavior. A non-repeatable behavior is confusing and frustrating for the user. This also makes the program very difficult to test and debug. Repeatability is strongly dependent on a good data model design.</p>
<p><a name="compatibility"></a><strong>Back-compatibility</strong>. Can a new version of the application be used with an older version’s data? It is essential to the user, because a new version should not require a costly migration of the existing data.</p>
<p><a name="security"></a><strong>Security</strong>. Who is authorized to access the data? Can the data processed by the application be compromised? This is a crucial aspect of many applications, and it is getting more and more difficult to assess with the dissemination of mobile and web-based software.</p>
<p><a name="power"></a><strong>Power consumption</strong>. It is increasingly important with mobile applications, as a program may have to consider how it manages the device’s power producers and consumers (battery, cores, wireless, screen, audio), and not to rely entirely on the operating system.</p>
<p><a name="coverage"></a><strong>Test coverage</strong>. What is the proportion of code that is executed by some unit or regression test? This is measured by the number of lines, number of functions, and number of control branches that are exercised by the tests. Usually one expects coverage of at least 85% for any moderately complex application. In practice high coverage can be reached only if testability is high, which has deep implications for the architecture and development methodology.</p>
<p><a name="testability"></a><strong>Testability</strong>. An often overlooked or simply ignored aspect of code development, testability is the ability to trigger any specific line of code or branching condition. Highly testable code requires a discipline of architecture and development that is difficult to find. It is very costly to fix poorly testable software, as this requires major redesign. This justifies major investment in software architecture, design, and development methodologies.</p>
<p><a name="portability"></a><strong>Portability</strong>. Can the application run on 32- and 64-bit machines? Should it run on a mobile phone? Does it run on multiple operating systems (e.g., Windows, Linux, Mac OS X, Solaris, iOS, Android, RIM)? Does it run smoothly on all web browsers (IE, Firefox, Chrome, Safari, Opera)?</p>
<p><a name="thread"></a><strong>Thread-safeness</strong>. Is a specific component thread-safe? Can two threads collide on non-atomic operations? Can the application get into a deadlock? As concurrency is still mostly the result of a manual process (there is no compiler that automatically parallelizes the code), these questions are critical to ensure the good functioning of a program, as well as its performance: it is not rare to see a program run <em>slower</em> when too many threads are available, as the cost of synchronization can become dominant.</p>
<p><a name="conciseness"></a><strong>Conciseness</strong>. Also known as compactness. Is there any dead code, or duplicated code? Is the code shared and factorized properly? A compact code usually means faster compilation and smaller binary size. Also compactness naturally leads to fewer bugs, because the number of bugs is historically <a href="../2009/10/13/test-driven-design/#kloc_per_defect">constant</a> w.r.t. code size.</p>
<p><a name="maintainability"></a><strong>Maintainability</strong>. How easy it is to debug the code? How fast is it to provide a fix? How quickly can a new developer understand the code? Maintainability is a very important aspect, quite difficult to quantify. Maintainability is increased with good testability and flexible (abstract) design.</p>
<p><a name="documentation"></a><strong>Documentation</strong>. This is a pretty subjective topic. Some people claim that a separate documentation written in plain English is necessary. Some others state that at least 30% of the code should be comments. Some finally argue that the code itself is the best documentation –the names of the types, classes, functions and arguments, together with plenty of assertions.</p>
<p><a name="legibility"></a><strong>Legibility</strong>. Also known as readability. This is another subjective topic. It is about how easy it is to read the code. Guidelines are established to unify the style of the code, so that a developer can easily read code written by another developer. Code guidelines abound, and they go from a small set of directives, to a full set of rules that specify every syntactical aspect of the language. For example, see <a href="http://hem.passagen.se/erinyq/industrial/" rel="nofollow">Industrial Strength C++</a>, <a href="http://www.codingstandard.com/HICPPCM/index.html" rel="nofollow">High Integrity C++</a>, <a href="http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml" rel="nofollow">Google C++ Style Guide</a>, and <a href="http://www.maultech.com/chrislott/resources/cstyle/" rel="nofollow">many</a> <a href="http://www.possibility.com/Cpp/CppCodingStandard.html" rel="nofollow">more</a>.</p>
<p><a name="scalability"></a><strong>Scalability</strong>. How easy is it to extend a feature? Or to add a new one? Or to add extra cores, or increase the size of the cluster the application runs on? Again, this is all about software architecture and anticipating future needs.</p>
<p>Software quality is the result of the user experience. But software quality should not and cannot be a reactive action to external defects. Software quality is built from the ground up, with design and development methodologies, and with a special focus on testability, coverage, and flexibility.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/' rel='bookmark' title='How to write abstract iterators in C++'>How to write abstract iterators in C++</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to write abstract iterators in C++</title>
		<link>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/</link>
		<comments>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 21:18:32 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=859</guid>
		<description><![CDATA[CodeProject When developing in C++, an impeccable API is a must have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator. An iterator is used to visit the elements of a container without exposing how the [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/">How to write abstract iterators in C++</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a></p>
<p>When developing in C++, an <a href="../2009/10/08/api-design-101/">impeccable API</a> is a must-have: it has to be as simple as possible, abstract, generic, and extensible. One important generic concept that STL made C++ developers familiar with is the concept of iterator.</p>
<p>An iterator is used to visit the elements of a container without exposing how the container is implemented (e.g., a vector, a list, a red-black tree, a hash set, a queue, etc). Iterators are central to generic programming because they are an interface between containers and applications. Applications need access to the elements of containers, but they usually do not need to know how elements are stored in containers. Iterators make it possible to write generic algorithms that operate on different kinds of containers.</p>
<p>For example, the following code snippet exposes the nature of the container –a vector.</p>
<pre style="color: #000000; background-color: #ffe3c1;" lang="cpp">     void process(const std::vector&lt;E&gt;&amp; v)
     {
         for (unsigned i = 0; i &lt; v.size(); ++i) {
             process(v[i]);
         }
     }</pre>
<p>If we want to have the same function operating on a list, we have to write a separate function. Or if we later decide that a list or a hash set is more appropriate as a container, we need to rewrite the code everywhere we access the vector. This may require a lot of changes in many files. Contrast this container-specific visitation scheme to the following:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename Container&gt;
     void process(const Container&amp; c)
     {
         typename Container::const_iterator itr = c.begin();
         typename Container::const_iterator end = c.end();
         for (; itr != end; ++itr) {
             process(*itr);
         }
     }</pre>
<p>Using the notion of iterator, we have a generic processing of a container ‘c’, whether it is a vector, a list, a hash set, or any data structure that provides iterators in its API. Even better, we can write a generic process function that only takes an iterator range, without assuming that the container has a begin() and end() method:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename Iterator&gt;
     void process(Iterator begin, Iterator end)
     {
         for (; begin != end; ++begin) {
             process(*begin);
         }
     }</pre>
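<p>To make the benefit of the iterator-range form concrete, here is a small sketch in which one template serves a vector, a list, and a plain array. The functions <code>sum</code> and <code>sumEverywhere</code> are hypothetical stand-ins for <code>process</code>, chosen only so that the result can be checked.</p>

```cpp
#include <list>
#include <vector>

// Hypothetical stand-in for process(): adds up every element in [begin, end).
template <typename Iterator>
int sum(Iterator begin, Iterator end)
{
    int total = 0;
    for (; begin != end; ++begin) {
        total += *begin;
    }
    return total;
}

// The same template serves three unrelated kinds of iterators.
int sumEverywhere()
{
    int a[] = {1, 2, 3};
    std::vector<int> v(a, a + 3);
    std::list<int> l(a, a + 3);
    return sum(v.begin(), v.end())  // vector iterators
         + sum(l.begin(), l.end())  // list iterators
         + sum(a, a + 3);           // raw pointers
}
```

<p>The array case works even though a built-in array has no begin() or end() method, which is exactly what the range-based signature buys.</p>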
<p>An STL iterator is a commodity that behaves as a scalar type:</p>
<ul>
<li>It can be allocated on the heap</li>
<li>It can be copied</li>
<li>It can be passed by value</li>
<li>It can be assigned to</li>
</ul>
<p>The essence of an iterator is captured by the following API.</p>
<pre style="color: #000000; background-color: #ffe3c1;">     template &lt;typename T&gt;
     class Itr {
     public:
         Itr();
         ~Itr();
         Itr(const Itr&amp; o);                   <span style="color: #ff0000;">// Copy constructor</span>
         Itr&amp; operator=(const Itr&amp; o);        <span style="color: #ff0000;">// Assignment operator</span>
         Itr&amp; operator++();                   <span style="color: #ff0000;">// Next element</span>
         T&amp;   operator*();                    <span style="color: #ff0000;">// Dereference</span>
         bool operator==(const Itr&amp; o) const; <span style="color: #ff0000;">// Comparison</span>
         bool operator!=(const Itr&amp; o) const { return !(*this == o); }
     };</pre>
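<p>As an illustration of how little is needed to satisfy this API, the sketch below wraps a raw pointer. <code>ArrayItr</code> and <code>distanceOf</code> are hypothetical names, not part of the STL or of the code in this post.</p>

```cpp
#include <cstddef>

// Hypothetical minimal iterator over a contiguous array. It only holds a
// pointer, so the compiler-generated copy constructor, assignment operator,
// and destructor are sufficient.
template <typename T>
class ArrayItr {
public:
    ArrayItr() : p_(0) {}
    explicit ArrayItr(T* p) : p_(p) {}
    ArrayItr& operator++() { ++p_; return *this; }                  // Next element
    T&        operator*() const { return *p_; }                     // Dereference
    bool operator==(const ArrayItr& o) const { return p_ == o.p_; } // Comparison
    bool operator!=(const ArrayItr& o) const { return !(*this == o); }
private:
    T* p_;
};

// Counts the elements of [begin, end) using only the operations of the API.
template <typename Itr>
std::size_t distanceOf(Itr begin, Itr end)
{
    std::size_t n = 0;
    for (; begin != end; ++begin) {
        ++n;
    }
    return n;
}
```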
<p>Usually the container will provide a begin() and end() method, which build the iterators that denote the container’s range. Writing these begin/end methods is an easy task if the container is derived from an STL container, if the container has a data member that is an STL container, or if the iterator is a scalar type, like a pointer or an index.</p>
<p>It is more complicated if we want iterators that dereference to the same type of object, but that must visit several containers, possibly of different types, or iterators that visit containers in different manners. For instance let us assume that we have objects with some property (say, a color) stored in several containers, some of them of different types. We would like to visit all the objects, independently of the number of containers and their type, or we would like to visit objects of a given color, or we would like to visit objects that satisfy some predicate:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     class E;

     Itr&lt;E&gt; begin(); <span style="color: #ff0000;">// This gives the range to visit</span>
     Itr&lt;E&gt; end();   <span style="color: #ff0000;">// all the elements of type E</span>

     Itr&lt;E&gt; begin(const Color&amp; color); <span style="color: #ff0000;">// Same as above but only for the</span>
     Itr&lt;E&gt; end(const Color&amp; color);   <span style="color: #ff0000;">// elements of the given color</span>

     class Predicate {
     public:
         bool operator()(const E&amp; e);
     };      

     Itr&lt;E&gt; begin(Predicate&amp; p); <span style="color: #ff0000;">// Same as above but only for the</span>
     Itr&lt;E&gt; end(Predicate&amp; p);   <span style="color: #ff0000;">// elements that satisfy the predicate</span></pre>
<p>In this case the iterator is more complex than a scalar type like a pointer or an index: it needs to keep track of which container it is currently visiting, or which color or predicate it needs to check. In general, the iterator may have data members so that it can fulfill its task. Also we want to factorize the code and reuse general purpose iterators’ methods when writing more targeted iterators –e.g., visiting elements of a specific color should make use of the next-element method Itr&lt;E&gt;::operator++(). This can be done by giving Itr&lt;E&gt; virtual methods, and having derived classes implement the different iterators. For example:</p>
<pre style="color: #000000; background-color: #ffe3c1;">     class E {
     public:
         const Color&amp; color() const;
     };

     <span style="color: #ff0000;">// Assumes that Itr&lt;E&gt; declares its destructor and operator++ virtual</span>
     template &lt;typename E&gt;
     class ColoredItr : public Itr&lt;E&gt; {
     private:
         typedef Itr&lt;E&gt; _Super;
     public:
         ColoredItr(const Color&amp; color) : Itr&lt;E&gt;(), color_(color) {}
         virtual ~ColoredItr() {}
         virtual ColoredItr&amp; operator++() {
            <span style="color: #ff0000;">// Step once, then skip the elements of any other color
            // (the end-of-range check is omitted for brevity)</span>
            do {
                _Super::operator++();
            } while (_Super::operator*().color() != color_);
            return *this;
         }
     private:
         Color color_;
    };</pre>
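<p>The predicate case can be sketched in the same spirit. Note that the version below uses a different technique from the derived class above: it wraps an underlying iterator pair instead of inheriting from Itr&lt;E&gt;, and it carries the end of the range so that it never steps past it. <code>FilterItr</code>, <code>IsEven</code>, and <code>sumEven</code> are hypothetical names for the sake of the example.</p>

```cpp
#include <iterator>
#include <vector>

// Hypothetical filtering iterator: it only stops on the elements of
// [cur, end) for which pred(element) is true.
template <typename Iterator, typename Predicate>
class FilterItr {
public:
    FilterItr(Iterator cur, Iterator end, Predicate pred)
        : cur_(cur), end_(end), pred_(pred) { skip(); }
    FilterItr& operator++() { ++cur_; skip(); return *this; }
    typename std::iterator_traits<Iterator>::reference operator*() const {
        return *cur_;
    }
    bool operator==(const FilterItr& o) const { return cur_ == o.cur_; }
    bool operator!=(const FilterItr& o) const { return !(*this == o); }
private:
    // Advance until a matching element or the end of the range is reached.
    void skip() {
        while (cur_ != end_ && !pred_(*cur_)) {
            ++cur_;
        }
    }
    Iterator  cur_;
    Iterator  end_;
    Predicate pred_;
};

struct IsEven {
    bool operator()(int x) const { return x % 2 == 0; }
};

// Sums the even elements of v through the filtering iterator.
int sumEven(const std::vector<int>& v)
{
    typedef FilterItr<std::vector<int>::const_iterator, IsEven> Itr;
    int total = 0;
    Itr end(v.end(), v.end(), IsEven());
    for (Itr it(v.begin(), v.end(), IsEven()); it != end; ++it) {
        total += *it;
    }
    return total;
}
```

<p>Because the element type is fixed by the template arguments, this wrapper is resolved at compile time; it does not provide the single run-time type Itr&lt;E&gt; that the rest of this post is after.</p>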
<p>We would like a generic iterator that meets all the requirements described above:</p>
<ul>
<li>It can be allocated on the heap</li>
<li>It can be copied</li>
<li>It can be passed by value</li>
<li>It can be assigned to</li>
<li>It dereferences to the same type</li>
<li>It can visit several containers</li>
<li>It can visit containers of different types</li>
<li>It can visit containers in arbitrary manners</li>
</ul>
<p>This can be implemented as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;">     #include &lt;stdexcept&gt; <span style="color: #ff0000;">// std::logic_error</span>
     #include &lt;typeinfo&gt;  <span style="color: #ff0000;">// typeid</span>

     template&lt;typename E&gt;
     class ItrBase {
     public:
         ItrBase() {}
         virtual ~ItrBase() {}
         virtual void  operator++() {}
         <span style="color: #ff0000;">// The base class denotes an empty range: there is nothing to dereference.</span>
         virtual E&amp;    operator*() const { throw std::logic_error("empty iterator"); }
         virtual ItrBase* clone() const { return new ItrBase(*this); }
         <span style="color: #ff0000;">// The == operator is non-virtual. It checks that the
         // derived objects have compatible types, then calls the
         // virtual comparison function equal.</span>
         bool operator==(const ItrBase&amp; o) const {
             return typeid(*this) == typeid(o) &amp;&amp; equal(o);
         }
     protected:
         virtual bool equal(const ItrBase&amp; o) const { return true; }
     };

     template&lt;typename E&gt;
     class Itr {
     public:
         Itr() : itr_(0) {}
         ~Itr() { delete itr_; }
         <span style="color: #ff0000;">// A default-constructed Itr holds a null pointer, so check before cloning.</span>
         Itr(const Itr&amp; o) : itr_(o.itr_ ? o.itr_-&gt;clone() : 0) {}
         Itr&amp; operator=(const Itr&amp; o) {
             if (this != &amp;o) {
                 ItrBase&lt;E&gt;* copy = o.itr_ ? o.itr_-&gt;clone() : 0;
                 delete itr_;
                 itr_ = copy;
             }
             return *this;
         }
         Itr&amp;  operator++() { ++(*itr_); return *this; }
         E&amp;    operator*() const { return *(*itr_); }
         bool  operator==(const Itr&amp; o) const {
             if (itr_ == 0 || o.itr_ == 0) return itr_ == o.itr_;
             return *itr_ == *o.itr_;
         }
         bool  operator!=(const Itr&amp; o) const { return !(*this == o); }

     protected:
         ItrBase&lt;E&gt;* itr_;
     };</pre>
<p>The ItrBase class is the top class of the hierarchy. Itr is simply a wrapper on a pointer to an ItrBase, so that it can be allocated on the heap –the actual implementation of the class deriving from ItrBase can have an arbitrary size. Note how the Itr copy constructor and assignment operator are implemented via the ItrBase::clone() method, so that Itr behaves as a scalar type. Last but not least, the (non-virtual) ItrBase::operator== equality operator first checks for type equality before calling the (virtual) equality method equal on the derived class. The reason ItrBase is not an abstract (pure virtual) class is that it can conveniently be used to denote an empty range, i.e., the range (ItrBase(), ItrBase()) is empty.</p>
<p>Iterators on containers of elements of type E just need to derive from ItrBase&lt;E&gt;, and a factory providing the begin() and end() methods for any specialized iterator returns objects of type Itr&lt;E&gt;.</p>
<p>For example, let us assume that we have a container c of E&#8217;s, and that we want an iterator to visit (1) all the elements of c, possibly with repetition; (2) all the elements of c without repetition. This can be done as follows.</p>
<pre style="color: #000000; background-color: #ffe3c1;">    #include &lt;set&gt;

    class E; <span style="color: #ff0000;">// Must be complete, copyable, and ordered by operator&lt; for ItrNoRepeat</span>

    class ItrAll : public ItrBase&lt;E&gt; {
    private:
        typedef ItrAll     _Self;
        typedef ItrBase&lt;E&gt; _Super;
    public:
        ItrAll(Container&amp; c, Container::iterator itr) : _Super(), c_(c), itr_(itr) {}
        virtual ~ItrAll() {}
        virtual void  operator++() { ++itr_; }
        virtual E&amp;    operator*() const { return *itr_; }
        virtual ItrBase&lt;E&gt;* clone() const { return new _Self(*this); }
    protected:
        virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const {
            <span style="color: #ff0000;">// Casting is safe since types have been checked by _Super::operator==</span>
            const _Self&amp; o2 = static_cast&lt;const _Self&amp;&gt;(o);
            return &amp;c_ == &amp;o2.c_ &amp;&amp; itr_ == o2.itr_;
        }
    protected:
        Container&amp;          c_;
        Container::iterator itr_;
    };

    class ItrNoRepeat : public ItrAll {
    private:
        typedef ItrNoRepeat _Self;
        typedef ItrAll      _Super;
    public:
        ItrNoRepeat (Container&amp; c, Container::iterator itr) : _Super(c, itr) {
            <span style="color: #ff0000;">// The starting element counts as visited</span>
            if (itr_ != c_.end()) visited_.insert(*itr_);
        }
        virtual ~ItrNoRepeat () {}
        virtual void  operator++() {
            _Super::operator++(); <span style="color: #ff0000;">// Go to the next element then
            // look for an element that has not been visited yet.</span>
            for (; itr_ != c_.end(); _Super::operator++()) {
                E&amp; e = _Super::operator*();
                if (visited_.find(e) == visited_.end()) {
                    visited_.insert(e);
                    return;
                }
            }
        }
        virtual E&amp;    operator*() const { return _Super::operator*(); }
        virtual ItrBase&lt;E&gt;* clone() const { return new _Self(*this); }
    protected:
        virtual bool equal(const ItrBase&lt;E&gt;&amp; o) const { return _Super::equal(o); }
    protected:
        std::set&lt;E&gt; visited_;
    };

    <span style="color: #ff0000;">// Helper (hypothetical name) that installs a concrete iterator into an
    // Itr&lt;E&gt;. It derives from Itr&lt;E&gt; because itr_ is protected.</span>
    class ItrMaker : public Itr&lt;E&gt; {
    public:
        explicit ItrMaker(ItrBase&lt;E&gt;* itr) { itr_ = itr; }
    };

    <span style="color: #ff0000;">// Build the container’s range w/ and w/o repetition</span>
    Itr&lt;E&gt; begin(Container&amp; c, bool noRepeat = false)
    {
        if (noRepeat) {
            return ItrMaker(new ItrNoRepeat(c, c.begin()));
        }
        return ItrMaker(new ItrAll(c, c.begin()));
    }

    Itr&lt;E&gt; end(Container&amp; c, bool noRepeat = false)
    {
        if (noRepeat) {
            return ItrMaker(new ItrNoRepeat(c, c.end()));
        }
        return ItrMaker(new ItrAll(c, c.end()));
    }</pre>
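<p>The example above targets a single container type. As a rough, self-contained sketch of the "containers of different types" requirement, the code below condenses ItrBase and Itr (with an abstract base, unlike the version above) and adds an adapter over any standard iterator. <code>StdItrAdapter</code>, <code>wrap</code>, and <code>sum</code> are hypothetical names introduced for this illustration.</p>

```cpp
#include <list>
#include <typeinfo>
#include <vector>

// Condensed variant of ItrBase from the text; here the base is abstract.
template <typename E>
class ItrBase {
public:
    virtual ~ItrBase() {}
    virtual void operator++() = 0;
    virtual E& operator*() const = 0;
    virtual ItrBase* clone() const = 0;
    bool operator==(const ItrBase& o) const {
        return typeid(*this) == typeid(o) && equal(o);
    }
protected:
    virtual bool equal(const ItrBase& o) const = 0;
};

// Value-like wrapper; it owns the polymorphic iterator it points to.
template <typename E>
class Itr {
public:
    explicit Itr(ItrBase<E>* p) : itr_(p) {}
    Itr(const Itr& o) : itr_(o.itr_->clone()) {}
    ~Itr() { delete itr_; }
    Itr& operator=(const Itr& o) {
        if (this != &o) {
            ItrBase<E>* copy = o.itr_->clone();
            delete itr_;
            itr_ = copy;
        }
        return *this;
    }
    Itr& operator++() { ++(*itr_); return *this; }
    E& operator*() const { return **itr_; }
    bool operator==(const Itr& o) const { return *itr_ == *o.itr_; }
    bool operator!=(const Itr& o) const { return !(*this == o); }
private:
    ItrBase<E>* itr_;
};

// Adapter over any standard iterator whose elements are of type E.
template <typename E, typename StdItr>
class StdItrAdapter : public ItrBase<E> {
public:
    explicit StdItrAdapter(StdItr i) : i_(i) {}
    virtual void operator++() { ++i_; }
    virtual E& operator*() const { return *i_; }
    virtual ItrBase<E>* clone() const { return new StdItrAdapter(*this); }
protected:
    virtual bool equal(const ItrBase<E>& o) const {
        // Safe: ItrBase::operator== has already checked the dynamic types.
        return i_ == static_cast<const StdItrAdapter&>(o).i_;
    }
private:
    StdItr i_;
};

template <typename E, typename StdItr>
Itr<E> wrap(StdItr i) { return Itr<E>(new StdItrAdapter<E, StdItr>(i)); }

// One non-template function serves both a vector and a list.
int sum(Itr<int> begin, Itr<int> end)
{
    int total = 0;
    for (; begin != end; ++begin) {
        total += *begin;
    }
    return total;
}
```

<p>Calling <code>sum(wrap&lt;int&gt;(v.begin()), wrap&lt;int&gt;(v.end()))</code> works the same way whether v is a std::vector&lt;int&gt; or a std::list&lt;int&gt;. The price is one heap allocation per copy of the iterator and a virtual call per step.</p>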
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2010/07/07/how-to-write-abstract-iterators-in-c/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Twitter sure is a rollercoaster, but going up or down?</title>
		<link>http://www.ocoudert.com/blog/2009/11/24/twitter-sure-is-a-rollercoaster-but-going-up-or-down/</link>
		<comments>http://www.ocoudert.com/blog/2009/11/24/twitter-sure-is-a-rollercoaster-but-going-up-or-down/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 15:53:22 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[social network]]></category>
		<category><![CDATA[advertising]]></category>
		<category><![CDATA[apps]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mobile]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=524</guid>
		<description><![CDATA[The last 10 days have been pretty interesting to follow in the fast moving world of Twitter. They showed a contrasting (or seemingly so) picture of where the super-hyped company is heading. Let us rewind the last few events [...] Continue reading Twitter sure is a rollercoaster, but going up or down? Related posts: The [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/11/24/twitter-sure-is-a-rollercoaster-but-going-up-or-down/">Twitter sure is a rollercoaster, but going up or down?</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/12/02/the-truth-about-twitter-usage/' rel='bookmark' title='The truth about Twitter usage'>The truth about Twitter usage</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/11/01/what-is-twitter%e2%80%99s-next-step/' rel='bookmark' title='What is Twitter’s next step?'>What is Twitter’s next step?</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/01/12/is-twitter-flattening-a-short-answer/' rel='bookmark' title='Is Twitter Flattening? A Short Answer'>Is Twitter Flattening? A Short Answer</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>The last 10 days have been pretty interesting to follow in the fast moving world of Twitter. They showed a contrasting (or seemingly so) picture of where the super-hyped company is heading. Let us rewind the last few events [...]</p>
<p>Continue reading <a href="http://thenextweb.com/2009/11/24/twitter-rollercoaster-alright/" target="_blank">Twitter sure is a rollercoaster, but going up or down?</a></p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/12/02/the-truth-about-twitter-usage/' rel='bookmark' title='The truth about Twitter usage'>The truth about Twitter usage</a></li>
<li><a href='http://www.ocoudert.com/blog/2009/11/01/what-is-twitter%e2%80%99s-next-step/' rel='bookmark' title='What is Twitter’s next step?'>What is Twitter’s next step?</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/01/12/is-twitter-flattening-a-short-answer/' rel='bookmark' title='Is Twitter Flattening? A Short Answer'>Is Twitter Flattening? A Short Answer</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/11/24/twitter-sure-is-a-rollercoaster-but-going-up-or-down/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The formal verification market is still untapped</title>
		<link>http://www.ocoudert.com/blog/2009/10/19/the-formal-verification-market-is-still-untapped/</link>
		<comments>http://www.ocoudert.com/blog/2009/10/19/the-formal-verification-market-is-still-untapped/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 16:19:31 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[ASIC]]></category>
		<category><![CDATA[FPGA]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[verification]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=418</guid>
		<description><![CDATA[Functional verification is a major bottleneck in the chip design cycle. Any misstep in closing the functional correctness of a digital system costs millions of dollars in redesign, additional testing, and silicon respins. One can argue at length about its actual cost, but people in the industry usually agree that functional verification takes between 40 [...] [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/10/19/the-formal-verification-market-is-still-untapped/">The formal verification market is still untapped</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2010/01/24/has-formal-verification-technology-stalled/' rel='bookmark' title='Has formal verification technology stalled?'>Has formal verification technology stalled?</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/02/21/formal-verification-stalling-take-two/' rel='bookmark' title='Formal verification stalling, take two'>Formal verification stalling, take two</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/04/20/is-fpga-a-sustainable-market-for-eda/' rel='bookmark' title='Is FPGA a sustainable market for EDA?'>Is FPGA a sustainable market for EDA?</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/873609_33942684.jpg"><img class="alignright size-full wp-image-421" title="873609_33942684" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/873609_33942684.jpg" alt="873609_33942684" width="140" /></a>Functional verification is a major bottleneck in the chip design cycle. Any misstep in closing the functional correctness of a digital system costs millions of dollars in redesign, additional testing, and silicon respins. One can argue at length about its <a href="http://www.elsevier.com/wps/find/bookdescription.cws_home/705233/description#description">actual cost</a>, but people in the industry usually agree that functional verification takes between 40 and 70% of a project&#8217;s labor, and about 50% of the total cost. The recent <a href="http://www.eetimes.com/news/design/showArticle.jhtml?articleID=220900541" target="_self">announcement </a>of Synopsys and Freescale to <span>broaden their collaboration to cut IC verification says it all: </span>the two partners intend to manage<span> &#8220;the ever-increasing cost of verification, which can encompass up to 75 percent of the total cost of product development&#8221;.</span></p>
<p>Getting actual figures about the size of the functional verification market proves to be elusive because of the way the products are tied to synthesis license deals, and because of the lack of independent analysts in EDA. Still, the simulation and emulation market of digital systems can be estimated to be at least five times larger than today’s formal verification market. But simulation can only take you so far, so one wonders why formal verification does not have a larger share. Is it because the technology is limited, or because the market is not ready?</p>
<p><strong>Equivalence checking</strong></p>
<p>Equivalence checking (EC) consists of verifying that a netlist implements the behavior specified by an RTL description, or that two netlists are equivalent. Historically, EC is the first industrial formal verification tool brought to the ASIC world. Cadence’s <a href="http://www.cadence.com/products/ld/equivalence_checker/pages/default.aspx">Conformal</a> is still the reference (about 60% of the market), with Synopsys’ <a href="http://www.synopsys.com/tools/verification/formalequivalence/pages/formality.aspx">Formality</a> coming second.</p>
<p>EC’s technology is very mature, but this does not mean no further progress is necessary. Flip-flop matching, the primary step that consists of determining the pairs of flip-flops that need to be compared, is expected to be done quickly and automatically, with no manual guidance. Datapath verification remains a major challenge, and proving the correctness of merged arithmetic automatically is still an open problem. Last but not least, debugging is a very complicated task. Incremental verification and rectification techniques can be quite useful to help pinpoint the functional issue.</p>
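<p>To give a feel for what "equivalent" means here, the toy sketch below compares a behavioral description of a 1-bit full adder with a gate-level one by enumerating all eight input combinations. This brute-force enumeration is only an illustration: it does not scale beyond a handful of inputs, and industrial EC tools rely on symbolic techniques such as BDDs and SAT solving instead. All function names are hypothetical.</p>

```cpp
// A circuit with three 1-bit inputs; the result packs sum (bit 0) and carry (bit 1).
typedef unsigned (*Circuit)(unsigned a, unsigned b, unsigned cin);

// Behavioral ("RTL-like") description of a full adder.
unsigned fullAdderSpec(unsigned a, unsigned b, unsigned cin)
{
    return a + b + cin; // value in 0..3
}

// Gate-level ("netlist-like") implementation.
unsigned fullAdderGates(unsigned a, unsigned b, unsigned cin)
{
    unsigned axb   = a ^ b;
    unsigned sum   = axb ^ cin;
    unsigned carry = (a & b) | (axb & cin);
    return (carry << 1) | sum;
}

// Same netlist with a deliberately injected bug: one carry term is missing.
unsigned fullAdderBuggy(unsigned a, unsigned b, unsigned cin)
{
    unsigned sum   = (a ^ b) ^ cin;
    unsigned carry = a & b;
    return (carry << 1) | sum;
}

// Exhaustive check over the 2^3 input combinations.
bool equivalent(Circuit f, Circuit g)
{
    for (unsigned in = 0; in < 8; ++in) {
        unsigned a = in & 1, b = (in >> 1) & 1, cin = (in >> 2) & 1;
        if (f(a, b, cin) != g(a, b, cin)) {
            return false; // (a, b, cin) is a counterexample
        }
    }
    return true;
}
```

<p>Here <code>equivalent(fullAdderSpec, fullAdderGates)</code> holds, while the buggy netlist is caught on the input a=1, b=0, cin=1, where the missing carry term matters.</p>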
<p><strong>Model checking and property verification</strong></p>
<p>Model checking and property verification are still a fraction of the formal verification market, with many players on the field. There are two obstacles to a larger usage of the approach. The first one is that it can be complicated to write an FSM or property that captures a particular behavior. SVA (System Verilog Assertions), OVL (Open Verification Library), and PSL (Property Specification Language) help in that regard, but they need to be more systematically used in the design community. The second obstacle is that model checking techniques can only solve relatively small problem instances. This is why some go with hybrid verification techniques (read: may be incomplete), like <a href="http://www.synopsys.com/TOOLS/VERIFICATION/FUNCTIONALVERIFICATION/Pages/Magellan.aspx">Magellan</a> or <a href="http://www.mentor.com/products/fv/0-in_fv/">0-in</a>, while others stick with complete formal methods, like <a href="http://www.jasper-da.com/">Jasper</a> and <a href="http://www.onespin-solutions.com/">OneSpin</a>.</p>
<p>Because writing properties can be so complicated, specialized branches grew to address specific needs, as shown below.</p>
<ul>
<li><strong>IP verification</strong>. With SoCs using IPs from many different sources, verifying the compliance of these IPs with respect to standard interfaces (e.g., PCI or USB) in the context of the application is crucial. Conformal, with its verification IP portfolio, is in a good position to address the problem. Also OneSpin is known to have interesting technology in that space, even though they are not pushing it at the moment.</li>
<li><strong>Timing verification</strong>. Incorrect timing constraints can lead to missing a target clock cycle, or worse, to a chip failure. Verifying timing exceptions (false paths and multi-cycle paths), as well as CDC (Clock-Domain Crossing), has become a center of attention. It is still unclear how big the market is. However several discussions with IC design companies led me to believe that verifying a set of timing exceptions (usually on the order of 10,000 SDC constraints) saves one month of an engineer’s work. Automation and speed are key here. <a href="http://www.atrenta.com/">Atrenta</a>, <a href="http://www.realintent.com/">Real Intent</a>, and <a href="http://www.mentor.com/products/fv/0-in-cdc/">0-in</a> propose interesting solutions in that space.</li>
<li><strong>Power verification</strong>. When doing <a href="../2009/10/05/automated-low-power-design-flow-is-up-for-grab-part-i/#power_gating">power gating</a>, one needs to verify that the application is powered back up <a href="../2009/10/06/automated-low-power-design-flow-is-up-for-grabs-part-ii/#power_gating_verification">properly</a>. Integration with UPF or CPF provides the required automation. Conformal and CPF have an edge in that field.</li>
<li><strong>Sequential clock gating verification</strong>. Traditional (combinatorial) clock gating is well supported by EC tools. Sequential clock gating exploits sequential dependencies to derive additional gating conditions, which can be used to save more dynamic power. It has been made popular by <a href="http://www.calypto.com/">Calypto</a> &#8211;<a href="http://www.envis.com/">Envis</a> is also proposing a similar technique at the netlist level. Sequential clock gating correctness cannot be expressed easily with SVA or OVL without making the verification task extremely complex, which explains why specialized verification techniques have been developed.</li>
</ul>
<p><strong>Where formal verification will grow</strong></p>
<p>Formal verification is no longer limited to ASICs: complex systems –SoC, FPGA, and HW/SW co-design– will benefit dramatically from better formal verification techniques if they are deployed adequately.</p>
<p>With the ever-growing size of FPGAs (Altera’s <a href="http://www.altera.com/products/devices/stratix-fpgas/stratix-iv/stxiv-index.jsp">Stratix IV</a> packs 820k logic elements, and Xilinx’ <a href="http://www.xilinx.com/products/virtex6/lxt.htm">Virtex-6</a> has up to 750k logic cells), it is clear that simulation will no longer be sufficient to validate the correctness of programmable logic devices. The need for FPGA EC is real, and this requires complete automation and full support for <a href="http://en.wikipedia.org/wiki/Retiming">retiming</a> –OneSpin’s <a href="http://www.onespin-solutions.com/360ec-fpga.php">360 EC FPGA</a> has shown a competitive solution in that space. Also note that IP verification and timing verification apply to FPGA designs too. The real question is whether FPGA designers are willing to pay for formal verification tools.</p>
<p>IP verification, and verifying the correctness of a SoC using IPs, is certainly a very strong driver for more sophisticated formal verification solutions. Power verification will become part of the ASIC design flow, as EC is part of the synthesis flow. Timing verification is still looking for its footing in the design flow –one question is the debug environment, which is still relatively limited, e.g., to showing waveforms.</p>
<p>Looking forward, formal verification techniques can be used (and have been used) in fields other than circuit design. Any critical digital system can benefit from formal verification techniques –transportation, medical equipment, security and privacy applications. The automotive industry is one of the most obvious targets. Cars are ubiquitous, they contain more and more electronics (representing about 30% of the end price today), and a functional bug can have very costly <a href="http://www.latimes.com/business/la-fi-toyota-recall18-2009oct18,0,739395.story">consequences</a>. Cars rely on digital systems for anything from optimizing their engine’s efficiency to navigation systems, entertainment, and on-board diagnostics. Soon the intra-vehicle, vehicle-to-vehicle, and vehicle-to-roadside networking will fuel innovative products, driving the need for fast development and the highest possible level of correctness. The EDA industry is taking notice, and Mentor has certainly taken the <a href="http://www.mentor.com/products/vnd/">lead</a> there. Whether they provide the adequate functional verification framework is another matter.</p>
<p>Formal verification will extend its reach by addressing the hard problems of EC (datapath verification, and retiming for FPGA), by being seamlessly integrated in the synthesis flow (power and timing exception verification), and by providing practical solutions to IP and hybrid HW/SW design verification.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2010/01/24/has-formal-verification-technology-stalled/' rel='bookmark' title='Has formal verification technology stalled?'>Has formal verification technology stalled?</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/02/21/formal-verification-stalling-take-two/' rel='bookmark' title='Formal verification stalling, take two'>Formal verification stalling, take two</a></li>
<li><a href='http://www.ocoudert.com/blog/2010/04/20/is-fpga-a-sustainable-market-for-eda/' rel='bookmark' title='Is FPGA a sustainable market for EDA?'>Is FPGA a sustainable market for EDA?</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/10/19/the-formal-verification-market-is-still-untapped/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Test-driven design, a methodology for low-defect software</title>
		<link>http://www.ocoudert.com/blog/2009/10/13/test-driven-design/</link>
		<comments>http://www.ocoudert.com/blog/2009/10/13/test-driven-design/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 12:29:52 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[EDA]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[verification]]></category>

		<guid isPermaLink="false">http://www.ocoudert.com/blog/?p=294</guid>
		<description><![CDATA[I wrote earlier about the good practices in designing APIs, which are so important when developing complex software. However, one usually does not have the chance to start a product from scratch. This means that more often than not, a software manager picks up an existing tool with an existing team. Making the tool [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/10/13/test-driven-design/">Test-driven design, a methodology for low-defect software</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/08/api-design-101/' rel='bookmark' title='API design 101'>API design 101</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a><br />
I wrote <a href="http://www.ocoudert.com/blog/2009/10/08/api-design-101/" target="_blank">earlier</a> about the good practices in designing APIs, which are so important when developing complex software. However, one usually does not have the chance to start a product from scratch. This means that more often than not, a software manager picks up an existing tool with an existing team. Making the tool more efficient –better QoR, faster runtime, smaller memory footprint, more stability, new features, etc.– is made difficult by legacy code, awkward APIs, or a plain wrong architecture. What to do then? We usually cannot afford to rewrite all or major parts of the product. Does that mean that we are stuck with an endless cycle of resource-intensive incremental software changes, often creating as many bugs as they are intended to fix?</p>
<p><strong>Defect rate</strong></p>
<p>First, I would like to discuss the notion of software reliability and how it has evolved over the past 40+ years. A defect causes an invalid behavior of a program with respect to its specification (e.g., incorrect output, performance issue, crash). One of many ways to look at software quality is to estimate its defect rate, i.e., the number of defects per line of code (loc), or more conveniently per 1,000 lines of code (kloc).</p>
<p>The first observation is that the larger the code, the higher its defect rate. It is estimated that the bug rate increases logarithmically with code size.</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/IBM_defect_study.png"><img class="aligncenter" title="IBM defect study" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/IBM_defect_study.png" alt="IBM defect study" align="middle" /></a><br />
Source: <em>Program Quality and Programmer Productivity, Capers Jones, IBM 1977</em></p>
<p>Thus the total number of defects for a specific application can be reduced by the following:</p>
<ol>
<li>Continuous code factorization (direct loc reduction).</li>
<li>Use of libraries (which have a reduced bug rate, thanks to the extensive exposure they receive due to their long lifespan and high usage).</li>
<li>Increased expressive power of the programming language (indirect loc reduction).</li>
</ol>
<p>Since the introduction of FORTRAN in 1957, many languages and operating systems have been created and have grown more powerful and sophisticated. What could typically be coded in 10 klocs of FORTRAN can be coded today with fewer than 5 klocs of C++, and about 3-4 klocs of Java. Raising the level of abstraction of programming languages helps decrease the total number of defects because it results in smaller programs with a lower defect rate.</p>
<p>Evidently, testing reduces the defect rate. A software powerhouse like Microsoft reports about 10-20 defects/kloc before QA, and claims that the rate drops to 1/kloc in released code. Looking at long-lifespan and very critical code, statistics from the Jet Propulsion Laboratory show that spacecraft software (which is typically only 20 klocs, and must run without interruption for years) reaches 6-10 defects/kloc after 2-5 years of testing. The code developed for the shuttle program is estimated to have fewer than 0.1 defects/kloc.</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/JPL_defect_data.png"><img class="aligncenter" title="JPL_defect_data" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/JPL_defect_data.png" alt="JPL defect data" /></a><br />
Source: <em>Nikora, Allen P., “Error Discovery Rate by Severity Category and Time to Repair Software Failures for Three JPL Flight Projects”, Software Product Assurance Section, Jet Propulsion Laboratory, November 5, 1991</em></p>
<p><a name="kloc_per_defect"></a>Over the past 40 years, independent research from academia and the private sector has shown that on average an application has a defect rate of 5.5/kloc, regardless of the programming language and the operating system used for development. This looks counterintuitive, since increasing the abstraction level of the programming language reduces the bug rate and the actual size of one specific application. But that progress is neutralized by the ever-increasing size and complexity of the programs, made possible by better software development methodologies and powerful development environments. To put a defect rate of 5.5/kloc in perspective, consider your typical EDA place-and-route product, say 3 Mlocs of C/C++, with a likely high turnover rate (i.e., percentage of locs that are modified in every release). You can expect about 16,500 defects (3,000 klocs × 5.5 defects/kloc).</p>
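<p>The arithmetic behind that estimate is a simple product, sketched below in C++. The function name <code>expectedDefects</code> is made up for this illustration; the 5.5 defects/kloc average and the 3 Mloc code size are the figures quoted above, and the uniform defect rate is a simplifying assumption.</p>

```cpp
// Expected number of defects for a code base of 'kloc' thousand lines,
// assuming a uniform defect rate expressed in defects per kloc.
// (Hypothetical helper, for illustration only.)
double expectedDefects(double kloc, double defectsPerKloc) {
    return kloc * defectsPerKloc;
}
```

<p>For a 3 Mloc product, <code>expectedDefects(3000.0, 5.5)</code> evaluates to 16,500.</p>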
<p><strong>Test-Driven Design</strong></p>
<p>Now I will present a method that I successfully used for both existing and from-scratch products. It is based on the observation that, independently of the quality of the team and the maturity of the tool, the software complexity and the unpredictable evolution of the product make managing the software quality quite problematic. Think EDA, where customers ask for new capabilities every week and salespeople sell features 6 or 12 months before they are actually developed. It is difficult, if not impossible, to have an upfront, clean, and frozen specification, from which an architecture and a set of APIs can be derived. One needs to change the architecture and the APIs because of new unpredicted features and unforeseen problems, or simply because the software is written in a hurry without adequate resources &#8211;I have no doubt that most readers will agree on that last point. This creates bloated code with a high defect rate, which results in applications with a larger number of bugs.</p>
<p>Test-driven design flips the traditional software development scheme upside-down. In most cases, the software development flow consists of (1) specifying the requirements in some language (e.g., English, ML, C++ or Java header files), and (2) iterating a code/test loop until the software reaches a point where it is deemed stable enough to go through a full QA regression release process. This often leads to slow iterations between the release team and the R&amp;D team before the release is fully qualified. Also, the essence of the original specification may be lost because there is no concrete way (read: operational semantics) to check whether the released product actually meets its intended requirements.</p>
<p><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/classic_vs_tdd_software_development_flow.png"><img class="aligncenter size-full wp-image-309" title="classic_vs_tdd_software_development_flow" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/classic_vs_tdd_software_development_flow.png" alt="classic_vs_tdd_software_development_flow" width="600" /></a></p>
<p style="text-align: center;"><em>Traditional vs. test-driven software development flow</em></p>
<p>Contrast this with a test-driven design approach. In that methodology, the tests are written <em>before</em> anything else. The goal is to capture the specification with a set of small (positive <em>and</em> negative) unit tests. Then some code is written and run on the unit tests. Some of the tests fail, which leads to further refinement of both the unit tests and the code. This write-test/code/test iteration continues until one can no longer design a new test that breaks the code. The next step, the QA regression release process, can then be carried out.</p>
<p>A few things are important to recognize in a test-driven software development methodology: (1) the spec <em>is</em> the set of unit tests; (2) therefore the release can be validated as meeting the spec; (3) the testing iteration handled by R&amp;D is closed when the unit tests <em>and</em> the code are fully stable, which leads to fewer iterations between the release and R&amp;D teams; and (4) this methodology does not assume anything about the intrinsic quality of the code and the strength of the development team. Indeed, this approach can be used on very badly architected code and still lead to substantial improvements. Also note that the unit tests can be internal, e.g., written in C++ and providing a self-testing mechanism, or more traditional, with external data that are fed to the application.</p>
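<p>As a minimal sketch of the test-first step, assume the spec says that a parser accepts the characters <code>'0'</code> and <code>'1'</code> and rejects everything else. The names <code>parseBit</code>, <code>rejects</code>, and <code>runUnitTests</code> are hypothetical. In a test-driven flow, <code>runUnitTests</code> would be written first, and <code>parseBit</code> would then be coded until all the asserts pass.</p>

```cpp
#include <cassert>
#include <stdexcept>

// Code under test (written after the tests in a test-driven flow).
// Converts '0' or '1' to a bool; any other character violates the spec.
bool parseBit(char c) {
    if (c == '0') return false;
    if (c == '1') return true;
    throw std::invalid_argument("parseBit: expected '0' or '1'");
}

// Test helper: returns true if parseBit rejects the character by throwing.
bool rejects(char c) {
    try {
        parseBit(c);
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}

// The unit tests capture the spec.
void runUnitTests() {
    // Positive tests: valid input yields the specified output.
    assert(parseBit('0') == false);
    assert(parseBit('1') == true);
    // Negative tests: invalid input is rejected, not silently accepted.
    assert(rejects('2'));
    assert(rejects('x'));
    assert(!rejects('1'));
}
```

<p>Each new requirement or bug report adds asserts to <code>runUnitTests</code> before the code changes, so the test set keeps playing the role of the spec.</p>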
<p><strong>Case studies</strong></p>
<p>Let me give a few concrete examples. A tool I was in charge of contained some legacy code that performed an essential task in EDA: constant propagation (it consists of propagating logic values through a logic network, following basic computation rules, e.g., NOT(0) = 1, AND(0, 1) = 0, and AND(1, 1) = 1). The computational principles are simple, but a good constant propagation system should be lazy, incremental, support undo, be able to explain to the user why some constant occurs in some part of the network, etc. This makes the development of the system much more challenging.</p>
<p>The legacy code produced crashes now and then. It was difficult to read, it contained suspicious pieces of code to handle corner cases (e.g., multi-driver nets, user-set constants), and it had poor testing coverage (&lt;50%). I decided to go for a full rewrite with a clean API, and unit tests were developed together with the new code following a TDD methodology. This resulted in 6267 loc of C++, 40% of which were unit tests (click the screenshot of the C++ unit tests below), made of 1415 asserts. That code was released in May 2007, had 3 reported defects through November 2007, and has been without defect since then.</p>
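<p>To make the computation rules concrete, here is a deliberately naive sketch of constant propagation over a tiny netlist. It is not the system described in this post: it is neither lazy nor incremental, it has no undo, and it assumes the gates are already listed in topological order. All the names (<code>Value</code>, <code>Gate</code>, <code>propagate</code>, etc.) are invented for the example.</p>

```cpp
#include <cstddef>
#include <vector>

// Three-valued logic: a net carries a known constant or is Unknown.
enum class Value { Zero, One, Unknown };

Value evalNot(Value a) {
    if (a == Value::Unknown) return Value::Unknown;
    return a == Value::Zero ? Value::One : Value::Zero;
}

Value evalAnd(Value a, Value b) {
    // A single 0 input forces the output, whatever the other input is.
    if (a == Value::Zero || b == Value::Zero) return Value::Zero;
    if (a == Value::One && b == Value::One) return Value::One;
    return Value::Unknown;
}

enum class GateType { Not, And };

// A gate reads one or two nets and drives one net; nets are indices.
struct Gate {
    GateType type;
    std::size_t in0;
    std::size_t in1;  // ignored for Not gates
    std::size_t out;
};

// Propagates constants through gates listed in topological order.
// 'nets' holds the user-set constants on entry, plus the derived values on exit.
void propagate(const std::vector<Gate>& gates, std::vector<Value>& nets) {
    for (const Gate& g : gates) {
        nets[g.out] = (g.type == GateType::Not)
                          ? evalNot(nets[g.in0])
                          : evalAnd(nets[g.in0], nets[g.in1]);
    }
}
```

<p>For instance, with net 0 set to 0 and net 1 unknown, an AND gate driving net 2 from nets 0 and 1 yields the constant 0, and a NOT gate driving net 3 from net 2 yields the constant 1.</p>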
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_constant_annot_unit_test.png"><img class="size-full wp-image-298  aligncenter" title="screenshot_constant_annot_unit_test" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_constant_annot_unit_test.png" alt="screenshot_constant_annot_unit_test" width="300" /></a></p>
<p>Another example is a C++ templatized bitwise four-valued simulator, written to match the Verilog semantics. This was done with 8014 loc of C++, 40% of which were unit tests, made of 1015 asserts (click the screenshot below: you can recognize the basic four-valued logic truth tables). The template was self-tested with three different concrete instances of logic representation (on 2-tuples of bool, on strings made of 32 or 64 characters &#8216;0&#8217;, &#8216;1&#8217;, &#8216;x&#8217;, and &#8216;z&#8217;, and finally on an actual logic netlist). No defect was ever found on the semantics.</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_simulator_unit_test.png"><img class="size-full wp-image-299  aligncenter" title="screenshot_simulator_unit_test" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/screenshot_simulator_unit_test.png" alt="screenshot_simulator_unit_test" width="300" /></a></p>
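<p>For readers unfamiliar with four-valued logic, the sketch below shows the basic truth tables on one possible representation, strings of <code>'0'</code>, <code>'1'</code>, <code>'x'</code>, and <code>'z'</code>. It follows the Verilog convention that a <code>z</code> input to a gate behaves like <code>x</code>. It is not the templatized simulator discussed above; the function names are invented for the example.</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Normalizes a logic character; a 'z' input to a gate behaves like 'x'.
char asGateInput(char v) {
    switch (v) {
        case '0': case '1': case 'x': return v;
        case 'z': return 'x';
        default: throw std::invalid_argument("not a four-valued logic character");
    }
}

char and4(char a, char b) {
    a = asGateInput(a);
    b = asGateInput(b);
    if (a == '0' || b == '0') return '0';  // 0 dominates an AND
    if (a == '1' && b == '1') return '1';
    return 'x';
}

char or4(char a, char b) {
    a = asGateInput(a);
    b = asGateInput(b);
    if (a == '1' || b == '1') return '1';  // 1 dominates an OR
    if (a == '0' && b == '0') return '0';
    return 'x';
}

char not4(char a) {
    a = asGateInput(a);
    if (a == '0') return '1';
    if (a == '1') return '0';
    return 'x';
}

// Bitwise AND of two equal-width vectors, one character per bit.
std::string bitwiseAnd(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) throw std::invalid_argument("width mismatch");
    std::string result(a.size(), 'x');
    for (std::size_t i = 0; i < a.size(); ++i) result[i] = and4(a[i], b[i]);
    return result;
}
```

<p>A unit test for such code is essentially the truth table itself: one assert per row, which is why the assert count grows quickly.</p>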
<p>In both these cases, I had the opportunity to rewrite or start from scratch. What if one has to improve on an existing system too large to be rewritten?</p>
<p>The third example is about a complex feature (sequential clock gating) that had been released 6 months earlier. The field complained about inconsistencies and erratic behavior, so I decided to apply a TDD methodology to rectify the code. First hurdle: we established a unit test campaign, which consisted of describing the spec in terms of unit tests, in plain English and sketches. This produced 49 unit tests, as shown below (click to enlarge).</p>
<p style="text-align: center;"><a href="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/seq_clock_gating_unit_test_campaign.png"><img class="aligncenter size-full wp-image-300" title="seq_clock_gating_unit_test_campaign" src="http://www.ocoudert.com/blog/wp-content/uploads/2009/10/seq_clock_gating_unit_test_campaign.png" alt="seq_clock_gating_unit_test_campaign" width="300" /></a></p>
<p>Second hurdle: we proceeded to translate these informal unit test descriptions into elementary RTL descriptions. The idea was that if the code was compliant with the spec, we could predict exactly which optimized netlist it would produce. Third hurdle: a 3<sup>rd</sup> party reviewed these 49 RTL tests, and found that 9 of them were faulty because they did not capture what was specified in the document. Once we fixed these tests came the fourth hurdle: we ran the code.</p>
<p>The results were brutal: the code crashed on 3 tests, it synthesized a functionally incorrect netlist in 5 cases, and produced 13 suboptimal results. Overall, 21 failures out of 49 tests, a 43% failure rate! We then went through a two-week iteration of unit test refinement and code fixing with a team that had <em>never</em> touched the initial code, to eventually converge on 72 unit tests &#8211;many more than we could think of initially&#8211; and a usable feature.</p>
<p><strong>Conclusion</strong></p>
<p>Test-driven design (TDD) aims at capturing a spec with unit tests, then writing code that passes these tests. The unit tests are more important than the code itself: any code that passes the unit tests meets the spec. TDD initially requires a higher investment: writing unit tests to capture an expected behavior is a complex task, and a 3<sup>rd</sup> party review is needed to validate them. But the effort pays off: eventually the set of unit tests becomes the spec, and can even be used as documentation. Running unit tests is fast, so it dramatically reduces the R&amp;D testing time. Also, once the code passes a comprehensive set of unit tests, the risk of iterating from QA back to R&amp;D is reduced. Overall, test-driven design increases code correctness and stability dramatically, even in the presence of a deficient architecture and legacy code.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/08/api-design-101/' rel='bookmark' title='API design 101'>API design 101</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/04/09/what-is-software-quality/' rel='bookmark' title='What is software quality?'>What is software quality?</a></li>
<li><a href='http://www.ocoudert.com/blog/2011/05/30/how-to-make-software-deterministic/' rel='bookmark' title='How to make software deterministic'>How to make software deterministic</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/10/13/test-driven-design/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>API design 101</title>
		<link>http://www.ocoudert.com/blog/2009/10/08/api-design-101/</link>
		<comments>http://www.ocoudert.com/blog/2009/10/08/api-design-101/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 13:06:36 +0000</pubDate>
		<dc:creator>Olivier Coudert</dc:creator>
				<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[quality]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://coudert.wordpress.com/?p=247</guid>
		<description><![CDATA[I built up products from scratch several times in my professional life. Usually it starts with a very small engineering team &#8211;sometimes I was the very first member of the team. This is a great opportunity to lay strong foundations for the subsequent software development, because one is in charge of the whole process. But [...]<p>Continue reading <a href="http://www.ocoudert.com/blog/2009/10/08/api-design-101/">API design 101</a></p>
Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a style="display: none;" rel="tag" href="http://www.codeproject.com/script/Articles/BlogFeedList.aspx?amid=6630043">CodeProject</a>I built up products from scratch several times in my professional life. Usually it starts with a very small engineering team &#8211;sometimes I was the very first member of the team. This is a great opportunity to lay strong foundations for the subsequent software development, because one is in charge of the whole process. But one does not always have the chance to start from scratch.</p>
<p>I also worked with already established products, with larger teams and millions of lines of already existing code. The typical software management and development project always comes with some cumbersome legacy code and APIs that survive year after year. The reason is not so much that people do not want to fix the problem, but that fixing the problem requires a major product architecture overhaul, which comes at a prohibitive cost. There are striking lessons in failed software architectures, and it all starts with API design. I am sharing here my practical experience with C++ projects, but most of this advice also applies to Java.</p>
<p><strong>Why is an API so important?</strong></p>
<p>An API can be a company’s greatest asset: it captures communication and exchange of services in an application. A good API will naturally lead to more reuse, simpler code, and lower maintenance cost. If the API is public, a good API will also capture customers. There are examples of Java libraries that failed to be accepted not because they were inefficient, but because they were very poorly designed.</p>
<p>An API can also be a company’s greatest liability: once the service has clients, one can no longer change the API!  Suspending or rewriting an API is very pricey in terms of time and money. In the case of a public API, cost also comes in terms of reputation. A public API is forever: there is only one chance to get it right.</p>
<p><strong>What is a good API?</strong></p>
<p>In today’s object-oriented software, writing an API is providing a service. Thus instead of thinking in terms of implementation and efficiency, one must first think in terms of modules and services: determine the usage model; establish the clients’ needs; and anticipate tomorrow’s needs.</p>
<p>Besides being powerful enough to satisfy the requirements, an API should be designed with two principles in mind:</p>
<ol>
<li><strong>Keep it simple! </strong>An API must be easy to learn and use, even without documentation. The API must be hard to misuse. Functionality should be easy to explain &#8211;if it is hard to name, it is likely a bad function. Use simple, consistent naming, and the code will read like prose –Java libraries and STL are good inspirations for naming conventions. The API should be as small as possible: you can always add to an API, but you can never remove. A method should not take more than 3-4 parameters –else wrap the parameters in a class that can be augmented later.</li>
<li><strong>Keep it abstract!</strong> An API must allow extension for future needs. For example, it should not assume anything about the implementation. It should minimize accessibility to implementation-specific details –an API, once public, <em>will</em> be used, and you do not want to expose the ugly details of a database.</li>
</ol>
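<p>The advice about wrapping parameters in a class can be sketched as follows. <code>FindOptions</code>, <code>findAll</code>, and <code>toLower</code> are hypothetical names chosen for this illustration. The point is that a new option can later be added to <code>FindOptions</code> with a default value, without changing the signature of <code>findAll</code> and without breaking existing callers.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Options of a search, grouped in one class that can grow over time.
struct FindOptions {
    bool caseSensitive = true;
    std::size_t maxResults = 100;
};

// Returns a lower-case copy of 's' (ASCII only).
std::string toLower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Returns the start positions of the occurrences of 'pattern' in 'text'.
// Two mandatory parameters, plus one options object with sensible defaults.
std::vector<std::size_t> findAll(const std::string& text,
                                 const std::string& pattern,
                                 const FindOptions& options = FindOptions()) {
    std::vector<std::size_t> hits;
    if (pattern.empty()) return hits;
    const std::string t = options.caseSensitive ? text : toLower(text);
    const std::string p = options.caseSensitive ? pattern : toLower(pattern);
    for (std::size_t pos = t.find(p);
         pos != std::string::npos && hits.size() < options.maxResults;
         pos = t.find(p, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}
```

<p>A caller that is happy with the defaults writes <code>findAll(text, "abc")</code>; a caller with special needs fills in a <code>FindOptions</code> instance and passes it as the third argument.</p>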
<p>In theory, an API should be written before going into some implementation. Gathering requirements is the first step. Requirements must be case-driven, specific, and should be questioned relentlessly until proven to be must-have. The API should then be written in the target language (C++, Java, etc): this will force the development team to make choices, and to keep the API simple and abstract enough –nobody wants to have too much to discuss!  Then the API should be reviewed and made final in a public forum with the two principles above in mind: keep it simple (so that it is easy to support) and abstract (so that it is easy to extend).</p>
<p>An API should be documented, but well-designed APIs are sometimes self-explanatory. An API should answer the following questions about its components.</p>
<ol>
<li>Class: what does an instance of a class represent? Is that a singleton class? Is there a factory? Who owns the memory?</li>
<li>Method: what does it do? What is the contract between the client and the instance? Is there any precondition and post-condition? Is there any side effect?</li>
<li>Parameters: what do they represent? Which information do they carry? Who owns them?</li>
<li>Exceptions: who throws exceptions? What do they mean? What to do when catching one?</li>
</ol>
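<p>As an illustration, here is how the comments of a small class could answer these questions. <code>SymbolTable</code> and its methods are hypothetical; the comment layout is only one possible convention.</p>

```cpp
#include <map>
#include <stdexcept>
#include <string>

/// Class: maps symbolic names to dense integer ids, starting at 0.
/// Not a singleton and no factory: each instance owns its own entries.
class SymbolTable {
public:
    /// Method: registers 'name' and returns its id.
    /// Precondition: 'name' is not empty.
    /// Postcondition: lookup(name) returns the same id.
    /// Side effect: grows the table if 'name' is new.
    /// Parameter: 'name' is copied; the caller keeps ownership of its string.
    /// Exception: std::invalid_argument if 'name' is empty; the table is
    /// left unchanged, so the caller can fix the name and retry.
    int intern(const std::string& name) {
        if (name.empty()) throw std::invalid_argument("intern: empty name");
        std::map<std::string, int>::const_iterator it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(ids_.size());
        ids_[name] = id;
        return id;
    }

    /// Method: returns the id of a previously interned name. No side effect.
    /// Exception: std::out_of_range if 'name' was never interned.
    int lookup(const std::string& name) const {
        return ids_.at(name);  // std::map::at throws on a missing key
    }

private:
    std::map<std::string, int> ids_;
};
```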
<p><strong>API and performance</strong></p>
<p>Bad API decisions can limit performance. When designing an API, it is good to consider the following rules.</p>
<ol>
<li>Avoid mutability. If a method returns a mutable instance, that instance needs to be created somewhere, which raises the question of memory ownership. Also, mutable classes limit thread safety. Use ‘const’ whenever possible.</li>
<li>Avoid implicit calls to the copy constructor and the assignment operator. This is a waste of resources if you can use references. Declare the copy constructor ‘explicit’, or declare both ‘private’, to catch any misuse at compile time.</li>
<li>A factory is often better than constructors. A factory has full control over how instances are created and when they should be released (shared model, garbage collection, save/restore, caching and disk mirroring, etc). A factory can return an instance of a sub-class.</li>
<li>Avoid exposing implementation details. It may prevent later improvements of a database. Never expose data members of a class, always use get/set accessors.</li>
<li>Question the thread safety of computationally intensive methods. One day the software may run on a grid or in a cloud.</li>
<li>Never compromise the rules above for a small runtime or memory improvement. For the vast majority of applications, going a few percent faster is not worth the maintenance nightmare it can imply.</li>
</ol>
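<p>Several of these rules can be combined in a few lines. In the sketch below, <code>Cell</code> and <code>CellFactory</code> are hypothetical names: instances are immutable, copies are rejected at compile time, and a factory controls creation and sharing. The example uses the C++11 <code>= delete</code> syntax; declaring the two copy operations private without defining them achieves a similar effect in older C++.</p>

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// An immutable, non-copyable object that only the factory can create.
class Cell {
public:
    Cell(const Cell&) = delete;             // accidental copies do not compile
    Cell& operator=(const Cell&) = delete;

    const std::string& name() const { return name_; }  // accessor, no public data

private:
    friend class CellFactory;
    explicit Cell(std::string name) : name_(std::move(name)) {}
    const std::string name_;
};

// The factory decides how instances are created, shared, and released.
class CellFactory {
public:
    // Returns the unique shared instance for 'name', creating it on first use.
    std::shared_ptr<const Cell> get(const std::string& name) {
        std::shared_ptr<const Cell>& slot = cache_[name];
        if (!slot) slot.reset(new Cell(name));
        return slot;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::shared_ptr<const Cell>> cache_;
};
```

<p>Because clients only ever see a <code>std::shared_ptr&lt;const Cell&gt;</code>, the factory can later switch to another caching or release policy without any change to client code. Note that this sketch is not thread-safe: concurrent calls to <code>get</code> would need a lock around the cache.</p>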
<p><strong>Final word</strong></p>
<p>A good API is key to producing smaller and simpler code, which makes the product more stable and easier to maintain. Designing a good API is a collaborative effort, and a formal decision process is needed to freeze an API. A good API is hard to write: get your best people on it. And finally, a public API is forever. May these simple rules guide your next project.</p>
<p>Related posts:<ol>
<li><a href='http://www.ocoudert.com/blog/2009/10/13/test-driven-design/' rel='bookmark' title='Test-driven design, a methodology for low-defect software'>Test-driven design, a methodology for low-defect software</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.ocoudert.com/blog/2009/10/08/api-design-101/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

