================================ Tutorial: Analyzing a Subtle Bug ================================ .. toctree:: :hidden: :maxdepth: 2 :reversed: tutorial-subtle .. |nbsp| unicode:: 0xA0 :trim: .. |br| raw:: html
This tutorial assumes that you have already read the `man pages <../index.html#sec-man>`_. It also assumes that you are familiar with condition variables. .. CO<` * :doc:`McMini trace (bufferSize==1)` * :doc:`McMini trace (annotated)` * :doc:`McMini trace (bufferSize==2)` * :doc:`McMini trace (bufferSize==5)` The revised version to be referenced here is found below, and also in the sidebar. The code has been modified slightly to make clear which code refers producer threads, and which refers to consumer threads. In addition, the 'main' routine has been modified to take a single argument, the :code:`bufferSize`. As with all debugging of multi-threaded code, the difficulty lies in the combinatorial explosion of possible executions. So, we look for the simplest case exhibiting the bug. To do this, we add the argument :code:`bufferSize`, instead of accepting the original buffer size of |nbsp| 10. - :doc:`tutorial-subtle.c` If you wish to reproduce the results of this tutorial, then copy the tutorial code, above, into :code:`subtle.c`, and compile it locally. .. note:: The first principle is to use a small test case, to limit the combinatorial explosion of traces (thread schedules) to search. In this case, the bufferSize was originally 10. Our version of the program lowers the bufferSize to 1 in the initialization of allVars in 'main'. After that, we compile the code and run with McMini. --------------------------------------- Analyzing the bug for ``bufferSize==1`` --------------------------------------- .. COMMENT The :option: directive will not find the previous web page. We should probably replicate the option on this page. Next, we turn our attention to tuning the ':option:`-m`' parameter. This is important: too large and it runs forever; too small and the trace with the bug doesn't execute long enough to see the bug. By trial and error, we found ':option:`-m` 18' to suffice to show the bug, although a little large is also fine. When we use ':option:`-m` 30', we find a trace with a bug using 54 transitions. But when reducing this to ':option:`-m` 18', we find a trace with a bug by using only 28 transitions. Notice that McMini identifies two possible lost wakeups in the trace. (See `this web page `_ for a definition of a "lost wakeup".) The trace is printed below: .. sidebar:: Example code and McMini trace output * :doc:`Tutorial code (source)` * :doc:`McMini trace (bufferSize==1)` * :doc:`McMini trace (annotated)` * :doc:`McMini trace (bufferSize==2)` * :doc:`McMini trace (bufferSize==5)` In order to see the transitions that result in deadlock, we do: .. code:: shell ./mcmini -f -m18 -v ./subtle 1 And the final trace is shown here: - :doc:`tutorial-trace1` We examine the trace sequence for this program when it reaches deadlock: .. code:: shell ./mcmini-gdb -f -m18 ./subtle 1 (gdb) mcmini forward end (gdb) set paginsation off (gdb) mcmini printTransitions ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, .. image:: transitions-subtle.png :alt: mcmini transitions at end of subtle From here, we wish to see why Thread |nbsp| 1 executes again during transitions |nbsp| 37 and |nbsp| 38. So, we navigate to just before that transition. To do this, we navigate from transtion |nbsp| 44 to |nbsp| 36, or in other words, `mcmini back 8`. .. code:: shell ./mcmini-gdb -f -m18 ./subtle 1 (gdb) set pagination off (gdb) mcmini forward end (gdb) mcmini back 8 (gdb) mcmini printTransitions ... 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, Now we see that Threads |nbsp| 1 and |nbsp| 2 are both enabled, and the asterisk indicated that the current trace sequence would execute Thread |nbsp| 1 next: .. image:: transitions-middle-subtle.png :alt: mcmini transitions in middle of subtle Since both threads are enabled, we experiment by forcing Thread |nbsp| 2 to continue executing. We replace '1' by '2' repeatedly in the trace sequence. .. code:: shell % ./mcmini -t0 -t' 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,'\ '1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,'\ '2, 2, 2, 2, 2, 2,' ./subtle 1 ... 34. thread 2: pthread_mutex_unlock(mut:3) 35. thread 2: pthread_mutex_lock(mut:4) 36. thread 2: pthread_cond_signal(cond:2) [Lost wakeup? (No thread waiting on cond) ] 37. thread 2: pthread_mutex_unlock(mut:4) 38. thread 2: pthread_mutex_lock(mut:3) 39. thread 2: pthread_mutex_unlock(mut:3) 40. thread 2: pthread_mutex_lock(mut:1) 41. thread 2: pthread_mutex_unlock(mut:1) 42. thread 2: pthread_mutex_lock(mut:2) 43. thread 2: pthread_cond_wait(cond:1, mut:2) (awake -> asleep) 44. thread 1: pthread_mutex_lock(mut:4) 45. thread 1: pthread_cond_wait(cond:2, mut:4) (awake -> asleep) 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, END THREAD PENDING OPERATIONS * thread 0: pthread_join(thr:1, _) [ Blocked ] thread 1: pthread_cond_wait(cond:2, mut:4) (asleep -> awake) [ Blocked ] thread 2: pthread_cond_wait(cond:1, mut:2) (asleep -> awake) [ Blocked ] END ***** Model checking completed! ***** *** DEADLOCK DETECTED *** ... This shows us the deadlock with a simple pattern: Thread |nbsp| 1 repeatedly executes until reaching a "Lost wakeup", and then Thread |nbsp| 2 repeatedly executes until reaching "Lost wakeup". Next, analyzing the trace, we see that Thread |nbsp| 1 calls `pthread_cond_signal(cond:1)` and Thread |nbsp| 2 calls `pthread_cond_wait(cond:1, mut:2)`. In both cases, they are executed in a critical section enforced by the `mut:2` mutex. Thus, the natural solution is to add a global variable "protected" by `mut:2`, which is initialized to |nbsp| 0, and incremented just before `pthread_cond_signal(cond:1)`. In this case, before executing `pthread_cond_wait(cond:1, mut:2)`, one first tests the global variable. If it is greater than |nbsp| 0, than we decrement, and omit calling `pthread_cond_wait(cond:1, mut:2)`. Otherwise, we decrement *after* calling `pthread_cond_wait(cond:1, mut:2)`. Thus, the "lost wakeup" will never happen by using this global variable as the "count" of an implicit semaphore. Similarly, we see calls to `pthread_cond_signal(cond:2)` and `thread_cond_wait(cond:2, mut:4)`, where both are protected by the `mut:4` mutex. Thus, we can here also create a global variable "protected" by `mut:4`, which acts as the count of an implicit semaphore. After making this modification, there are no more "lost wakeups", and the deadlock no longer occurs. **TODO:** *Implement this solution, and demonstrate it in McMini.* -------------------------------------------------- Analyzing the bug for ``bufferSize==2`` and larger -------------------------------------------------- Next, we have found a bug. But for intellectual curiosity, let's look at how the bug exhibits itself when ``bufferSize==2``. Does the bug now require more lost wakeups, or still the same number? This will give us a good clue why the native execution (with the original '``bufferSize==10``') tends to deadlock only after about 30 |nbsp| seconds. .. sidebar:: Example code and McMini trace output * :doc:`Tutorial code (source)` * :doc:`McMini trace (bufferSize==1)` * :doc:`McMini trace (bufferSize==2)` * :doc:`McMini trace (bufferSize==5)` - :doc:`tutorial-trace2` For '``bufferSize==2``', we see that ':option:`-m` 28' yields the bug of smallest depth. In this deadlock, there are 37 transitions to produce the bug. There are no additional lost wakeups. Instead, the producer now executes much longer during which time the consumer halts just before calling pthread_cond_wait. As before, the consumer finally calls pthread_cond_wait, thereby creating a deadlock. Since ``bufferSize==2`` might take longer, one could consider the McMini verbose mode, to directly observe the trace sequences being searched, in lexicographic order, as follows: .. code:: shell ./mcmini -q -f -m28 -v ./subtle 2 Just for fun, one can click on the sidebar for ``bufferSize==5``, and see a trace. By trial and error, a flag of `-m117` was found to be of sufficient length to show deadlock. A total of 206 |nbsp| traces were examined in 5 |nbsp| seconds. .. code:: shell ./mcmini -q -f -m117 ./subtle 5