<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>LShift Ltd.</title>
	<atom:link href="http://www.lshift.net/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://www.lshift.net/blog</link>
	<description>What happens at LShift</description>
	<lastBuildDate>Wed, 15 May 2013 09:54:56 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Testing Ansible playbooks with Vagrant</title>
		<link>http://www.lshift.net/blog/2013/05/15/testing-ansible-playbooks-with-vagrant</link>
		<comments>http://www.lshift.net/blog/2013/05/15/testing-ansible-playbooks-with-vagrant#comments</comments>
		<pubDate>Wed, 15 May 2013 09:54:56 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
				<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1766</guid>
		<description><![CDATA[One of the projects here at LShift uses ansible to configure its EC2 machines. I needed to refactor the playbook in a minor way. But of course &#8220;refactor&#8221; doesn&#8217;t just mean &#8220;change the code&#8221;. It means &#8220;change the code (presumably for the better) while preserving behaviour&#8221;. I really didn&#8217;t want to check the change by [...]]]></description>
			<content:encoded><![CDATA[<p>One of the projects here at LShift uses <a href="http://ansible.cc/">ansible</a> to configure its EC2 machines. I needed to refactor the playbook in a minor way. But of course &#8220;refactor&#8221; doesn&#8217;t just mean &#8220;change the code&#8221;. It means &#8220;change the code (presumably for the better) <em>while preserving behaviour</em>&#8221;. I really didn&#8217;t want to check the change by running the playbook against a production deployment! So what to do?</p>

<p><span id="more-1766"></span></p>

<p>Handily, <a href="http://www.vagrantup.com/">vagrant</a> supports ansible, so I can run a playbook from scratch simply by saying <code>vagrant up</code>. I had to smooth out some wrinkles: EC2 Ubuntu machines have an &#8216;ubuntu&#8217; user, while Vagrant Ubuntu boxes have a &#8216;vagrant&#8217; user. EC2 Ubuntu machines have a <code>/dev/xvdb</code> device; Vagrant boxes don&#8217;t.</p>

<p>And thus, this little shim playbook came to be:</p>

<pre>

<div class="codecolorer-container yaml twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="yaml codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: green;">- hosts</span><span style="font-weight: bold; color: brown;">: </span>target<span style="color: green;"><br />
&nbsp; user</span><span style="font-weight: bold; color: brown;">: </span>vagrant<span style="color: green;"><br />
&nbsp; sudo</span><span style="font-weight: bold; color: brown;">: </span>True<br />
<span style="color: #007F45;"><br />
&nbsp; tasks</span>:<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Create fake block device for /dev/xvdb, simulating ephemeral storage<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>file src=/dev/ram1 path=/dev/xvdb mode=0660 owner=root group=disk state=link<br />
<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Create fake block device for /dev/xvdf, simulating an EBS volume<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>file src=/dev/ram2 path=/dev/xvdf mode=0660 owner=root group=disk state=link<br />
<br />
&nbsp; &nbsp; <span style="color: blue;"># As yet there's no ansible module for making a file system [1]. We try, and if mkfs fails</span><br />
&nbsp; &nbsp; <span style="color: blue;"># we presume it's because there's already a file system on that device, and eat the</span><br />
&nbsp; &nbsp; <span style="color: blue;"># failure.</span><br />
&nbsp; &nbsp; <span style="color: blue;">#</span><br />
&nbsp; &nbsp; <span style="color: blue;"># [1] https://github.com/ansible/ansible/pull/2776</span><span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Put a filesystem on block devices to emulate EBS volumes<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>shell mkfs -t ext4 /dev/$item ; true<span style="color: #007F45;"><br />
&nbsp; &nbsp; &nbsp; with_items</span><span style="font-weight: bold; color: brown;">:<br />
</span> &nbsp; &nbsp; &nbsp; &nbsp;- xvdb<br />
&nbsp; &nbsp; &nbsp; &nbsp; - xvdf<br />
<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Create the ubuntu user<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>user name=ubuntu state=present groups=admin<br />
<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Make the ubuntu user an sudoer<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>lineinfile dest=/etc/sudoers regexp=<span style="color: #CF00CF;">&quot;^ubuntu&quot;</span> line=<span style="color: #CF00CF;">&quot;ubuntu ALL=(ALL) ALL&quot;</span> state=present<br />
<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Make /home/ubuntu/.ssh/<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>file path=/home/ubuntu/.ssh/ state=directory<br />
<br />
&nbsp; &nbsp; <span style="color: blue;"># This is not as lame as it looks. The real playbook will run as the 'ubuntu' user,</span><br />
&nbsp; &nbsp; <span style="color: blue;"># and this is a test environment.</span><span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Copy the ansible key from the vagrant user to the ubuntu user<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>shell cp /home/vagrant/.ssh/authorized_keys /home/ubuntu/.ssh/authorized_keys creates=/home/ubuntu/.ssh/authorized_keys<br />
<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Fix permissions of the .ssh directory<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>file path=/home/ubuntu/.ssh/ owner=ubuntu group=ubuntu mode=0700 state=directory<br />
<span style="color: green;"><br />
&nbsp; &nbsp; - name</span><span style="font-weight: bold; color: brown;">: </span>Fix permissions of the authorized_keys file<span style="color: green;"><br />
&nbsp; &nbsp; &nbsp; action</span><span style="font-weight: bold; color: brown;">: </span>file path=/home/ubuntu/.ssh/authorized_keys owner=ubuntu group=ubuntu mode=0600 state=file<br />
<span style="color: green;"><br />
- include</span><span style="font-weight: bold; color: brown;">: </span>therealplaybook.yml</div></div>

</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/05/15/testing-ansible-playbooks-with-vagrant/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>iOS Enterprise Program: Distribution certificates</title>
		<link>http://www.lshift.net/blog/2013/05/08/ios-enterprise-program-distribution-certificates</link>
		<comments>http://www.lshift.net/blog/2013/05/08/ios-enterprise-program-distribution-certificates#comments</comments>
		<pubDate>Wed, 08 May 2013 09:49:47 +0000</pubDate>
		<dc:creator>Martin Eden</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[iOS]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1759</guid>
		<description><![CDATA[To develop apps for iOS you need an Apple Developer Account. There is the normal developer program which costs $99/year and allows you to distribute apps publicly through the app store. However, there is also the Enterprise account which costs $300/year. With the Enterprise account (which only businesses can apply for) you are unable to [...]]]></description>
			<content:encoded><![CDATA[<p>To develop apps for iOS you need an Apple Developer Account. There is the normal developer program which costs $99/year and allows you to distribute apps publicly through the app store. However, there is also the Enterprise account which costs $300/year. With the Enterprise account (which only businesses can apply for) you are unable to distribute via the app store, but you can distribute to devices belonging to your organisation directly (&#8220;in-house&#8221;), without passing through Apple&#8217;s approval process, and without having to deal with ad-hoc provisioning.</p>

<p>A client of ours has an Enterprise Account and I was tasked with using their account to distribute the app for testing. To do this I needed to create a Distribution certificate (as opposed to a Development certificate). However, the option was greyed-out &#8211; disabled. What to do?
<span id="more-1759"></span></p>

<p>Eventually I tried Apple Support. They initially couldn&#8217;t figure out what was wrong either. They suggested that I didn&#8217;t have enough permissions (I did) or that the option was disabled because Enterprise Accounts can&#8217;t distribute through the app store &#8211; but they then confirmed that I needed that option to create an in-house app as well. After a few days I received the following email:</p>

<blockquote>After discussing the issue with our specialists, I confirmed that there can only be two distribution certificates at one time. In order to create a new one, you would have to revoke one of the existing ones.</blockquote>

<p>I presume that somewhere buried in the documentation this is mentioned, but certainly googling and my attempts to scan through the documentation from beginning to end didn&#8217;t turn anything up. So here for your googling needs:</p>

<p>Why is the option &#8220;App Store and Ad Hoc&#8221; disabled? It is because iOS Enterprise Developer accounts have a limit of two distribution certificates at a time.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/05/08/ios-enterprise-program-distribution-certificates/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Emacs versus Vim</title>
		<link>http://www.lshift.net/blog/2013/04/28/emacs-versus-vim</link>
		<comments>http://www.lshift.net/blog/2013/04/28/emacs-versus-vim#comments</comments>
		<pubDate>Sun, 28 Apr 2013 22:02:10 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Water cooler]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1745</guid>
		<description><![CDATA[Here at LShift we take our programming pretty seriously. Which is why we now warm up properly before discussing important topics like static versus dynamic typing, tabs versus spaces and other such crucial aspects of our craft.



An Emacs operator preparing for a discussion with a vim user about text editors.
]]></description>
			<content:encoded><![CDATA[<p>Here at LShift we take our programming pretty seriously. Which is why we now warm up properly before discussing important topics like static versus dynamic typing, tabs versus spaces and other such crucial aspects of our craft.</p>

<iframe src="http://www.youtube.com/embed/7ysop5liKP8" width="680" height="383" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>

<p>An Emacs operator preparing for a discussion with a vim user about text editors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/04/28/emacs-versus-vim/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mutant Refactoring powers</title>
		<link>http://www.lshift.net/blog/2013/04/28/mutant-refactoring-powers</link>
		<comments>http://www.lshift.net/blog/2013/04/28/mutant-refactoring-powers#comments</comments>
		<pubDate>Sun, 28 Apr 2013 13:01:04 +0000</pubDate>
		<dc:creator>Ceri Storey</dc:creator>
				<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1737</guid>
		<description><![CDATA[Occasionally, I&#8217;ve found that when doing some refactorings (especially when splitting a class for example), it can be all too easy to include redundant code. Whilst most of the time it&#8217;s possible to eliminate that by a combination of careful inspection and keeping the tests green, I&#8217;ve found that mutation testing tools can greatly automate [...]]]></description>
			<content:encoded><![CDATA[<p>Occasionally, I&#8217;ve found that when doing some refactorings (especially when splitting a class for example), it can be all too easy to include redundant code. Whilst most of the time it&#8217;s possible to eliminate that by a combination of careful inspection and keeping the tests green, I&#8217;ve found that <a href="http://en.wikipedia.org/wiki/Mutation_testing">mutation testing</a> tools can greatly automate the process here.</p>

<p><span id="more-1737"></span></p>
<p>For a concrete example, I&#8217;m working on a personal <a href="http://en.wikipedia.org/wiki/Spaced_repetition">spaced repetition</a> project that also serves as an excuse to play with <a href="http://martinfowler.com/bliki/CQRS.html">CQRS</a>. The idea is that, rather like <a href="http://ankisrs.net/">Anki</a>, each card (fact in Anki terms) can have multiple fields, and questions and answers are templated.</p>

<p>In this example, I&#8217;ve split the <a href="https://github.com/cstorey/srsrb/blob/0c72f4194064e6a15c65142ef7b626fad66b5005/lib/srsrb/deck_view.rb">view model repository</a> into two by areas of functionality: one for <a href="https://github.com/cstorey/srsrb/blob/98037ea93686021c375544b327242b571504eb61/lib/srsrb/card_editor_projection.rb">editing cards</a> (where we see N fields per card), and another for <a href="https://github.com/cstorey/srsrb/blob/98037ea93686021c375544b327242b571504eb61/lib/srsrb/review_projection.rb">cards as they would appear whilst reviewing</a> (where you just have a front and a back to the card).</p>

<p>However, there&#8217;s still some cruft here and there, and dead code that we&#8217;d like to get rid of. Mutation testing makes this easier to find: if you mutate code that is never called, the result of your program, and hence of your tests, cannot change.</p>

<p>Running <a href="https://github.com/mbj/mutant"><tt>mutant</tt></a> over the new <tt>SRSRB::CardEditorProjection</tt> class, we (eventually) see the following in the final report:</p>

<p style='font-family: monospace'>
<span style='color:#A00'>!!! Mutant alive: rspec:SRSRB::CardEditorProjection#card_for:/Users/cez/projects/srs-rb/lib/srsrb/card_editor_projection.rb:60:235ac !!!</span><br />
<span style='color:#00A'>@@ -1,4 +1,4 @@<br />
</span> def card_for(id)<br />
<span style='color:#A00'>-  cards.get[id]<br />
</span><span style='color:#0A0'>+  cards.get<br />
</span> end<br />
Took: (4.74s)<br />
<span style='color:#A00'>!!! Mutant alive: rspec:SRSRB::CardEditorProjection#card_for:/Users/cez/projects/srs-rb/lib/srsrb/card_editor_projection.rb:60:81c99 !!!</span><br />
<span style='color:#00A'>@@ -1,4 +1,4 @@<br />
</span> def card_for(id)<br />
<span style='color:#A00'>-  cards.get[id]<br />
</span><span style='color:#0A0'>+  cards[id]<br />
</span> end<br />
Took: (4.75s)<br />
<span style='color:#A00'>!!! Mutant alive: rspec:SRSRB::CardEditorProjection#card_for:/Users/cez/projects/srs-rb/lib/srsrb/card_editor_projection.rb:60:f83c4 !!!</span><br />
<span style='color:#00A'>@@ -1,4 +1,4 @@<br />
</span> def card_for(id)<br />
<span style='color:#A00'>-  cards.get[id]<br />
</span><span style='color:#0A0'>+  id<br />
</span> end<br />
Took: (4.71s)<br />
<span style='color:#A00'>!!! Mutant alive: rspec:SRSRB::CardEditorProjection#card_for:/Users/cez/projects/srs-rb/lib/srsrb/card_editor_projection.rb:60:9e7ae !!!</span><br />
<span style='color:#00A'>@@ -1,4 +1,4 @@<br />
</span> def card_for(id)<br />
<span style='color:#A00'>-  cards.get[id]<br />
</span><span style='color:#0A0'>+  nil<br />
</span> end<br />
Took: (4.59s)<br />
<span style='color:#A00'>!!! Mutant alive: rspec:SRSRB::CardEditorProjection#card_for:/Users/cez/projects/srs-rb/lib/srsrb/card_editor_projection.rb:60:61efb !!!</span><br />
<span style='color:#00A'>@@ -1,4 +1,4 @@<br />
</span><span style='color:#A00'>-def card_for(id)<br />
</span><span style='color:#0A0'>+def card_for(s8e207872a38234087817)<br />
</span>   cards.get[id]<br />
 end<br />
Took: (4.76s)<br />
<span style='color:#A00'>!!! Mutant alive: rspec:SRSRB::CardEditorProjection#card_for:/Users/cez/projects/srs-rb/lib/srsrb/card_editor_projection.rb:60:12b0a !!!</span><br />
<span style='color:#00A'>@@ -1,4 +1,4 @@<br />
</span><span style='color:#A00'>-def card_for(id)<br />
</span><span style='color:#0A0'>+def card_for<br />
</span>   cards.get[id]<br />
 end<br />
Took: (4.64s)<br />
</p>

<p>So, <tt>mutant</tt> has found that for all the ways it knows how to mutate the method <tt>#card_for</tt> (renaming parameters, deleting code, &amp;c) the result is that the tests still pass. However, the smoking gun is the renamed input parameter: with the method parameter renamed but its usages left alone, it&#8217;s quite clear that this method never actually gets called; otherwise the tests for that mutation would fail with a <tt>NameError</tt>. And by inspection, we can see it&#8217;s not referenced, so we can easily excise it.</p>
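
<p>The reasoning can be illustrated with a small sketch. This is Python rather than Ruby, it does not use <tt>mutant</tt>, and the class, the test suites and the <tt>mutant_survives</tt> helper are all hypothetical names made up for the illustration. It applies one parameter-rename mutation by hand and checks whether a test suite notices.</p>

```python
# Minimal sketch: a parameter-rename mutation only breaks a test suite
# if the mutated method is actually executed. All names are hypothetical.

ORIGINAL = '''
class Projection:
    def __init__(self, cards):
        self.cards = cards

    def card_for(self, card_id):
        return self.cards[card_id]

    def card_count(self):
        return len(self.cards)
'''

# The mutant renames the parameter but leaves its usage alone, so any
# call to card_for would raise NameError on the undefined card_id.
MUTANT = ORIGINAL.replace(
    "def card_for(self, card_id)", "def card_for(self, renamed)"
)


def load(source):
    """Execute the class source and return the Projection class it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["Projection"]


def suite_without_card_for(cls):
    """A test suite that never exercises card_for."""
    return cls({1: "front"}).card_count() == 1


def suite_with_card_for(cls):
    """A test suite that does call card_for."""
    try:
        return cls({1: "front"}).card_for(1) == "front"
    except NameError:
        return False


def mutant_survives(suite):
    """A mutant is alive when the suite passes on original and mutant alike."""
    return suite(load(ORIGINAL)) and suite(load(MUTANT))
```

<p>With these definitions, <tt>mutant_survives(suite_without_card_for)</tt> is true and <tt>mutant_survives(suite_with_card_for)</tt> is false: the mutant only dies once some test actually calls the method.</p>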

<p>Of course, the original purpose of mutation testing wasn&#8217;t just to find dead code; it&#8217;ll also show you where you have missing tests, or an edge case for a behavior has been missed. For example, it&#8217;s quite informative to run this once with just your unit tests, and then again including your end to end tests, as that will quite clearly highlight any short-cuts you might have (even unwittingly) taken in your unit tests.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/04/28/mutant-refactoring-powers/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How readable are your comments?</title>
		<link>http://www.lshift.net/blog/2013/04/27/how-readable-are-your-comments</link>
		<comments>http://www.lshift.net/blog/2013/04/27/how-readable-are-your-comments#comments</comments>
		<pubDate>Sat, 27 Apr 2013 20:40:22 +0000</pubDate>
		<dc:creator>Frank Shearar</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1735</guid>
		<description><![CDATA[You&#8217;ve done the Right Thing and written extensive class comments, docstrings and the like. But are they really readable?


There&#8217;s a fair amount of research into readability, at least in English. Flesch-Kincaid, SMOG (Simple Measure Of Gobbledygook), Coleman-Liau, Automated Readability Index are among the more well-known such measures. Essentially all these measures do the same thing: [...]]]></description>
			<content:encoded><![CDATA[<p>You&#8217;ve done the Right Thing and written extensive class comments, docstrings and the like. But are they really readable?</p>

<p><span id="more-1735"></span></p>
<p>There&#8217;s a fair amount of research into readability, at least in English. Flesch-Kincaid, SMOG (Simple Measure Of Gobbledygook), Coleman-Liau, Automated Readability Index are among the more well-known such measures. Essentially all these measures do the same thing: how long are your sentences, and how long are the words you use? Polysyllabic sesquipedalianism, let alone egregious hyperverbosity and prolixity, decreases readability. (CLI score: 34.14) Using short words makes things more readable. (CLI score: 8.5)</p>

<p>Flesch-Kincaid and SMOG both suffer from measuring syllables. &#8220;Suffer&#8221;, because syllable-counting in English is not trivial. However, Coleman-Liau and Automated Readability Index just count word length. More amenable to calculation.</p>

<p>Some languages permit comments to be explicitly tied to things: Clojure&#8217;s docstrings, Smalltalk&#8217;s class comments. Given that, let&#8217;s look at using Coleman-Liau on a Smalltalk package&#8217;s class comments. The Coleman-Liau index maps text into a real number that represents the approximate education level required to understand the text, according to the US education system. Thus, a score of 10 represents the reading ability of a Grade 10 student, 14 that of a second year undergraduate, and so on.</p>

<p>Now to do this properly we need to tokenise the text into words and sentences. In a production system we&#8217;d need to be careful: splitting the streams by periods is insufficient because otherwise &#8220;The U.S. Postal Service is slow.&#8221; would parse as three sentences. But in the interests of clarity, we&#8217;ll ignore that: we treat sentences as delimited by periods, and words by spaces.</p> 

<pre>

<div class="codecolorer-container smalltalk twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="smalltalk codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">|<span style="color: #00007f;"> classes cli </span>|<br />
<span style="color: #00007f;">classes</span> <span style="color: #000066; font-weight:bold;">:=</span> <span style="">&#40;</span><span style="color: #0000ff;">PackageInfo</span> named: <span style="color: #7f0000;">'Kernel-Classes'</span><span style="">&#41;</span> classes.<br />
<span style="color: #00007f;">cli</span> <span style="color: #000066; font-weight:bold;">:=</span> <span style="">&#91;</span>:<span style="color: #00007f;">str</span> | | words sentences l s |<br />
&nbsp; &nbsp; <span style="color: #00007f;">words</span> <span style="color: #000066; font-weight:bold;">:=</span> <span style="">&#40;</span><span style="color: #00007f;">str</span> splitBy: <span style="color: #7f0000;">' '</span><span style="">&#41;</span> collect:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="">&#91;</span>:<span style="color: #00007f;">each</span> | each withoutLeadingBlanks withoutTrailingBlanks<span style="">&#93;</span>.<br />
&nbsp; &nbsp; <span style="color: #00007f;">sentences</span> <span style="color: #000066; font-weight:bold;">:=</span> <span style="">&#40;</span><span style="color: #00007f;">str</span> splitBy: <span style="color: #7f0000;">'.'</span><span style="">&#41;</span> collect:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="">&#91;</span>:<span style="color: #00007f;">each</span> | each withoutLeadingBlanks withoutTrailingBlanks<span style="">&#93;</span>.<br />
<br />
&nbsp; &nbsp; <span style="color: #007f00; font-style: italic;">&quot;The original formulation is 0.0588L - 0.296S - 15.8 where<br />
&nbsp; &nbsp; &nbsp; &nbsp;* L is the average number of letters per 100 words and<br />
&nbsp; &nbsp; &nbsp; &nbsp;* S is the average number of sentences per 100 words.<br />
&nbsp; &nbsp; &nbsp; We fold the constant factor into the coefficients to make<br />
&nbsp; &nbsp; &nbsp; the important things clear:<br />
&nbsp; &nbsp; &nbsp; &nbsp;* l measures the average word length, while<br />
&nbsp; &nbsp; &nbsp; &nbsp;* s measures the (reciprocal of the) average sentence length.&quot;</span><br />
&nbsp; &nbsp; <span style="color: #00007f;">l</span> <span style="color: #000066; font-weight:bold;">:=</span> <span style="">&#40;</span><span style="">&#40;</span><span style="color: #00007f;">str</span> select: <span style="color: #7f0000;">#isAlphaNumeric</span><span style="">&#41;</span> size<span style="">&#41;</span> / <span style="">&#40;</span><span style="color: #00007f;">words</span> size<span style="">&#41;</span>.<br />
&nbsp; &nbsp; <span style="color: #00007f;">s</span> <span style="color: #000066; font-weight:bold;">:=</span> <span style="">&#40;</span><span style="">&#40;</span><span style="color: #00007f;">sentences</span> collect: <span style="">&#91;</span>:<span style="color: #00007f;">each</span> | <span style="">&#40;</span><span style="color: #00007f;">each</span> splitBy: <span style="color: #7f0000;">' '</span><span style="">&#41;</span> size<span style="">&#93;</span><span style="">&#41;</span> average<span style="">&#41;</span> / <span style="">&#40;</span><span style="color: #00007f;">words</span> size<span style="">&#41;</span>.<br />
&nbsp; &nbsp; <span style="">&#40;</span><span style="color: #00007f;">5</span>.<span style="color: #00007f;">88</span> * <span style="color: #00007f;">l</span><span style="">&#41;</span> - <span style="">&#40;</span><span style="color: #00007f;">29</span>.<span style="color: #00007f;">6</span> * <span style="color: #00007f;">s</span><span style="">&#41;</span> - <span style="color: #00007f;">15</span>.<span style="color: #00007f;">8</span><span style="">&#93;</span>.<br />
<span style="color: #00007f;">classes</span> collect:<br />
&nbsp; &nbsp; <span style="">&#91;</span>:<span style="color: #00007f;">cls</span> | <span style="">&#123;</span>cls name. <span style="color: #00007f;">cli</span> value: <span style="color: #00007f;">cls</span> instanceSide organization classComment asString<span style="">&#125;</span><span style="">&#93;</span><br />
<br />
<span style="color: #007f00; font-style: italic;">&quot;=&gt; &nbsp; an OrderedCollection(<br />
&nbsp; &nbsp; #(#BasicClassOrganizer -45.400000000000006)<br />
&nbsp; &nbsp; #(#Behavior 8.568979591836733)<br />
&nbsp; &nbsp; #(#Categorizer 18.796997792494484)<br />
&nbsp; &nbsp; #(#Class 10.976666666666667)<br />
&nbsp; &nbsp; #(#ClassBuilder 10.886810551558753)<br />
&nbsp; &nbsp; #(#ClassCategoryReader -0.9633333333333312)<br />
&nbsp; &nbsp; #(#ClassCommentReader -45.400000000000006)<br />
&nbsp; &nbsp; #(#ClassDescription 13.691774891774887)<br />
&nbsp; &nbsp; #(#ClassOrganizer 7.881632653061221)<br />
&nbsp; &nbsp; #(#Metaclass 19.016567834681037))&quot;</span></div></div>

</pre>

<p>(Side note: usually we&#8217;d say <code>cls comment</code>. However, <code>ClassDescription &gt;&gt; #comment</code> returns a template encouraging the reader to fill in the blanks, in the event of there being a missing class comment. That would throw out our calculations, so we route around the helper and go directly to the source of the comments.)</p>

<p>But look at that first result: <code>BasicClassOrganizer</code>&#8217;s class comment is apparently readable by someone not even born yet! Of course, that&#8217;s because that class has <em>no</em> comment. That&#8217;s handy information in itself, although we&#8217;d be better off filtering those classes out and treating them separately.</p>

<p>So what about languages other than English? I&#8217;ve seen work on a <a href="http://dl.acm.org/citation.cfm?doid=991719.991771">Japanese readability index</a> and a <a href="http://www.cse.cuhk.edu.hk/~king/PUB/thesis_tplau.pdf">Chinese one</a>. Applying Coleman-Liau and ARI to agglutinative languages like isiXhosa would probably not work: such languages have longer words than English, and fewer words in a sentence. &#8220;Indoda iyambona umntwana&#8221; has a CLI of around 17.5, indicating a sentence well outside the grasp of an adult without extensive tertiary education. The English translation, &#8220;The man sees the child&#8221;, has a CLI of -0.5!</p>
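
<p>Those two figures can be checked with a short sketch. This is Python, not a port of the Smalltalk listing: the function name <tt>coleman_liau</tt> is made up here, it takes the formula as stated in the listing&#8217;s comment (with the factor of 100 folded into the coefficients), and it assumes empty fragments are dropped when splitting on periods. Scores for the class comments listed earlier need not match, since that listing tokenises and averages differently.</p>

```python
def coleman_liau(text):
    """Coleman-Liau index with the simplified tokenisation described above.

    Words are separated by whitespace and sentences by periods; empty
    fragments are dropped (an assumption of this sketch).
    """
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    if not words or not sentences:
        raise ValueError("text has no words to score")
    letters = sum(1 for ch in text if ch.isalnum())
    l = letters / len(words)         # average letters per word
    s = len(sentences) / len(words)  # sentences per word
    return 5.88 * l - 29.6 * s - 15.8
```

<p>For &#8220;Indoda iyambona umntwana&#8221; (22 letters, 3 words, 1 sentence) this gives roughly 17.45, and for &#8220;The man sees the child&#8221; (18 letters, 5 words, 1 sentence) roughly -0.55. Empty input raises an error rather than returning a misleading score.</p>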

<p>It appears that English has had by far the most research into readability. Or of course the internet hasn&#8217;t overcome its English bias yet.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/04/27/how-readable-are-your-comments/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Auto-generating LShift blog posts</title>
		<link>http://www.lshift.net/blog/2013/04/26/auto-generating-lshift-blog-posts</link>
		<comments>http://www.lshift.net/blog/2013/04/26/auto-generating-lshift-blog-posts#comments</comments>
		<pubDate>Fri, 26 Apr 2013 07:37:02 +0000</pubDate>
		<dc:creator>John Wright</dc:creator>
				<category><![CDATA[Clojure]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1707</guid>
		<description><![CDATA[I've often found myself at a loss for blog post topics, so rather than write one myself I decided to let a computer do the heavy lifting!

<a href="http://en.wikipedia.org/wiki/Markov_chain">Markov chains</a> offer a neat trick for generating surrealist blog oeuvres. They work by figuring out the probability of one word appearing after another, given a suitable corpus of input material.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve often found myself at a loss for blog post topics, so rather than write one myself I decided to let a computer do the heavy lifting!</p>

<p><a href="http://en.wikipedia.org/wiki/Markov_chain">Markov chains</a> offer a neat trick for generating surrealist blog oeuvres. They work by figuring out the probability of one word appearing after another, given a suitable corpus of input material.</p>

<p><span id="more-1707"></span></p>

<p>The meat of the algorithm is surprisingly simple. Given a sequence of tokens (words and punctuation characters), you build a mapping between tokens and the frequencies of the tokens that appear one step to their right.</p>

<p>To give it a whirl, clone the <a href="https://github.com/johnwright/blog-o-matic">GitHub repo</a> and make sure you have <a href="https://github.com/technomancy/leiningen">Leiningen</a> installed. Here&#8217;s a fairly noddy example:</p>

<pre>
$ lein repl
...
user=&gt; (use 'blog-o-matic.core)
nil
user=&gt; (build-frequency-map ["the" "cat" "sat" "on" "the" "mat"])
{"on" {"the" 1}, "sat" {"on" 1}, "cat" {"sat" 1}, "the" {"mat" 1, "cat" 1}}
</pre>
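
<p>For readers who don&#8217;t speak Clojure, the same idea can be sketched in Python. This is not the blog-o-matic implementation; the function name simply mirrors the one above for readability, and the nested-dictionary layout is an assumption based on the REPL output.</p>

```python
from collections import Counter, defaultdict


def build_frequency_map(tokens):
    """Map each token to the counts of the tokens that directly follow it."""
    freqs = defaultdict(Counter)
    # Pair every token with its right-hand neighbour and count the pair.
    for current, following in zip(tokens, tokens[1:]):
        freqs[current][following] += 1
    # Convert to plain dicts so the result prints and compares simply.
    return {token: dict(counts) for token, counts in freqs.items()}
```

<p>Fed the same six tokens as the REPL session, this produces the same mapping: &#8220;the&#8221; is followed once each by &#8220;cat&#8221; and &#8220;mat&#8221;, and every other token has a single successor.</p>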

<p>Using the frequency map, you derive a new stream of tokens by starting from a particular token and following the trail, making a weighted random choice among the available next tokens at each step. The <tt>random-next-token</tt> function takes care of this. There are also some helpers for stitching the tokens back together and spitting out sentences. Here&#8217;s a sample run based on the last twenty blog posts:</p>

<pre>
user=&gt; (require '[blog-o-matic.scrape :as scrape])
nil
user=&gt; (def posts (take 20 (scrape/fetch-posts)))
#'user/posts
user=&gt; (def posts-tokenised (map scrape/tokenise posts))
#'user/posts-tokenised
user=&gt; (def freqs (reduce build-frequency-map {} posts-tokenised))
#'user/freqs
user=&gt; (first (random-sentences freqs))
"With github-differ, I won’t regret it to serve as mentioned on BitBucket."
user=&gt; (apply str (interpose " " (take 3 (random-sentences freqs))))
"Partly that stuff, and relatedly, by MongoDB on a sense therefore that spits
out the above. Android users won’t just inserted into the rub: The app wants
your submitter claims to click around to define your table and do. We’re using
a provisional response codes, and take a good work."
</pre>
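
<p>The weighted random walk itself is small enough to sketch in Python. Again, this is an illustration and not the code from the repo: <tt>random_next_token</tt> only borrows its name from the Clojure function, while <tt>random_tokens</tt>, the <tt>limit</tt> parameter and the <tt>rng</tt> argument are assumptions made for this example. The frequency map is expected to be a dictionary of dictionaries, as in the REPL output earlier.</p>

```python
import random


def random_next_token(freqs, token, rng=random):
    """Pick a successor of token, weighted by how often it followed token."""
    followers = freqs.get(token)
    if not followers:
        return None  # dead end: nothing ever followed this token
    choices = list(followers)
    weights = [followers[choice] for choice in choices]
    return rng.choices(choices, weights=weights, k=1)[0]


def random_tokens(freqs, start, limit=20, rng=random):
    """Follow the trail from start, producing at most limit tokens."""
    tokens = [start]
    for _ in range(limit - 1):
        following = random_next_token(freqs, tokens[-1], rng)
        if following is None:
            break
        tokens.append(following)
    return tokens
```

<p>Passing a seeded <tt>random.Random</tt> instance as <tt>rng</tt> makes a run repeatable, which helps when deciding whether a change to the tokeniser improved the output or the dice just rolled differently.</p>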

<p>The devil is always in the details, and Markov chains are sensitive to the peculiarities of the input text. As is so often the case in the software world, 99% of the code is munging data into shape and 1% is a nifty algorithm. LShift blog posts present a challenge because you&#8217;ll find identifiers, magic numbers and great heaving code stanzas slap-bang in the middle of a sentence. Those bits have to be stripped out or the algorithm will get lost down a blind alley of symbology.</p>

<p>You may need to run it a few times before it produces anything giggle-worthy. Automated silliness detection is left as an exercise for the reader&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/04/26/auto-generating-lshift-blog-posts/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fallacies of a Monad</title>
		<link>http://www.lshift.net/blog/2013/04/26/fallacies-of-a-monad</link>
		<comments>http://www.lshift.net/blog/2013/04/26/fallacies-of-a-monad#comments</comments>
		<pubDate>Fri, 26 Apr 2013 07:15:25 +0000</pubDate>
		<dc:creator>John Wright</dc:creator>
				<category><![CDATA[Clojure]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1698</guid>
		<description><![CDATA[DSL based templating sucks! This looks a very short beep-like sound card. Let paragraphs rely on a sense of data. Roy recently released my mind: In practice of course, it grew features.]]></description>
			<content:encoded><![CDATA[<p>
DSL based templating sucks! This looks a very short beep-like sound card. Let paragraphs rely on a sense of data. Roy recently released my mind: In practice of course, it grew features.
</p>

<p>
Our first local policy at LShift although we’ve had a Meteor is necessary to distinguish this point though, we want to and hence won’t discuss. Empty is the dark ages trying to wind position when your job to generate input data API calls to understand this blog post. Hello, add our socket buffer, etc.
</p>

<p>
I’m glad I have addressed your submitter claims to work out what if we just the two commits! You especially love peer review. Perhaps it’s about the motherboard. Success!
</p>

<p><span id="more-1698"></span>
<p>
Some quick googling suggests there’s no permissions. Last month we leverage the language used between the bottom: Let’s mirror the same process.
</p></p>

<p>
The above the collections that empty dictionaries for escape from our model when you ever at which is primarily a matter of different outcomes sets of the random from a very much of the core operations on my Debian fan speed for new Android compatibility, sometimes parsers accept the working on Monday morning to encode being used by making it also like this product of Programming Languages as the news a programmer must either use spatial logic and quiet case being a method should work with the top level timer wheels.
</p>

<p>
WebDriver’s high-level library for remote service. Never Mind the value. Programming in. Unfortunately, since I did I wanted the parent right, let’s look at it. Entity Framework as well.
</p>

<p>
N-dimensional options as represented by receiving a Raspberry Pi! Monticello versions in. Imagine we’re at next time. Create button press.
</p>

<p>
YOUR CPU should have been researched for the Blog with the most convenient because we just being played with unnatural amounts of subscribers is the first ingredient we don’t care, We write some point, ignorant users and they’re in the drive.
</p>

<p>
HTML-formatted notification hooks compared to get the trunk model the wrapper functions or maybe I’m not bored or may find just starting point will all; try CorePy code is a structure produces. Objects remain.
</p>

<p>
RegCreateKey Operation is started before Jersey and are correctly, on a small program that method as well as JSON without cycles in a notification channel on Postgres, and look back asking for no harm to use packages together with Meteor Angular Leaderboard demo into an arbitrary number of the DoSomething message.
</p>

<p>
Network protocols live in the misfortune of all in most protocols often another word; after all of sense to rely on Spring Jersey Start Meteor in trait that fixes it with watch this code whenever Meteor data?
</p>

<p>
HEAD of the same answer: Aha!
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/04/26/fallacies-of-a-monad/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Application-level change logging with EntityFramework</title>
		<link>http://www.lshift.net/blog/2013/04/22/application-level-change-logging-with-entityframework</link>
		<comments>http://www.lshift.net/blog/2013/04/22/application-level-change-logging-with-entityframework#comments</comments>
		<pubDate>Mon, 22 Apr 2013 11:12:13 +0000</pubDate>
		<dc:creator>Martin Eden</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1676</guid>
		<description><![CDATA[We have been developing Daylight, which is a records management system for Freedom from Torture, a charity dedicated to the treatment and rehabilitation of survivors of torture. Due to the sensitive nature of the data being recorded, Freedom from Torture wanted a way of tracking every change that was made to records.

The application uses Entity [...]]]></description>
			<content:encoded><![CDATA[<p>We have been developing Daylight, which is a records management system for <a href="http://www.freedomfromtorture.org/">Freedom from Torture</a>, a charity dedicated to the treatment and rehabilitation of survivors of torture. Due to the sensitive nature of the data being recorded, Freedom from Torture wanted a way of tracking every change that was made to records.</p>

<p>The application uses <a href="http://msdn.microsoft.com/en-gb/data/ef.aspx">Entity Framework</a> as an ORM layer over an SQL database. One option would have been to use database triggers to log changes. However, we wanted to track changes not in terms of the database model, but in terms of the application model. This allows us to then easily query the log at a later point using the same application model and make sense of it. We can directly re-hydrate the logs back into C# objects in different recorded states in a strongly typed manner.</p>

<p>The solution is to override the database context object&#8217;s SaveChanges method. However, inside that statement is quite a lot of complexity! I have spent some time over the last few weeks extracting the logging code from the client application and the result is <a href="https://bitbucket.org/MartinEden/frame-log">FrameLog</a>, an open source logging library for Entity Framework.
<span id="more-1676"></span></p>
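<p>To make the idea concrete without dragging in Entity Framework, here is a minimal Python sketch of the same pattern: a context whose save method is overridden to record changes in terms of application objects before delegating to the normal save. Everything in it (<tt>Context</tt>, <tt>LoggingContext</tt>, <tt>attach</tt>, <tt>save_changes</tt>, <tt>Person</tt>) is a hypothetical stand-in; it is not FrameLog&#8217;s API and glosses over most of the complexity mentioned above.</p>

```python
import copy
from datetime import datetime, timezone


class Context:
    """Hypothetical stand-in for an ORM context; this is not Entity Framework."""

    def __init__(self):
        self._tracked = []  # pairs of (object, snapshot of its attributes)

    def attach(self, obj):
        self._tracked.append((obj, copy.deepcopy(vars(obj))))

    def save_changes(self):
        # A real ORM would write to the database here; we only refresh snapshots.
        self._tracked = [(o, copy.deepcopy(vars(o))) for o, _ in self._tracked]


class LoggingContext(Context):
    """Overrides save_changes to log every changed attribute before saving."""

    def __init__(self, author):
        super().__init__()
        self.author = author
        self.log = []

    def save_changes(self):
        now = datetime.now(timezone.utc)
        for obj, before in self._tracked:
            for field, value in vars(obj).items():
                if field not in before or before[field] != value:
                    self.log.append({
                        "type": type(obj).__name__,
                        "field": field,
                        "value": value,
                        "author": self.author,
                        "timestamp": now,
                    })
        super().save_changes()


class Person:
    def __init__(self, name):
        self.name = name
```

<p>Because the log entries refer to application types and attribute names rather than tables and columns, they can later be queried with the same vocabulary the application uses, which is the point of doing this above the database.</p>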

<p>It is compatible with EF version 5, and will work with both a code-first and database-first approach. The library in both binary and source forms can be used and modified for both commercial and non-commercial works (see <a href="https://bitbucket.org/MartinEden/frame-log/wiki/License">the license</a>).</p>

<p>Links for FrameLog:
<ul>
    <li><a href="https://bitbucket.org/MartinEden/frame-log">Source code, issue tracker, and documentation</a></li>
        <li><a href="https://bitbucket.org/MartinEden/frame-log/downloads">Binaries</a></li>
        <li><a href="https://nuget.org/packages/FrameLog/">NuGet package</a></li>
</ul></p>

<p>One of the features of FrameLog is the HistoryExplorer class, which allows type-safe querying of the logs. Here is an example:</p>

<div class="codecolorer-container csharp twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="csharp codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">class</span> Person<br />
<span style="color: #008000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">string</span> Name <span style="color: #008000;">&#123;</span> get<span style="color: #008000;">;</span> set<span style="color: #008000;">;</span> <span style="color: #008000;">&#125;</span><br />
<span style="color: #008000;">&#125;</span><br />
<br />
<span style="color: #0600FF; font-weight: bold;">using</span> <span style="color: #008000;">&#40;</span>var db <span style="color: #008000;">=</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> DatabaseContext<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><br />
<span style="color: #008000;">&#123;</span><br />
&nbsp; &nbsp; var explorer <span style="color: #008000;">=</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> HistoryExplorer<span style="color: #008000;">&lt;</span>ChangeSet, User<span style="color: #008000;">&gt;</span><span style="color: #008000;">&#40;</span>db<span style="color: #008000;">.</span><span style="color: #0000FF;">FrameContext</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span><br />
&nbsp; &nbsp; var changes <span style="color: #008000;">=</span> explorer<span style="color: #008000;">.</span><span style="color: #0000FF;">ChangesTo</span><span style="color: #008000;">&#40;</span>person, p <span style="color: #008000;">=&gt;</span> p<span style="color: #008000;">.</span><span style="color: #0000FF;">Name</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span><br />
&nbsp; &nbsp; <span style="color: #0600FF; font-weight: bold;">foreach</span> <span style="color: #008000;">&#40;</span>var c <span style="color: #0600FF; font-weight: bold;">in</span> changes<span style="color: #008000;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #008000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Console<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteLine</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;On {0}, name was changed to {1} by {2}&quot;</span>, c<span style="color: #008000;">.</span><span style="color: #0000FF;">Timestamp</span>, c<span style="color: #008000;">.</span><span style="color: #0000FF;">Value</span>, c<span style="color: #008000;">.</span><span style="color: #0000FF;">Author</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span><br />
&nbsp; &nbsp; <span style="color: #008000;">&#125;</span><br />
<span style="color: #008000;">&#125;</span></div></div>

<p>Note that this example omits the User class that is used for recording who made the changes. Also note that you won&#8217;t just be able to drop in a FrameLog binary and do the above &#8211; there is a little work to be done setting up the database&#8217;s &#8220;FrameContext&#8221;. See <a href="https://bitbucket.org/MartinEden/frame-log/wiki/Home">the documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/04/22/application-level-change-logging-with-entityframework/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Debugging a segfaulting binary without debug symbols</title>
		<link>http://www.lshift.net/blog/2013/03/31/debugging-a-segfaulting-binary-without-debug-symbols</link>
		<comments>http://www.lshift.net/blog/2013/03/31/debugging-a-segfaulting-binary-without-debug-symbols#comments</comments>
		<pubDate>Sun, 31 Mar 2013 23:00:05 +0000</pubDate>
		<dc:creator>alexander</dc:creator>
				<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1647</guid>
		<description><![CDATA[We mostly use memory-safe high level languages at LShift (although we&#8217;ve done the odd embedded systems dev job), but sometimes a bit of systems programming knowhow still comes in handy. I had the misfortune of a pure, i.e. no-JNI java program segfaulting on me with Oracle Java 7 in a non-reproducible fashion. I wanted to find [...]]]></description>
			<content:encoded><![CDATA[<p>We mostly use memory-safe high level languages at LShift (although we&#8217;ve done the odd embedded systems dev job), but sometimes a bit of systems programming knowhow still comes in handy. I had the misfortune of a pure, i.e. no-JNI java program segfaulting on me with Oracle Java 7 in a non-reproducible fashion. I wanted to find out what exactly the program was up to at the point of the crash. Helpfully, on fatal errors java will generate a slightly obscurely named file

<code>hs_err_pid${pid}.log</code>

where

<code>${pid}</code>

is the pid your deceased java process ran under (the

<code>hs</code>

comes from HotSpot, in case you were wondering). This file contains amongst other things a VM stacktrace which will tell you where in C-land things went wrong. </p>

<p>But let&#8217;s cut straight to the chase and open the core dump file in gdb like so:</p>

<p><span id="more-1647"></span></p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&gt; gdb `which java` ./core<br />
GNU gdb (GDB) 7.1-ubuntu<br />
[...]<br />
Reading symbols from /usr/bin/java...(no debugging symbols found)...done.<br />
[...]<br />
Program terminated with signal 6, Aborted.<br />
#0 &nbsp;0x00007ff378354a75 in raise () from /lib/libc.so.6<br />
(gdb)</div></div>

<p>OK, so far we&#8217;ve learned two things. Firstly, the program died on SIGABRT (signal 6), which is raised by the abort(3) call, which in turn is invoked on, amongst other things, failed asserts. Secondly, there are no debug symbols in this java binary, hence no source-level debugging, which will make things more&#8230; interesting.</p>
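<p>If you want to convince yourself that abort(3) really does end a process with signal 6, the following small Python sketch does so on a POSIX system. It spawns a child interpreter that calls <code>os.abort()</code> and inspects the exit status; the function names are made up for this illustration, and the child switches off core dumps so it doesn&#8217;t litter the working directory.</p>

```python
import signal
import subprocess
import sys

# The child disables core dumps, then calls abort(3) via os.abort().
CHILD_SOURCE = (
    "import os, resource; "
    "resource.setrlimit(resource.RLIMIT_CORE, (0, 0)); "
    "os.abort()"
)


def abort_child_returncode():
    """Run a child process that aborts and return its exit status."""
    child = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code -N means "killed by signal N".
    return child.returncode


def signal_name(returncode):
    """Translate a negative subprocess return code into a signal name."""
    return signal.Signals(-returncode).name
```

<p>On Linux, <code>abort_child_returncode()</code> should come back as <code>-signal.SIGABRT</code>, matching the &#8220;signal 6, Aborted&#8221; line gdb printed for the core file.</p>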

<p>Some quick googling <a href="http://stackoverflow.com/questions/10498282/how-to-debug-jdk-source">suggests</a> there&#8217;s no quick and easy way to get a debug build for Oracle JDK (yet another reason to use OpenJDK&#8230;).</p>

<p>Undeterred, we start out with a look at the b(ack)t(race):</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) bt<br />
#0 &nbsp;0x00007ff378354a75 in raise () from /lib/libc.so.6<br />
#1 &nbsp;0x00007ff3783585c0 in abort () from /lib/libc.so.6<br />
#2 &nbsp;0x00007ff377d40455 in os::abort(bool) () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/libjvm.so<br />
#3 &nbsp;0x00007ff377ea0717 in VMError::report_and_die() () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/libjvm.so<br />
#4 &nbsp;0x00007ff377d43f60 in JVM_handle_linux_signal () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/server/libjvm.so<br />
#5 &nbsp;&lt;signal handler called&gt;<br />
#6 &nbsp;0x00007ff3763b837e in ZIP_Read () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/libzip.so<br />
#7 &nbsp;0x00007ff3763b7d5c in Java_java_util_zip_ZipFile_read () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/libzip.so<br />
#8 &nbsp;0x00007ff37052a17e in ?? ()<br />
#9 &nbsp;0x0000000000000000 in ?? ()</div></div>

<p>Right, so it looks like java segfaulted whilst trying to read an entry from a
zipfile. Let&#8217;s look at the relevant stackframe:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) frame 6<br />
#6 &nbsp;0x00007ff3763b837e in ZIP_Read () from /usr/lib/jvm/java-7-oracle/jre/lib/amd64/libzip.so<br />
(gdb) info frame<br />
Stack level 6, frame at 0x7ff361bdf3e0:<br />
&nbsp;rip = 0x7ff3763b837e in ZIP_Read; saved rip 0x7ff3763b7d5c<br />
&nbsp;called by frame at 0x7ff361be14b0, caller of frame at 0x7ff361bdf3b0<br />
&nbsp;Arglist at 0x7ff361bdf3d0, args:<br />
&nbsp;Locals at 0x7ff361bdf3d0, Previous frame's sp is 0x7ff361bdf3e0<br />
&nbsp;Saved registers:<br />
&nbsp; rbx at 0x7ff361bdf3b0, rbp at 0x7ff361bdf3d0, r12 at 0x7ff361bdf3b8, r13 at 0x7ff361bdf3c0, r14 at 0x7ff361bdf3c8,<br />
&nbsp; rip at 0x7ff361bdf3d8</div></div>

<p>OK, at this point the absence of debug symbols starts to make itself painfully felt, and I wave over my colleague Jarek, who&#8217;s got l33ter GDB skills than I have. Deprived of nice local variables or source code to look at, we can still disassemble the code at the address the instruction pointer (the

<code>rip</code>

register) points to and look at things at the ASM level like so:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) disassemble 0x7ff3763b837e</div></div>

<p>But let&#8217;s skip that for now and take a step back. The main thing I&#8217;d actually like to know is what zipfile java was trying to read when it segfaulted. But without debug symbols and thus the ability to inspect local variables and function call args it&#8217;s not obvious how to do so. The first ingredient we need to make progress is the relevant function signature.</p>

<p>Thankfully, Java is GPL&#8217;ed so we could <a href="http://www.java.net/download/openjdk/jdk7/promoted/b147/openjdk-7-fcs-src-b147-27_jun_2011.zip">download the source</a>, but a bit of Googling is quicker. The signature we&#8217;re looking for is:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">jint ZIP_Read(jzfile *zip, jzentry *entry, jlong pos, void *buf, jint len);</div></div>

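<p><a href="http://code.ohloh.net/search?s=struct%20jzfile">Another quick search</a> tells us what the structs involved look like. The listing below is <code>jzentry</code>; <code>jzfile</code> likewise begins with a <code>char *name</code> field:</p>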

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">typedef struct jzentry { &nbsp;/* Zip file entry */<br />
&nbsp; &nbsp; char *name; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /* entry name */<br />
&nbsp; &nbsp; jlong time; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /* modification time */<br />
&nbsp; &nbsp; jlong size; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /* size of uncompressed data */<br />
&nbsp; &nbsp; jlong csize; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/* size of compressed data (zero if uncompressed) */<br />
&nbsp; &nbsp; jint crc; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /* crc of uncompressed data */<br />
&nbsp; &nbsp; char *comment; &nbsp; &nbsp; &nbsp; &nbsp;/* optional zip file comment */<br />
&nbsp; &nbsp; jbyte *extra; &nbsp; &nbsp; &nbsp; &nbsp; /* optional extra data */<br />
&nbsp; &nbsp; jlong pos; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/* position of LOC header or entry data */<br />
} jzentry;</div></div>

<p>
Fortuitously the filename is right at the beginning of the struct, which means we could get at it easily if we just had a pointer to the struct instance itself: we just need another pointer dereference (by contrast getting at the content of

<code>char* comment</code>

for example would be more involved, because we would not only need to work out the size of all the preceding entries in the struct but also how much <a href="http://peeterjoot.wordpress.com/2009/11/11/c-structure-alignment-padding/">padding</a> the compiler added for alignment).
</p>

<p>But how do we get at the pointer <code>jzfile *zip</code> that was passed as the first arg to <code>ZIP_Read</code>? Well, since I was running this on a 64-bit unix workstation, we need to look up the <a href="http://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI">Linux calling conventions on amd64</a>. It turns out the first argument is passed in the RDI register. So let&#8217;s have a look at the registers:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) info registers<br />
rax &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff361bdf3f0 &nbsp; 140683293619184<br />
rbx &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0xf5 245<br />
rcx &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff361bdf3f0 &nbsp; 140683293619184<br />
rdx &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x0&nbsp; 0<br />
rsi &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x0&nbsp; 0<br />
rdi &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff3641a9d40 &nbsp; 140683333246272<br />
rbp &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff361bdf3d0 &nbsp; 0x7ff361bdf3d0<br />
rsp &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff361bdf3b0 &nbsp; 0x7ff361bdf3b0<br />
r8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0xf5 245<br />
r9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x7ff361be14d8 &nbsp; 140683293627608<br />
r10 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff37052a104 &nbsp; 140683538243844<br />
r11 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0xeeb4f6e0 &nbsp; 4004837088<br />
r12 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff3641a9d40 &nbsp; 140683333246272<br />
r13 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x0&nbsp; 0<br />
r14 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff361bdf3f0 &nbsp; 140683293619184<br />
r15 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x352f1d8&nbsp; &nbsp; 55767512<br />
rip &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0x7ff3763b837e &nbsp; 0x7ff3763b837e &lt;ZIP_Read+30&gt;<br />
eflags &nbsp; &nbsp; &nbsp; &nbsp; 0x206&nbsp; &nbsp; [ PF IF ]<br />
cs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x33 51<br />
ss &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x2b 43<br />
ds &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x0&nbsp; 0<br />
es &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x0&nbsp; 0<br />
fs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x0&nbsp; 0<br />
gs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0x0&nbsp; 0</div></div>
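<p>As a memory aid, here is a small Python sketch of the rule we just relied on: under the System V AMD64 calling convention, the first six integer or pointer arguments go into rdi, rsi, rdx, rcx, r8 and r9, in that order, and the rest go on the stack. The helper name is made up, and the sketch deliberately ignores floating point and by-value struct arguments, which follow different rules.</p>

```python
# Integer/pointer argument registers of the System V AMD64 ABI, in order.
SYSV_AMD64_INT_ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_registers(params):
    """Map integer/pointer parameter names to where the caller puts them."""
    overflow = max(0, len(params) - len(SYSV_AMD64_INT_ARG_REGISTERS))
    locations = list(SYSV_AMD64_INT_ARG_REGISTERS) + ["stack"] * overflow
    return dict(zip(params, locations))


# Parameter names taken from the ZIP_Read signature quoted earlier.
ZIP_READ_ARGS = assign_registers(["zip", "entry", "pos", "buf", "len"])
```

<p>For <code>ZIP_Read</code> this puts <code>zip</code> in rdi and <code>entry</code> in rsi. Bear in mind that these registers are only guaranteed to hold the arguments at function entry; the callee is free to overwrite them afterwards. At <code>ZIP_Read+30</code> it seems a reasonable bet that rdi is still intact (the same value also shows up in r12 in the dump above), but that is an assumption rather than something the ABI promises.</p>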

<p>Right, so we want to inspect the content at

<code>0x7ff3641a9d40</code>

. </p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) help x<br />
Examine memory: x/FMT ADDRESS.<br />
ADDRESS is an expression for the memory address to examine.<br />
FMT is a repeat count followed by a format letter and a size letter.<br />
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),<br />
&nbsp; t(binary), f(float), a(address), i(instruction), c(char) and s(string).<br />
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).<br />
The specified number of objects of the specified size are printed<br />
according to the format.<br />
<br />
Defaults for format and size letters are those previously used.<br />
Default count is 1. &nbsp;Default address is following last thing printed<br />
with this command or &quot;print&quot;.</div></div>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) x/a 0x7ff3641a9d40<br />
0x7ff3641a9d40: 0x7ff3641a9e10</div></div>

<p>Alternatively, since we&#8217;re on a 64bit (= 8 bytes) machine, this gives the same
answer:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) x/gx 0x7ff3641a9d40<br />
0x7ff3641a9d40: 0x00007ff3641a9e10</div></div>

<p>Let&#8217;s do the second de-reference:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) x/s 0x00007ff3641a9e10<br />
0x7ff3641a9e10: &nbsp;&quot;/home/alexander/.m2/repository/SOME_PROJECT/resources-1.0.jar&quot;</div></div>

<p>Aha! The segfault occurred as java was trying to read in <i>SOME_PROJECT</i>&#8217;s
resources jar!</p>

<p>Since gdb can also evaluate simple C expressions with the p(rint) command, we
could also directly have done:</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">(gdb) p *((char**) 0x7ff3641a9d40)<br />
$4 = 0x7ff3641a9e10 &quot;/home/alexander/.m2/repository/SOME_PROJECT/resources-1.0.jar&quot;</div></div>

<p>Success!</p>
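<p>The double dereference can be replayed outside gdb with Python&#8217;s <code>ctypes</code>, which may help if the pointer juggling above went by too quickly. This is only an illustrative sketch: <code>JZFileHead</code> is a made-up struct containing nothing but the leading <code>name</code> field, and the path is invented.</p>

```python
import ctypes


class JZFileHead(ctypes.Structure):
    """Made-up struct: only the leading `char *name` field matters here."""
    _fields_ = [("name", ctypes.c_char_p)]


# A live instance to poke at; its address plays the role of the value in rdi.
example = JZFileHead(b"/tmp/example.jar")
example_address = ctypes.addressof(example)


def read_name_two_steps(struct_address):
    """Mirror `x/a ADDR` followed by `x/s` on the value it printed."""
    # First dereference: read the pointer-sized word stored at the address.
    name_pointer = ctypes.cast(
        struct_address, ctypes.POINTER(ctypes.c_void_p))[0]
    # Second dereference: read the NUL-terminated string it points to.
    return ctypes.string_at(name_pointer)


def read_name_one_step(struct_address):
    """Mirror `p *((char**) ADDR)`: treat the address as a char** and deref."""
    return ctypes.cast(struct_address, ctypes.POINTER(ctypes.c_char_p))[0]
```

<p>Both functions return the same bytes for <code>example_address</code>, just as <code>x/a</code> plus <code>x/s</code> and the single <code>p</code> expression gave the same file name in gdb.</p>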

<p>But what if we had indeed wanted to look at</p>

<div class="codecolorer-container text twitlight" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">char* comment</div></div>

<p>instead? The easiest way would probably have been to let the compiler do the work of computing the offset for us by writing a small program that defines the same struct type and prints out the offset between the two pointers. Or we could have just loaded a debug build of a program that uses that struct as <a href="http://stackoverflow.com/questions/7272558/can-we-define-a-new-data-type-in-a-gdb-session">a bogus symbol table in gdb</a>. At this point though, I&#8217;m glad I only rarely need to leave the comfortable confines of a high level VM these days.</p>
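<p>The &#8220;small program&#8221; doesn&#8217;t even have to be C. As a sketch, Python&#8217;s <code>ctypes</code> lays out structures using the platform&#8217;s native alignment rules, so it can report the offsets for us. The field list below is copied from the <code>jzentry</code> listing earlier; mapping <code>jint</code>, <code>jlong</code> and <code>jbyte</code> to 32-bit, 64-bit and 8-bit signed integers follows the usual JNI typedefs, and the class and helper names are made up.</p>

```python
import ctypes


class JZEntry(ctypes.Structure):
    """Mirror of the jzentry struct shown above (JNI types mapped by hand)."""
    _fields_ = [
        ("name", ctypes.c_char_p),                 # char *name
        ("time", ctypes.c_int64),                  # jlong time
        ("size", ctypes.c_int64),                  # jlong size
        ("csize", ctypes.c_int64),                 # jlong csize
        ("crc", ctypes.c_int32),                   # jint crc
        ("comment", ctypes.c_char_p),              # char *comment
        ("extra", ctypes.POINTER(ctypes.c_byte)),  # jbyte *extra
        ("pos", ctypes.c_int64),                   # jlong pos
    ]


def field_offsets(struct_type):
    """Return each field's byte offset, padding included."""
    return {
        name: getattr(struct_type, name).offset
        for name, _ in struct_type._fields_
    }


OFFSETS = field_offsets(JZEntry)
```

<p>On a typical 64-bit Linux box I&#8217;d expect <code>OFFSETS["comment"]</code> to come out as 40: the 4-byte <code>crc</code> at offset 32 is followed by 4 bytes of padding so that the pointer after it is 8-byte aligned. With that number in hand, the comment could be read in gdb by adding the offset to the struct&#8217;s address before doing the same <code>char**</code> dereference as above.</p>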
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/03/31/debugging-a-segfaulting-binary-without-debug-symbols/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fixing github part 2</title>
		<link>http://www.lshift.net/blog/2013/03/28/fixing-github-part2</link>
		<comments>http://www.lshift.net/blog/2013/03/28/fixing-github-part2#comments</comments>
		<pubDate>Thu, 28 Mar 2013 23:10:41 +0000</pubDate>
		<dc:creator>Jacek Lach</dc:creator>
				<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.lshift.net/blog/?p=1632</guid>
		<description><![CDATA[The fact that github displays all dates in GMT-7 has annoyed me to no end since forever. Based on Frank&#8217;s github-differ, described in his last blog post, I wrote an extension that fixes it.

Without further ado, here it is: https://github.com/jaceklach/github-your-time
]]></description>
			<content:encoded><![CDATA[<p>The fact that github displays all dates in GMT-7 has annoyed me to no end since forever. Based on Frank&#8217;s <a href="https://github.com/frankshearar/github-differ/">github-differ</a>, described in his last <a href="http://www.lshift.net/blog/2013/03/27/enhancing-peer-review-through-github">blog post</a>, I wrote an extension that fixes it.</p>

<p>Without further ado, here it is: <a href="https://github.com/jaceklach/github-your-time">https://github.com/jaceklach/github-your-time</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.lshift.net/blog/2013/03/28/fixing-github-part2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
