Some More Bitface Thoughts

Something I forgot to mention yesterday was that I thought the “bitface” term was useful not just to refer to people who manipulate bits for a living (or hobby) — programmers, like myself. It works equally well for anyone who makes digital content: websites, blogs, podcasts, videos, photos, and so on.

We’re all moving bits around. We’re all labourers at the bitface.

Some Thoughts On Software Development

Before the job interview that I mentioned the other day, the company asked me to answer some questions in writing. I didn’t get the job, but I was pleased with my written answers (and they presumably helped me to get the interview, at least). So I thought I’d reuse them as a blog post. None of this should be surprising for anyone who knows anything about the software development field, but it’s interesting to reflect on how things have changed across my career.

What are some of the fundamental changes in your approach to software development you have adopted in the last few years?

There are two main changes that are fundamental and independent of languages and deployment environments: agile techniques and test-driven development (TDD).

Agile

Moving from waterfall to agile development was probably the most significant change to development practices in the industry. We always knew that breaking work down into smaller units led to better estimating, more modular code, and just better software. The genius of agile was to extend that understanding to the period of time spent on a block of work. A two-week sprint, with its work being specifically estimated, planned and developed, is just infinitely more manageable than a project phase lasting months.

Add to that:

  • self-organising teams which include someone from the customer or end user — or at least someone whose role is to represent the user;
  • accepting that change will happen, and embracing it;
  • and the discipline of saying that some features won’t be developed;

and we have a recipe for success.

TDD

Good developers always understood that testing was essential, and did it. But they used to follow a written test plan, or just have an idea of what needed to be tested and work to that. Testing was manual, hard to repeat, and error-prone.

TDD brought automation. So instead of writing a document listing the required tests, we can write code. That inherently makes the tests rerunnable, so regressions get caught before they become a problem.

But almost more important than that is the idea of writing the tests first. In an ideal world you write a comprehensive set of tests, write functional code until all the tests pass, and you’re done. It may not always work out exactly like that — in particular, adding tests to a mature codebase can be problematic — but writing tests first encourages us to write code that is easy to test, which tends to lead to better-designed, more modular code.

An added bonus is that the tests can help to document the code, by showing our expectations. And of course they make refactoring easy and safe, as long as they are in place before you start.
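To make the test-first rhythm concrete, here is a tiny JUnit sketch (the Discount class and its rule are invented for illustration): the test is written first, fails because Discount doesn’t exist yet, and then drives the minimal implementation.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Test-first sketch: written before the (hypothetical) Discount class,
// it fails red, then drives the simplest implementation that passes.
public class DiscountTest {

    @Test
    public void ordersOverOneHundredGetTenPercentOff() {
        assertEquals(180.0, new Discount().apply(200.0), 0.001);
    }

    @Test
    public void smallOrdersPayFullPrice() {
        assertEquals(50.0, new Discount().apply(50.0), 0.001);
    }
}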

If you were to start your last project over again, what would you do differently?

The project I’m thinking of involved rewriting the product’s GUI into a modern, responsive, browser-independent form, using HTML 5 and Twitter Bootstrap.

The existing version was an old frames-based web app that only worked fully in Internet Explorer, and had to be tweaked when each new version of that browser came out. We had long wanted to modernise it, but there were always other demands on development time.

Eventually I got a chance to try a proof of concept for the change. The application uses JSPs and Struts action classes, and the brief was to continue using these as much as possible. I decided to start with one of the main display pages, the one that users spend most of their time in. The idea was to give a quick demonstration of what was possible, and it did that, up to a point. But what I hadn’t realised was that frames are not part of HTML 5. There are ways to keep using them, but it’s not easy, and not good practice.

So while the new look and feel of a single page was clear, it was far from clear how the various pages would interact, how they would be brought together to form the whole UI, without frames.

If I were to start the project again now, my first step would be to work out how to link the pages together into a single interface, in the absence of frames. Most likely I would use one or other of the forms of JSP includes.
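For reference, a quick sketch of the two forms (file names invented): the include directive merges the fragment at translation time, while the include action pulls in the other page’s output on each request.

<%-- Static include: the fragment is merged when this JSP is compiled --%>
<%@ include file="header.jsp" %>

<%-- Dynamic include: the other page runs and its output is included per request --%>
<jsp:include page="messageList.jsp" />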

However, if there was the budget to do a more complete rewrite — by which I mean one that did not necessarily seek to use the existing JSPs — I would probably make much greater use of JavaScript and Ajax, and use the action classes just to provide data to the Ajax calls.

What is your approach to testing, and how would you test your application?

I would use a mixture of automated unit testing using JUnit, automated GUI testing, and actual user testing, if at all possible.

This fits well with what I was saying above. There are, broadly, three levels of testing: unit, integration, and system. Writing automated unit tests, though, is a development activity rather than a testing one; certainly we wouldn’t expect dedicated QA testers to work at the unit-test level.

So let’s assume that we have satisfactory unit-test coverage and we are interested in testing the application as a whole. Automation is obviously key here as well, both because it allows us to repeat the tests regularly — for every check-in, in an ideal world (and see below) — and because it removes the need for testers to step manually through a written script, which is boring and error-prone.

I have used Selenium for automated GUI testing, with some success. It takes a significant amount of development work, because it’s doing a significant thing, but the effort should pay off.
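To give a flavour, a minimal Selenium WebDriver test in Java might look like this (the URL, element id, and message text are all invented):

import static org.junit.Assert.assertTrue;

import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// Sketch of an automated GUI test: drive a real browser through the
// application and assert on what the user would actually see.
public class MessageListGuiTest {

    @Test
    public void deletingAMessageReportsSuccess() {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://localhost:8080/app/messages");      // hypothetical URL
            driver.findElement(By.id("delete-message-1")).click(); // hypothetical element id
            assertTrue(driver.getPageSource().contains("Message deleted"));
        } finally {
            driver.quit();
        }
    }
}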

However, even after all that, there is still no alternative to having someone sit down and actually use the application. Automated testing might pick up outright errors in how the user interaction works. But it won’t catch fine details like misaligned elements, typos in onscreen text, or just generally how it feels to use the application.

What are the benefits of Continuous Integration?

Continuous Integration takes us beyond the traditional daily build: it does more than just build, and it does so more often than just daily.

At the simplest level it ensures that, for every commit, an incremental build of the complete product is made, and all the unit tests are run. In the most advanced case, as well as building and testing, the product can be deployed to test servers and integration tests such as the automated GUI tests mentioned above can be run. Realistically those tend to take longer, so it’s unlikely that you would do them for every commit, but they can certainly be run multiple times daily.

So we get the following benefits:

  • frequent builds catch problems in code integration;
  • unit tests are run frequently, catching any regressions;
  • integration tests are run regularly, catching other problems;
  • general confidence in the product is increased;
  • developers are happy to commit changes frequently.

Java isn’t slow

So if your Java code is doing something easier than processing 6 million events a second, and it’s slow, you can maybe make it faster!

Source: Java isn’t slow

Great piece by Julia Evans on some really fast Java applications. Notably LMAX.

Eclipse SVN key bindings not working

I often get problems with the key bindings when I create a new Eclipse workspace. The recent ones with Subversion seemed intractable until I found this answer on the mighty Stack Overflow.

It’s frustrating when your muscle memory performs an action and it doesn’t trigger the expected response.

keyboard shortcuts – SVN key bindings not working in Eclipse – Stack Overflow.

Link: The One Correct Way to do Dependency Injection | Schauderhaft

In the end, “Dependency Injection” just means “passing parameters”, which was always the right way to do things anyway.

(From my Pinboard.)
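In Java terms, the whole idea boils down to this (class names invented for illustration):

import java.time.Clock;

// "Dependency Injection" as plain parameter passing: the collaborator
// arrives through the constructor instead of being constructed inside.
public class ReportService {

    private final Clock clock;

    public ReportService(Clock clock) { // the "injection"
        this.clock = clock;
    }

    public String stamp() {
        return "Generated at " + clock.instant();
    }
}

Production code passes Clock.systemUTC(); a test passes a fixed Clock. No framework required.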

Tip: using Pandoc to create truly standalone HTML files

If you’re using the excellent Pandoc to convert between different document formats, and you:

  • want your final output to be in HTML;
  • want the HTML to be styled with CSS;
  • and want the HTML document to be truly standalone;

then read on.

The most common approach with Pandoc is, I think, to write in Markdown, and then convert that to RTF, PDF or HTML. There are all sorts of more advanced options too; but here we are only concerned with HTML.

The pandoc command has an option which allows you to style the resulting HTML with CSS. Example 3 in the User’s Guide shows how you do this, with the -c option. The example also uses the -s option, which means that we are creating a standalone HTML document, as distinct from a fragment that is to be embedded in another document. The full command is:

pandoc -s -S --toc -c pandoc.css -A footer.html README -o example3.html

If you inspect the generated HTML file after running this, you will see it contains a line like this:

<link rel="stylesheet" href="pandoc.css" type="text/css">

That links to the CSS stylesheet, keeping the formatting information separate from the content. Very good practice if you’re publishing a document on the web.

But what about that “standalone” idea that you expressed with the -s option? What that does is make sure that the HTML is a complete document, beginning with a DOCTYPE tag, an <html> tag, and so on. But if, for example, you have to email the document you just created, or upload it to your company’s document store, then things fall apart. When your reader opens it, they’ll see what you wrote, all right; but it won’t be styled the way you wanted, because that pandoc.css file with the styling is back on your machine, in the same directory as the original Markdown file.

What you really want is to use embedded CSS; you want the content of pandoc.css to be included along with the prose you wrote in your HTML file.

Luckily HTML supports that, and Pandoc provides a way to make it all happen: the -H option or, in its long form, --include-in-header=FILE.

First you’ll have to make sure that your pandoc.css file1 starts and ends with HTML <style> tags, so it should look something like this:

<style type="text/css">
body {
    margin: auto;
    padding-right: 1em;
    padding-left: 1em;
    max-width: 44em;
    border-left: 1px solid black;
    border-right: 1px solid black;
    font-family: Verdana, sans-serif;
    font-size: 100%;
    line-height: 140%;
    color: #333;
}
</style>

Then run the pandoc command like this:

pandoc -s -S --toc -H pandoc.css -A footer.html README -o example3.html

and you’re done. A fully standalone HTML document.


  1. It doesn’t have to be called that, by the way.

Bash - how to recursively find the latest modified file in a directory

Recursively finding the latest modified file in a directory.

From the mighty Stack Overflow, some useful tips on using find with dates.
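For the record, the gist of the approach, assuming GNU find (the -printf extension is non-standard):

find . -type f -printf '%T@ %p\n' | sort -n | tail -1

That prints each file’s modification time as seconds since the epoch followed by its path, sorts numerically, and keeps the last (newest) line.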

Pass-By-Reference Problem When Using WebSphere Application Server

This has been kicking around, nearly finished, for months. It's not going to get any better, or shorter, so it's long past time I put it out there.

It’s also just long; and technical. So feel free to ignore. I won’t be offended.

I rarely write about programming or other technical issues here, but I probably should do so more often. Certainly in a case like this.

I often think about the many, many problems that I’ve had help with from strangers on the internet; people who have taken the time to write blog posts, answers to questions on forums, or technology tutorials. My job would barely be possible at times without the web. Of course, we didn’t have it back when I started in 1987; but we didn’t do such complex things, with so many different languages and technologies.

Anyway, all these kind strangers have helped me, and I rarely find myself in a position to give anything back to the community. So since I recently hit a problem that no-one else seems to have had, it’s really my duty to describe it, and my solution, in the hope that it might be of use to someone down the line.

If you’re looking for the solution to the problem with pass-by-reference on WAS, and don’t want to read the story of how I got there, you can jump straight to The New Bug.

Background

We develop our main app using a fairly standard n-tier architecture using JEE: web front end using JSPs and Struts; EJBs; a multiplicity of database platforms accessed using Spring. All fairly standard stuff, whose purpose is to move financial messages around.

A lot of this was originally developed when I wasn’t around (I was seconded to another department) and by contractors and others who are no longer with us. So I take no responsibility for the stupidities that exist in the codebase. Or rather, I accept no blame. I do, in fact, have responsibility for it; for keeping it going and developing it onwards now.1

One of the bad choices made by the original developers of this version of the product was to cache the results of database queries. The users can define various criteria by which they want to select a set of messages to view; those get translated into SQL, which our Java code executes using JDBC. Again, all standard stuff. JDBC was designed for exactly that kind of thing. Databases exist solely to do that kind of thing.

So the wise and sensible developers decided that performance would be a problem if a query returned many rows from the database. They decided that transferring the rows to the browser and allowing the user to scroll through them would be impossible. So they designed a caching mechanism.

Thing is, JDBC has that kind of caching built right in. And furthermore, they (our developers) included a limit: a maximum number of rows to return, which could be set to 50, 100, 500, or 1000. Pretty reasonable, since any query that returns over 200 or so rows is likely to be less than useful anyway.
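For illustration, plain JDBC already offers both a row cap and batched fetching, something like this (table and column names invented):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: JDBC's built-in row limiting and fetch buffering.
static void listMessages(Connection conn) throws SQLException {
    Statement stmt = conn.createStatement();
    try {
        stmt.setMaxRows(500);   // hard cap, like our 50/100/500/1000 setting
        stmt.setFetchSize(100); // hint: pull rows from the server in batches
        ResultSet rs = stmt.executeQuery("SELECT id, amount FROM Messages");
        while (rs.next()) {
            System.out.println(rs.getLong("id") + " " + rs.getBigDecimal("amount"));
        }
        rs.close();
    } finally {
        stmt.close();
    }
}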

But they still built that caching mechanism.

The Mechanism

That’s all right, though, I hear you say. Cache the rows server-side in memory, return a subset to the user as they page through them. It sounds fine.

True enough. Except they didn’t cache them in memory. Oh no. That would have been too sensible. And might have caused performance problems (I’m sure they thought, if they even considered the matter). No, they cached them elsewhere. Where? In the database.

Yes, they introduced another table; a shadow table; an almost-identical duplicate of the Messages table, called MessageQueryResults. Executing a query then consisted of selecting the required rows and writing them into this table, keyed by the HTTP session ID; and then re-querying this results table to get a page’s worth of results.

So, to recap, then: to improve performance (without first determining that there was actually a problem), they replaced a simple database read with a read, a set of writes, and another set of reads.

That was bound to perform better, right?

The Failure

It wasn’t performance that brought this flimsy edifice crashing down, though.2 No, it actually ran quite successfully for several years. Three things brought about its end: Microsoft, multiplicity, and me.3

Microsoft’s part was through their database platform, SQL Server. Between one version and another they changed something about their storage mechanism, so that you could no longer rely on rows in a table being in the sequence in which they were written to the table. The thing is, you’re not supposed to be able to rely on that, according to DB theory. That was another flaw in the “design” above; it relied on the shadow table’s rows being returned in the same sequence they were written in. On Oracle and DB2 that worked; and it did on SQL Server too, until (if memory serves) the 2005 version. This meant that clients on that platform who had large queries couldn’t rely on them being displayed in the right order.

Oh dear.

Hacks were applied to sort this out. Pun intended: sorting is pretty much what they did. Not a fix, but a workaround at best.

The multiplicity part was that the same mechanism was used to query another table; and then a third. And there was a fourth on the horizon. Each new table meant a new shadow table which had to be maintained in parallel – and whose creation and upgrade scripts had to be maintained across three database platforms. A maintenance nightmare.

Then there was me. I had known about the problem for some time, of course – I had done an estimate for fixing it – but there was never time to fix it. It was a big task, quite intrusive, and showing no easily-provable customer benefit. Yes, I know ease of maintenance, by making life easier for developers, is an implicit customer benefit; but try selling that to management, when there are customers crying out for new features.

But in the project that was to introduce the fourth table (or seventh and eighth, you might say) I was in a position to say, “we fix this first, or it all goes to hell”.

The user story was written. I got my estimate out of hibernation (and increased it, of course). And then I did the fix. It was a great joy. That in-memory caching mechanism I mentioned above? I did that. If I’d been designing it from scratch I would almost certainly have relied on JDBC’s internal caching, at least until it proved problematic. But under the circumstances, when the code relied on there being a cache, it was going to be much less disruptive to retain one. I just replaced the stupid one with a more sensible one.

Inevitably, though, I introduced a new bug.

The New Bug

This is where I stop telling a story and start explaining the problem and solution.

Introducing the new message caching mechanism, which replaced the MessageQueryResults table, inadvertently caused a problem when we set a WAS server to pass-by-reference mode.

This mode is recommended when the different tiers of the application (web, EJB) are running in the same JVM. This is normally the case in our test environments, and frequently the case in client systems. Enabling this mode removes the need for objects to be copied as they are passed through the tiers, and can improve performance dramatically in such environments.

What Went Wrong

Changing the caching mechanism caused no problem as long as pass-by-reference was off. As soon as it was turned on, we noticed that taking certain actions, such as deleting a message, failed.

The failure came at a point in the code where a value such as an amount was being retrieved from the Map that formed the new cache: the retrieved Object was being cast to a Number, but what was in the Map entry was actually a String.

This Map comes, by a fairly complex set of steps, from the new cache, and before that from the database itself, of course.

Now, since the Amount column on the DB is numeric, and the Map in question is populated via Spring from the DB, the value must have started out as a numeric one. This suggested that it had been changed somewhere along the way, and that gave us the first clue to tracking down the cause of the problem, and coming up with a solution.
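Boiled right down, the failure looks like this (the key name and values are invented):

import java.util.HashMap;
import java.util.Map;

// Sketch of the failure: the cache holds a display-formatted String
// where the code expects the original numeric value.
public class CastFailureSketch {

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("amount", "1,234.56"); // edited for display somewhere upstream

        Number amount = (Number) row.get("amount"); // throws ClassCastException
        System.out.println(amount);
    }
}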

The Cause

It seemed likely – and running debug, it was shown to be so – that the numeric value retrieved from the database and stored in the Map was being replaced by the edited value that is built for display. In other words, the Map entry now held a String containing digits, a decimal point, and probably commas.

Why it Changed When We Switched on Pass-By-Reference

When the data was being passed by value, a new Map, complete with its contents, was being passed from the EJB layer to the webapp. The webapp then updated values in that Map, editing them for display purposes. But it was only changing its own copy; it had no effect on the version stored back in the EJB layer. So when the same Map was retrieved again, so that the action could be performed, a new copy was received by the webapp. No problem.

But when pass-by-reference is on, no copy is made. The webapp receives a reference to the actual Map that is stored in the cache back in the EJB layer. So when it updates an entry in that Map, it updates the very object that is stored in the cache (note that the put method of the Map interface will update the stored value if it receives a key that it already holds).

And then when the Map and its entry are retrieved again for the action to be performed, it is the updated (and now wrong) version that is retrieved.
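Stripped of all the EJB machinery, the difference comes down to this (names invented):

import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Sketch: with pass-by-reference there is no copy, so the webapp's
// display edit overwrites the value held in the EJB layer's cache.
public class PassByReferenceSketch {

    public static void main(String[] args) {
        Map<String, Object> cached = new HashMap<String, Object>();
        cached.put("amount", new BigDecimal("1234.56"));

        Map<String, Object> seenByWebapp = cached; // same object, no copy made
        seenByWebapp.put("amount", "1,234.56");    // put() replaces the cached entry

        System.out.println(cached.get("amount"));  // "1,234.56" - the String, not the Number
    }
}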

Why it Changed When We Changed the Caching Mechanism

And yet both possible passing settings were available before we changed the caching mechanism. Why did we not get this problem when using pass-by-reference with the old caching mechanism?

The answer to that is that the old mechanism cached the query results in the database itself, in the MessageQueryResults table. Each time a set of results was requested by the webapp, the EJB layer went back to this temporary table and populated the Map that it returned to the webapp. So the amount value would always have been set up freshly from the numeric Amount column, which ensured that it was an object of type Number.

The Fix

I tried making copies of the Maps and Lists used, at various points in the process, including using ImmutableList and ImmutableMap from Google’s Guava library, in an attempt to prevent the value object of interest from being updated. However, it wasn’t possible to make them immutable deeply enough (and it would probably have caused other problems if it had been). That was largely because the principal Map is created and populated by Spring, so we don’t have much control over it.

One solution – and probably the proper one – would have been to copy the entries from the Map at the point they are read and processed in the webapp. This would have meant that the edited, String, version of amount would be a different, new object, and would not have been updated in the Map that came from the cache.

However, the vast complexity of the class where this would have had to happen made this seem like a very difficult and dangerous approach, especially at this late point in the project.

An alternative solution was suggested by one of my colleagues. It was to accept the fact that the amount value might be a String containing a numeric value with commas and decimal point, and to parse the numeric value out of it.

This allowed us to cater for both numeric and string values, and it worked with either form of passing semantics. But it felt like a hack, and I was sure it would come back to bite us.
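The workaround amounted to something like this (a sketch; the real code used the session’s locale rather than a hard-coded one):

import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Sketch of the workaround: accept either the original Number or the
// display-formatted String, parsing the latter back into a Number.
public class AmountParser {

    static Number toNumber(Object raw) throws ParseException {
        if (raw instanceof Number) {
            return (Number) raw;
        }
        // NumberFormat copes with grouping commas, e.g. "1,234.56"
        return NumberFormat.getNumberInstance(Locale.UK).parse((String) raw);
    }
}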

Fixing the Fix, a Little Later

It did. The trigger this time was paging through the list of results; when you returned to a page you had already seen, you ended up with an object of the wrong kind coming out of the Map. If memory serves it was a String where it should have been a Date.

It was clearly another result of the data being edited for display and updated in place in the Map. There were too many possible places in the relevant method to rely on finding them all, so I returned to the “probably the proper” solution mentioned above. I changed the relevant method so that it now returns a copy of the List containing the required subset of the query results. This is less straightforward than might be hoped, because copying a List (by the clone method of the implementing class, for example) tends to give a “shallow copy”: you get a new List instance, but it contains references to the same objects.

I wrote a method called copyList, which iterates over a List and makes “deep” copies of a few expected types of object. We may have to extend this method to handle other types, but I don’t expect that at the moment.
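A hedged reconstruction of the idea (the handled types are illustrative; the real method deals with whatever our result Maps actually contain):

import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of copyList: returns fresh Map instances, so the webapp's put()
// calls can no longer touch the cached originals, and clones mutable
// values such as Date. Immutable values (String, BigDecimal) are shared.
public class ResultCopier {

    static List<Map<String, Object>> copyList(List<Map<String, Object>> rows) {
        List<Map<String, Object>> copy = new ArrayList<Map<String, Object>>(rows.size());
        for (Map<String, Object> row : rows) {
            Map<String, Object> rowCopy = new HashMap<String, Object>();
            for (Map.Entry<String, Object> e : row.entrySet()) {
                rowCopy.put(e.getKey(), copyValue(e.getValue()));
            }
            copy.add(rowCopy);
        }
        return copy;
    }

    private static Object copyValue(Object v) {
        if (v instanceof Date) {
            return new Date(((Date) v).getTime()); // Date is mutable, so copy it
        }
        return v; // immutable types are safe to share
    }
}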

Also Worth Noting

There is a warning about this on IBM’s Best Practice: Using pass by reference for EJBs and servlets if in same JVM page, but it’s one of those typically contrived examples that probably wouldn’t alert you to the possibility of hitting something like this in practice.


Notes

To set the pass-by-reference mode on or off, take the following steps in the WAS administrative console (this is WAS 6, it’s probably different at other releases).

Go to Servers -> Application servers -> <server-name>

Expand Container Services; click on ORB Service; check/uncheck “Pass by reference”


  1. Not just me, I should note. ↩︎

  2. Or more like folding down, slowly, over years. ↩︎

  3. I love a bit of alliteration, don’t you? ↩︎

Thoughts on Business Sectors

It occurs to me that software companies, like the one I work for, are probably considered part of the ‘service sector’, in the kind of statistics that you hear on the news from time to time. Like most such companies, we do provide services. But at our core, we make and sell things — computer programs. The fact that the things are delivered by FTP rather than DHL does not make them any less things.

In short, we should be considered as part of the ‘manufacturing sector’; or at least as some sort of hybrid. The national statistics are therefore skewed, and the UK probably has a far larger manufacturing sector than we are generally told.

(Incidentally, I seem to have posted a version of this at http://peg.gd/16Y, a site which just lets you post, with no ‘About’ page or any other information. Interesting.)