programming

    JetBrains Mono: Equal or Not

    I just installed the JetBrains Mono font. We programmers need monospaced fonts, and this is a very nice one. It comes installed with recent versions of JetBrains’s IDEs. My copy of IntelliJ was not recent, it turned out.

    Anyway, the most interesting thing is ligatures for programmers. Take a look at this:

    [Screenshot, 2020-01-19 at 23:25:11]

    You see that “not equals” sign? The crossed-out equals that we were taught to write back in secondary school? That’s not a character in any normal ASCII typeface. Plus, this is Java: even if it were a character (there is a Unicode character for that symbol), it’s not part of the language. The compiler wouldn’t recognise it.

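    What it actually is, is the standard not-equals operator of C-family languages: !=. The font recognises that two-character sequence and draws it as a single ligature glyph, the more attractive and traditional symbol. The source file itself still contains the two characters.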
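    To make the distinction concrete, here is a small Java sketch. The class name is made up for illustration. It shows that what the compiler sees is the two-character sequence `!=`, while the ≠ symbol is a separate Unicode character, U+2260, which Java doesn't treat as an operator.

    ```java
    public class NotEquals {
        public static void main(String[] args) {
            String typed = "!=";        // what the source file actually contains
            String symbol = "\u2260";   // the single Unicode character NOT EQUAL TO

            System.out.println(typed.length());   // 2
            System.out.println(symbol.length());  // 1
            System.out.println(Character.getName(symbol.codePointAt(0)));  // NOT EQUAL TO

            int a = 1, b = 2;
            System.out.println(a != b);  // true: the operator is the two-character sequence
            // Putting the NOT EQUAL TO character itself between a and b would not compile.
        }
    }
    ```

    The ligature only changes how those two characters are drawn on screen. It doesn't change what is saved or compiled.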

    It’s a setting you can disable, and I’m not sure I’ll keep it enabled, but it’s impressive and unusual.

    REPL Reply

    Hjertnes talks about the joy of a REPL:

    A REPL or read eval print loop is what we called an interactive prompt back in the day when I learnt Python and Ruby.

    He goes on to say:

    For a REPL to make sense you need to be able to test small chunks of code. Like this function or this expression; or my typical thing, “would this work” or how the fuck was that syntax again?

    I’ve sometimes found that REPLs have a downside. When you’re looking for code examples in a language that has one, very often the examples show a feature being used in the REPL. That may be fine, but it’s not so helpful if you’re trying to find out how to construct a class or a function.

    To be fair, Hjertnes does address that point:

    In other words, if your language require a lot of “foreplay” to run code, like declaring a namespace and a class etc (I’m looking at you Java and C#) it will probably not be the right thing. But if you can evaluate code without much fuss it is.

    Java was supposed to be getting one soon, I thought. In fact it already has one: JShell arrived in version 9.
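    For comparison, this is roughly the least "foreplay" Java needed before JShell just to check whether one expression works: a class and a `main` method wrapped around a single line. The class name is invented for illustration.

    ```java
    // Everything here except the expression being printed is ceremony.
    public class WouldThisWork {
        public static void main(String[] args) {
            System.out.println(String.join("-", "a", "b", "c")); // prints a-b-c
        }
    }
    ```

    In JShell you could type `String.join("-", "a", "b", "c")` on its own and see the result straight away. Since Java 11 you can also run a single source file directly with `java WouldThisWork.java`, which skips the separate compile step but still needs the class and `main`.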

    They Took Something Very Weird and Made It More Usable

    Good piece by Paul Ford, writing at Bloomberg, on Microsoft buying GitHub:

    [GitHub] has a well-designed web interface. If you don’t think that’s worth $7.5 billion, you’ve never read the git manual.

    He means the man pages, I assume.

    GitHub is “the central repository for decentralized (sic) code archives,” which is mildly amusing. But this:

    In the pre-git era, you updated your software annually and sent customers floppy disks. But if you’re running a big software platform, you might update your servers constantly—many times a day or every 20 minutes.

    is a bit over the top. There were a lot of changes between sending out floppies and continuous deployment.

    I question his (lack of) capitalisation. The command is git, all lower case. But if you’re talking about the application, you should spell it “Git”, with the capital. I think so, anyway. You would write about “CVS”, even though the command was (is) cvs; and “Subversion,” with the command svn. But at least it’s not as annoying as people who write it in all-caps.

    Lastly, when he says, “Computers are mercurial,” I’m assuming he’s wryly referencing what was once Git’s major rival in the distributed version-control space. Nicely deadpan, if so.

    Tab Convert

    That’s convert, with the stress on the first syllable. The noun, in other words. As in, “I am a tab convert.” A convert, that is, to using tabs for indentation of source code, instead of spaces.

    A Background of Spaces

    Ever since I first learned about the tabs vs spaces debate, I’ve been a spaces guy. This is at least partly because of the influence of my then-colleague Benjamin Geer. He has gone on to other, no doubt better, things, but he was probably the best programmer I’ve ever worked with. He introduced me to the idea that you should always use four spaces for indentation. His reasoning was that if you use tabs, people can have their editor’s tab size set to all sorts of different values, and source files end up not looking as you expect them to.

    Whereas spaces are spaces: you can’t go wrong with a space (or four).

    I’ve changed, though. I have become a convert, in my job, and maybe philosophically, to tabs.

    Stack Overflow Survey

    About a year ago there was a survey of developers on Stack Overflow. Among many questions, they asked about whether people used spaces or tabs. The detail that got most attention was that developers who use spaces were paid more on average than those who use tabs. I strongly suspect that correlation is not causation in this case, but it seemed noteworthy at the time.

    More interesting to me was the fact that more people used tabs, at 42.9% against 37.8%. I was surprised: I thought spaces had won years ago. Though I had often wondered (sometimes publicly, and I’m surprised to see that was only last year) why the default setting for Eclipse was tabs.

    Maybe that default, and others like it, is part of the reason for the statistics. Most people don’t change defaults. On the other hand, surely developers are the kind of people who are most likely to change defaults?

    Anyway, after the survey came out there were various posts about it, notably from John Gruber, who said he was “a devout user of tabs”. OK, he’s not a developer these days, but others who are said similar things. The one that struck me, which I can’t now locate, said “tabs are semantic.” In other words, pressing the tab key means “indent here.” Four spaces means… four spaces? Could be an indentation, could be something else.

    Everything Changes Imperially

    So I was primed for the idea of switching to tabs, even though I still used spaces in my own projects. And then I started my new job at Imperial College. When I first started looking at the code, I quickly realised that it was indented with tabs throughout. I checked with my co-worker who is the main contributor. He didn’t mind, but they had always used tabs.

    Obviously I didn’t want to introduce a mixture. That’s what really messes up the display of code in different editors. You have to be consistent within a project. So if I were to change the project to spaces I would have to change every file. That was an unnecessary step; and per the above, I was primed to use tabs. They’re semantic, after all.
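    Here's a rough Java sketch of why a mixture misbehaves. It assumes that tabs are expanded to the next tab stop, which is what most editors do, though the details vary. It renders a tab-indented line next to a space-indented one at several tab widths. It also includes a hypothetical `indentationKind` helper of the sort a project-wide consistency check might use.

    ```java
    public class TabRendering {

        // Expand each tab to the next multiple of tabWidth (assumed editor behaviour).
        static String expandTabs(String line, int tabWidth) {
            StringBuilder out = new StringBuilder();
            for (char c : line.toCharArray()) {
                if (c == '\t') {
                    out.append(" ".repeat(tabWidth - (out.length() % tabWidth)));
                } else {
                    out.append(c);
                }
            }
            return out.toString();
        }

        // Classify a line's leading whitespace as "tabs", "spaces", "mixed" or "none".
        static String indentationKind(String line) {
            boolean tab = false, space = false;
            for (char c : line.toCharArray()) {
                if (c == '\t') tab = true;
                else if (c == ' ') space = true;
                else break;
            }
            if (tab && space) return "mixed";
            if (tab) return "tabs";
            if (space) return "spaces";
            return "none";
        }

        public static void main(String[] args) {
            String tabbed = "\treturn x;";     // indented by one author with a tab
            String spaced = "    return y;";   // indented by another with four spaces
            for (int width : new int[] {2, 4, 8}) {
                System.out.println("tab width " + width + ":");
                System.out.println("|" + expandTabs(tabbed, width));
                System.out.println("|" + expandTabs(spaced, width));
            }
        }
    }
    ```

    The two lines only line up when the reader's tab width happens to be 4. Consistency within a file, whichever way you go, avoids that.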

    I switched my IDE to indent using tabs, with the tab-stop value set to 4. And so we proceed, tabbing away merrily.

    So far I prefer it this way.

    Imperial Adventures

    Just over a month ago I posted a brief note about job news, saying that more details would be forthcoming. I was, as I said then, just waiting for some paperwork.

    It took longer than I expected to get that paperwork sorted out, but I received and returned the contract yesterday afternoon. On Monday I start work at the Small Area Health Statistics Unit (SAHSU), part of the School of Epidemiology and Biostatistics in the Faculty of Medicine at Imperial College.

    That’s quite a mouthful, but in short I’ll be working on programming something called the Rapid Inquiry Facility (RIF), which is an open-source tool for studying health statistics.

    I’m neither a medical researcher nor a statistician, but I am a programmer (or a software engineer, if you want to be fancy). Our job is to understand the needs of someone — usually referred to as “the business,” but I’m guessing that will be different in my new job — and translate those needs into actions in software. That basic definition doesn’t change according to the problem domain. Whether it’s sending payments from one bank to another, checking a person’s right to work on a government database, or doing something with statistical data about health issues, the programmer’s job is to understand what the user needs and make things happen on a screen.

    The big difference for me, I think, will be that in this new role I’ll have the chance to contribute to doing something good in the world. As I said at my interview, I’ve mainly worked in financial software, and while, sure, people need banks, it wasn’t the most socially useful thing. The last half-year working at the Home Office had some value, but I was a tiny cog in a huge machine.

    At Imperial I’ll be able to feel that I’m actually contributing something useful to society, as well as doing what should be really interesting work.

    Oh, and: I’ll be back in Paddington, which I know from my Misys days, and it’s a much shorter commute than to Croydon.

    The Kickstarter Corporate Communication Conundrum

    Today I chanced to see an email in which a manager was asking his staff to work for extra hours. Well, ‘asking’ is putting it generously, to be honest. There didn’t seem to be much that was optional about it.

    The Kickstarter connection, though: you may be familiar with the idea of ‘stretch goals.’ If not, the idea is that the basic target is to make X amount of money, but if we make X + 10%, or whatever, we’ll be able to do these other things. Develop additional features, make the item in more colours, or whatever. My guess is that the term originally comes from sports.

    So this email included in its subject the phrase ‘stretch targets.’ Meaning: we want you to do more this week (or month, or whatever) than we originally planned. It was clearly written by someone who thinks that the way to develop software faster is to work your staff to the bone, when in fact that’s much more likely to result in people taking shortcuts and making mistakes.

    In this team they’re already working weekends, and now they’re being ‘stretched’ even more. It bodes ill. But perhaps co-opting the language of positive things for something so negative is worse.

    Some Open-Source Software for Your Delectation

    I have made a thing, and pushed it out into the world. Well, really, this is me pushing it out into the world, because nobody will have noticed it before now, and with this, there’s a chance they might.

    A couple of months ago Manton Reece and Brent Simmons announced the existence of JSON Feed, a new syndication format to sit alongside RSS and Atom, but using JavaScript Object Notation (JSON) instead of XML.

    They invited people to write parsers and formatters and so on for it, and I quickly realised that no-one had yet written one in Java. As far as I can tell that is still the case. Or at least, if they have, they haven’t made it public yet.

    No-one, that is, but me, as I have written just such a thing: a JSON Feed parsing library, written in Java. I’m calling it Pertwee. That’s the product page at my company site (more on which later). It’s open-source, and can be found on GitHub.
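    Pertwee's own API isn't shown here. To give a rough idea of what a JSON Feed library has to model, here's a hypothetical Java sketch (Java 16+, for records) of a cut-down feed and item. It checks a few of the fields that the JSON Feed 1.0 spec makes required: `version`, `title` and `items` at the top level, plus an `id` and either `content_html` or `content_text` on each item. Turning JSON text into these objects is left to whichever JSON library you prefer.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class JsonFeedSketch {

        // Hypothetical, cut-down model: the spec defines many more optional fields.
        record Item(String id, String url, String contentHtml, String contentText) {}

        record Feed(String version, String title, String homePageUrl, List<Item> items) {}

        // Return human-readable problems; an empty list means these checks passed.
        static List<String> validate(Feed feed) {
            List<String> problems = new ArrayList<>();
            if (feed.version() == null
                    || !feed.version().startsWith("https://jsonfeed.org/version/")) {
                problems.add("missing or unrecognised version");
            }
            if (feed.title() == null) {
                problems.add("missing title");
            }
            if (feed.items() == null) {
                problems.add("missing items");
                return problems;
            }
            for (int i = 0; i < feed.items().size(); i++) {
                Item item = feed.items().get(i);
                if (item.id() == null) {
                    problems.add("item " + i + ": missing id");
                }
                if (item.contentHtml() == null && item.contentText() == null) {
                    problems.add("item " + i + ": needs content_html or content_text");
                }
            }
            return problems;
        }

        public static void main(String[] args) {
            Feed feed = new Feed(
                    "https://jsonfeed.org/version/1",
                    "Example feed",
                    "https://example.org/",
                    List.of(new Item("1", "https://example.org/1", "<p>Hello</p>", null),
                            new Item("2", null, null, null)));
            System.out.println(validate(feed));
            // prints [item 1: needs content_html or content_text]
        }
    }
    ```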

    As software projects go, it’s not that exciting. But it is the first open-source project that I’ve released. I hope someone might find some use for it.

    Wondering why people recruiting for senior development positions often ask low-level JVM type questions. Doesn’t hurt to know that stuff, but who keeps it at their fingertips?

    Swim, Test, Shop, Film, Sleep

    Yesterday I kind of wilfully skipped a day. At some point in the evening I realised I wasn’t going to write a post, so I just said, “Fine: that’s allowed.”

    Today I started by going for a swim. After my new regime of exercise last summer, I got out of the habit once I started a new contract. So it was good to get back to it. (Which is not to say I haven’t swum or gone to the gym in all that time, but it had been a few weeks.)

    After that I took a HackerRank test for a new job opportunity. It’s a site that does programming tests. This one was, I suspect, a disaster. I hate doing that kind of thing: you’ve got a timer running, and the problem you’re trying to solve is unlike anything you’d have to do professionally… Anyway, suffice to say, it didn’t go terribly well.

    This evening was all about falling asleep in front of the telly. We tried to watch 20,000 Days On Earth, the film about Nick Cave from a few years back. I got it a few Christmases or birthdays ago, but hadn’t got round to watching it till now. I enjoyed what I saw of it, but there was definite falling asleep on the sofa and missing chunks. Oh well, it’s a DVD: we can always go back.

    Oh yes: there was also a trip to Westfield, the time-void where hours go to die.

    Some More Bitface Thoughts

    Something I forgot to mention yesterday was that I thought the “bitface” term was useful not just to refer to people who manipulate bits for a living (or as a hobby) — programmers, like myself. It can also work to discuss anyone who makes digital content: websites, blogs, podcasts, videos, photos, and so on.

    We’re all moving bits around. We’re all labourers at the bitface.

    Some Thoughts On Software Development

    Before the job interview that I mentioned the other day, the company asked me to answer some questions in writing. I didn’t get the job, but I was pleased with my written answers (and they presumably helped me to get the interview, at least). So I thought I’d reuse them as a blog post. None of this should be surprising for anyone who knows anything about the software development field, but it’s interesting to reflect on how things have changed across my career.

    What are some of the fundamental changes in your approach to software development you have adopted in the last few years?

    There are two main changes that are fundamental and independent of languages and deployment environments: agile techniques and test-driven development (TDD).

    Agile

    Moving from waterfall to agile development was probably the most significant change to development practices in the industry. We always knew that breaking work down into smaller units led to better estimating, more modular code, and just better software. The genius of agile was to extend that understanding to the period of time spent on a block of work. A two-week sprint, with its work being specifically estimated, planned and developed, is just infinitely more manageable than a project phase lasting months.

    Add to that:

    • self-organising teams which include someone from the customer or end user — or at least someone whose role is to represent the user;
    • accepting that change will happen, and embracing it;
    • and the discipline of saying that some features won’t be developed;

    and we have a recipe for success.

    TDD

    Good developers always understood that testing was essential, and did it. But they used to follow a written test plan, or just have an idea of what needed to be tested and work to that. Testing was manual, hard to repeat, and error-prone.

    TDD brought automation. So instead of writing a document listing the required tests, we can write code. That inherently makes the tests rerunnable, so regressions get caught before they become a problem.

    But almost more important than that is the idea of writing the tests first. In an ideal world you write a comprehensive set of tests, write functional code until all the tests pass, and you’re done. It may not always work out exactly like that — in particular, adding tests to a mature codebase can be problematic — but writing tests first encourages us to write code that is easy to test, which tends to lead to better-designed, more modular code.
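    Here's a minimal sketch of the test-first order in Java. `slugify`, which turns a post title into a URL slug, is a made-up example. Plain checks stand in for JUnit so that the sketch runs without dependencies. The tests pin down the expected behaviour before any implementation exists.

    ```java
    import java.util.Locale;

    public class TestFirstSketch {

        // Step 1: the tests, written before slugify() exists. In a real project these
        // would be JUnit test methods; plain checks keep the sketch dependency-free.
        static void runTests() {
            check(Slugs.slugify("Tab Convert").equals("tab-convert"), "spaces become hyphens");
            check(Slugs.slugify("  Java isn't slow ").equals("java-isnt-slow"),
                    "trims and drops punctuation");
            check(Slugs.slugify("").equals(""), "empty stays empty");
            System.out.println("all tests passed");
        }

        static void check(boolean condition, String description) {
            if (!condition) {
                throw new AssertionError("failed: " + description);
            }
        }

        // Step 2: just enough code to make those tests pass.
        static class Slugs {
            static String slugify(String title) {
                return title.trim()
                        .toLowerCase(Locale.ROOT)
                        .replaceAll("[^a-z0-9\\s-]", "")   // drop punctuation
                        .replaceAll("\\s+", "-");          // each run of whitespace becomes one hyphen
            }
        }

        public static void main(String[] args) {
            runTests();
        }
    }
    ```

    Because `slugify` is a small, pure function with no hidden dependencies, it's trivially testable. That is the design pressure that writing the tests first tends to apply.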

    An added bonus is that the tests can help to document the code, by showing our expectations. And of course they make refactoring easy and safe, as long as they are in place before you start.

    If you were to start your last project over again, what would you do differently?

    The project I’m thinking of involved rewriting the product’s GUI into a modern, responsive, browser-independent form, using HTML 5 and Twitter Bootstrap.

    The existing version was an old frames-based web app that only worked fully in Internet Explorer, and had to be tweaked when each new version of that browser came out. We had long wanted to modernise it, but there were always other demands on development time.

    Eventually I got a chance to try a proof of concept for the change. The application uses JSPs and Struts action classes, and the brief was to continue using these as much as possible. I decided to start with one of the main display pages, the one that users spend most of their time in. The idea was to give a quick demonstration of what was possible; and it did, to a point. But what I hadn’t realised was that frames (the <frameset> and <frame> elements) are not part of HTML 5. There are ways to keep using them, but it’s not easy, and not good practice.

    So while the new look and feel of a single page was clear, it was far from clear how the various pages would interact, how they would be brought together to form the whole UI, without frames.

    If I were to start the project again now, my first step would be to work out how to link the pages together into a single interface, in the absence of frames. Most likely I would use one or other of the forms of JSP include: the static <%@ include %> directive or the dynamic <jsp:include> action.

    However, if there was the budget to do a more complete rewrite — by which I mean one that did not necessarily seek to use the existing JSPs — I would probably make much greater use of JavaScript and Ajax, and use the action classes just to provide data to the Ajax calls.

    What is your approach to testing, and how would you test your application?

    I would use a mixture of automated unit testing using JUnit, automated GUI testing, and actual user testing, if at all possible.

    This fits well with what I was saying above. There are, broadly, three levels of testing: unit, integration, and system. That said, writing automated unit tests is a development activity rather than a testing one. Certainly we wouldn’t expect dedicated QA testers to work at the unit-test level.

    So let’s assume that we have satisfactory unit-test coverage and we are interested in testing the application as a whole. Automation is obviously key here, as well, both because it allows us to easily repeat the tests regularly — for every checkin, in an ideal world (and see below); and because it removes the need for testers to manually step through a written script, which is boring and error-prone.

    I have used Selenium for automated GUI testing, with some success. It takes a significant amount of development work, because it’s doing a significant thing, but the effort should pay off.

    However, even after all that, there is still no alternative to having someone sit down and actually use the application. Automated testing might pick up outright errors in how the user interaction works. But it won’t catch fine details like misaligned elements, typos in onscreen text, or just generally how it feels to use the application.

    What are the benefits of Continuous Integration?

    Continuous Integration takes us beyond the traditional daily build. It does more than just building, and does it more frequently than just daily.

    At the simplest level it ensures that, for every commit, an incremental build of the complete product is made, and all the unit tests are run. In the most advanced case, as well as building and testing, the product can be deployed to test servers and integration tests such as the automated GUI tests mentioned above can be run. Realistically those tend to take longer, so it’s unlikely that you would do them for every commit, but they can certainly be run multiple times daily.

    So we get the following benefits:

    • frequent builds catch problems in code integration;
    • unit tests are run frequently, catching any regressions;
    • integration tests are run regularly, catching other problems;
    • general confidence in the product is increased;
    • developers are happy to commit changes frequently.

    Java isn't slow

    So if your Java code is doing something easier than processing 6 million events a second, and it’s slow, you can maybe make it faster!

    Source: Java isn’t slow

    Great piece by Julia Evans on some really fast Java applications. Notably LMAX.

    Eclipse SVN key bindings not working

    I often get problems with the key bindings when I create a new Eclipse workspace. The recent ones with Subversion seemed intractable until I found this answer on the mighty Stack Overflow.

    It’s a frustrating thing when your muscle-memory has an action and it doesn’t trigger the expected response.

    keyboard shortcuts – SVN key bindings not working in Eclipse – Stack Overflow.

    Link: The One Correct Way to do Dependency Injection | Schauderhaft

    In the end, “Dependency Injection” just means “passing parameters”, which was always the right way to do things anyway. (From my Pinboard.)
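    Here's a minimal Java sketch of that idea, with hypothetical names. The service receives its collaborator as an ordinary constructor parameter, so it needs no container or annotations, and substituting a fake in a test is just a matter of passing a different argument.

    ```java
    import java.util.List;

    public class PassingParameters {

        // Hypothetical collaborator.
        interface MessageStore {
            List<String> findAll();
        }

        // The dependency is just a constructor parameter: no container, no annotations.
        static class MessageService {
            private final MessageStore store;

            MessageService(MessageStore store) {
                this.store = store;
            }

            int countMessages() {
                return store.findAll().size();
            }
        }

        public static void main(String[] args) {
            // "Injecting" a fake for a test is simply passing a different argument.
            MessageStore fake = () -> List.of("first", "second");
            MessageService service = new MessageService(fake);
            System.out.println(service.countMessages()); // prints 2
        }
    }
    ```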

    Tip: using Pandoc to create truly standalone HTML files

    If you’re using the excellent Pandoc to convert between different document formats, and you:

    • want your final output to be in HTML;
    • want the HTML to be styled with CSS;
    • and want the HTML document to be truly standalone;

    then read on.

    The most common approach with Pandoc is, I think, to write in Markdown and then convert that to RTF, PDF or HTML. There are all sorts of more advanced options too; but here we are only concerned with HTML.

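    The pandoc command has an option which allows you to style the resulting HTML with CSS. Example 3 in the User’s Guide shows how you do this, with the -c option. The example also uses the -s option, which means that we are creating a standalone HTML document, as distinct from a fragment that is to be embedded in another document. (It also uses -S, for smart punctuation. Pandoc 2.0 removed that option in favour of the smart extension, which is on by default for Markdown input, so leave -S out on a current version.) The full command is: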

    pandoc -s -S --toc -c pandoc.css -A footer.html README -o example3.html
    

    If you inspect the generated HTML file after running this, you will see it contains a line like this:

    <link rel="stylesheet" href="pandoc.css" type="text/css">
    

    That links to the CSS stylesheet, keeping the formatting information separate from the content. Very good practice if you’re publishing a document on the web.

    But what about that “standalone” idea that you expressed with the -s option? What that does is make sure that the HTML is a complete document, beginning with a DOCTYPE declaration, an <html> tag, and so on. But if, for example, you have to email the document you just created, or upload it to your company’s document store, then things fall apart. When your reader opens it, they’ll see what you wrote, all right; but it won’t be styled the way you wanted it. That’s because the pandoc.css file with the styling is back on your machine, in the same directory as the original Markdown file.

    What you really want is to use embedded CSS; you want the content of pandoc.css to be included along with the prose you wrote in your HTML file.

    Luckily HTML supports that, and Pandoc provides a way to make it all happen: the -H option, or using its long form, --include-in-header=FILE.

    First you’ll have to make sure that your pandoc.css file¹ starts and ends with HTML <style> tags, so it should look something like this:

    <style type="text/css">
    body {
        margin: auto;
        padding-right: 1em;
        padding-left: 1em;
        max-width: 44em;
        border-left: 1px solid black;
        border-right: 1px solid black;
        font-family: Verdana, sans-serif;
        font-size: 100%;
        line-height: 140%;
        color: #333;
    }
    </style>
    

    Then run the pandoc command like this:

    pandoc -s -S --toc -H pandoc.css -A footer.html README -o example3.html
    

    and you’re done. A fully standalone HTML document.
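    If you'd rather keep your stylesheet as plain CSS, which editors and the -c option both expect, a tiny helper can generate the wrapped version for -H. This is a hypothetical Java sketch (Java 11+ for `Files.readString`/`writeString`). It isn't part of Pandoc, and the file names are up to you.

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class WrapCss {

        // Wrap plain CSS in a <style> element so it can be passed to pandoc's -H option.
        static String wrapInStyle(String css) {
            String body = css.endsWith("\n") ? css : css + "\n";
            return "<style type=\"text/css\">\n" + body + "</style>\n";
        }

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("usage: java WrapCss.java <input.css> <output.html>");
                System.exit(1);
            }
            String css = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
            Files.writeString(Path.of(args[1]), wrapInStyle(css), StandardCharsets.UTF_8);
        }
    }
    ```

    You would then pass the generated file to -H in place of the hand-wrapped pandoc.css.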


    1. It doesn’t have to be called that, by the way.

    Bash - how to recursively find the latest modified file in a directory

    Recursively finding the latest modified file in a directory.

    From the mighty Stack Overflow, some useful tips on using find with dates.
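    The Stack Overflow answers use find. For comparison, here's a rough Java equivalent (Java 11+) that walks a directory tree and picks the regular file with the newest last-modified time. The class name is invented for illustration.

    ```java
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.FileTime;
    import java.util.Comparator;
    import java.util.Optional;
    import java.util.stream.Stream;

    public class LatestModified {

        // Recursively find the most recently modified regular file under root.
        static Optional<Path> latestModified(Path root) throws IOException {
            try (Stream<Path> paths = Files.walk(root)) {
                return paths
                        .filter(Files::isRegularFile)
                        .max(Comparator.comparing(LatestModified::modifiedTime));
            }
        }

        // Files.getLastModifiedTime throws a checked exception, so wrap it for use in a lambda.
        private static FileTime modifiedTime(Path p) {
            try {
                return Files.getLastModifiedTime(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) throws IOException {
            Path root = Path.of(args.length > 0 ? args[0] : ".");
            latestModified(root).ifPresentOrElse(
                    System.out::println,
                    () -> System.out.println("no files found"));
        }
    }
    ```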

    Pass-By-Reference Problem When Using WebSphere Application Server

    This has been kicking around, nearly finished, for months. It's not going to get any better, or shorter, so it's long past time I put it out there.

    It’s also just long; and technical. So feel free to ignore. I won’t be offended.

    I rarely write about programming or other technical issues here, but I probably should do so more often. Certainly in a case like this.

    I often think about the many, many problems that I’ve had help with from strangers on the internet; people who have taken the time to write blog posts, answers to questions on forums, or technology tutorials. My job would barely be possible at times without the web. Of course, we didn’t have it back when I started in 1987; but we didn’t do such complex things, with so many different languages and technologies.

    Anyway, all these kind strangers have helped me, and I rarely find myself in a position to give anything back to the community. So since I recently hit a problem that no-one else seems to have had, it’s really my duty to describe it, and my solution, in the hope that it might be of use to someone down the line.

    If you’re looking for the solution to the problem with pass-by-reference on WAS, and don’t want to read the story of how I got there, you can jump straight to The New Bug.

    Background

    We develop our main app on a fairly standard n-tier JEE architecture: a web front end built with JSPs and Struts; EJBs; and a multiplicity of database platforms accessed through Spring. All ordinary stuff, whose purpose is to move financial messages around.

    A lot of this was originally developed when I wasn’t around (I was seconded to another department) and by contractors and others who are no longer with us. So I take no responsibility for the stupidities that exist in the codebase. Or rather, I accept no blame. I do, in fact, have responsibility for it; for keeping it going and developing it onwards now.¹

    One of the bad choices made by the original developers of this version of the product was to cache the results of database queries. The users can define various criteria by which they want to select a set of messages to view; those get translated into SQL, which our Java code executes using JDBC. Again, all standard stuff. JDBC was designed for exactly that kind of thing. Databases exist solely to do that kind of thing.

    So the wise and sensible developers decided that performance would be a problem if a query returned many rows from the database. They decided that transferring the rows to the browser and allowing the user to scroll through them would be impossible. So they designed a caching mechanism.

    Thing is, JDBC has that kind of caching built right in. And furthermore, they (our developers) included a limit: a maximum number of rows to return, which could be set to 50, 100, 500, or 1000. Pretty reasonable, since any query that returned over 200 or so rows is likely to be less than useful, anyway.

    But they still built that caching mechanism.

    The Mechanism

    That’s all right, though, I hear you say. Cache the rows server-side in memory, return a subset to the user as they page through them. It sounds fine.

    True enough. Except they didn’t cache them in memory. Oh no. That would have been too sensible. And might have caused performance problems (I’m sure they thought, if they even considered the matter). No, they cached them elsewhere. Where? In the database.

    Yes, they introduced another table; a shadow table; an almost-identical duplicate of the Messages table, called MessageQueryResults. Executing a query then consisted of selecting the required rows and writing them into this table, keyed by the HTTP session ID; and then re-querying this results table to get a page’s worth of results.

    So, to recap, then: to improve performance (without first determining that there was actually a problem), they replaced a simple database read with a read, a set of writes, and another set of reads.

    That was bound to perform better, right?

    The Failure

    It wasn’t performance that brought this flimsy edifice crashing down, though.² No, it actually ran quite successfully for several years. Three things brought about its end: Microsoft, multiplicity, and me.³

    Microsoft’s part was through their database platform, SQL Server. Between one version and another they changed something about their storage mechanism, so that you could no longer rely on rows in a table being returned in the sequence in which they were written. The thing is, you’re not supposed to be able to rely on that, according to DB theory. That was another flaw in the “design” above; it relied on the shadow table’s rows being returned in the same sequence they were written in. On Oracle and DB2 that worked; and it did on SQL Server too, until (if memory serves) the 2005 version. This meant that clients on that platform who had large queries couldn’t rely on them being displayed in the right order.

    Oh dear.

    Hacks were applied to sort this out. Pun intended: sorting is pretty much what they did. Not a fix, but a workaround at best.

    The multiplicity part was that the same mechanism was used to query another table; and then a third. And there was a fourth on the horizon. Each new table meant a new shadow table which had to be maintained in parallel – and whose creation and upgrade scripts had to be maintained across three database platforms. A maintenance nightmare.

    Then there was me. I had known about the problem for some time, of course – I had done an estimate for fixing it – but there was never time to fix it. It was a big task, quite intrusive, and showing no easily-provable customer benefit. Yes, I know ease of maintenance, by making life easier for developers, is an implicit customer benefit; but try selling that to management, when there are customers crying out for new features.

    But in the project that was to introduce the fourth table (or seventh and eighth, you might say) I was in a position to say, “we fix this first, or it all goes to hell”.

    The user story was written. I got my estimate out of hibernation (and increased it, of course). And then I did the fix. It was a great joy. That in-memory caching mechanism I mentioned above? I did that. If I’d been designing it from scratch I would almost certainly have relied on JDBC’s internal caching, at least until it proved problematic. But under the circumstances, when the code relied on there being a cache, it was going to be much less disruptive to retain one. I just replaced the stupid one with a more sensible one.
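    The post doesn't show the replacement cache, so this is only a hypothetical Java sketch of the general idea: query results held in memory, keyed by HTTP session ID (as the shadow table was), and served a page at a time. The class and method names are invented. A real version would also need eviction when sessions expire.

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class QueryResultCache {

        // One cached result list per HTTP session ID.
        private final Map<String, List<Map<String, Object>>> resultsBySession =
                new ConcurrentHashMap<>();

        // Store the rows from a fresh query, replacing any earlier results for the session.
        public void store(String sessionId, List<Map<String, Object>> rows) {
            resultsBySession.put(sessionId, new ArrayList<>(rows));
        }

        // Return one page (zero-based); empty if out of range or nothing is cached.
        // The list is new, but the row Maps inside it are the cached objects, not copies.
        public List<Map<String, Object>> getPage(String sessionId, int page, int pageSize) {
            List<Map<String, Object>> rows =
                    resultsBySession.getOrDefault(sessionId, Collections.emptyList());
            if (page < 0 || pageSize <= 0 || page * pageSize >= rows.size()) {
                return Collections.emptyList();
            }
            int from = page * pageSize;
            int to = Math.min(from + pageSize, rows.size());
            return new ArrayList<>(rows.subList(from, to));
        }

        // Drop a session's results, e.g. when the session ends.
        public void evict(String sessionId) {
            resultsBySession.remove(sessionId);
        }

        public static void main(String[] args) {
            QueryResultCache cache = new QueryResultCache();
            List<Map<String, Object>> rows = new ArrayList<>();
            for (int i = 1; i <= 5; i++) {
                rows.add(Map.of("id", i));
            }
            cache.store("session-1", rows);
            System.out.println(cache.getPage("session-1", 0, 2)); // [{id=1}, {id=2}]
            System.out.println(cache.getPage("session-1", 2, 2)); // [{id=5}]
        }
    }
    ```

    Unlike the shadow table, the order of the rows is simply the order of the list, so it can't drift between database versions.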

    Inevitably, though, I introduced a new bug.

    The New Bug

    This is where I stop telling a story and start explaining the problem and solution.

    Introducing the new message caching mechanism, which replaced the MessageQueryResults table, inadvertently caused a problem when we set a WAS server to pass-by-reference mode.

    This mode is recommended when the different tiers of the application (web, EJB) are running in the same JVM. This is normally the case in our test environments, and frequently the case in client systems. Enabling this mode removes the need for objects to be copied as they are passed through the tiers, and can improve performance dramatically in such environments.

    What Went Wrong

    Changing the caching mechanism caused no problem as long as pass-by-reference was off. As soon as it was turned on, we noticed that taking certain actions, such as deleting a message, failed.

    The failure was at a point in the code where a value such as an amount was being retrieved from the Map that formed the new cache. The retrieved Object was being cast to a Number, but what was in the Map entry was actually a String.

    This Map comes, by a fairly complex set of steps, from the new cache, and before that from the database itself, of course.

    Now, since the Amount column on the DB is numeric, and the Map in question is originally populated via Spring from the DB, obviously the value was a numeric one originally. This suggested that the value must have been changed, and that gave us the first clue to tracking down the cause of the problem, and coming up with a solution.

    The Cause

    It seemed likely, and running in the debugger confirmed it, that the numeric value retrieved from the database and stored in the Map was being replaced by the edited value built for display. In other words, the Map entry now held a String containing digits, a decimal point, and probably commas.

    Why it Changed When We Switched on Pass-By-Reference

    When the data was being passed by value, a new Map, complete with its contents, was being passed from the EJB layer to the webapp. The webapp then updated values in that Map, editing them for display purposes. But it was only changing its own copy; it had no effect on the version stored back in the EJB layer. So when the same Map was retrieved again, so that the action could be performed, a new copy was received by the webapp. No problem.

    But when pass-by-reference is on, no copy is made. The webapp receives a reference to the actual Map that is stored in the cache back in the EJB layer. So when it updates an entry in that Map, it updates the very object that is stored in the cache (note that the put method of the Map interface will update the stored value if it receives a key that it already holds).

    And then when the Map and its entry are retrieved again for the action to be performed, it is the updated (and now wrong) version that is retrieved.
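    A minimal, self-contained sketch may make the difference between the two modes concrete. It uses no WAS or EJBs at all: the hypothetical ResultCache class stands in for the EJB-layer cache, getByValue simulates pass-by-value by handing out a copy of the Map, and getByReference simulates pass-by-reference by handing out the cached Map itself. (The real container copies the whole object graph when passing by value; a shallow HashMap copy is enough here because the display edit replaces the entry rather than changing the BigDecimal inside it.) The names editForDisplay and amountForAction, and the "amount" key, are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PassByReferenceDemo {

    /** Hypothetical stand-in for the EJB-layer cache: one row Map per query id. */
    static class ResultCache {
        private final Map<String, Map<String, Object>> store = new HashMap<>();

        void put(String queryId, Map<String, Object> row) {
            store.put(queryId, row);
        }

        /** Simulates pass-by-value: the caller receives its own copy of the Map. */
        Map<String, Object> getByValue(String queryId) {
            return new HashMap<>(store.get(queryId));
        }

        /** Simulates pass-by-reference: the caller receives the cached Map itself. */
        Map<String, Object> getByReference(String queryId) {
            return store.get(queryId);
        }
    }

    /** The webapp edits the amount for display, replacing the entry with a String. */
    static void editForDisplay(Map<String, Object> row) {
        BigDecimal amount = (BigDecimal) row.get("amount");
        row.put("amount", String.format(Locale.UK, "%,.2f", amount));
    }

    /** A later action casts the amount back to Number, as the failing code did. */
    static Number amountForAction(Map<String, Object> row) {
        return (Number) row.get("amount"); // ClassCastException if the entry is now a String
    }

    public static void main(String[] args) {
        ResultCache cache = new ResultCache();
        Map<String, Object> row = new HashMap<>();
        row.put("amount", new BigDecimal("1234.50"));
        cache.put("q1", row);

        // Pass-by-value: the webapp edits its own copy, so the cached entry survives.
        editForDisplay(cache.getByValue("q1"));
        System.out.println("Action OK, amount = " + amountForAction(cache.getByValue("q1")));

        // Pass-by-reference: the webapp edits the cached Map itself.
        editForDisplay(cache.getByReference("q1"));
        try {
            amountForAction(cache.getByReference("q1"));
        } catch (ClassCastException e) {
            System.out.println("Action failed, cached amount is now \""
                    + cache.getByReference("q1").get("amount") + "\"");
        }
    }
}
```

    Run it and the first action succeeds, while the second fails with a ClassCastException, because the display edit has overwritten the cached BigDecimal with the String "1,234.50".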

    Why it Changed When We Changed the Caching Mechanism

    And yet both possible passing settings were available before we changed the caching mechanism. Why did we not get this problem when using pass-by-reference with the old caching mechanism?

    The answer to that is that the old mechanism cached the query results in the database itself, in the MessageQueryResults table. Each time a set of results was requested by the webapp, the EJB layer went back to this temporary table and populated the Map that it returned to the webapp. So the amount value would always have been set up freshly from the numeric Amount column, which ensured that it was an object of type Number.

    The Fix

    I tried making copies of the Maps and Lists used, at various points in the process, including using ImmutableLists and ImmutableMaps from Google’s Guava library, in an attempt to prevent the value object of interest from being updated. However, it wasn’t possible to make them immutable deeply enough (and would probably have caused other problems if it had been). That was largely because the principal Map is created and populated by Spring, so we don’t have much control over it.
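    The reason wrapping the collections didn't help is that such wrappers are shallow. The sketch below uses the JDK's List.copyOf (Java 10 or later) as a stand-in for Guava's ImmutableList.copyOf, which is shallow in the same way: the List can't be changed, but the Map objects inside it are the original, mutable ones. The method name freezeOuterOnly is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShallowImmutabilityDemo {

    /** Returns an unmodifiable List whose elements are the SAME Map objects as the input's. */
    static List<Map<String, Object>> freezeOuterOnly(List<Map<String, Object>> rows) {
        return List.copyOf(rows); // like Guava's ImmutableList.copyOf: copies references only
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("amount", 42);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);

        List<Map<String, Object>> frozen = freezeOuterOnly(rows);

        try {
            frozen.add(new HashMap<>()); // the List itself really is unmodifiable
        } catch (UnsupportedOperationException e) {
            System.out.println("Cannot add to the frozen List");
        }

        // ...but the Map inside it is the original, mutable one.
        frozen.get(0).put("amount", "42.00");
        System.out.println(row.get("amount")); // prints 42.00: the original row has changed
    }
}
```

    Wrapping each Map as well (with Map.copyOf, or Guava's ImmutableMap.copyOf) would stop the put, but only by throwing UnsupportedOperationException in the webapp code that does the editing, which simply moves the failure somewhere else.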

    One solution – and probably the proper one – would have been to copy the entries from the Map at the point they are read and processed in the webapp. This would have meant that the edited, String, version of amount would be a different, new object, and would not have been updated in the Map that came from the cache.
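    A sketch of that approach, assuming a cached row is a Map<String, Object> with a BigDecimal under a hypothetical "amount" key: the webapp builds its own display Map and only ever writes to that. The method name toDisplayRow is illustrative. A shallow copy is enough for this particular edit, because put replaces the entry in the copy without touching the value object it shares with the cache.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class DisplayCopy {

    /**
     * Builds a display-only Map from a cached row. The cached Map is never
     * written to, so this is safe under pass-by-reference as well as pass-by-value.
     */
    static Map<String, Object> toDisplayRow(Map<String, Object> cachedRow) {
        Map<String, Object> display = new HashMap<>(cachedRow); // new Map, same value objects
        Object amount = display.get("amount");
        if (amount instanceof BigDecimal) {
            // Replacing the entry in the copy leaves the cached Map alone.
            display.put("amount", String.format(Locale.UK, "%,.2f", amount));
        }
        return display;
    }

    public static void main(String[] args) {
        Map<String, Object> cached = new HashMap<>();
        cached.put("amount", new BigDecimal("1234.50"));

        Map<String, Object> display = toDisplayRow(cached);
        System.out.println(display.get("amount")); // 1,234.50
        System.out.println(cached.get("amount"));  // 1234.50 (still a BigDecimal)
    }
}
```

    The shallow copy would not be enough if the webapp changed a value object in place (calling setTime on a shared Date, say); that needs a deeper copy.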

    However, the vast complexity of the class where this would have had to happen made this seem like a very difficult and dangerous approach, especially at this late point in the project.

    An alternative solution was suggested by one of my colleagues. It was to accept the fact that the amount value might be a String containing a numeric value with commas and decimal point, and to parse the numeric value out of it.
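    The workaround might look something like this sketch. The method name toAmount is hypothetical, and it assumes the display format uses commas for grouping and a dot for the decimal point; a locale with different separators, or a currency symbol in the edited value, would break it.

```java
import java.math.BigDecimal;

public class AmountParser {

    /**
     * Accepts either the original numeric value or a display-edited String
     * such as "1,234.50", and returns a BigDecimal in both cases.
     * Assumes comma grouping separators and a dot as the decimal point.
     */
    static BigDecimal toAmount(Object value) {
        if (value instanceof BigDecimal) {
            return (BigDecimal) value;
        }
        if (value instanceof Number) {
            // Go via the decimal string form to avoid binary floating-point surprises.
            return new BigDecimal(value.toString());
        }
        if (value instanceof String) {
            // Strip grouping commas and stray spaces before parsing.
            return new BigDecimal(((String) value).replace(",", "").trim());
        }
        throw new IllegalArgumentException("Unexpected amount type: "
                + (value == null ? "null" : value.getClass().getName()));
    }

    public static void main(String[] args) {
        System.out.println(toAmount(new BigDecimal("1234.50"))); // numeric, as stored
        System.out.println(toAmount("1,234.50"));                // String, as edited for display
    }
}
```

    It copes with both passing modes, but only by tolerating the corrupted cache entry rather than preventing it, which is why it felt like a hack.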

    This allowed us to cater for both numeric and string values, and it worked with either form of passing semantics. But it felt like a hack, and I was sure it would come back to bite us.

    Fixing the Fix, a Little Later

    It did. The trigger this time was paging through the list of results; when you returned to a page you had already seen, you ended up with an object of the wrong kind coming out of the Map. If memory serves it was a String where it should have been a Date.

    It was clearly another result of the data being edited for display and updated in place in the Map. There are too many possible places in the relevant method to rely on finding them all, so I returned to the “probably the proper” solution mentioned above. I changed the relevant method so that it now returns a copy of the List containing the required subset of the query results. This is less straightforward than might be hoped, because copying a List (with the clone method of the implementing class, for example) usually produces a “shallow copy”: you get a new List instance, but it contains references to the same objects.

    I wrote a method called copyList, which iterates over a List and makes “deep” copies of a few expected types of object. We may have to extend this method to handle other types, but I don’t expect that at the moment.
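    The real copyList isn't shown here, so what follows is only a guess at its shape. It assumes the results are a List of Maps whose values are Strings, Numbers, Dates, or nested Lists and Maps; copyMap and copyValue are hypothetical helpers. Mutable types it knows about are copied, immutable ones are shared, and anything else is shared as-is, which is where the method would need extending. One caveat, assuming the rows come from Spring's JdbcTemplate: those rows may be case-insensitive Maps, and copying them into a LinkedHashMap, as this sketch does, would lose that behaviour.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeepCopy {

    /** Returns a new List whose elements are deep-copied where their type is known. */
    static List<Object> copyList(List<?> source) {
        List<Object> copy = new ArrayList<>(source.size());
        for (Object element : source) {
            copy.add(copyValue(element));
        }
        return copy;
    }

    /** Returns a new Map (insertion order kept) with each value copied by copyValue. */
    static Map<Object, Object> copyMap(Map<?, ?> source) {
        Map<Object, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            // Keys are assumed immutable (e.g. String column names), so they are shared.
            copy.put(entry.getKey(), copyValue(entry.getValue()));
        }
        return copy;
    }

    /** Copies the mutable types we expect; immutable and unrecognised types are shared. */
    static Object copyValue(Object value) {
        if (value instanceof Map) {
            return copyMap((Map<?, ?>) value);
        }
        if (value instanceof List) {
            return copyList((List<?>) value);
        }
        if (value instanceof Date) {
            // Date is mutable; clone() also keeps subclasses such as java.sql.Timestamp.
            return ((Date) value).clone();
        }
        // String, BigDecimal, Integer, Boolean etc. are immutable and safe to share.
        // Anything else is shared too: extend this method if a new mutable type appears.
        return value;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("amount", new BigDecimal("10.00"));
        row.put("created", new Date(0L));
        List<Map<String, Object>> results = new ArrayList<>();
        results.add(row);

        List<Map<String, Object>> shallow = new ArrayList<>(results); // new List, same Map objects
        List<Object> deep = copyList(results);                         // new List, new Maps, new Dates

        // Simulate the webapp editing the deep copy for display.
        ((Map<Object, Object>) deep.get(0)).put("created", "01/01/1970");

        System.out.println(shallow.get(0) == row);              // true: the shallow copy shares the row
        System.out.println(row.get("created") instanceof Date); // true: the original row is untouched
    }
}
```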

    Also Worth Noting

    There is a warning about this on IBM’s “Best Practice: Using pass by reference for EJBs and servlets if in same JVM” page, but it uses the kind of contrived example that probably wouldn’t alert you to a problem like the one I ran into.


    Notes

    To turn pass-by-reference mode on or off, take the following steps in the WAS administrative console (this is for WAS 6; it’s probably different in other releases).

    Go to Servers -> Application servers -> <server-name>

    Expand Container Services; click on ORB Service; check/uncheck “Pass by reference”


    1. Not just me, I should note. ↩︎

    2. Or more folding down, slowly, over years. ↩︎

    3. I love a bit of alliteration, don’t you? ↩︎

    Thoughts on Business Sectors

    It occurs to me that software companies, like the one I work for, are probably considered part of the ‘service sector’ in the kind of statistics that you hear on the news from time to time. Like most such companies, we do provide services. But at our core, we make and sell things: computer programs. The fact that the things are delivered by FTP rather than DHL does not make them any less things.

    In short, we should be considered as part of the ‘manufacturing sector’; or at least as some sort of hybrid. The national statistics are therefore skewed, and the UK probably has a far larger manufacturing sector than we are generally told.

    (Incidentally, I seem to have posted a version of this at http://peg.gd/16Y, a site which just lets you post, with no ‘About’ page or any other information. Interesting.)