Why is this GoLang solution faster than the equivalent Java Solution?

The below is a small story about how I took a program with a runtime of hours to minutes, to tens of seconds to seconds. More for my own personal amusement then anything else.

At work there is a tradition of a Friday quiz being posted by the winner of the previous week. I missed out on the most recent one due to having to duck off early to do my tax but the problem was rather an interesting one.

The challange itself is not as simple as you would initally think and taken from a 2015 IBM Ponder This https://www.research.ibm.com/haifa/ponderthis/challenges/May2015.html

Three people are playing the following betting game.
Every five minutes, a turn takes place in which a random player rests and the other two bet
against one another with all of their money.
The player with the smaller amount of money always wins,
doubling his money by taking it from the loser.
For example, if the initial amounts of money are 1, 4, and 6,
then the result of the first turn can be either
2,3,6 (1 wins against 4);
1,8,2 (4 wins against 6); or
2,4,5 (1 wins against 6).
If two players with the same amount of money play against one another,
the game immediately ends for all three players.
Find initial amounts of money for the three players, where none of the three has more than 255,
and in such a way that the game cannot end in less than one hour. (So at least 12 turns)
In the example above (1,4,6), there is no way to end the game in less than 15 minutes.
All numbers must be positive integers.

Only one person managed to find an answer, lets call him Josh (because that’s his name), having spent a few hours writing up a solution using his favourite programming language Go. Come Monday morning I arrived, looked at the quiz I and became intrigued. Could I write a version that would outperform his. After all Go is a pretty performant language, but I suspected that he may have missed some easy optimisations, and if I picked something equally as fast I should be able to at least equal it.

I took a copy of his code https://gist.github.com/walesey/e2427c28a859c4f7bc920c9af2858492 (since modified to be much faster) with a runtime of 1 minute 40 seconds (on my laptop) started work.

Looking at the problem itself we have a few pieces of information we can use. One is that we don’t ever need to calculate the 12th turn. If the game makes the 11th turn and still continues then it made 12 so that saves us some calculations. Another is that if any money amounts are the same we can stop the game instantlty, and more importantly not even add those to the loop.

Given that the existing solution was written in Go it seemed insane to some that I started writing my solution at least initally in Python. This however is not as crazy as it seems. Python being higly malleable allow some rapid iteration, trying out a few things before moving over to another language.

The first thing to note is that you need to generate a tree of every possible combination that a game can take. Then you iterate over each one for the inital starting amounts of money to determine if the game ever ends, and if not mark it as a game that does not finish. To generate the combinations I went with a fairly simple recursive strategy,


# Calculate all the possible sequences for who misses out for each turn
def calc_events(current=[], turn=0):
    if turn == DESIRED_TURNS:
        return [current]

    one = list(current)
    one.append(0)

    two = list(current)
    two.append(1)

    three = list(current)
    three.append(2)

    turn += 1
    path1 = calc_events(current=one, turn=turn)
    path2 = calc_events(current=two, turn=turn)
    path3 = calc_events(current=three, turn=turn)

    return path1 + path2 + path3

The result of running the above is an array of arrays containing each situation where someone sits out a turn. It is important to note that his produces a list that modifies from the back (big-endian so to speak) as this will be a very important consideration later.

The result looks something like this (truncated to just 4 results),


[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

Given the above I coded up a simple solution which simply looped through the combinations. Here it is in pseudocode.


for player1 money:
  for player2 money:
    for player3 money:
      for game in games:
        for turn in game:
          result = play turn using money amounts
        if result:
          print player1, player2, player3

The above is about as bad as it gets for an algorithm as we have 5 nested loops. As expected the runtime performance was horrible. In fact I added a progress bar which simply calculated how far into the first loop the application had reached. Using Python I worked out that it was going to take several hours to get a result using the above. Not good enough.

A few calculations and I realised that I was asking my poor program to calculate something like 50 billion games. Clearly I needed to reduce the number of games and speed up the processing as much as possible. The first change I made was to only calculate the game turn events once and keep the result. I had also written the code to be as readable as possible, which how you should write anything but as this can be an issue also removed the following function by inlining it for improved performance.


def turn(p1, p2):
    if p1 == p2:
        return False, p1, p2

    if p1 > p2:
        p1 = p1 - p2
        p2 = p2 + p2
    else:
        p2 = p2 - p1
        p1 = p1 + p1
    return True, p1, p2

The next thing I did was generate all the permutations of money amounts to play games with into a single list. The last thing I did was switch from using Python to PyPy which with its JIT should speed up the loops considerably. The result of all this can be found here https://gist.github.com/boyter/cab749f4713201f5b409c5b1353fc36c and its runtime using PyPy dropped to ~8 minutes.

8 minutes was about 5 times slower then the GoLang program at this point, which is pretty good for a dynamic language like Python, and consider that my implementation was single threaded where as the Go was using as many cores as it could get. My next step was to implement the same program in a faster language. The only other faster languages I know that I have any experience in are C# and Java. Since I already had Java setup I went with that. I will also admint it was about proving a point. Go may be fast, but at writing Java should be able to equal it for most tasks.

At this point however I mentioned to Josh that his Go program had some inefficiencies. The big one being that he was calculating the 12th game needlessly. I then modified his program and with some other changed reduced the Go program runtime down to ~40 seconds. I was starting to get worried at this point, as I was aiming to beat the Go program’s performance by 5%. No idea why I picked this number but it seemed reasonable if I was smart with the implementation.

At first I ported the Python program in its original readable reusable form to Java and ran it. The runtime was ~7 minutes. I then inlined the same functions and converted it over to use parallel streams. This time the runtime was about 90 seconds. This would have been fast enough had I not mentioned to Josh how he could improve his code. I had shifted the goalposts on myself and had a new target now of ~40 seconds.

After doing a few things such as inlining where possible, changing to enum and some other small tweaks I had a runtime of ~60 seconds. The big breakthrough I had was after looking at the hot function I realised that storing the game events in a List of Arrays meant that we had a loop in a loop. It was however possible to flatten this into a single array of integers and reset the game every 11 turns with a simple if check.

This was the breakthrough I needed. Suddently the runtime dropped from ~60 seconds to about 23 seconds. I happily posted my success in the Slack channel with a link to the code and sat back.

The smile was soon turned to a frown. Josh implemented the same logic into his Go program and it now ran in ~6 seconds. It was suddently 5x times faster. At this point I had a mild panic and started playing with bitwise operations and other micro optimisations before realising that no matter what I did he could simply implement the same change and get the same performance benefit as we were both using roughtly the same algorithm.

Something was clearly wrong. Either Go was suddently 5x faster at a basic for loop and integer math then Java OR I had a bug in my code somewhere which was making it worse. I looked. Josh looked. A few other people looked. Nobody could work out what the issue was. At this point I didn’t care about the runtime, I just wanted to know WHY did it appear to be running slower. I was so desperate for an answer I did what all programmers do at this point and outsourced to the collective brain known as Stack Overflow http://stackoverflow.com/questions/43082115/why-is-this-golang-solution-faster-then-the-equivalent-java-solution

A few un-constructive comments came back such as Java is slow (seriously its not 1995 anymore guys, Java is fast) etc… Thankfully one brilliant person managed to point out what I had missed. It was not the loop itself, but the input. Remember how I said that the generation of the events being big-endian was important? Turns out the Go program had done the reverse and implemented it little-endian.

The thing about the core loop is that it has a bail-out condition. If two players have the same money amount we end the game and don’t process any further. The worst possible situation for the loop is to process almost every condition only to find out just at the end that it ended. Its a lot of processing work. Ideally you want to find the failing conditions as soon as possible. By changing the games from the end I was forcing the Java program to process about 5x times as many combinations as the Go program.

It just happended to be that Josh has picked a more optimal path through the games.

A simple reverse of the games (line 126 of the linked solution) and suddenly the Java program was running in about ~6 seconds and the same time as the Go program. You can view the code where https://gist.github.com/boyter/42df7f203c0932e37980f7974c017ec5

For fun I tried running it on a 16 core VPS and it ran in about ~2 seconds and maxed out all the cores so it seems parallel streams do what you expect them to.

Interestingly while Josh’s starting positions was more optimal then mine, its probably still not the optimal path for this problem. There is bound to be a way to generate the game such that you hit the failing conditions as soon as possible saving needless processing. There is probably a Thesis for a PhD in there somewhere.

I figure this is probably as far as I want to take this. I did play around with bitwise operations and loop un-rolling but the time didn’t change that much.

I certainly had fun with the implementation and working things out. Some may argue that optimising for a micro benchmark such as this is a waste of time, and generally they would be right. That said there are occasions where you really do need to optimise the hell out of something, say a diff algorithm or some such. In any case what developer does not dream of saving the day with some hand unrolled loop optimisation that saves the company millions and brings the developer the praise and respect of their peers!

EDIT – A rather smart person by the name of dietrichepp pointed out that there is a better algorithm. Rather than brute force the states work backwards. You can read their comment on Hacker News and view their code in C++ on Github.

The Worst Individual I Ever Worked With

Taken from a comment I posted on HN in a thread about a Soccer Con Man.

Not actually a programmer. The guy was hired to be a project manager.

After joining things were as expected but after a few weeks we noticed that he was rarely around after lunch and never around after lunch on a Friday.

We would email him at those times deliberately to catch him out and I recall starting to put sticky notes on his laptop “Came to see you a X time”. He would come back and just dump all the notes in the rubbish and claim he never got them. He would often claim to be working from home, despite his laptop being on his desk and usually closed. He would also never responding during those times to email or IM.

A classic seagull manager he would appear when something went wrong, making a lot of noise, writing a lot of emails and then vanishing. He would also be sure to be seen when something was delivered often staying back late on those times.

It got so bad one friend of mine started tracking when he was around and then tracking when one of his relatives died. During his tenure the following incidents occurred,

– Hot water system blew up. 4 times. He had pictures which he would show all the time.

– Uncles, aunts, and various over family members died to the total of 20 individuals.

– Our time tracking him showed him to be in the office less than 15 hours a week on average.

We started to suspect he had a second job and was pulling the same con on them. This was never proved, but we did find someone who had worked with him previously and they reported the same behavior.

The worst thing was it was raised with management at least several dozen times and nothing ever happened. He managed to pull this scam off for 4 years. I could not believe the waste of money this guy was, literally $500,000 burnt on a useless individual.

C# as a Language from old Google+ Post

The more I use C# as a language for writing things the more I am convinced that its approach really is the best language approach out there.

The unit test support is excellent which allows development speed to be just as fast as any dynamic language (Python, PHP, Perl).

The static typing catches so many issues before you get to runtime and allows sweeping changes without breaking things.

Unlike Java it has the var keyword (saves time and improves readability) and so many more useful functions which yes you can replicate but are just built in and work correctly.

Then you get to the really good stuff. LINQ is awesome. The lazy loading allows you to implement a repository pattern over your database which is just awesome. Set up the basic select * from then add extension methods allowing you to chain whatever you need, EG

from person in _dbContext.GetPerson().ByUserName(username).ByPassword(password);

100% elegant, easy to test, easy to write, easy to read and understand and generally works exactly as you would expect without any hidden gotchas. And because its lazy it doesn’t chew resources sucking back everything from the database.

You can use functional programming techniques if you wish, and with the new async decorators you can work in a node.js style if you with, with static typing and all existing library support.

Or you can continue to work in a C like manner, or mix it up with objects, procedural code and functional.

I switched back to Java not that long ago to write a simple server using Jetty and even with things like Guice (best DI implementation I have used so far) and Guava it was still painful. Less painful, but I really felt that the compiler was fighting me from doing things in an elegant manner most of the time. Even adding the “var” keyword would improve Java in a massive way. Add some functional programming in there and I would be pretty happy.

I just wish C# would run on the JVM as I would use it for pretty much everything in a heartbeat. As it is the Mono support is missing the stuff I really want and isn’t as seamless as the experience should be. A pity really as C# really is in my experience the nicest language to work today that’s production ready.

Interesting Code Comment

Found the following comment in some code I had modified a few years ago.

Just to set this up, its an existing application I had no hand in creating, and is a totally atrocity of 180,000 lines of untested code (and pretty much un-testable) which through the abuse of extension methods lives in a single class spread out across multiple files.

/* 
This is evil but necessary. For some reason people have put validation rules here rather then in the bloody ValidationHelper. Thanks to their incompetence or genius... we now have no idea if we add the extra validation in the correct place and call it here if it will work. Since this is also 180,000 lines of non tested nor testable code (without refactoring) I have no confidence in making any changes. Sure we have subversion but that dosnt allow us to code fearlessly ripping apart methods and refactoring since we have no test safety net.

I guess the obligatory car analogy would be driving down the highway, carrying nuclear waste, in an open container, in a snow storm, with acid/lsd/ice fueled drugie ninja bikies attacking you, while on fire, while juggling chainsaws, and all of a sudden you need to change the tyre. So much is going on that its you dont want to risk it and then when forced to do so 
you know its going to end up badly.

If you are still reading this then for the love of all things holy, help by refacting stuff so we can test it properly. The DAO layer should be fairly simple but everthing else is a shambles. 

Rant time over. Lets commit sin by adding more validation.
*/

Not so unique GUID

I have been doing a lot of work with the Sitecore CMS recently. Once of the things you quickly learn is how it relies on GUID’s for pretty much everything. This means of course when you start testing and need to supply GUID’s into your tests that you end up with lots of GUIDs that look like the following sprinkled through your code {11111111-1111-1111-1111-111111111111}

Today I remarked that we should be using things like “deadbeef” for the first part of the GUID with a colleague. He suggested that we should try and actually write something. With a little bit of 1337 speak this is actually possible. Naturally we got back to work, but with a little free time I quickly coded up a simple Python application to generate “phrased” GUID’s. Some examples follow,

silicles-oafs-blob-tael-declassified -> {5111c1e5-0af5-b10b-7ae1-dec1a551f1ed}
deedless-gait-soft-goes-eisteddfodic -> {deed1e55-9a17-50f7-90e5-e157eddf0d1c}
libelist-diel-alls-flit-disaffiliate -> {11be1157-d1e1-a115-f117-d15aff111a7e}
offstage-diel-labs-scat-classifiable -> {0ff57a9e-d1e1-1ab5-5ca7-c1a551f1ab1e}

None of the above are make much sense, but by looking at the outputs you can attempt to write something such as,

 cassette soft gold dice collectibles
{ca55e77e-50f7-901d-d1ce-c011ec71b1e5}

Very zen. Some rough back of napkin calculations gives my program something like 10,000,000,000,000 combinations of GUID’s based on the word list I supplied. I may just turn it into a online GUID generator like this one http://www.guidgenerator.com/

EDIT – You can now get these guids at https://searchcode.com/guid/