Another day another interview…

Another day, another interview. I have actually been getting some good results from them so far, in particular the last two I have been on, which I will discuss briefly.

The first had an interesting coding test. Rather than asking me to solve FizzBuzz or implement a depth-first traversal over a binary tree (seriously, I have been programming for 10 years and never needed to do that; I can, but it's something I did in uni and not really applicable to anything I have done since), the test was to implement a simple REST service.

You created your service, hosted it online (Heroku was suggested as it's free), passed the URL into a form and submitted it, and the form then hit your service checking for error codes and correct responses to given inputs. Since you could implement it in any language you wanted, I went with Python/Django and produced the following code.

def parse_json(self, data):
	filtered = self.filter_drm(data['payload'])
	filtered = self.filter_episode_count(filtered)

	return self.format_return(filtered)

def filter_drm(self, data):
	if data is None or data == []:
		return []

	result = [x for x in data if 'drm' in x and x['drm'] == True]
	return result

def filter_episode_count(self, data, count=0):
	if data is None or data == []:
		return []

	result = [x for x in data if 'episodeCount' in x and x['episodeCount'] > count]
	return result

def format_return(self, data):
	if data is None or data == []:
		return {"response": []}

	result = [{	"image": x['image']['showImage'], 
				"slug": x['slug'],
				"title": x['title']} for x in data 
				if 'image' in x and 'slug' in x and 'title' in x]
	return {"response": result}

Essentially it's the code from the model I created. It takes in some JSON data, filters it by the DRM and episode count fields, then returns a subset of the data. The corresponding view is very simple, with just some JSON parsing (with error checks) before calling the above code. I did throw in quite a few unit tests though to ensure it was all working correctly.
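The tests themselves were nothing fancy. I no longer have the exact file, but they looked roughly like the following (the filters are repeated here as standalone functions so the example runs on its own; in the real submission they were methods on the model):

```python
import unittest

# Standalone copies of the filters above, so this example is self-contained.
def filter_drm(data):
    if data is None or data == []:
        return []
    return [x for x in data if 'drm' in x and x['drm'] == True]

def filter_episode_count(data, count=0):
    if data is None or data == []:
        return []
    return [x for x in data if 'episodeCount' in x and x['episodeCount'] > count]

class FilterTests(unittest.TestCase):
    def test_filter_drm_keeps_only_drm_shows(self):
        shows = [{'drm': True}, {'drm': False}, {}]
        self.assertEqual(filter_drm(shows), [{'drm': True}])

    def test_filter_drm_handles_missing_payload(self):
        self.assertEqual(filter_drm(None), [])
        self.assertEqual(filter_drm([]), [])

    def test_filter_episode_count_drops_shows_without_episodes(self):
        shows = [{'episodeCount': 3}, {'episodeCount': 0}, {}]
        self.assertEqual(filter_episode_count(shows), [{'episodeCount': 3}])
```

Run with `python -m unittest` from the project directory.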

Thankfully, after writing the logic and some basic testing (curl to fake requests) it all looked OK to me. I uploaded it to Heroku (I had never used it before, and that took most of the time) and submitted the form. Everything worked correctly on the first go, passing all of the listed requirements, which made me rather happy.

As for the second interview, it raised a good question which highlighted that while I know how to write a closure and a lambda, I could not actually say what they are. It also highlighted that I really need to get better at JavaScript, since while I am pretty comfortable with it on the front end, for back-end work such as node.js I am an absolute novice.

For the first, I was right that a lambda is just an anonymous function. As for the second, a closure is a function which closes over its environment, allowing it to access variables that are not in its argument list. An example would be,

def function1(h):
    def function2():
        return h
    return function2  # return the function itself, not the result of calling it

closed = function1(42)
closed()  # still returns 42, even though function1 has already returned

In the above, function2 closes over function1's environment, allowing it to access function1's variables such as h.

The other thing that threw me was implementing a SQL-like join in a nice way. The thing is, I have been spoilt by C#, which makes this very simple using LINQ. You literally join the two lists the same way SQL would and it just works. Not only that, the implementation is really easy to read.

I came up with the following, which is ugly for two reasons,

1. it's not very functional
2. it has very bad O(N^2) runtime performance.

var csv1 = [
    {'name': 'one'},
    {'name': 'two'}
];

var csv2 = [
    {'name': 'one', 'address': '123 test street'},
    {'name': 'one', 'address': '456 other road'},
    {'name': 'two', 'address': '987 fake street'}
];

function joinem(csv1, csv2) {
    var ret = [];
    $.each(csv1, function(index, value) {
        $.each(csv2, function(index2, value2) {
            // keep the pairs whose names match
            if (value.name === value2.name) {
                ret.push({'name': value.name, 'address': value2.address});
            }
        });
    });
    return ret;
}

var res1 = joinem(csv1, csv2);

Assuming I get some more time later I want to come back to this. I am certain there is a nice way to do this in Javascript using underscore.js or something similar which is just as expressive as the LINQ version.
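I have not gone back to the JavaScript yet, but for the record the quadratic runtime is fixable with a hash join: index one list by the join key, then probe that index. Here is a sketch of the idea in Python (the function name and shape are mine, not from the interview):

```python
from collections import defaultdict

def hash_join(left, right, key):
    # Build an index over the right-hand list: O(M).
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    # Probe the index once per left-hand row, so the whole join
    # is O(N + M) instead of O(N * M).
    return [dict(l, **r) for l in left for r in index[l[key]]]

csv1 = [{'name': 'one'}, {'name': 'two'}]
csv2 = [
    {'name': 'one', 'address': '123 test street'},
    {'name': 'one', 'address': '456 other road'},
    {'name': 'two', 'address': '987 fake street'},
]

joined = hash_join(csv1, csv2, 'name')
```

In JavaScript, underscore.js's `_.groupBy` would do the indexing step for you.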

searchcode screenshot

Since I have been working on searchcode for a while and it's getting close to being ready for release (a few weeks away at this point, I predict) I thought I would post a teaser screenshot.

The below shows how it looks for a sample search. The design is far cleaner than what is currently online, which is a big win as the current design of searchcode is seriously ugly.

searchcode teaser


I still have quite a way to go before this is ready to be released, but it is getting closer. Will continue to post updates as I get closer to the release date along with how I migrated from an old PHP codebase to a Python one.

Sample Coding Test

Being in the job market again I have been doing quite a few coding tests. Since I have already put the effort into one test without any result, I thought I would post it here.

The test involved producing output from a supplied CSV input file which contained insurance claims; something about taking the input and using it to predict future claims. Please forgive my explanation, as I am not a financial expert. Anyway, the idea was to take an input such as the following,

One, 1992, 1992, 110.0
One, 1992, 1993, 170.0
One, 1993, 1993, 200.0
Two, 1990, 1990, 45.2
Two, 1990, 1991, 64.8
Two, 1990, 1993, 37.0
Two, 1991, 1991, 50.0
Two, 1991, 1992, 75.0
Two, 1991, 1993, 25.0
Two, 1992, 1992, 55.0
Two, 1992, 1993, 85.0
Two, 1993, 1993, 100.0

into the following,

1990, 4
One, 0, 0, 0, 0, 0, 0, 0, 110, 280, 200
Two, 45.2, 110, 110, 147, 50, 125, 150, 55, 140, 100

The test was mostly about proving that you can write maintainable code which is unit testable and the like. Anyway, here is my solution. It takes in a list of objects which represent the four columns of the input.

The feedback I received was that the coverage I achieved was high (I had a collection of tests over the methods), and that the code was clean and well documented.

public class TriangleCSVLine
{
    public string product { get; set; }
    public int originYear { get; set; }
    public int developmentYear { get; set; }
    public double incrementalValue { get; set; }
}

public List<string> TranslateToOutput(List<TriangleCSVLine> parsedCsv)
{
    var output = new List<string>();

    // Sanity checks...
    if (parsedCsv == null || parsedCsv.Count == 0)
    {
        return output;
    }

    // Used to determine where we are looking
    var totalYears = parsedCsv.Select(x => x.developmentYear).Distinct().OrderBy(x => x);
    var minYear = totalYears.Min();
    var maxYear = totalYears.Max();

    foreach (var product in parsedCsv.Select(x => x.product).Distinct())
    {
        // All of the product's values and the years it has
        var productValues = parsedCsv.Where(x => product.Equals(x.product));
        var originYears = Enumerable.Range(minYear, (maxYear - minYear) + 1);

        var values = new List<double>();

        foreach (var year in originYears)
        {
            // For each of the development years for this "period"
            var developmentYears = parsedCsv.Where(x => x.originYear == year)
                                            .Select(x => x.developmentYear).Distinct();

            // If we have no development years that means we have an origin year
            // without a year 1 development year, so we have no idea how many
            // values of zero should be in the file; bail out.
            // This should probably go into a pre-validation step.
            if (!developmentYears.Any())
            {
                throw new MissingOriginDevelopmentTrangleCSVException(
                    string.Format("Missing development years for origin {0} in product {1}", year, product));
            }

            // The values are running values, so we keep the total
            // and increment it as we go
            double runningTotal = 0;
            foreach (var rangeYear in Enumerable.Range(developmentYears.Min(), (developmentYears.Max() - developmentYears.Min()) + 1))
            {
                var value1 = productValues.SingleOrDefault(x => x.originYear == year && x.developmentYear == rangeYear);
                if (value1 != null)
                {
                    runningTotal += value1.incrementalValue;
                }
                values.Add(runningTotal);
            }
        }

        output.Add(string.Format("{0}, {1}", product, string.Join(", ", values)));
    }

    return output;
}

private string GenerateHeader(List<TriangleCSVLine> parsedCsv)
{
    // Get distinct list of all the years
    var years = parsedCsv.Select(x => x.developmentYear).Distinct();

    // 1990-1990 counts as 1 year so add one
    var developmentYears = (years.Max() - years.Min()) + 1;
    var header = string.Join(", ", years.Min(), developmentYears);

    return header;
}
Bitcoin Clones use Same Network?

Another comment I posted over on the TechZing podcast. It was addressing Justin’s comment about bitcoin clones using the same “network”, which is true only in the sense that they share the same protocol; each has its own blockchain.

Each of the “bitcoin” clones is actually its own network. As far as I am aware there is no communication between the networks in any form, which is also why each one’s blockchain is so different in size. The difference between bitcoin and litecoin (and litecoin’s clones, such as dogecoin) is the proof-of-work algorithm they use to verify transactions. Bitcoin uses SHA256 (hence all the ASIC devices you are seeing) whereas litecoin uses scrypt, which is more ASIC resistant (although ASICs are starting to come out for it as well).

Most of the coins fall into those two groups, either SHA256 or scrypt. Two coins that I know of that are slightly different are Primecoin and Vertcoin. Primecoin calculates primes as its proof-of-work algorithm, so its output is vaguely useful to anyone studying prime numbers. It’s also the only coin that I am aware of that can only be mined by CPU, which makes it popular to run on botnets and spot instances in the cloud, as you don’t need a GPU. Vertcoin, by contrast, uses a modified version of scrypt which is supposed to be very resistant to ASIC mining, presumably by using even more memory than scrypt does.
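As a toy illustration of hash-based proof of work: miners brute-force a nonce until the hash of the block data meets a difficulty target. The sketch below uses a single SHA-256 over a string, whereas real Bitcoin double-SHA256s an 80-byte block header against a 256-bit target, so treat this strictly as a cartoon:

```python
import hashlib

def mine(block_data, difficulty):
    """Brute force a nonce until sha256(block_data + nonce) starts with
    `difficulty` leading zero hex digits (a toy difficulty target)."""
    target = '0' * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(('%s%d' % (block_data, nonce)).encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1
```

Finding the nonce takes on average 16^difficulty attempts, while verifying it is a single hash; that asymmetry is the entire trick.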

I think both of you would be wise to actually have a look at dogecoin. The community has gotten more traction in two months than litecoin has managed and is catching up to bitcoin at a staggering rate. Once you get past the meme (which makes it easier to get into, I guess?) there is a lot to like and it’s certainly gaining a lot of adoption. Lastly, it’s about to have its first block reward halving soon, so now is probably a good chance to pick some up before the price doubles again.

It sounds crazy, but the price is going nuts right now. It’s now the coin with the 3rd highest market cap and the reward is going to drop in 3 days, so expect it to go up again.

–in-on-dogecoin/

I highly suggest reading the above. I don’t agree with it all but mostly it seems right to me. Dogecoin has the potential to be the new litecoin and possibly the new bitcoin. Especially with all of the activity taking place.

Be sure to have a look at as well. The community is VERY active, enthusiastic and generous. They are spending the coins, making doge more of a currency and less of a value store.

Python pep8 git commit check

Without adding a git commit hook I wanted to be able to check if my Python code conformed to pep8 standards before committing anything. Since I found the command reasonably useful I thought I would post it here.

git status -s -u | grep '\.py$' | awk '{split($0,a," "); print a[2]}' | xargs pep8

Just run the above in your project’s directory. It’s fairly simple but quite effective at ensuring your Python code becomes cleaner every time you commit to the repository. The nice thing about it is that it only checks files you have modified, allowing you to slowly clean up existing code bases.
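The same check can also be scripted in Python if you want to build on it. This is just a sketch using `git status --porcelain` (the stable, script-friendly form of the `-s` output); the function names are mine:

```python
import subprocess

def modified_python_files(porcelain_output):
    """Extract the .py paths from `git status --porcelain` output.
    Each line is two status characters, a space, then the path."""
    paths = []
    for line in porcelain_output.splitlines():
        if len(line) > 3 and line[3:].endswith('.py'):
            paths.append(line[3:])
    return paths

def check_modified():
    """Run pep8 over every modified/untracked .py file; call from the repo root."""
    status = subprocess.check_output(
        ['git', 'status', '--porcelain', '-u']).decode('utf-8')
    for path in modified_python_files(status):
        subprocess.call(['pep8', path])
```

Note this, like the one-liner, will trip over paths containing spaces or renames; fine for a quick check, not a polished tool.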

Regarding the Zombie Apocalypse

This piece of content is taken from a comment I left on the TechZing podcast blog. I should note I have not even begun to explore issues such as what happens to a zombie in extreme heat or cold. Of course much of the below can be disregarded if the zombie virus is airborne, but this assumes the standard zombie canon of being spread through bites.

My take on the zombie apocalypse was always that it could never happen. The reasons being,

1. The zombie’s primary enemy is also its main food source. This is like having to tackle a lion every time you feel like eating a sandwich. You are going to get mauled.

2. The zombie’s only method of reproducing is biting its primary enemy. Again, every time you feel randy, go tackle a lion which intends to maul you. Keep in mind that in order to be effective each zombie needs to bite at least 2 humans, which leads us nicely to…

3. Humans are bloody good at killing things. This includes a great number of creatures which have far more effective killing implements than we were given by nature (lions, tigers, bears, oh my!). I don’t know about you, but I am pretty sure I could take out 20 zombies in a car without too many issues. Quite a few people have cars; certainly more than 1 in 20 people in a first world country do. Even if they only take out 2 zombies each we are ahead.

Add in all the gun nuts looking for something to shoot, people with medieval suits of armor (bite that zombie!), wannabe ninjas with swords, kung-fu experts, bomb nuts and the fact that a tank or even lightly armored vehicle is totally impervious to a zombie and I can’t see them lasting too long. Heck a mob armed with rocks only has to take out one zombie each to be effective as each zombie still needs to bite two before being stoned to death. You can see the numbers are clearly on our side. Even armed with sticks I can see humans winning this one.

Anyway that’s my thinking on this. What I find more scary is that there are people prepared for the zombie apocalypse and even worse is that quite a few of them would be hoping it will occur.

New searchcode Logo

Just a quick post to show off the new logo. I have been working on a new version of the site for a few weeks now and want to get something out there for people to look at. Design-wise it’s not done yet, but the logo is.

Searchcode Logo

There it is in all its glory. The new design has a similar look and I should be able to start talking about it and showing it off soon. Things I can mention are,

  • Moving away from PHP to Python (Django) for the web server.
  • Twitter Bootstrap used for the design.
  • Large amount of tests which should hopefully fix some known issues.
  • A full writeup of the conversion to follow!

Lastly I am looking at creating a self hosted version of searchcode for people to download and use themselves. If you are interested in running a version let me know via a comment or better yet email me.

The worst program I ever worked on

The worst program I ever worked on was something I was asked to maintain once. It consisted of two parts. The first was a web application written in ASP. The second was essentially Microsoft Reporting Services reimplemented in 80,000 lines of VB.NET.

The first thing I did was throw it into VS2010 and run some code metrics over it. The results: 10 or so methods had 2000+ lines of code; the maintainability index was 0 (a number between 0 and 100, where 0 is unmaintainable); and the worst function had a cyclomatic complexity of 2700 (the worst I had seen on a function before that was around 750). It was full of nested inline dynamic SQL, all of which referred to tables with 100+ columns that had helpful names like sdf_324. There were about 5000 stored procedures, most of which were 90% similar to others with a similar naming scheme. There were no foreign key constraints in the database, and every query, including updates, inserts and deletes, used NOLOCK (so no data integrity). It all lived in a single 80,000 line file which crashed VS every time you tried to make a simple edit.

I essentially told my boss I would quit over it, as there was no way I could support it without other aspects of my work suffering. Thankfully it was put in the too-hard basket and nobody else had to endure my pain. I ended up peer reviewing changes made to it some time later, and a single column update touched in the order of 500 lines of code.

There was one interesting thing I found in it, however: there was so much repeated/nested if code in the methods that if you held down page down it looked like the page was moving the other way, similar to how a spinning wheel on TV can look like it’s turning backwards.

Why you should never ask permission to clean up code

This is something that took me two years or so to learn. One day I realised nobody was really looking at my timecards in depth, so I started allocating extra time to tasks and using it to fix the things I thought needed fixing. Once I started delivering on this I showed my manager, who agreed that it was a good use of time. I was given free rein to fix anything I felt would add the most value, provided the bug fixes continued to be delivered without any major compromise.

Since that time I have refactored quite a few code bases: added unit tests, fixed some build processes, improved performance, and generally felt happier at work for getting things done that are important to me.

Don’t get stuck in constant bug fix mode. If you can’t get approval to fix things then change jobs, because bug fix after bug fix is depressing and will bring you down.

Why is storing, tracking and managing billions of tiny files directly on a file system a nightmare?

It’s a real pain when you want to inspect, delete or copy them.

Try taking 300,000 files and copying them somewhere. Then copy one file which has the size of the 300,000 combined. The single file is MUCH faster (it’s also why we usually do a tar operation before copying stuff, even if it’s already compressed). Any database that’s not a toy will usually lay the 300,000 records out in a single file (depending on settings, sizes and filesystem limits).

The 300,000 files end up sitting all over the drive and disk seeks kill you at run-time. This may not be true for an SSD, but I don’t have any evidence to suggest it one way or the other.

Even if the physical storage is fine with this, I suspect you may run into filesystem issues when you lay out millions, if not hundreds of millions, of files across a directory structure and then hit it hard.

I have played with 1,000,000 files before when experimenting with crawling/indexing and it becomes a real management pain. It may seem cleaner to lay each record out as a single file, but in the long run, if you hit a large size, the benefits aren’t worth it.
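To make the single-file point concrete, here is (very roughly) what I mean by packing records into one file with an index, which is loosely what a database’s storage engine does for you. A sketch only, not production code:

```python
import os
import tempfile

def pack(records, path):
    """Write (key, bytes) records back to back in one file and return a
    {key: (offset, length)} index for getting them back out."""
    index = {}
    with open(path, 'wb') as f:
        for key, blob in records:
            index[key] = (f.tell(), len(blob))
            f.write(blob)
    return index

def fetch(path, index, key):
    """Read one record back with a single seek instead of a directory walk."""
    offset, length = index[key]
    with open(path, 'rb') as f:
        f.seek(offset)
        return f.read(length)
```

One file, one set of filesystem metadata, sequential writes; the index would itself need persisting somewhere, which is exactly the bookkeeping a real database does for you.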