Sample Coding Test

Being in the job market again, I have been doing quite a few coding tests. Since I had already put in the effort on one test without any result, I thought I would post it here.

The test involved producing output from a supplied CSV input file which contained insurance claims; something about taking the input and using it to predict future claims. Please forgive my explanation, as I am not a financial expert. Anyway, the idea was to take an input such as the following,

Header
One, 1992, 1992, 110.0
One, 1992, 1993, 170.0
One, 1993, 1993, 200.0
Two, 1990, 1990, 45.2
Two, 1990, 1991, 64.8
Two, 1990, 1993, 37.0
Two, 1991, 1991, 50.0
Two, 1991, 1992, 75.0
Two, 1991, 1993, 25.0
Two, 1992, 1992, 55.0
Two, 1992, 1993, 85.0
Two, 1993, 1993, 100.0

into the following,

1990, 4
One, 0, 0, 0, 0, 0, 0, 0, 110, 280, 200
Two, 45.2, 110, 110, 147, 50, 125, 150, 55, 140, 100

The test was mostly about proving that you can write maintainable code which is unit testable and the like. Anyway, here is my solution. It takes in a list of objects, each of which represents the four columns of one line of the input.

The feedback I received was that the coverage I achieved was high (I had a collection of tests over the methods), and that the code was clean and well documented.

using System;
using System.Collections.Generic;
using System.Linq;

public class TriangleCSVLine
{
    public string product { get; set; }
    public int originYear { get; set; }
    public int developmentYear { get; set; }
    public double incrementalValue { get; set; }
}

public List<string> TranslateToOutput(List<TriangleCSVLine> parsedCsv)
{
    var output = new List<string>();

    // Sanity checks...
    if (parsedCsv == null || parsedCsv.Count == 0)
    {
        return output;
    }
    output.Add(GenerateHeader(parsedCsv));

    // Used to determine the year range covered by the triangle
    var allYears = parsedCsv.Select(x => x.developmentYear).Distinct();
    var minYear = allYears.Min();
    var maxYear = allYears.Max();

    foreach (var product in parsedCsv.Select(x => x.product).Distinct())
    {
        // All of this product's values, and the full range of origin years
        var productValues = parsedCsv.Where(x => product.Equals(x.product));
        var originYears = Enumerable.Range(minYear, (maxYear - minYear) + 1);

        var values = new List<double>();

        foreach (var year in originYears)
        {
            // For each of the development years for this origin year
            // (looked up across all products so every triangle keeps the same shape)
            var developmentYears = parsedCsv.Where(x => x.originYear == year)
                                            .Select(x => x.developmentYear)
                                            .Distinct();

            // If we have no development years it means we have an origin
            // year without any development years at all. We then have no
            // idea how many zero values should go into the output, so
            // bail out (this should probably be a pre-validation step).
            if (!developmentYears.Any())
            {
                throw new MissingOriginDevelopmentTriangleCSVException(
                    string.Format("Missing development years for origin {0} in product {1}", year, product)
                );
            }

            // The values are running values...
            // so we keep the total and increment it as we go
            double runningTotal = 0;
            var devMin = developmentYears.Min();
            var devMax = developmentYears.Max();
            foreach (var rangeYear in Enumerable.Range(devMin, (devMax - devMin) + 1))
            {
                var cell = productValues.SingleOrDefault(x => x.originYear == year && x.developmentYear == rangeYear);
                if (cell != null)
                {
                    runningTotal += cell.incrementalValue;
                }
                values.Add(runningTotal);
            }
        }
        output.Add(string.Format("{0}, {1}", product, string.Join(", ", values)));
    }

    return output;
}

private string GenerateHeader(List<TriangleCSVLine> parsedCsv)
{
    // Get distinct list of all the years
    var years = parsedCsv.Select(x => x.developmentYear).Distinct();

    // 1990-1990 counts as 1 year so add one
    var developmentYears = (years.Max() - years.Min()) + 1; 
    var header = string.Join(", ", years.Min(), developmentYears);

    return header;
}
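
For completeness, here is a rough sketch of how the solution might be driven end to end. This was not part of my submission: the ParseLine helper, the Translator wrapper class and the input.csv filename are all illustrative assumptions.

using System;
using System.IO;
using System.Linq;

public static class Program
{
    // Illustrative only: parse one data row of the sample CSV
    private static TriangleCSVLine ParseLine(string line)
    {
        var parts = line.Split(',');
        return new TriangleCSVLine
        {
            product = parts[0].Trim(),
            originYear = int.Parse(parts[1]),
            developmentYear = int.Parse(parts[2]),
            incrementalValue = double.Parse(parts[3])
        };
    }

    public static void Main()
    {
        // Skip(1) drops the header line, as in the sample input
        var parsedCsv = File.ReadAllLines("input.csv")
                            .Skip(1)
                            .Select(ParseLine)
                            .ToList();

        // Translator is a stand-in name for whatever class holds
        // TranslateToOutput in the real solution
        foreach (var line in new Translator().TranslateToOutput(parsedCsv))
        {
            Console.WriteLine(line);
        }
    }
}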

Bitcoin Clones use Same Network?

Another comment I posted over on the TechZing Podcast. It was addressing Justin’s comment about bitcoin clones using the same “network”, which is true in that they share the same protocol, but each has its own blockchain.

Each of the “bitcoin” clones is actually its own network. As far as I am aware they have no communication between each other in any form. It’s also why each one’s blockchain is so different in size. The main difference between bitcoin and litecoin (and its clones, such as dogecoin) is the proof of work algorithm they use to verify transactions. Bitcoin uses SHA256 (hence you are seeing lots of ASIC devices) whereas litecoin uses Scrypt, which is more ASIC resistant (although ASICs are starting to come out for it as well).

Most of the coins fall into those two groups, either SHA256 or Scrypt. Two coins that I know of that are slightly different are Primecoin and Vertcoin. Primecoin calculates chains of prime numbers as its proof of work, so its output is vaguely useful to anyone studying prime numbers. It’s also the only coin I am aware of that can only be mined by CPU. This makes it popular to run on botnets and spot instances in the cloud, as you don’t need a GPU. Vertcoin, by contrast, uses a modified version of Scrypt which is supposed to be very resistant to ASIC mining, presumably by using even more memory than Scrypt.
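
As a rough illustration, hash-based proof of work boils down to searching for a nonce that makes a hash meet a difficulty target. Here is a toy sketch (nothing like real mining code, which hashes block headers against a much finer-grained target):

using System;
using System.Security.Cryptography;
using System.Text;

class ToyProofOfWork
{
    static void Main()
    {
        const string blockData = "some block data";
        using (var sha256 = SHA256.Create())
        {
            // Search for a nonce whose hash meets a toy difficulty
            // target of two leading zero bytes (~65,536 tries on average)
            for (long nonce = 0; ; nonce++)
            {
                var hash = sha256.ComputeHash(Encoding.UTF8.GetBytes(blockData + nonce));
                if (hash[0] == 0 && hash[1] == 0)
                {
                    Console.WriteLine("Found nonce " + nonce + ": " + BitConverter.ToString(hash));
                    break;
                }
            }
        }
    }
}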

I think both of you would be wise to actually have a look at dogecoin. The community has gotten more traction than litecoin has in 2 months and it is catching up to bitcoin at a staggering rate. Once you get past the meme (which makes it easier to get into, I guess?) there is a lot to like, and it’s certainly gaining a lot of adoption. Lastly, it’s about to have its first block reward halving soon, so now is probably a good chance to pick some up before the price doubles again.

It sounds crazy, but the price is going nuts right now. It’s the coin with the third-highest market cap now, and the reward is going to drop in 3 days, so expect it to go up again.

http://tuxedage.wordpress.com/2014/02/06/a-serious-analysis-of-dogecoin-or-why-I-am-all-in-on-dogecoin/

I highly suggest reading the above. I don’t agree with all of it, but mostly it seems right to me. Dogecoin has the potential to be the new litecoin and possibly the new bitcoin, especially with all of the activity taking place.

Be sure to have a look at http://reddit.com/r/dogecoin/ as well. The community is VERY active, enthusiastic and generous. They are spending the coins, making doge more of a currency and less of a value store.

Python pep8 git commit check

Without adding a git commit hook, I wanted to be able to check whether my Python code conformed to pep8 standards before committing anything. Since I found the command reasonably useful I thought I would post it here.

git status -s -u | grep '\.py$' | awk '{split($0,a," "); print a[2]}' | xargs pep8

Just run the above in your project’s directory. It’s fairly simple but quite effective at ensuring your Python code becomes cleaner every time you commit to the repository. The nice thing about it is that it only checks files you have modified, allowing you to slowly clean up existing code bases.

Regarding the Zombie Apocalypse

This piece of content is taken from a comment I left on the TechZing podcast blog. I should note I have not even begun to explore issues such as what happens to a zombie in extreme heat or cold. Of course much of the below can be disregarded if the zombie virus is airborne, but this assumes the standard zombie canon of being spread through bites.

My take on the zombie apocalypse was always that it could never happen. The reasons being,

1. The zombie’s primary enemy is also its main food source. This is like having to tackle a Lion every time you feel like eating a sandwich. You are going to get mauled.

2. The zombie’s only method of reproducing is also biting its primary enemy. Again, every time you feel randy, go tackle a Lion which has the intent to maul you. Keep in mind that in order to be effective each zombie needs to bite at least 2 humans, which leads us nicely to…

3. Humans are bloody good at killing things. This includes a great number of creatures which have far more effective killing implements than we were given by nature (Lions, Tigers, Bears, oh my!). I don’t know about you, but I am pretty sure I could take out 20 zombies in a car without too many issues. Quite a few people have cars. Certainly more than 1 in 20 people in a first world country do. Even if they only take out 2 zombies each, we are ahead.

Add in all the gun nuts looking for something to shoot, people with medieval suits of armor (bite that zombie!), wannabe ninjas with swords, kung-fu experts, bomb nuts and the fact that a tank or even a lightly armored vehicle is totally impervious to a zombie, and I can’t see them lasting too long. Heck, a mob armed with rocks only has to take out one zombie each to be effective, as each zombie still needs to bite two humans before being stoned to death. You can see the numbers are clearly on our side. Even armed with sticks I can see humans winning this one.

Anyway, that’s my thinking on this. What I find more scary is that there are people prepared for the zombie apocalypse, and even worse, that quite a few of them are hoping it will occur.

New searchcode Logo

Just a quick post to show off the new searchcode.com logo. I have been working on a new version of the site for a few weeks now and want to get something out there for people to look at. Design-wise it’s not done yet, but the logo is.


Searchcode Logo

There it is in all its glory. The new design has a similar look, and I should be able to start talking about it and showing it off soon. Things I can mention are,

  • Moving away from PHP to Python (Django) for the web server.
  • Twitter Bootstrap used for the design.
  • Large number of tests which should hopefully fix some known issues.
  • A full writeup of the conversion to follow!

Lastly, I am looking at creating a self-hosted version of searchcode for people to download and use themselves. If you are interested in running a version, let me know via a comment or, better yet, email me.

The worst program I ever worked on

The worst program I ever worked on was something I was asked to maintain once. It consisted of two parts. The first was a web application written in ASP. The second portion was essentially Microsoft Reporting Services implemented in 80,000 lines of VB.NET.

The first thing I did was chuck it into VS2010 and run some code metrics on it. The results: 10 or so methods had 2000+ lines of code. The maintainability index was 0 (a number between 0 and 100 where 0 is unmaintainable). The worst function had a cyclomatic complexity of 2700 (the worst I had ever seen on a function before was 750 odd). It was full of nested in-line dynamic SQL, all of which referred to tables with 100+ columns which had helpful names like sdf_324. There were about 5000 stored procedures, most of which were 90% similar to other ones with a similar naming scheme. There were no foreign key constraints in the database. Every query, including updates, inserts and deletes, used NOLOCK (so no data integrity). It all lived in a single 80,000 line file which crashed VS every time you tried to do a simple edit.

I essentially told my boss I would quit over it, as there was no way I could support it without other aspects of my work suffering. Thankfully it was put in the too-hard basket and nobody else had to endure my pain. I ended up peer reviewing the changes the guy made some time later, and a single column update touched in the order of 500 lines of code.

There was one interesting thing I found with it, however: there was so much repeated/nested if code in its methods that you could hold down page down and it would look like the page was moving the other way, similar to how a wheel on TV can look like it’s spinning the other way.

Why you should never ask permission to clean up code

This is something that took me 2 years or so to learn. One day I realised nobody was really looking at my timecards in depth, so I started allocating extra time to things and using the extra time to fix the things I thought needed fixing. Once I started delivering on this I showed my manager, who agreed that it was a good use of time. I was given free rein to fix anything I felt would add maximum value, provided the bug fixes continued to be delivered without any major compromise.

Since that time I have refactored quite a few code-bases: added unit tests, fixed some build processes, improved performance, and I generally feel happier at work for getting things done that are important to me.

Don’t get stuck in constant bug fix mode. If you can’t get approval to fix things then change jobs, because bug fix after bug fix is depressing and will bring you down.

Why is storing, tracking and managing billions of tiny files directly on a file system a nightmare?

It’s a real pain when you want to inspect, delete or copy the files.

Try taking 300,000 files and copying them somewhere. Then copy 1 file which has the size of the 300,000 combined. The single file is MUCH faster (it’s also why we usually tar things up before copying them, even when they are already compressed). Any database that’s not a toy will usually lay the 300,000 records out in a single file (depending on settings, sizes and filesystem limits).
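
If you want to see the difference yourself, here is a rough sketch of the comparison (scaled down to 10,000 files, with made-up paths; run it in an empty scratch directory, and don’t treat it as a rigorous benchmark):

using System;
using System.Diagnostics;
using System.IO;

class CopyComparison
{
    const int FileCount = 10000; // scaled down from 300,000
    const int FileSize = 1024;   // 1 KB per small file

    static void Main()
    {
        // Create the small files and one big file of the same total size
        var payload = new byte[FileSize];
        Directory.CreateDirectory("small");
        for (var i = 0; i < FileCount; i++)
            File.WriteAllBytes(Path.Combine("small", i + ".dat"), payload);
        File.WriteAllBytes("big.dat", new byte[FileSize * FileCount]);

        // Time copying the many small files
        var sw = Stopwatch.StartNew();
        Directory.CreateDirectory("small_copy");
        foreach (var f in Directory.GetFiles("small"))
            File.Copy(f, Path.Combine("small_copy", Path.GetFileName(f)));
        Console.WriteLine(FileCount + " small files: " + sw.Elapsed);

        // Time copying the single big file
        sw.Restart();
        File.Copy("big.dat", "big_copy.dat");
        Console.WriteLine("One big file: " + sw.Elapsed);
    }
}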

The 300,000 files end up sitting all over the drive, and disk seeks kill you at run-time. This may not be true for an SSD, but I don’t have any evidence to suggest it one way or the other.

Even if the physical storage is fine with this, I suspect you may run into filesystem issues when you lay out millions, if not hundreds of millions, of files over a directory and then hit it hard.

I have played with 1,000,000 files before when playing with crawling/indexing things, and it becomes a real management pain. It may seem cleaner to lay each record out as a single file, but in the long run, if you hit a large size, it isn’t worth it.

Counter-counter argument TDD

The following is taken from my response to a Hacker News comment. The comment follows (quoted) and my response below.

“I will start doing TDD when,

1. It is faster than developing without it.
2. It doesn’t result in a ton of brittle tests that can’t survive an upgrade or massive change in the API that is already enough trouble to manage on the implementation-side- even though there may be no functional changes!

Unit tests that test trivial methods are evil because the LOC count goes up”

1. It can be. For something like a standard C# MVC application (I’m working on one now) the time taken to spin up Cassini or deploy to IIS is far greater than the time taken to run tests. For something like PHP, where you are just hitting F5, TDD can slow you down. As with most things, it depends.

2. If you are writing brittle tests you are doing it wrong.

Increasing LOC (lines of code) isn’t always a bad thing. If those increased LOC improve quality then I consider it worthwhile. Yes, it can be more maintenance, but we know that catching bugs in development is much cheaper than catching them in production.

Mocking isn’t as bad as it’s been made out to be. Yes, you can overmock things (a design anti-pattern), but that should be treated as a code smell and you should be refactoring to make things simpler. If you can’t refactor and you can’t easily mock, then consider whether you really need to test it. In my experience, things that are hard to mock and cannot be refactored usually shouldn’t be tested.

The exception being legacy code, but we are talking about TDD here, which usually means greenfield development, or else the code would have tests already.

Unit testing does NOT promote 100% coverage. People using unit tests as a metric promote this. Sometimes it’s worth achieving, and sometimes it’s not. Use common sense when picking a unit test coverage target. I have written applications with close to 100% coverage, such as web services, and been thankful for it when something broke and I needed to fix it. I have also written applications with no more than 20% coverage over the critical methods (simple CRUD screens). Use common sense; testing simple getters and setters is probably a waste of time, so don’t do it.

Unit testing isn’t all about writing tests. It’s also about enforcing good design. Code that’s easily testable is usually good code. You don’t have to have tests to have testable code, but if you are going to that effort anyway, why not add tests where they can add value and provide you with a nice safety harness?

Most of the issues with unit tests come from people preaching that they are a silver bullet. For specific cases they can provide great value and increase development speed. Personally, I will continue to write unit tests, but only where my experience leads me to believe they will provide value.

Can anyone explain how this regex [ -~] matches ASCII characters?

Since I am pulling most of my content from other sites such as Mahalo and Quora I thought I would pull back some of my more interesting HN comments.

Can anyone explain how this regex [ -~] matches ASCII characters?

It’s pretty simple. Assuming you know regex… actually, I’m going to assume you don’t, since you are asking.

The bracket expression [ ] defines single characters to match; however, you can have more than one character inside, and any of them will match.

[a] matches a
[ab] matches either a or b
[abc] matches either a or b or c
[a-c] matches either a or b or c.

The - allows us to define a range. You can just as easily use [abc], but for long sequences such as [a-z] consider it shorthand.

In this case, [ -~] means every character between <space> and <tilde>, which just happens to be all the printable ASCII characters (see the chart in the article). The only bit you need to keep in mind is that <space> is a character as well, and hence you can match on it.
You could rewrite the regex like so (note I haven’t escaped anything in this, so it’s probably not valid)

[ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]

but that’s not quite as clever or neat.
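
To see it in action, here is a quick sketch (in C#, since that is what I have been working in; the test strings are arbitrary):

using System;
using System.Text.RegularExpressions;

class PrintableAsciiDemo
{
    static void Main()
    {
        // ^[ -~]+$ : one or more characters between space (0x20)
        // and tilde (0x7E), i.e. the printable ASCII range
        var printable = new Regex("^[ -~]+$");

        Console.WriteLine(printable.IsMatch("Hello, world!")); // True
        Console.WriteLine(printable.IsMatch("tab\there"));     // False: tab is 0x09
        Console.WriteLine(printable.IsMatch("café"));          // False: é is outside ASCII
    }
}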