Not so unique GUID

I have been doing a lot of work with the Sitecore CMS recently. Once of the things you quickly learn is how it relies on GUID’s for pretty much everything. This means of course when you start testing and need to supply GUID’s into your tests that you end up with lots of GUIDs that look like the following sprinkled through your code {11111111-1111-1111-1111-111111111111}

Today I remarked that we should be using things like “deadbeef” for the first part of the GUID with a colleague. He suggested that we should try and actually write something. With a little bit of 1337 speak this is actually possible. Naturally we got back to work, but with a little free time I quickly coded up a simple Python application to generate “phrased” GUID’s. Some examples follow,

silicles-oafs-blob-tael-declassified -> {5111c1e5-0af5-b10b-7ae1-dec1a551f1ed}
deedless-gait-soft-goes-eisteddfodic -> {deed1e55-9a17-50f7-90e5-e157eddf0d1c}
libelist-diel-alls-flit-disaffiliate -> {11be1157-d1e1-a115-f117-d15aff111a7e}
offstage-diel-labs-scat-classifiable -> {0ff57a9e-d1e1-1ab5-5ca7-c1a551f1ab1e}

None of the above are make much sense, but by looking at the outputs you can attempt to write something such as,

 cassette soft gold dice collectibles
{ca55e77e-50f7-901d-d1ce-c011ec71b1e5}

Very zen. Some rough back of napkin calculations gives my program something like 10,000,000,000,000 combinations of GUID’s based on the word list I supplied. I may just turn it into a online GUID generator like this one http://www.guidgenerator.com/

EDIT – You can now get these guids at https://searchcode.com/guid/

Implementing C# Linq Distinct on Custom Object List

Ever wanted to implement a distinct over a custom object list in C# before? You quickly discover that it fails to work. Sadly there is a lack of decent documentation about this and a lot of FUD. Since I lost a bit of time hopefully this blog post can be picked up as the answer.

Thankfully its not as difficult as you would image. Assuming you have a simple custom object which contains an Id, and you want to use that Id to get a distinct list all you need to do is add the following to the object.

public override bool Equals(object obj)
{
	return this.Id == ((CustomObject)obj).Id;
}

public override int GetHashCode()
{
	return this.Id.GetHashCode();
}

You need both due to the way that Linq works. I suspect under the hood its using a hash to work out whats the same hence GetHashCode.

Installing Phindex

This is a follow on piece to my 5 part series about writing a search engine from scratch in PHP which you can read at http://www.boyter.org/2013/01/code-for-a-search-engine-in-php-part-1/

I get a lot of email requests asking how to setup Phindex on a new machine and start indexing the web. Since the article and code was written aimed at someone with a degree of knowledge of PHP this is somewhat understandable. What follows is how to set things up and start crawling and indexing from scratch.

The first thing to do is setup some way of running PHP and serve pages. The easiest way to do this is install Apache and PHP. If you are doing this on Windows or OSX then go and install XAMPP https://www.apachefriends.org/index.html For Linux follow whatever guide applies to your distribution. Be sure to follow the directions correctly and verify that you can create a file with php_info(); inside it which runs in your browser correctly.

For this I am using Ubuntu Linux and all folder paths will reflect this.

With this setup what you need to do next is create a folder where we can place all of the code we are going to work with. I have created a folder called phindex which I have ensured that I can edit and write files inside.

Inside this folder we need to unpack the code from github https://github.com/boyter/Phindex/archive/master.zip

boyter@ubuntu:/var/www/phindex$ unzip master.zip
Archive:  master.zip
2824d5fa3e9c04db4a3700e60e8d90c477e2c8c8
   creating: Phindex-master/
.......
  inflating: Phindex-master/tests/singlefolderindex_test.php
boyter@ubuntu:/var/www/phindex$

At this point everything should be running, however as nothing is indexed you wont get any results if you browse to the search page. To resolve this without running the crawler download the following http://wausita.com/documents10000.tar.gz and unpack it to the crawler directory.

boyter@ubuntu:/var/www/phindex/Phindex-master/crawler$ tar zxvf documents10000.tar.gz
......
boyter@ubuntu:/var/www/phindex/Phindex-master/crawler$ ls
crawler.php  documents  documents10000.tar.gz  parse_quantcast.php
boyter@ubuntu:/var/www/phindex/Phindex-master/crawler$

The next step is to create two folders. The first is called “document” and the second “index”. These are where the processed documents will be stored and where the index will be stored. Once these are created we can run the indexer. The folders need to be created in the root folder like so.

boyter@ubuntu:/var/www/phindex/Phindex-master$ ls
add.php  crawler    index       README.md   tests
classes  documents  interfaces  search.php
boyter@ubuntu:/var/www/phindex/Phindex-master$

With that done, lets run the indexer. If you cannot run php from the command line, just browse to the php file using your browser and the index will be built.

boyter@ubuntu:/var/www/phindex/Phindex-master/$ php add.php
INDEXING 1
INDEXING 2
.....
INDEXING 10717
INDEXING 10718
Starting Index
boyter@ubuntu:/var/www/phindex/Phindex-master/$

This step is going to take a while depending on how fast the computer you are using is. Whats happening is that each of the crawled documents is processed, saved to the document store, and then finally each of the documents is indexed.

At this point everything is good. You should be able to perform a search by going to the like so,

Phindex Screenshot

At this point everything is working. I would suggest at this point you start looking at the code under the hood to see how it all works together. Start with add.php which gives a reasonable idea how to look at the crawled documents and how to index them. Then look at search.php to get an idea on how to use the created index. I will be expanding on this guide over time based on feedback but there should be enough here at this point for you to get started.

More interview snippets….

Since I wrote the code to these snippets I thought I may as well add them here in case I ever need them again or want to review them. As the other interview ones they are the answers to a question I was asked, slightly modified to protect the innocent. These ones are written in Python.

Q. Write a function to reverse each word in a string.

def reverse_each_word(words):
    '''
    Reverse each word in a string 
    '''
    return " ".join([x[::-1] for x in words.split(' ')])

The only thing of note in here is the x[::-1] which is extended slice syntax which reverses a string. You could also to reversed(x) although I believe at the time of writing it is MUCH slower.

Q. Given two arrays find which elements are not in the second.

def find_not_in_second(first, second): 
    '''
    Find which numbers are not in the
    second array
    '''
    return [x for x in first if x not in second]

I am especially proud of the second snippet as it is very easy to read and rather Pythonic. It takes in two lists such as [1,2,3] and [2,3,6] and returns a new list with the missing elements.

Another day another interview…

Another day another interview. I actually have been getting some good results from them so far. In particular the last two I have been on. I will discuss them briefly.

The first had an interesting coding test. Rather then asking me to solve Fizzbuzz or implement a depth first algorithm over a binary tree (seriously, I have been programming for 10 years and never needed to do that. I can, but its something I did in uni and not really applicable to anything I have done since then). It was to implement a simple REST service.

You created your service, hosted it online (heroku was suggested as its free) passed in the URL to a form, submitted and it hit your service looking for error codes and correct responses/output to input. Since you got to implement it in any language you want I went with Python/Django and produced the following code.

def parse_json(self, data):
	filtered = self.filter_drm(data['payload'])
	filtered = self.filter_episode_count(filtered)

	return self.format_return(filtered)

def filter_drm(self, data):
	if data is None or data == []:
		return []

	result = [x for x in data if 'drm' in x and x['drm'] == True]
	return result

def filter_episode_count(self, data, count=0):
	if data is None or data == []:
		return []

	result = [x for x in data if 'episodeCount' in x and x['episodeCount'] > count]
	return result

def format_return(self, data):
	if data is None or data == []:
		return {"response": []}

	result = [{	"image": x['image']['showImage'], 
				"slug": x['slug'],
				"title": x['title']} for x in data 
				if 'image' in x and 'slug' in x and 'title' in x]
	return {"response": result}

Essentially its the code from the model I created. It takes in some JSON data, filters it by the field DRM and Episode count, then returns a subset of the data in it. The corresponding view is very simple, with just some JSON parsing (with error checks) and then calling the above code. I did throw in quite a few unit tests though to ensure it was all working correctly.

Thankfully, after writing the logic, some basic testing (curl to fake a response) it all looked OK to me. I uploaded on heroku (never used it before and it took most of the time) and submitted the form. First go everything worked correctly passing all of the requirements listed which made me rather happy.

As for the second interview, it raised a good question which highlights the fact while I know how to write a closure and lambda I cannot actually say what they are. It also highlighted I really need to get better at Javascript since while I am pretty comfortable with it on the front end for backend processes such as node.js I am an absolute notice.

For the first, I was right about a lambda, which is just an anonymous function. As for the second part a closure is a function which closes over the environment allowing it to access variables not in its function list. An example would be,

def function1(h):
    def function2():
        return h
    return function2()

In the above function2 closes over function1 allowing it to access the the variables in function1’s environment such as h.

The other thing that threw me was implementing a SQL like join in a nice way. See the thing is I have been spoilt by C# which makes this very simple using LINQ. You literally join the two lists in the same way SQL would and it just works. Not only that the implementation is really easy to read.

I came up with the following which is ugly for two reasons,

1. its not very functional
2. it has very bad  O(N^2) runtime performance.

var csv1 = [
    {'name': 'one'},
    {'name': 'two'}
];

var csv2 = [
    {'name': 'one', 'address': '123 test street'},
    {'name': 'one', 'address': '456 other road'},
    {'name': 'two', 'address': '987 fake street'},
];

function joinem(csv1, csv2) {
    var ret = [];
    $.each(csv1, function(index, value) {
        $.each(csv2, function(index2, value2) {
            if(value.name == value2.name) {
                ret.push(value2);
            }
        });
    });

    return ret;
}

var res1 = joinem(csv1, csv2);

Assuming I get some more time later I want to come back to this. I am certain there is a nice way to do this in Javascript using underscore.js or something similar which is just as expressive as the LINQ version.

searchcode screenshot

Since I have been working on searchcode for a while and its getting close to being ready for release (a few weeks away at this point I predict) I thought I would post a teaser screenshot.

The below shows how it looks for a sample search. The design is far cleaner then what is currently online which is a big win as the current design of searchcode is seriously ugly.

searchcode teaster

 

I still have quite a way to go before this is ready to be released, but it is getting closer. Will continue to post updates as I get closer to the release date along with how I migrated from an old PHP codebase to a Python one.

Sample Coding Test

Being in the job market again I been doing quite a few tests. Since I have already put in the effort to a test without result I thought I would post it here.

The test involved producing output from a supplied CSV input file which contained insurance claims. Something about taking the input and using it to predict future claims. Please forgive my explanation as I am not a financial expert. Anyway the idea was to take an input such as the following,

Header
One, 1992, 1992, 110.0
One, 1992, 1993, 170.0
One, 1993, 1993, 200.0
Two, 1990, 1990, 45.2
Two, 1990, 1991, 64.8
Two, 1990, 1993, 37.0
Two, 1991, 1991, 50.0
Two, 1991, 1992, 75.0
Two, 1991, 1993, 25.0
Two, 1992, 1992, 55.0
Two, 1992, 1993, 85.0
Two, 1993, 1993, 100.0

into the following,

1990, 4
One, 0, 0, 0, 0, 0, 0, 0, 110, 280, 200
Two, 45.2, 110, 110, 147, 50, 125, 150, 55, 140, 100

The test was mostly about proving that you can write maintainable code which is unit testable and the like. Anyway here is my solution. It takes in a list of objects which represent each of the four columns of the input.

The feedback I received back was that the coverage I achieved was high (I had a collection of tests over the methods), the code clean and well documented.

public class TriangleCSVLine
{
    public string product { get; set; }
    public int originYear { get; set; }
    public int developmentYear { get; set; }
    public double incrementalValue { get; set; }
}

public List TranslateToOutput(List parsedCsv)
{
    var output = new List();

    // Sanity checks...
    if (parsedCsv == null || parsedCsv.Count == 0)
    {
        return output;
    }
    output.Add(GenerateHeader(parsedCsv));

    // Used to determine where we are looking
    var totalYears = parsedCsv.Select(x => x.developmentYear).Distinct().OrderBy(x => x);
    var minYear = totalYears.Min();
    var maxYear = totalYears.Max();

    foreach (var product in parsedCsv.Select(x => x.product).Distinct())
    {
        // All of the products values and the years it has
        var productValues = parsedCsv.Where(x => product.Equals(x.product));
        var originYears = Enumerable.Range(minYear, (maxYear - minYear) + 1);

        var values = new List();

        foreach (var year in originYears)
        {
            // For each of the development years for this "period"
            var developmentYears = parsedCsv.Where(x => x.originYear == year)
                                                .Select(x => x.developmentYear).Distinct();

            // If we have no development years
            // that means we have an origin year without a year 1 
            // development year. This means we have no idea how many values
            // of zero should be in the file, so lets bail
            // should probably go into a pre validation
            if (developmentYears.Count() == 0)
            {
                throw new MissingOriginDevelopmentTrangleCSVException(
                    string.Format("Missing development years for origin {0} in product {1}", year, product)
                );
            }

            // The values are running values...
            // so we keep the total and increment it as we go
            double runningTotal = 0;
            foreach (var rangeYear in Enumerable.Range(developmentYears.Min(), (developmentYears.Max() - developmentYears.Min()) + 1))
            {
                var value1 = productValues.Where(x => x.originYear == year && x.developmentYear == rangeYear).SingleOrDefault();
                if (value1 != null)
                {
                    runningTotal += value1.incrementalValue;
                }
                values.Add(runningTotal);
            }
                    
        }
        output.Add(string.Format("{0}, {1}", product, string.Join(", ", values)));
    }

    return output;
}

private string GenerateHeader(List parsedCsv)
{
    // Get distinct list of all the years
    var years = parsedCsv.Select(x => x.developmentYear).Distinct();

    // 1990-1990 counts as 1 year so add one
    var developmentYears = (years.Max() - years.Min()) + 1; 
    var header = string.Join(", ", years.Min(), developmentYears);

    return header;
}

Bitcoin Clones use Same Network?

Another comment I posted over on the TechZing Podcast. It was addressing Justin’s comment about bitcoin clones using the same “network” which is true, in that they share the same protocol but each have their own blockchain.

Each of the “bitcoin” clones are actually their own network. As far as I am aware they have no communication between each network in any form. Its also why each one’s blockchain is so different in size. Also the difference between bitcoin and litecoin (and its clones, such as dogecoin) is the proof of work algorithm they use to verify transactions. Bitcoin uses SHA256 (hence you are seeing lots of ASIC devices) whereas litecoin uses Scrypt, which is more ASIC resistant (although ASIC is starting to come out for them as well).

Most of the coins fall into those two groups, either SHA256 or Scrypt. Two coins that I know of that are slightly different are Primecoin and Vertcoin. Primecoin calculates primes as its proof of work algorithm, so its output is vaguely useful to anyone studying prime numbers. Its also the only coin that I am aware of that can only be mined by CPU. This makes it popular to run on botnets and spot instances in the cloud as you don’t need a GPU. Vertcoin by difference uses Scrypt, but a modified version which is supposed to be very resistant to ASIC mining, presumably by using even more memory then Scrypt.

I think both of you would be wise to actually have a look at dogecoin. The community has gotten more traction then litecoin has in 2 months and is catching up to bitcoin at a staggering rate. Once you get past the meme (which makes it easier to get into I guess?) there is a lot to like and its certainly gaining a lot of a adoption. Lastly its about to have its first block rate halving soon, so now is probably a good chance to pick some up before the price doubles again.

It sounds crazy, but the price is going nuts right now. Its the 3rd highest martketcap coin now and the reward is going to drop in 3 days so expect it to go up again.

http://tuxedage.wordpress.com/2014/02/06/a-serious-analysis-of-dogecoin-or-why-I-am-all–in-on-dogecoin/

I highly suggest reading the above. I don’t agree with it all but mostly it seems right to me. Dogecoin has the potential to be the new litecoin and possibly the new bitcoin. Especially with all of the activity taking place.

Be sure to have a look at http://reddit.com/r/dogecoin/ as well. The community is VERY active, enthusiastic and generous. They are spending the coins making doge more of a currency and less a value store.

Python pep8 git commit check

Without adding a git commit hook I wanted to be able to check if my Python code conformed to pep8 standards before committing anything. Since I found the command reasonably useful I thought I would post it here.

git status -s -u | grep '\.py$' | awk '{split($0,a," "); print a[2]}' | xargs pep8

Just run the above in your projects directory. It’s fairly simple but quite effective at ensuring your Python code becomes cleaner ever time you commit to the repository. The nice thing about it is that it only checks files you have modified, allowing you to slowly clean up existing code bases.

Regarding the Zombie Apocalypse

This piece of content is taken from a comment I left on the TechZing podcast blog. I should note I have not even begun to explore issues such as what happens to a zombie in extreme heat or cold. Of course much of the below can be disregarded if the zombie virus is airborne, but this assumes the standard zombie canon of being spread through bites.

My take on the zombie apocalypse was always that it could never happen. The reasons being,

1. The zombies primary enemy is also its main food source. This is like having to tackle a Lion every time you feel like eating a sandwich. You are going to get mauled.

2. The zombies only method of reproducing is also biting its primary enemy. Again, every time you feel randy go tackle a Lion which has the intent to maul you. Keep in mind in order to be effective each zombie needs to bite at least 2 humans, which leads us nicely to…

3. Humans are bloody good at killing things. This includes a great number of creatures which have far more effective killing implements then we were given by nature (Lions, Tigers, Bears, oh my!) I don’t know about you, but I am pretty sure I could take out 20 zombies in a car without too many issues. Quite a few people have cars. Certainly more then 1 in 20 people in a first world country do. Even if they only take out 2 zombies each we are ahead.

Add in all the gun nuts looking for something to shoot, people with medieval suits of armor (bite that zombie!), wannabe ninjas with swords, kung-fu experts, bomb nuts and the fact that a tank or even lightly armored vehicle is totally impervious to a zombie and I can’t see them lasting too long. Heck a mob armed with rocks only has to take out one zombie each to be effective as each zombie still needs to bite two before being stoned to death. You can see the numbers are clearly on our side. Even armed with sticks I can see humans winning this one.

Anyway that’s my thinking on this. What I find more scary is that there are people prepared for the zombie apocalypse and even worse is that quite a few of them would be hoping it will occur.