Setup up ConcourseCI 2.6.0 behind Nginx with Self Signed Certificates on Ubuntu 16.04

Concourse CI is a very nice continuous integration server.

However for installs there are a few gotcha’s you need to keep in mind. Mostly these relate to how TLS/SSL works.

The first is that while it is possible to run concourse inside Docker I found this to cause a lot of issues with workers dying and not recovering. I would suggest installing the binarys on bare machines. When I moved from a docker cluser using Amazon’s ECS to a single t2.large instance not only were builds faster but it was a far more reliable solution.

I am also not going to automate this install, and will leave it as an excercise for you the reader to do this yourself. I would suggest using Python Fabric, or something like Puppet, Ansible or Saltstack to achive this.

Also keep in mind that with this install everything is running on a single instance. If you have need to scale out this is not going to work, but as a way to get started quickly it works pretty well.

Prerequisites are that you have a Ubuntu instance running somewhere. If you want to run the fly execute command you will also need a valid domain name to point at your machine. This is an annoying thing caused by GoLang when using SSL certs. Turns out you cannot set a hostfile entry and use it as such. You can in a insecure non SSL mode but otherwise cannot.

If you are using a virual machine from DigitalOcean/AWS/Vultr or other you will need to add some swap space. I noticed a lot of issues where this was missing. You can do so by running the following commands which will configure your server to have 4G of swap space,

sudo fallocate -l 4G /swapfile
 sudo chmod 600 /swapfile
 sudo mkswap /swapfile
 sudo swapon /swapfile
 sudo echo "/swapfile none swap sw 0 0" >> /etc/fstab

We will need to get the concourse binary, and to make it executable. For convenience and to match the concourse documentation lets also rename it to concourse. To do so run the following command.

wget && mv concourse_linux_amd64 concourse && chmod +x concourse

We now need to generate the keys that concourse requires.

mkdir keys
 cd keys

ssh-keygen -t rsa -f tsa_host_key -N '' && ssh-keygen -t rsa -f worker_key -N '' && ssh-keygen -t rsa -f session_signing_key -N '' && cp authorized_worker_keys
 cd ..

The above commands will create a directory called keys and setup all of the keys that concourse 2.6.0 requires.

We can now create some helper scripts which we can use to run concourse easily.


./concourse web \
 --basic-auth-username main \
 --basic-auth-password MySuperPassword \
 --session-signing-key ./keys/session_signing_key \
 --tsa-host-key ./keys/tsa_host_key \
 --tsa-authorized-keys ./keys/authorized_worker_keys \
 --external-url https://YOURDNSHERE/ \
 --postgres-data-source postgres://concourse:concourse@

chmod +x

This script will start running concourse. Keep in mind that the username and password used here are for the main group and as such you should protect them as they have the ability to create additional groups on your concourse instance.


./concourse worker \
 --work-dir /opt/concourse/worker \
 --tsa-host \
 --tsa-public-key ./keys/ \
 --tsa-worker-private-key ./keys/worker_key

chmod +x

This script will spin up a worker which will communicate with the main concourse instance and do all the building. It can be useful to lower the priority of this command using nice and ionice if you are running on a single core machine.

Now we need to install all of the postgresql packages required,

apt-get update && apt-get install -y postgresql postgresql-contrib

Once this is done we can create the database to be used

sudo -u postgres createdb concourse

Then login to postgresql and create a user to connect to the database

sudo -u postgres psql
 CREATE USER concourse WITH PASSWORD 'concourse'
 GRANT ALL PRIVILEGES ON DATABASE "concourse" to concourse

We also need to need to edit the pg_hba file allowing us to make the connection,

sudo pico /etc/postgresql/9.5/main/pg_hba.conf

Scroll down and look for the following line,

host all all md5

and change the md5 on the end to trust

host all all trust

Then save the file and restart postgresql

service postgresql restart

At this point everything we need to run concourse should be there. You will need to setup the concourse scripts we created earlier to run as a service, or just run them in a screen session if you are in a hurry.

What we want to do now is expose it to the big bad internet.

apt-get install nginx

Create a directory using either the domain name you want to use, a desired name or anything if you are going to connect to things using IP addresses.

mkdir -p /etc/nginx/ssl/

Nowe we want to swtich to the directory and setup the self signed TLS/SSL keys.

cd /etc/nginx/ssl/
 openssl genrsa -des3 -out server.key 1024

Enter whatever you want for the passphrase but remember it!

openssl req -new -key server.key -out server.csr

Enter the passhrase entered. The most important thing here is that when asked for the Common Name or FQDN you need to enter in the desired domain name.

With that done we need to sign the key.

cp server.key
 openssl rsa -in -out server.key

Remember to enter the same pass phrase as before. Finally sign they key with an expiry of 9999 days.

openssl x509 -req -days 9999 -in server.csr -signkey server.key -out server.crt

Make a copy of the file server.crt which will be needed for the concourse fly tool to talk to the server if you are using self signed certs.

With that done lets enable the site,

sudo nano /etc/nginx/sites-available/

And enter in the following details,

upstream concourse_app_server {
 server localhost:8080;

server {
 listen 80 default_server;
 rewrite ^ https://MYIPADDRESSORDOAMIN$request_uri? permanent;

server {
 listen 443 default_server ssl http2;

ssl on;
 ssl_certificate /etc/nginx/ssl/;
 ssl_certificate_key /etc/nginx/ssl/;

location / {
 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
 proxy_set_header Host $http_host;
 proxy_redirect off;
 proxy_pass http://concourse_app_server;

The above nginx config defines an upstream concourse server running on port 8080. It then defines a server listening on port 80 that redirects all traffic to the same server on HTTPS. The last server config defines out site, sets up the keys and forwards everything to the upstream concourse server.

We can now enable it,

sudo ln -s /etc/nginx/sites-available/ /etc/nginx/sites-enabled/
 rm -f /etc/nginx/sites-available/default

service nginx restart

We need to remove the default file above because nginx will not allow you to have two default servers.
At this point everything should be working. You should now be able to connect to your concourse server like so,

fly --target myciserver login --team-name myteamname --concourse-url https://MYIPADDRESSORDOAMIN --ca-cert ca.crt

Where the file ca.crt exists whereever you are running fly. Everything at this point should work and you can browse to your concourse server.

If you are using the IP address to communicate to your concourse server you have the limitation that you will not be able to run fly execute to upload a task. You can get around this by using a real domain, or running your own DNS to resolve it correctly.

The only bit of homework at this point would be to configure the firewall to disable access to port 8080 so that everything must go though nginx. Enjoy!

Repository overview now in searchcode server

One feature that I have wanted for a long time in searchcode server was a page which would give an overview of a repository. I wanted the overview to give a very high look at the languages used, the total number of files, estimated cost and who would be the best people to talk to.

One thing that occurred to me when I started work was that it would be nice to calculate a bus factor for the repository as well. After all we all know that project managers do like to know who are the most critical contributors to any project and what the risk is.

Below is the what has been added into searchcode server.

The most interesting part of the above in my opinion is the overview blurb. It attempts to summarise all of the figures below and let anyone know in plain english where the repository stands.

An example for searchcode server’s code itself,

“In this repository 2 committers have contributed to 228 files. The most important languages in this repository are Java, CSS and Freemarker Template. The project has a low bus factor of 2 and will be in trouble if Ben Boyter is hit by a bus. The average person who commits this project has ownership of 50% of files. The project relies on the following people; Ben Boyter, =.”

Certainly there is room for improvement on this page and I am hoping to add what I am calling signals logic to it. This would involve scanning the code to determine what languages, features and libraries are being used and add those to the report. The end goal would be to find for instance C# code using MySQL and ReactJS.

The last bit of news is that I am moving over to the same codebase. This should improve things in a few ways. The first bring the improved performance when moving from Python to Java. It should also mean that I can focus on a single codebase.

Anyway you will be able to get the new repository overview in the next release of searchcode server 1.3.6 which will be released before the end of January 2017.

Sphinx Real Time Index How to Distribute and Hidden Gotcha

I have been working on real time indexes with Sphinx recently for the next version of and ran into a few things that were either difficult to search for or just not covered anywhere.

The first is how to implement a distributed search using real time indexes. It’s actually done the same way you would normally create an index. Say you had a single server with 4 index shards on it and you wanted to run queries against it. You could use the following,

index rt
    type = distributed
    local = rt1
    agent = localhost:9312:rt2
    agent = localhost:9312:rt3
    agent = localhost:9312:rt4

You would need to have each one of your indexes defined (only one is added here to keep the example short)

index rt1
    type = rt1
    path = /usr/local/sphinx/data/rt1
    rt_field = title
    rt_field = content
    rt_attr_uint = gid

Using the above you would be able to search across all of the shards. The trick is knowing that to update you need to update each shard yourself. You cannot pass documents to the distributed index but instead must make a separate update to each shard. Usually I split sphinx shards based on a query like the following,

SELECT cast((select id from table order by 1 desc limit 1)/4 as UNSIGNED)*2, \
         cast((select id from table order by 1 desc limit 1)/4 as UNSIGNED)*3 \
         FROM table limit 1

Where the 4 is the number of shards and the multiplier splits the shards out. It’s performant due to index use. However for RT I suggest a simple modulas operator % against the ID column for each shard as it allows you to continue to scale out to each shard equally.

The second issue I ran into was that when defining the attributes and fields you must define all the fields before the uints. The above examples work fine but the below is incorrect. I couldn’t find this mentioned in the documentation.

index rt
    type = rt
    path = /usr/local/sphinx/data/rt
    rt_attr_uint = gid # this should be below the rt_fields
    rt_field = title
    rt_field = content

Explaining VarnishHist – What Does it Tell Us

The varnishhist tool is one of the most underused varnish tools that come with your standard varnish install. Probably because of how it appears at first glance.

In short, you want as many | symbols as possible and you want everything far toward the left hand side. The closer to the left the faster the responses are regardless if they are cached or not. The more | symbols then more items were served from cache.

A small guide,

'|' is cache HIT
'#' is cache MISS
'n:m' numbers in left top corner is vertical scale
'n = 2000' is number of requests that are being displayed (from 1 to 2000)

The X-axis is logarithmic time between request request from kernel to Varnish and response from Varnish to kernel.

The times on the X-axis are as such,

1e1 = 10 sec
1e0 = 1 sec
1e-1 = 0.1 secs or 100 ms (milliseconds)
1e-2 = 0.01 secs or 10 ms
1e-3 = 0.001 secs or 1 ms or 1000 µs (microseconds)
1e-4 = 0.0001 secs or 0.1 ms or 100 µs
1e-5 = 0.00001 secs or 0.01 ms or 10 µs
1e-6 = 0.000001 secs or 0.001 ms or 1 µs or 1000 ns (nanoseconds)

Below is the varnishhist for showing that while most responses are served in about 100ms not many are cached. This can mean one of a few things.

  • The responses are not cache-able and you need to adjust the back-end responses to have the correct headers (or override the settings with VCL config).
  • The cache timeout for the back-end responses isn’t high enough to ensure that later requests are served from cache.
  • There isn’t a large enough cache to hold all the responses (that’s the problem in this case).
1:20, n = 2000

            |                    ####
            |                    ####
            |                    ####
            |                    #####
            |                    #####
            |                    #####
            |                   #######
            |                   #######
            ||  |    #      #   ##########
|1e-6  |1e-5  |1e-4  |1e-3  |1e-2  |1e-1  |1e0   |1e1   |1e2

MySQL Dump Without Impacting Queries

Posted more for my personal use (I have to look it up every time) but here is how to run a mysqldump without impacting performance on the box. It sets the ionice and nice values to be as low as possible (but still run) and uses a single transaction and ups the max packet size for MySQL.

ionice -c2 -n7 nice -n19 mysqldump -u root -p DATABASE --single-transaction --max_allowed_packet=512M > FILENAME

To all Companies Currently Recruiting

I am writing this on behalf of all developers/engineers out there. Please stop with the take home coding challenge questions. Really. Just stop it. They are a lazy and frankly an unprofessional way of sorting the wheat from the chaff. Before closing your browser in disgust hear me out on this one and hopefully I can convince you of the error of your ways.

There has become an alarming trend these days of companies during the hiring process to issue lengthy coding challenges in order to prove that the individual they are hiring knows their stuff. I totally understand why you might be doing this but frankly its flawed. Lets go through several reasons why.

1. It shows that you have a lack of regard for the individuals time. This is a serious red flag. After all if you fail to respect my time why should I respect yours? Consider it this way, for every hour of time that your test takes you are taking away 1 hour of that persons family/rest/sleep/eating/hobby time. What is your commitment in this situation? If you aren’t willing to invest the time in the hiring process then why should we? Sure you need to review the code (and hopefully give feedback) but that usually takes less than 15 minutes (yes I have done this and you can do it very quickly). The time investment balance is horribly skewed in your favour, which starts the relationship off to a bad start. Lets also consider those who do consulting or contracting outside. If you ask them to do a 3 hour coding challenge then you are asking them to forgo somewhere in the region of $300-700. Even worse is that usually at the end of these tests all you get back is “Pass” or “Fail”. Even when you do give some feedback there is no way it pays back the amount of time that went in.

2. Long tests are irrelevant anyway. When you set someone up to solve a problem that takes several hours you are setting them up to fail. The test you set is usually set up such that you are looking for a specific answer and only that answer. It might be obvious to you that using the strategy pattern is the best way to solve your test, but what about those who’s development backgrounds require business logic rules to be user editable and hence live in a database where it can be modified? You are also going to nitpick over every little detail in a classic case of bike shedding. “They didn’t clean up some spaces!” “They didn’t comment this method!” “They did comment this method!” Even aware of this problem I still find myself falling into the trap of nitpicking over the small stuff that usually doesn’t matter. You are going to be investing time in training them how you want things anyway so why expect them to know it without even working with you?

3. It’s going to drive away talented individuals. I am not counting myself in this category but I know of many talented dev’s who simply refuse to do any sort of test. Why? They have a huge online portfolios of work they have previously done. Why do you even ask for our Github/Gitlab profiles if not to look at them? Surely 5 minutes of looking though would let you know if this person is a pretender or knows their stuff.

4. The hiring process is skewed towards the hirer. People looking for work are usually in one of two situations. Either they have a role and are looking around or they don’t have one. Lets consider the risk for both groups.

For those with a job they will need to resign from their current position to take an opportunity with you. There is usually a probation period following the hiring that either party can say “No this isn’t for me lets agree to be friends”. The hiree is the one taking on the largest risk in this situation. They have already left what possibly was a stable situation for an unstable situation. The company takes on the risk of potentially investing in someone and they leave. Realistically though the company is in the position to hire so renumeration is the least of their problems. For the hiree though the potential is that they join and after a month get the boot. They are now without income and back to looking for a job, only now they are in the latter category.

For those without a job, they have the situation that they might find employment and in doing so passing on other opportunities only to get the boot after a few months. Once again its skewed towards the company.

5. They are easily gamed. A quick check on freelancer or other such sites shows that I can probably buy a solution to any coding challenge you have for less than $300. For a job with a steady pay-check even if its only for a few months (before you find out I can’t code a damn) this is a no brainer especially if I know my coding chops aren’t up to standard. But why pay at all? A lot of the solutions are posted on online forums and Github already. Heck just ask your best mate who happens to know their stuff to do it for you. Whats the cost to the organisation here? Well assuming you do hire me based on my fraudulent test I could lurk in the company for months, either contributing nothing or perhaps causing all sorts of damage. At the very least you have paid several months salary.

A-hah you say! I can defeat what you have written above! I will just have another test during the interview process I can hear you thinking. DING DING DING WINNER! However, if you are going to do that anyway why not drop the initial challenge? After all what are you gaining?

Here’s what I propose. Bring the person you want to interview in and have them actually code on a machine in front of you. Work through a few simple problems together. It dons’t need to be complex. Ask them to write a function that reverses all the words in a string but not the string itself. Ask them how to find elements that existing in list A that are not in B. Ask them why they implemented certain patterns, how did they decide on data types, why did they comment/not comment a method. Let them know in advance that this is what is going to happen and what will be expected. You will learn far more in this short investment of time then with any coding challenge. If you want to get more in depth then why not offer to pay them for a single day to come in and actually work. The monetary cost will be less than making the wrong hire and you both get to decide if things are going to work out.

The best interview process I have been with to date was actually the Microsoft one for a intern role many many moons ago. It involved several hours of interviews with different individuals discussing different technology roles to try and ascertain the best fit.

When I walked out (I didn’t get the role BTW) I felt like a better person. Not only were the discussions interesting, I learnt a lot from those conducting them and I felt like my time was valued. They even offered to cover my travel expenses which made me feel like they cared about my time investment. This is how the process is meant to work. Its as much about the person being interviewed as about the company. Consider it an investment in your advertising budget if you are tracking the time investment (yes it is an investment!) as a cost. Good interviews stick with people a long time and GREAT ones make those people want to praise your company.

GPL Time-bomb an interesting approach to #FOSS licensing

UPDATES Following some feedback I am going to rename my usage of “Time-Bomb” due to potential negative connotation on the words. I am going to call it “Eventually Open”. Also a few other things need mentioning. I am not looking for code submissions back into the source at this time. This was a move to show that there are no back-doors in the code sending source code back to a master server.

About a week ago I released searchcode server under the fair source licence. From day one I had wanted to release it using some form of licence where the code was available but I wanted to lock it somewhat because frankly I do want to make some money out of my time investment. That’s not the whole story however. I did not want to create another “Look but don’t touch” situation forever and I certainly didn’t want searchcode to be constrained by a licence in the event that I die, lose interest or stop updating the code.

The result of this was that I have added what I am going to call a GPL Time-Bomb into into the licencing of searchcode server. Here is how it works. After a specified period of time the current version of searchcode server can be re-licensed under the GPL v3. This is a shifting date such that each new release extends its own time-bomb further into the future. However the older releases time is still fixed. The time-bomb for version 1.2.3 and 1.2.4 takes place on the 27th August 2019 at which point you can take the source using GPL 3.0. Assuming searchcode server 6.1.2 comes out at roughly the same time its time-bomb will be set to the 27th of August 2022 but the 1.2.3 release will be unaffected.

In short I have put a time limit of 3 years to make money out of the product and if I am unable it is turned over to the world to use as they see fit. Even better, assuming searchcode server becomes a successful product I will be forced to continually improve it and upgrade if I want to keep a for sale version without there being an equivalent FOSS version around (which in theory could be maintained by the community). In short everyone wins from this arrangement, and I am not forced to rely on a support model to pay the bills which frankly only works when you have a large sales team.

Here’s hoping this sort of licencing catches on as there are so many products out there that could benefit from it. If they take off the creators have an incentive to maintain and not milk their creation and those that become abandoned even up available for public use which I feel is a really fair way of licencing software.

Agree? Disagree? Email me or hit me up on twitter.

searchcode server under fair source

A very quick blog today. I have released searchcode server under the fair source licence. This means that as of a few days ago you can view the source, change it modify it and run it as you see fit so long as you have less than 5 users.

The source is hosted on github (I may move this to GitLab sometime in the future) and you can view it here.

So what does this mean? Well the community edition still exists (run searchcode with as many users as you want) as do the paid versions with support and all the full features. The real advantage however is that you can now vet the source code to ensure that searchcode server is not secretly sending your most valuable asset to some hidden server somewhere. In addition it means I can now talk about the source openly and will be writing some posts about how I ran into some CPU branching issues which slowed down some code.

Good news all around then. Be sure to check out the source and let me know what you think.

The Worst Individual I Ever Worked With

Taken from a comment I posted on HN in a thread about a Soccer Con Man.

Not actually a programmer. The guy was hired to be a project manager.

After joining things were as expected but after a few weeks we noticed that he was rarely around after lunch and never around after lunch on a Friday.

We would email him at those times deliberately to catch him out and I recall starting to put sticky notes on his laptop “Came to see you a X time”. He would come back and just dump all the notes in the rubbish and claim he never got them. He would often claim to be working from home, despite his laptop being on his desk and usually closed. He would also never responding during those times to email or IM.

A classic seagull manager he would appear when something went wrong, making a lot of noise, writing a lot of emails and then vanishing. He would also be sure to be seen when something was delivered often staying back late on those times.

It got so bad one friend of mine started tracking when he was around and then tracking when one of his relatives died. During his tenure the following incidents occurred,

– Hot water system blew up. 4 times. He had pictures which he would show all the time.

– Uncles, aunts, and various over family members died to the total of 20 individuals.

– Our time tracking him showed him to be in the office less than 15 hours a week on average.

We started to suspect he had a second job and was pulling the same con on them. This was never proved, but we did find someone who had worked with him previously and they reported the same behavior.

The worst thing was it was raised with management at least several dozen times and nothing ever happened. He managed to pull this scam off for 4 years. I could not believe the waste of money this guy was, literally $500,000 burnt on a useless individual.

Types of Testing in Software Engineering

There are many different types of testing which exist in software engineering. They should not be confused with the test levels, unit testing, integration testing, component interface testing, and system testing. However the different test levels may be used by each type as a way of checking for software quality.

The following are all different types of tests in software engineering.

: A/B testing is testing the comparison of two outputs where a single unit has changed. It is commonly used when trying to increase conversion rates for online websites. A real genius in this space is Patrick McKenzie and a few very worthwhile articles to read about it are How Stripe and AB Made me A Small Fortune and AB Testing

: Acceptance tests usually refer to tests performed by the customer. Also known as user acceptance testing or UAT. Smoke tests are considered an acceptance test.

: Accessibility tests are concerned with checking that the software is able to be used by those with vision, hearing or other impediments.

: Alpha testing consists of operational testing by potential users or an independent test team before the software is feature complete. It usually consists of an internal acceptance test before the software is released into beta testing.

: Beta testing follows alpha testing and is form of external user acceptance testing. Beta software is usually feature complete but with unknown bugs.

: Concurrent tests attempt to simulate the software in use under normal activity. The idea is to discover defects that occur in this situation that are unlikely to occur in other more granular tests.

: Conformance testing verifies that software conforms to specified standards. An example would checking a compiler or interpreter to see if it will work as expect against the language standards.

: Checks that software is compatible with other software on a system. Examples would be checking the Windows version, Java runtime version or that other software to be interfaced with have the appropriate API hooks.

: Destructive tests attempt to cause the software to fail. The idea being to check that software continues to work even with given unexpected conditions. Usually done through fuzzy testing and deliberately breaking subsystems such as the disk while the software is under test.

: Development testing is testing done by both the developer and tests during the development of the software. The idea is to prevent bugs during the development process and increase the quality of the software. Methodologies to do so include peer reviews, unit tests, code coverage and others.

: Functional tests generally consist of stories focussed around the users ability to perform actions or use cases checking if functionality works. An example would be “can the user save the document with changes”.

: Ensures that software is installed correctly and works as expected on a new piece of hardware or system. Commonly seen after software has been installed as a post check.

: Internationalisation tests check that localisation for other countries and cultures in the software is correct and inoffensive. Checks can include checking currency conversions, word range checks, font checks, timezone checks and the like.

Non functional
: Non functional tests test the parts of the software that are not covered by functional tests. These include things such as security or scalability which generally determine the quality of the product.

Performance / Load / Stress
: Performance load or stress testing is used to see how a system performance under certain high or low workload conditions. The idea is to see how the system performs under these conditions and can be used to measure scalability and resource usage.

: Regression tests are an extension of sanity checks which aim to ensure that previous defects which had a test written do not re-occur in a given software product.

: Realtime tests are to check systems which have specific timing constraints. For example trading systems or heart monitors. In these case real time tests are used.

Smoke / Sanity
: Smoke testing ensures that the software works for most of the functionality and can be considered a verification or acceptance test. Sanity testing determines if further testing is reasonable having checked a small set of functionality for flaws.

: Security testing concerned with testing that software protects against unauthorised access to confidential data.

: Usability tests are manual tests used to check that the user interface if any is understandable.