Setting up ConcourseCI 2.6.0 behind Nginx with Self-Signed Certificates on Ubuntu 16.04

Concourse CI is a very nice continuous integration server.

However, for installs there are a few gotchas you need to keep in mind. Mostly these relate to how TLS/SSL works.

The first is that while it is possible to run Concourse inside Docker, I found this caused a lot of issues with workers dying and not recovering. I would suggest installing the binaries on bare machines. When I moved from a Docker cluster using Amazon’s ECS to a single t2.large instance, not only were builds faster, it was also a far more reliable solution.

I am also not going to automate this install, and will leave it as an exercise for you, the reader, to do yourself. I would suggest using Python Fabric, or something like Puppet, Ansible or Saltstack to achieve this.

Also keep in mind that with this install everything is running on a single instance. If you need to scale out this is not going to work, but as a way to get started quickly it works pretty well.

The prerequisites are that you have an Ubuntu instance running somewhere. If you want to run the fly execute command you will also need a valid domain name pointing at your machine. This is an annoying consequence of how Go handles SSL certificates: it turns out you cannot just set a hosts file entry and use that name. You can in insecure non-SSL mode, but otherwise you cannot.

If you are using a virtual machine from DigitalOcean/AWS/Vultr or another provider you will need to add some swap space; I noticed a lot of issues where this was missing. You can do so by running the following commands, which will configure your server with 4G of swap space,

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo "/swapfile none swap sw 0 0" | sudo tee -a /etc/fstab

We will need to get the concourse binary and make it executable. For convenience, and to match the Concourse documentation, let’s also rename it to concourse. To do so run the following command.

wget https://github.com/concourse/concourse/releases/download/v2.6.0/concourse_linux_amd64 && mv concourse_linux_amd64 concourse && chmod +x concourse

We now need to generate the keys that concourse requires.

mkdir keys
cd keys

ssh-keygen -t rsa -f tsa_host_key -N '' && ssh-keygen -t rsa -f worker_key -N '' && ssh-keygen -t rsa -f session_signing_key -N '' && cp worker_key.pub authorized_worker_keys
cd ..

The above commands will create a directory called keys and set up all of the keys that Concourse 2.6.0 requires.

We can now create some helper scripts which we can use to run concourse easily.

pico concourse.sh

./concourse web \
  --basic-auth-username main \
  --basic-auth-password MySuperPassword \
  --session-signing-key ./keys/session_signing_key \
  --tsa-host-key ./keys/tsa_host_key \
  --tsa-authorized-keys ./keys/authorized_worker_keys \
  --external-url https://YOURDNSHERE/ \
  --postgres-data-source postgres://concourse:concourse@127.0.0.1/concourse

chmod +x concourse.sh

This script will start running Concourse. Keep in mind that the username and password used here are for the main team, and as such you should protect them, as they have the ability to create additional teams on your Concourse instance.

pico worker.sh

./concourse worker \
  --work-dir /opt/concourse/worker \
  --tsa-host 127.0.0.1 \
  --tsa-public-key ./keys/tsa_host_key.pub \
  --tsa-worker-private-key ./keys/worker_key

chmod +x worker.sh

This script will spin up a worker which will communicate with the main Concourse instance and do all the building. It can be useful to lower the priority of this command using nice and ionice if you are running on a single-core machine.
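For example, starting the worker at the lowest CPU and IO priority is just a matter of prefixing the script with those commands (adjust the values to taste),

nice -n 19 ionice -c2 -n7 ./worker.sh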

Now we need to install the required PostgreSQL packages,

sudo apt-get update && sudo apt-get install -y postgresql postgresql-contrib

Once this is done we can create the database to be used,

sudo -u postgres createdb concourse

Then log in to PostgreSQL and create a user to connect to the database,

sudo -u postgres psql
CREATE USER concourse WITH PASSWORD 'concourse';
GRANT ALL PRIVILEGES ON DATABASE "concourse" TO concourse;
\du

We also need to edit the pg_hba.conf file to allow us to make the connection,

sudo pico /etc/postgresql/9.5/main/pg_hba.conf

Scroll down and look for the following line,

host all all 127.0.0.1/32 md5

and change the md5 on the end to trust

host all all 127.0.0.1/32 trust

Then save the file and restart PostgreSQL,

sudo service postgresql restart

At this point everything we need to run Concourse should be there. You will need to set up the Concourse scripts we created earlier to run as a service, or just run them in a screen session if you are in a hurry.
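If you go the service route, a minimal systemd unit along the following lines is one option. The paths here are assumptions based on the binary and scripts living in /opt/concourse; create a matching unit for worker.sh as well.

# /etc/systemd/system/concourse-web.service
[Unit]
Description=Concourse CI web
After=network.target postgresql.service

[Service]
WorkingDirectory=/opt/concourse
ExecStart=/bin/bash /opt/concourse/concourse.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then reload systemd and enable it,

sudo systemctl daemon-reload
sudo systemctl enable concourse-web
sudo systemctl start concourse-web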

What we want to do now is expose it to the big bad internet.

sudo apt-get install -y nginx

Create a directory named after the domain you want to use, or any name you like if you are going to connect using IP addresses.

mkdir -p /etc/nginx/ssl/mydesireddomain.com

Now we want to switch to that directory and set up the self-signed TLS/SSL keys.

cd /etc/nginx/ssl/mydesireddomain.com
openssl genrsa -des3 -out server.key 1024

Enter whatever you want for the passphrase but remember it!

openssl req -new -key server.key -out server.csr

Enter the passphrase you set earlier. The most important thing here is that when asked for the Common Name or FQDN you enter the desired domain name.

With that done we need to strip the passphrase from the key so nginx can use it without prompting.

cp server.key server.key.org
openssl rsa -in server.key.org -out server.key

Remember to enter the same passphrase as before. Finally, sign the key with an expiry of 9999 days.

openssl x509 -req -days 9999 -in server.csr -signkey server.key -out server.crt

Make a copy of the file server.crt, which will be needed for the Concourse fly tool to talk to the server if you are using self-signed certs.
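For example, from whatever machine you will be running fly on, you could pull the certificate down with something like the following (the user and paths are assumptions),

scp root@mydesireddomain.com:/etc/nginx/ssl/mydesireddomain.com/server.crt ./ca.crt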

With that done let’s create the site configuration,

sudo nano /etc/nginx/sites-available/mydesireddomain.com

And enter in the following details,

upstream concourse_app_server {
    server localhost:8080;
}

server {
    listen 80 default_server;
    rewrite ^ https://MYIPADDRESSORDOMAIN$request_uri? permanent;
}

server {
    listen 443 default_server ssl http2;
    server_name mydesireddomain.com;

    ssl on;
    ssl_certificate /etc/nginx/ssl/mydesireddomain.com/server.crt;
    ssl_certificate_key /etc/nginx/ssl/mydesireddomain.com/server.key;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://concourse_app_server;
    }
}

The above nginx config defines an upstream Concourse server running on port 8080. It then defines a server listening on port 80 that redirects all traffic to the same server over HTTPS. The last server block defines our site, sets up the keys and forwards everything to the upstream Concourse server.

We can now enable it,

sudo ln -s /etc/nginx/sites-available/mydesireddomain.com /etc/nginx/sites-enabled/mydesireddomain.com
sudo rm -f /etc/nginx/sites-enabled/default

sudo service nginx restart

We need to remove the default file above because nginx will not allow you to have two default servers.
At this point everything should be working. You should now be able to connect to your Concourse server like so,

fly --target myciserver login --team-name myteamname --concourse-url https://MYIPADDRESSORDOMAIN --ca-cert ca.crt

Where the file ca.crt is the copy of server.crt on whatever machine you are running fly from. Everything at this point should work and you can browse to your Concourse server.

If you are using the IP address to communicate with your Concourse server you have the limitation that you will not be able to run fly execute to upload a task. You can get around this by using a real domain, or by running your own DNS to resolve it correctly.

The only bit of homework at this point would be to configure the firewall to block outside access to port 8080 so that everything must go through nginx.
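If you are using ufw, a rough sketch of this (assuming SSH on the default port) would be to allow only SSH and web traffic and then enable the firewall,

sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

Enjoy!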

Types of Testing in Software Engineering

There are many different types of testing which exist in software engineering. They should not be confused with the test levels: unit testing, integration testing, component interface testing, and system testing. However, the different test levels may be used by each type as a way of checking for software quality.

The following are all different types of tests in software engineering.

A/B
: A/B testing is testing the comparison of two outputs where a single unit has changed. It is commonly used when trying to increase conversion rates for online websites. A real genius in this space is Patrick McKenzie and a few very worthwhile articles to read about it are How Stripe and AB Made me A Small Fortune and AB Testing

Acceptance
: Acceptance tests usually refer to tests performed by the customer, also known as user acceptance testing or UAT. Smoke tests are considered a form of acceptance test.

Accessibility
: Accessibility tests are concerned with checking that the software is able to be used by those with vision, hearing or other impairments.

Alpha
: Alpha testing consists of operational testing by potential users or an independent test team before the software is feature complete. It usually consists of an internal acceptance test before the software is released into beta testing.

Beta
: Beta testing follows alpha testing and is a form of external user acceptance testing. Beta software is usually feature complete but with unknown bugs.

Concurrent
: Concurrent tests attempt to simulate the software in use under normal activity. The idea is to discover defects that occur in this situation that are unlikely to occur in other more granular tests.

Conformance
: Conformance testing verifies that software conforms to specified standards. An example would be checking a compiler or interpreter to see if it works as expected against the language standard.

Compatibility
: Checks that software is compatible with other software on a system. Examples would be checking the Windows version, the Java runtime version, or that other software to be interfaced with has the appropriate API hooks.

Destructive
: Destructive tests attempt to cause the software to fail. The idea is to check that the software continues to work even when given unexpected conditions. This is usually done through fuzz testing and by deliberately breaking subsystems, such as the disk, while the software is under test.

Development
: Development testing is testing done by both developers and testers during the development of the software. The idea is to prevent bugs during the development process and increase the quality of the software. Methodologies to do so include peer reviews, unit tests, code coverage and others.

Functional
: Functional tests generally consist of stories focussed around the user’s ability to perform actions or use cases, checking if functionality works. An example would be “can the user save the document with changes”.

Installation
: Ensures that software is installed correctly and works as expected on a new piece of hardware or system. Commonly seen after software has been installed as a post check.

Internationalisation
: Internationalisation tests check that localisation for other countries and cultures in the software is correct and inoffensive. Checks can include checking currency conversions, word range checks, font checks, timezone checks and the like.

Non functional
: Non-functional tests cover the parts of the software that are not covered by functional tests. These include things such as security or scalability, which generally determine the quality of the product.

Performance / Load / Stress
: Performance, load or stress testing is used to see how a system performs under certain high or low workload conditions. The idea is to see how the system behaves under these conditions, and it can be used to measure scalability and resource usage.

Regression
: Regression tests are an extension of sanity checks which aim to ensure that previous defects which had a test written do not re-occur in a given software product.

Realtime
: Realtime tests check systems which have specific timing constraints, for example trading systems or heart monitors.

Smoke / Sanity
: Smoke testing ensures that the software works for most of the functionality and can be considered a verification or acceptance test. Sanity testing determines if further testing is reasonable having checked a small set of functionality for flaws.

Security
: Security testing is concerned with checking that software protects against unauthorised access to confidential data.

Usability
: Usability tests are manual tests used to check that the user interface, if any, is understandable.

What is Chaos Testing / Engineering

A blog post by the excellent technical people at Netflix about Chaos Engineering and further posts about the subject by Microsoft in Azure Search prompted me to ask the question, What is chaos engineering and how can chaos testing be applied to help me?

What is Chaos Testing?

First coined by the aforementioned Netflix blog post, chaos engineering takes the approach that regardless of how encompassing your test suite is, once your code is running on enough machines and reaches enough complexity, errors are going to happen. Since failure is unavoidable, why not deliberately introduce it to ensure your systems and processes can deal with the failure?

To accomplish this, Netflix created the Netflix Simian Army, which consists of a series of tools known as “monkeys” (AKA Chaos Monkeys) that deliberately inject failure into their services and systems. Microsoft adopted a similar approach by creating their own monkeys which were able to inject faults into their test environments.

What are the advantages of Chaos Testing?

The advantage of chaos engineering is that you can quickly smoke out issues that other testing layers cannot easily capture. This can save you a lot of downtime in the future and help design and build fault tolerant systems. For example, Netflix runs in AWS and as a response to a regional failure changed their systems to become region agnostic. The easiest way to confirm this works is to regularly take down important services in separate regions, which is all done through a chaos monkey designed to replicate this failure.

While it is possible to sit down and anticipate some of the issues you can expect when a system fails, knowing what actually happens is another thing entirely.

The result of this is that you are forced to design and build highly fault tolerant systems that can withstand massive outages with minimal downtime. Accepting that your systems will not have 100% uptime and planning accordingly can be a tremendous competitive advantage.

One thing commonly overlooked with chaos engineering is its ability to find issues caused by cascading failures. You may be confident that your application still works when the database goes down, but would you be so sure if it went down along with your caching layer?

Should I be Chaos Testing?

This really depends on what your tolerance for failure is and the likelihood of those failures happening. If you are writing desktop software, chaos testing is unlikely to yield any value. Much the same applies if you are running a financial system where failures are acceptable so long as everything reconciles at the end of the day.

If however you are running large distributed systems using cloud computing (think 50 or more instances) with a variety of services and processes designed to scale up and out, injecting some chaos will potentially be very valuable.

How to start Chaos Testing?

Thankfully, with cloud computing and the APIs provided, it can be relatively easy to begin chaos testing. Because these tools allow you to control the infrastructure through code, you can replicate a host of errors that are not easily reproducible when running on bare hardware. This does not mean that bare hardware systems cannot perform chaos testing, just that some classes of errors will be harder to reproduce.
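As a rough sketch of how little is needed to get started, the following uses the AWS CLI to terminate one random running instance belonging to a disposable test environment. The tag name is an assumption; point it at something you can afford to lose.

# pick a random running instance tagged Environment=chaos-test and terminate it
VICTIM=$(aws ec2 describe-instances \
    --filters "Name=tag:Environment,Values=chaos-test" "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].InstanceId" --output text | tr '\t' '\n' | shuf -n 1)
aws ec2 terminate-instances --instance-ids "$VICTIM"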

Let’s start by looking at the way Microsoft and Netflix classify their “monkeys”.

Low chaos
: Failures that our system can recover from gracefully, with minimal or no interruption to service availability.

Medium chaos
: Failures that can also be recovered from gracefully, but may result in degraded service performance or availability.

High chaos
: Failures that are more catastrophic and will interrupt service availability.

Extreme chaos
: Failures that cause ungraceful degradation of the service, result in data loss, or simply fail silently without raising alerts.

Microsoft found that by setting up a testing environment and letting the monkeys loose they were able to identify a variety of issues with provisioning instances and services as well as scaling them to suit. They also split the environments into periods of chaos, where the monkeys ran, and dormant periods, where they did not. Errors found in dormant periods were considered bugs and flagged to be investigated and fixed. During chaos periods, any low chaos issues were also considered bugs and scheduled to be investigated and fixed. Medium and high chaos issues raised alerts for on-call staff to investigate. Extreme chaos operations, once identified, were not run again until a fix had been introduced.

The process for fixing issues identified this way was the following,

* Discover the issue, identify the impacts if any and determine the root cause.
* Mitigate the issue to prevent data loss or service impact in any customer facing environments
* Reproduce the error through automation
* Fix the error and verify through the previous step it will not reoccur

Once done, the monkey created through the automation step could be added to the regular suite of tests, ensuring that whatever issue was identified would not occur again.

Netflix uses a similar method for fixing issues, but by contrast runs their monkeys in their live environments rather than in a pure testing environment. They also released some information on some of the monkeys they use to introduce failures.

Latency Monkey
: Induces artificial delays into the client-server communication layer to simulate service degradation and determine how consumers respond in this situation. By introducing very large delays they are able to simulate a node or even an entire service being down. This can be useful because bringing an entire instance down can be problematic when an instance hosts multiple services, or when it is not possible to do so through APIs.

Conformity Monkey / Security Monkey
: Finds instances that don’t adhere to best practices and shuts them down. An example would be checking that instances in AWS are launched into permission-limited roles, and shutting them down if they are not. This forces the owner of the instance to investigate and fix the issue. Security Monkey is an extension that performs SSL certificate validation/expiry and other security best practice checks.

Doctor Monkey
: Checks the existing health checks that run on each instance to detect unhealthy instances. Unhealthy instances are removed from service.

Janitor Monkey
: Checks for unused resources and deletes or removes them.

10-18 Monkey (Localisation monkey)
: Ensures that services continue to work in different international environments by checking that languages other than the base language continue to work.

Chaos Gorilla
: Similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone.

Well, hopefully that explains what chaos testing / engineering is for those who were previously unsure. Feel free to contact me on Twitter or via the comments for further queries or information!

Running three hours of Ruby tests in under three minutes

Recently the very cool, hard-working developers at Stripe released a post about how they modified their build/test pipeline to reduce their test suite runtime from 3 hours to about 3 minutes.

The article is very much worth reading, as are the discussions that have come out of it, including those on Hacker News.

A few key takeaways,

* For dynamic languages such as Ruby or Python consider forking to run tests in parallel
* Forks are usually faster than threads in these cases and provide good test isolation
* For integration tests use Docker, which allows you to revert the file system easily

The above ensures that the tests are generally more reliable and you avoid having to write your own teardown code which restores state, both in memory for the forks and on disk using Docker.
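As a very rough sketch of the idea (not Stripe’s actual setup), simply running each test file in its own process already buys you parallelism and a degree of isolation. The test file layout here is an assumption,

# run each test file in its own forked process, then wait for them all to finish
for f in test/*_test.rb; do
    bundle exec ruby "$f" &
done
wait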

A Culture of Quality

The best working environment I have had the pleasure to work in had a strong emphasis on testing and software quality in general. Product teams were encouraged to spend extra time ensuring that everything worked over shipping before it was ready. The transformation it went through was incredible, having come from a culture that was very much wild west. An example of the advantages this brought: before adopting this culture, website launches were a traumatic event. Teams would block out 48-hour stretches and work solidly fixing bugs at go-live. Very stressful for all involved and not a healthy working environment. Contrast that to where things ended up, with several websites launched on the same afternoon without a hitch. Very impressive considering the scale of the websites being dealt with (several million uniques a day).

I attribute most of these improvements to cultural changes that started in upper management and filtered down. In short, the organisation adopted a culture of writing quality software, implemented in part by insisting on a solid development process backed by testing. This culture was so successful that I remember the following happening one afternoon. Two individuals were discussing a new piece of functionality to be worked on. After several minutes of discussion they convinced themselves that for a few situations perhaps they didn’t need tests. This conversation was overheard by a less senior developer in another team who happened to sit behind them. Without pause he turned around and calmly insisted that not only would they be writing the tests they had tried to convince themselves were not required, but that he would show them how to implement them.

This sort of culture, where people take ownership of quality not just for their own work but for others’ as well, can produce incredible results. As far as I am aware it is still being practised there as I experienced it, and I can attest to how effective it has been.

A/B Testing

A/B testing is testing the comparison of two outputs where a single unit has changed. It is commonly used when trying to increase conversion rates for online websites. Also known as split testing, an example would be trying to increase user clicks on a specific button on a website. You may have a theory that red buttons work better than green. You would try out both against real users and see which one performs better.

You should use A/B testing when you have some existing traffic to a website (or mailing list), know what outcomes you are trying to achieve and most importantly are able to track what actions resulted in the outcome. Without these three things split testing will not achieve the outcomes you wish.

Numerous things can be A/B split tested, limited only by what you have to pivot on and your imagination. A few examples include,

* Pricing tiers and structures
* Free trial time lengths
* Titles E.G. length and word placement
* Headlines and Sub-Headlines E.G. size, length, word placement
* Forms E.G. adding and removing requested fields
* Paragraph Text E.G. adding more content, changing spacing
* Testimonials E.G. adding more or less, increasing the length
* Call to Action (text, links, buttons or images) E.G. “add to cart” vs “buy now”
* Movement of content around the page
* Press mentions and awards

One thing to keep in mind when A/B testing is that running your tests for a long time can result in SEO penalties from Google and other search engines. Quoting from the Google Webmaster Central Blog on website testing,

“If we discover a site running an experiment for an unnecessarily long time, we may interpret this as an attempt to deceive search engines and take action accordingly. This is especially true if you’re serving one content variant to a large percentage of your users.”

It is highly recommended to read and understand the post mentioned in order to ensure you are following best practices. The consequences can be dire indeed, with being blacklisted by Google and other search engines the worst possible result.

A/B testing can be implemented in a variety of ways. Perhaps the best known is using Google Analytics. However there are other free and paid solutions. Visual Website Optimizer is one example of a paid service, and if you are using Ruby on Rails there are many libraries to help you out.

A few things to keep in mind when doing A/B testing.

* Test all the variations of a single entity at the same time. If you perform a test on one variant for a week and another the following week your data is likely to be skewed. It’s possible that you had some excellent high value links added the second week but it had a lower conversion rate.
* Keep the test running long enough to have confidence in your results but not so long as to be penalised by Google. You need to have statistical significance. Any A/B tool worth money will be able to report on this metric for you and let you know when to finish the test. It is very useful to know how long you are going to run the test before starting.
* Trust the data over feeling. If the data is telling you that an ugly button works better than your beautiful one, either trust the data or run the test again at a later date to confirm. It can be hard to do what feels counter intuitive, but you need to remember that humans generally are not rational and will not behave how you expect.
* If a user complains about seeing a different price, offer them the better deal. Always respect your users and customers. It builds good will. Another thing to do is avoid split testing paying customers. Adobe Omniture runs a lot of A/B tests in their online product and it drives some customers crazy as everything they need moves around on a regular basis. Just don’t do it.
* Don’t A/B test multiple things at the same time. If you are going to A/B test a new design, then test the new design against the old one. Don’t chop and change various parts of the website. It will be confusing.
* Keep trying. It’s possible a single test will produce no meaningful results. If so, try again. Not only will you get better with experience, you are more likely to find the correct things to optimise.

A real genius in this space is Patrick McKenzie and a few very worthwhile articles to read about it are A-B Testing Made Me a Small Fortune and A-B Testing. Other articles worth reading include Practical Guide to Controlled Experiments on the Web by Microsoft Research (PDF), Writing Decisions: Headline Tests on the Highrise Sign-Up Page, “You Should Follow Me on Twitter Here”, How We Increased our Conversion Rate by 72%, and Human Photos Double your Conversion Rate.

Five ways to avoid and control flaky tests

Having a reliable test suite should always be the goal in software development. After all if you can’t trust the tests then why bother running them at all? This is especially important in a shared coding environment and when running through Continuous Integration (CI).


1. Test in Isolation

It may seem obvious, but writing focused tests which do a single thing is one of the most effective ways to avoid them being flaky. Tests which do multiple things increase the chance of failure and can make the tests non-deterministic. Always remember to test features and issues in isolation.

2. Write Helpful Errors

When a test does fail, having an error such as “Error 23: Something went wrong ¯\_(ツ)_/¯” is incredibly frustrating. Firstly, you need to run the test again with either a debugger or some code modifications to spot the bug, which slows down development; it is also unprofessional. Write meaningful error messages. For example, “Error: The value “a” was unexpected for this input” is a far better error. Another thing to remember is to avoid swallowing the exception in languages which support exception handling.

3. Control the Environment

Regularly run tests should run in a controlled environment which will be the same for the current test and future tests. This usually means a clean deploy, restoring databases and generally ensuring that however the application was originally set up is done again. This ensures the tests always start with the same conditions. It also ensures you have a good CI process and are able to recreate environments from scratch when required, which is good development practice.

4. Fix it, delete it or mark it

A test that fails is proving its value, unless it’s flaky. Tests that fail randomly slow down your development process, and in time they will be ignored and neglected. The moment a test is identified as flaky it should be fixed. If that will take time, mark it as flaky, remove it from the CI pipeline and investigate it as part of paying down technical debt. If after some time it still isn’t resolved, it should be investigated to see if it is providing any actual value. Odds are if it hasn’t been fixed for a month it may be a test you can live without.

5. Be forgiving but verify

For any integration test you need your tests to be forgiving in how the application responds. After all, submitting an image into a text field may result in an error, which is probably acceptable. Another thing to keep in mind is that there will be timeouts you need to deal with. Be sure to allow a reasonable length of time to wait for a response, and only fail once this has expired. Be wary of any test that waits forever for something to happen.
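As a small shell sketch of that last point, polling with an explicit deadline rather than waiting forever might look something like this (the URL and timings are placeholders),

# poll the endpoint for up to 60 seconds, then fail with a useful message
deadline=$((SECONDS + 60))
until curl -sf http://localhost:8080/ > /dev/null; do
    if [ "$SECONDS" -ge "$deadline" ]; then
        echo "Error: service did not respond within 60 seconds" >&2
        exit 1
    fi
    sleep 2
done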

Why Does Software Contain Bugs?

“Why does all software contain bugs?” This was a question recently asked of me. My response at the time was that all software is not perfect, but is this true?

Let’s take a very simple example.


    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello World!");
        }
    }

The above example is arguably the simplest program that can be written using Java. It also happens to be the first program usually written by any Java programmer. It simply prints out the text “Hello World!” when it is run. Surely this program is so simple that it is perfect and therefore bug free?

Ignoring the obvious fact that this program does nothing useful, let’s assume for the moment that we have been tasked with writing a “Hello World!” program in Java. Surely the above is 100% bug free.

Yes. The application is 100% bug free. But that’s not the whole story. What happens when this application is run?

The first thing to happen is that it needs to be compiled. This takes the application from its text form and converts it into something that the computer can understand. In the case of Java it turns it into something the Java Virtual Machine can understand. This allows you to take the same compiled program and, in theory, run it on your computer, phone, PlayStation, Blu-ray player, iPad or any other device that runs Java.

The Java Virtual Machine or JVM is itself a compiled program running on a device. The catch is that it is compiled using a different compiler for every platform (computer, phone etc…). When it runs it takes your compiled application and converts the instructions into something that the computer understands.

However underneath the JVM is usually the Operating System. This hands out resources to the programs that are running. So the JVM runs inside the operating system and the operating system talks to the physical hardware.

Actually, while the operating system does talk to the hardware directly, these days there is usually software inside the hardware itself which controls the actual hardware. Not only does the hardware contain software, the hardware itself, such as a CPU, is literally software turned into hardware. This means CPUs and the like can also contain bugs.

This means in order for your perfect application to run perfectly the following pieces of software also need to run perfectly,


    Your Program -> Java Compiler -> JVM -> JVM Compiler -> Operating System -> Operating System Compiler -> Hardware Software -> Hardware Software Compiler -> Hardware Itself

As you can see it really is an iceberg, with your perfect program at the top and a lot of things going on underneath. Any bug at any level can result in “perfect” software not working as expected, making it flawed.

This is why perfect software does not currently exist. The only way to achieve it would be to write perfect software at every level, which is a monumental undertaking. There are estimates that put the cost of rewriting the Linux kernel alone at billions of dollars, and that is without making it “perfect”, and, as shown, it is just one small piece of the puzzle.

So should we just give in? Well no. At every level there are thousands of testers and processes designed to make the software as bug free as possible. Just because we cannot reach perfection does not mean it is not at least worth trying.

The benefit of testing for Developers, Managers and the Business

“Fixing regression bugs is analogous to digging a hole only to find the next day it has been filled in and having to dig it out again”

Ask any manager, developer or tester working on software without tests what the main pain points are. Nearly every time the main one mentioned is dealing with regressions: bugs that were fixed a year ago which have returned. Regression bugs cost the software industry billions of dollars a year. Worse still, they are demoralising to everyone involved. Finding or fixing the same bug over and over causes you to start looking for new projects or new jobs.

A good regression test suite generally solves these problems. It may not prevent every bug from reoccurring but it will catch most of them. It also gives peace of mind that you have not reintroduced these bugs again once fixed. Lastly it saves time by checking if the bug is fixed for you.

“Software is never finished, only abandoned”

Another advantage to come out of having a good test suite is the improvement to the software itself. Not only is testable software generally better written than non-testable software, a collection of tests provides a nice safety net when you start to improve your product. Software as a general rule is constantly in flux with changes and improvements, or it is left as is. If your software is never improved or built upon you may want to consider whether you really need tests. That said, there are other reasons to test software beyond what is mentioned above.

If none of the above points are selling testing to you consider this. As mentioned testable software is usually well designed modular software. This allows greater code reuse saving time and money. It also allows new developers to quickly understand what they are dealing with allowing them to become productive faster. If nothing else writing testable software will save you time and money in the long run by making things easier to understand.

AWS EC2 Instance Types to Use as Test Agents

When you are running test agents on AWS, knowing which instance type to use (for TeamCity or otherwise) can involve a lot of trial and error. Not only are there great savings to be made by picking the correct instance type, you can also speed up your builds and get test feedback faster, which can be far more valuable than the cost of a few additional cents an hour.

The following are some results that I have found when playing with different instance types. Before Amazon released the burstable t2 instance types, one of the most common instances I had seen used was a general purpose instance such as the m3.medium. This always seemed like a good choice as most tests tend to use a mixture of CPU/disk/network, and that’s what those instances are supposed to be good at.

The moment that the burstable instances were released, several agents were relaunched as t2.mediums and left in the cluster for a week.

The outcome was that not only were they saving money, since they cost less per month, they were also able to run tests and builds faster than the previous agents. This was a surprise at first, until we observed that with very few exceptions every test was CPU bound. This included browser tests, which we had expected to be more network bound. The performance increase as such was mostly down to the instances accumulating CPU credits over time faster than they could be spent. See the below image, taken from a live instance, where you can clearly see how this works.

[Image: t2 instance CPU credit balance and usage over time]

For the record this agent runs 24/7 running builds and tests over dozens of different projects including a lot of selenium tests for multiple browsers.

There were however a few tests which consumed considerably more CPU than expected. These tests consisted of a collection of very heavy math operations and integrations all running on the same machine. A single agent was boosted to a c4.large to take care of these tests and everything has been working fine since. Build times were down and the developers had feedback sooner.

We also tried relaunching instances as the next generation of the same type, such as an m3.large into an m4.large, and the result was far faster builds. This is probably due to the underlying hardware AWS is using being faster. It was however still worth using t2 agents due to the cost saving and roughly equivalent performance.

Conclusions

It really depends on your environment and how much you are using the agents. I think the following guidelines apply fairly well though.

* For any test agents running Windows you want a minimum of a t2.medium on AWS, or the equivalent with 2 CPUs and 4 GB of RAM.
* Test agents running Linux want to be a minimum of a t2.small on AWS, or the equivalent with a single CPU and 2 GB of RAM.
* For agents that run tests infrequently, as in fewer than 6 times an hour, stick with the lower end t2 instances.
* For agents that run heavy loads consider using a c4.large, as the increased CPU will really cut down on the test time.
* Always go for the latest generation in AWS, i.e. use a c4.large over a c3.large, for increased performance.

The main takeaway however is to ensure you can relaunch your instances as different types easily. Try out different types and see what happens. The winning strategy I found was to launch as a t2.medium at first, then dial it down to a t2.small if it was overpowered (which was never the case for Windows) or relaunch as a c4.large if it was underpowered.
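For EBS-backed instances the switch itself is only a few commands, along these lines (the instance id and type are placeholders),

aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type "{\"Value\": \"c4.large\"}"
aws ec2 start-instances --instance-ids i-0123456789abcdef0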

The result was much faster builds saving developer time and frustration.