Updates Coming Soon

A few updates should be rolling out to searchco.de soon, the first being a big jump in the amount of open source/free software code being indexed. At last check there were about 1.1 billion lines of code indexed. This should more than double in the next release to well over 2 billion. When I have an exact count I will publish it here.

Secondly, some massive speed increases are on the way. This will mostly be due to beefing up the server that it's running on (the subject of another, yet-to-be-written blog post). Finally, I am working on adding a tree view of projects into the mix so you can navigate between files in the same project.

I do have a backlog of requests from people which are yet to be worked on. The most important, as far as I can see, is extending the regex search to match across multiple lines, as currently it's limited to individual lines. I have this working in a branch but need to fix some performance issues before rolling it out to the public at large.
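To make the single-line limitation concrete, here is a small sketch using Python's standard re module. It only illustrates the general difference between matching line by line and matching across a whole file; it says nothing about how the searchco.de back-end works, and the sample source text and pattern are made up for the example.

```python
import re

# Made-up sample: a function call split over three lines.
source = "int total = add(\n    first,\n    second);\n"
pattern = r"add\(.*second"

# Line-at-a-time search: the pattern is tried against each line on its own,
# so a match that starts on one line and ends on another is never found.
per_line_hits = [line for line in source.splitlines() if re.search(pattern, line)]

# Whole-file search: re.DOTALL lets "." match newlines as well, so the same
# pattern can span several lines.
whole_file_hit = re.search(pattern, source, re.DOTALL)

print(per_line_hits)
print(whole_file_hit is not None)
```

The per-line search comes back empty while the whole-file search finds the call, which is the kind of query multi-line support is meant to handle.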

Mutation Tester for All Languages

As a fan of unit tests for bug checking and development (where applicable), I always have a doubt that even though my tests pass they might not be written correctly. This is especially true where you write the tests after development rather than following TDD.

Regardless, you can still stress your code by doing some mutation testing. It’s essentially a way of testing your tests: you deliberately introduce small changes into the code and check that the tests notice by failing. Quite a few mutation testing frameworks are out there, such as Heckle, Insure++ and Nester, but I was looking for one for the Python and PHP code that goes into making searchco.de.

Unable to find one that met my requirements (or worked at all), I wrote the GitHub version below in 10 minutes or so. Rather than fiddling with opcodes (like the previously mentioned frameworks), it applies the changes to the source files themselves. Because of this you should make sure you have a backup somewhere before running it. I have tested it against a few languages (PHP, C#, JavaScript) and the results show my tests failing as expected, so I am pretty happy with the outcome.

Usage is pretty simple:

python mutator.py DIRECTORY EXTENSION

Here directory is the directory you want to target recursively and extension is the file extension of the files you want to mutate. You can get the code here at GitHub.
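For anyone curious what source-level mutation looks like, below is a minimal sketch of the idea. It is not the code from the GitHub repository: the swap table and the function names mutate_source and mutate_directory are my own assumptions for illustration, and the real script may pick very different mutations. Like the real thing it overwrites files in place, so only point it at a copy.

```python
import os
import random
import re

# Hypothetical swap table, chosen for illustration only.
MUTATIONS = {"==": "!=", "!=": "==", "&&": "||", "||": "&&"}
OPERATOR = re.compile("|".join(re.escape(op) for op in MUTATIONS))


def mutate_source(text, rng=random):
    """Return text with one randomly chosen operator swapped, or None if none is found."""
    matches = list(OPERATOR.finditer(text))
    if not matches:
        return None
    chosen = rng.choice(matches)
    return text[:chosen.start()] + MUTATIONS[chosen.group()] + text[chosen.end():]


def mutate_directory(directory, extension):
    """Overwrite each matching file under directory with a mutated copy; return changed paths."""
    changed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith("." + extension):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as handle:
                original = handle.read()
            mutated = mutate_source(original)
            if mutated is not None:
                with open(path, "w", encoding="utf-8") as handle:
                    handle.write(mutated)
                changed.append(path)
    return changed
```

After something like mutate_directory("copy_of_project", "php") you run the test suite as usual. If every test still passes, the tests never exercised the mutated condition, which is exactly the gap mutation testing is meant to expose.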

Errors in Search

EDIT – This has now been resolved. All the below searches should work correctly, with the exception of XCompositeGetOverlayWindow. I am adding that to the index to be refreshed sometime in the next month or so.

Well, thanks to some sample searches being thrown against the codesearch index, I can finally start fixing the issues that have cropped up. The main issue I have currently is searches returning no results where you would expect some. Some examples are included below.

/com\.google\.gwt.*A/ ext:java
/sql\.append.*se/ ext:java
/memcpy.*sizeof/
/com\.google\.gwt.*AsyncCallback/ ext:java
/XCompositeGetOverlayWindow/

Most of these problems were found during my outage window, which is a good time to discover them. With the exception of the last query above, which returns nothing because the index does not have X.org indexed (yet), all of the issues are due to an undiscovered bug.

Take the following example,

/memcpy.*sizeof/

The problem is the way in which I interpreted the regex. Essentially, on the back-end a lot of the regex is expanded out fully for a certain number of cases and a list of matches is generated. This is to ensure that it can run quickly. Think of it as precaching every possible regex against all lines in all the files. The problem in this case is that I have a unit test case missing. I never included a test which matched the above, and because of this there is a bug in the way that it is expanded out to match everything. A modified version of the above which does return results is,

/memcpy.* sizeof/

The above query returns results as would be expected. In fact, all of the failing queries above can be rewritten to work correctly, as shown below:

/com\.google\.gwt.* A/ ext:java
/sql\.append.* se/ ext:java
/memcpy.* sizeof/
/com\.google\.gwt.* AsyncCallback/ ext:java

If you try the above queries you will see they act as expected. I will be updating the code shortly to take this case into consideration, and will of course post an update here.
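One thing worth keeping in mind about the rewritten queries is that they are stricter than the originals. The sketch below uses Python's re module as a stand-in for ordinary regex semantics (it is not the searchco.de back-end), with three made-up sample lines, to show which lines each form should match. Checks along these lines are the sort of test case that was missing.

```python
import re

# Made-up sample lines, purely for illustration.
lines = [
    "memcpy(dst, src, sizeof(buf));",
    "memcpy(dst, src,sizeof(buf));",
    "memset(dst, 0, sizeof(buf));",
]


def matching(pattern):
    """Return the sample lines that contain a match for pattern."""
    return [line for line in lines if re.search(pattern, line)]


original = matching(r"memcpy.*sizeof")     # the query that hit the bug
workaround = matching(r"memcpy.* sizeof")  # the rewritten query

# Under normal regex rules the original matches both memcpy lines, while the
# workaround only matches the one with a space directly before "sizeof".
assert original == lines[:2]
assert workaround == lines[:1]
assert set(workaround) <= set(original)
```

So the workaround is a stopgap: it will miss lines where no space precedes the second term, and the original form is the one that needs to return results once the fix is in.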