Clean Repository Data Access in C#

Mostly as a reference for myself, here is an extremely clean data access pattern possible using C# and Entity Framework. It saves you the effort of mocking the database context, as the code you end up writing is so simple that it is all compile-time checked.

Essentially you define a very simple class which provides a single method for getting data (although you may want a save method too), and you add an interface to make unit testing and mocking easier.

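public interface IUrlRepository
{
	IQueryable<Url> GetUrl();
	void Save(Url url);
}

public class UrlRepository : IUrlRepository
{
	// DbContext here is assumed to be your own generated Entity Framework context, which exposes Urls.
	private readonly DbContext _context;

	public UrlRepository()
	{
		_context = new DbContext();
	}

	// Returns an unevaluated query; nothing is fetched until the result is enumerated.
	public IQueryable<Url> GetUrl()
	{
		return from u in _context.Urls
			   select u;
	}

	public void Save(Url url)
	{
		// AddObject is the ObjectSet method; with a DbSet this would be Add.
		_context.Urls.AddObject(url);
		_context.SaveChanges();
	}
}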

As you can see, rather than returning a list you return an IQueryable<Url>. Because Entity Framework evaluates queries lazily, you can then add extension methods over the return value like so.

public static class UrlRepositoryExtention
{
	// Filters to the URLs created by the given user.
	public static IQueryable<Url> ByCreatedBy(this IQueryable<Url> url, string user)
	{
		return url.Where(p => p.Created_By.Equals(user));
	}

	// Sorts newest first.
	public static IQueryable<Url> OrderByCreateDate(this IQueryable<Url> url)
	{
		return url.OrderByDescending(x => x.Create_Date);
	}
}

With this you end up with a very nice method of running queries over your data.

var url = _urlRepo.GetUrl().OrderByCreateDate();

Since it can all be chained, you can easily add more filters as well.

var url = _urlRepo.GetUrl().OrderByCreateDate().ByCreatedBy("Ben Boyter");

What about joins, I hear you ask? Thankfully this pattern takes care of those too. Just have two repositories, get the full data set from each as a query and do the following.

var users = _userRepo.GetUser();
var locations = _locationRepo.GetLocation();

var result = from user in users
             join location in locations on user.locationid equals location.id
             where location.name == "Parramatta"
             select user;

The best thing is that it’s all lazily evaluated, so you don’t end up pulling the full data set back into memory. Of course at a large enough scale you will probably hit some sort of leaky abstraction issue and end up rewriting to use pure SQL at some point, but for getting started this method of data access is incredibly powerful with few chances of errors.
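To make the lazy evaluation concrete, here is a minimal sketch. It is an illustration under assumptions rather than the real setup: the Url class is a hypothetical stand-in for the generated entity, LazyDemo is a made-up name, and an in-memory list wrapped with AsQueryable() takes the place of the database. The point it shows is that composing the query runs nothing; the work only happens when the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal entity, standing in for the EF-generated Url class.
public class Url
{
    public string Created_By { get; set; } = "";
    public DateTime Create_Date { get; set; }
}

public static class LazyDemo
{
    // Composes a query, changes the underlying data, and only then runs the query.
    public static int CountAfterLateInsert()
    {
        var data = new List<Url>
        {
            new Url { Created_By = "Ben Boyter", Create_Date = new DateTime(2013, 1, 1) }
        };

        // Nothing is evaluated here; this line only describes the query.
        IQueryable<Url> query = data.AsQueryable()
                                    .Where(u => u.Created_By == "Ben Boyter");

        // Added after the query was composed, yet still picked up below.
        data.Add(new Url { Created_By = "Ben Boyter", Create_Date = new DateTime(2013, 6, 1) });

        // ToList() is the point where the query actually executes.
        return query.ToList().Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAfterLateInsert()); // prints 2
    }
}
```

Against a real Entity Framework context the same deferral applies: the filters and joins are combined into one query, which is only sent to the database when you enumerate it.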

Finally, you get the advantage that you can write pure unit tests over your joins. Because you can easily mock the response from your repository, you don’t have to create a seed database and provide a connection. This is fantastic for TDD, especially when running offline or on your local machine.
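As a rough sketch of what such a test setup could look like, the example below swaps the real repository for an in-memory fake. FakeUrlRepository, FakeRepositoryDemo and the sample data are hypothetical names made up for illustration, and the Url class is a minimal stand-in for the generated entity. A mocking library would work just as well; a hand-written fake simply keeps the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal entity, standing in for the EF-generated Url class.
public class Url
{
    public string Created_By { get; set; } = "";
    public DateTime Create_Date { get; set; }
}

public interface IUrlRepository
{
    IQueryable<Url> GetUrl();
    void Save(Url url);
}

// Hypothetical in-memory fake: no database and no connection string needed.
public class FakeUrlRepository : IUrlRepository
{
    private readonly List<Url> _urls = new List<Url>();

    public IQueryable<Url> GetUrl() { return _urls.AsQueryable(); }

    public void Save(Url url) { _urls.Add(url); }
}

public static class UrlRepositoryExtention
{
    public static IQueryable<Url> ByCreatedBy(this IQueryable<Url> url, string user)
    {
        return url.Where(p => p.Created_By.Equals(user));
    }

    public static IQueryable<Url> OrderByCreateDate(this IQueryable<Url> url)
    {
        return url.OrderByDescending(x => x.Create_Date);
    }
}

public static class FakeRepositoryDemo
{
    // The code under test only sees IUrlRepository, so the fake slots straight in.
    public static List<Url> NewestFirstFor(IUrlRepository repo, string user)
    {
        return repo.GetUrl().ByCreatedBy(user).OrderByCreateDate().ToList();
    }

    public static void Main()
    {
        var repo = new FakeUrlRepository();
        repo.Save(new Url { Created_By = "Ben Boyter", Create_Date = new DateTime(2013, 1, 1) });
        repo.Save(new Url { Created_By = "Someone Else", Create_Date = new DateTime(2013, 3, 1) });
        repo.Save(new Url { Created_By = "Ben Boyter", Create_Date = new DateTime(2013, 6, 1) });

        var result = NewestFirstFor(repo, "Ben Boyter");
        Console.WriteLine(result.Count);                // prints 2
        Console.WriteLine(result[0].Create_Date.Month); // prints 6
    }
}
```

One caveat worth keeping in mind: the fake runs on LINQ to Objects, which accepts more expressions than Entity Framework can translate to SQL, so a passing test here checks your query logic but not that the query will translate.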

Gigablast Acquired and Code Posted

Interestingly, it seems that Matt Wells’ search engine Gigablast has been acquired by Yippy.com [1] [2] [3] (demo here http://demo.yippy.com/). Gigablast has always been one of my favorite search engines simply because it is so interesting. Started by a single guy, with an interesting blog, and being one of the last true new indexes of the web, it was always worth a look. While it’s sad to see it go this way, I am happy that Matt presumably has been able to cash out on his creation. Well done to him. I must admit Gigablast had been going downhill for a while, and this might explain why ProCog appeared and then vanished so quickly.

The site is still running at this point (no idea how long that will last), but in a totally unexpected move it is also now open source. I was initially not sure whether it was free software, as no license was posted, but see the comments: Philippe Ombredanne has identified that it is under the Apache Licence https://github.com/gigablast/open-source-search-engine/blob/master/LICENSE You can download the source on GitHub https://github.com/gigablast/open-source-search-engine. I have not had a chance to go through it yet, and the folder structure leaves much to be desired, but the code is there for review.

List of useful CAPTCHA Decoding Articles

This website ranks quite high in most search engines for the search term “captcha decoding” or some permutation of it. As such, here is a collection of useful links if you are looking into doing such a thing. If any more come up I will be sure to update this post.

http://www.boyter.org/decoding-captchas/

Shameless self-promotion, but this link is why this page ranks so highly. It’s an article I wrote some time ago about how to go about decoding a simple CAPTCHA. There is full source code and the principles can be applied to 90% of CAPTCHAs out there. For the record, it only came about because a colleague bet me that I couldn’t decode his website’s CAPTCHA, which was the one used in the article. Of course I waited till he changed it before publishing.

http://bokobok.fr/bypassing-a-captcha-with-python/

Interesting post on how to bypass a CAPTCHA using Python. The CAPTCHA broken in this article is far more complex than most of the others in this list. Full source code is provided, so it’s an excellent source to look at even though the article is missing a lot of details.

http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html?m=1

Another Python post about breaking CAPTCHAs. I think the number of Python posts might be due to how powerful PIL is. Has full source code. This one is worth looking at because, unlike the two previous ones, it uses an existing OCR engine, Tesseract, to perform the recognition.

http://www.mperfect.net/aiCaptcha/

This is one of the older CAPTCHA articles around and does not supply source code. It does however go into a good amount of detail about how the author looked for weaknesses in the CAPTCHA and then went about writing an algorithm to defeat it. It really is a pity the code for this one was never released.

http://www.troyhunt.com/2012/01/breaking-captcha-with-automated-humans.html

A slightly different approach. Rather than try to code around the problem, here is how to get humans to do it for you.

http://caca.zoy.org/wiki/PWNtcha

A PHP project that has been around since 2004 for defeating CAPTCHAs. Code is available so it’s worth taking a look at.

http://tech.slashdot.org/story/11/01/11/1411254/google-recaptcha-cracked
http://www.youtube.com/watch?v=dLgvrsAoPeE

Goes into detail on how to defeat the reCAPTCHA project’s CAPTCHA. It seems the original content that went with the above posting on Slashdot has disappeared, but I am sure it exists somewhere else on the web. I may have a copy lying around, which I will upload if I find it.

http://bhiv.com/defeating-diggs-captcha/

This article about defeating Digg 2.0’s CAPTCHA is hopelessly out of date; however, it shows how easily a simple CAPTCHA can be defeated if the person creating it has little knowledge of what they are doing. I believe it ties in well with this post http://www.boyter.org/2010/08/why-you-shouldnt-roll-your-own-captcha/

http://www.cs.sfu.ca/~mori/research/gimpy/

This is the granddaddy of all the above posts, papers and articles. The full paper is linked in there and has far more detail. It is one of the main sources I used when I started learning about decoding CAPTCHAs.

https://medium.com/p/e8f2a748f95f

How reCAPTCHA works, plus how to cheat it and how it contributes to the common good.

http://stevenhickson.blogspot.com.au/2014/01/hacking-snapchats-people-verification.html

How to defeat Snapchat’s CAPTCHA. Fairly light on details but provides the source code (C++) to defeat it.

https://github.com/mieko/sr-captcha/blob/gh-pages/index.md

Breaking the SilkRoad’s CAPTCHA. The follow-up about breaking the new SilkRoad’s CAPTCHA is worth reading as well. https://github.com/mieko/sr-captcha/blob/gh-pages/silk-road-2.md