Don’t use IEnumerable for table references in C#
We have a C# project that recently started to run into a few scaling issues. That search can slow down a little when you get a lot more records isn’t so surprising, but when your basic individual item retrieval starts getting sluggish, it’s time to worry. We were using Entity Framework as the ORM layer, but I suspect this would apply for anything in C#.
LINQ, and the overuse of abstraction is a lot of the problem here. I’m a big fan of LINQ, but as with all abstractions, it fails, at least a little, and every so often it fails a lot. Instead of doing a
and expecting that the results to be identical. Well, to some extent they are, but the intermediate steps can be very, very different, and it depends on your original data types.
LINQ is data-structure agnostic as far as it can, and being able to call
on an array is pretty funky, but there’s a whole world of LINQ providers and deep magic behind the curtain, and they’re not all very efficient. One of the pairs of interfaces that people don’t pay enough attention to is IEnumerable and IQueryable (generally the generic versions thereof, but the distinction is irrelevant for the moment). If you’ve got only a handful of records, or even quite a lot of records and a reasonably powerful machine, the difference between the two is unimportant. We’d been using the IEnumerable interface (backed by an ObjectQuery on the application side, and a simple List in the tests), which turned out to be the wrong option. IEnumerable as it turns out does almost everything in memory i.e.
results in the whole of FooTable getting loaded into memory, then the Single() function gets run across those items in memory. IQueryable on the other hand, treats any reference to a data table as merely that, and only executes a query once you call certain functions that actually want objects back (Single, First, ToList, etc), which generally results in a lot less data being pulled in.
Those of you running to your codebase and replacing every IEnumerable in there with IQueryable should slow down a second. This efficiency comes at a price, namely the need to be able to translate LINQ code into SQL. If you’re just using relatively simple built-in C# methods, everything might just work. If however you stray from the supported function list, it’s a bit more of a problem. If you’re way off the path and making references to your custom getters and setters, life gets a bit more interesting. There’s still a way, using the magic of
can be best described as a “uncompiled”/AST
(and indeed, there’s a Compile() method). Luckily they can for the most part be built just like a
) and if they’re built inside an object and then returned they can even play around with private variables. Again, they need to only use the allowed functions, but now you can return an
that works on any instance of that class. What happens is then the LINQ provider can “compile” those
into SQL code, whereas it can’t do the same for a
LinqKit lets you then do things like actually call those
’s inside of the LINQ query using
where MyFunc is an
. LinqKit also incorporates PredicateBuilder so you can combine multiple
’s together easily, and you generally need its
method on your data tables before they get called, but it’s not too bad once you get the hang of it.
Of course, sometimes you want to do something a bit fancier and the allowed function list isn’t quite enough. This is still doable, but you’re going to have to tell Entity Framework how to translate your function into SQL. There’s a few suggestions out there already about how to do this (and for our example case, SqlMethods.Like gets close, but appears to be a bit picky about where it’ll work), but here’s an improved example.
- Find your .edmx file, and locate the
<edmx:Runtime><edmx:ConceptualModels><Schema Namespace="Your.Namespace" ...>
. Note the “Your.Namespace” bit, and then insert the following just after the<Schema>
- Create a new StringExtensions.cs with this in replacing “Your.Namespace” as appropriate
), and it’ll do the right thing whether it’s dealing with a database or something in-memory. If it’s the former, the SQL fragment from the .edmx will be used; and the latter will use the C# code.