3/17/2010 Update: I have submitted this problem as a bug on Microsoft Connect. If you've been bitten by this bug, vote it up (click the “I can too” link indicating that you can reproduce the bug), and leave some helpful comments.
I'll keep the introduction short and sweet. Everyone says you'll eventually get burned by Microsoft data access strategies, because they eventually get abandoned. Well, my turn finally came. And it’s not that Entity Framework is being abandoned, it’s just that it DOESN'T WORK.
Regarding inheritance mapping, EF advertises that it offers you 3 solutions:
- TPT: Table per type
- TPH: Table per hierarchy
- TPC: Table per concrete class
I'll be focusing on TPT: Table per type.
Let’s say I have a line of business application where users answer a bunch of Yes/No questions, and depending on their answers, fill out a series of forms, each form having a unique set of fields that we want to capture end-user data for.
Q: Did you open a bank account?
Name of bank:
Date account opened:
Purpose of account:
EVERY form has a common set of fields as well: The end-user that the particular instance belongs to, the date it was created, whatever. Now, when I have a
User object, and I want to see all the forms he’s filled out, I don't want to have to access 20 different collections of the
User object, one collection for each form type:
User user = User.GetByKey(20);
I want an
IList<FormBase> or something similar. This means I need to use inheritance, and each form needs to be derived from my
FormBase class (actually named
FormInstance in this case).
If you need this type of design, or think that your software at some point in the future will require this type of design, DO NOT use the Entity Framework. The Table-Per-Type inheritance is 100% unusable. If you feel that one of the other forms of inheritance suits you, well, you may be crazy, but just the same….tread lightly. (And please don't have 50 different discriminator values in a table that’s 200 columns wide…the SQL gods will strike you down).
So what’s wrong with EF’s TPT inheritance? The SQL that EF generates. With some simple testing (2 or 3 subclasses) everything may appear to be fine. In fact, you may even be impressed that EF manages to fetch all the data you need in a single trip, indeed, in a single query. Wow, their SQL generation must be top notch… But once you start adding some more subclasses, surprise! Your application grinds to a halt. So how bad is it? I decided to find out definitively.
The setup is very basic. Here’s the Linq query I'm executing against my model:
var q =
from insts in context.FormInstances
List<FormInstances> instances = q.ToList();
Very basic. Give me the
FormInstance is the base class. I started with just a single derived class, ran my test 500 times, averaging the time it took to execute my simple query. Then I added a 2nd derived class, a 3rd one, etc . The base class has less than 10 fields. Each derived class (to keep things consistent) is exactly the same, with 3 simple fields. Also, the tables are empty. Yes, empty. No data! So the performance numbers here strictly represent the time it takes EF to generate its abhorrent SQL, and the amount of time it takes SQL Server to grind it out.
Most of these are averaged over 500 executions. Some of the later ones were fewer, because it started to take over an hour to execute.
|Num Derived Classes||Avg ms for query generation and execution||Lines of SQL Generated|
So what happens when your application has 30 subclasses? It takes 108 seconds to generate and execute the SQL to retrieve the records in them! That’s 3.6 seconds PER FORM! And that’s when the query returns ZERO RECORDS, because the tables are EMPTY. Why does it take so long? Well, look at the 7,862 lines of SQL it generates, to start with (I won't post it here; I'll spare you that).
Here’s what it looks like graphed:
Ouch! That sure doesn't look linear folks… And guess what, my “I got burned” experience doesn't have 30 form tables. It has 89. Based on the trend outlined above, it would take 97 minutes to generate and execute the SQL to select the data from those 89 tables. Funny, because 30 SQL statements querying those tables, even with
WHERE fk IN(SELECT id FROM BaseTable) clauses on each one only take 132ms on the same computer… And it returns the same data.
Pragmatists vs Idealists
Joel Spolsky has some great words (scroll down to “The Two Forces at Microsoft”) regarding two developer camps at Microsoft; the Pragmatists, and the Idealists. Windows XP was a product of the Pragmatists; Vista, of the Idealists. Enough said. While the SQL generated by EF in cases such as these may call to the inner relational mathematician idealist in us, the result is impossible to read or easily understand, takes a long time to be generated, and even longer to execute. In other words, it’s 100% unusable.
Now, as a matter of research, I have to create this same test in NHibernate, to see what the generated SQL looks like, and to see how well it performs. I'll update this post when I have that data. Until then, if you decide to use EF’s inheritance “features”, here is a list of movies that are all 97 minutes long, so that your end users will have something to do while your site grates off your SQL Server’s face from the comfort of their browsers.