This is the third article in the ORM anti-pattern series.

ORM anti-patterns - Part 3: Lazy loading

We use lazy loading because we do not want to think about how a fetched entity is going to be used: we materialise a database record into an entity and as it goes through business logic related entities required by business logic are traversed and fetched from database. This certainly works, business logic is not cluttered with code to fetch the required entities, and the fact that related entities have or have not been materialized from database is nicely abstracted away.

Some Arguments for Lazy Loading

Some people believe lazy loading helps achieve better performance. Lazy loading might help achieve better performance depending on your design. In other words, with some architecture and/or within some design limitation it is better to lazy load a graph than to eager load it!

If you are loading an object graph without knowing how it is going to be used, then you are better off leaving some parts of it out to be fetched later and upon request as they may never be requested. For example you have a generic method in your app that fetches a customer without knowing how and in what context the method is going to be used. Then the result could be used to either edit customer details or edit his/her orders or check his/her eligibility for an upgrade and so on and so forth. Of course if you are using the same method for all these cases then you do not want to eager load the graph because it is going to load a lot of data while only a small subset of it is required in each case. Also in cases like this, the graph usually does not have a clear end. In other words, due to the generic nature of your method and the way it is used, you do not know in which path and how deep you should prefetch related entities. Instead you want to read the customer into memory, and then as the caller uses the record it will fetch bits and pieces it requires. This way you will not load things you do not need and you have spread the data fetching load over a longer period which gives you an illusion of better performance.

Using the same method to fetch customer details, without knowing about the context in which it is going to be used, is not code reuse; it is poor design. If you have several contexts or different needs then you need to have several explicit paths in your code where you know everything you need in advance and can load it in one go (as always there are exceptions).

You may say: "I know it is an illusion of better performance but I need it for the sake of users.". This raises an alarm in my head that perhaps you are binding your UI to ORM entities (to be discussed in another post). Unless your application is forms over data, users of your application should never be exposed to how your object graph is being loaded.

Lazy loading might help

Lazy loading can provide some help when it comes to forms over data applications and when user gets exposed to the illusion explained above. For example, user opens a customer's details, when they click on the orders tab, orders are lazily loaded and if they want to see details of each order, order lines are loaded.

Even in forms over data there are still quite a few issues. I will explain some of these issues in a future post about binding UI to ORM entities. Other issues are explained below:

Problems

I have a few issues with lazy loading that I will try to explain below:

1. Inconsistent state

You load a customer, then you lazy load its orders. At the same time someone changes one of the orders and adds an order line to it. You then lazy load order lines and get the latest state of lines while your order entity is not refreshed and contains the old data. From this point forward you are working with an inconsistent object graph in memory.

Regardless of how lazy loading is implemented you are going to have this problem. Simply put, when you use lazy loading the object graph you are working with is potentially inconsistent.

We are very strict about ACID property of relational databases and we use transactions on database to make sure that database is in a consistent and valid state before and after a transaction; but when it comes to loading that state into memory we do not quite care if it is being loaded consistently. We are happy to load parts of our graph before a transaction and some other parts after changes are made easily breaking consistency forced by ACID attributes of our relational database on the application level.

2. Unnecessary database hits

Database call is one of the most expensive operations in any software solution, and yet we liberally write code to hit database many times in a simple operation! Between 1 to 3 database hits, in my experience, is enough for most operations and if you are hitting database more than that, then there is a good chance you are doing it wrong. Some operations could take more hits, but they are more of an exception than a rule.

Here is two examples of unnecessary database calls I encounter frequently:

Great abstraction leads to stupid mistakes

Frameworks are very good at abstracting and hiding away a lot of complexities from developers, and lazy loading is one of those things that is abstracted away very neatly. This abstraction leads some developers to make stupid mistakes. For this I am just going to use an example I have come across a lot. Assuming the following class dependencies:

class Customer
{
    public IList<Order> Orders { get; set; }
}

class Order
{
    public IList<OrderLine> OrderLines { get; set; }
    public Customer Customer { get; set; }
}

class OrderLine
{
    public Product Product { get; set; }
}

class Product
{
}

I have seen code like the following a lot:

private static void SomeOperation()
{
    // some customers loaded into 'customers' variable
    var customers = LoadSomeCustomers();

    foreach (var customer in customers)
    {
        foreach (var order in customer.Orders)
        {
            foreach (var orderLine in order.OrderLines)
            {
                // do something here perhaps with orderLine.Product
            }
        }
    }
}

When lazy loading, a nested loop like this could simply lead to several hundred database calls! First, customers are loaded, and then inside the first loop Orders are loaded once per customer, and then inside the second loop OrderLines are loaded per order and so on and so forth.

Fetching the same record so many times

Given the above class dependency, here is another example of unnecessary database hits:

private static void SomeOperation()
{
    // load some orders into 'orders' variable
    var orders = LoadSomeOrders();

    foreach (var order in orders)
    {
        var customer = order.Customer;
    }
}

A very contrived example, but say you have 100 customers with average of 50 orders. If you load 200 orders into memory and call the above method it is going to hit database 200 times to load customer while those 200 orders could have been placed by 5 customers; i.e. you only need at most 5 database hits to load all needed customers.

To be fair, this happens when you do not have an Identity Map (e.g. you are using Active Record) or caching on you ORM level. Every time Customer property on order is called ORM checks to see if that property has already been loaded and if not it loads it from database. In the presence of Identity Map ORM holds a collections of all loaded entities and each time you ask for an entity it checks to see if it has already been loaded. If it supports caching then it may return the already loaded entity. In the absence of an Identity Map ORM does not know that that entity has been loaded on another path or graph, and hits the database again to load it. This problem exhibits only when you are traversing your object graph bottom up.

3. Business requirements lost in abstraction

Lazy loading sometimes hides some of the explicit business requirements away. If to perform some business logic I need to load a few things from database, I need that to be an explicit action and I want it to be readable. I want the next programmer taking over my code to know that to achieve that goal we need to load a few things from database. With lazy loading you do not know when something is loaded; you just call properties one after another without any real business meaning.

This can be improved with strict coding practices. Programmers can still use explicit methods to do the same with lazy loading; but my experience is that when that convenience is there it is abused and it is very hard to trace because it looks like a very normal property call. If lazy loading was not there lazy programmer would end up putting that code and the rest of the business logic inside one big method; but it would be very easy to find and refactor.

Solutions

1. Domain Aggregates to the rescue

Domain Driven Design is a beautiful book that every programmer should read at least once. The book is full of great concepts and techniques; but the one that stands out for me is the concept of Aggregates. From the book:

"*It is difficult to guarantee the consistency of changes to objects in a model with complex associations. Invariants need to be maintained that apply to closely related groups of objects, not just discrete objects. Yet cautious locking schemes cause multiple users to interfere pointlessly with each other and make a system unusable.

Put another way, how do we know where an object made up of other objects begins and ends? In any system with persistent storage of data, there must be a scope for a transaction that changes data, and a way of maintaining the consistency of the data (that is, maintaining its invariants). Databases allow various locking schemes, and tests can be programmed. But these ad hoc solutions divert attention away from the model, and soon you are back to hacking and hoping.*"

Aggregates define consistency boundaries. You can eager load your entire aggregate in one database call. This not only reduces the number of database calls but it also avoids the problem with inconsistent in-memory state.

I am not going to discuss this further because I am sure I cannot do justice. If you are interested you should read the book.

2. Prefetch all entities you need

Even if you are not using DDD, you should always know how you are going to use an entity before fetching it, and thus you should prefetch/include related entities you are going to need. Do not use GetCustomer method from everywhere in your code you need something related to customers. You should create more explicit methods that are context-aware and from those methods you should prefetch all required entities.

In other words, design your application so whenever you are loading an entity from database you know exactly what you need and how that entity is going to be used; so load everything you need upfront. If you are not loading something upfront it means that it is not going to be used. So you will not need lazy loading because everything you need is already loaded, and ONLY things that are needed in that specific context are loaded – nothing more, nothing less.

3. Implement explicit methods with business meaning to load required entities

There are some scenarios where you cannot know in advance what you are going to need. For example, you may need to load and check the last 100 orders for silver customers before authorizing a large payment while for gold customers you do not. In cases like this I prefer to load required entities explicitly from my business logic code to make that requirement very clear and readable in the code. A side benefit of this is that I will not need lazy loading support from ORM.

I must say that this is much harder to implement in Active Record because, in the absence of identity map and a proper unit of work, every object graph only tracks its own changes, and if you load an object explicitly and not through lazy loading then its changes are not saved as part of the main object graph and have to be handled explicitly. So you have to remember to call save on all separately loaded object graphs you have changed (as part of one transaction). But hey, we are not going to use Active Record so let's not talk about it ;-)

4. Use caching

If you are using Lazy Loading and it is too late to change then caching could really help. Recently I added a cache with 5 second sliding expiry to a property call which reduced the number of database hits significantly. The call that took 600 database hits now only needs 5 hits, because the other 595 calls were made to load the objects that had already been loaded. Caching could significantly decrease the number of database calls in your application; but it also means that you use potentially stale data. (You should always think very carefully before caching. This is not an article about caching and I am not going to dig deep in it; but it is very important to note that caching could cause inconsistent/corrupt in-memory and ultimately database state just like the one I explained above on lazy loading).

When you are using caching to avoid unnecessary database calls, that is when the entity being lazily loaded has just been loaded from database, then you should cache your entities for a very short period of time. Cached items' expiry should be shorter than the operation response time. For example, if you have an operation that takes 10 seconds to run and hits database 1000 times due to lazy loading and if you know a lot of those calls are unnecessary and are to load a very small number of entities, then you may cache those entities for 10 seconds. If your operation response time is reduced to half a second, then you can reduce your cache expiry to half a second. Also my personal preference is to use a cache key like LazyLoadCustomerForCheckUpgradeEligibility-CustomerId-{PK goes here} and use this key/item only for this operation.

Conclusion

Lazy loading may provide a tiny bit of benefit in some rare cases; but it is being used almost everywhere an ORM is used! The way it is used is doing far more harm than good and I believe the good it is doing is not justifiable at all. In my opinion, lazy loading should be limited to Active Record pattern, and Active Record pattern should only be used in forms over data applications. Even in that case lazy loading should be used only when there is no high concurrency requirement or you will have the risk of working with inconsistent state. All the other issues explained above will also apply.

Explaining all of the edge cases and providing code for everything I explained above would be rather impractical. If you have used ORMs and lazy loading then I am sure you know what I am talking about, and hopefully the solutions I provided above could help you avoid some of these issues. If you still have any questions feel free to post in the comments.

Hope this helps.

ormanti-patternsanti-patternsconsistency
Posted by: Mehdi Khalili
Last revised: 16 Feb, 2011 07:50 PM

Trackbacks

Comments

Robert Wagner
Robert Wagner
14 Feb, 2011 02:25 AM

@Mehdi,

Finally got around to reading your series, very good so far. I got a further question about lazy loading.

Using NH and your example (Customer/Orders), I take it you would setup your mappings to match your structure above. In that case, with lazy loading off, if I select a customer, would that then read out all the Orders, Order Lines and related products?

If so, what would you do for a screen that only listed the customer information? It seems wasteful to load all the order information for that screen.

Rob

14 Feb, 2011 07:31 AM

@Robert

If I read you correctly, your example is about using Lazy Loading when UI is bound to orm entities. As I mentioned above I think this is an antipattern itself (unless for forms over data - FOD). For FOD lazy loading is inevitable (and as mentioned above provides value because you do not know how your entities are going to be used). While I think FOD could provide a lot of benefits; I think many applications are inappropriately implemented with that design. I have a post coming up on that too.

If your entity is going to be used in a context of a business operation then you may find the fifth part useful.

I hope this answers your question; otherwise please feel free to leave a comment and correct me.

Robert Wagner
Robert Wagner
14 Feb, 2011 10:25 PM

@Mehdi,

In the scenario I am talking about doesn't involve UI, rather the data would be read out and then transformed into somekind of Dto/ViewModel to be passed back up the stack. Perhaps I'm misunderstanding how the PM is used. I image one would select an order from the db, then transform it into the Dto/VM.

With some implementations of LINQ one could hydrate the Dto directly without first hydrating the PM, but I don't think that is possible without it. How would you get the required Dto without hydrating the whole graph of the PM?

Rob

15 Feb, 2011 02:28 AM

The code to read the data and transform it into DTO would have to know what is required.

For example there may be a method called GetCustomerOrderDetails which returns a DTO or VM comprised of customer record and some of his orders. That method would have to know what is required on DTO and would have to populate it with or without lazy loading. The fact that UI is bound to objects that cannot lazy load means that the logic to load the DTO is put elsewhere. You know the fetching logic; then why cannot you just eager load what you need in that method? Am I missing anything here?

Robert Wagner
Robert Wagner
15 Feb, 2011 03:48 AM

@Mehdi,

Eg: // This line will load all the orders and OrderLines var data = session.CreateQuery("from Order").List().Cast().ToArray();

// I only need the order records to copy the order number in var myItems = data.Select(d => new OrderSummary() { OrderNumber = d.OrderNumber });

I'm not that familiar with NH, perhaps I'm not doing it right :). I'm assuming no access to LINQ to NH, otherwise L2NH could construct the correct query.

Perhaps I'm misunderstanding what you mean by "Don't use LazyLoading", I'm taking it as turning off lazy loading in (say) NH, but perhaps you mean that we should eager load the entities we need and then not access any collections that would be lazy loaded.

Rob

15 Feb, 2011 06:37 AM

@Rob

Is OrderNumber an entity? If yes, then if there are 50 records in your data collection then you are going to hit database 50 times (once per record/instance). And that is what I meant by unnecessary database hits!

If OrderNumber is not an entity, then would you mind providing me with an example of your code needing Lazy Loading? I am not very good with NH either; perhaps between the two of us we can find a good solution ;-)

David
David
12 Jan, 2011 08:08 PM

I have seen code like the following a lot:... private static void SomeOperation() { // some customers loaded into 'customers' variable var customers = LoadSomeCustomers();

foreach (var customer in customers)
{
    foreach (var order in customer.Orders)
    {
        foreach (var orderLine in order.OrderLines)
        {
            // do something here perhaps with orderLine.Product
        }
    }
} }

Is your code really regularly loading your entire customer database and all their orders into memory?

12 Jan, 2011 09:15 PM

@David: That is an example of improper use of lazy loading. It does not have to load the entire customer database. Even if you load 10 customers into 'customers' variable with each customer having an average of 5 orders and each order an average of 2 order lines, with that nested loop you are going to hit database 100 times!

My point is not that you should not load some customers into memory; but that whenever possible you should get all you need in one go.

David
David
13 Jan, 2011 12:28 AM

Sorry, probably didn't really express myself very well there. I guess my point was I saw the anti-pattern in your code being the nested loop rather than the lazy loading.

To ask a more useful question then... If you don't use lazy loading and you have an Orders collection (or another collection) on a customer, how do you avoid loading the entire collection when, eg, you just want to change the customer's name?

13 Jan, 2011 03:19 AM

It is a rather broad question to answer and it is not related to lazy loading. The first thing to consider is that you have to have shown customer details to a user or to customer for changing. So we should first see how that data is displayed on the page/form. I will explain this in a later post. From that point forward all you need to do is to call a method to save changes (I will explain this in another post too). For now, I would really like one of the following commands depending on the context:

void ChangeCustomerName(int customerId, string name);

or

void ChangeCustomerDetails(CustomerDto customerDetails);

or commands with the same names depending on your architecture. Using customerId you load only customer entity (one record, one entity), apply changes and save it.

David
David
13 Jan, 2011 03:27 AM

Sure. But doesn't that mean that when you load the customer entity you also (if you don't use lazy loading) load all the related objects?

13 Jan, 2011 08:17 AM

I am inside a method called ChangeCustomerName. I know exactly where this method is being called from and what it needs to do. Nothing is also being returned from the method; so it does not matter what I do or don't load with customer.

If to change customer name I need to load his/her nationality and validate a few things then I load the nationality property (Country entity) with the customer; otherwise if I only need to change name, then I only load customer. With that explicit method with void return I am in control and I load whatever I need for that bit of business logic.

I hope it makes sense.

David
David
13 Jan, 2011 08:53 AM

So given you have a customer object with an orders property, how does your client code know whether or not you've loaded the orders collection on the customer?

Apologies for pursuing this by the way. I've not so far had any serious issues with lazy loading, and conversely I've found it quite useful, so I'm genuinely curious as to how you manage your domain model without it.

It seems to me that without lazy loading your choices are: (a) client code has to be aware of how an object has been loaded, (b) you have different representations of your entity (in this case the customer) to fit different contexts (I guess in DDD these would be Bounded Contexts). So in this example there'd be one Customer class with an Orders property, and another without it. (c) you move to a more transaction script based system.

Sounds to me like you're heading down route (a), but perhaps I've misread?

Anon
Anon
13 Jan, 2011 09:02 AM

@David When you are using lazy loading and when you write instance.prop, then ORM will hit the DB to fetch the prop and not before that (var customers = LoadSomeCustomers(), initiates a proxy and not the real data). That's why Mehdi mentioned that we will have 100/1000 of calls if we don't know what we are doing here. To test that just try lazy load a data-grid to show some rows and profile the DB hits. And if you just want to edit instance.prop, don't use eager loading.

13 Jan, 2011 09:52 AM

@David From your comments it seems like you are binding your UI to your ORM entities. (To avoid running around circles) Is that the case?

David
David
13 Jan, 2011 08:52 PM

@Anon - for grids I generally use objects loaded from views. This way for large data sets I get exactly what I need. @Mehdi - for input forms I bind the views to command objects.

14 Jan, 2011 08:34 AM

David- if you could please provide some high level view of your design then we will have better idea and can discuss the usage of lazy loading better.

David
David
14 Jan, 2011 10:40 PM

Mehdi - actually that's what I was hoping you would do. Provide some high level view of your design and discuss how you do the things that lazy loading would otherwise do for you (so, do you have different models for different contexts, do you flag what properties are loaded, does the client just know what properties have been loaded, ...).

It's fine if you don't want to do that, I just thought it could add value to your article.

15 Jan, 2011 09:33 PM

I do not have A design that I am happy with and apply everywhere, and that is kind of the message behind this whole series: we should not apply the same techniques and tools to all problems. In fact I have used and will use some of the patterns I am discussing here as anti-patterns in some projects.

That said I am planning to provide one example of a design I would be happy with at the end of the series.

Anon
Anon
16 Jan, 2011 05:11 AM

@David Using views with lots of joins is called eager loading in ORM world. So you are NOT using an ORM. One of the importance of ORMs is to separate your APP dependencies to a specific database. Then you will be able to switch to a different Db easily.

David
David
16 Jan, 2011 10:57 PM

@Mehdi, sounds good!

@Anon, I either don't understand you or I don't agree with you. When did I say I was using lots of joins? How does eager loading imply I'm not using an ORM? How does writing a View tie me to a specific db - which dbs do you use in practice that don't support both views and standard SQL? (And how often do you actually switch dbs mid-project anyway?)

blog comments powered by Disqus