Ideas for Entity Framework Core Users that You Might Consider

In this post, we’ll look at some pitfalls and ideas EF Core users like yourself may want to consider when developing an application.

Entity Framework Core (EF Core) is a ground-up rewrite of Microsoft’s object-database mapping framework. Getting started with EF Core is relatively straightforward, but mastering it can take an entire career.

As a long-time user of earlier Entity Framework versions, I started writing this post with many preconceived notions about pitfalls from that experience. While some conceptual pitfalls have made it from previous versions of EF, the EF Core team’s care in addressing previous problems was evident to me as I read through the EF Core documentation and worked on this post’s code samples.

Knowing that EF Core is different, I took some time to explore the EF Core documentation and determine the existing pitfalls you might encounter when starting your EF Core journey. I’ve compiled and consolidated this information to help you get a general overview of potential mistakes you may encounter.

Though this post is packed with important information, it is by no means conclusive.

Global vs. local tools

EF Core has gone through several iterations to deliver a repeatable command-line interface. The upgrade from previous versions might tempt us to install the EF Core CLI tooling globally on our development environments. While we can install the tooling globally, it’s best to version the CLI tooling and EF Core together in your solution.

To install the tool locally for each solution, we first need to create a tool manifest. From the solution folder, we can run the following dotnet command.

dotnet new tool-manifest

This command will create a new .config directory with a dotnet-tools.json file. We can then install the CLI tools to our solution locally. From here, we can install the EF Core CLI tools.

dotnet tool install dotnet-ef

The EF Core documentation recommends that developers use the tools package that matches the runtime packages’ major version. The latest versions are 5.0.1 for both the CLI and EF Core packages, as of the publication of article. Installing the tools to the solution helps us avoid drift and conflicts when working on multiple solutions with varying versions.

Adopting non-conventional naming

EF Core 5 runtime packages contain conventions around schema naming, which are typically considered best practices. EF Core allows developers to work with organizational units to adapt to different naming styles. Still, modifications to the naming conventions within EF Core come with significant maintenance costs and the potential to introduce mapping bugs. Developers should embrace EF Core defaults as much as possible to avoid strange edge cases and exceptions.

Depending on your organization and responsibility delegation, it may be best to have a conversation about EF Core naming conventions and adopt them as an understood methodology to design and evolve schema. Database administrators and developers should work towards delivering value for users. If it’s impossible to do so due to organizational reasons, consider the next section as a possible solution.

Forgetting about reverse engineering

When working with a legacy database or complicated organizational power dynamics, we may be tempted to handcraft each entity and define every relationship within the DbContext.OnModelCreating method. Developers should first look at the scaffolding features built into the EF Core CLI tools.

We can ask the CLI to scaffold entities and relationships from an existing database using the scaffold command.

dotnet ef dbcontext scaffold "Data Source=(localdb)\MSSQLLocalDB;Initial Catalog=Chinook" Microsoft.EntityFrameworkCore.SqlServer

We must specify the appropriate database provider when scaffolding so that the scaffolding can respect the SQL syntax of our database.

We can also selectively scaffold parts of an existing database by specifying table names.

dotnet ef dbcontext scaffold ... --table Artist --table Album

Targeting tables are useful when building microservices or .NET applications that are designed to perform a niche task.

There are drawbacks to scaffolding that we can avoid by starting with a Code-First approach. We can define logical relationships in C# that we cannot represent through schema alone; these concepts include inheritance hierarchies, owned types, and table splitting. Additionally, concurrency mechanisms are not reverse engineered by the scaffold command. Finally, nullable reference types are also not reverse engineered, and we must add them in after the fact.

The biggest concerns of reverse engineering an existing database come from concept shifts that occur from a diverging code base and database schema. If the database is relatively stable, this might not be an issue. Still, an actively progressing database may cause developers headaches as they try and keep up with schema changes.

Not all databases are equal

EF Core provides a common interface for multiple relational database engines. These providers include SQLite, Microsoft SQL Server, PostgreSQL, and Oracle. The similarities between each provider might lull us into a false sense of security about the interchangeability of these engines, but it is essential to recognize their differences.

The EF Core query abstraction is a leaky one, with many of the database concepts finding their way into the interface. The conceptual leaks are not bad, but they are something to be mindful of as we start using features like DbFunction and FromSql to execute database-engine-specific features.

When testing, the EF Core documentation recommends three options:

  • Running tests against the same database engine used in production.
  • Running tests against some other database engine that’s “easier” to manage.
  • Using test doubles to avoid using a database at all.

The best option is to run our tests against a production-like environment, as we’ll eliminate issues that could arise from engine variations. It can be difficult depending on the database, but with containerization technologies like Docker, the problem of managing isolated instances has never been easier for developers.

Some folks may want to test against a different engine, for instance SQL Server might be the target database engine, but tests run against a SQLite instance. While possible, this approach leads to more problems like SQL syntax differences and having to manage a different migration strategy entirely.

Finally, stubs or mocks might seem like a “good” idea, but database engines are filled with a degree of nuance and complexity that’s difficult to fully understand without a significant time investment. The mismatch between the assumed behavior of the engine and actual behavior can lead us to introduce bugs.

Queries not including related data

EF Core moved away from lazy loading as a default feature. Navigation properties still exist as part of the Code-First design approach, but many of these navigation properties require explicitly telling EF Core to load the data.

Given the following EntertainmentDbContext, we have a Movie model.

public abstract class Production
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime Release { get; set; }

    public List<Character> Characters { get; set; } = new List<Character>();
    public List<Rating> Ratings { get; set; } = new List<Rating>();
}

public class Movie : Production
{
    public int DurationInMinutes { get; set; }
    public double WorldwideBoxOfficeGross { get; set; }
}

EF Core will not load the Characters and Ratings navigation properties in the following query.

var movies = database.Movies
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

Only after using the Include method will we see the collections populated.

var movies = database
    .Movies
    .Include(m => m.Characters)
    .Include(m => m.Ratings)
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

Forgetting to include important data can lead to panic when users think the system is not saving data or has deleted existing data.

Tracking read-only queries

EF Core implements the Unit of Work pattern, keeping track of changes to our entities and then ultimately sending those changes back to our database engine. The approach has many advantages when modifying data, but comes with a noticeable performance cost.

Instantiating a DbContext instance is a relatively cheap operation, where most of the cost of using EF Core comes from expression parsing, network operations, and object tracking. We can opt-out of object tracking by using the AsNoTracking method on our LINQ queries.

var movies = database
    .Movies
    .AsNoTracking()
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

Opting out of object tracking can speed up read scenarios like read-only endpoints on an HTTP API. There is a potential risk of using the AsNoTracking feature. While it reduces the cost of processing entities for object tracking, it can add to memory overhead. Let’s take a look at an example.

var movies = database
    .Movies
    .AsNoTracking()
    .Include(m => m.Characters)
    .ThenInclude(c => c.Actor)
    .AsNoTracking()
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

In the case of this query, an Actor may be in multiple movies. Since EF Core is no longer tracking entities, EF Core will allocate an entity for the same Actor multiple times.

We can opt into identity resolution while opting out of object tracking by using the newly added AsNoTrackingWithIdentityResolution method.

var movies = database
    .Movies
    .AsNoTrackingWithIdentityResolution()
    .Include(m => m.Characters)
    .ThenInclude(c => c.Actor)
    .AsNoTracking()
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

All instances of the same Actor model will point to the same object, thus saving us from over-allocating.

Do not enable lazy loading

Lazy loading is a feature that has caused countless production issues, and the EF Core team has rightfully made it inconvenient to enable this feature.

It’s highly advised not to enable this feature, as navigation properties become opaque mechanisms that trigger unnecessary and costly roundtrips to the database.

To enable lazy loading, folks have to install the Microsoft.EntityFrameworkCore.Proxies package.

dotnet add package Microsoft.EntityFrameworkCore.Proxies

Then, from the OnConfiguring method of a DbContext, we can call the UseLazyLoadingProxies method.

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder .UseLazyLoadingProxies() .UseSqlServer(myConnectionString);

Let’s look at an N+1 example with our Movie model.

var movies = database
    .Movies
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

foreach (var movie in movies)
{
    // a database call per Character
    foreach (var character in movie.Characters)
    {
        // database call for Actor
        Console.WriteLine(character.Actor.Name);
    }
}

While it may look like we executed a single query, as we loop through our Movie entity, we are making an additional query for each navigation property. For every Movie (N), we are making a separate call to the database for every related Character entity (+1). And in this example, another separate call for Actor. This is highly inefficient! Friends don’t let friends enable lazy loading.

Forgetting about projections

As much as the EF Core abstraction looks like C#, it’s best to treat it more as the database interface. The models defined in our DbContext represent our schema, and should be used to write more efficient queries. Using the entities is equivalent to performing a Select * From Table query. It’s not wrong to select *, but by employing the Select method, we can use projections, which allows us to:

  • Reduce the amount of data transmitted over the network.
  • Improve SQL execution times.
  • Explicitly return only the data we need.

Let’s look at projecting our Movie model into a result that includes characters and actor data.

var movies = database
    .Movies
    .AsNoTracking()
    .OrderByDescending(x => x.WorldwideBoxOfficeGross)
    .Select(m => new
    {
        m.Name,
        Characters = m.Characters.Select(c => new
        {
            c.Name,
            Actor = c.Actor.Name
        })
    });

Instead of using anonymous object results, we can also project directly into data transfer objects meant to be consumed by callers. In this example, we select directly into our HTTP API response models for maximum efficiency.

var movies = database
    .Movies
    .AsNoTracking()
    .OrderByDescending(x => x.WorldwideBoxOfficeGross)
    .Select(m => new MovieResponse
    {
        Id = m.Id,
        Name = m.Name,
        Characters = m.Characters.Select(c => new CharacterResponse
        {
            Name = c.Name,
            Actor = c.Actor.Name
        })
    });

Forgetting to add indexes

It’s easy to forget the scale of the data we’ll be dealing with in production during development. Additionally, as our application grows and we add features, our query needs will change. Most database engines will add indexes to primary and foreign keys, but we may also have columns that provide other partitioning opportunities.

With EF Core, we can apply an index using the ModelBuilder instance within the OnModelCreating method. As of writing this post, indexes can only be applied using the ModelBuilder, and there is no data annotation attribute equivalent.

modelBuilder.Entity<Movie>()
    .HasIndex(b => b.Release);

EF Core has many options for specifying an index, from uniqueness, naming, filtering, multiple-column, and applied filters.

Indexes are an essential part of optimizing database access. It’s vital to measure query performance and to apply indexes when necessary.

Running migration operations

EF Core supports a much-improved migration model compared to its predecessor, EF 6. The newer EF Core boasts version-control-friendly migration files, CLI tooling, and repeatable workflows.

We may want to let our applications run migrations on app startup, but its best not to do this for multiple reasons:

  • Expensive operations can exceed .NET’s default SQL timeout limit, throw exceptions, and leave applications in a broken startup state.
  • Multiple instances of your application may attempt to compete to apply migrations, causing thrashing and unnecessary contention.
  • Migrations are executed without quality check mechanisms and an opportunity for review.

EF Core can generate scripts that we can apply to a production database. The Microsoft documentation suggests SQL Scripts as the recommended approach to deploying schema changes to production environments:

  • Teams can review SQL scripts for accuracy and test for potential data loss.
  • SQL scripts can be used in conjunction with a deployment technology and integrate with a continuous integration process.
  • SQL scripts can be provided to a database administration unit and applied on off-peak hours.

We can use the same EF CLI tools to generate SQL scripts. The most powerful approach is by using From and To arguments.

dotnet ef migrations script [From] [To]

We can leave out the To argument, as we can imply that we want to generate up to the latest migration.

We should always approach migrating production data with extreme caution as any modification could lead to data loss. Automating the application of schema changes is optimal, as it removes the potential for human error.

Using table per type

EF Core has two inheritance models: Table Per Hierarchy (TPH) and Table Per Type (TPT).

The TPH approach uses a single table with columns for each property unique to an entity with a Discriminator column. In the following example, both Movie and Series share a Productions table.

public abstract class Production
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime Release { get; set; }

    public List<Character> Characters { get; set; } = new List<Character>();
    public List<Rating> Ratings { get; set; } = new List<Rating>();
}

public class Movie : Production
{
    public int DurationInMinutes { get; set; }
    public double WorldwideBoxOfficeGross { get; set; }
}

public class Series : Production
{
    public int NumberOfEpisodes { get; set; }
}

Movie rows will contain Null values for the NumberOfEpisodes columns, while Series rows will have empty DurationInMinutes and WorldwideBoxOfficeGross columns.

The TPT approach would put each respective entity into a separate table.

Properties that belong solely to a base type or derived type are stored in a table that maps to that type. Tables that map to derived types also store a foreign key that joins the derived table with the base table. — Microsoft Documentation

While it may feel like this would be a “cleaner” approach to data modeling, it comes at an expensive cost when querying across the hierarchy.

Let’s look at a straightforward query.

var highestRated = database
    .Productions
    .Select(x => new
    {
        id = x.Id,
        name = x.Name,
        avg = x.Ratings.Average(r => r.Stars),
        type = x.GetType().Name
    })
    .OrderByDescending(x => x.avg);

With TPH, we would generate an SQL Statement equivalent to querying a single table.

select * from [Productions]

With the TPT, we now would need to join multiple tables. The cost of joining numerous tables compounds as we add new entities to our hierarchy, to the point where our application performance suffers and we have some miserable users.

Developers should only consider using a TPT approach when there is no chance they’ll need to run a hierarchical query that involves multiple entities. Even in that scenario, though, what is entity inheritance achieving?

Shared columns duplication

When using inheritance, we usually define shared properties on base types, but there might be an instance where that’s not the case. Let’s look at an example where two models in our hierarchy share the same property name without inheriting from the base type.

public abstract class Vehicle {}
public class Motorcycle : Vehicle {}
public class Car : Vehicle
{
    public int NumberOfDoors { get; set; }
}
public class Plane : Vehicle
{
    public int NumberOfDoors { get; set; }
}

If not explicitly told to use the same column, EF Core will create a duplicate column for NumberOfDoors. Using ModelBuilder, we can use the same column.

modelBuilder.Entity<Car>()
    .Property(c => c.NumberOfDoors)
    .HasColumnName(nameof(Car.NumberOfDoors));

modelBuilder.Entity<Plane>()
    .Property(c => c.NumberOfDoors)
    .HasColumnName(nameof(Plane.NumberOfDoors));

Using the model builder interface to combine columns should add to the efficiency of SQL queries.

Deleting data

“Delete” is one of those words that can have a different meaning depending on our perspective. As users, we want deletes to account for mistakes and recover any information we accidentally removed. As developers, and in the instance of EF Core, delete is unflinching and unapologetic about removing our data from existence, permanently. When developing a new application, we may tend towards the convenience of hard deletes, but we can add soft deletes to our applications with a little effort.

EF Core ships with Global Query Filters that make implementing soft deletes straightforward. We first start by adding a boolean column of IsDeleted to our entity. When this column is true, then our data is soft deleted.

public abstract class Production
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime Release { get; set; }
    public bool IsDeleted { get; set; }

    public List<Character> Characters { get; set; } = new List<Character>();
    public List<Rating> Ratings { get; set; } = new List<Rating>();
}

In our OnModelCreating method, we can then apply a query filter that applies to any query performed on the Production entity.

modelBuilder.Entity<Production>()
    .HasQueryFilter(m => !m.IsDeleted);

We can write queries as we usually would with our query filter applied to any Production query.

Cartesian Explosion

Joins are an integral part of relational databases. They allow us to create rows from related tables. While a powerful feature, it does come with drawbacks. Take the following relationship.

Movie -> Characters -> Actor

A movie where an actor plays multiple characters will duplicate data from the Movie and Actor tables in our results. Let’s write a query and look at the generated SQL. Given the following LINQ statement:

SELECT "p"."Id", "p"."Discriminator", "p"."Name", "p"."Release", "p"."DurationInMinutes", "p"."WorldwideBoxOfficeGross", "t"."Id", "t"."ActorId", "t"."Name", "t"."ProductionId", "t"."Id0", "t"."Name0"
FROM "Productions" AS "p"
LEFT JOIN (
    SELECT "c"."Id", "c"."ActorId", "c"."Name", "c"."ProductionId", "a"."Id" AS "Id0", "a"."Name" AS "Name0"
    FROM "Characters" AS "c"
    INNER JOIN "Actors" AS "a" ON "c"."ActorId" = "a"."Id"
) AS "t" ON "p"."Id" = "t"."ProductionId"
WHERE "p"."Discriminator" = 'Movie'
ORDER BY "p"."WorldwideBoxOfficeGross" DESC, "p"."Id", "t"."Id", "t"."Id0"

Running our query on a database, we’ll see these results:

1,Movie,Avengers: Endgame,2019-04-26 00:00:00,181,2797800564.0,1,1,Tony Stark,1,1,Robert Downey Jr.
1,Movie,Avengers: Endgame,2019-04-26 00:00:00,181,2797800564.0,2,2,Steve Rogers,1,2,Chris Evans
1,Movie,Avengers: Endgame,2019-04-26 00:00:00,181,2797800564.0,3,3,Okoye,1,3,Danai Guira
5,Movie,Downton Abbey,2020-09-20 00:00:00,120,194051302.0,8,8,Violet Crawley,5,8,Maggie Smith
5,Movie,Downton Abbey,2020-09-20 00:00:00,120,194051302.0,9,9,Lady Mary Crawley,5,9,Michelle Dockery
3,Movie,Ip Man 4,2019-12-25 00:00:00,105,192617891.0,6,6,Ip Man,3,6,Donny Yen
4,Movie,Gemini Man,2019-11-20 00:00:00,116,166623705.0,7,7,Henry Brogan,4,7,Will Smith
2,Movie,The Lion King,2019-07-19 00:00:00,118,1654791102.0,4,4,Simba,2,4,Donald Glover
2,Movie,The Lion King,2019-07-19 00:00:00,118,1654791102.0,5,5,Nala,2,5,Beyoncé

As we can see, the movie information repeats. This repeated data can lead to what’s known as a Cartesian Explosion if there are enough tables involved within a query.

EF Core introduced an AsSplitQuery method that allows Entity Framework to split each related entity into a query on each table.

var movies = database
    .Movies
    .AsSplitQuery()
    .Include(m => m.Characters)
    .ThenInclude(c => c.Actor)
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

By adding the AsSplitQuery method, we now have three SQL statements for MoviesCharacters, and Actor. Looking at the diagnostic query string from EF Core, we see the following SQL.

SELECT "p"."Id", "p"."Discriminator", "p"."Name", "p"."Release", "p"."DurationInMinutes", "p"."WorldwideBoxOfficeGross"
FROM "Productions" AS "p"
WHERE "p"."Discriminator" = 'Movie'
ORDER BY "p"."WorldwideBoxOfficeGross" DESC, "p"."Id"

This LINQ query is being executed in split-query mode. The SQL shown is for the first query to be executed. Additional queries may also be executed depending on the results of the first query.

It’s worth experimenting with AsSplitQuery, as it can speed up queries and reduce network traffic by removing data duplication.

Constants in expressions

EF Core optimizes queries by caching LINQ Expressions, but first, it has to compile our LINQ expressions. This processing of LINQ expressions is noticeable in production environments. Take the following example.

var first = database.Movies.First(m => m.Id == 1);
var second = database.Movies.First(m => m.Id == 2);

These two queries seem identical, but they compile into two separate cache entries when the identifier is a constant. We can optimize these queries to share the same shape with seemingly minor modifications.

var id = 1;
var first = database.Movies.First(m => m.Id == id);
id = 2;
var second = database.Movies.First(m => m.Id == id);

We now get the same results and have cached the same query in two separate instances of its use. Using constants in our queries, we may be inadvertently polluting EF Core’s internal query cache and hurting overall query performance.

SQL injection by accident

String interpolation is one of the best C# features of the last decade. It’s a convenience that has saved every C# developer countless hours of shuffling index values in format strings. EF Core allows for the use of string interpolation when using raw SQL. Still, developers must be careful not to refactor their code into an SQL injection vulnerable state inadvertently. Let’s start with a parameterized SQL using an interpolated string.

var id = 1;
var results = database
    .Movies
    .FromSqlInterpolated($"Select * From Movies where id = {id}")
    .ToQueryString();

When we look at the SQL from our query, we see that EF Core has parameterized the id.

.param set p0 1

SELECT "p"."Id", "p"."Discriminator", "p"."Name", "p"."Release", "p"."DurationInMinutes", "p"."WorldwideBoxOfficeGross"
FROM (
    Select * From Movies where id = @p0
) AS "p"
WHERE "p"."Discriminator" = 'Movie'

We may want to move our query into another variable for a cleaner looking LINQ statement.

var id = 1;
var query = $"Select * From Movies where id = {id}";
var results = database
    .Movies
    .FromSqlRaw(query)
    .ToQueryString();

Let’s see what the resulting SQL statement is after our refactoring.

SELECT "p"."Id", "p"."Discriminator", "p"."Name", "p"."Release", "p"."DurationInMinutes", "p"."WorldwideBoxOfficeGross"
FROM (
    Select * From Movies where id = 1
) AS "p"
WHERE "p"."Discriminator" = 'Movie'

Yikes! We’ve just introduced a potential SQL injection. EF Core is no longer parameterizing our id variable. The risk becomes more dangerous if we were accepting user input in the form of a string parameter.

The EF Core abstraction protects developers from introducing SQL injection attacks. Still, when using the SQL interfaces, developers are at a higher risk of piercing the protection provided by using LINQ statements.

Inefficient Updates

Given EF Core’s LINQ interface, it’s tempting to do all our operations using our C# entities. When dealing with large dataset operations, it is better to embrace the underlying data storage engine rather than LINQ our way through thousands of records.

var movies = database
    .Movies
    .OrderByDescending(x => x.WorldwideBoxOfficeGross);

foreach (var movie in movies)
{
    movie.DurationInMinutes += 10; // trailers
}
database.SaveChanges();

We could instead use a single SQL query.

database.Database
    .ExecuteSqlRaw(
        "UPDATE [Productions] " +
        "SET [DurationInMinutes] = [DurationInMinutes] + 10 " +
        "WHERE Discriminator = 'Movie';");

We’ve accomplished the same result without the cost of network chatter.

Connection resiliency

When developing against local databases, we can sometimes take the stability of our connections for granted. Production is a little less reliable when it comes to connection reliability.

EF Core has built-in connection resiliency mechanisms that support retry and cool-down strategies. These are enabled by default when using SQL Server variant database engines, but we can configure resiliency for our specific needs.

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlServer(
            "<connection string>",
            options => options.EnableRetryOnFailure(
                maxRetryCount: 4,
                maxRetryDelay: TimeSpan.FromSeconds(1),
                errorNumbersToAdd: new int[] { }
            ));
}

It’s important to understand that connections to databases are not guaranteed to be stable. Using monitoring software, we’ll notice that there are strange outliers in performance, likely due to the execution strategy compensating for a failed connection or SQL timeout. Sometimes we can fix these issues through infrastructure management or by optimizing queries to be less taxing on our database.

Conclusion

EF Core has improved many issues found in previous versions of Entity Framework. While still being an abstraction for database engines, the interface additions to EF Core seem to embrace the inevitability of dealing with the database.

Enhancements include combining, splitting, and executing SQL in previously tricky ways. EF Core also brings more extensibility options to developers to address data access concerns like query filtering, table splitting, and inheritance. The added benefits of better debugging in EF Core make the library easier to recommend to developers. EF Core is less of a black box than previous versions, helping developers diagnose issues faster.

As a final word, the new enhancements to EF Core embrace the database and SQL more than ever, which is why its important that we learn all we can about the database we’ve chosen. Understanding a database engine’s inner-workings and the particular SQL syntax will give developers the most significant advantage, as developers can use that knowledge to optimize around any EF Core limitations.

Related Posts

Leave a Reply

Your email address will not be published. Required fields are marked *