An Opinionated Usage of Interface

Every C# developer knows what an interface is. From the code perspective, it is interface ISomething. There is no requirement for the I prefix in the interface name; that is just a naming convention, and it is good to be able to tell something is an interface by looking at the name. A more interesting question is: when do we use an interface? I guess each developer has their own answer and their own reasons behind it.

About 10 years ago, I started to use interfaces a lot. Back then the obvious reasons were mocking and unit testing. Then dependency injection came into my developer life. I read somewhere that you should inject interfaces instead of concrete implementations.

Have you ever heard of these?

  • Depend on interfaces instead of concrete implementations
  • It is easier for you to change the implementation later
  • It helps you mock the implementation in the unit test
  • Your code looks clean and professional

In some codebases, I ended up with interfaces everywhere. The unit tests were a bunch of mocks upon mocks, everywhere. It looked good in the beginning. However, after a while, it was a pain.

  • It was hard to do refactoring. For example, when I moved a piece of code from one class to another without changing the outcome behavior, the unit tests failed because the expected interface was no longer there. The unit tests knew too much about the implementation detail. I have the habit of refactoring code quite often, and I expect the unit tests to catch my mistakes if I accidentally change the outcome behavior. With mocking, I failed to achieve that
  • I had to test at every layer. Basically, there were behavior tests with mocking and there were tests for the actual implementation. There was too much test code to maintain. That was a waste of time and effort, and it made the tests error prone
  • The chance of actually changing an implementation was rare

OK, so are interfaces useful? Of course they are. Here are my opinions on when to use them.

Code Contract

The interface tells the consumers which functionalities it supports. A class might have 10 methods, but not all consumers use or care about all 10 of them. They might be interested in only 2 or 3. It is fine to inject the class. However, the consumer may get confused and might misuse it.

Let's take an example of an imaginary log service. Here is the concrete implementation:

public class SimpleLogService
{
    public void WriteLog(string logMessage)
    {

    }

    public IList<string> GetLogs()
    {
        return new List<string>();
    }
}

// API Controller to read the log
public class LogController : Controller
{
    private readonly SimpleLogService _logService;
    public LogController(SimpleLogService logService)
    {
        _logService = logService;
    }

    public IActionResult Get()
    {
        return Ok(_logService.GetLogs());
    }
}

There is nothing wrong with the above code. However, I do not want the LogController to see or use the WriteLog method. That method is used by other controllers or services. And the SimpleLogService class might grow over time as more and more methods are developed.

To solve that problem, I want to create a contract to tell LogController what it can use.

public interface ILogReaderService
{
    IList<string> GetLogs();
}

public class SimpleLogService : ILogReaderService
{
    public void WriteLog(string logMessage)
    {

    }

    public IList<string> GetLogs()
    {
        return new List<string>();
    }
}

// API Controller to read the log
public class LogController : Controller
{
    private readonly ILogReaderService _logService;
    public LogController(ILogReaderService logService)
    {
        _logService = logService;
    }

    public IActionResult Get()
    {
        return Ok(_logService.GetLogs());
    }
}

And I can do the same for the WriteLog part if necessary.
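
If the writing side needs the same treatment, a minimal sketch might look like this (ILogWriterService is a name I am assuming here, not one from the original code):

public interface ILogWriterService
{
    void WriteLog(string logMessage);
}

public class SimpleLogService : ILogReaderService, ILogWriterService
{
    public void WriteLog(string logMessage)
    {
    }

    public IList<string> GetLogs()
    {
        return new List<string>();
    }
}

Writers then depend on ILogWriterService only, and the LogController keeps seeing nothing but GetLogs.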

Decouple Implementation Dependency

In many projects, there is data involved. There are databases, and then comes the concept of a Repository. If the repository implementation were easy and the database were ready, a developer could write a complete feature from the API layer down to the database layer. But I doubt that is the reality. So the situation might look like this:

  • One developer takes care of front end development
  • One developer takes care of the API (controller) implementation
  • One developer takes care of designing database, writing the repository. This might be the same developer that implements the API

The API layer depends on the Repository. However, we also want to see the flow and speed up the development. Let’s see some code

public class UserController : Controller
{
    private readonly IUserRepository _repository;

    public UserController(IUserRepository repository)
    {
        _repository = repository;
    }

    public async Task<IActionResult> GetUsers()
    {
        var users = await _repository.GetAllUsers();

        return Ok(users);
    }
}
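
The IUserRepository contract and the User type are not shown above; a minimal sketch that the controller could compile against might be:

public class User
{
    public User(string name)
    {
        Name = name;
    }

    public string Name { get; }
}

public interface IUserRepository
{
    Task<IList<User>> GetAllUsers();
}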

The IUserRepository is the Code Contract between the API and the repository implementation. To unlock the development flow, a simple in-memory repository implementation is introduced:

public class InMemoryUserRepository : IUserRepository
{
    public async Task<IList<User>> GetAllUsers()
    {
        await Task.CompletedTask;

        return new List<User>{
            new User("Elsa"),
            new User("John"),
            new User("Anna")
        };
    }
}

And the API can function. This removes the dependency on the actual repository implementation. When such an implementation is ready, switch to it.
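
Switching is then a one-line change in the composition root. A sketch assuming ASP.NET Core's built-in container and a hypothetical SqlUserRepository:

public void ConfigureServices(IServiceCollection services)
{
    // Start with the in-memory implementation to unblock API development
    services.AddScoped<IUserRepository, InMemoryUserRepository>();

    // When the real repository is ready, only this registration changes, for example:
    // services.AddScoped<IUserRepository, SqlUserRepository>();

    services.AddControllers();
}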

However, do not overuse it. Otherwise, you end up with interfaces everywhere and each developer supplying their own temporary implementation. Choosing the right dependencies is an art, and context matters a lot.

Conclusion

I rarely create interfaces with unit testing in mind. Rather, they are the outcome of writing code, refactoring from a concrete implementation, and then extracting interfaces where they make the most sense. When I do, I pay close attention to their meaning. If I can avoid an interface, I will.

Code Contract and Decouple Implementation Dependency are the 2 big benefits of having proper interfaces. There are other reasons to use interfaces, and they are all valid depending on the context. Sometimes, it is simply the way the project architect wants it.

What are yours?

Watch Out for Abstract Dependencies

The context is a C# .NET Core WebAPI system. The system employs the Onion Architecture: the API layer is the ASP.NET MVC controllers, inside it sits the Application Service layer, and at the core of the onion is the domain. One can quickly google Onion Architecture; if you do not know it, I suggest you take a look at it first.

One of the key points in the Onion Architecture is that an inner layer must not know about the outer layers. Translated to C# projects, the Application Service layer must not reference the API project directly. The API consumes the Application Service, NOT the other way around.

There is a requirement that application configuration (note that this is not system configuration like connection strings, certificates …) should be maintained in a JSON file. Let's call it appConfig.json. The file is deployed with the API. The implementation takes advantage of .NET Core JSON configuration so the runtime can load the file.

Let's simplify the requirement to focus on the architecture. The appConfig defines the default currency. The value is displayed in the UI and used for other processing in the Application Service.

Now comes the tricky part: where should we place the implementation, in the API or in the Application Service?

My answer is always in the API. What if the implementation is in the Application Service? Let’s see the implementation and analyze.
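
A minimal sketch of that (discouraged) approach, assuming a hypothetical CurrencyService class in the Application Service layer and a DefaultCurrency key in appConfig.json:

// Application Service layer
public class CurrencyService
{
    private readonly IConfiguration _configuration;

    public CurrencyService(IConfiguration configuration)
    {
        _configuration = configuration;
    }

    public string GetDefaultCurrency()
    {
        // Reads the value that the API host loaded from appConfig.json
        return _configuration["DefaultCurrency"];
    }
}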

By using the IConfiguration interface, the Application Service takes a dependency on the Microsoft.Extensions.Configuration package, which seems fine because it is an abstraction.

Has the dependency between the API and the Application Service changed? Implementation-wise, no, because there is no direct reference from the Application Service to the API.

But there is in terms of design. The Application Service implementation is using the infrastructure supplied by the API: the configuration is managed by the API layer.

Will it cause any problem? Well, it depends on what we care most about. If the architecture is the main concern, then yes, it is a problem. The architecture is broken unintentionally. Think about a scenario where the Application Service is consumed by another client, such as a WPF application. Does it make sense to bring the Microsoft.Extensions.Configuration package into a WPF application?

The Application Service defines the interface, which says “hey! I need to know the default currency, but I cannot figure it out by myself.” The outer layer, whether a WebAPI or a WPF application, supplies the implementation, which says “Not a problem! I know where to get the default currency for you.”
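
A minimal sketch of that direction, assuming hypothetical names (IDefaultCurrencyProvider in the Application Service, AppConfigCurrencyProvider in the API):

// Application Service project: owns the abstraction
public interface IDefaultCurrencyProvider
{
    string GetDefaultCurrency();
}

// API project: supplies the implementation using its own infrastructure (IConfiguration)
public class AppConfigCurrencyProvider : IDefaultCurrencyProvider
{
    private readonly IConfiguration _configuration;

    public AppConfigCurrencyProvider(IConfiguration configuration)
    {
        _configuration = configuration;
    }

    public string GetDefaultCurrency() => _configuration["DefaultCurrency"];
}

A WPF client would register its own IDefaultCurrencyProvider implementation instead, and the Application Service would not change.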

Avoid Potential Problems with Explicit API Design

A year ago, I wrote Leaky Abstraction – Linq Usage. I use that design whenever I meet the same challenge. The system works as expected, and I am happy with the design.

Until recently, there was a need to order the collection. Say that we need to order a TeacherCollection by years of experience, assuming there is a requirement to print the result in years-of-experience order.

It is a pretty simple requirement. In the TeacherCollection constructor, this code will do the job.

public TeacherCollection(IEnumerable<Teacher> teachers)
{
    if (teachers != null)
        _teachers = teachers.Where(x => x.IsStillAtWork).OrderBy(x => x.StartedOn).ToList();
}

Everything works as expected.

Boom! The system is very slow once there is more data. The profiler shows that a large amount of time is spent on the ordering. In that system, the ordering logic is much more complicated; it is not a pure Linq sort as in the example. Still, the sorting itself cannot be the problem. That is for sure.

The problem is that the collection is accessed too many times. That is also expected, because the collection is designed to filter data and to work in a pipeline in a safe way.

The ordering logic should not be placed there, even though the collection itself has all the information needed to do the sorting and filtering.

What should we change in terms of design to solve the problem and also support the sorting?

Identify Responsibilities

In my opinion, this is the most difficult part of writing code. I have not found any exact formula to get it right. Identifying responsibilities is a heuristic. Experience matters here.

Filtering and ordering should be treated as two separate responsibilities. It is very easy to mix them in one implementation, which is error-prone. When defining a responsibility, one should consider at least 2 factors:

  1. The purpose of each: one is for filtering, the other is for sorting. They are 2 different operations.
  2. When it is used and the usage frequency: filtering is used a lot to extract sub-collections from the original collection. Ordering, on the other hand, is only used when a final result is displayed to the end user or in another form of presentation, such as a console screen or a Word document.

It is kind of tricky to see them as separate responsibilities. In many cases I do not even bother to think about it. Well, it turns out I was wrong. Sometimes it sounds cool and simple to just order the list.

Extract Explicit Interfaces

Before moving on, let's take a look at the TeacherCollection. The additional feature we need is the ability to get the exact index of a teacher.

ITeacherCollection – the default interface extracted from the current TeacherCollection. The School now holds an instance of ITeacherCollection instead of the TeacherCollection implementation. This refactoring step will not break anything.

IIndexedTeacherCollection – a simple interface which supplies only the GetIndex API. A key point is that a consumer cannot instantiate it. The only way to obtain this API is a transition from ITeacherCollection.BuildIndex.

The sorting cost is paid only when the need arises.
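
A minimal sketch of the two interfaces; only GetIndex and BuildIndex come from the description above, the other member is an assumption for illustration:

public interface ITeacherCollection
{
    // Filtering operations stay cheap and chainable, for example:
    ITeacherCollection StillAtWork();

    // The only door to index-based access; this is where the sorting cost is paid
    IIndexedTeacherCollection BuildIndex();
}

public interface IIndexedTeacherCollection
{
    int GetIndex(Teacher teacher);
}

The School and the other consumers keep talking to ITeacherCollection; only a caller that needs an index calls BuildIndex and pays for the ordering.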

The actual implementation of the improved design is almost identical to the original version. All the major implementation logic is still in the TeacherCollection class. The refactoring is safe because the compiler will tell us what goes wrong.

The client (Program class) only deals with interfaces and interface transitions.

Why didn't I have the ITeacherCollection the first time around? Well, we did not need it at that time. We should not make things complicated if there is no demand. Design evolves.

I just solved a bug beautifully.

Code Readability – Method Parameters Tell The Scope and Dependencies

Writing code is hard. Writing human-readable code is much harder. One of the contributing factors is that it depends on who reads the code. Each developer has a different background, experience, coding style, and even a differently wired mind.

I do not believe there is code that is readable for every developer (when talking about code, the readers are developers). However, there is code that improves readability over time for the development team. It is hard for an external developer to understand the code just by looking at some files or pull requests. Context really matters.

This post is purely my point of view in terms of code readability; that is, what I consider readable code. There are many factors contributing to readability, and method parameters are one of them. Let's walk through some example code.

Let's build a piece of code that simulates job applicant CV verification. The details do not matter.

When verifying an applicant, there are 2 pieces of information: the ApplicantCv and the SpecialNotes. Some candidates might get a direct introduction from the top managers.
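
A sketch of the moving parts, with details (the CV properties, the private method name) assumed for illustration:

public class ApplicantCv
{
    public string Name { get; set; }
    public int YearsOfExperience { get; set; }
}

public class VerificationContext
{
    public ApplicantCv ApplicantCv { get; set; }
    public string SpecialNotes { get; set; }
}

public interface IVerifyingByPosition
{
    void Verifying(VerificationContext context);
}

public class SeniorPosition : IVerifyingByPosition
{
    public void Verifying(VerificationContext context)
    {
        // Uses the whole context: the CV plus the special notes from top managers
    }
}

public class JuniorPosition : IVerifyingByPosition
{
    public void Verifying(VerificationContext context)
    {
        // Narrows the scope: junior verification only needs the CV
        Verify(context.ApplicantCv);
    }

    private void Verify(ApplicantCv cv)
    {
        // ...
    }
}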

And the consumer code:
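
A matching consumer sketch, again with assumed details:

public class HrDepartment
{
    private readonly IVerifyingByPosition[] _verifiers =
    {
        new SeniorPosition(),
        new JuniorPosition()
    };

    public void Verify(VerificationContext context)
    {
        // Simply passes the context to every position-specific verifier
        foreach (var verifier in _verifiers)
            verifier.Verifying(context);
    }
}

public class Program
{
    public static void Main()
    {
        var applicants = new List<VerificationContext>
        {
            new VerificationContext
            {
                ApplicantCv = new ApplicantCv { Name = "Jane", YearsOfExperience = 3 },
                SpecialNotes = "Introduced by the CTO"
            }
        };

        var hr = new HrDepartment();
        foreach (var context in applicants)
            hr.Verify(context);
    }
}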

The interesting part is in the Verifying method of the SeniorPosition and the JuniorPosition. The JuniorPosition passes the execution to a private implementation, whereas the SeniorPosition does not.

Let's follow the code from the consumer side, the Program class. Each applicant is wrapped in a VerificationContext, and each context carries 2 important pieces of information: SpecialNotes and ApplicantCv.

The next level is the HrDepartment. All it does is simply pass the context to all implementations of IVerifyingByPosition: SeniorPosition and JuniorPosition. The HrDepartment depends on the VerificationContext; that is the information it uses to do the work, the logic of verifying applicants.

The next level is the implementations of IVerifyingByPosition. The JuniorPosition.Verifying method consumes the VerificationContext. However, it delegates the logic to a private method which takes only the ApplicantCv. The scope is narrowed down from VerificationContext to ApplicantCv. By having that private method, I know for sure that the JuniorPosition depends only on the ApplicantCv, not the entire VerificationContext.

Why is that important? Imagine that later the VerificationContext is expanded with more properties. If there is a lot of code in JuniorPosition, we have to read through it to know what information is actually being used.

It might sound confusing and not obvious. The rule is very simple: a method should take just enough information, and all parameters should be used. That is easy to understand. What we do not pay attention to is the properties of the objects passed in the parameter list. In the example, the JuniorPosition takes a VerificationContext in its method signature, but its implementation only uses a single property of the context: the ApplicantCv.

When reading code, I usually pay attention to what information it consumes and depends on. Usually this is reflected in the constructor or method parameters. The code should be refactored so that those obvious dependencies come for free.

Another factor in code readability is the proper use of instance vs static. If a method can be made static, it should be declared static. Why? Because with static, we know for sure that it does not depend on any instance fields. The dependency is narrowed to a smaller scope.
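
Continuing the sketch above, the private method could even be declared static to make that narrowing explicit:

public class JuniorPosition : IVerifyingByPosition
{
    public void Verifying(VerificationContext context) => Verify(context.ApplicantCv);

    // static: the reader knows it touches no instance fields, only its parameter
    private static void Verify(ApplicantCv cv)
    {
        // ...
    }
}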

What else could we do? Method visibility is an important factor as well. A public method means it is being used by other consumers. A private method is safer to refactor.

Code readability is important and it is hard to get right. However, with some small tips and care, we can do better over time. Readability also takes the human factor into account: the one who writes the code and the one who reads it. They are human. They communicate via code.

Hidden Cost of an Architecture

On a normal day of developer life, I was hunting a performance issue and a memory leak. It sounds mysterious but, after all, it was just another bug, another issue to solve.

When it comes to performance and memory issues, one should go for PerfView. The tool gives a very detailed picture of what is going on in memory, at a level that a developer can reasonably understand.

The system is a WCF service which works based on DataContract. From the profiler, I found out that if a returned value is 10MB in size, it costs the OS 50MB, approximately 3 times extra cost. That does not count the memory consumed by the WCF framework to serialize the contract.

Note that I do not judge the architecture as good or bad. There were good reasons why it was designed that way.

A very simplified version looks like this:
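
A sketch of what such a setup might look like; the contract shapes and the body of the BinaryDataContractSerializer helper (only its name appears later in this post) are assumptions:

// A helper that pre-serializes any data contract into a byte array
public static class BinaryDataContractSerializer
{
    public static byte[] Serialize(object value)
    {
        var serializer = new DataContractSerializer(value.GetType());
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);   // first cost: the serializer's own work and buffers
            return stream.ToArray();                 // second cost: copying the stream into a new array
        }
    }
}

[DataContract]
public class FileData
{
    [DataMember]
    public string Name { get; set; }

    [DataMember]
    public byte[] Content { get; set; }
}

[ServiceContract]
public interface IFileService
{
    [OperationContract]
    byte[] DownloadFile(string fileName);
}

public class FileService : IFileService
{
    public byte[] DownloadFile(string fileName)
    {
        var data = new FileData { Name = fileName, Content = File.ReadAllBytes(fileName) };

        // Every response is turned into a byte array here, before WCF serializes the message again
        return BinaryDataContractSerializer.Serialize(data);
    }
}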

With that simple code setup, here is a simple console app that consumes the service:
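
A minimal consumer sketch, with a made-up endpoint address and binding settings:

class Program
{
    static void Main()
    {
        var factory = new ChannelFactory<IFileService>(
            new BasicHttpBinding { MaxReceivedMessageSize = 200_000_000 },
            new EndpointAddress("http://localhost:8080/files"));

        var client = factory.CreateChannel();
        var payload = client.DownloadFile("sample.dat");   // a hypothetical large file

        Console.WriteLine($"Downloaded {payload.Length} bytes");
    }
}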

Here is the result of downloading a file of 74MB. The total memory consumed in the heap is 146MB.

There are 2784 objects with a total of 146MB of memory

Where is that extra cost coming from? The extra cost comes from the BinaryDataContractSerializer.Serialize method:

  1. The memory consumed by the DataContractSerializer.
  2. And the memory consumed by the MemoryStream to return an array of bytes.

In many cases, with modern hardware, it is not a big problem. The Garbage Collector takes care of reclaiming the memory. And if both request and response are small, you do not even notice. Well, of course, unless one day in production there are many requests.

There are a couple of potential issues with consuming more memory:

  1. If the size is more than 85K (85,000 bytes), the object ends up on the Large Object Heap (LOH), which is collected together with Gen 2. I would suggest you read more about memory allocation, especially the LOH. I am too much of an amateur to explain it.
  2. It causes memory fragmentation. Memory keeps increasing, and the GC has a very hard time reclaiming it.
  3. Of course the system is not in a good shape.

How could we solve the problem without changing the design and with as little impact as possible?

We know that some operations will consume lots of memory, such as downloading a file or returning a data set. Instead of returning the byte array, we extend the response to carry the object. We could do that for all operations and get rid of the byte array entirely. However, there are hundreds of operations, and we want to keep the contract simple, with as few changes as possible.

So an improved version looks like this:

Design a LargeObject contract to reduce the serialization cost
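
A sketch of the reshaped contract, again with assumed names (LargeObjectResponse, FileData). The response carries the object graph directly, so WCF serializes it once instead of transmitting a pre-serialized byte array:

[DataContract]
[KnownType(typeof(FileData))]
public class LargeObjectResponse
{
    [DataMember]
    public object LargeObject { get; set; }
}

[ServiceContract]
public interface IFileService
{
    [OperationContract]
    LargeObjectResponse DownloadFile(string fileName);
}

public class FileService : IFileService
{
    public LargeObjectResponse DownloadFile(string fileName)
    {
        // No intermediate byte array: the DataContractSerializer walks the object graph once
        return new LargeObjectResponse
        {
            LargeObject = new FileData { Name = fileName, Content = File.ReadAllBytes(fileName) }
        };
    }
}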

Run the application and see the memory again

There are 366 objects with a total of 73MB of memory

Comparing the two, there is a big win: 2784 objects vs 366 objects; 146MB vs 73MB.

With the increasing power of hardware, RAM and disk are not problems anymore. With the support of managed languages (such as C#/.NET), developers write code without caring too much about memory and memory allocation. I am not saying all developers; however, I believe there are many who do not care much about this issue.

It is about time to care about every single line of code we write, shall we? We do not have to learn and understand every detail of these topics. The following are good enough to start:

  1. Memory allocation in Heap, Gen 0, Gen 1, and Gen 2.
  2. Memory fragmentation. Just like disk fragmentation.
  3. Memory profilers at an abstract level, such as dotMemory or PerfView.
  4. Garbage Collector. Just having a feel for it is a good start.

I am sure you will be surprised by how fun it is and how far it takes you.