I would argue that the long tail was one of the main reasons agile methodologies replaced older ones.
Prior to agile, most software was developed and delivered over long periods of time, and if done well there were various checks and balances (documentation, reviews, structural adherence) that addressed things like the long tail. Agile relaxed these checks; ideally, when agile is done well, work cannot deviate enough to create a long tail, but in practice the larger the feature, the more likely the long tail is to occur.
What do I mean by long tail? A team is given a feature (or worse, an epic), say “we’d like to send notifications to users before their next activity, providing them with some details and encouragement”. In a well-connected organization that may be sufficient, because those who don’t understand what that means can quickly work with others to flesh out the details. Even then gaps can emerge, as ‘details’ may evolve once people start asking questions. And other questions follow: when should we ask the user if we can send notifications? How does that affect first-time users? How often should we send them?
The long tail starts to emerge when the team takes the information it has and estimates how long it will take to deliver what was requested. They have an incomplete picture of what they need to do, and assumptions about what they need to know. What happens? They try something, go down certain paths, change direction, and execute to ‘get it working’ within the expected time. But it is far from done. They have put in great effort to get to this point and have certainly produced something of value, but when they try to push that effort as production ready, lots of problems start to arise. Maybe they refactored code that already existed, creating further risk. Maybe they used a new external API they are not familiar with, one that has some non-obvious behaviors.
And this is where the long tail surfaces, causing stress and possible distrust across teams. There is value there, you can see parts of it, but there are problems. And it’s hard to understand where the problems are, because it is not obvious which part of what was built is at fault, and more likely than not one problem is hidden by later problems.
Yes, there are external forces that contribute to the long tail: lack of focus on the feature, the inability to easily flesh a feature out, or a feature that changes significantly are just a few examples. For this discussion, though, I want to focus on elements developers can directly control.
I’ll say the most fundamental roadblock is that as developers we fail to recognize complexity. We get ‘in the zone’ and write a bunch of code that is the initial vision of what we are trying to create. If you take a step back, you’ll realize you’ve touched many files, created new ones with many functions, and that it all has to integrate with the existing work going on around you. You can stay in the zone and still validate your actions, simply by running some code and writing unit tests for functions. Maybe you used new APIs and aren’t 100% sure how they behave. A unit test just to validate your assumptions is well worth the effort.
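As a sketch of what that assumption-validating test can look like, here is a Python example that pins down an easy-to-misread standard library call, `calendar.monthrange`. The scenario is mine, not from the original feature, but the shape of the test is the point: state the assumption in a comment, then assert it.

```python
import calendar

def test_monthrange_assumption():
    # Assumption to verify: monthrange returns (weekday of day 1, days in
    # month), with Monday == 0, not (first day, last day) as the name
    # might suggest.
    first_weekday, day_count = calendar.monthrange(2024, 2)
    assert first_weekday == 3   # Feb 1, 2024 was a Thursday
    assert day_count == 29      # 2024 is a leap year

test_monthrange_assumption()
```

The test takes a minute to write, and if the assumption is wrong you find out now, not three layers up in the feature logic.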
In the notification example I started with, there are calendar methods in the OS, and we had an assumption about how they work. Testing showed our assumption was wrong, and we had to add a few wrapper functions to get the results we wanted. But because we did this late, we saw odd results higher up in the logic, and it took time to drill down to the root cause of the problem, which was then relatively trivial to fix.
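One minimal sketch of such a wrapper, in Python. The assumption it encodes is hypothetical (that the calendar API sometimes returns naive local-time datetimes), and the function name is mine, but it shows the pattern: put the surprise behind one function so the rest of the code sees a single, predictable shape.

```python
from datetime import datetime, timezone

def normalized_start(raw: datetime) -> datetime:
    """Wrapper so downstream code always sees an aware UTC datetime.

    Hypothetical assumption: the OS calendar API sometimes hands back
    naive datetimes in local time, while our logic expects UTC.
    """
    if raw.tzinfo is None:
        raw = raw.astimezone()  # interpret a naive value as local time
    return raw.astimezone(timezone.utc)
```

Had a wrapper like this existed from the start, the odd results would have surfaced at this boundary rather than higher up in the logic.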
Testing intermediate functions also fleshes out your assumptions and expectations, exposing failure points before you look at the whole feature. You can feed structured data to these functions and validate your expected results long before that data actually arrives from other methods. This further strengthens your ability to prevent the long tail, because you can then look at those other functions and ask “am I getting the structure I expect?”
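For example, a middle-of-the-pipeline function can be exercised with hand-built data well before the code that will eventually produce that data exists. The function and field names below are invented for illustration:

```python
def build_notification(activity: dict) -> str:
    """Intermediate step: structured activity data in, message text out."""
    return (f"{activity['title']} starts in "
            f"{activity['minutes_until_start']} minutes. You've got this!")

# Feed hand-built structured data long before the calendar code produces it.
sample = {"title": "Morning run", "minutes_until_start": 30}
assert build_notification(sample) == (
    "Morning run starts in 30 minutes. You've got this!"
)
```

Once this layer is validated in isolation, any later surprise points upstream, to whatever is supposed to hand you that structure.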
At the same time you should document your assumptions, as they can help guide others (testers, security teams, end-user documentation writers) and make the work easier to understand in the future.
Beyond documentation you should also log your assumptions. Have debug logs that state something to the effect of “based on this data at this point I should have this value”. If you run a test, or someone brings you the results of a run, and you don’t see what you expect, it’s much easier to know whether the problem occurred before that point or after it. And it’s then easier to write tests that feed that data to the next part, making the analysis and validation of later code easier.
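A small Python sketch of that kind of log line (the function and field names are hypothetical): the comment states the assumption, and the debug log records the values so a reader of the output can check it.

```python
import logging

logger = logging.getLogger("notifications")

def minutes_until_start(next_start_ts: int, now_ts: int) -> int:
    """Minutes from now until the next activity begins (timestamps in seconds)."""
    delta_seconds = next_start_ts - now_ts
    # Assumption stated in the log: at this point the next activity should
    # be in the future, so delta_seconds should be positive.
    logger.debug("next_start=%d now=%d: expecting positive delta, got %d",
                 next_start_ts, now_ts, delta_seconds)
    return delta_seconds // 60
```

If a run ever shows a negative delta here, you immediately know the problem is upstream of this function, not in the notification logic after it.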
As a final note, you might go down an initial path only to abandon it. You don’t want to do too much testing there, since you’d spend a lot of time on dead ends, but you do want to validate whether the path will work, and you want to get rid of a dead end as quickly as possible. Too often code just ‘exists’ and is never used. In the notification case we explored background processing, found it unreliable and unnecessary given product use patterns, yet it sat around for a while with hooks into many parts of the code.
Recognize that the long tail can happen, and apply strategies to log, test, and validate your assumptions at multiple layers. Where behavior is questionable, validate external APIs. The more reliable the work you produce, the more confidence you have. Those around you will build confidence too, and if done well, the long tail will become a thing of the past.