After my recent post Remove yourself from the critical path, my friend and ex-colleague Fabrizio Mirabito gave me an insightful review: my definition of critical path is a synonym for dependency, and while removing yourself from the critical path is a great operational technique, the strategic and tactical approach should be to remove all dependencies altogether. Will you be able to remove all of them? Maybe not, but to him this process is part of kaizen, the “change for the better” Japanese business philosophy, which I would roughly translate as continuous improvement.

I totally agree with his review, so here I am, delving into the topic. This post is called Shortening the critical path for a reason: while you should strive to remove all dependencies, doing so will likely be an enormous task. In the spirit of kaizen, what we want is to aim at removing all dependencies, thus shortening the critical path. The shortest possible critical path is zero: the only thing left to do is the work itself. No overhead.

If it’s impossible, does it make sense trying?

The short answer is yes.

The long answer starts from understanding the goal that I think we, as engineers working for a product company, have: build great products that customers love to use, solving real needs, in an efficient and sustainable way. It is my opinion that this cannot be achieved just by focusing on making money; it requires an active focus on the main keywords of the previous sentence: great product, customer, needs, efficient, sustainable. In my experience making money is a side effect of a great product, and a great product is a side effect of focusing on customer needs.

It’s not just about the goal: a long critical path (or a lot of dependencies) brings some relevant risks:

partially done work: when you don’t finish something, that partial work is left in an incomplete state; either you get back to it soon enough to complete it, or it becomes waste, just as unused raw food goes to waste;
overproduction: if you are at constant risk of interruption you will naturally try to foresee possible future requests and bake them into your current product, hoping to prevent future work; most of the time this overproduction becomes waste;
extra processing: handling dependencies (upstream or downstream) requires more work: more coordination, more planning, more meetings, more documents; the maxim “less is more” exists for a reason: you can’t cope with too much overhead, and in the end it just becomes bloat that no one likes;
hand-offs: nothing kills velocity like hand-offs; have you ever watched a relay race? Passing the baton is a clear additional challenge: it requires slowing down and great synchronisation.

It’s curious that 4×100m relay world records are less than 4 times the 100m records. Why? In the 4×100m only one runner starts from a standstill, so that “penalty” does not weigh much on the overall time.
It’s also curious that 4×100m world records are about 5 seconds faster than the 400m records. That takes 4 times the people for a ~12% improvement.

waiting: a key risk is waiting times and delays that propagate across the entire chain; if you’re part of a chain, any upstream delay will affect you, and your delay will affect your downstream dependencies; waiting also hides in plain sight in some of the other risks, like hand-offs, extra processing or overproduction;
context switching: all the risks above add up to generate the Megazord of velocity killers; context switching also plays a big part in rework (when something needs to be redone because it is not working as intended);
defects: what do you expect to happen if you never finish work, overproduce, add extra processing, pile up hand-offs and wait times, and context switch across multiple domains? Defects. And, in a downward spiral, defects will increase the other risks.

The bad parts

While trying to mitigate these risks, there are some anti-patterns that I suggest you look out for and avoid.

🟨 Waterfall development

This post is not about Agile per se, even if it is heavily inspired by it. Doing waterfall development (or any “agile” framework that packages waterfall into shorter time frames) is, in my opinion, an anti-pattern. Waterfall implies hand-offs, waiting and context switching.

It may be that I’ve never been exposed to a working waterfall development cycle, but I don’t think it is the right approach for building great software products. At least not these days, with the fast-changing landscape of requirements, needs and user expectations.

🟨 Layered development

When deciding how to work on a feature, fight the tendency to split it across infrastructure boundaries: there is no DB/Back-end/Front-end in a feature. No customer will ever see your Back-end. They will look at your application and judge it in its entirety.

Splitting across boundaries seems like a great solution but tends to bring in partially done work (a feature implemented in the Back-end but not in the Front-end), overproduction, hand-offs and waiting.

🟨 100% utilisation

You never want to be running at 100% utilisation. I spent a lot of time as a SysAdmin, and a 100% utilised resource (say CPU or disk space) always signals trouble.

The reason is intuitive: if you are running at 100% utilisation there is no room for error, no room for anything unexpected. Anything different from what was “planned” will mean trouble, leading to partially done work, extra processing, context switching and defects.
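To put rough numbers on that intuition, here is a minimal sketch that borrows the textbook M/M/1 queueing model. The model is my assumption for illustration (one worker, random arrivals, random task sizes), and the function name is made up; real teams are messier, but the shape of the curve is the point.

```python
import math


def queue_wait_in_service_times(utilisation: float) -> float:
    """Mean time a task waits before being started, expressed in multiples
    of the mean time the task itself takes (M/M/1 model, an assumption)."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    if utilisation == 1.0:
        # No slack at all: the queue never drains.
        return math.inf
    return utilisation / (1.0 - utilisation)


if __name__ == "__main__":
    for u in (0.5, 0.8, 0.9, 0.99, 1.0):
        wait = queue_wait_in_service_times(u)
        print(f"{u:.0%} utilised -> tasks wait about {wait:.1f}x their own duration")
```

Under this model, going from 50% to 90% utilisation makes the wait roughly nine times longer, and the last few percentage points are the most expensive ones.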

🟨 Bus factor 1

Bus factor is a risk measurement that aims to surface risks associated with information silos: what is going to happen if (put here a key person in your team) gets hit by a bus? We really hope this is just speculative, as being hit by a bus is bad.

The bus factor is “the minimum number of team members that have to suddenly disappear from a project before the project stalls due to lack of knowledgeable or competent personnel.” (Thanks Wikipedia! 💗)

How would your team/org/company cope with such an occurrence? If your bus factor is 1, all risks will be amplified, and waiting in particular can become an infinite wait.
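If you want a quick way to estimate this for your own team, the sketch below may help. It assumes a deliberately simple model: you list the areas the project needs and who knows each one, and the project “stalls” as soon as one area has nobody left. The area names, people and function names are all hypothetical.

```python
def bus_factor(knowledge: dict[str, set[str]]) -> int:
    """Smallest number of people whose departure leaves some area with
    nobody who knows it (simplified model, see the assumptions above)."""
    if not knowledge:
        raise ValueError("list at least one area of knowledge")
    return min(len(people) for people in knowledge.values())


def single_points_of_failure(knowledge: dict[str, set[str]]) -> dict[str, str]:
    """Areas known by exactly one person, mapped to that person."""
    return {
        area: next(iter(people))
        for area, people in knowledge.items()
        if len(people) == 1
    }


# Hypothetical team: who can confidently work on what.
team_knowledge = {
    "billing": {"ana", "bo"},
    "deploy pipeline": {"bo"},
    "search": {"ana", "bo", "cy"},
}

if __name__ == "__main__":
    print("bus factor:", bus_factor(team_knowledge))
    print("silos:", single_points_of_failure(team_knowledge))
```

In this made-up team the bus factor is 1 because only one person knows the deploy pipeline, so that is the first silo to break up.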

The good parts

There are some corrections that could be applied to mitigate these risks.

🟦 Self-organised and self-managed teams

The first countermeasure for the risks above is having self-organised teams that can manage themselves. Given a purpose, the right cross-functional skills and a strategy to execute, use self-organisation and self-management as empowering tools for your teams.

What do I mean by a self-organised and self-managed team? It is a team where the work to be done is not assigned by the manager. The team members find their own work and manage the associated responsibilities and timelines. It is a team that has the authority to choose the most effective way to organise its work. It is a team that is constantly looking around for ways to improve.

There are some key principles for this to happen:

ownership and responsibility: the core principle is having full ownership of a domain (it may be a specific software, or system, or library, …); this full ownership is balanced by full responsibility for that domain.
trust: no team can be a great team without trust; trust is based on psychological safety.
communication: such teams require high-bandwidth communication channels within the team itself; managing communication is probably one of the key challenges of creating such teams.

In the long term this kind of team will increase its efficiency and problem-solving capabilities, becoming a true expert in the domain it owns.

A key strategy here for managers of such a team is to build bridges with other teams. Your team will be focused on delivery, understanding the domain and becoming experts. They will probably struggle to keep in touch with other teams or other parts of the organisation. There is a huge part to play here in building bridges for your team, as bridges will be the communication channels the team can use.

This is also where some of the strategies highlighted in Remove yourself from the critical path may come in handy.

🟦 Own your tools

Tool ownership is paramount. By this I don’t mean “teams should own their own bare metal servers”; I mean they need to own the tools directly involved in their day-to-day work. DevOps principles and self-service internal platforms of all sorts are the foundation for tool ownership.

Think of it like this: if you were an artisan (any craft applies) would you borrow your tools? Maybe the one that you rarely use, that costs a lot and that you need only for a brief time. But you would never borrow the entire toolbox each time. So why do we allow or force our teams to work like that?

🟦 Kaizen

To become better you can’t leave improvements to chance. You need discipline and a process. Kaizen is the Japanese term but you could just use “continuous improvement”. Invest part of your time in sharpening your tools, improving them, fixing them, revisiting processes and looking around constantly for things that may be improved.

This attitude, more than the process, will be groundbreaking. We always underestimate the power of consistent small wins. It is the same concept as “compound interest”: each small win builds on top of the previous ones. The wins stack up, and after some iterations you end up with big wins that would otherwise have been out of reach.

A key strategy here is to always question the process: does this process/step still provide value? Be obsessive in asking this question and ruthless in pursuing improvements. Be mindful of dependencies though; in a fully self-organised team there may be 0 dependencies affected by a process, but if the number of dependencies is not 0, keep in mind the effort required for others to adjust to the new process.
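The compounding effect is easy to check with a few lines of arithmetic. The numbers below (a 1% improvement every week for a year) are made up for illustration, and the function name is hypothetical.

```python
def compounded_gain(improvement_per_step: float, steps: int) -> float:
    """Overall multiplier after `steps` improvements that each build on
    the result of the previous one."""
    return (1.0 + improvement_per_step) ** steps


def linear_gain(improvement_per_step: float, steps: int) -> float:
    """Overall multiplier if every improvement only applied to the
    starting point, without stacking."""
    return 1.0 + improvement_per_step * steps


if __name__ == "__main__":
    weekly_improvement = 0.01  # assumed: 1% better each week
    weeks = 52
    print(f"compounded: {compounded_gain(weekly_improvement, weeks):.2f}x")
    print(f"not stacked: {linear_gain(weekly_improvement, weeks):.2f}x")
```

With these assumed numbers the stacked wins come out at roughly 1.68x after a year against 1.52x without stacking, and the gap keeps widening the longer you keep going.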

🟦 “Stop starting, start finishing”

How many things are you trying to do at once? If the answer is “too many”, you are falling prey to an anti-pattern. Context switching and high wait times lead to starting many things at once. You are waiting, “wasting” precious time, so the next obvious step is to start something else. Wrong!

When you are blocked, stuck in a queue or waiting for any reason, the only logical next step is to remove the blocker that is making you wait. This idea builds on top of Own your tools and Kaizen. Instead of context switching to a new task, think about how you could improve the current situation. Is your CI taking 1 hour to run? How could you make it run in 10 minutes? Do you need to get answers from outside your team? How can you arrange your communication channels to reduce the wait?

Stop starting new things, focus on finishing what you have on your plate. 100 in progress and 0 done tasks provide 0 value to your company and your customers.

A key strategy here is limiting work in progress. Like for real, don’t do more.
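One way to see why the limit matters is Little’s Law from queueing theory, which I am bringing in here as an assumption: in a stable system, average cycle time equals work in progress divided by throughput. The sketch below uses made-up numbers and a hypothetical function name.

```python
def average_cycle_time(work_in_progress: int, throughput_per_week: float) -> float:
    """Average weeks a task spends in progress, from Little's Law
    (assumes a stable flow: tasks finish about as fast as they start)."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput_per_week


if __name__ == "__main__":
    finished_per_week = 5  # assumed: the team completes 5 tasks a week
    for wip in (20, 10, 5):
        weeks = average_cycle_time(wip, finished_per_week)
        print(f"{wip} tasks in progress -> about {weeks:.0f} week(s) per task")
```

With the same throughput, cutting work in progress from 20 tasks to 5 brings the average task from four weeks down to one. The team is not working any faster; things simply stop sitting around half done.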

🟦 Is done really done?

One of the key elements in Scrum is the definition of done. I think this concept is powerful and well worth exploring.

Define what it means to be “done” with a task. And make it complete enough so that once done it creates value. “Implement a feature” means nothing if it’s not deployed and customers are not using it. It will just be another “In review” (or worse, “Done”) item that will haunt your future self. If you happen to work with long release cycles you know this feeling, and it is not great when, 3 months after you merge a PR, you discover there was a bug that no one noticed because it slipped through testing (and no, no amount of testing will ever replace real user behaviour).

This strategy may be impractical to apply, but shortening your release cycles and focusing on providing short and effective feedback loops so that “done is really done” and you can move forward without worries is a superpower.

Let it be

I don’t think it’s ever possible to condense all the information useful for such a journey into a blog post. Maybe into a few hundred books, but still, there is so much to say, to explore, to try.

Don’t let yourself be overwhelmed by the challenge though. Use kaizen as your guide, start looking for things to improve and allocate time to improve them. I’m quite sure you will never reach perfection, just as I’m sure that a year from now, looking back, you will see a huge leap forward.

“The longest journey begins with a single step.” - Patanjali
