Tackling a Tough Programming Challenge: Calculating Overlapping Downtime in Takt!

When multiple alerts are raised and resolved, sometimes at the same time and sometimes at different times, how do we handle overlapping downtime alerts? We can’t double count when two alerts are open at the same time.

Kaleb McKelvey, Software Engineer

In my previous article about stepping into the Takt Tech Lead role, I talked about Takt and calculating Takt time. Giving line managers and workers real-time insight into their current progress against demand increased the number of data-driven decisions being made on the shop floor.

Measuring downtime creates another opportunity for productivity improvements. Knowing which areas caused problems in meeting demand allowed line managers to better address challenges and to answer questions such as: was the delay caused by worker productivity, or by processes that held the workers up?

Since we had a suite of productivity apps, the Takt team was able to use an alerting system already onboarded in many of our shops as a way to track downtime. We allowed alerts to be configured as downtime-causing, and when one of these alerts was raised from the Takt dashboard, we would start displaying a downtime clock that counted up in real time. The dashboard showed real-time delays along with historical data for the alerts raised during shifts.

Calculating downtime was not an easy task for us. The main challenge with this feature came down to the following: when multiple alerts are raised and resolved, sometimes at the same time and sometimes at different times, how do we handle overlapping downtime alerts? We can’t double count when two alerts are open at the same time.
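To make the double-counting problem concrete, here is a minimal sketch. The alert objects, the `naiveDowntime` and `wallClockSpan` helpers, and the use of minutes since midnight are all assumptions made for illustration; only the `creationTime` and `resolvedTime` field names come from the code later in this article.

```javascript
// Hypothetical alerts, with times as minutes since midnight for readability.
const alerts = [
  { id: 1, creationTime: 600, resolvedTime: 660 }, // 10:00 am to 11:00 am
  { id: 2, creationTime: 630, resolvedTime: 690 }, // 10:30 am to 11:30 am
];

// Naive approach: add up every alert's own duration.
function naiveDowntime(list) {
  return list.reduce((total, a) => total + (a.resolvedTime - a.creationTime), 0);
}

// Wall-clock time from the earliest raise to the latest resolve.
// This equals the real downtime here only because the two alerts overlap.
function wallClockSpan(list) {
  const start = Math.min(...list.map((a) => a.creationTime));
  const end = Math.max(...list.map((a) => a.resolvedTime));
  return end - start;
}

const naive = naiveDowntime(alerts); // 60 + 60 = 120 minutes
const span = wallClockSpan(alerts); // 690 - 600 = 90 minutes

console.log(naive, span); // 120 90
```

The naive sum reports 120 minutes of downtime inside a window that only lasted 90 minutes, because the half hour from 10:30 am to 11:00 am is counted once per open alert.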

The first step to developing a solution started with defining scenarios (or use cases). I hope that these create a baseline to show how the code works as well!

Three Main Scenarios

Operators could raise alerts at any time, which sounds like many varying use cases. After drawing on the whiteboard at that time (does anyone remember when we could do this in person? Seems like ages ago!), the team and I boiled down the scenarios into three.

In all scenarios, the line consists of 4 stations, any of which could raise downtime alerts at different times.

1 - No Overlapping Downtime

The first case is the easiest. This case handles independent alerts that were raised and resolved without any overlapping time.

Scenario:

Station 1 downtime alert raised at 1:00 pm and resolved at 2:00 pm
Station 2 downtime alert raised at 3:00 pm and resolved at 3:30 pm

Downtime:

1:00 pm to 2:00 pm - 1 hour of downtime
3:00 pm to 3:30 pm - 30 minutes of downtime
Total: 1 hour 30 minutes of downtime
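As a quick check of this scenario, the sketch below encodes the two alerts and confirms that they do not overlap before summing them. The `minutes`, `overlaps`, and `duration` helpers are hypothetical names introduced for this example, and times are minutes since midnight rather than real timestamps.

```javascript
// Hypothetical helper: 24-hour clock time to minutes since midnight.
const minutes = (h, m = 0) => h * 60 + m;

const station1 = { creationTime: minutes(13), resolvedTime: minutes(14) }; // 1:00 pm to 2:00 pm
const station2 = { creationTime: minutes(15), resolvedTime: minutes(15, 30) }; // 3:00 pm to 3:30 pm

// Two alerts overlap when each one starts before the other ends.
const overlaps = (a, b) =>
  a.creationTime < b.resolvedTime && b.creationTime < a.resolvedTime;

const duration = (a) => a.resolvedTime - a.creationTime;

// With no overlap, total downtime is simply the sum of both durations.
const hasOverlap = overlaps(station1, station2);
const total = hasOverlap ? null : duration(station1) + duration(station2);

console.log(hasOverlap, total); // false 90
```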

2 - Total Engulfed Downtime

The second case isn’t much more challenging. This case handles when one alert was open that caused downtime, while a second alert was raised and resolved before the original alert was completed. In other words, the first alert engulfs the second, and we only need to count the first in our downtime calculation.

Scenario:

Station 1 downtime alert raised at 3:00 pm and resolved at 6:00 pm
Station 2 downtime alert raised at 4:00 pm and resolved at 4:30 pm
Station 2 non-downtime alert raised at 5:00 pm and resolved at 6:00 pm

Downtime:

Alert 1 - 3:00 pm to 6:00 pm - 3 hours of downtime
Alert 2 - 4:00 pm to 4:30 pm - 30 minutes of downtime, but engulfed by the first alert
Total: 3 hours of downtime
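The scenario above can be sketched as data plus an "engulfs" check. Note that the `causesDowntime` flag, the `station` field, and the `minutes` and `engulfs` helpers are assumptions made for this example; the original alert format is not shown in this article.

```javascript
// Hypothetical helper: 24-hour clock time to minutes since midnight.
const minutes = (h, m = 0) => h * 60 + m;

// causesDowntime is an assumed flag marking alerts configured as downtime-causing.
const alerts = [
  { station: 1, causesDowntime: true, creationTime: minutes(15), resolvedTime: minutes(18) },
  { station: 2, causesDowntime: true, creationTime: minutes(16), resolvedTime: minutes(16, 30) },
  { station: 2, causesDowntime: false, creationTime: minutes(17), resolvedTime: minutes(18) },
];

// Non-downtime alerts never contribute to the downtime clock.
const downtimeAlerts = alerts.filter((a) => a.causesDowntime);

// The outer alert engulfs the inner one when the inner one starts
// and ends within the outer alert's window.
const engulfs = (outer, inner) =>
  inner.creationTime >= outer.creationTime && inner.resolvedTime <= outer.resolvedTime;

const [first, second] = downtimeAlerts;
const isEngulfed = engulfs(first, second);

// When the second alert is engulfed, only the first alert's duration counts.
const total = isEngulfed ? first.resolvedTime - first.creationTime : null;

console.log(downtimeAlerts.length, isEngulfed, total); // 2 true 180
```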

3 - Extended/Overlapped Downtime

The third case can be a bit more challenging. This case handles when one alert was open that caused downtime, while a second alert was raised sometime afterwards. The first one then resolves while the second one remains open. In other words, the first alert starts the downtime clock and the second alert continues it even though the first one was resolved.

Scenario:

Station 1 downtime alert raised at 3:00 pm and resolved at 4:00 pm
Station 2 downtime alert raised at 3:30 pm and resolved at 6:30 pm
Station 2 non-downtime alert raised at 5:00 pm and resolved at 6:00 pm

Downtime:

Alert 1 - 3:00 pm to 4:00 pm - 1 hour of downtime
Alert 2 - 3:30 pm to 6:30 pm - 3 hours of downtime, 30 minutes of which overlap with the first alert
Total: 3 hours and 30 minutes of downtime (3:00 pm to 6:30 pm)
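The arithmetic for this case can be sketched as follows, taking 6:30 pm as the second alert's resolve time, as in the downtime breakdown above. The `minutes` helper and the `windowEnd` variable are names invented for this example. The key idea is that the second alert only adds the time after the first alert was resolved.

```javascript
// Hypothetical helper: 24-hour clock time to minutes since midnight.
const minutes = (h, m = 0) => h * 60 + m;

const first = { creationTime: minutes(15), resolvedTime: minutes(16) }; // 3:00 pm to 4:00 pm
const second = { creationTime: minutes(15, 30), resolvedTime: minutes(18, 30) }; // 3:30 pm to 6:30 pm

// Start the downtime clock with the first alert.
let totalDowntime = first.resolvedTime - first.creationTime; // 60 minutes
let windowEnd = first.resolvedTime; // 4:00 pm

// Overlap case: the second alert starts inside the open window and ends after it.
const startsInsideWindow =
  second.creationTime >= first.creationTime && second.creationTime <= windowEnd;
const endsAfterWindow = second.resolvedTime >= windowEnd;

if (startsInsideWindow && endsAfterWindow) {
  // Only the part after the current window end is new downtime.
  totalDowntime += second.resolvedTime - windowEnd; // 6:30 pm - 4:00 pm = 150 minutes
  windowEnd = second.resolvedTime;
}

console.log(totalDowntime); // 210, which is 3 hours and 30 minutes
```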

The next step after breaking down the scenarios? Write some code!

The Code

I created a codepen to better show the code below with examples! Check it out if you’d rather play around with it live: https://codepen.io/avatar-kaleb/pen/NWRNpVr?editors=0002

Some worthy notes:

There may be bugs; please let me know and I’d be happy to correct them. This particular problem was solved years ago and I no longer have access to the original code. I recreated it from memory, which means the original data formats, etc. are lost to me, and of course I’ve learned quite a bit since then.
Past me chose recursion for this particular solution, though a loop might be cleaner. The mental model of recursion still makes sense to me while revisiting it.
I’ve added many comments to help explain what’s going on, more than usual, so please forgive me if you’re a “no code comments” enthusiast.

For those who prefer seeing the static code here in the article, here you go:

// Method excerpt from a class. hasInvalidCalculationData is another method on
// the same class and is not shown in this article.
// The branches below appear to assume taktAlerts is sorted by creationTime,
// oldest first; otherwise the final "unknown alert case" branch is hit.
calculateDowntime(
  createdTime,
  resolvedTime,
  totalDowntime,
  index,
  taktAlerts = []
) {
  // make sure there isn't any funny business in our data before trying to calculate downtime
  if (this.hasInvalidCalculationData(createdTime, resolvedTime, totalDowntime, index)) {
    return 0;
  }

  // convenient local consts to ease referencing the times
  // we could further validate that these have data before continuing, left it out for concise example
  const currentTaktAlertCreationTime = taktAlerts[index].creationTime;
  // an alert that is still open counts as downtime up to the current time
  const currentTaktAlertResolvedTime = taktAlerts[index].alertResolved
    ? taktAlerts[index].resolvedTime
    : new Date().getTime();

  // ////////////////
  // initialization
  // ////////////////
  if (index === 0 && taktAlerts.length === 1) {
    // only 1 alert so let's just return downtime
    return currentTaktAlertResolvedTime - currentTaktAlertCreationTime;
  } else if (index === 0 && taktAlerts.length > 1) {
    // start recursing with init data set
    // Why recursive? This could be done with a loop
    return this.calculateDowntime(
      currentTaktAlertCreationTime, // createdTime
      currentTaktAlertResolvedTime, // resolvedTime
      currentTaktAlertResolvedTime - currentTaktAlertCreationTime, // downtime
      index + 1, // index
      taktAlerts
    );
  }

  // ////////////////
  // engulf case
  // example: [
  //   { id: 1, creationTime: 10, resolvedTime: 20},
  //   { id: 2, creationTime: 11, resolvedTime: 19}
  // ]
  // ////////////////
  if (currentTaktAlertCreationTime >= createdTime && currentTaktAlertResolvedTime <= resolvedTime) {
    // check for next iteration
    if (index + 1 < taktAlerts.length) {
      // Same as above: recursion that could be a loop
      return this.calculateDowntime(
        createdTime,
        resolvedTime,
        totalDowntime,
        index + 1,
        taktAlerts
      );
    }

    // return total downtime since there's not a next alert
    return totalDowntime;
  } else if (
    currentTaktAlertCreationTime >= createdTime &&
    currentTaktAlertCreationTime <= resolvedTime &&
    currentTaktAlertResolvedTime >= resolvedTime
  ) {
    // ////////////////
    // overlap case
    // example: [
    //   { id: 1, creationTime: 10, resolvedTime: 20},
    //   { id: 2, creationTime: 11, resolvedTime: 25}
    // ]
    // ////////////////

    // only the time past the current resolve is new downtime,
    // then set resolve to this alert's resolve
    totalDowntime += currentTaktAlertResolvedTime - resolvedTime;
    resolvedTime = currentTaktAlertResolvedTime;

    // check for next iteration
    if (index + 1 < taktAlerts.length) {
      return this.calculateDowntime(
        createdTime,
        resolvedTime,
        totalDowntime,
        index + 1,
        taktAlerts
      );
    }

    // overlap base case -> return downtime
    return totalDowntime;
  } else if (currentTaktAlertCreationTime >= resolvedTime) {
    // ////////////////
    // next alert case
    // example: [
    //   { id: 1, creationTime: 10, resolvedTime: 20},
    //   { id: 2, creationTime: 40, resolvedTime: 50}
    // ]
    // ////////////////
    if (index + 1 < taktAlerts.length) {
      return this.calculateDowntime(
        currentTaktAlertCreationTime,
        currentTaktAlertResolvedTime,
        totalDowntime + (currentTaktAlertResolvedTime - currentTaktAlertCreationTime),
        index + 1,
        taktAlerts
      );
    }

    // next alert base case -> return total downtime
    return totalDowntime + (currentTaktAlertResolvedTime - currentTaktAlertCreationTime);
  } else {
    console.error('Unknown alert case when calculating downtime.');
    return 0;
  }
}
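One of the notes above mentions that a loop might be cleaner than recursion. As a rough sketch of that idea, and not the original Takt implementation, the function below walks the alerts once and handles the same three cases. The name `calculateDowntimeWithLoop` and the `now` parameter are made up for this example; the alert fields (`creationTime`, `resolvedTime`, `alertResolved`) mirror the code above. It sorts a copy of the alerts by creation time as a precaution and skips the data validation entirely.

```javascript
// Sketch only: a loop-based take on the three cases, under the assumptions above.
function calculateDowntimeWithLoop(taktAlerts = [], now = Date.now()) {
  // Sort a copy so the caller's array is untouched and alerts are processed oldest first.
  const sorted = [...taktAlerts].sort((a, b) => a.creationTime - b.creationTime);

  let totalDowntime = 0;
  let windowEnd = -Infinity; // end of the downtime window counted so far

  for (const alert of sorted) {
    const start = alert.creationTime;
    // An unresolved alert is still causing downtime, so count it up to "now".
    const end = alert.alertResolved ? alert.resolvedTime : now;

    if (start >= windowEnd) {
      // Next alert case: a fresh window, count its full duration.
      totalDowntime += end - start;
      windowEnd = end;
    } else if (end > windowEnd) {
      // Overlap case: only the part past the current window is new downtime.
      totalDowntime += end - windowEnd;
      windowEnd = end;
    }
    // Engulf case: the alert sits fully inside the window, so nothing is added.
  }

  return totalDowntime;
}

// The three example inputs from the code comments above, marked as resolved.
const engulfed = calculateDowntimeWithLoop([
  { id: 1, creationTime: 10, resolvedTime: 20, alertResolved: true },
  { id: 2, creationTime: 11, resolvedTime: 19, alertResolved: true },
]);
const overlapped = calculateDowntimeWithLoop([
  { id: 1, creationTime: 10, resolvedTime: 20, alertResolved: true },
  { id: 2, creationTime: 11, resolvedTime: 25, alertResolved: true },
]);
const separate = calculateDowntimeWithLoop([
  { id: 1, creationTime: 10, resolvedTime: 20, alertResolved: true },
  { id: 2, creationTime: 40, resolvedTime: 50, alertResolved: true },
]);

console.log(engulfed, overlapped, separate); // 10 15 20
```

Tracking only the end of the current window works because the alerts are processed in creation order, so a later alert can never start before the window it is compared against.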

Concluding Thoughts

Solving impactful, complex challenges drives the joy we get from our chosen profession of Software Engineering. The collaborative brainstorming and whiteboard sessions, with the aroma of freshly brewed coffee in the air, create the atmosphere that leads to solutions!

This is the essence of why I love being an Engineer, especially when you are able to see the time you save or the impact you make for the users that provide the challenges!

Writing code feels like art as we tell computers how to behave to solve these challenges :). It takes creativity, patience, and persistence. Follow that up with testing and you are on your way to a masterpiece!

And with that, I will leave you with this: “Computers are good at following instructions, but not at reading your mind.” - Donald Knuth

Thank you for reading!!
