Category: Uncategorized

Our Kanban Boards Are Backwards!

If you’re using a Kanban board to visualize your work, in its most general form, it probably looks something like:

Screen Shot 2018-02-09 at 5.59.11 PM

It turns out that this is exactly backwards with how we intuitively visualize time. As Jabe Bloom points out, the “set of columns reflects an inversion of our innate understanding of the flow of time.”1 Instead of time flowing from left to right in a Past, Present, Future order, it is inverted, and on a Kanban board it flows from right to left.

Screen Shot 2018-02-09 at 6.05.42 PM

Contrast this with what a Kanban board would look like if it was coherent with how we usually think about the time arrow.

Screen Shot 2018-02-09 at 6.07.59 PM

Once I rearranged a Kanban board this way, I have a hard time thinking of it differently. There are multiple things that become coherent with this arrangement.

  • The new arrangement is coherent with how left-to-right readers visualize the arrow of time. By coherent, I mean a very abstract coherence in the sense that “Happy” is coherent with “Up” or that “Sad” is coherent with “Down”.
  • When “walking the board”, we are taught to walk “backwards” from DONE to TO DO. Well, in this new arrangement, there is nothing backwards about it. Walking the board becomes coherent with the board arrangement and the flow of time.

  • The cards end up traveling from right to left. For some reason, that is more coherent with “pull”. To contrast with a standard Kanban board, cards traveling left to right seems to me more coherent with “push”.

  • Looking at “TO DO” column in the future and on the right, feels more coherent with “TO DO” being our vision of the future that we are “pulling” into reality one card at a time. It also, to me, highlights better my opinion that a backlog is just a place where everything goes stale without us worrying about it.

  • The clutter of a “TO DO” column seems easier to dismiss when it’s on the left. When it’s on the right, the clutter of “TO DO” goes from “we have a lot of work to do” to “we have no coherent view of what we want in the future.” The difference is very subtle, but I think it’s there.

Will changing your Kanban board this way make you 50% more productive? No. However, while I see no compelling reason for the predominant TO DO, DOING, DONE arrangement, there seems coherence to be gained by switching to DONE, DOING, TO DO.

Endnotes:

1 Bloom, Jabe (2012). The Moment of Pull – Meditations on time and the movement of cards. Retrieved 9 Feb 2018.

2 While this Kanban board arrangement came to me while reading Jabe’s “The Moment of Pull,” it is not a new idea. For example, see: Rybing, Tomas (2015). Mirrored Kanban Board. Retrieved 9 Feb 2018.

Advertisements

How Long Will It Take? – Part 2 – So, Does It Work?

After writing the original How Long Will It Take? post, I kept wondering how to measure if the estimation method described therein (from here on referred to as “Historical Lead Time”, or HLT) is effective, for some definition of effective. What I realized is, since the estimation method does not require human input, I could use historical data and simulate what the method would estimate at any particular point in time. This blog post describes this experiment and demonstrates a very surprising finding.

TL;DR: It turns out that the HLT method minimizes estimation error better than every other tested method except one, which is… *drumroll*… “pick the average so far” as the estimate. Read below for details and caveats. Also, it would be very helpful to run this experiment on many data sets instead of just the one I used, please contact me if you can provide a data set to run this experiment on.

Experiment Objective

Determine usefulness of estimating software work using percentile estimates based solely on observed past data as described in How Long Will It Take? .

Experiment Hypothesis

HLT estimates are better than random estimates. (Spoiler: they are! … or more correctly: experiment results do not refute this hypothesis)

Experiment Metric

Sum of square error, which will be the difference between estimated duration and actual work item duration, squared.

Experiment Design

The experiment is a simulation of what would the estimates be at specific times in the past. Given a data set of work start and stop times, simulation starts after completion of first work item and ends after completion of last work item in the data set.

Experiment uses multiple estimation models. The model with the least cumulative sum of square error is deemed the best. Where appropriate, models are tracked per 25th, 50th, 75th, 90th, 95th, and 99th percentiles. Models used in the experiment are:

  • HLT: Estimation method described in How Long Will It Take?.
  • Levy: Estimation method that assumes distribution of observed work item durations can be described as a Levy distribution. This model is included to showcase a terrible model.
  • Gaussian: Estimation method that assumes distribution of observed work item durations can be described as a Gaussian/Normal distribution. This model is included to showcase a “dumb” model as a sanity check.
  • Random: Estimation method that simply picks a random number between zero and longest duration observed so far. This model is included to provide a baseline to compare against.
  • Weibull: Estimation method that assumes distribution of observed work item durations can be described as a Weibull distribution. This model is included because it is seems to be the go-to model used by people who take estimation seriously.

In addition to the above models, each model is also tested with and without sample bootstrap to a sample size of 1000, as described in How Long Will It Take?

Data set used for the experiment is Data Set 1, consisting of 150 work item start and stop times.

Each simulation is performed using the following procedure:

  1. Using Data Set 1, create a timeline of start and stop events to playback.
  2. Playback the timeline created in 1.
    • Upon observing a work item start event, notify the estimation model of work item start. If the model can generate an estimate (model must observe two completed work items prior to generating an estimate), compare the generated estimate with actual known duration, calculate the square error, and record it.
    • Upon observing a work item stop event, notify the estimation model of work item stop.
  3. Continue playback until the timeline is exhausted.

Experiment Results

A rich way to demonstrate the results is to plot the cumulative sum of square error for each model and each percentile together (model+percentile, e.g.: “levy 0.99”, means Levy model 99th percentile, “hltbs 0.75”, means HLT model with bootstrapped sample at 75th percentile). These plots are included below. In consecutive plots, the worse performing model+percentile line is eliminated so that we can see more detail regarding better performing models. Also, the shape of error accumulation is instructive. The elimination order is (from most accumulated error to least accumulated error):

levy 0.99, levybs 0.99, levy 0.95, levybs 0.95, levy 0.90, levybs 0.90, levy 0.75, levybs 0.75, wbulbs 0.99, hltbs 0.99, wbul 0.99, hlt 0.99, gausbs 0.99, gaus 0.99, levy 0.50, levybs 0.50, gausbs 0.95, gaus 0.95, wbulbs 0.95, rand, wbul 0.95, gausbs 0.90, gaus 0.90, hltbs 0.95, hlt 0.95, levy 0.25, levybs 0.25, gausbs 0.75, gaus 0.75, wbul 0.90, wbulbs 0.90, hltbs 0.90, gaus 0.25, gausbs 0.25, wbulbs 0.25, wbul 0.25, randbs, hltbs 0.25, hlt 0.25, wbulbs 0.50, wbul 0.50, hlt 0.90, wbulbs 0.75, wbul 0.75, hltbs 0.50, hlt 0.50, hltbs 0.75, hlt 0.75, gausbs 0.50, gaus 0.50

Here is a list that performed worse than “rand”:

levy 0.99, levybs 0.99, levy 0.95, levybs 0.95, levy 0.90, levybs 0.90, levy 0.75, levybs 0.75, wbulbs 0.99, hltbs 0.99, wbul 0.99, hlt 0.99, gausbs 0.99, gaus 0.99, levy 0.50, levybs 0.50, gausbs 0.95, gaus 0.95, wbulbs 0.95

Here is a list that performed worse than “randbs”:

levy 0.99, levybs 0.99, levy 0.95, levybs 0.95, levy 0.90, levybs 0.90, levy 0.75, levybs 0.75, wbulbs 0.99, hltbs 0.99, wbul 0.99, hlt 0.99, gausbs 0.99, gaus 0.99, levy 0.50, levybs 0.50, gausbs 0.95, gaus 0.95, wbulbs 0.95, rand, wbul 0.95, gausbs 0.90, gaus 0.90, hltbs 0.95, hlt 0.95, levy 0.25, levybs 0.25, gausbs 0.75, gaus 0.75, wbul 0.90, wbulbs 0.90, hltbs 0.90, gaus 0.25, gausbs 0.25, wbulbs 0.25, wbul 0.25

Here is a list that performed better than “randbs”:

hltbs 0.25, hlt 0.25, wbulbs 0.50, wbul 0.50, hlt 0.90, wbulbs 0.75, wbul 0.75, hltbs 0.50, hlt 0.50, hltbs 0.75, hlt 0.75, gausbs 0.50, gaus 0.50

The vertical axis is the accumulated square error with units and values omitted as relative comparison is sufficient. The horizontal axis is enumerating estimates from first to last. Note that the same line style and line color does not represent the same model+percentile from plot to plot. Refer to the legend for identification of model and percentile.

all

All models and percentiles

1-eliminate levy99 levybs99

Eliminated levy 0.99 and levybs 0.99

2-eliminate levy95 and levybs95

Eliminated levy 0.95 and levybs 0.95

3-eliminate levy90 levybs90

Eliminated levy 0.90 and levybs 0.90

4-eliminate levy75 and levybs75

Eliminated levy 0.75 and levybs 0.75

5-eliminate wbulbs99

Eliminated wbulbs 0.99

6-eliminate hltbs99

Eliminated hltbs 0.99

7-eliminate wbul99

Eliminated wbul 0.99

8-eliminate hlt99

Eliminated hlt 0.99

9-eliminate gausbs99

Eliminated gausbs 0.99

10-eliminate levy50 and levybs50

Eliminated levy 0.50 and levybs 0.50

11-eliminate gausbs95

Eliminated gausbs 0.95

12-eliminate gaus95

Eliminated gaus 0.95

13-eliminate wbulbs95

Eliminated wbulbs 0.95

14-eliminate rand

Eliminated rand (displayed are all experiments that performed better than rand)

15-elimnate wbul95

Eliminated wbul 0.95

16-eliminate gausbs90

Eliminated gausbs 0.90

17-eliminate gaus90

Eliminated gaus 0.90

18-eliminate hltbs95

Eliminated hltbs 0.95

19-eliminate hlt95

Eliminated hlt 0.95

20-eliminate levy25

Eliminated levy 0.25

21-eliminate levybs25

Eliminated levybs 0.25

22-eliminate gausbs75

Eliminated gausbs 0.75

23-eliminate gaus75

Eliminated gaus 0.75

24-eliminate wbul90

Eliminated wbul 0.90

25-eliminate wbulbs90

Eliminated wbulbs 0.90

26-eliminate hltbs90

Eliminated hltbs 0.90

27-eliminate gaus25

Eliminated gaus 0.25

28-eliminate gausbs25

Eliminated gausbs 0.25

29-eliminate wbulbs25

Eliminated wbulbs 0.25

30-eliminate wbul25

Eliminated wbul 0.25

31-eliminate randbs

Eliminated randbs (displayed are all experiments that performed better than randbs)

32-eliminate hltbs25

Eliminated hltbs 0.25

33-eliminate hlt25

Eliminated hlt 0.25

34-eliminate wbulbs50

Eliminated wbulbs 0.50

35-eliminate wbul50

Eliminated wbul 0.50

36-elimnate hlt90

Eliminated hlt 0.90

37-eliminate wbulbs75

Eliminated wbulbs 0.75

38-eliminate wbul75

Eliminated wbul 0.75

39-eliminate hltbs50 and hlt50

Eliminated hltbs 0.50 and hlt 0.50

Experiment Analysis

Main concern is that experimental data is only one data set of 150 items. While the results are surprising, they may not be typical. I need other data sets to run this experiment on (please get in touch if you’re interested in testing your data set).

Regarding accuracy, the fitting of Levy distribution was fairly unsophisticated (calculate mean and variance of sample and use that to generate a Levy distribution). I didn’t expect Levy to perform well and as it was just background to testing the main hypothesis, I didn’t bother implementing a more sophisticated distribution fitting. Weibull distribution, on the other hand, is a pretty good fit as it uses least-squares fit to observed distribution. In summary, Levy distribution fit is crap, Weibull distribution fit should be pretty good. Gaussian distribution fit is straightforward, so it should also be good.

It is interesting to see the impact of outliers on estimation method (large spikes in error graphs). While an outlier destroys some estimators (one can observe points in the graphs where estimator makes a turn for the worse and rapidly departs from best performer), other estimators seem to be robust to outliers. Note that model may be robust or not depending on which percentile is used for estimation.

Another thing of note is that the best performing percentiles are 50th and 75th and not others. This attraction toward the average was a surprise.

Why does “pick the average so far” (more precisely, pick 50th percentile of estimated normal distribution from observed data without bootstrapping) work so well? I assume that part of it is due to normal distribution being robust to outliers, especially once there is enough data to anchor the distribution away from the outlier pretty well. I’m not sure why HLT 75th percentile is better than HLT 50th though.

Bootstrapped random estimator (rndmbs) performed really well, and it also has the same shape as the winning estimators. Note that rndmbs used the same random seed as rndm to select a number between zero and maximum in the sample. What most likely happened is the interplay between bootstrapping process (sampling with replacement) and random estimate being a random number between zero and maximum in the sample. Early on, before the outlier, bootstrapping did not change the maximum, and random estimators chose the same number up to the same maximum. Once an outlier occurred, we see that rndm selected it at least twice. However, it may be the case that the bootstrapping process for rndmbs did not pick the outlier into the bootstrapped sample, allowing rndmbs to pick from a smaller numeric range. From the plot of rndm, it looks like rndmbs only had to get lucky like this twice.

Experiment Learnings

It seems that in order to minimize error in estimates, the best thing to do is pick an estimator robust to outliers. In particular, the best (according to this data set), is to estimate a normal distribution from observed data and pick the mean. If this holds for other data sets, it means that we can all let go fancy statistical methods and use this very simple “pick the average” approach from now on. Imagine how much simpler our estimating lives could be ;).

Questions For The Future

Do these patterns hold for other data sets? If you have a data set, please get in touch.

The experiment only checks the estimation at the start of work (typically when we do estimation), but this doesn’t take into account the full HLT technique of continuous estimation. How good would these estimators be in a continuous (for example, once a day) estimation?

The experiment does not check if models get better at estimation as time progresses. This may be interesting to see.

Typically, when we estimate, the impact of finishing early and finishing late is asymmetric. What would be the results under different penalties for having estimates that are too optimistic (work actually takes longer than estimate).

What is the impact of choosing different seeds for bootstrapping as well as different seeds for estimators using random?

Things I Learned Riding The 2017 Tour Divide

antelope_wells

This summer, I rode my bicycle from Banff, Alberta, Canada to Antelope Wells, New Mexico, U.S. It ended up being 2783.6 miles, 165,700 feet of elevation gain, and it took 36 days 4 hours 51 minutes.

Screen Shot 2017-10-20 at 6.52.05 PM

Here’s what I learned (in no particular order)…

Horizons are closer than they appear

I kept making the same mistake over and over again, and that is, underestimating my ability to cover ground on a bicycle. I can recall numerous times when I looked from elevation onto the terrain around me, towards the horizon which my route would take me over, and think to myself that it’ll take me the rest of the day to get there. Many times, an hour or two later, I would be standing at that horizon looking at another one. There’s a metaphor in there somewhere, but I leave that up to you. What I remember is that horizons are closer than they appear.

Human civilization’s systemic layout

I’m not much of a camper. In fact, I only camped three times on the entire journey. The rest was spent in some sort of lodging accommodations, often accompanied by food resupply. While moving across the country, I kept particular attention to available water sources, since running out of water between water sources is not something I was interested in. I had a nice set of maps to work with throughout the route, so I wasn’t riding blind or anything. Nevertheless, this constant focus on supplies, after a while, gave me a weird sort of intuition about the layout of human civilization around me. I still struggle for words to describe what it feels like. It was a sort of awareness of where I was in the world. I had a constant awareness of, if something goes wrong, what recovery route to take, where is water, where are roads, where is next human settlement. A lot of my trip was going from one human settlement to another. This made me very aware that without those settlements, I wouldn’t last long. I certainly wouldn’t be able to cover ground as quickly as I did, repairing my bike when it broke down, or resting when I got tired. This awareness expanded, over time, to the things I encountered. I spent a lot of time riding on logging roads, so my brain learned “that’s where wood comes from.” I spent a lot of time riding through natural gas fields, mines (the ones where people dig into the ground for resources, not the exploding ones), even past a uranium mill. My brain learned “this is where energy comes from”. And always… cows and fields, everywhere. “This is where food comes from.” If you already have this awareness, none of this is illuminating, but, growing up in a city, I had a mental model for all this, but never felt it viscerally in my body. Being immersed in it, for as long as I was, on my tiny human scale, gave me an inner awareness of it all. For instance, I learned what services to expect at a specific settlement size, depending on what type of road it was on. There is definitely some sort of structure to human layout on the Earth. I got a glimpse of it to some extent.

What the fuck do I (and you) know?

Seeing this much of the country and interacting with all sorts of people… well, people that look like me most of the time (>__> )… anyway, regardless… all sorts of people (even within the sample I came across), really put in check my ideas about how things ought to be. That person in Montana, who lives there, hunts stuff, and lives their life thereabouts, seeing a glimpse of how they live their life, gave me a pretty good indicator that I have no clue how they live their life. Vice versa, they have no clue how I live mine here in Austin, TX. This was a humbling reminder. Also… everyone is super nice one-on-one.

Everyone is on their own epic journey

I noticed that everyone wanted to help me and make sure I was OK. Seriously, riding on a bicycle, obviously dressed like a long-distance traveller, really brings out the Samaritan in everyone. Additionally, seeing other travelers on the trail doing the same thing I was doing, going either the same direction or opposite, made me want to help them because it was obvious they were on an epic journey and I wanted them to be OK. At some point, I was able to make a mental leap that everyone of us is on an epic journey, except that we’re dressed normally, and our goals and constraints are much more complicated than riding a bicycle from point A to point B. Experiencing the journey gave me a tool to reach for in order to try to be better about helping other people. I just imagine them on a bicycle, covered in mud, riding somewhere along the route.

Tour Divide is one of the easiest things I’ve done, psychologically

To be clear, I trained for the journey, and I was in pretty good shape when I started training. But, what I’m talking about is the contrast of what my mind goes through when riding the Tour Divide, versus, living in the world. All I had to do was plan the next day, execute the plan (ride), made sure I was hydrated, feed myself, find shelter, and repeat, until done. Most of the time, the sole thought on my mind was “keep pedaling.” Mentally, it is simpler than pretty much any interaction I have now that I’m back in civilization. Navigating complexity of our modern human society is much more difficult and less satisfying. After some discussions about this particular learning, I did stumble upon a model that might possibly explain why this is the case.

Consider Daniel Pink’s “Autonomy, Mastery, Purpose” model for intrinsic human motivation along with Abraham Maslow’s Hierarchy of Needs model that progresses from most basic human needs to most complex: Physiological, Safety, Social Belonging, Esteem, Self-Actualization. When I was riding the bike, the needs I had to maintain were Physiological and Safety, i.e. don’t get hurt, don’t die, make it to next shelter. Achieving Autonomy, Mastery, and Purpose is rather straightforward for those needs (given proper preparation and supplies). I was at the height of happiness squatting by a mountain stream, filtering my water, and pumping it into my Camelback. That’s all it took for me to feel “I’m the boss of this! I can survive!”. In the evenings, however, once I found shelter and my Physiological and Safety needs were met, my brain started reaching for Social Belonging. Achieving Autonomy, Mastery, and Purpose in Social Belonging is difficult to do by yourself, and so the nights felt lonely in my shelter. Now, that I’m back in civilization, I’m working on Self-Actualization… that is orders of magnitude more difficult than lower sets of needs in the hierarchy. So, less happiness, less often for me.

Average knowledge vs. peak knowledge

Once I got into few days of the ride, I started imagining what would be something “extreme” for a person to do. “What if,” I thought, “I would ride down to New Mexico, and then turn around and ride back to Canada!”. That would be XTREME! Well, that’s because I was unfamiliar with what I was doing, I only had some average knowledge of Tour Divide and its possibilities. Turns out, that when I was riding, there was a person who was doing a double yo-yo. A yo-yo is starting at one end, going to the finish, then turning around and finishing where you started. This person was doing that twice. Another person, in a previous ride, started their ride from Costa Rica, so that by the time they made it to New Mexico, they’d be “in shape” to do well riding the route northbound.

It turns out that the most “extreme” thing I can imagine about something I’m unfamiliar with, is not extreme enough. If I have only average knowledge of something, I can’t imagine the possibilities. I can only imagine an average extreme. People, for whom this is their niche, do much much much more extreme things. They have peak knowledge of their niche, and it turns out that I can’t conceive of what the real peak extreme could be.

Honey Buns

Honey Buns turned out to be my main source of calories. They turned out to be the appropriate combination of calorie density per volume, not melting, as well as not requiring any external water to consume. There were days where all I ate was a Honey Bun per hour.

Colorado smells like weed

Yup. Pretty much that’s what I remember about Colorado. The woods smell like weed.

How Long Will It Take?

Estimating Software Work

Like many people who find themselves doing software development, I am sometimes asked to estimate when work will be completed. What I’m going to demonstrate is the best way I know of estimating software work completion. Next time someone asks “how long will it take?”, in the time it takes you to read this sentence, you’ll be able to answer with things like “when starting new work, there’s a 25% chance it will take us less than 3 days, 75% chance it will take us less than 37 days, and 90% chance it will take us less than 104 days” and be able to provide any other percentile you want.

First, a constraint I want in place is that people should not have to make any guesses about the nature of the work or the difficulty of the work in order to generate a reasonable estimate. This constraint is in place because in the value stream mapping sense, estimation is waste. Therefore, people not doing estimating eliminates the estimation waste.

Next, let’s go over the assumptions I’m going to make about the work. For the purpose of estimation:

Work is some problem to be solved. When the problem is solved, the work is completed.

Work is in the domain of software development. This is where my experience lies, this is the domain I’ve been asked to estimate.

The nature of the work does not matter. It can be a typo in information being displayed, or it can be a customer-facing availability outage for unknown reasons.

We do not know the probability distribution of the work. This last assumption will take some explaining.

Imagine that you have a record of the work you’ve completed over some period of time in the past, for example over the past two years you’ve completed 150 work items. For each work item (a solved problem), you have a start date and an end date. These give you a duration of the work, or how long a work item took to complete. So, in our example, you would have a list of 150 durations. If you were to create a histogram of work duration, you would see a duration distribution of the work. The assumption that “we do not know the probability distribution of the work” means that we do not know what duration distribution of the work will look like ahead of time. We might be able to determine the distribution only in hindsight. But, estimation does not happen in hindsight, therefore, at the time we have to make estimates we do not know the probability distribution of the work.

In case you think that work is normally distributed (as in, the typical bell curve that is easy to do statistics with), here is a histogram of 150 actual durations:

Now, 150 data points does not a large sample make. So, we’ll need to make another assumption.

Previously observed work durations are representative of the probability distribution of the work. That is, we assume that our past data comes from the same probability distribution of work that our future data will come from1.

If our past data is representative of the probability distribution of future data, we can go through a process of bootstrapping, and generate a much larger data set than 150. We do this by random sampling with replacement of our 150 point data set, to generate, say 1,000 point data set. Basically, we randomly pick one of the 150 points, add it to our 1,000 point data set (which now has one point in it), and put it back into the 150 point data set. We then randomly pick another one of the 150 points, add it to our 1,000 point data set (which now has two points in it), and put it back into the 150 point data set. We repeat until we sampled 1,000 points from the 150 points. What this gives us is an estimate of the “true” probability distribution of the work, given our previous assumptions.

With the resultant 1,000 points, we now sort them from shortest to longest duration. The 90th percentile answer for “how long will it take?” is the 90th percentile of the sorted 1,000 points, in our example, it turns out to be 103.73125 days, or under 104 days. That’s it. If you automate this, you’ll be able to rapidly provide any estimate of work completion to whatever percentile you’d like2.

One more thing…

There is an interesting, and I believe important, question to consider aside from “how long will it take?”. That is, “how long will it take to finish work you already have in progress?”. The answer is surprising (at least it was to me the first time I saw what happens). Let’s go through an example.

In this example, I will just use ten data points as our entire sample to illustrate what happens. Here they are, duration of completed work in days:

0.5, 0.5, 0.75, 1.0, 1.5, 3, 5, 7, 10, 21.5

Given the above historical data, consider now that you are about to start the next work item. In other words, all we know about the work item is that we haven’t started it yet. Therefore, we use all of our ten example data points to bootstrap a larger data set with 1,000 data points, and once we have that, we sort it, and then pick, for example, the 90th percentile. Nothing different from what we’ve already demonstrated.

However, now imagine that it is two days later, and we are still working on our work item. How would we answer the question of how long it will take us to finish? There is a key difference after two days of work, and that is that we have learned that our work item takes at least two days of work. When, after two days of work, we ask the question how long it will take us to finish, what we are really asking is “how long will it take to finish a work item that takes at least two days to finish?”. To answer this question, it makes no sense to use any data points that are less than two days in duration. Data points less than two days in duration clearly do not represent the type of work we are attempting to estimate completion of. If the work was of the type that takes less than two days to do, it would be finished already. So, without data points of less than two days, our data points to bootstrap from are now:

3, 5, 7, 10, 21.5

If you bootstrap from these data points, something interesting happens, and that is, that the 90th percentile will now very likely be further in the future than the estimate you gave when you asked the question two days prior. So, on day 0, when you haven’t started work, you used all data points, and the 90th percentile could end up being 10 days to finish. On day 2, when you worked for two days, using the newly learned information, we update our starting data set, and the 90th percentile could end up being 21.5 days to finish.

In fact, if you’re working on a work item, and every day you ask “how long will it take to finish?” the answer tends to be further and further in the future3.

Here is an example of the 90th percentile estimate for work item duration if we ask the question for the first thirty days, and the work item is not finished:

90th-percentile-estimate.png

Created using: https://plot.ly/create

Wait, what?!

What I presented here is, I think, a reasonable methodology for estimating work. It works at a level of solving problems, which is what “the business” usually cares about. It makes reasonable assumptions, it gives estimates as percentiles, which is better than a single estimate because we can adjust for our individual risk tolerance. Also, this method generates estimates without involvement of any human (once it is automated).

Then again, because this method is automated, it allows us to ask the question “how long will it take?” as often as we like. What we learn, is that every time we ask, the answer will be further in the future, and we will be more confident of the answer.

The best time to know when something is done, is when it is finished. But if you insist on asking, you might not like the answer.

Endnotes

1 More precisely, I am assuming that the past data comes from the same generator type that our future data will come from. The distinction, while interesting, isn’t important to the overall effort, so I won’t cover it further in this post.
2 To be even more… statistically valid (maybe?), you could regenerate your 1,000 point data set many times and take the average of the percentile you’re interested in across each generation. Then you can say you’re using a Monte Carlo method to derive your estimates.
3 I find this is a rather fascinating manifestation of the Lindy Effect.

Adding a “margin of safety” is not even wrong…

I just came across a blog post A Margin of Safety: How to Thrive in the Age of Uncertainty via the weekly High Scalability post. But, the advice there is wrong in a subtle way.

Basically, the author claims that if only engineers designing levees and flood walls added a “margin of safety” than a tragedy of flooding brought on by Katrina could have been avoided. I didn’t do the research that the author did on this, so I’m leaving that unchallenged. “Margin of safety” is actually a good advice for a specific context. The problem is when the author extrapolates it to how to use “margin of safety” in “real life.”

The author bundles together domains with completely different characteristics and applies the same heuristic as general advice across all of them. This is a mistake. While lifting weights may have properties where adding a margin of safety is reasonable, it is so because the failure mode and the range of lifting weights fits into normal probability distribution of things. You aren’t going to ever squat the weight of the Moon, or you are very unlikely to hurt yourself executing a movement with no weight if you’ve trained that same movement with weights before. The failure domain is for the most part predictable. Investing, on the other hand, is, for the most part, unpredictable, and has no bounds on what is likely and what is unlikely given enough time. No amount of safety margin will prevent you from ruin of the investment you make. Similarly, in the domain of project management itself (also mentioned by the author), there are predictable projects and unpredictable projects. A “margin of safety” will do nothing for you if you’ve attempted the impossible, and, for example, in the software world, it is not always easy to distinguish if what is being attempted is possible or will ever work as intended, especially if it’s novel. And while “margin of safety” sounds great in wildlife management, when there is existential risk involved, sometimes no margin is sufficient to prevent catastrophe.

Closing out along with the author’s closing point, “leave room for the unexpected” only points out that you are leaving room for the unexpected that you expect. There’s the unexpected unexpected (also more commonly referred to as the unknown unknown), which is a property of a complex system. “Margin of safety” doesn’t do much there, being able to react quickly and reinforce or dampen effects is much more important heuristic in such environments because the “margin of safety” is unknown at the time you have to decide what it is.

Dynamics Trumps Semantics

It could be time to reconsider our views on intentionality. We could learn to satisfy ourselves with building on what is sustainable, i.e. what can reliably be created, rather than what we ideally would prefer to have. Semantics can be bent more flexibly than dynamics, so it serves us to consider adjusting creative desire to that which is dynamically sound, rather than attempting the reverse. Darwin’s lesson to us was simple: that which can be sustained will outlive the things that cannot, regardless of their beauty or meaning. Dynamics trumps semantics.1

I’ve read Mark Burgess’ In Search of Certainty (source of the quote above) a while back, but the phrase dynamics trumps semantics has stuck with me. I keep seeing that phrase all around me, captured in familiar forms like:

Show, don’t tell.

Lead by example.

You can only control what you do.

The road to hell is paved with good intentions.

We know more than we can say…

A picture is worth a thousand words.

For my purposes here, dynamics means doing or motion of things, semantics means meaning of things.

So, as a thought experiment, I wanted to see the implications of dynamics trumps semantics. Previously, I’ve written about a model of communication for information sharing. In context of such a model, dynamics trumps semantics implies (to me) that communicating by doing should trump communicating of meaning. To simplify a bit further, what happens when we stop communicating what things mean?

All of us automatically attribute meaning to things we observe to some extent. If we stop communicating meaning, we are permitting people interpretations grounded entirely in their own experience, without biasing their meaning with our interpretation of what things mean. Interestingly, if our interpretation of meaning is “wrong”, and we don’t communicate meanings, we don’t propagate the “wrong” meaning to others. If their interpretation turns out to be more accurate, I assume we’d be able to observe that better accuracy through the actions they take and improve our own models accordingly.

Yet, we (or at least I) communicate meaning all the time, why is that?

In model of communication I mentioned context-specific jargon as a compression mechanism for communication. It seems to me that meaning is a different category of compression mechanism that we use, an orienting one, but not a very precise one. Interestingly, it seems that we can better communicate meaning obliquely, by doing instead of explaining the meaning. For example, think of the word love and how it means something different to each person. Typically, knowing what someone means by saying I love you (this phrase is a statement of what one person means to the other) only becomes apparent over time through their actions. A lot of heartache comes from actions not matching expectations because the meaning was communicated, instead of meaning being extracted (independently in the mind of the observer) from observed actions.

Dynamics trumps semantics is not that important in a stable world. If things don’t change that much, if you’re a human “back in the day” where the pace of change was effectively zero, meaning acquired over time and generations, became more and more accurate when assigned to the world around you. Dynamics trumps semantics didn’t mean much, because dynamics worked over many human lifetimes. However, we live in interesting times, where “impossible” things become possible many times over within a span of a single human lifetime. In times like these, dynamics trumps semantics seems much more relevant. We don’t have generations to let the meaning of things converge on the accurate representation of the world. Before we can make progress, new dynamics take effect, and render previous meanings moot. I’m curious if in a dynamic world like ours, not bothering much with communicating meaning can make us more effective in achieving the futures we want.

1 Burgess, Mark. In Search of Certainty: The Science of Our Information Infrastructure (p. 354). O’Reilly Media. Kindle Edition.

Integrated training in how we work

A request for comments…

This morning, my mind drifted towards an idea of how to offer training that other businesses pay for without becoming a consultancy. Becoming a consultancy is a non-goal. Declaring this a non-goal is my reaction to the inevitable drift from practice into selling a product based on lessons learned from organizing how to practice. A problem with that approach is that the shift to selling a (consultancy) product removes the participant from the practice of doing the thing, and into the practice of selling the consulting about doing the thing. I’m not staking out a moral high ground, I just don’t want to do the things that consultancy comes with.

The idea that I thought of this morning is not a new idea. It was new to me in this particular context. Naturally, as soon as I was able to formulate the idea in my head, I started seeing examples of this structure all around me.

I’m thinking about offering other companies to pay their employees to work for our company/organization. In exchange, the employees are indoctrinated into how we do work. We don’t go into a company as consultants and train within the context of that company, then leave. A reason not to do that is that there’s no way we’ll get enough context for the company within the time bounds of a consultancy engagement. This might not necessarily be due to limited time, but instead, due to the psychological positioning as a consultant, which always remains. In order to fully integrate ideas about how we do work, the employees need to be integrated into the context of our work, the purpose of our work, and then practice it for a number of months in order to integrate that tacit knowledge into their understanding of work. Once this integration takes place, they return to their original companies and then integrate the understanding to their new/old work environment.

I think that a period of one year would be ideal from perspective of our company. A period of three months would probably be ideal from the perspective of a customer company. So, perhaps a period of six months might work?

Regarding this concept not being new, companies send their employees on sabbaticals. Companies send their employees to grad schools. Any sort of liaison program is very similar to this. It’s like an internship paid for by the customer company. I think there’s just not many commercial companies that explicitly accept sabbatical employees with the intent to train them and then have them return to work for some other company, but I did … oh, about zero research on this (anyone got keywords I should use?).

You may well have no idea who I am and how we work or what our company does. I can go more into the details in the future if this concept is worth pursuing. I would like to know what you think? Would your company want to do something like this with someone somewhere? Would you feel that this is antithetical to developing your core competencies? What constraints would you want to be in place in order for your company to be comfortable doing this?