Something, something, computer programming…

What I find interesting about types is that they let me think in terms of patterns rather than specific examples. A type system allows me to express something in terms of patterns, and the patterns can be arbitrarily abstract. Ironically, let’s look at some examples of patterns, i.e. types.

A Unit type can be thought of as a pattern of “something”. It conveys the notion that “something” exists. There is an instance of “something”.

It may help to imagine that a new email arrived in your inbox, but all you saw was the count of new messages go up by one. You haven’t read the email. You know nothing about it. You just know that you have another email in your inbox. That’s like Unit type. It conveys that “something” (some email) exists.

I tend to think of Unit type as a “signal”. If you imagine a light switch, a “signal” is not whether the light is on or off. A “signal” would be the flipping of the switch, the flip itself. Imagine you can’t see the light; you just hear the switch flipping. *flip* *flip* *flip*… three signals, three instances of Unit type.

Ok, that might still be fairly abstract. Let’s contrast this with something more familiar, but let’s name it something really weird, like, the Sum type.

A Sum type can be thought of as an “exclusive or” pattern. In other words, it can be this “something” or that other “something”, but not both. For example, consider the notions of True and False. We say that something can be True or False but not both.

In fact, True or False (but not both) is of the type Sum with the shape of Unit + Unit. “Unit + Unit” means that the Sum type has space for two Units, but the fact that it’s exclusive or means that it will only accept one Unit. True is defined by putting “something” (of type Unit) into the first space of Unit + Unit. False is defined by putting “something” (of type Unit) into the second space of Unit + Unit. What if you want to put “something” into both? You can’t, because by definition, we say that you can only put “something” into one of the spaces. Why is True putting “something” into the first space and not the second? That’s simply the convention most people who use Sum types follow. You can use any convention you want, but it may be more difficult to understand what you’re communicating.

Remembering that Sum type describes a pattern of “exclusive or” helps me to remember how it works.

Going back to our light switch example, and to illustrate the difference between Sum and Unit, imagine that we now can tell whether the light is on or off. We can represent the pattern of knowing whether the light is on or off by a Sum type with the shape Unit + Unit. If the light is on, we will put “something” into the first space. If the light is off, we will put “something” into the second space. It can’t be both on and off. All we need to put into one of the spaces is of Unit type, a “signal”. Remember that it is not the “signal” that tells us the light is on. The space the “signal” is in is what tells us whether the light is on or off (first space means light on, second space means light off). The nature of the “signal” itself is immaterial; we only care that “something” is in the space.

Why is it called “Sum” (as in “summation”) type? The name comes from how one would calculate the number of unique things that one can represent using a Sum type. For example, the Sum type Unit + Unit can represent only one plus one, that is, two things. This is why it’s used for representing True and False, as those are exactly two things. If, for some reason, we wanted to represent four things, for example Spring, Summer, Fall, and Winter, we could use a Sum type of Unit + Unit + Unit + Unit. One plus one plus one plus one is four. And a season can (for our illustration purposes here) be either Spring, or Summer, or Fall, or Winter, but not more than one of those.
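To make the counting concrete, here is a minimal Python sketch. It assumes a toy encoding that is not part of the type system described in this post: a finite type is just a list of all its values, Unit is a list holding the single empty tuple, and a value of a Sum type is a pair recording which space was filled and what was put in it. The names `UNIT` and `sum_type` are hypothetical.

```python
# Toy encoding (an assumption for illustration): a finite type is the list of all its values.
UNIT = [()]  # exactly one value: "something"


def sum_type(*spaces):
    """Exclusive or: each value records which single space was filled, and with what."""
    return [(index, value) for index, space in enumerate(spaces) for value in space]


BOOLEAN = sum_type(UNIT, UNIT)             # Unit + Unit
TRUE, FALSE = BOOLEAN                      # convention: first space is True
SEASON = sum_type(UNIT, UNIT, UNIT, UNIT)  # Spring, Summer, Fall, Winter

print(len(BOOLEAN))  # 2
print(len(SEASON))   # 4
print(TRUE, FALSE)   # (0, ()) (1, ())
```

The number of values is the sum of the sizes of the spaces, which is exactly the “one plus one” counting above.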

A Product type can be thought of as an “and” pattern. In other words, it can be this “something” and that other “something” together.

A Product type that has two “somethings” would be Unit x Unit. “Unit x Unit” means that the Product type has space for two Units (“Unit x Unit x Unit” would mean that the Product type has space for three Units). For example, a weekend is Saturday and Sunday. We can represent Saturday by putting “something” into the first space and Sunday by putting “something” into the second space. Now, this is not a very useful example of a Product. Let’s come up with a better one.

Remember our Sum type of Unit + Unit where we defined True and False? Let’s name that particular Sum type shape of Unit + Unit a Boolean type (it’s named after George Boole). Now that we have our Boolean type (which represents the notions of True and False), let’s define a more useful Product of the shape Boolean x Boolean. “Boolean x Boolean” means that the Product type has space for two Booleans. We’ll still think about the weekend, but this time, the first space will represent whether we are working on Saturday, and the second space will represent whether we are working on Sunday. So, if I’m working on Saturday and Sunday, I would represent that as True x True. If I’m working on Saturday, but not working on Sunday, I would represent that as True x False. Not working on Saturday, but working Sunday would be False x True. And, lastly, not working all weekend would be False x False.

Why is it called “Product” (as in “multiplication”) type? That’s because to calculate the number of unique things that one can represent using a Product type, we multiply the number of things that can be in the first space by the number of things that can be in the second space, and so on. Notice that in our weekend representation of Unit x Unit, we could only put one thing in each space (Saturday and Sunday), so the number of things we could represent was one times one, which is one: the weekend. However, once we could put two things into each space, as in our example of whether we are working on the weekend, we could put two things into the first space (True, False) and two things into the second space (True, False). Two times two is four, and the Product type of Boolean x Boolean could represent four different work schedules over the weekend.
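Using the same hypothetical list encoding (a finite type is the list of all its values), a Product type can be sketched with `itertools.product`, which pairs every value of the first space with every value of the second. Again, `product_type` and the other names are illustrative only.

```python
from itertools import product

# Same toy encoding as before (an assumption): a finite type is the list of all its values.
UNIT = [()]


def sum_type(*spaces):
    """Exclusive or: a value records which single space was filled."""
    return [(index, value) for index, space in enumerate(spaces) for value in space]


def product_type(*spaces):
    """And: a value holds one value for every space at the same time."""
    return list(product(*spaces))


BOOLEAN = sum_type(UNIT, UNIT)
TRUE, FALSE = BOOLEAN

WEEKEND = product_type(UNIT, UNIT)         # Unit x Unit
SCHEDULE = product_type(BOOLEAN, BOOLEAN)  # Boolean x Boolean

print(len(WEEKEND))               # 1
print(len(SCHEDULE))              # 4
print((TRUE, FALSE) in SCHEDULE)  # True: working Saturday, not working Sunday
```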

A Void type can be thought of as a “nothing” pattern. This pattern is either obvious to people, or very difficult to understand.

In the email inbox example, a Void type means that an email hasn’t arrived. You received no signal, “nothing” happened, no change at all.

In the light switch example, a Void type means that you can’t see if the light is on or off, and you can’t hear the flipping of the switch. It’s not that you will eventually hear or see something, but not yet. It’s that you will never hear or see anything. “Nothing” will happen. Void is the absence of any signal.

We now have some understanding of other types that can help us understand the nature of Void type. Imagine I have a Sum type with the shape of Void + Unit. “Void + Unit” means that the Sum type has *only one space*, and it is only the second space. There is no first space in Void + Unit type. How many things can you represent using Void + Unit type? It is zero plus one. You can only put zero things into the first space, because *there is no first space*. There is only the second space, into which you can put one thing. Void type is analogous to zero.

To see this another way, consider a Product type of Void x Unit. How many things can you represent using Void x Unit type? It is zero times one, which is zero. The first space doesn’t exist (it is of type Void), and therefore we have no way of constructing something that fits the pattern of “nothing and something”. The problem is that we cannot *construct* a *something* that fits the pattern of “nothing”, so we can never construct a *something* of Void x Unit type.
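Under the same toy list encoding (an assumption for illustration, not the post’s formal system), Void is simply the empty list, and both counts fall out directly. Note that the only value of Void + Unit is tagged with index 1, the second space.

```python
from itertools import product

VOID = []    # no values at all: "nothing"
UNIT = [()]  # exactly one value: "something"


def sum_type(*spaces):
    """Exclusive or: a value records which single space was filled."""
    return [(index, value) for index, space in enumerate(spaces) for value in space]


def product_type(*spaces):
    """And: a value needs one value from every space, so an empty space means no values."""
    return list(product(*spaces))


print(sum_type(VOID, UNIT))           # [(1, ())]: only the second space can hold anything
print(len(sum_type(VOID, UNIT)))      # 1, zero plus one
print(len(product_type(VOID, UNIT)))  # 0, zero times one
```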

An Arrow type is a pattern of “how things on the left side of the arrow relate to the things on the right side of the arrow” (but not the other way around). This sounds fairly abstract, so let’s dive into an example.

Previously, we used the Product type Boolean x Boolean to describe a weekend work schedule. Let’s call this Product type a Schedule type. To build our example, we’ll also consider a Sum type of Unit + Unit, where putting Unit into the first space will mean worker Tristan, and putting Unit into the second space will mean worker Dale. Let’s call this Sum type a Worker type. Now, we can describe an Arrow type of Worker -> Schedule, which is a pattern of “how Workers on the left side of the arrow relate to the Schedules on the right side of the arrow”.

Other common names for Arrow type are Exponential type or Function type. The reason for the “Exponential” name is the same as for Sum and Product types: it describes how to count the number of unique things one can represent using an Arrow type. Remember that our Arrow type is Worker -> Schedule. Worker type is a Sum type of Unit + Unit, which can represent one plus one, so two things. Schedule type is a Product type of Boolean x Boolean, which can represent two times two, so four things. The Arrow (“exponential”) type can represent Schedule ^ Worker things, that is, four to the power of two, so 16 things. Let’s count them:

- Tristan -> (True, True), Dale -> (True, True)
- Tristan -> (True, True), Dale -> (True, False)
- Tristan -> (True, True), Dale -> (False, True)
- Tristan -> (True, True), Dale -> (False, False)
- Tristan -> (True, False), Dale -> (True, True)
- Tristan -> (True, False), Dale -> (True, False)
- Tristan -> (True, False), Dale -> (False, True)
- Tristan -> (True, False), Dale -> (False, False)
- Tristan -> (False, True), Dale -> (True, True)
- Tristan -> (False, True), Dale -> (True, False)
- Tristan -> (False, True), Dale -> (False, True)
- Tristan -> (False, True), Dale -> (False, False)
- Tristan -> (False, False), Dale -> (True, True)
- Tristan -> (False, False), Dale -> (True, False)
- Tristan -> (False, False), Dale -> (False, True)
- Tristan -> (False, False), Dale -> (False, False)
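Continuing the same hypothetical list encoding, an Arrow type can be sketched as the list of every dict that assigns exactly one Schedule to each Worker. Because the enumeration varies Dale’s schedule fastest, the 16 dicts come out in the same order as the list above, and “calling” one of these functions is just a dict lookup. The names `arrow_type`, `TRISTAN`, and `DALE` are illustrative.

```python
from itertools import product

# Same hypothetical list encoding used for Sum and Product.
UNIT = [()]


def sum_type(*spaces):
    """Exclusive or: a value records which single space was filled."""
    return [(index, value) for index, space in enumerate(spaces) for value in space]


def product_type(*spaces):
    """And: a value holds one value for every space."""
    return list(product(*spaces))


def arrow_type(domain, codomain):
    """Every function domain -> codomain, as a dict giving one output per input."""
    return [dict(zip(domain, outputs))
            for outputs in product(codomain, repeat=len(domain))]


BOOLEAN = sum_type(UNIT, UNIT)
TRUE, FALSE = BOOLEAN
WORKER = sum_type(UNIT, UNIT)  # first space: Tristan, second space: Dale
TRISTAN, DALE = WORKER
SCHEDULE = product_type(BOOLEAN, BOOLEAN)

FUNCTIONS = arrow_type(WORKER, SCHEDULE)
print(len(FUNCTIONS))                         # 16, i.e. 4 ** 2
print(FUNCTIONS[1][TRISTAN] == (TRUE, TRUE))  # True
print(FUNCTIONS[1][DALE] == (TRUE, FALSE))    # True
```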

The reason Arrow type is also called Function type is that it corresponds to what people mean by “function” in mathematics. If I have a thing of Arrow type, for instance, Tristan -> (True, True), Dale -> (True, True), then to find out Tristan’s schedule, I would provide Tristan as input to the function, and the function would return the result (True, True).

Why call it an Arrow type then? There is a thing in mathematics called “up-arrow notation”, and it so happens that a single up-arrow in up-arrow notation corresponds to “exponential”. Discussing multiple arrows is out of scope of this post, but mentioned here for the curious.

A Value type can be thought of as a pattern in contrast to the Unit type pattern. Where Unit type was a “something” pattern, Value type is “this particular thing” pattern.

In the email inbox example, by contrast, Unit type would be a signal that some new email arrived, and we would only care about the signal. Value type would be saying that a particular email arrived; while we care that the email arrived, we also care about its value, the particular contents of that particular email.

Another way of phrasing this is that for a Unit type we only care about the signal. In the sentence “This thing exists”, what we focus on in Unit type is *exists*. For Value type, we focus on the entire sentence *this thing exists*, because we are trying to express the pattern that a particular thing not only exists, but that it is a particular thing.

For example, think of the boolean True. Looking at True through the lens of “this particular thing”, we care that it is True, and that it is not False. The value of True is True.

Time to get weird.

Type type expresses the pattern of “a pattern”. We covered multiple examples of Type type. Unit is of type Type. Sum is of type Type. Product is of type Type.

There is an important concept to highlight. Earlier, we defined True and False as being of the type Boolean. What’s worth highlighting is that Boolean is of type Type, but True is of type Boolean.

Also, notice that the type Type is of type Type. This is because the pattern of “a pattern” fits the pattern of being “a pattern”.
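Python happens to have a runtime analogue of these relationships. Its type system differs in many ways from the one sketched in this post, so treat the following as an analogy rather than an implementation: `bool` plays the role of Boolean and `type` plays the role of Type.

```python
# Python's runtime type system, used only as an analogy.
print(type(True))  # <class 'bool'>: True is of type Boolean
print(type(bool))  # <class 'type'>: Boolean is of type Type
print(type(type))  # <class 'type'>: Type is of type Type
```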

Weirder…

Recall that when we talked about the Value type, we were expressing the pattern of “this particular thing”. If we have the type Boolean, and we have a particular boolean, say True, then the particular boolean True is of type Boolean, but it is also of type Value. This is because the boolean True fits the pattern of “booleans” and it fits the pattern of “particular thing”. “Fitting a pattern” is referred to as “inhabiting a type”. So, the boolean True inhabits the type Boolean and it inhabits the type Value. This is because it “fits the pattern of booleans” and it “fits the pattern of particular thing”.

Meta-weird…

Types are Values. This is because Type type (a pattern of “a pattern”) fits the pattern of being “a particular thing”. The type Type inhabits the type Value.

Values are Types. This is because Value type (a pattern of “a particular thing”) fits the pattern of being “a pattern”. The type Value inhabits the type Type.

Notice that the Unit type is the pattern of “something”. This means that everything that exists fits the pattern of being “something”, therefore everything that exists inhabits the type Unit.

It is worth highlighting the interplay of Void type and Unit type. The Void type itself inhabits type Type, inhabits type Value, and inhabits type Unit, because the pattern of “nothing” fits the pattern of being “a pattern” (Type), fits the pattern of being “a particular thing” (Value), and fits the pattern of being “something” (Unit). However, notice that there is nothing that can inhabit the type Void. This is because to fit the pattern of “nothing”, there can be nothing there. If there was something there, it wouldn’t be nothing.

Let’s stop before it gets weirder (like thinking about Arrow types with Void types), but this should be a fair introduction to the basic concepts, with hints at where things start to get out of hand and we might need something more sophisticated than the English language.

While what I’ve described here (in English) is a description of a type system, there are different type systems that can be described, and they can differ from each other in subtle ways. The differences between them don’t make any of the other systems incorrect, nor do they make this description correct. But, hopefully, I managed to communicate some intuition about one kind of type system to you.

If something isn’t clear, please comment/respond and we’ll talk about it.

Also, thank you Dale Schumacher for pointing out errors in the early drafts and thinking through all this stuff with me.

Cheers!

It turns out that this is exactly backwards from how we intuitively visualize time. As Jabe Bloom points out, the “set of columns reflects an **inversion** of our innate understanding of the flow of time.”

Contrast this with what a Kanban board would look like if it were coherent with how we usually think about the arrow of time.

Since rearranging a Kanban board this way, I have had a hard time thinking of it differently. Multiple things become coherent with this arrangement.

- The new arrangement is coherent with how left-to-right readers visualize the arrow of time. By coherent, I mean a very abstract coherence in the sense that “Happy” is coherent with “Up” or that “Sad” is coherent with “Down”.

- When “walking the board”, we are taught to walk “backwards” from DONE to TO DO. Well, in this new arrangement, there is nothing backwards about it. Walking the board becomes coherent with the board arrangement and the flow of time.
- The cards end up traveling from right to left. For some reason, that is more coherent with “pull”. By contrast, on a standard Kanban board, cards traveling left to right seem to me more coherent with “push”.
- Having the “TO DO” column in the future, on the right, feels more coherent with “TO DO” being our vision of the future that we are “pulling” into reality one card at a time. To me, it also better highlights my opinion that a backlog is just a place where everything goes stale without us worrying about it.
- The clutter of a “TO DO” column seems easier to dismiss when it’s on the left. When it’s on the right, the clutter of “TO DO” goes from “we have a lot of work to do” to “we have no coherent view of what we want in the future.” The difference is very subtle, but I think it’s there.

Will changing your Kanban board this way make you 50% more productive? No. However, while I see no compelling reason for the predominant TO DO, DOING, DONE arrangement, there does seem to be coherence to be gained by switching to DONE, DOING, TO DO.

**Endnotes:**

[1] Bloom, Jabe (2012). The Moment of Pull – Meditations on time and the movement of cards. Retrieved 9 Feb 2018.

[2] While this Kanban board arrangement came to me while reading Jabe’s “The Moment of Pull,” it is not a new idea. For example, see: Rybing, Tomas (2015). Mirrored Kanban Board. Retrieved 9 Feb 2018.

TL;DR: It turns out that the HLT method minimizes estimation error better than every other tested method except one, which is… *drumroll*… “pick the average so far” as the estimate. Read below for details and caveats. *Also, it would be very helpful to run this experiment on many data sets instead of just the one I used; please contact me if you can provide a data set to run this experiment on.*

**Experiment Objective**

Determine the usefulness of estimating software work using percentile estimates based solely on observed past data, as described in “How Long Will It Take?”.

**Experiment Hypothesis**

HLT estimates are better than random estimates. (Spoiler: they are! … or more correctly: experiment results do not refute this hypothesis)

**Experiment Metric**

Sum of squared errors, where each error is the difference between the estimated duration and the actual work item duration.
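The metric is simple enough to state as code. A minimal Python sketch; the function name `sum_of_square_error` and the sample durations are hypothetical and not taken from the experiment:

```python
def sum_of_square_error(estimates, actuals):
    """Sum over work items of (estimated duration - actual duration) squared."""
    if len(estimates) != len(actuals):
        raise ValueError("need exactly one estimate per actual duration")
    return sum((estimate - actual) ** 2 for estimate, actual in zip(estimates, actuals))


# Hypothetical durations (say, in days), not from Data Set 1.
print(sum_of_square_error([3, 5, 8], [4, 5, 6]))  # 5
```

Squaring makes a miss of two days cost four times as much as a miss of one day, which is why a single outlier can dominate a model’s cumulative error.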

**Experiment Design**

The experiment is a simulation of what the estimates would have been at specific times in the past. Given a data set of work start and stop times, the simulation starts after the completion of the first work item and ends after the completion of the last work item in the data set.

The experiment uses multiple estimation models. The model with the least cumulative sum of squared errors is deemed the best. Where appropriate, models are tracked at the 25th, 50th, 75th, 90th, 95th, and 99th percentiles. Models used in the experiment are:

- **HLT**: Estimation method described in “How Long Will It Take?”.
- **Levy**: Estimation method that assumes the distribution of observed work item durations can be described as a Levy distribution. This model is included to showcase a terrible model.
- **Gaussian**: Estimation method that assumes the distribution of observed work item durations can be described as a Gaussian/Normal distribution. This model is included to showcase a “dumb” model as a sanity check.
- **Random**: Estimation method that simply picks a random number between zero and the longest duration observed so far. This model is included to provide a baseline to compare against.
- **Weibull**: Estimation method that assumes the distribution of observed work item durations can be described as a Weibull distribution. This model is included because it seems to be the go-to model used by people who take estimation seriously.
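Three of these models are small enough to sketch in Python. This is a hypothetical reconstruction, not the code used in the experiment: the HLT-style estimator is assumed to be a nearest-rank percentile of the observed durations (based only on the description “percentile estimates based solely on observed past data”), the Gaussian fit is assumed to use the sample mean and standard deviation, and the Levy and Weibull fits are omitted.

```python
import math
import random
import statistics
from statistics import NormalDist


def hlt_estimate(durations, p):
    """Assumed HLT-style estimate: nearest-rank p-th percentile of observed durations."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def gaussian_estimate(durations, p):
    """Fit a normal distribution by sample mean and standard deviation; return its p-th percentile."""
    mu = statistics.mean(durations)
    sigma = statistics.stdev(durations)  # requires at least two observations
    if sigma == 0:
        return mu  # all durations identical, so every percentile is the mean
    return NormalDist(mu, sigma).inv_cdf(p)


def random_estimate(durations, rng):
    """Baseline: a uniformly random number between zero and the longest duration so far."""
    return rng.uniform(0, max(durations))


observed = [3, 4, 2, 4, 1]  # hypothetical completed durations
print(hlt_estimate(observed, 0.75))       # 4
print(gaussian_estimate(observed, 0.50))  # 2.8, the mean
print(random_estimate(observed, random.Random(0)))
```

The 50th percentile of the fitted normal distribution is its mean, which is why “gaus 0.50” amounts to “pick the average so far”. The two-observation minimum of `statistics.stdev` happens to line up with the rule that a model must see two completed work items before estimating.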

In addition, each model is tested both with and without bootstrapping the sample to a size of 1000, as described in “How Long Will It Take?”.
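Bootstrapping here means resampling the observed durations with replacement until the sample reaches the target size. A minimal Python sketch, assuming plain uniform resampling via `random.Random.choices`; the helper name `bootstrap` and the optional seeded generator are hypothetical:

```python
import random


def bootstrap(durations, size=1000, rng=None):
    """Resample observed durations with replacement until there are `size` of them."""
    rng = rng if rng is not None else random.Random()
    return rng.choices(durations, k=size)


observed = [3, 4, 2, 4, 1]  # hypothetical completed durations
sample = bootstrap(observed, rng=random.Random(42))
print(len(sample))                   # 1000
print(set(sample) <= set(observed))  # True: only observed values can appear
```

The bootstrapped sample can then be handed to an estimator in place of the raw observations. Passing an explicit seeded generator keeps runs reproducible, which matters for the question about seeds raised at the end of this post.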

Data set used for the experiment is Data Set 1, consisting of 150 work item start and stop times.

Each simulation is performed using the following procedure:

1. Using Data Set 1, create a timeline of start and stop events to play back.
2. Play back the timeline created in step 1.
3. Upon observing a work item start event, notify the estimation model of the work item start. If the model can generate an estimate (the model must observe two completed work items before generating an estimate), compare the generated estimate with the actual known duration, calculate the square error, and record it.
4. Upon observing a work item stop event, notify the estimation model of the work item stop.
5. Continue playback until the timeline is exhausted.
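The playback procedure can be sketched in Python. This is a simplified, hypothetical harness rather than the experiment’s code: work items are `(start, stop)` pairs, the model is any function from the list of completed durations to an estimate (instead of an object receiving start/stop notifications), and on a timestamp tie a stop is processed before a start, which the procedure does not specify. The usage example plugs in `statistics.mean`, i.e. “pick the average so far”, on made-up data.

```python
import statistics


def simulate(work_items, estimate):
    """Replay start/stop events; return the running sum of squared errors after each estimate.

    work_items: list of (start_time, stop_time) pairs (hypothetical format).
    estimate: function mapping the list of completed durations to an estimate.
    """
    events = []
    for item_id, (start, stop) in enumerate(work_items):
        events.append((start, 1, item_id))  # kind 1: start event
        events.append((stop, 0, item_id))   # kind 0: stop sorts before start on ties
    events.sort()

    completed = []         # durations of work items observed finishing so far
    cumulative_error = []  # running total after each estimate
    total = 0.0
    for _time, kind, item_id in events:
        start, stop = work_items[item_id]
        if kind == 1:
            if len(completed) >= 2:  # need two completed items before estimating
                error = estimate(completed) - (stop - start)
                total += error ** 2
                cumulative_error.append(total)
        else:
            completed.append(stop - start)
    return cumulative_error


items = [(0, 3), (1, 5), (4, 6), (6, 10), (7, 8)]  # hypothetical data, not Data Set 1
print(simulate(items, statistics.mean))  # [1.0, 5.0]
```

Plotting the returned list for each model+percentile gives the kind of cumulative-error curve described in the results.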

**Experiment Results**

A rich way to present the results is to plot the cumulative sum of squared errors for each model and percentile together (model+percentile, e.g. “levy 0.99” means the Levy model at the 99th percentile, and “hltbs 0.75” means the HLT model with a bootstrapped sample at the 75th percentile). These plots are included below. In consecutive plots, the worst-performing model+percentile line is eliminated so that we can see more detail regarding the better-performing models. The shape of error accumulation is also instructive. The elimination order (from most accumulated error to least accumulated error) is:

levy 0.99, levybs 0.99, levy 0.95, levybs 0.95, levy 0.90, levybs 0.90, levy 0.75, levybs 0.75, wbulbs 0.99, hltbs 0.99, wbul 0.99, hlt 0.99, gausbs 0.99, gaus 0.99, levy 0.50, levybs 0.50, gausbs 0.95, gaus 0.95, wbulbs 0.95, rand, wbul 0.95, gausbs 0.90, gaus 0.90, hltbs 0.95, hlt 0.95, levy 0.25, levybs 0.25, gausbs 0.75, gaus 0.75, wbul 0.90, wbulbs 0.90, hltbs 0.90, gaus 0.25, gausbs 0.25, wbulbs 0.25, wbul 0.25, randbs, hltbs 0.25, hlt 0.25, wbulbs 0.50, wbul 0.50, hlt 0.90, wbulbs 0.75, wbul 0.75, hltbs 0.50, hlt 0.50, hltbs 0.75, hlt 0.75, gausbs 0.50, gaus 0.50

Here are the models that performed *worse* than “rand”:

levy 0.99, levybs 0.99, levy 0.95, levybs 0.95, levy 0.90, levybs 0.90, levy 0.75, levybs 0.75, wbulbs 0.99, hltbs 0.99, wbul 0.99, hlt 0.99, gausbs 0.99, gaus 0.99, levy 0.50, levybs 0.50, gausbs 0.95, gaus 0.95, wbulbs 0.95

Here are the models that performed *worse* than “randbs”:

levy 0.99, levybs 0.99, levy 0.95, levybs 0.95, levy 0.90, levybs 0.90, levy 0.75, levybs 0.75, wbulbs 0.99, hltbs 0.99, wbul 0.99, hlt 0.99, gausbs 0.99, gaus 0.99, levy 0.50, levybs 0.50, gausbs 0.95, gaus 0.95, wbulbs 0.95, rand, wbul 0.95, gausbs 0.90, gaus 0.90, hltbs 0.95, hlt 0.95, levy 0.25, levybs 0.25, gausbs 0.75, gaus 0.75, wbul 0.90, wbulbs 0.90, hltbs 0.90, gaus 0.25, gausbs 0.25, wbulbs 0.25, wbul 0.25

Here are the models that performed *better* than “randbs”:

hltbs 0.25, hlt 0.25, wbulbs 0.50, wbul 0.50, hlt 0.90, wbulbs 0.75, wbul 0.75, hltbs 0.50, hlt 0.50, hltbs 0.75, hlt 0.75, gausbs 0.50, gaus 0.50

The vertical axis is the accumulated square error with units and values omitted as relative comparison is sufficient. The horizontal axis is enumerating estimates from first to last. Note that the same line style and line color does not represent the same model+percentile from plot to plot. Refer to the legend for identification of model and percentile.

**Experiment Analysis**

**Main concern** is that *the experimental data is only one data set of 150 items*. While the results are surprising, they may not be typical. I need other data sets to run this experiment on (please get in touch if you’re interested in testing your data set).

**Regarding accuracy**, the fitting of Levy distribution was fairly unsophisticated (calculate mean and variance of sample and use that to generate a Levy distribution). I didn’t expect Levy to perform well and as it was just background to testing the main hypothesis, I didn’t bother implementing a more sophisticated distribution fitting. Weibull distribution, on the other hand, is a pretty good fit as it uses least-squares fit to observed distribution. In summary, **Levy distribution fit is crap, Weibull distribution fit should be pretty good. Gaussian distribution fit is straightforward, so it should also be good.**

It is interesting to see the **impact of outliers on estimation method** (large spikes in error graphs). While an outlier destroys some estimators (one can observe points in the graphs where an estimator makes a turn for the worse and rapidly departs from the best performer), other estimators seem to be robust to outliers. Note that a model may or may not be robust depending on which percentile is used for estimation.

Another thing of note is that the best performing percentiles are 50th and 75th and not others. This **attraction toward the average** was a surprise.

**Why does “pick the average so far”** (more precisely, pick the 50th percentile of a normal distribution estimated from observed data without bootstrapping) **work so well?** I assume that part of it is due to the normal distribution being robust to outliers, especially once there is enough data to anchor the distribution well away from the outlier. I’m not sure why HLT at the 75th percentile is better than HLT at the 50th, though.

**Bootstrapped random estimator (randbs) performed really well, and it also has the same shape as the winning estimators.** Note that randbs used the same random seed as rand to select a number between zero and the maximum in the sample. What most likely happened is an interplay between the bootstrapping process (sampling with replacement) and the random estimate being a random number between zero and the maximum in the sample. Early on, before the outlier, bootstrapping did not change the maximum, and both random estimators chose the same number up to the same maximum. Once an outlier occurred, we see that rand selected it at least twice. However, it may be that the bootstrapping process for randbs did not pick the outlier into the bootstrapped sample, allowing randbs to pick from a smaller numeric range. From the plot of rand, it looks like randbs only had to get lucky like this twice.

**Experiment Learnings**

It seems that in order to minimize error in estimates, the best thing to do is pick an estimator robust to outliers. In particular, the best (according to this data set) is to estimate a normal distribution from observed data and pick the mean. If this holds for other data sets, it means that we can all let go of fancy statistical methods and use this very simple “pick the average” approach from now on. Imagine how much simpler our estimating lives could be ;).

**Questions For The Future**

Do these patterns hold for other data sets? If you have a data set, please get in touch.

The experiment only checks the estimation at the start of work (typically when we do estimation), but this doesn’t take into account the full HLT technique of continuous estimation. How good would these estimators be in a continuous (for example, once a day) estimation?

The experiment does not check if models get better at estimation as time progresses. This may be interesting to see.

Typically, when we estimate, the impact of finishing early and finishing late is asymmetric. What would the results be under different penalties for estimates that are too optimistic (work actually takes longer than the estimate)?

What is the impact of choosing different seeds for bootstrapping as well as different seeds for estimators using random?

This summer, I rode my bicycle from Banff, Alberta, Canada to Antelope Wells, New Mexico, U.S. It ended up being 2783.6 miles, 165,700 feet of elevation gain, and it took 36 days 4 hours 51 minutes.

Here’s what I learned (in no particular order)…

I kept making the same mistake over and over again, and that is, underestimating my ability to cover ground on a bicycle. I can recall numerous times when I looked from elevation onto the terrain around me, towards the horizon my route would take me over, and thought to myself that it would take me the rest of the day to get there. Many times, an hour or two later, I would be standing at that horizon looking at another one. There’s a metaphor in there somewhere, but I’ll leave that up to you. What I remember is that horizons are closer than they appear.

I’m not much of a camper. In fact, I only camped three times on the entire journey. The rest of the nights were spent in some sort of lodging, often accompanied by a food resupply. While moving across the country, I paid particular attention to available water sources, since running out of water between them is not something I was interested in. I had a nice set of maps to work with throughout the route, so I wasn’t riding blind or anything. Nevertheless, this constant focus on supplies, after a while, gave me a weird sort of intuition about the layout of human civilization around me. I still struggle for words to describe what it feels like. It was a sort of awareness of where I was in the world. I was constantly aware of which recovery route to take if something went wrong, where the water was, where the roads were, and where the next human settlement was. A lot of my trip was going from one human settlement to another. This made me very aware that without those settlements, I wouldn’t last long. I certainly wouldn’t be able to cover ground as quickly as I did, repair my bike when it broke down, or rest when I got tired. Over time, this awareness expanded to the things I encountered. I spent a lot of time riding on logging roads, so my brain learned “that’s where wood comes from.” I spent a lot of time riding through natural gas fields, past mines (the ones where people dig into the ground for resources, not the exploding ones), even past a uranium mill. My brain learned “this is where energy comes from.” And always… cows and fields, everywhere. “This is where food comes from.” If you already have this awareness, none of this is illuminating, but growing up in a city, I had a mental model for all this without ever feeling it viscerally in my body. Being immersed in it for as long as I was, on my tiny human scale, gave me an inner awareness of it all. For instance, I learned what services to expect at a settlement of a given size, depending on what type of road it was on.

There is definitely some sort of structure to the human layout on the Earth. I got a glimpse of it to some extent.

Seeing this much of the country and interacting with all sorts of people… well, people who look like me most of the time (>__> )… anyway, regardless… all sorts of people (even within the sample I came across) really put my ideas about how things ought to be in check. Take that person in Montana who lives there, hunts, and lives their life thereabouts: catching a glimpse of how they live gave me a pretty good indication that I have no clue how they actually live their life. Vice versa, they have no clue how I live mine here in Austin, TX. This was a humbling reminder. Also… everyone is super nice one-on-one.

I noticed that everyone wanted to help me and make sure I was OK. Seriously, riding a bicycle, obviously dressed like a long-distance traveler, really brings out the Samaritan in everyone. Additionally, seeing other travelers on the trail doing the same thing I was doing, going either the same direction or the opposite one, made me want to help them, because it was obvious they were on an epic journey and I wanted them to be OK. At some point, I was able to make a mental leap: every one of us is on an epic journey, except that we’re dressed normally, and our goals and constraints are much more complicated than riding a bicycle from point A to point B. Experiencing the journey gave me a tool to reach for in order to try to be better about helping other people. I just imagine them on a bicycle, covered in mud, riding somewhere along the route.

To be clear, I trained for the journey, and I was in pretty good shape when I started training. But what I’m talking about is the contrast between what my mind goes through when riding the Tour Divide and living in the world. All I had to do was plan the next day, execute the plan (ride), make sure I was hydrated, feed myself, find shelter, and repeat until done. Most of the time, the sole thought on my mind was “keep pedaling.” Mentally, it is simpler than pretty much any interaction I have now that I’m back in civilization. Navigating the complexity of our modern human society is much more difficult and less satisfying. After some discussions about this particular learning, I stumbled upon a model that might explain why this is the case.

Consider Daniel Pink’s “Autonomy, Mastery, Purpose” model for intrinsic human motivation along with Abraham Maslow’s Hierarchy of Needs model, which progresses from the most basic human needs to the most complex: Physiological, Safety, Social Belonging, Esteem, Self-Actualization. When I was riding the bike, the needs I had to maintain were Physiological and Safety, i.e. don’t get hurt, don’t die, make it to the next shelter. Achieving Autonomy, Mastery, and Purpose is rather straightforward for those needs (given proper preparation and supplies). I was at the height of happiness squatting by a mountain stream, filtering my water, and pumping it into my CamelBak. That’s all it took for me to feel “I’m the boss of this! I can survive!” In the evenings, however, once I found shelter and my Physiological and Safety needs were met, my brain started reaching for Social Belonging. Achieving Autonomy, Mastery, and Purpose in Social Belonging is difficult to do by yourself, and so the nights felt lonely in my shelter. Now that I’m back in civilization, I’m working on Self-Actualization… which is orders of magnitude more difficult than the lower sets of needs in the hierarchy. So, less happiness, less often, for me.

A few days into the ride, I started imagining what would be something “extreme” for a person to do. “What if,” I thought, “I rode down to New Mexico, and then turned around and rode back to Canada!” That would be XTREME! Well, I thought so because I was unfamiliar with what I was doing; I only had some average knowledge of the Tour Divide and its possibilities. It turns out that when I was riding, there was a person doing a double yo-yo. A yo-yo is starting at one end, riding to the finish, then turning around and finishing where you started. This person was doing that twice. Another person, in a previous ride, started their ride from Costa Rica, so that by the time they made it to New Mexico, they’d be “in shape” to do well riding the route northbound.

It turns out that the most “extreme” thing I can imagine about something I’m unfamiliar with is not extreme enough. If I have only average knowledge of something, I can’t imagine the possibilities; I can only imagine an average extreme. People for whom this is their niche do much, much, much more extreme things. They have peak knowledge of their niche, and it turns out that I can’t conceive of what the real peak extreme could be.

Honey Buns turned out to be my main source of calories. They had the right combination of calorie density per volume, not melting, and not requiring any extra water to eat. There were days when all I ate was a Honey Bun per hour.

Yup. Pretty much that’s what I remember about Colorado. The woods smell like weed.

Like many people who find themselves doing software development, I am sometimes asked to estimate when work will be completed. What I’m going to demonstrate is the best way I know of estimating software work completion. Next time someone asks “how long will it take?”, in the time it takes you to read this sentence, you’ll be able to answer with things like “when starting new work, there’s a 25% chance it will take us less than 3 days, 75% chance it will take us less than 37 days, and 90% chance it will take us less than 104 days” and be able to provide any other percentile you want.

First, a constraint I want in place is that **people should not have to make any guesses** about the nature of the work or the difficulty of the work in order to generate a reasonable estimate. This constraint is in place because, in the value stream mapping sense, estimation is waste. Therefore, taking people out of estimating eliminates the estimation waste.

Next, let’s go over the assumptions I’m going to make about the work. For the purpose of estimation:

**Work is some problem to be solved**. When the problem is solved, the work is completed.

**Work is in the domain of software development.** This is where my experience lies, and it is the domain in which I’ve been asked to estimate.

**The nature of the work does not matter**. It can be a typo in information being displayed, or it can be a customer-facing availability outage for unknown reasons.

**We do not know the probability distribution of the work**. This last assumption will take some explaining.

Imagine that you have a record of the work you’ve completed over some period of time in the past; for example, over the past two years you’ve completed 150 work items. For each work item (a solved problem), you have a start date and an end date. These give you the duration of the work, or how long a work item took to complete. So, in our example, you would have a list of 150 durations. If you were to create a histogram of work duration, you would see the duration distribution of the work. The assumption that “we do not know the probability distribution of the work” means that we do not know ahead of time what the duration distribution of the work will look like. We *might* be able to determine the distribution *only in hindsight*. But estimation does not happen in hindsight; therefore, at the time we have to make estimates, we do not know the probability distribution of the work.

In case you think that work is normally distributed (as in, the typical bell curve that is easy to do statistics with), here is a histogram of 150 actual durations:

Now, 150 data points does not a large sample make. So, we’ll need to make another assumption.

**Previously observed work durations are representative of the probability distribution of the work.** That is, we assume that our past data comes from the same probability distribution of work that our future data will come from^{1}.

If our past data is representative of the probability distribution of future data, we can go through a process of bootstrapping and generate a much larger data set than 150 points. We do this by *random sampling with replacement* from our 150-point data set to generate, say, a 1,000-point data set. Basically, we randomly pick one of the 150 points, add it to our 1,000-point data set (which now has one point in it), and put it back into the 150-point data set. We then randomly pick another one of the 150 points, add it to our 1,000-point data set (which now has two points in it), and put it back into the 150-point data set. We repeat until we have sampled 1,000 points from the 150 points. What this gives us is an estimate of the “true” probability distribution of the work, *given our previous assumptions*.

With the resulting 1,000 points, we now sort them from shortest to longest duration. The 90th percentile answer to “how long will it take?” is the 90th percentile of the sorted 1,000 points; in our example, it turns out to be 103.73125 days, or under 104 days. That’s it. If you automate this, you’ll be able to rapidly provide an estimate of work completion at whatever percentile you’d like^{2}.
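The bootstrap-then-sort procedure above fits in a few lines of Python. This is a minimal sketch, not the author’s tooling: the function names (`bootstrap`, `percentile`, `estimate`) are hypothetical, and the percentile uses the nearest-rank definition, which is only one of several common conventions.

```python
import math
import random


def bootstrap(durations, n=1000, rng=None):
    """Build an n-point data set by random sampling with replacement."""
    rng = rng or random.Random()
    # Random.choices draws k items independently, i.e. with replacement.
    return rng.choices(durations, k=n)


def percentile(values, p):
    """Nearest-rank percentile: sort, then take the value at rank ceil(p% of n)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def estimate(durations, percentiles=(25, 75, 90), n=1000, seed=None):
    """Return {percentile: duration} read off one bootstrapped sample."""
    sample = bootstrap(durations, n, random.Random(seed))
    return {p: percentile(sample, p) for p in percentiles}
```

Given a list `history` of completed-work durations in days, `estimate(history)` returns a dictionary such as `{25: ..., 75: ..., 90: ...}`. Passing a `seed` only makes a run repeatable; it does not make the estimate more accurate.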

There is an interesting, and I believe important, question to consider aside from “how long will it take?”. That is, “how long will it take to finish work you already have in progress?”. The answer is surprising (at least it was to me the first time I saw what happens). Let’s go through an example.

In this example, I will just use ten data points as our entire sample to illustrate what happens. Here they are, the durations of completed work in days:

`0.5, 0.5, 0.75, 1.0, 1.5, 3, 5, 7, 10, 21.5`

Given the above historical data, consider now that you are about to start the next work item. In other words, all we know about the work item is that we haven’t started it yet. Therefore, we use all of our ten example data points to bootstrap a larger data set with 1,000 data points, and once we have that, we sort it, and then pick, for example, the 90th percentile. Nothing different from what we’ve already demonstrated.

However, now imagine that it is two days later, and we are still working on our work item. How would we answer the question of how long it will take us to finish? There is a key difference after two days of work: *we have learned that our work item takes at least two days of work*. When, after two days of work, we ask how long it will take us to finish, what we are really asking is “how long will it take to finish a work item that takes at least two days to finish?”. To answer this question, it makes no sense to use any data points that are less than two days in duration. Data points less than two days in duration clearly do not represent the type of work we are attempting to estimate completion of. If the work were of the type that takes less than two days to do, it would be finished already. So, without data points of less than two days, our data points to bootstrap from are now:

`3, 5, 7, 10, 21.5`

If you bootstrap from these data points, something interesting happens: the 90th percentile will now very likely be further in the future than the estimate we gave when we asked the question two days prior. On day 0, before starting work, we used all the data points, and the 90th percentile could end up being 10 days in total. On day 2, having worked for two days, we use the newly learned information to update our starting data set, and the 90th percentile could end up being 21.5 days in total.
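Conditioning on elapsed time amounts to a filter applied before resampling. The sketch below reuses a nearest-rank percentile; `conditional_estimate` is a hypothetical name, and returning `None` when no historical item is old enough is this sketch’s own assumption, not part of the method as described.

```python
import math
import random


def percentile(values, p):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def conditional_estimate(durations, elapsed_days, p=90, n=1000, seed=None):
    """Estimate the total duration of an item that has already run elapsed_days.

    Historical items shorter than elapsed_days are dropped: if the current item
    were one of those, it would already be finished.
    """
    relevant = [d for d in durations if d >= elapsed_days]
    if not relevant:
        return None  # no history covers items this old
    sample = random.Random(seed).choices(relevant, k=n)  # bootstrap
    return percentile(sample, p)
```

With the ten example durations, `conditional_estimate(history, 2)` almost always returns 21.5, while `conditional_estimate(history, 0)` lands on 10 or 21.5 depending on how many 21.5s the resample happens to draw, which is why the text says the estimate *could* be 10.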

In fact, if you’re working on a work item, and every day you ask “how long will it take to finish?” the answer tends to be further and further in the future^{3}.

Here is an example of the 90th percentile estimate for work item duration if we ask the question for the first thirty days, and the work item is not finished:
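The chart’s underlying data isn’t included here, so the following sketch generates a similar daily series from the ten example durations instead. It re-asks the question once per day (day 0 through day 29) for an unfinished item; `daily_estimates` and `HISTORY` are hypothetical names. Because no item in this small history took longer than 21.5 days, the series has nothing to say from day 22 onward and reports `None`.

```python
import math
import random

# The ten example durations from the text, in days.
HISTORY = [0.5, 0.5, 0.75, 1.0, 1.5, 3, 5, 7, 10, 21.5]


def percentile(values, p):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def daily_estimates(durations, days=30, p=90, n=1000, seed=0):
    """p-th percentile total duration, re-estimated on each day the item is unfinished."""
    rng = random.Random(seed)
    series = []
    for day in range(days):
        relevant = [d for d in durations if d >= day]  # items at least this long
        if not relevant:
            series.append(None)  # the item has outlived everything in our history
            continue
        series.append(percentile(rng.choices(relevant, k=n), p))
    return series
```

Printing `daily_estimates(HISTORY)` shows the estimate stepping up to the longest remaining durations as shorter items drop out of the pool.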

What I presented here is, I think, a reasonable methodology for estimating work. It works at the level of solving problems, which is what “the business” usually cares about. It makes reasonable assumptions, and it gives estimates as percentiles, which is better than a single estimate because we can adjust for our individual risk tolerance. Also, once it is automated, this method generates estimates without the involvement of any human.

Then again, because this method is automated, it allows us to ask the question “how long will it take?” as often as we like. What we learn is that every time we ask, the answer tends to be further in the future, and we will be more confident of the answer.

The best time to **know** when something is done, is when it is finished. But if you insist on asking, you might not like the answer.

^{1} More precisely, I am assuming that the past data comes from the same *generator type* that our future data will come from. The distinction, while interesting, isn’t important to the overall effort, so I won’t cover it further in this post.

^{2} To be even more… statistically valid (maybe?), you could regenerate your 1,000 point data set many times and take the average of the percentile you’re interested in across each generation. Then you can say you’re using a Monte Carlo method to derive your estimates.
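The repeated-regeneration idea in this footnote can be sketched as follows. The number of runs (200) and the function name `averaged_percentile` are arbitrary choices for illustration, not values from the text.

```python
import math
import random
import statistics


def percentile(values, p):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def averaged_percentile(durations, p=90, n=1000, runs=200, seed=None):
    """Regenerate the bootstrapped data set `runs` times and average the p-th percentile."""
    rng = random.Random(seed)
    results = [percentile(rng.choices(durations, k=n), p) for _ in range(runs)]
    return statistics.mean(results)
```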

^{3} I find this a rather fascinating manifestation of the Lindy Effect.

Basically, the author claims that if only the engineers designing levees and flood walls had added a “margin of safety,” then the flooding tragedy brought on by Hurricane Katrina could have been avoided. I didn’t do the research that the author did on this, so I’m leaving that unchallenged. “Margin of safety” is actually good advice *for a specific context*. The problem arises when the author extrapolates it to how to use “margin of safety” in “real life.”

The author bundles together domains with completely different characteristics and applies the same heuristic as general advice across all of them. This is a mistake. While lifting weights may have properties that make adding a margin of safety reasonable, that is because the failure modes and the range of weights lifted fit into a normal probability distribution of things. You aren’t ever going to squat the weight of the Moon, and you are very unlikely to hurt yourself executing a movement with no weight if you’ve trained that same movement with weights before. The failure domain is, for the most part, predictable. Investing, on the other hand, is, for the most part, unpredictable, and has no bounds on what is likely and what is unlikely given enough time. No amount of safety margin will protect the investment you make from ruin. Similarly, in the domain of project management itself (also mentioned by the author), there are predictable projects and unpredictable projects. A “margin of safety” will do nothing for you if you’ve attempted the impossible, and, in the software world for example, it is not always easy to tell whether what is being attempted is possible or will ever work as intended, especially if it’s novel. And while “margin of safety” sounds great in wildlife management, when there is existential risk involved, sometimes no margin is sufficient to prevent catastrophe.

To close along with the author’s closing point: “leave room for the unexpected” only means leaving room for the unexpected that you expect. There’s also the unexpected unexpected (more commonly referred to as the unknown unknown), which is a property of a complex system. “Margin of safety” doesn’t do much there; being able to react quickly and reinforce or dampen effects is a much more important heuristic in such environments, because the “margin of safety” is *unknown* at the time you have to decide what it is.

It could be time to reconsider our views on intentionality. We could learn to satisfy ourselves with building on what is sustainable, i.e. what can reliably be created, rather than what we ideally would prefer to have. Semantics can be bent more flexibly than dynamics, so it serves us to consider adjusting creative desire to that which is dynamically sound, rather than attempting the reverse. Darwin’s lesson to us was simple: that which can be sustained will outlive the things that cannot, regardless of their beauty or meaning. Dynamics trumps semantics.^{1}

I read Mark Burgess’ *In Search of Certainty* (the source of the quote above) a while back, and the phrase *dynamics trumps semantics* has stuck with me. I keep seeing that phrase all around me, captured in familiar forms like:

*Show, don’t tell.*

*Lead by example.*

*You can only control what you do.*

*The road to hell is paved with good intentions.*

*We know more than we can say…*

*A picture is worth a thousand words.*

For my purposes here, *dynamics* means *doing* or *motion of things*, *semantics* means *meaning of things*.

So, as a thought experiment, I wanted to see the implications of *dynamics trumps semantics*. Previously, I’ve written about a model of communication for information sharing. In the context of such a model, *dynamics trumps semantics* implies (to me) that communicating by *doing* should trump communicating *meaning*. To simplify a bit further, what happens when we stop communicating what things *mean*?

All of us automatically attribute *meaning* to the things we observe, to some extent. If we stop communicating *meaning*, we allow people to form interpretations grounded entirely in their own experience, without biasing their *meaning* with our interpretation of what things *mean*. Interestingly, if our interpretation of *meaning* is “wrong” and we don’t communicate *meanings*, we don’t propagate the “wrong” *meaning* to others. If their interpretation turns out to be more accurate, I assume we’d be able to observe that better accuracy through the actions they take and improve our own models accordingly.

Yet, we (or at least I) communicate *meaning* all the time, why is that?

In the model of communication, I mentioned context-specific jargon as a compression mechanism for communication. It seems to me that *meaning* is a different category of compression mechanism that we use, an orienting one, but not a very precise one. Interestingly, it seems that we can better communicate meaning *obliquely*, by *doing* instead of explaining the *meaning.* For example, think of the word *love* and how it means something different to each person. Typically, what someone *means* by saying *I love you* (this phrase is a statement of what one person *means* to the other) only becomes apparent over time through their *actions*. A lot of heartache comes from *actions* not matching expectations because the *meaning* was communicated, instead of being extracted (independently, in the mind of the observer) from observed *actions*.

*Dynamics trumps semantics* is not that important in a stable world. If things don’t change that much, if you’re a human “back in the day,” when the pace of change was effectively zero, *meaning* acquired over time and generations became more and more accurate when assigned to the world around you. *Dynamics trumps semantics* didn’t mean much, because *dynamics* worked over many human lifetimes. However, we live in interesting times, where “impossible” things become possible many times over within the span of a single human lifetime. In times like these, *dynamics trumps semantics* seems much more relevant. We don’t have generations to let the *meaning* of things converge on an accurate representation of the world. Before we can make progress, new *dynamics* take effect and render previous *meanings* moot. I’m curious whether, in a dynamic world like ours, not bothering much with communicating *meaning* can make us more effective in achieving the futures we want.

^{1} Burgess, Mark. In Search of Certainty: The Science of Our Information Infrastructure (p. 354). O’Reilly Media. Kindle Edition.

This morning, my mind drifted towards an idea of how to offer training that other businesses pay for without becoming a consultancy. Becoming a consultancy is a non-goal. Declaring this a non-goal is my reaction to the inevitable drift from practice into selling a product based on lessons learned from organizing how to practice. A problem with that approach is that the shift to selling a (consultancy) product moves the participant out of the practice of doing the thing and into the practice of selling consulting about doing the thing. I’m not staking out a moral high ground; I just don’t want to do the things that consultancy comes with.

The idea that I thought of this morning is not a new idea. It was new to me in this particular context. Naturally, as soon as I was able to formulate the idea in my head, I started seeing examples of this structure all around me.

I’m thinking about offering other companies the option to pay their employees to work for our company/organization. In exchange, the employees are indoctrinated into how we do work. We don’t go into a company as consultants, train within the context of that company, and then leave. One reason not to do that is that there’s no way we’ll get enough context about the company within the time bounds of a consultancy engagement. This might not be due to limited time so much as to the psychological positioning of a consultant, which always remains. In order to fully integrate ideas about how we do work, the employees need to be immersed in the context of our work and the purpose of our work, and then practice it for a number of months in order to integrate that tacit knowledge into their understanding of work. Once this integration takes place, they return to their original companies and integrate that understanding into their new/old work environment.

I think that a period of one year would be ideal from the perspective of our company. A period of three months would probably be ideal from the perspective of a customer company. So, perhaps a period of six months might work?

Regarding this concept not being new, companies send their employees on sabbaticals. Companies send their employees to grad schools. Any sort of liaison program is very similar to this. It’s like an internship paid for by the customer company. I think there just aren’t many commercial companies that explicitly accept sabbatical employees with the intent to train them and then have them return to work for some other company, but I did … oh, about zero research on this (anyone got keywords I should use?).

You may well have no idea who I am, how we work, or what our company does. I can go into more detail in the future if this concept is worth pursuing. I would like to know what you think. Would your company want to do something like this with someone somewhere? Would you feel that this is antithetical to developing your core competencies? What constraints would you want to be in place in order for your company to be comfortable doing this?

When I speak of focus on communication, I mean that both the information sender and the information receiver have a shared mental model for communication that resembles something like the above. I also expect them to understand that the purpose of communication in this context is to share information. Most importantly, I expect the sender and receiver to understand all the ways in which the communication process can result in failure to transmit the intended information.

In this model, there are three sources of error in communication. The first source of error can occur when information is encoded into a signal; for example, when we translate an idea in our minds into words, the many meanings of the words may or may not correspond to what we mean to communicate. The second source of error can occur when noise alters the signal; for example, having a conversation in a loud room can result in the recipient not hearing a word correctly, or perhaps not hearing a word at all. The third source of error can occur when the signal is decoded back into information; for example, for words with multiple meanings, the recipient may select a meaning that the sender did not intend.

What is important to notice is that even in the total absence of noise, we still have two sources of error, the sender encoding the information, and the receiver decoding it.

Before we get into how to correct encoding and decoding errors, let’s consider how we can detect errors in communication in the first place.

One effective way of detecting errors is to ask for a confirmation brief from the receiver. In other words, ask someone to repeat back to you what you just told them. Using this method, the recipient encodes the information received into a signal, the signal (plus noise) reaches the original sender, who then decodes the signal into information. The original sender then compares the original intended information with what they decoded from the confirmation brief and can decide whether or not the information is close enough to what they intended or whether further clarification is needed.

The confirmation brief method of error detection can be initiated by the sender of the information, or by the recipient. Also note that saying “I understand” is insufficient to be considered a confirmation brief as it allows for no comparison by the sender.

So, how do we correct encoding and decoding errors when two people are attempting to communicate with each other? For one, encoding and decoding are done by separate people, so error correction can happen (or fail to happen) independently on the sender’s side and on the recipient’s side. The mechanism of encoding and decoding typically involves assumptions about what the other person means to communicate and how they mean to communicate it. The sender chooses words^{2} and other media to encode information into a signal and, using a confirmation brief, observes how effectively the recipient understood the intended information. When further clarification is needed, the sender can provide more detail, but alternatively, the sender can choose to encode information using different words or different media in order to help the recipient understand. A good sender is a person who can modify their encoding to make continual progress in reducing the gap between what they intended to communicate and what they receive in the confirmation brief. Similarly, on the receiving side, a good receiver is a person who can modify their decoding to make continual progress in reducing the gap between what they provide in the confirmation brief and what the sender intends to communicate. The sender confirms understanding once they decide that the confirmation brief is close enough to what they intended to communicate.

In brief, “focus on communication” requires participants to have a mental model for communication and awareness of all sources of error that can impede communication. Participants should continually strive to improve the repertoire of encodings and decodings available to them. Effective communicators should continually strive to improve the methods for constructing and adapting information encoding and decoding processes.

^{1} What I present here is a human-centric simplification of Claude E. Shannon’s ideas that gave rise to the field of Information Theory. For his seminal paper and a much more detailed treatment of communication, see C. E. Shannon (1948). A Mathematical Theory of Communication. *The Bell System Technical Journal*, Vol. 27, pp. 379–423, 623–656, July and October 1948.

^{2} Notice that emergence of a context-specific jargon is a mechanism for reducing encoding and decoding errors by defining a dictionary of words and their context-specific meanings ahead of communication. However, it only works if the interpretation of the jargon meanings actually matches between sender and receiver. Furthermore, in order to arrive at the common interpretation of jargon, sender and receiver still need to go through the communication process to establish the common interpretation in the first place.

For as long as I can remember, I have been interested in the question of how I should do things. I assume that some form of this question tugs at all of us and that each of us explores it to various depths depending on what is going on in our individual lives. While I don’t think I’m done figuring things out, it seems to me, based on conversations with others, that I have assembled enough information that it is worth sharing.

We’ll start at the conclusion. Here it is, my answer to how I should do things.

*I want to thrive in a complex world.*

There is a lot to unpack, so let’s get to it.

First, the answer does not really seem like an answer at all. We have a *how* question that is answered by *I want*; what should we make of that? While I would like to claim some deep insight, I can’t. I sort of stumbled onto it. In retrospect, the reason *I want* works as an answer has to do with the idea of obliquity, as defined by John Kay in his essay:

Strange as it may seem, overcoming geographic obstacles, winning decisive battles or meeting global business targets are the type of goals often best achieved when pursued indirectly. This is the idea of Obliquity. (…) Obliquity is characteristic of systems that are complex, imperfectly understood, and change their nature as we engage with them.^{1}

So, my answer is oblique. However, while John Kay names the idea of obliquity, he does not provide a mechanism to determine what specific oblique approach to take for doing things. Also, notice that obliquity is a “characteristic of systems that are *complex*” (emphasis mine). There’s that word again, *complex*. It appears both in my answer to how I should do things, and in my very loose association with obliquity that I claim makes it ok for a *how* question to be answered with an *I want* answer. Complexity is indeed the key. In order to really understand what *I want to thrive in a complex world* means and why it is appropriate, it is important to understand complexity. What follows next is an exposition of the idea of complexity through the lens of my personal journey to understand it.

For contrast, before we begin to define complexity, let’s first examine a simple, non-complex system. Consider the following sequence of numbers:

*4, 5, 6, 7, 8, 9, 10, …*

The “*…*” symbol at the end indicates that the sequence continues indefinitely. We observe that the sequence starts with “*4*”. To learn the next item in the sequence, we observe that we can add “*1*” to the previous number “*4 + 1 = 5*”, and we get “*5*”. Once we are at “*5*”, we observe that to learn the next item in the sequence, we can add “*1*” to it “*5 + 1 = 6*”, and we get “*6*”. Once we are at “*6*”, we observe that to learn the next item in the sequence, we can add “*1*” to it “*6 + 1 = 7*”, and we get “*7*”, and so on.

Now consider the following question:

What will be the number that is 50 places after “*10*” in the sequence?

We could, beginning with “*10*”, add “*1*”, and repeat that 50 times. However, because this is a non-complex system, we can analyze it and arrive at a shortcut of “*10 + (1 * 50) = 60*”. The number in the 50th place after “*10*” in the sequence is “*60*”. This is a crucial point. We did not have to calculate every step in the sequence between “*10*” and “*60*” in order to arrive at “*60*”. Because this is a non-complex system, we can analyze it and predict any item in the sequence without going through the entire sequence step by step.
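The difference between walking the sequence and using the shortcut can be made concrete in code. Both functions below are illustrative names, not anything from the text; they return the same number, but only the second one skips the intermediate steps.

```python
def step_by_step(start, increment, steps):
    """Walk the sequence one item at a time."""
    value = start
    for _ in range(steps):
        value += increment
    return value


def shortcut(start, increment, steps):
    """Jump straight to the answer: start + (increment * steps)."""
    return start + increment * steps


print(step_by_step(10, 1, 50))  # 60
print(shortcut(10, 1, 50))  # 60
```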

Let’s imagine that the sequence we’ve been working with was measured from some real system and our measurements have error associated with them. For this example, we’ll assume our measurements are accurate to within 0.1. The sequence now is something like:

*4.1, 5.1, 5.9, 7.0, 8.1, 8.9, 10.1, …*

With this measurement error, our analysis would not end up with a rule to add “*1*” to the previous number, but instead we would use the mean (average) of the differences between the numbers in the sequence.

*5.1 – 4.1 = 1.0
5.9 – 5.1 = 0.8
7.0 – 5.9 = 1.1
8.1 – 7.0 = 1.1
8.9 – 8.1 = 0.8
10.1 – 8.9 = 1.2
…*

In this case, let’s assume that the mean turns out to be “*1.0*” with a standard error of “*+/- 0.1*”.
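The differences and their mean can be computed directly from the measurements. This sketch also computes the usual standard error of the mean (sample standard deviation divided by the square root of the number of differences); for these particular numbers it comes out under 0.1, so the 0.1 assumed in the text is a round, conservative figure rather than an exact result.

```python
import math
import statistics

# The noisy measurements from the text.
measured = [4.1, 5.1, 5.9, 7.0, 8.1, 8.9, 10.1]

# Differences between consecutive measurements.
differences = [later - earlier for earlier, later in zip(measured, measured[1:])]

mean_step = statistics.mean(differences)
# Standard error of the mean: sample standard deviation / sqrt(count).
standard_error = statistics.stdev(differences) / math.sqrt(len(differences))

print([round(d, 1) for d in differences])  # [1.0, 0.8, 1.1, 1.1, 0.8, 1.2]
print(round(mean_step, 1))  # 1.0
```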

Notice that despite the measurement error, we can still figure out the number that is 50 places after “*10*” in the sequence using our shortcut of “*10.1 + (1.0 * 50) = 60.1*”. The standard error is “*0.1 * 50 = 5*”. The number that is 50 places after “*10*” in the sequence is “*60 +/- 5*”. The margin of error contains our ideal “*60*”. What about the number that is 100,000 places after “*10*” in the sequence? “*10.1 + (1.0 * 100,000) = 100,010.1*”. The standard error is “*0.1 * 100,000 = 10,000*”. We get “*100,000 +/- 10,000*”, and the margin of error contains our ideal “*100,010*”. The original measurement error does not hinder our ability to predict items in the sequence without going through the entire sequence step by step. We still get meaningful results, and we can meaningfully quantify how much error is in our prediction.

Furthermore, suppose we were to measure our system again, again assuming that our measurements are accurate to within 0.1. Another sequence of measurements could look something like:

*3.9, 4.9, 5.9, 7.1, 8.1, 8.9, 9.9, …*

Using the measurements above, let’s assume our analysis again ends up with a rule to add “*1.0*” with a standard error of “*+/- 0.1*”.

When we want to figure out the number that is 50 places after “*10*” in this new sequence, based on our previous analysis, we can calculate “*9.9 + (1.0 * 50) = 59.9*”. The standard error is “*0.1 * 50 = 5*”. We get “*60 +/- 5*”, with the margin of error containing our ideal “*60*”. What about the number 100,000 places after “*10*”? We calculate “*9.9 + (1.0 * 100,000) = 100,009.9*”. The standard error is “*0.1 * 100,000 = 10,000*”. We get “*100,000 +/- 10,000*”, and the margin of error contains our ideal “*100,010*”. So, not only do we continue to get meaningful results, our predictions continue to lead us to similar results (within the margin of error) despite our initial error in measuring “*10*” in the sequence.
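The predictions from both runs can be checked mechanically. In this sketch, the mean step of 1.0 and the standard error of 0.1 are the values the text assumes, and scaling the error linearly with the number of steps mirrors the text’s own `0.1 * 50` calculation; `predict` is a hypothetical helper.

```python
def predict(last_value, mean_step, step_error, steps):
    """Extrapolate `steps` items ahead; the per-step error is assumed to add up linearly."""
    return last_value + mean_step * steps, step_error * steps


# The last measured value stands in for "10" in each run of measurements.
for last in (10.1, 9.9):
    for steps in (50, 100_000):
        value, error = predict(last, 1.0, 0.1, steps)
        ideal = 10 + steps
        contained = value - error <= ideal <= value + error
        print(f"from {last}: {steps:,} steps -> {value:,.1f} +/- {error:,.0f} "
              f"(contains {ideal:,}: {contained})")
```

The text reports each result rounded to the precision of its error (for example, 100,010.1 ± 10,000 becomes 100,000 ± 10,000); the sketch prints the unrounded value, and every interval contains the ideal number.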

In summary, a non-complex system is predictable. It is predictable in a way that we can determine (within the margin of error) what the system will look like any number of steps in the future, without going through every step between our starting point and our prediction. Additionally, a non-complex system is predictable in a way that multiple predictions of the system any number of steps into the future will be close to each other (within the margin of error) despite starting with slightly different initial values due to measurement error.

In order to show an example of a complex system, we’ll have to get a little more… well, complex. I will present a set of equations, but then, to illustrate the complexity of the system we will look at a visual simulation. While the underlying details will be mathematically sound, I will avoid mathematics as much as possible. Instead, I will explain this complex system in terms of visuals.

First, the equations describing a deterministic complex system:

*dx/dt = 10(y – x)
dy/dt = x(28 – z) – y
dz/dt = xy – 2.66z*

The above equations describe a Lorenz system^{3}, developed in 1963 as a simplified mathematical model for atmospheric convection (how air moves in the atmosphere). Here is what the system looks like when graphed over time (the animation starts over after twenty seconds):

What we are seeing are the values of *x* and *y* graphed over time. The center of the animation corresponds to *x = 0* and *y = 0*. To the left of center are negative *x* values. To the right of center are positive *x* values. Below the center are negative *y* values. Above the center are positive *y* values. This animation only shows *x* values (horizontal axis) and *y* values (vertical axis). The *z* values are not depicted. The animation starts with values *x = 0.1*, *y = 0.1*, and *z = 0.1*. The line drawn gives us some intuition about what shape the system takes over time.
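The three equations translate directly into a function that returns how fast *x*, *y*, and *z* are changing at a given point. This is a minimal Python sketch; the function name `lorenz` and its parameter names are assumptions, and the constants are the ones written in the equations (the textbook value for the last one is 8/3 ≈ 2.667, which the equations write as 2.66).

```python
def lorenz(x, y, z, sigma=10.0, rho=28.0, beta=2.66):
    """Rates of change (dx/dt, dy/dt, dz/dt) of the Lorenz system."""
    dx = sigma * (y - x)       # dx/dt = 10(y - x)
    dy = x * (rho - z) - y     # dy/dt = x(28 - z) - y
    dz = x * y - beta * z      # dz/dt = xy - 2.66z
    return dx, dy, dz


# Rates of change at the starting point used in the animations
print(lorenz(0.1, 0.1, 0.1))  # roughly (0.0, 2.69, -0.256)
```

These rates only say which way the system is heading at one instant; to find out where it actually goes, we have to follow it forward in time.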

Before we go on, let’s highlight the first important aspect of a complex system. Given this Lorenz system example, starting with *x = 0.1*, *y = 0.1*, and *z = 0.1*, consider the question:

*What will be the x, y, and z values 50 time steps after start?*

If you recall, in our non-complex system example, we were able to analyze the system and determine that all we were doing was adding “*1*” at each step. This analysis gave us a shortcut to compute the state of the system 50 steps ahead without having to go through every step. Instead, we used our shortcut of “*start + (1 * 50) = answer*”. The first important aspect of a complex system is that such a shortcut *does not exist*.

What this means is that in order to figure out what the *x*, *y*, and *z* values are 50 steps after start, we have to go through (that is, calculate) each one of the 50 steps to get the answer. Similarly, to figure out what the values are 100,000 steps after start, we have to go through all 100,000 steps to get the answer. The only way to see what the system will do (in this example, what will be the *x*, *y*, and *z* values at some point in the future) is to either observe the system, or to simulate it, and see where it ends up.
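In code, "going through every step" is a loop. The sketch below assumes a fourth-order Runge-Kutta integrator and a time step of `dt = 0.01`; the text does not say which method or step size produced the animations, so both (and the helper names `rk4_step` and `simulate`) are assumptions. Note that what counts as "50 steps" depends on the chosen step size.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=2.66):
    """Rates of change (dx/dt, dy/dt, dz/dt) at state = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt=0.01):
    """Advance the state by one time step of length dt (4th-order Runge-Kutta)."""
    def shifted(k, h):
        # state + h * k, component by component
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(start, steps, dt=0.01):
    """No shortcut: compute every one of the steps, in order."""
    state = start
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


print(simulate((0.1, 0.1, 0.1), 50))       # 50 steps after start
print(simulate((0.1, 0.1, 0.1), 100_000))  # 100,000 loop iterations
```

Unlike `start + (1 * 50)`, the work here grows with the number of steps: reaching step 100,000 means computing the 99,999 steps before it.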

The next animation shows the same exact Lorenz system I described previously, but this time, the animation shows only part of the line drawn. This animation draws the exact same values as the previous animation, but it shows only the most recent points instead of leaving the entire line drawn. This is so we can more easily see what I’m about to demonstrate next.

In order to illustrate what happens with an initial measurement error in a complex system, we will draw 20 systems together at the same time. The previous animation showed one Lorenz system of equations. This next animation shows 20 Lorenz systems of equations, each system of equations starting with exactly the same *x*, *y*, and *z* values (I will add measurement error in later animations). This animation is meant to demonstrate that if we start with exactly the same values, each step we calculate will be exactly the same. That is, all 20 systems will end up with the same *x*, *y*, and *z* values 50 steps from the start, 100,000 steps from the start, and so on. You’ll notice that the drawn points look thicker. This is meant to illustrate the 20 systems drawn all having the same value at the same point in time.
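Because the equations are deterministic, identical starting values produce identical values at every step. A minimal sketch that runs 20 copies side by side; the helper names, the fourth-order Runge-Kutta integrator, and the time step `dt = 0.01` are assumptions made for illustration.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=2.66):
    """Rates of change of the Lorenz system at state = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt=0.01):
    """One 4th-order Runge-Kutta step of length dt (assumed integrator)."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


# 20 systems, all starting from exactly the same values
systems = [(0.1, 0.1, 0.1)] * 20
for _ in range(2_000):  # 20 time units at the assumed dt = 0.01
    systems = [rk4_step(s) for s in systems]

# Identical inputs, identical arithmetic: all 20 states are still equal
print(len(set(systems)))  # -> 1
```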

What happens if we introduce measurement error? In this example, the way I’ll demonstrate measurement error is to perturb the starting positions of *x*, *y*, and *z* of each of the 20 systems by a little bit. We will call this perturbation “*spread*”. We will use the following formula to adjust the starting point of *x*, *y*, and *z*:

*x = 0.1 + (random(0, 1) * 2 * spread) – spread
y = 0.1 + (random(0, 1) * 2 * spread) – spread
z = 0.1 + (random(0, 1) * 2 * spread) – spread*

*random(0, 1)* means a random number between *0* and *1*, so each starting value ends up somewhere between *0.1 – spread* and *0.1 + spread*. When we set “*spread = 0.1*”, one example of how much we’ll perturb the initial *x* value is *0.035241377710758706*. This means that *x*, instead of starting with *0.1*, would in this case start with *0.1 + 0.035241377710758706 = 0.135241377710758706*. We then similarly perturb the starting value of *y* using a new random number between *0* and *1*. We then perturb the starting value of *z* the same way. After we have perturbed the values of *x*, *y*, and *z*, we use those as the starting point for the first system (instead of *x = 0.1*, *y = 0.1*, and *z = 0.1*). We follow the same procedure for the remaining 19 systems.
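The perturbation formula maps directly onto Python's `random` module. A minimal sketch; the helper name `perturbed_start` and the fixed seed (used only so the run is repeatable) are assumptions.

```python
import random


def perturbed_start(spread, rng, center=0.1):
    """Return (x, y, z), each moved by a random amount in [-spread, spread).

    Mirrors the formula: center + random(0, 1) * 2 * spread - spread,
    drawing a fresh random number for each of x, y, and z.
    """
    return tuple(center + rng.random() * 2 * spread - spread for _ in range(3))


rng = random.Random(42)  # fixed seed (an assumption) for a repeatable run
starts = [perturbed_start(0.1, rng) for _ in range(20)]  # one per system
print(starts[0])
```

With `spread = 0` every system starts at exactly *(0.1, 0.1, 0.1)*, which is the identical-start case.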

With *spread = 0.1*, here is what the 20 systems look like:

Notice that the systems begin together, but after a while, we can see them drift away from each other. While the systems started really close together, pretty soon each system ends up far away from all the other systems (as far apart as the bounded region in which the Lorenz system moves allows). This illustrates the other crucial aspect of complex systems: where the system ends up in the future is highly sensitive to where the system starts. Tiny differences in starting conditions can grow into differences as large as the system itself allows.
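The drift can also be measured instead of watched. The sketch below perturbs 20 starting points with the formula from the text and records the largest distance between any two systems at each step. The fourth-order Runge-Kutta integrator, the time step `dt = 0.01`, the fixed random seed, the "largest pairwise distance" measure, and the helper names are all assumptions made for illustration.

```python
import itertools
import math
import random


def lorenz(state, sigma=10.0, rho=28.0, beta=2.66):
    """Rates of change of the Lorenz system at state = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt=0.01):
    """One 4th-order Runge-Kutta step of length dt (assumed integrator)."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def largest_separation(systems):
    """Largest straight-line distance between any two systems."""
    return max(math.dist(a, b) for a, b in itertools.combinations(systems, 2))


def track_separation(spread, steps, n=20, seed=1):
    """Perturb n starting points as in the text, then track their separation."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    systems = [tuple(0.1 + rng.random() * 2 * spread - spread for _ in range(3))
               for _ in range(n)]
    history = [largest_separation(systems)]
    for _ in range(steps):
        systems = [rk4_step(s) for s in systems]
        history.append(largest_separation(systems))
    return history


history = track_separation(spread=0.1, steps=5_000)  # 50 time units at dt = 0.01
for step in range(0, 5_001, 1_000):
    print(f"t = {step * 0.01:4.0f}: largest separation = {history[step]:.3g}")
```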

To illustrate this point further, let’s see what happens if, instead of perturbing the starting values with *spread = 0.1*, we make the perturbations smaller. In other words, what happens if the 20 systems begin closer together?

Below is an animation with “*spread = 0.01*” (ten times less spread than before):

Below is an animation with “*spread = 0.001*” (ten times less spread than before):

Below is an animation with “*spread = 0.0001*” (ten times less spread than before):

Below is an animation with “*spread = 0.00001*” (ten times less spread than before):

Below is an animation with “*spread = 0.000001*” (ten times less spread than before):

Next is an animation with “*spread = 0.0000001*” (ten times less spread than before, one million times less spread than our first perturbation). This means that the system starting positions are only changed by a tiny amount, no more than *0.0000001*. For example, instead of *x = 0.1* at start, we would have *x = 0.10000003385421758612446* at start. This is also a good time to ask: what measurements are you capable of making with this level of accuracy? Here’s what 20 systems look like with this tiny difference in initial starting positions:

In fact, no matter how small the difference in the initial starting positions, if there is any difference at all, our 20 simulated Lorenz systems will end up arbitrarily far away from each other. This is profoundly important. Imagine that one of these 20 Lorenz systems is the real system, and the other 19 systems are our simulations of the real system (so that we can try to predict what happens in a real system). This sensitivity to differences in the initial starting position means that if there is any error at all in our measurement of the real system, our simulations will diverge arbitrarily far away from what the real system will do.
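To see that shrinking the initial difference only postpones the divergence, the sketch below follows a reference system from *(0.1, 0.1, 0.1)* and a copy whose *x* starts `spread` higher, and reports when the two first end up more than 1.0 apart. The fixed offset in *x* only (instead of random perturbations), the 1.0 threshold, the fourth-order Runge-Kutta integrator with `dt = 0.01`, the 100-time-unit limit, and the helper names are all assumptions chosen for illustration.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=2.66):
    """Rates of change of the Lorenz system at state = (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt=0.01):
    """One 4th-order Runge-Kutta step of length dt (assumed integrator)."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def divergence_time(spread, threshold=1.0, dt=0.01, max_steps=10_000):
    """Time until a copy started `spread` away is more than `threshold` away.

    Returns None if that does not happen within max_steps steps.
    """
    reference = (0.1, 0.1, 0.1)
    copy = (0.1 + spread, 0.1, 0.1)  # only x is offset, for simplicity
    for step in range(1, max_steps + 1):
        reference, copy = rk4_step(reference, dt), rk4_step(copy, dt)
        if math.dist(reference, copy) > threshold:
            return step * dt
    return None


for exponent in range(1, 8):  # spread = 0.1, 0.01, ..., 0.0000001
    spread = 10.0 ** -exponent
    t = divergence_time(spread)
    if t is None:
        print(f"spread = {spread:.0e}: still within 1.0 after 100 time units")
    else:
        print(f"spread = {spread:.0e}: more than 1.0 apart at t = {t:.2f}")
```

For chaotic systems like this one, making the initial difference ten times smaller is generally expected to buy only a roughly constant extra amount of time before the copies separate, rather than preventing the separation.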

I hope that, at this point, I have demonstrated two important properties of complex systems:

- There are no shortcuts to figure out where a complex system will end up in the future. We have to either observe the system itself, or simulate every step of its evolution.
- If there is any measurement error at all in the starting point of our simulation of a real complex system, our simulation will diverge arbitrarily far away from the real system.

So far, I have demonstrated systems in the form of abstract mathematical examples. The intent behind this is to demonstrate how complexity arises even in an ideal system where we know (because we define) everything about the system. The world has many more components, features, and types of interactions than the systems I described thus far. The world is much more complex. How can we, as humans, make sense of the complexity in the world around us?

The mathematical examples above are too abstract to inform my daily decisions as a human being. To actually make sense of the world and to make decisions in the world, I needed a different model. This is where I found Dave Snowden’s sense-making^{4} Cynefin^{5} framework to be useful.

The Cynefin framework offers decision models depending on the nature of the system under consideration, from the observer’s point of view. The nature of systems is considered from the perspective of domains. The Cynefin framework offers the Ordered domain, the Complex domain, the Chaotic domain, and the domain of Disorder. Furthermore, the Ordered domain is divided into the Simple and Complicated domains. See the figure below (the center unlabeled area is the domain of Disorder).

As before, prior to considering the Complex domain, let’s discuss simpler domains of the Cynefin framework for context.

In systems that appear to the observer to belong to the Simple domain, cause and effect relationships exist, are predictable and are repeatable. The observer perceives this cause and effect relationship as self-evident. An applicable decision model here is Sense-Categorize-Respond. This is the domain where application of Best Practice is valid. Best Practice implies that there is *the* best approach to respond with.

In systems that appear to the observer to belong to the Complicated domain, cause and effect relationships exist but are not self-evident, and therefore require expertise. An applicable decision model here is Sense-Analyze-Respond. The domain requires analysis in order to decide a course of action. This is the domain where application of Good Practice is valid. The difference between Good Practice and Best Practice is that Good Practice consists of multiple approaches, each of which is valid given some level of expertise. This is in contrast with Best Practice, where there exists a single best approach.

The Ordered domain (Simple and Complicated) corresponds to the mathematical example of a non-complex system discussed previously, where cause and effect are predictable. What corresponds to the mathematical example of a complex system is the Complex domain, where cause and effect are not predictable.

In systems that appear to the observer to belong to the Complex domain, cause and effect relationships are only obvious in hindsight, with unpredictable, emergent outcomes. An applicable decision model in the Complex domain is Probe-Sense-Respond. Probing is conducted through safe-to-fail experiments. If we sense the experiment is pushing the system towards a desired outcome (succeeding), our response is to amplify it. If we sense the experiment is pushing the system towards an undesired outcome (failing), our response is to dampen it. An experiment shouldn’t be conducted without identifying amplification and dampening strategies in advance. Otherwise, we will not be able to exploit desirable outcomes or dampen the undesirable ones. This amplification and dampening of safe-to-fail experiments leads towards an Emergent Practice, novel and unique in some way.

In systems that appear to the observer to belong to the Chaotic domain, no cause and effect relationships can be determined. An applicable decision model in the Chaotic domain is Act-Sense-Respond. The goal is to stabilize the situation as quickly as possible.

The domain of Disorder is the perspective of not knowing which domain the system being observed is in.

Another important aspect of the Cynefin framework is the transition between the Simple domain and the Chaotic domain. Unlike transitions between other domains, a transition from the Simple to the Chaotic domain is usefully thought of as falling off a cliff.

This points to the danger of the Simple domain perspective: it can lead to overconfidence in the belief that systems are simple, that systems are predictable, that past success guarantees future success, and so on. In such a case, the system drifts towards the transition and can enter the Chaotic domain in the form of an unforeseeable crisis, accident, or failure, where recovery is expensive. This transition area can be thought of as the “complacent zone.”

Notice that in discussing Cynefin domains, I have framed everything from the point of view of an observer. That is, the perception of the domain is dependent on who is doing the perceiving. Unfamiliar systems can appear Complex or Chaotic to one observer while being merely Complicated or Simple to another. While it may be possible that the systems themselves have intrinsic complexity, as in the mathematical complex system example discussed before, in order to make a decision, the perspective of the decision-maker is the only one available to the decision-maker. Hence, the way the decision-maker perceives the system domain dictates the decision model used.

With an intuition for complex systems and the context of the Cynefin framework, we can now meaningfully determine if we live in a complex world.

One approach to determine whether we live in a complex world would be to consider the world from the perspective of a human-technology-nature system of systems. That is, humans exist in the world, and there exist human systems. Technology exists in the world, and there exist technology systems. Nature (not human and not technology) exists in the world, and there exist natural systems. Humans, technology, and nature interact, and there exist human-technology-nature systems. There are many of these human-technology-nature systems that interact together. Hence, the world can be thought of as a human-technology-nature system of systems. We can therefore attempt to determine what Cynefin domain describes the world.

Can the world be thought of as belonging to the Simple domain? While there may exist examples of human-technology-nature systems in the world where cause and effect are obvious (although I have a hard time coming up with any), every system in the world would have to belong to the Simple domain for us to consider the world in its entirety to belong to the Simple domain. This is not the case. We already demonstrated a system (the Lorenz equations) that does not belong to the Simple domain. Therefore, in order to thrive, thinking of the world as belonging to the Simple domain is insufficient.

Can the world be thought of as belonging to the Chaotic domain? Recall that the Chaotic domain was defined as the domain where there is no relationship between cause and effect. In general, we assume there exist causes and effects, and we do actually perceive relationships between cause and effect in the world. Therefore, in order to thrive, thinking of the world as belonging to the Chaotic domain is also insufficient.

So, in order to thrive, thinking of the world system of systems as belonging to the Simple domain, or the Chaotic domain, is insufficient.

Can the world be thought of as belonging to the Complicated domain? The Complicated domain assumes that, given enough analysis, we can predict what will happen. Recall that there are no shortcuts to figure out where a complex system will end up in the future, and that with any measurement error at all, our simulation of a real complex system will diverge arbitrarily far from the real system itself. We already demonstrated a system with these properties, and such a system defeats the Complicated domain’s assumption. Therefore, in order to thrive, it is insufficient to think of the world as belonging to the Complicated domain.

In order to thrive, it is necessary to perceive the world as belonging to the Complex domain. While it is true that some systems in the world may present themselves as belonging to the Simple, Complicated, or Chaotic domains, the world system of systems does demonstrate causes leading to effects (even if only in retrospect) and the types of interactions that exist are complex enough to need an explanation other than Simple or Complicated.

What if the world does not belong to the Complex domain? What would be the cost of our mistake as compared to the alternatives? Let’s consider the question of whether we live in a complex world from the perspective that we actually want to use the answer to pick a decision-making approach.

What is the utility of perceiving the world as belonging to the Simple domain? The Simple domain implies the Sense-Categorize-Respond mode of decision-making. While this is a very simple and quick decision-making model, it will lead us astray if the system we are interacting with belongs to one of the other domains (Complicated, Complex, or Chaotic). I can’t think of any examples of self-evident human-technology-nature systems. Additionally, the Simple domain perspective carries the danger of an unforeseeable transition to the Chaotic domain, resulting in an expensive crisis.

If we perceive the world as belonging to the Chaotic domain, that implies the Act-Sense-Respond mode of decision-making. This is a very immediate mode of operation, concerned only with the present and the desire to exit the Chaotic domain as soon as possible. Recall that the Chaotic domain was defined as the domain where there is no relationship between cause and effect. We just act until the situation stabilizes. While this is a valid way to model the world, we typically want more from our approach to deciding in the world than just doing things without expecting them to have some desirable effect.

What is interesting about the Simple and Chaotic domains is that they tend to reflect the observer’s lack of understanding of the system. I mentioned already that I can’t think of real examples of Simple systems. Placing a system in the Simple domain seems to be a simplifying assumption that enables us to consider the interactions of more systems than we would be able to otherwise. However, as mentioned before, the Simple domain carries the risk of an expensive crisis. On the other hand, placing a system in the Chaotic domain seems to be done from a position of ignorance. That is, we do not understand the cause and effect, even though we inherently believe that causes and effects exist in the world. A system in the Chaotic domain, therefore, seems to tell us more about the perspective of the observer than about the system itself. From this point of view, choosing to approach the world from the perspective of the Chaotic domain (as opposed to being forced into it) seems to be a choice of willful ignorance. The cost of willful ignorance seems higher than that of other approaches, but more importantly, willful ignorance is insufficient in order to thrive in the world.

Perceiving the world to be in the Complicated domain implies that, given enough analysis, we can predict what will happen in the future. The applicable decision-making mode is Sense-Analyze-Respond. Here, after analysis, we have a Good Practice which we can reuse to exploit the system we are interacting with. This is a less costly approach than the Complex domain’s Probe-Sense-Respond (which results in an Emergent Practice), but only if our analysis is correct and the system ends up in the state we predicted. Herein lies the problem with the Complicated domain approach: it works until it doesn’t. It implies that we understand everything there is to understand about the system and that we know what will happen to the system in the future. Much like the Simple domain’s simplification can lead to an expensive crisis, so can the Complicated domain’s not-as-much-simplification. This happens when the system is influenced by factors that we failed to analyze. For example, consider stock market crashes, car accidents, missed deadlines, and unintended consequences of any kind. The existence of insurance indicates that the Complicated domain assumption is insufficient.

If we perceive the world as belonging to the Complex domain, that implies the Probe-Sense-Respond mode of decision-making. We expect the unexpected. We attempt to dampen or reinforce the unexpected results of our probes in the course of ongoing safe-to-fail experiments. This is a more costly approach if the system turns out to belong to the Complicated domain for some time period. There is also the difficulty of determining what is safe-to-fail. What is safe-to-fail for a tribe (for example, having one person taste some new plant that the tribe stumbled across) may not be safe-to-fail for the person doing the tasting if the plant turns out to be poisonous. Determining what is safe-to-fail in the first place itself requires a Probe-Sense-Respond mode of decision-making. The mistakes made from a Complex decision-making approach are, in a sense, more aware than those made from a Complicated decision-making approach. In a Complex domain approach, we are aware that we can make mistakes (failed experiments). In a Complicated domain approach, mistakes come unforeseen: they surface when the perceived system is influenced by factors that were not part of the analysis in the Sense-Analyze-Respond decision-making, and the Complicated domain assumption turns out to be invalid.

The cost of assuming the world belongs to the Complex domain is less than assuming otherwise.

It is necessary to think of the world as Complex, and it is insufficient to think of the world as Simple, Complicated, or Chaotic. To thrive in the world, it is necessary to wield the Probe-Sense-Respond mode of decision-making. It is necessary to conduct safe-to-fail experiments, amplifying desired outcomes and dampening undesirable ones. This implies optimizing for learning. To optimize for learning, it is useful to focus on systems awareness, communication, and shortening feedback loops. Because we are people, it is also useful to remember that people are the ones who do the things.

^{1} Kay, John (2014). Obliquity. Retrieved 5 Dec 2016.

^{2} In this example, I am using the word “complex” to describe what, in mathematics, is called “chaos”. Part of the difficulty of understanding complexity is due to different words being used in different domains of our knowledge to describe the same thing. Because the word used in non-mathematical contexts is “complex”, I am sticking with “complex” here for the sake of consistency.

^{3} See https://en.wikipedia.org/wiki/Lorenz_system. The parameters picked here are specifically the ones resulting in chaotic solutions to the Lorenz system.

^{4} There is an important distinction between a categorization framework and a sense-making framework. In a categorization framework, the framework precedes the data. In a sense-making framework, the data precedes the framework. Sense-making is “a motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively”. See: Klein, G., Moon, B. and Hoffman, R.R. (2006). Making sense of sensemaking I: alternative perspectives. *IEEE Intelligent Systems*, 21(4), 70–73.

^{5} Snowden, Dave (2010). The Cynefin Framework. Retrieved 5 Dec 2016. Also see: Snowden, Dave (2009). How to organize a Children’s Party. Retrieved 5 Dec 2016.