Why Data Is Essential For Agile Today

In the Agile community, we hear talk about the importance of outcomes. And yet, the Agile community is largely silent on the topic of data.

With today’s increased availability of data, it is possible to discern cause and effect more finely from product decisions in ways that were not possible 20 years ago when the Agile Manifesto was written.

First, let’s distinguish superficial outcomes from more meaningful ones. Customer success is defined as an outcome generated from solving a genuine customer problem, such as improving one’s credit score.

Suppose a financial services company wants to provide a service to help its customers to improve their credit score. The effect is not immediate, as improving one’s credit score takes time.

One feature of the service might enable someone to consolidate their debt under a single low-interest loan, thereby paying off some of their credit cards and reducing their net monthly payment.

Another feature might optimize the interest being earned on the customer’s deposit accounts. If a customer uses these features, their credit score might improve after a while, but not right away.

Whether a customer sees, clicks on, or uses (that is, “converts” on) these features is a potential leading indicator of customer success. The outcome indicator, which is a lagging indicator, is whether their credit score actually improved.

While leading indicators are important for revealing the use of a service, a company’s success (as determined by corporate outcomes such as profit, more customers, greater customer retention, etc.) is more closely tied to the lagging indicators.

Somehow the importance of data for assessing lagging indicators, and therefore outcomes, has been missed by most Agile authors! Nowhere in the Agile Manifesto is data mentioned.

In articles and books about Agile methods, one seldom hears about data. There is a lot of talk about code. We hear about “refactoring”, we hear about testing and we hear about software “developers” – people who write code.

But the word “data” is seldom spoken. We also hear a lot of talk about experiments and hypothesis-driven approaches, but again, no talk of data.

It is as if we are traveling through Arizona in the US, remarking on the mountains again and again, yet failing to notice the Grand Canyon. Somehow our gaze is always in one direction, and so we miss the monumental chasm right beside us.

The Agile community often talks about A/B testing: the practice of releasing two versions of a product feature to see which one users like better. DevOps practices make A/B testing easy, and so it is standard practice today.

But A/B testing is handicapped by a lack of data. We are not merely talking about which feature version users like more: we are talking about actual customer outcomes.

Simple A/B testing will not work for assessing the longer-term outcomes of those features. The issue is not whether a feature gets used more. The question the company wants to answer is: do these features actually produce the desired outcome?

In our example of a credit score improvement service, the question to be answered is, do the service’s features actually improve our customers’ credit scores over time? If so, that is something we can advertise, which will help us to retain and attract more customers.

To prove that the service works, the company needs data on the outcome: do the credit scores of its customers actually increase when they use the service?

To obtain that data, the company needs to correlate a lot of disparate data: data about its customers’ use of its services, and data about the credit score history of its customers.

It probably also needs data about other behaviour of its customers, because customers might be doing other things that affect their scores: that is a source of noise in the data that needs to be filtered out.
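A minimal sketch of that correlation, in Python, might look like the following. Everything in it is hypothetical: the customer IDs, the feature names, the scores, the 12-month window, and the choice of “opened a new credit line” as the other behaviour to account for are all illustrative assumptions, not data or methods from any real service. It joins feature usage to score history and compares the average score change of users and non-users within each behaviour group, which is one simple way to keep an obvious source of noise from being mixed into the comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data; in practice each set would come from a different product system.
usage_events = [  # (customer_id, feature used)
    ("c1", "debt_consolidation"),
    ("c2", "deposit_optimization"),
    ("c4", "debt_consolidation"),
]
score_history = {  # customer_id -> (score at start, score 12 months later)
    "c1": (620, 665),
    "c2": (700, 715),
    "c3": (640, 645),
    "c4": (580, 630),
    "c5": (710, 705),
    "c6": (600, 610),
}
# Other customer behaviour that may move the score independently of the service.
opened_new_credit_line = {
    "c1": False, "c2": False, "c3": False,
    "c4": True, "c5": True, "c6": False,
}


def score_change_by_group(usage_events, score_history, confounder):
    """Average score change keyed by (used_service, confounder_flag)."""
    users = {customer_id for customer_id, _feature in usage_events}
    groups = defaultdict(list)
    for customer_id, (before, after) in score_history.items():
        key = (customer_id in users, confounder.get(customer_id, False))
        groups[key].append(after - before)
    return {key: mean(deltas) for key, deltas in groups.items()}


result = score_change_by_group(usage_events, score_history, opened_new_credit_line)
for (used_service, new_credit_line), avg_change in sorted(result.items()):
    print(f"used_service={used_service} new_credit_line={new_credit_line} "
          f"avg_change={avg_change}")
```

Even with real data, a comparison like this shows only an association. Establishing that the service causes the improvement would take a more careful study design, which is part of why this is a data problem before it is a coding problem.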

A simple Agile story such as “I want to show that use of our services improves our customers’ credit score” will not cut it. A developer will not know how to implement that. You might as well write a story, “I want a unicorn to walk into the room”.

To approach the problem, a data strategy needs to be mapped out: where can we get this data? Which data do we have, and which do we need to start collecting? Which other products will need to collect data, and store it in a way that we can use for correlation?

This is an information modelling problem. The software is important but secondary. Yet Agile is silent on how to approach such an endeavour.

Surely a sprint will not work: there seems to be a need for some R&D-like activity, to research and possibly prototype the collection and aggregation of the needed data. Other product areas might need to be enlisted in the effort since some of the data is generated by other financial service products.

One might try to fit this into an Agile model by saying, “That is just a spike: there needs to be a pilot effort to build that and see if it can be done”.

But that misses the point, which is that this is not just a software problem: it is first and foremost a data problem to be solved. One does not need a group of programmers to figure this out: one needs people who know and understand the data that is produced and consumed by the organization’s many products.

Agile 2 proposes three dimensions along which data is important:

  1. For business intelligence and machine learning.
  2. As something the organization must understand before development teams can even begin to implement features.
  3. As a means of validating features.

The example we have described pertains to number 1: the organization’s many products produce data, and that data is potentially very valuable for determining outcomes.

That goes way beyond simple A/B testing and enables us to truly test business hypotheses. The ability to test hypotheses is central to doing market-facing experiments.

Yet while the Agile community speaks often about experiments and hypotheses, it is silent on data – a contradiction and a paradox.

Number 2 above means that teams should not just start coding without knowledge of the organization’s data and service schemas: yet that is often what is done.

Number 3 means that teams need to have production-like test data to be able to validate what they code, but that is also seldom the case.

Data is critical for success, and so it needs to be central to any Agile approach. How can one be agile if one does not ensure that one has covered the data landscape?

Having mapped the mountains but ignored the canyons, one might motor along at high speed, feeling entirely agile and confident, only to fall into an abyss.

By Cliff Berg (LinkedIn) and Jason Hall (LinkedIn).
