A classification of application metrics

6 min read

When you're a product guy you love monitoring your metrics, don't you? How, otherwise, could you develop your product purposefully?

Over the last few weeks I have been musing on a classification of application metrics for back-end data systems that do not interact with (end) users / customers. I.e. classic and well-known e-commerce metrics like average order value, conversion rate, etc. are absent. This is how far I have come:

What is a metric?

My ultra short (Twitter compatible) and simple definition is:

A metric is a piece of quantitative information that enables conclusions about quality.

Application metrics need a connection to business objectives to make an interpretation possible (positive? negative?). See Kaushik for a well defined drill down [1].
KPIs are a subset of metrics which - according to Kaushik - "helps you understand how you are doing against your objectives" [2]. Full ack.

Why are metrics important?

Metrics are essential for product managers to steer their products toward the business objectives. Furthermore, metrics are indispensable for efficient communication between the product division and the company management. Why? They provide common ground and act like wormholes between these two worlds of mostly divergent thinking, acting, and talking.

The popular Lean Startup movement has even established the Build - Measure - Learn cycle as an axiomatic premise for building successful products.

Scope

The scope of this blog post comprises business data systems / services without direct interaction with (end) users / customers. Systems that collect or produce, transform, store, and provide e.g. product or customer data via APIs. True back-end.

Ensuring data integrity is a crucial requirement especially in distributed and / or microservice environments. Example: Receiving five new object events via the input stream must generate five new object events on the output stream regardless of complexity and distribution level of the process chain.

The metrics in question focus on the application layer, which sits above the infrastructure layer. Therefore I call these metrics application metrics. The application layer is the layer where business logic resides. Datadog’s Alexis Lê-Quôc has also published a classification of metrics (which has inspired me) in which he identifies "work metrics" [3] that look similar to what I refer to as application metrics.

Classification

The following classification of application metrics helps you organize your metrics and identify your KPIs, and thus reminds you which kinds of metrics are worth a look.

Class: Integrity

Description: Covers metrics that indicate whether processing the business objects along the process chain is successful or not. As integrity - especially in distributed and/or microservice environments - is crucial, metrics from this class should be promoted to KPIs.

Example:
A process chain

  1. Import customer data from a message queue
  2. Transform customer data
  3. Save customer data in database
  4. Provide transformed customer data via RESTful API

Example values:
Calculate ratios

  • "No. of new customers imported per day" / "No. of new customers saved in database per day" = 1
  • ...
  • "No. of new customers saved in database per day" / "No. of new customers provided via API per day" = 1.25

The resulting ratios are metrics. A ratio = 1 indicates processing is successful; i.e. integrity is ensured. If ratios are != 1 errors must have occurred.
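The ratio calculation can be sketched in a few lines of Python. This is only an illustration: the stage names, the daily counts, and the helper functions `integrity_ratios` and `integrity_violations` are assumptions of mine, not part of any real system.

```python
def integrity_ratios(stage_counts):
    """Return the ratio between each pair of consecutive stage counts."""
    ratios = {}
    for (name_a, count_a), (name_b, count_b) in zip(stage_counts, stage_counts[1:]):
        # A downstream count of zero means nothing arrived at all;
        # report None instead of dividing by zero.
        ratios[f"{name_a} / {name_b}"] = count_a / count_b if count_b else None
    return ratios


def integrity_violations(ratios):
    """Return the names of all ratios that differ from 1."""
    return [name for name, ratio in ratios.items() if ratio != 1]


# Hypothetical daily counts of new customers per stage of the process chain.
daily_counts = [("imported", 500), ("saved", 500), ("provided", 400)]

ratios = integrity_ratios(daily_counts)
violations = integrity_violations(ratios)
print(ratios)
print(violations)
```

With these made-up counts only the second ratio deviates from 1, so only "saved / provided" is reported as a violation.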

Of course, the atomic "No. of new customers imported per day" is a metric, too. It is a piece of quantitative information about the quality of the efforts to generate new customers. However, the product owner of the business data system which processes, saves, and provides customer data to other internal systems has no influence on this metric. Therefore it cannot be one of his KPIs. It is a minor quantity (see next class).

-------------------------------------------------------------

Class: (Minor) Quantities

Description: Covers metrics of lesser interest that the product owner cannot influence.

Example:
A process chain

  1. Import customer data from a message queue
  2. Transform customer data
  3. Save customer data in database
  4. Provide transformed customer data via RESTful API

Example value: No. of unique customers in database

-------------------------------------------------------------

Class: Performance

Description: Self-explanatory. Performance metrics are fundamental indicators of how an application is doing against its non-functional objectives. Therefore performance metrics should be promoted to KPIs.

Example:
A process chain

  1. Import customer data from a message queue
  2. Transform customer data
  3. Save customer data in database
  4. Provide transformed customer data via RESTful API

Example value: 90th percentile end to end processing time of customer data (“a customer”) in seconds per day: 1.2
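A minimal sketch of how such a percentile could be computed, assuming the end-to-end processing times of one day have already been collected as a list of seconds. It uses the nearest-rank method, which is one of several common percentile definitions; the sample values and the function name are made up for illustration.

```python
import math


def percentile_nearest_rank(values, percent):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    # Nearest rank: the smallest value such that at least `percent` percent
    # of all values are less than or equal to it.
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical end-to-end processing times (in seconds) for one day.
processing_times = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 3.5]

p90 = percentile_nearest_rank(processing_times, 90)
print(p90)
```

Note that the single slow outlier (3.5 s) does not move the 90th percentile, which is one reason percentiles are often preferred over the maximum or the mean for this kind of metric.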

-------------------------------------------------------------

Class: Errors

Description: Self-explanatory. Error metrics could be promoted to KPIs because errors indicate that an application is not meeting its objectives.

Example:
A process chain

  1. Import customer data from a message queue
  2. Transform customer data
  3. Save customer data in database
  4. Provide transformed customer data via RESTful API

Example value: No. of HTTP 500 per day: 2
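As a sketch, a count like this could be derived from access log records. I assume here that each record has already been parsed into a pair of ISO 8601 timestamp and HTTP status code; the record format, the dates, and the function name are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime


def server_errors_per_day(records, status=500):
    """Count responses with the given status code, grouped by calendar day."""
    counts = Counter()
    for timestamp, code in records:
        if code == status:
            day = datetime.fromisoformat(timestamp).date().isoformat()
            counts[day] += 1
    return dict(counts)


# Hypothetical, already parsed access log records: (timestamp, status code).
records = [
    ("2016-03-01T08:15:00", 200),
    ("2016-03-01T09:02:11", 500),
    ("2016-03-01T17:45:30", 500),
    ("2016-03-02T10:00:00", 200),
]

errors = server_errors_per_day(records)
print(errors)
```

Days without any HTTP 500 simply do not appear in the result.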

-------------------------------------------------------------

Class: API analytics

Description: Covers metrics to analyse client server interaction

Example:
A process chain

  1. Import customer data from a message queue
  2. Transform customer data
  3. Save customer data in database
  4. Provide transformed customer data via RESTful API

Example value: No. of requests responded with HTTP 200 on resource /customers/city/Berlin
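A minimal sketch of such a count, assuming the requests are available as pairs of resource path and HTTP status code. The sample requests and the function name are hypothetical.

```python
from collections import Counter


def successful_requests_per_resource(requests, status=200):
    """Count requests answered with the given status code per resource path."""
    return Counter(path for path, code in requests if code == status)


# Hypothetical requests: (resource path, status code).
requests = [
    ("/customers/city/Berlin", 200),
    ("/customers/city/Berlin", 200),
    ("/customers/city/Hamburg", 200),
    ("/customers/city/Berlin", 404),
]

hits = successful_requests_per_resource(requests)
print(hits["/customers/city/Berlin"])
```

Grouping by resource rather than keeping one global counter makes it possible to see which parts of the API clients actually use.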

Actions

In the context of metrics, a few specific actions are relevant which, IMHO, are sometimes confused.

  • Logging: The act of producing raw data e.g. by Syslog to enable the generation of metrics (e.g. a count of specific log entries).
  • Monitoring: The act of watching metrics constantly; performed by a human or a computer (program).
  • Notifying: The act of notifying a specific person or a specific group of persons when a specific event in the context of metrics happens, e.g. the occurrence of an HTTP 500.
  • Analyzing: Best see https://en.wikipedia.org/wiki/Data_analysis

The support dimension

Some last words are dedicated to the support people, who do a great job, often 24/7. When collecting raw data for your metrics in class Errors, collect as much information as possible. "What good data looks like" in [3] is a great inspiration on this topic. Focus on these two objectives: a) identify the problem asap, b) fix the problem asap. And don’t forget to measure the total time it takes to resolve an incident.

[1] http://www.kaushik.net/avinash/web-analytics-101-definitions-goals-metrics-kpis-dimensions-targets/

[2] http://www.kaushik.net/avinash/web-analytics-101-definitions-goals-metrics-kpis-dimensions-targets/#kpi

[3] https://www.datadoghq.com/blog/monitoring-101-collecting-data/

(Digital) product owners, beware of the iceberg!

2 min read

The mind map below was posted on Twitter months ago. However Twitter's limitation to 140 characters didn't allow further words about my motivation. This is the addendum.

Building digital web products from scratch the agile way, including continuous delivery or even continuous deployment, requires "operations thinking" right from the start. A common mistake (IMHO) I have seen several times is that many product owners almost exclusively care about user interaction - the visible part of the product - and most effort is spent on UX. This is an important aspect, of course. However, this is just the tip of the iceberg. I guess nobody wants to ship a product which breaks on increasing traffic, which is not prepared for handling data loss or which lacks a proven backup and recovery process. I.e. - from an operations perspective - I guess nobody wants to ship just a prototype while the company expects stable software up and running.

But isn't this the team's job? Well, let's sharpen the role of a product owner. In this case this is perfectly illustrated by the difference between responsibility and accountability as defined in RACI. The team is responsible for writing and shipping and maybe even running the software, while the product owner is accountable for the whole product. Therefore a product owner has to care about performance, backup and recovery, monitoring, SLAs, security, etc., etc. A product owner is responsible (!) for creating, refining and prioritizing backlog items which cover the operations aspects of his product (the what). It is a genuine product owner task. A product owner is not expected to have the know-how to answer all these questions or to present perfect user stories for operations epics that pass the backlog refinements without any question. Providing the how is in the team's realm. Of course, any input from the team on the what is appreciated. But managing the discussion, asking the right questions, asking for yet another load test, etc. is in the product owner's task portfolio. He's got the active role here.

The following mind map helps me to remember all facets of a digital web product. It serves as a boilerplate for the collection of epics especially when I have to create the initial backlog. It is a life saver.

[Figure: Mind map - creating a digital product]

If you think something is missing please drop me a line. Any input is welcome.

The Coherent Business to SCS Model

3 min read

At first glance the self-contained system (SCS) approach [1] is purely a software architecture topic. However, there is a tight correlation to the organisational, or rather the business, part of the story. Here's the hook: "Each SCS is owned by one team." [2]. Further: "The manageable domain specific scope enables the development, operation and maintenance of an SCS by a single team." [3]

What's the consequence? Let's change perspective and take a look from the business side. A plausible correlation between an SCS architecture and the business layer is constituted by a mapping from the core business processes to their enabling (or maybe just facilitating) software systems. The mapping is as follows:

  1. Identify a value creating business process P.
  2. Divide P into logical steps P1 - Px where "logical" means that each step represents disjoint, well-defined business logic.
  3. Each step is mapped to a business domain or - for short - domain.
  4. Finally, each domain is mapped to a self-contained system. Shared business objects such as customer or order are exchanged via RESTful HTTP or lightweight messaging as defined in the SCS approach.

[Figure: The Coherent Business to SCS Model]

The mapping of the business part to its technical counterpart is coherent. Therefore, this is called the "Coherent Business to SCS Model" (CBSM).

A product organization has product managers who manage their domain products powered by SCSs. There's a good chance that they do it the agile way, with teams who develop, operate and maintain their SCSs.

On the business layer the domains are tied together by the company vision, which enables the product managers to derive their product visions.

The core value of the CBSM is that it enables the domains to develop at maximum speed due to minimum dependencies at the system level. To be precise, the only dependency is the API. If a domain does not change its API there will be (theoretically) no limit to development within this domain. This even allows replacement of the underlying SCS once it is at the end of its lifecycle. Equivalent lightweight communication on the business layer makes this a success story.

A congruent approach has been implemented at GALERIA Kaufhof (a member of Hudson's Bay Company) [4]. It further shows an evolution of the model: A domain may be powered by more than one SCS for technical reasons.

Well, I think this kind of interaction of business and technology is not really a new idea. Have a look at The Open Group's definition of a service in SOA which embraces some of the core ideas [5]. My argument is that the Coherent Business to SCS Model is a more lightweight approach (buzzword!). I just wonder if SCS is a consequence or the driver?

[1] http://scs-architecture.org
[2] http://scs-architecture.org/#one-team
[3] https://speakerdeck.com/rstrangh/self-contained-systems-1
[4] https://galeria-kaufhof.github.io/general/2015/12/15/architektur-und-organisation-im-galeria-de-produktmanagement/ (in German)
[5] https://en.wikipedia.org/wiki/Service-oriented_architecture

REST vs. Message Queue

2 min read

I was asked whether to use REST or a Message Queue to realize data replication between two systems. Well, this depends on your requirements, doesn't it?

So, Carsten Blüm and I started a collection of arguments and observations:

Implementing a trans-system data replication

REST (via data feed e.g. using Atom)

  • rather for asynchronous data replication, delay tolerant
  • data replication is triggered client-side; a client decides when to fetch data from a server
  • i.e. temp. inconsistencies are acceptable
  • client manages a resync on client-side data loss autonomously
  • client is responsible for handling errors
  • client needs a navigable data history
  • generally irrelevant who and how many clients are fetching (the same) data
  • data replication over HTTP, therefore benefit from HTTP features like caching
  • data replication does not need to be reliable from the server's perspective, no acknowledgements of receipt needed, i.e. no need to ensure a trans-system transaction
  • temp. downtime of the server i.e. temp. unavailability of data is acceptable
  • no further technology than HTTP wanted

Message Queuing

  • rather for virtually synchronous data replication, delay intolerant (assuming clients'/consumers' uptime is 24/7 and they are reading constantly)
  • i.e. temp. inconsistencies are rather unacceptable
  • data replication is triggered server-side (assuming clients'/consumers' uptime is 24/7 and they are reading constantly)
  • server is responsible for resending data on client-/consumer-side data loss
  • clients/consumers don’t need a server-side data history
  • observation: the number of clients/consumers tends to be smaller and clients/consumers are probably known; a need for routing and/or filtering and/or even a time to live may exist
  • no data replication over HTTP (assuming there is no RESTful API to a message queue)
  • data replication needs to be reliable, acknowledgements of receipt needed
  • published data must be available even when the server is down
  • it's OK to extend the technology stack and have a Message Queue

Notice:

  • You can implement virtually synchronous data replication perfectly in REST (e.g. Atom feeds constantly polled) as well as asynchronous data replication using a message queue.
  • You can, of course, implement acknowledgements in REST. In [1] Atom is used for "reliable data distribution and consistent data replication".
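
To make the two replication styles more tangible, here is a small in-memory sketch. It does not use HTTP, Atom or a real message broker; the classes `Feed`, `FeedClient` and `AckQueue` are hypothetical stand-ins that only illustrate who triggers the replication, who handles data loss, and where acknowledgements come in.

```python
class Feed:
    """Pull model: an append-only feed with a navigable history."""

    def __init__(self):
        self._entries = []

    def publish(self, item):
        self._entries.append(item)

    def entries_since(self, position):
        return self._entries[position:]


class FeedClient:
    """The client decides when to fetch and keeps track of its own position."""

    def __init__(self, feed):
        self._feed = feed
        self.position = 0
        self.replica = []

    def poll(self):
        new_entries = self._feed.entries_since(self.position)
        self.replica.extend(new_entries)
        self.position += len(new_entries)

    def resync(self):
        # After client-side data loss the client replays the whole history.
        self.position = 0
        self.replica = []
        self.poll()


class AckQueue:
    """Push model: a message stays pending until the consumer acknowledges it."""

    def __init__(self):
        self._pending = {}
        self._next_id = 0

    def publish(self, item):
        self._pending[self._next_id] = item
        self._next_id += 1

    def deliver(self, consumer):
        # The consumer returns True to acknowledge; otherwise the message
        # is kept and sent again on the next delivery attempt.
        for message_id, item in list(self._pending.items()):
            if consumer(item):
                del self._pending[message_id]

    def pending_count(self):
        return len(self._pending)


# Pull: the client loses its replica and recovers on its own.
feed = Feed()
feed.publish("customer-1")
feed.publish("customer-2")
client = FeedClient(feed)
client.poll()
client.replica = []  # simulated client-side data loss
client.resync()

# Push: the first delivery is not acknowledged, so the server resends.
queue = AckQueue()
queue.publish("customer-1")
attempts, received = [], []


def flaky_consumer(item):
    attempts.append(item)
    if len(attempts) == 1:
        return False  # simulated failure, no acknowledgement
    received.append(item)
    return True


queue.deliver(flaky_consumer)
queue.deliver(flaky_consumer)
print(client.replica, received, queue.pending_count())
```

In the pull case the server keeps no state about its clients, whereas in the push case the server has to remember what has not been acknowledged yet.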

[1] Firat Kart, L. E. Moser, P. M. Melliar-Smith: Reliable Data Distribution and Consistent Data Replication Using the Atom Syndication Technology. http://www-poleia.lip6.fr/~gancarsk/grbd08/kart07atom.pdf

"Let Technology Inspire You" Series

1 min read

Last Monday we started our "Let Technology Inspire You" series at DI UNTERNEHMER. Offering a forum to data people and enabling vital discussions about "data" is part of our transformation towards a (digital) data company. Our first guest speaker was Tim Strehle, who explained "How the Semantic Web can change Digital Asset Management" (slides in German).

My inspirations are as follows:

  1. "Using HTML as the Media Type for your API" including schema.org RDFa markup
  2. The power of media types to describe a system's domain in an SCS environment

Thoughts on #1:

  • Is it efficient to have a hybrid resource representation for both humans and machines? Why not use content negotiation to provide a human-readable and a machine-readable representation? If a resource is data driven this separation should not result in much effort.
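
To illustrate what I mean by content negotiation, here is a deliberately simplified sketch of how a server could pick a representation based on the Accept header. It only understands exact media types, q-values and the `*/*` wildcard; the function names and the list of available representations are assumptions for this example, and a real service would rather rely on its web framework.

```python
def parse_accept(header):
    """Parse an Accept header into (media type, q) pairs, best first."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # default weight if no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences.append((media_type, quality))
    # sorted() is stable, so equal weights keep the order of the header.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def negotiate(header, available):
    """Return the best available media type, or None if nothing matches."""
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None


# Hypothetical representations of a resource: one for humans, one for machines.
representations = ["text/html", "application/json"]

browser = negotiate("text/html,application/json;q=0.9", representations)
api_client = negotiate("application/json", representations)
print(browser, api_client)
```

The same resource URL then serves HTML to a browser and JSON to an API client, and neither representation has to be a compromise.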

Besides these inspirations it was interesting to see how the idea of Self-Contained Systems is spreading. (Tilkov, the Messiah.)

If you are interested in sharing your thoughts on data with us, please drop me a line.