Incorporate data into your next product
Connecting your analytical data platform to user-facing products
In the near future, more and more products will integrate data directly into the end user experience. This will commonly manifest as personalization systems, applied machine learning, or aggregated data displayed directly to end users. As a result, product and engineering teams will need to incorporate the nuances of data into the way they build products.
When should I consider “data in my product”?
Almost every product is a data product because it retrieves and shows data to the end user. So, when do we start thinking of a product as one that incorporates data? Products that more deeply incorporate data are ones that need to use aggregated or computationally complex data as part of the product delivery, either as a value directly shown to the user or to help decide what to show to the user.
Clear examples of this are products that need to call an ML model’s output (recommenders, ranking algorithms, decisioning systems, complex pricing systems, fraud detection, etc.). Other examples include needing to join and aggregate data from multiple sources (ex: show the user’s total spend across multiple products or sources) or perform complex calculations (ex: calculate a user’s percentile across a large audience) in order to meet the end user’s needs.
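To make that aggregation example concrete, here is a minimal sketch using pandas. The source tables, column names, and values are all hypothetical, and in practice this kind of computation would typically run in your data warehouse rather than in application code.

```python
# Minimal sketch with pandas: combine spend from two hypothetical
# sources, then compute each user's spend percentile across the audience.
import pandas as pd

# Hypothetical extracts from two upstream sources
store_orders = pd.DataFrame({"user_id": [1, 2, 3], "spend": [120.0, 40.0, 310.0]})
app_purchases = pd.DataFrame({"user_id": [1, 3], "spend": [25.0, 80.0]})

# Total spend per user across both sources
total_spend = (
    pd.concat([store_orders, app_purchases])
    .groupby("user_id")["spend"]
    .sum()
)

# Percentile rank of each user within the full audience (0 to 1)
spend_percentile = total_spend.rank(pct=True)
print(spend_percentile)
```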
In these cases, while the product can likely deliver some level of value with its current setup, leveraging a robust data infrastructure and service can greatly improve the product’s quality.
What should I consider?
Adding data to a product’s core experience introduces several new variables to consider. In addition to thinking through the product’s backend microservices and interactions with its production database(s), teams will also need to think through the components that generate and serve data outside the production database: the data service, the analytical data platform, and upstream data sources.
We know that the technology behind products can get quite complex. For the purposes of this post, we will keep things simple with just a set of microservices and a single production database powering the product.
Taking data into account requires thinking about 3 additional components:
A data “service”: this is effectively a microservice that sits on top of an analytical data platform. It ensures that your product can access the data it needs in the proper formats, with high availability and low latency (note: the latter is not always a given, especially with data warehouses).
A data service can be called just like a microservice to provide real-time support directly to the product and end user. In some cases, however, the service can also batch compute and write data to the production database instead. This second method is useful for computationally expensive queries that generate values that can be broadly used across the product (ex: infer the user’s next product preference once an hour and store the value in your DB, then call it across one or many microservices for a variety of use cases). A sketch of this pattern follows this list.

Analytical data platform: this is a larger data infrastructure that can unlock all sorts of data value if built and leveraged correctly. This set of systems is where most of the heavy data computing occurs, whether that’s stream processing, OLAP needs, or your machine learning operations.
Well-built pipelines will process data across the analytical data platform to create data assets that can be used for product features. For example, you may create and maintain a custom audience of users with certain preferences based on their historical purchases. Aggregating data across millions of users and their purchase history is a great use case for a data warehouse; connecting the output of that calculation to an audience to target with personalized recommendations is a powerful way to connect your data infrastructure to the end user experience.
Teams are investing in their analytics data platforms across use cases such as data ingestion, data contracts/quality management, reverse ETL, and others. Well-architected data systems with governed, reliable, and useful datasets can be leveraged beyond BI/analytics and activated to improve your product’s impact.
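As a rough illustration of that audience-building pattern, the sketch below materializes a “users with certain preferences” table from purchase history. Everything here is hypothetical: the table names, the spend threshold, and the use of SQLite as a self-contained stand-in for a real data warehouse.

```python
# Sketch of an audience-building job over a hypothetical purchases table.
# SQLite stands in for the warehouse purely to keep the example runnable.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "outdoor", 90.0), (1, "outdoor", 45.0), (2, "books", 20.0)],
)

# Materialize an audience of users with heavy historical spend in a
# category; downstream, a recommender could target this audience.
conn.execute("""
    CREATE TABLE outdoor_enthusiasts AS
    SELECT user_id, SUM(amount) AS category_spend
    FROM purchases
    WHERE category = 'outdoor'
    GROUP BY user_id
    HAVING SUM(amount) > 100
""")
print(conn.execute("SELECT * FROM outdoor_enthusiasts").fetchall())
```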
Data sources: when building data into a product, it’s imperative to understand the data’s lineage all the way back to its source. Does your data come from other products’ databases and then get joined in the analytical data platform? Does it come from a 3rd party vendor? How reliably does the data get ingested, and what failsafes need to be put in place if there’s an issue with the data at its source? Be sure to only build product features off datasets with reliable sources, and set contingency plans that don’t disrupt the user’s experience should one of your upstream data sources run into issues.
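Pulling these components together, here is a minimal sketch of the batch compute-and-write pattern referenced in the data service item above, plus a contingency fallback on the read path. Every name, value, and the SQLite stand-in for the production database are assumptions for illustration only.

```python
# Sketch of two data-service patterns: a batch job writes a precomputed
# value into the production database on a schedule, and the request path
# reads it, falling back to a safe default if the value is missing or stale.
import sqlite3
import time

FRESHNESS_LIMIT_SECONDS = 2 * 3600  # assumed SLA: value refreshed hourly

prod_db = sqlite3.connect(":memory:")
prod_db.execute(
    "CREATE TABLE user_preferences (user_id INT PRIMARY KEY, "
    "next_product TEXT, computed_at REAL)"
)

def batch_refresh_preferences() -> None:
    """Hourly job: pull model output from the analytical platform and
    upsert it into the production DB. The model call is stubbed out."""
    predictions = {1: "hiking boots", 2: "novel"}  # stand-in for model output
    now = time.time()
    for user_id, product in predictions.items():
        prod_db.execute(
            "INSERT OR REPLACE INTO user_preferences VALUES (?, ?, ?)",
            (user_id, product, now),
        )

def get_recommendation(user_id: int) -> str:
    """Request path: serve the precomputed value, with a contingency
    default so an upstream data issue never breaks the user experience."""
    row = prod_db.execute(
        "SELECT next_product, computed_at FROM user_preferences WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None or time.time() - row[1] > FRESHNESS_LIMIT_SECONDS:
        return "bestsellers"  # safe fallback when data is missing or stale
    return row[0]

batch_refresh_preferences()
print(get_recommendation(1))   # -> "hiking boots"
print(get_recommendation(99))  # -> "bestsellers" (no data for this user)
```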
There are many product and engineering decisions that will affect your end users based on the state of your data services and infrastructure. With a map and lineage of data all the way back to source, the best way to manage these considerations is to run several use cases from your end user all the way through the data systems. Your use cases will help you understand things like where you need real-time data, where it’s okay to leverage stale(ish) data (hours, not days), where you should batch versus stream data, the impact (and mitigating actions) of data that’s unavailable or poor quality, and more.
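One lightweight way to capture the outcome of that exercise is a simple inventory that maps each use case to its freshness requirement, compute mode, and fallback. The sketch below is hypothetical; every use case, staleness bound, and fallback is an illustrative assumption, not a recommendation.

```python
# Hypothetical output of the use-case exercise: each use case mapped to
# how fresh its data must be, how it is computed, and what happens if
# the data is unavailable.
from dataclasses import dataclass

@dataclass
class DataUseCase:
    name: str
    max_staleness: str   # how old the data may be before UX degrades
    compute_mode: str    # "stream" or "batch"
    fallback: str        # what the product does if the data is unavailable

USE_CASES = [
    DataUseCase("fraud check at checkout", "seconds", "stream", "route to manual review"),
    DataUseCase("personalized recommendations", "hours", "batch", "show bestsellers"),
    DataUseCase("monthly spend summary", "1 day", "batch", "hide the widget"),
]

for uc in USE_CASES:
    print(f"{uc.name}: staleness<={uc.max_staleness}, {uc.compute_mode}, fallback={uc.fallback}")
```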
What other nuances come up over time?
As your product evolves over time, so do the data systems that support it. Basic monitoring, alerting, and data quality checks will help ensure that your team identifies and resolves incidents in a timely manner, but there are also a few data-specific nuances to keep in mind.
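As a minimal sketch of what such a data quality check might look like, assume a hypothetical daily extract with a user_id column. Real setups would typically run checks like this from an orchestrator or a quality framework, but the shape is the same.

```python
# Minimal data quality check on a hypothetical daily extract.
import pandas as pd

def check_extract(df: pd.DataFrame, min_rows: int, max_null_rate: float) -> list[str]:
    """Return a list of alert messages; an empty list means the extract looks healthy."""
    alerts = []
    if len(df) < min_rows:
        alerts.append(f"row count {len(df)} below expected minimum {min_rows}")
    null_rate = df["user_id"].isna().mean()
    if null_rate > max_null_rate:
        alerts.append(f"user_id null rate {null_rate:.1%} exceeds {max_null_rate:.1%}")
    return alerts

extract = pd.DataFrame({"user_id": [1, None, 3], "spend": [10.0, 5.0, 7.0]})
for alert in check_extract(extract, min_rows=2, max_null_rate=0.10):
    print("ALERT:", alert)  # in production, page the on-call instead
```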
Managing Bias: data, when used without human oversight, can often return biased results. Without frequent analyses of the data outputs that are used by your product, you may be unintentionally using data in ways that harm or discriminate against your users.
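One rough sketch of such a recurring analysis: compare how often different user segments receive a product treatment, and flag large disparities for human review. The log table, segment names, and the 0.8 disparity threshold below are illustrative assumptions only.

```python
# Rough sketch of a recurring bias analysis on product-facing data,
# using a hypothetical log of which users were shown an offer.
import pandas as pd

shown_offers = pd.DataFrame({
    "segment": ["A", "A", "A", "B", "B", "B"],
    "offer_shown": [1, 1, 1, 1, 0, 0],
})

# Offer rate per segment, and the ratio of the lowest to highest rate
rates = shown_offers.groupby("segment")["offer_shown"].mean()
disparity = rates.min() / rates.max()
if disparity < 0.8:  # illustrative threshold; tune for your context
    print(f"Review needed: offer rates by segment {rates.to_dict()}")
```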
Logic changes along the data pipeline: it’s very likely that business logic is applied somewhere along your data pipeline, and that logic will change over time. An example might be event taxonomies whose classifications change over time. If historical data matters (showing/leveraging trends, model training, etc.), then logic changes may also trigger the need to restate historical data so that your end users don’t see a sudden change in their experience.
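As a small sketch of what a restatement might look like, assume a hypothetical events table and a new taxonomy version that merges some categories. The mapping and table are invented for illustration; a real restatement would run as a backfill over the historical partitions.

```python
# Sketch of restating historical events after a taxonomy change.
import pandas as pd

# Hypothetical v2 taxonomy: renames and merges some event categories
TAXONOMY_V2 = {"signup_web": "signup", "signup_mobile": "signup", "buy": "purchase"}

events = pd.DataFrame({
    "event_id": [1, 2, 3],
    "category": ["signup_web", "buy", "signup_mobile"],
})

# Restate history under the new logic so user-facing trends stay consistent;
# categories not covered by the mapping keep their original value.
events["category"] = events["category"].map(TAXONOMY_V2).fillna(events["category"])
print(events)
```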
Building data into your product is a powerful way to drive more value for your users. However, it also introduces the need to take data services, infrastructure and lineage, and other long-term nuances into consideration so that your product reliably delivers the promised value to your users. Taking these extra dynamics into consideration will help you build more robust and impactful products while also capitalizing on your organization’s data investments.