This year was a big one for data, with the rise of data observability and data contracts. Looking ahead, here are some bold predictions I have for 2023:
1. Data observability goes beyond alerting and gets into self-healing. Informing teams when there are issues isn't enough. We'll see technologies emerge that automatically resolve common data issues, lowering the overall cost of maintaining data pipelines.
2. dbt will emerge as the leader of the metrics layer. With such a strong and open user base, dbt looks poised to win the lion's share of metrics layer adoption, as long as it keeps adding features and keeps its feature set accessible.
3. Code-driven BI will take off. Pioneers like Lightdash are onto something. Full-stack analytics engineers will use code-driven BI to build and maintain data pipelines that provide full control and transparency across the transformation and BI layers. Because these pipelines live in code, they can be governed with source control to improve overall data reliability.
4. Snowflake will acquire an ML company. As the Snowflake vs. Databricks battle continues, Snowflake will move more formally into machine learning to compete. An acquisition would round out Snowflake's portfolio and expand its ecosystem's footprint.
5. The line between offline and online data will continue to blur. Traditionally, offline data warehouses were kept separate from the online production data world. However, as feature stores, Snowflake's Unistore, and marketing orchestration off the data warehouse take off, we'll see more user-facing and online use cases build direct dependencies on the offline data warehouse.
6. The modern data warehouse will be the next CDP (customer data platform). CDPs will standardize on data warehouses as tools like Fivetran make ingesting and consolidating data easy, dbt enables more flexible data aggregation and cleansing, and tools like Hightouch make it easy to connect data from the warehouse to various marketing tools.
7. We'll begin seeing consolidation in the data market. From 2018 to 2022 we saw a proliferation of new tooling in the data market, notably data ingestion/transformation tools and catalog/observability tools. With a looming recession and the natural maturation of the data profession, we'll begin seeing consolidation of vendors and capabilities across the landscape.
Agree or disagree with anything? Let me know in the comments below!