Despite all of the tools and methodologies that have arisen in the last few years, many companies, particularly those that have been in the market for decades, struggle when it comes to leveraging their operational data to build new digital products and services. According to research and surveys conducted by McKinsey over the last few years, the success rate of digital transformations is consistently low, with less than 30% succeeding at improving their company’s performance. There are a lot of reasons for this, but most of them can be summarized in a sentence: A digital transformation is primarily an organizational and cultural change and then a technological shift.
The question is not if digital transformation is a good thing nor is it if moving to the cloud is a good choice. Companies need (badly, in some cases) a digital transformation and yes, the pros of moving to the cloud usually overcome the cons.
So, let’s try to dig deeper and analyze three of the main problems companies face when they go on this journey
Digital products development
Products by nature are customer-driven but companies run their businesses on multiple back-end systems that are instead purpose-driven. Unless you run a very small business, different people with different objectives have ownership of such products and systems. Given this context, what happens when a company wants to launch a new digital product at speed? The back-end systems (CRMs, E-commerce, ERP, etc.) hold the data they need to bring to the customer. Some systems are SaaS, some are legacy, and perhaps others are custom applications created by the company that disrupted the market with innovative solutions back in the days, the perfect recipe for integration hell.
The product manager needs to coordinate and negotiate multiple change requests with the system’s owners whilst trying to convince them to add their needs in the backlog to meet the deadline. And things get even worse, as the new product relies on the computational power of the source systems, and if those systems cannot handle the additional traffic, both the product and the core services will be affected.
Third-party integration
“Everybody wants the change, (almost) nobody wants to change.” In this ever-growing digital world, partnering with third parties (whether they are clients or service providers) is crucial, but everyone who has tried to do so knows how challenging this is: non-standard interfaces, CSV files over FTP with fancy update rules, security issues… The list of unwanted things can grow indefinitely.
SaaS everywhere
The Software-as-a-Service model is extremely popular and getting the service you want without worrying about the underlying infrastructure gives freedom and speed of adoption, but what happens when a big company relies on multiple SaaS products to run their business? Sooner or later, they experience loss of control and higher costs in keeping a consistent view of the big picture. They need to deal with SaaS internal representations of their own data, multiple views of the same domain concept, unplanned expenses to export, and interpret and integrate the data from different sources with different formats.
Putting it all together
All the issues above fall into a well-known category of information technology. They are integration problems, and over the years, a lot of vendors promised a definitive solution. Now, you can consider low-code/no-code platforms with hundreds of ready-made connectors and modern graphical interfaces. Problem solved, right? Well, not really. Low-code integration platforms simplify implementation. They are really good at it, but doing so oversimplifies the real challenge: creating and maintaining a consistent set of APIs shaped around the business value over time, and preventing the interfaces from leaking internal complexities to the rest of the company, something that has to be defined and maintained through architectural choices and proper skills (completely hidden behind the selling points of such platforms).
There are two different ways to solve integration problems:
-
Centralized using adapters. In this case, the logic is pushed to the central orchestration component, with integration managed through a set of adapters. This is the rather old school SOA approach, the one that the majority of market integration platforms are built on.
-
Decentralized, pushing the logic to the edges, giving autonomous teams the freedom to define both the boundaries and the APIs that a domain must expose to deliver business value. This is a more modern approach that has arisen recently alongside the rise of microservices and, in the analytical world, with the concept of data mesh.
The former gives speed at the starting point and the illusion of reducing the number of choices and skills to manage the problems, but in the long run, inevitably, this begins to accumulate technical debt. Due to the lack of necessary degrees of freedom, you lose the ability to evolve the integration points over time, the same thing that caused the transition from SOA to microservices architectures. The latter needs the relevant skills, vision, and ability to execute but gives immediate results and allows you to flexibly manage the evolution of the enterprise architecture over time.
Old problems, new solutions
At Sourcesense in the last 20 years, we have partnered on hundreds of projects to bring agility, speed, and new open-source technology to our customers. Many times through the years, we were faced with the integration challenges above, and yes, we tried to solve them with the technology available at the time, so we have built some integration solutions on SOA (when they were the best of breed) and interacted with many of the integration platforms on the market. Then, we struggled with the issues and limitations of the integration landscape and have listened to our customers’ needs and where expectations have fallen short. The rise of agile methodologies, cloud computing, new techniques, technologies, and architectural styles has given an unprecedented boost to software evolution and the ability to support business needs, so we embraced the new wave and now have growing experience in solving problems with these tools.
Along the way, we’ve seen a recurring pattern when we encountered integration problems, the effectiveness of data hubs as components of the enterprise architectures to solve these challenges, so we built one of our own: Joyce.
Data hubs
This is a relatively new term and refers to software platforms that collect data from different sources with the main purpose of distribution and sharing. Since this definition is broad and vague, let’s add some other key elements that matter and help define the contours of our implementation. Collecting data from different sources can bring three major benefits:
-
Computational decoupling from the sources. Pulling (or pushing) the data out of the originating systems means that client applications and services interact with the hub and not directly with the sources, preventing them from being slowed down by additional traffic.
-
Catalog and discoverability. If data is collected correctly, this leads to the creation of a catalog, allowing people inside the organization to search, discover, and use the data inside the hub.
-
Security. The main purpose of the hubs is distribution and sharing. This leads immediately to focus on access control and security hardening. A single access point simplifies the overall security around the data because it significantly reduces the number of systems the clients have to interact with to gather the data they need.
Joyce, how it works
The cornerstone concept of Joyce is the schema. It allows you to shape the ingested data and how this data will be made available to client services. Using the same declarative approach made popular by Kubernetes, the schemas describe the expected result and the platform performs the actions to make it happen.
Schemas are standard JSON schema files stored and classified in a catalog. Their definition falls into three categories:
-
Input – how to gather and shape the source data. We leverage the Kafka Connect framework to provide ready-made connectors for a wide variety of sources. The ingested data can be filtered, formatted, and enriched with transformation handlers (domain-specific extensions of JSON schema).
-
Model – allows you to create new aggregates from the data stored in the platform. This feature gives the freedom to model the data the way needed by client services.
-
Export – bulk data export capability. Exported data can be any query run against the existing data with an optional temporal filter.
Input and model data is made available to all the client services with the proper authorization grants through auto-generated REST and GraphQL APIs. It is also possible to subscribe to a dedicated topic if an event-driven approach is more suitable for the use-case.
MongoDB: the key for a flexible model and performance at scale
We heavily rely on MongoDB. Thanks to its flexibility, we can easily map any data structure the user defines to collect the data. Half of the schema definition is basically the definition of a MongoDB schema. (We also auto-generate one schema per collection to guarantee data integrity.)
Joyce runs in a Kubernetes cluster and all its services are inherently stateless to exploit the full potential of horizontal scaling. The architecture is based on the CQRS pattern. This means that writes and reads are completely decoupled and can scale independently to meet the unique needs of the production environment. MongoDB is also the backing database of the API layer so we can keep the promise of low latency, high throughput, and continuous availability along all the components of the stack.
The platform is available as a fully managed PaaS on the three major cloud providers (AWS, Azure, GCP) but if needed, it can be installed on an existing infrastructure (in cloud and on prem).
Final considerations
There are many challenges leaders must face for a successful digital transformation. They need to guide their organizations along a process that involves changes on many levels. The exponential growth of technological solutions in the last few years adds more complexity and confusion. The evolution of organizational models and methodologies point in the direction of shared responsibility, people empowerment, and autonomous teams with a light and effective central governance. The same evolution also permeates the novel approaches to enterprise architectures like the data mesh.
Unfortunately, there’s no silver bullet, just the right choices for the given context. Despite all the marketing and hype around this or that one solution to all of your digital transformation needs, a long term successful shift needs guidance, competence and empowerment. We’ve built Joyce with the aim of reducing the burden of repetitive tasks and boilerplate code to get the results faster and catch the low hanging fruits without trying to replace the necessary architectural thinking to properly define the current state and the evolution of the enterprise architectures of our customers. If you’re struggling with the problems enlisted at the beginning of this article you should give Joyce a try.