Data & AI | September 3, 2025

The State of Data Management Platforms: Balancing Flexibility, Scale, and Openness

Choosing a data management platform for the future? With so many options available, it’s vital to select one that can adapt as your organisation evolves without tying you to restrictive vendor agreements.

Open Data Platforms

Across the public and private sectors, organisations face a major and unpredictable challenge: the volume, diversity and velocity of data are outpacing their ability to manage it. From open data portals and regulatory systems to AI training pipelines, Data Management Platforms (DMPs) have become core systems for digital transformation.

But the state of the market today exposes a growing divide between off-the-shelf vendor solutions that promise convenience and open, adaptable frameworks that offer long-term resilience.

The Reality of Off-the-Shelf Data Management Platforms

Many organisations start their data journeys with established commercial platforms such as Snowflake, Databricks, Microsoft Purview, AWS DataZone, or Google Cloud Dataplex. These enterprise-grade systems offer immense power: high availability, built-in security and sophisticated analytics capabilities.

However, they also share three structural weaknesses:

  • Vendor Lock-In: Proprietary architectures make it hard to export data or migrate to alternative ecosystems without significant cost or re-engineering.
  • Rigid Data Models: The “one-size-fits-all” structure of these systems limits flexibility in metadata schemas, workflows or data governance policies.
  • Expensive Customisation: Tailoring functionality often requires licensed modules or vendor-specific scripting, introducing technical debt and long-term dependency.

For some organisations, particularly those with complex regulatory, transparency, or interoperability obligations, these constraints stifle agility just when it’s needed most.

The Volume Problem: More Data, Less Control

As AI, IoT and automation reshape business and government, data volumes are growing exponentially. Traditional data management systems, designed around centralised repositories and static metadata, struggle to cope with:

  • Unstructured and real-time data from sensors, logs and AI outputs.
  • Cross-domain sharing requirements between agencies, departments, vendors and the public.
  • Evolving compliance regimes such as GDPR, the EU AI Act and data residency rules.

The risk is clear: organisations end up with multiple overlapping systems, some proprietary, some bespoke, each optimised for a specific use case but collectively unmanageable.

The Open-Source Alternative

In contrast, open-source data management frameworks like CKAN, DKAN, OpenDataSoft (community edition) and Apache Atlas have matured significantly over the past decade. They’re not “free replacements” for commercial systems; they’re configurable foundations that can be shaped to meet organisational, sectoral and regulatory needs.

  • CKAN has become the de facto standard for open data portals worldwide, adopted by governments from the UK to Canada. Its modular architecture and API-first design make it ideal for metadata management, data publication and integration with analytics and AI pipelines; a short API sketch follows this list.
  • DKAN offers similar open-data functionality with tighter CMS integration, making it useful for civic and policy organisations.
  • Apache Atlas focuses on metadata governance and data lineage and integrates seamlessly into big-data environments such as Hadoop or Databricks.
  • OpenMetadata and DataHub (LinkedIn’s open-source platform) are emerging leaders in the data discovery and governance space, designed for modern cloud-native infrastructures.
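CKAN’s API-first design is easy to see in practice: every catalogue operation is exposed through a JSON “Action API”. The sketch below is a minimal illustration, assuming CKAN’s public demo portal at demo.ckan.org; the same `package_search` call works against any CKAN instance and is a common first step when wiring a portal into analytics or AI pipelines.

```python
import requests

# Public CKAN demo portal; swap in your own instance's URL.
CKAN_URL = "https://demo.ckan.org"

def search_datasets(query: str, rows: int = 5) -> list[dict]:
    """Search a CKAN catalogue via the Action API's package_search endpoint."""
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_search",
        params={"q": query, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if not body.get("success"):
        raise RuntimeError(f"CKAN API error: {body.get('error')}")
    # "result" holds a count plus the matching dataset ("package") records.
    return body["result"]["results"]

if __name__ == "__main__":
    for dataset in search_datasets("air quality"):
        print(dataset["name"], "-", dataset.get("title"))
```

The same API surface covers publication (`package_create`, `resource_create`) and harvesting, which is what makes CKAN straightforward to compose with other systems.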

These platforms thrive on extensibility, allowing organisations to define their own metadata standards, workflows and interfaces without being constrained by vendor roadmaps.

Configurable, Not Custom: A Smarter Future

The future of data management isn’t about choosing open source over commercial systems; it’s about configurability and interoperability. As data strategies evolve to support AI-driven decision-making, real-time analytics, and citizen transparency, the winning architectures will be those that can adapt.

A data management ecosystem should:

  • Integrate open-source and commercial tools through open APIs.
  • Be configurable enough to evolve with new governance, privacy and AI requirements.
  • Avoid vendor lock-in by adhering to open standards and modular architectures (a brief sketch of standards-based exchange follows this list).
  • Support automation, AI integration, and ethical data use at scale.
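To make the open-standards point concrete, the hedged sketch below maps a CKAN dataset record onto a handful of DCAT-style fields (DCAT is the W3C vocabulary most open-data catalogues use for interchange). The CKAN field names follow its standard package schema; the flat output structure is a simplified assumption for illustration, not a full DCAT serialisation.

```python
def ckan_to_dcat_summary(pkg: dict) -> dict:
    """Map a CKAN package dict onto a simplified, DCAT-flavoured record.

    Sketch only: real DCAT output would be RDF (e.g. JSON-LD using dcat:
    and dct: terms). The point is that open metadata schemas turn
    cross-platform exchange into a mapping exercise, not a migration.
    """
    return {
        "dct:title": pkg.get("title"),
        "dct:description": pkg.get("notes"),
        "dct:publisher": (pkg.get("organization") or {}).get("title"),
        "dcat:keyword": [tag["name"] for tag in pkg.get("tags", [])],
        "dcat:distribution": [
            {"dcat:accessURL": res.get("url"), "dct:format": res.get("format")}
            for res in pkg.get("resources", [])
        ],
    }
```

Fed with the output of the earlier CKAN search sketch, the same record becomes consumable by any DCAT-aware catalogue, commercial or open source.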

In this context, open platforms like CKAN and Apache Atlas are not replacements for enterprise tools; they’re complements, providing the flexibility and transparency that proprietary systems often lack.

Looking Ahead: AI Will Stress-Test Every Platform

The rise of AI adds a new dimension. Machine learning models require traceable, high-quality, well-governed data, but they also generate new data types, metadata and audit trails. Managing these dynamically will expose the rigidity of closed systems.

Data platforms will need to support explainability, lineage tracking and real-time governance, ensuring that the data fuelling AI is trustworthy and compliant. Open, configurable frameworks are far better positioned to evolve toward this future than static, vendor-controlled environments.
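As an illustration of what lineage tracking looks like in practice, the sketch below registers a simple input-to-output lineage link in Apache Atlas by creating a Process entity whose inputs and outputs reference datasets already catalogued. It assumes an Atlas instance at localhost:21000 with default basic-auth credentials and datasets registered under the qualified names you pass in; production deployments typically use concrete Process subtypes (for example hive_process) rather than the base type.

```python
import requests

# Assumed local Atlas instance and default credentials; adjust for your deployment.
ATLAS_URL = "http://localhost:21000"
AUTH = ("admin", "admin")

def register_lineage(process_name: str, input_qn: str, output_qn: str) -> None:
    """Create an Atlas Process entity linking an input dataset to an output.

    Atlas derives its lineage graph from Process entities whose `inputs`
    and `outputs` reference DataSet-typed entities that already exist in
    the catalogue (referenced here by qualifiedName).
    """
    entity = {
        "entity": {
            "typeName": "Process",  # or a subtype defined in your deployment
            "attributes": {
                "qualifiedName": f"{process_name}@sandbox",
                "name": process_name,
                "inputs": [{
                    "typeName": "DataSet",
                    "uniqueAttributes": {"qualifiedName": input_qn},
                }],
                "outputs": [{
                    "typeName": "DataSet",
                    "uniqueAttributes": {"qualifiedName": output_qn},
                }],
            },
        }
    }
    resp = requests.post(f"{ATLAS_URL}/api/atlas/v2/entity",
                         json=entity, auth=AUTH, timeout=10)
    resp.raise_for_status()
```

Once the Process entity exists, Atlas can answer lineage queries for either dataset, which is exactly the traceability an AI audit trail needs.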

Our Perspective

At Ember, we’ve seen both sides of the data management equation. Proprietary platforms can offer stability and enterprise integration, but open frameworks deliver flexibility, transparency and long-term sustainability.

Our approach is to blend both worlds, combining open platforms such as CKAN and Apache Atlas with secure, enterprise-grade hosting and integration models, to help organisations build adaptive data ecosystems ready for the demands of AI, governance and innovation.

The message for digital leaders is simple:

In a world where data is never static, your platform shouldn’t be either.
