Harnessing the Power of Rucio for the DaFab Project: A Leap Towards Advanced Metadata Management
Introduction
In both scientific research and production environments, managing metadata efficiently is crucial. Metadata underpins data discovery, organization, and retrieval, and so determines how effectively data can be used. This matters particularly in Earth Observation (EO), where vast amounts of satellite data must be processed and analysed to monitor and understand our planet.
The DaFab project aims to enhance the exploitation of Copernicus data through AI and High-Performance Computing (HPC) technologies, improving the timeliness, accuracy, and accessibility of EO data. At the heart of this endeavour lies Rucio, a data management system developed at CERN. Rucio is central to several of the project's key objectives: creating a unified, searchable catalogue of interlinked EO metadata, speeding up metadata ingestion and retrieval, and integrating with AI-driven workflows and HPC systems.
Orchestration of Workflows in Converged Cloud and HPC Environments
Multi-Site Workflow Orchestration in the DaFab Project
DaFab will design and implement a workflow orchestration system that improves multi-site application deployment and data discovery. Applications will express their computations as a graph and declare the data needed at each step in a high-level query language. Workflows will then execute across multiple sites, whether cloud-based (Kubernetes) or high-performance computing (Slurm) environments. Because the platform provides transparent data access and runs each workflow stage on the appropriate site, the burden of application deployment and data discovery shifts from the developer to the platform itself, which should shorten development time considerably.
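To make the idea of a graph of steps with declared data needs more concrete, here is a minimal Python sketch. It is not the DaFab API, which this summary does not describe: the `Step` class, its field names, the site labels, and the query strings are all hypothetical, chosen only to illustrate how a workflow could be declared and ordered before a platform dispatches each stage to a site.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    """One workflow stage (hypothetical structure, for illustration only)."""
    name: str
    site: str          # assumed labels: "kubernetes" or "slurm"
    data_query: str    # placeholder for a high-level data query
    depends_on: list[str] = field(default_factory=list)


def execution_order(steps):
    """Return step names in an order that respects the dependency graph."""
    graph = {s.name: set(s.depends_on) for s in steps}
    return list(TopologicalSorter(graph).static_order())


def plan(steps):
    """Pair each step, in execution order, with its target site and query."""
    by_name = {s.name: s for s in steps}
    return [
        (name, by_name[name].site, by_name[name].data_query)
        for name in execution_order(steps)
    ]


# Illustrative three-stage workflow; the query syntax is invented.
workflow = [
    Step("ingest", "kubernetes", "collection = 'sentinel-2'"),
    Step("train", "slurm", "output_of = 'ingest'", depends_on=["ingest"]),
    Step("publish", "kubernetes", "output_of = 'train'", depends_on=["train"]),
]

for name, site, query in plan(workflow):
    print(f"{name}: run on {site}, needs data matching [{query}]")
```

In this sketch the application only states what each step needs and where it should run. Resolving the queries to actual datasets and submitting the stages to Kubernetes or Slurm would be the platform's job, which is the shift of responsibility the paragraph above describes.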