Harnessing the Power of Rucio for the DaFab Project: A Leap Towards Advanced Metadata Management
Introduction
In the realm of both scientific research and production environments, efficiently managing and utilizing metadata is crucial. Metadata serves as the backbone for data discovery, organization, and retrieval, enabling effective data usage across various fields. This is particularly important in areas like Earth Observation (EO), where vast amounts of satellite data need to be processed and analysed to monitor and understand our planet.
The DaFab project, an ambitious initiative, aims to enhance the exploitation of Copernicus data through advanced AI and High-Performance Computing (HPC) technologies. By integrating these technologies, DaFab seeks to improve the timeliness, accuracy, and accessibility of EO data. At the heart of this endeavour lies Rucio, a robust data management system developed at CERN. Rucio plays a pivotal role in achieving key project objectives: creating a unified, searchable catalogue of interlinked EO metadata, improving metadata ingestion and retrieval speeds, and facilitating seamless integration with AI-driven workflows and HPC systems.
Orchestration of Workflows in converged Cloud and HPC environments
Multi-Site Workflow Orchestration in the DaFab Project
DaFab will design and implement a workflow orchestration system that enhances multi-site application deployment and data discovery. The workflow system will enable applications to express their computations as a graph and declare the data needed at each step in a high-level query language. Workflows will then execute across multiple sites, whether cloud-based (Kubernetes) or high-performance computing (Slurm) environments. By enabling transparent data access and seamless execution of workflow stages, this system shifts the burden of application deployment and data discovery to the platform itself, significantly accelerating development timelines.
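To make the idea concrete, the description above can be sketched in a few lines of Python: each stage declares the data it needs (as a high-level query string) and a target site type, and the platform derives an execution order from the dependency graph. This is purely illustrative; the `Stage` class, site labels, and query syntax are hypothetical and not part of any DaFab or Rucio API.

```python
# Illustrative sketch only: a minimal workflow graph in which each stage
# declares the data it needs and a target site type (cloud or HPC).
# All names here (Stage, site labels, query strings) are hypothetical.
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


@dataclass(frozen=True)
class Stage:
    name: str
    site: str          # e.g. "kubernetes" (cloud) or "slurm" (HPC)
    data_query: str    # high-level declaration of the data this step needs
    depends_on: tuple = ()


def execution_order(stages):
    """Return stage names in an order that respects declared dependencies."""
    graph = {s.name: set(s.depends_on) for s in stages}
    return list(TopologicalSorter(graph).static_order())


stages = [
    Stage("ingest",  "kubernetes", "collection=Sentinel-2 AND date>=2024-01-01"),
    Stage("train",   "slurm",      "product_type=L2A", depends_on=("ingest",)),
    Stage("publish", "kubernetes", "model=latest",     depends_on=("train",)),
]

print(execution_order(stages))  # → ['ingest', 'train', 'publish']
```

In a real orchestrator each stage would be dispatched to its declared site (a Kubernetes job or a Slurm batch submission) once its dependencies complete, with the platform resolving each `data_query` against the metadata catalogue so the application never handles data placement itself.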
Job offer at CERN
CERN is offering a position in the Rucio development team. Rucio is an open-source scientific data management system responsible for managing the data of some of the biggest scientific data producers in the world. Experiments such as ATLAS, CMS, Belle II, DUNE, and many others rely on Rucio, which manages worldwide distributed data in the exabyte range.
More information at: https://jobs.smartrecruiters.com/CERN/743999972065373-software-engineer-in-distributed-systems-ep-adp-co-2024-34-grap
Job offer at JSI
JSI is seeking a talented and motivated Software Engineer to join the team and contribute to cutting-edge projects in the fields of particle physics and Earth observation data processing. The successful candidate will play a crucial role in developing and optimizing neural network architectures, such as Spiking Neural Networks (SNNs), for non-von Neumann computing platforms, as outlined in our recent project, while also contributing to other aspects of our research and development efforts.
DaFab Project Begins
The data economy is an ever-growing market. Thanks to the Copernicus constellation, Europe has amassed a considerable amount of Earth Observation data, already used by technology-intensive SMEs and large industry across several sectors of the economy. However, managing data at scale remains challenging, and lowering the entry cost to exploiting Copernicus data is a key element in fostering innovation and fuelling this data economy. #DaFab is a new project, launched with the support of the European Space Agency: https://www.euspa.europa.eu/dafab-ai-factory-copernicus-data-scale