Modern data-driven applications leverage large, heterogeneous data collections to find interesting patterns and build robust machine learning (ML) models for accurate predictions. Large data sizes and advanced analytics spurred the development and adoption of data-parallel computation frameworks like Apache Spark or Flink, as well as distributed ML systems like MLlib, TensorFlow, or PyTorch. A key observation is that these new systems share many techniques with traditional high-performance computing (HPC), and the architectures of the underlying hardware clusters converge. Yet, the programming paradigms, cluster resource management, as well as data formats and representations differ substantially across data management, HPC, and ML software stacks. There is a trend, though, toward complex data analysis pipelines that combine these different systems. Examples are workflows of distributed data pre-processing, tuned HPC libraries, and dedicated ML systems, but also HPC applications that leverage ML models for more cost-effective simulation.
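To make such a mixed pipeline concrete, the following is a minimal sketch that chains the three stacks named above: Spark for distributed pre-processing, a BLAS-backed numerical library (NumPy, standing in for a tuned HPC kernel), and PyTorch for model training. It assumes pyspark, numpy, and torch are installed; the file name events.csv, the columns x1, x2, and label, and the model shape are hypothetical placeholders, not part of the original text.

```python
# Sketch of an integrated data analysis pipeline across three software stacks.
import numpy as np
import torch
from pyspark.sql import SparkSession

# 1) Data-parallel pre-processing (data management stack): read, clean,
#    and project the raw data with Spark.
spark = SparkSession.builder.appName("ida-pipeline").getOrCreate()
df = (spark.read.csv("events.csv", header=True, inferSchema=True)
          .dropna()
          .select("x1", "x2", "label"))
rows = np.array(df.collect(), dtype=np.float32)  # hand-off to driver memory
spark.stop()

# 2) Tuned numerical kernel (HPC stack): standardize features with
#    BLAS-backed NumPy operations.
X, y = rows[:, :2], rows[:, 2:]
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# 3) Model training (ML stack): fit a linear model with PyTorch.
model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(Xt), yt)
    loss.backward()
    opt.step()
print(f"final training loss: {loss.item():.4f}")
```

Note that each stage boundary forces an explicit data-representation hand-off (Spark rows to NumPy arrays to Torch tensors, here funneled through the driver), which illustrates exactly the friction between data formats and representations that the text describes.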