====== Data Lakes ======

**December 17th 2021, 14:00 - 17:00**

In recent years, traditional HPC users have shown growing interest in using the public cloud as part of their HPC workflows. There are many reasons for this, e.g. specialized hardware such as TPUs or new GPUs is available in the cloud earlier than in a local data center. In addition, users need to store data for AI-based analysis in different data silos and to access it flexibly from both HPC and cloud systems.

{{ :en:services:application_services:high_performance_computing:workshops:pexels-james-wheeler-414612_1_.jpg?400|}}

Flexible data migration and provisioning in the data lake play a central role in data analytics workflows. For this purpose, highly scalable object storage, mostly accessed via an S3 interface, has long been established in the cloud. From the user's point of view, another advantage of a consistent data management strategy, as offered by a data lake, is the uniform view it provides across the individual data silos.

Data centers can share their plans and services regarding high-performance data analytics (HPDA), big data analytics (BDA), and scientific data management. This workshop aims to foster interaction between data centers, administrators, and, most importantly, researchers. Our motto is:

**Let's build a bridge to the data lake together**

The videos of the event are now online on YouTube.

===== Agenda =====

| 14:00 | **Welcome and Motivation** | Julian Kunkel (GWDG, Uni Göttingen) \\ Piotr Kasprzak (GWDG) | {{ :en:services:application_services:high_performance_computing:workshops:data_lakes-2021-12-16-introduction.pdf |Slides}} | [[https://www.youtube.com/watch?v=XJ2_xxehdHQ&list=PLvcoSsXFNRbnK2ac8ALWVieieipztsDEI&index=16|Video]] |
| | **Short introductory round of all attendees** | | | |
| 14:15 | **Use Case Presentation** | Mark Greiner (MPI CEC) | {{:en:services:application_services:high_performance_computing:workshops:datalakes-2021-12-16_datamanagementplatform_for_gwdg_vaid_greiner.pdf |Slides}} | [[https://www.youtube.com/watch?v=N4NXZ3JFazY&list=PLvcoSsXFNRbnK2ac8ALWVieieipztsDEI&index=17|Video]] |
| 14:45 | **Data Volume Considerations for NHR and NFDI** | Andreas Knüpfer (ZIH, TU Dresden) | {{:en:services:application_services:high_performance_computing:workshops:datalake_workshop_2021-12_knuepfer_public.pdf |Slides}} | [[https://www.youtube.com/watch?v=JW-FP86t-b8&list=PLvcoSsXFNRbnK2ac8ALWVieieipztsDEI&index=18|Video]] |
| 15:00 | **GWDG data lake services and future plans** | Julian Kunkel (GWDG, Uni Göttingen)\\ Piotr Kasprzak (GWDG)\\ Hendrik Nolte (GWDG) | | [[https://www.youtube.com/watch?v=E4Mra8mMuk0&list=PLvcoSsXFNRbnK2ac8ALWVieieipztsDEI&index=19|Video]] |
| 15:30 | //Break and Networking// | | | |
| 16:00 | **Data Lake not at any price - The DataLake concept must fit the requirements** | Alfred Schlaucher (Oracle) | {{:en:services:application_services:high_performance_computing:workshops:datalakes-2021-schlaucher.pdf |Slides}} | [[https://www.youtube.com/watch?v=FqVWdBibSho&list=PLvcoSsXFNRbnK2ac8ALWVieieipztsDEI&index=20|Video]] |
| 16:30 | Discussion | | | |
| 17:00 | **Adjourn** | | | |

===== Important Information =====

| Date and Time | Friday, December 17th 2021, 14:00 - 17:00 |
| Venue | Virtual |
| Organizers | Julian Kunkel (Uni Göttingen/GWDG), julian.kunkel@gwdg.de |
| | Hendrik Nolte (GWDG), hendrik.nolte@gwdg.de |
| | Alexander Goldmann (GWDG), alexander.goldmann@gwdg.de |

==== Registration ====

You can register for this workshop [[https://terminplaner4.dfn.de/928kSKZZj84AmL8h|here]]. If you would like to give a talk, please contact [[mailto:julian.kunkel@gwdg.de|Julian Kunkel]].

==== Funding ====

This workshop is funded by the GWDG and supported by the [[https://www.nhr-gs.de/|NHR]].

{{:en:services:application_services:nhr-goettingen-single.png?nolink&400|}}