en:services:application_services:high_performance_computing:data_sharing [2021/10/05 14:44] – [Data sharing] akhuziy
en:services:application_services:high_performance_computing:data_sharing [2021/10/05 15:04] (current) – added S3 akhuziy
 +====== Data sharing ======
  
 +This documentation describes how multiple users can share data within SCC. 
 +
 +We currently offer four options, described below. 
 +
 +===== Using a hidden directory =====
 +
 +In this case the data can be read by all users, or by the members of a specific POSIX group, **only if they know the path to it**. Send the path to the users you want to give access to. For added safety, it is better to share data via the ''/scratch'' filesystem. 
 +
 +First you need to create a directory with a random name:
 +<code bash>
 +SHAREDIR=$(mktemp -p /scratch/users/$USER -d share.XXXXXXXX)
 +</code>
 +This will create a directory with a random name in ''/scratch/users/$USER'' and save the path in the variable ''SHAREDIR''.
 +Now you can copy or move the files you want to share into that directory: 
 +<code bash>
 +cp -r /PATH/TO/MY/FILES "$SHAREDIR"/ # if you want to copy (-r also copies directories)
 +mv /PATH/TO/MY/FILES "$SHAREDIR"/ # if you want to move
 +</code>
 +Now you need to set the permissions on the directories: 
 +<code bash>
 +chmod go=x /scratch/users/$USER # group and others may traverse the parent directory but not list its content
 +chmod -R go+rX "$SHAREDIR" # makes the files readable and the directories enterable for the group and other users
 +</code>
 +After that, send the path to the users you want to share the data with. To print the path of the shared directory, run:
 +<code bash>
 +echo $SHAREDIR
 +</code>
 +The users who know the path can ''cd'' into it and copy the files contained in the directory. 
 +
 +After sharing is done, don't forget to remove the execute permission from the parent directory to restrict access to the files again. 
 +<code bash>
 +chmod go= /scratch/users/$USER # removes all permissions of the group and other users on the parent directory
 +</code>
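 +
 +The whole procedure can be rehearsed on a throwaway directory before touching real data. The following is only a sketch: ''BASE'' is a hypothetical stand-in for ''/scratch/users/$USER'', ''data.txt'' is a made-up example file, and ''stat -c'' assumes GNU coreutils, as found on typical Linux systems.
 +<code bash>
 +BASE=$(mktemp -d) # stand-in for /scratch/users/$USER (assumption)
 +SHAREDIR=$(mktemp -p "$BASE" -d share.XXXXXXXX)
 +echo "example" > "$SHAREDIR/data.txt" # made-up file to share
 +chmod go=x "$BASE" # open the parent for traversal only
 +chmod -R go+rX "$SHAREDIR" # make the share readable
 +stat -c '%a %n' "$BASE" "$SHAREDIR" # modes while sharing: 711 and 755
 +chmod go= "$BASE" # close the parent again
 +stat -c '%a %n' "$BASE" # mode after sharing: 700
 +</code>
 +The modes in the comments follow from ''mktemp -d'' creating directories with mode 700; directories created in another way may start from different permissions.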
 +
 +===== Project directory in scratch filesystem =====
 +
 +In this case we create a shared project directory in the scratch filesystem (''/scratch/projects/PROJECTNAME'') and give full access to it to the users of a POSIX group of your choice. This option is good for collaborations that involve a huge amount of data. Be aware that **the scratch filesystem doesn't have a backup**. If you want a backup, consider the [[en:services:application_services:high_performance_computing:data_sharing#functional_account|next option]]. 
 +
 +==== Applying for a group ====
 +In order to share the directory with a group of users, you need to apply for a POSIX group by [[en:services:application_services:high_performance_computing:getting_help|contacting us]]. Please give us a unique group name and the usernames of the people you want to be in the group. 
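 +
 +Once the group has been created, you can check whether your account already belongs to it. This is a minimal sketch; ''YOURGROUP'' is a placeholder for the group name you applied for. Keep in mind that a new group membership usually only becomes visible in sessions started after the change.
 +<code bash>
 +GROUP=YOURGROUP # placeholder, replace with your group name
 +# id -Gn prints the names of all groups of the current user
 +if id -Gn | tr ' ' '\n' | grep -qx "$GROUP"; then
 +    echo "member of $GROUP"
 +else
 +    echo "not (yet) a member of $GROUP"
 +fi
 +</code>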
 +
 +===== Functional account =====
 +
 +In this case you will have the ''HOME'' directory of the functional account as a shared space for collaboration, like in the previous option. It comes with backup and archiving options. However, the I/O performance of large computations will suffer from the slow ''HOME'' filesystem. If you need to process a large amount of data, please consider the [[en:services:application_services:high_performance_computing:data_sharing#project_directory_in_scratch_filesystem|previous option]], or combine both options by using a functional account for storing data and the scratch filesystem for processing.
 +
 +First you need to apply for the [[https://lotus1.gwdg.de/gwdgdb/benutzer_input.nsf/Funktionsaccount?OpenForm|functional account]], which will take some days, since it must be approved by the head of your institute. Then you need to apply for the POSIX group as described in the [[en:services:application_services:high_performance_computing:data_sharing#applying_for_a_group|previous section]]. When you have access to your functional account, [[en:services:application_services:high_performance_computing:getting_help|contact us]], so we can add your functional account to the POSIX group. 
 +After the POSIX group is ready, change the permissions of the ''HOME'' directory on the login node (login.gwdg.de): 
 +<code bash>
 +ssh functionalusername@login.gwdg.de 
 +chmod g+rwxs . # everyone in the group will have all access rights to the directory
 +chgrp YOURGROUP . # change the group of the HOME directory
 +</code> 
 +If you don't want members of the group to be able to delete or rename files that don't belong to them, you can set the sticky bit on the directory with
 +<code bash>
 +chmod +t . # the sticky bit is set with +t (or o+t); g+t has no effect with GNU chmod
 +</code>
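 +
 +To see what these mode changes do, they can be tried on a throwaway directory first. This is a sketch under assumptions: ''DEMO'' is a hypothetical temporary directory, not the ''HOME'' directory of the functional account, and ''stat -c'' assumes GNU coreutils.
 +<code bash>
 +DEMO=$(mktemp -d) # throwaway directory (assumption)
 +chmod g+rwxs "$DEMO" # full group access plus setgid
 +stat -c '%A' "$DEMO" # an "s" in the group triplet marks setgid
 +chmod +t "$DEMO" # sticky bit
 +stat -c '%A' "$DEMO" # a trailing "t" or "T" marks the sticky bit
 +</code>
 +With setgid on a directory, newly created files inherit the group of the directory, which keeps the shared data accessible to all members of the group.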
 +
 +===== Using S3 =====
 +
 +In this case you first need to get an S3-Bucket from us. To get one, simply write an email to <support@gwdg.de> and ask for an S3-Bucket that is accessible from the HPC system. You can then share your ''access key'' and ''secret key'' within your group to give everyone access. In this scenario the data is accessed via HTTP, and it is reachable not only from the HPC system but also from the cloud and the Internet (if needed).
 +
 +You can access your S3-Bucket from a compute node using ''http://172.19.1.26:8090'' as an endpoint. 
 +
 +In order to work with your S3-Bucket, you could for instance use ''rclone'':
 +
 +<code bash>
 +module load rclone
 +# List content of your Bucket
 +rclone ls <config-name>:<bucket-name>/<prefix>
 +# Or sync the content of your $HOME with the Bucket
 +rclone sync -i $HOME/some/folder <config-name>:<bucket-name>/<prefix> 
 +# Or sync the content of your Bucket with your $HOME
 +rclone sync -i <config-name>:<bucket-name>/<prefix> $HOME/some/folder
 +</code>
 +
 +This requires a config file in ''/usr/users/$USER/.config/rclone/rclone.conf'' with the following content:
 +<code ini>
 +[<config-name>]
 +type = s3
 +provider = Ceph
 +env_auth = false
 +access_key_id = <AccessKey>
 +secret_access_key = <SecretKey>
 +region =
 +endpoint = http://172.19.1.26:8090
 +location_constraint =
 +acl =
 +server_side_encryption =
 +storage_class =
 +</code>
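 +
 +Since this file contains the keys in plain text, it is advisable to make it readable only by yourself. A minimal sketch, assuming that ''$HOME'' points to ''/usr/users/$USER'':
 +<code bash>
 +CONF="$HOME/.config/rclone/rclone.conf"
 +mkdir -p "$(dirname "$CONF")" # create the directory if it does not exist yet
 +touch "$CONF" # create an empty file if there is none; existing content is kept
 +chmod 600 "$CONF" # read and write access for the owner only
 +</code>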