Currently, Rosalind is not a suitable environment for highly sensitive data. This includes any NHS patient-identifiable data or de-identified patient data.
The Rosalind Information Governance and Data Protection policies and procedures are being developed in conjunction with KCL IT. The aim is to ensure best practice and to determine the protocols and technical solutions needed to enable analysis of sensitive clinical data on the system.
The Rosalind system comprises a bare-metal HPC cluster, an OpenStack Cloud and a Ceph data archive. It is run as a TrAC Research Facility to provide sustainably funded research IT facilities for users across KCL and their collaborators. The system is supported by both central IT services and subject-specialist administrators. It is hosted, under a KCL IT contract, in the JISC Shared Data Centre (https://www.jisc.ac.uk/shared-data-centre). The system is firewalled from the rest of the KCL network and provides an OpenVPN service that allows registered users to access the environment securely.
Users must comply with KCL IT policy, including the following:
Data Governance: http://www.kcl.ac.uk/governancezone/Assets/InformationPolicies/Data%20Governance%20Policy%20v1.pdf
Information Security: http://www.kcl.ac.uk/governancezone/Assets/InformationPolicies/Information%20Security%20Policy.pdf
IT Acceptable Use: http://www.kcl.ac.uk/governancezone/Assets/InformationPolicies/IT%20Acceptable%20Use%20Policy.pdf
Data access on the HPC system is controlled using Linux user and group permissions, and changes to access must be approved by the PI who owns the data. Similarly, access to data in the Ceph archive is controlled by user permissions, with PI approval. The cloud provides a flexible environment in which data can either be made publicly available or be completely isolated in its own virtual network, accessible only by approved users.
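As an illustration of the Linux user and group permissions mentioned above, the following minimal Python sketch inspects and tightens the access bits on a project directory. It is not taken from the Rosalind configuration: the function names describe_access, restrict_to_group and demo are hypothetical, and the choice of mode 750 (owner full access, group read-only, no access for others) is an assumption made for the example. The actual layout and modes on the system may differ, and any change to access still requires PI approval.

```python
import os
import stat
import tempfile


def describe_access(path):
    """Summarise the group and 'other' permission bits for a path."""
    mode = os.stat(path).st_mode
    return {
        "mode": stat.filemode(mode),                      # e.g. 'drwxr-x---'
        "group_can_read": bool(mode & stat.S_IRGRP),
        "group_can_write": bool(mode & stat.S_IWGRP),
        "others_have_access": bool(mode & stat.S_IRWXO),  # any r/w/x bit for others
    }


def restrict_to_group(path):
    """Assumed policy for this sketch: owner rwx, group r-x, others nothing (750)."""
    os.chmod(path, stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP)


def demo():
    """Apply the assumed policy to a throwaway directory and report the result."""
    with tempfile.TemporaryDirectory() as project_dir:  # stand-in for a project directory
        restrict_to_group(project_dir)
        return describe_access(project_dir)


if __name__ == "__main__":
    print(demo())
```

Which users fall under the group bits is determined by membership of the Linux group that owns the directory, so granting or revoking a user's access is a change to group membership rather than to the files themselves.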
The KCL Research Data Management policy applies: http://www.kcl.ac.uk/governancezone/Assets/Research/Research%20Data%20Management%20Policy.pdf
In general, research data will be made publicly available upon publication, unless data governance restrictions apply. Data should be published to a suitable public data repository, where one exists. If no suitable repository exists, KCL IT can provide hosting of datasets (for large datasets, this may be on Rosalind’s Ceph storage) and will curate data in a Research Data Management System which is currently under development (http://www.kcl.ac.uk/library/researchsupport/research-data-management/RDMProject.aspx), with an initial release due in April.