- Mentor: Rudolph Pienaar (email: rudolph.pienaar-at-childrens.harvard.edu)
- Min-max team size: 4-7
- Expected project hours per week (per team member): 6-8
- Will the project be open source? Yes
- Job handling / concepts of distributed computing (Very important)
- Linux job basics (ssh to remote hosts, running/handling jobs) (Rather important)
- Python (Valuable)
- Docker / HPC scheduling (Nice to have, but will learn)
In many respects, practical medical data computation is still in its infancy. Here, "practical" means the rapid and useful post-processing of information typically collected on imaging equipment (such as MRI, CT, X-ray, and ultrasound) to supplement or inform clinical decision-making.
There are many reasons for this, not least of which are regulatory concerns governing how and where medical data can move, the generally conservative nature of clinical care (from the perspective of information technology), and the non-integrative paradigm of device manufacturers (again, from a performance-computing perspective).
These concerns aside, however, architectural concerns and end-user experience issues also play a defining role.
This project seeks to architect practical solutions for medical data processing. Our group at Boston Children's Hospital has developed a web-based workflow manager called ChRIS that allows for the collection, processing, and real-time collaboration on image data.
In Spring 2015, a BU team designed and built a Python-based scheduler for ChRIS on the Massachusetts Open Cloud (MOC). This year, we seek to build on that foundation. In the current ChRIS, "plugins" that process medical data must be fully installed on the cluster on which they run, which imposes several practical limitations on plugin authors.
Specifically, we seek to address the very practical problem of using Docker containers as self-contained mechanisms that isolate the compute dependencies of plugins from the underlying cluster on which they execute.
The objective, then, is to design a Docker wrapper for post-processing pipelines with unknown dependencies. Specific problems to address include how the parent system (ChRIS) will communicate with these containers, how data will be transferred to and from them, and, importantly, how the actual processing will be scheduled on the MOC.
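As a minimal sketch of the isolation idea, assuming a plugin image that reads from one directory and writes to another, the wrapper below only builds a `docker run` argument list. The function name, the image name, and the `/incoming` and `/outgoing` mount points are hypothetical conventions chosen for illustration, not part of ChRIS.

```python
import shlex
from pathlib import Path
from typing import List


def build_docker_command(image: str, input_dir: str, output_dir: str,
                         plugin_args: List[str]) -> List[str]:
    """Return a `docker run` argument list for one containerized plugin run.

    All of the plugin's software dependencies live inside `image`; the host
    only supplies the data directories, which are bind-mounted at fixed
    (hypothetical) paths inside the container.
    """
    in_path = Path(input_dir).resolve()
    out_path = Path(output_dir).resolve()
    return [
        "docker", "run",
        "--rm",                            # delete the container when it exits
        "-v", f"{in_path}:/incoming:ro",   # host input dir, mounted read-only
        "-v", f"{out_path}:/outgoing",     # host output dir for results
        image,
        *plugin_args,
        "/incoming", "/outgoing",          # hypothetical plugin argument convention
    ]


if __name__ == "__main__":
    cmd = build_docker_command("example/plugin:latest", "./in", "./out", ["--verbose"])
    # Print as a single shell command for inspection; passing `cmd` to
    # subprocess.run would actually launch the container.
    print(shlex.join(cmd))
```

Because the data stays on the host and enters the container only through bind mounts, the same image could in principle run unchanged on any node that has Docker and access to those directories.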
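The scheduling side can be illustrated with a toy first-in, first-out queue that tracks each job's state. This is a hedged sketch, not the design of the existing ChRIS scheduler: `FifoScheduler`, `Job`, and `JobState` are hypothetical names, and jobs run locally through `subprocess` rather than on MOC nodes.

```python
import subprocess
import sys
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Dict, List, Optional


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    command: List[str]
    state: JobState = JobState.QUEUED
    returncode: Optional[int] = None
    stdout: str = ""


class FifoScheduler:
    """Toy FIFO scheduler: jobs run one at a time, in submission order.

    Here each job runs as a local subprocess; a real deployment would hand
    the command to a cluster backend instead (hypothetical).
    """

    def __init__(self) -> None:
        self._queue: Deque[Job] = deque()
        self.jobs: Dict[str, Job] = {}

    def submit(self, job_id: str, command: List[str]) -> Job:
        # Record the job and place it at the back of the queue.
        job = Job(job_id, command)
        self._queue.append(job)
        self.jobs[job_id] = job
        return job

    def run_next(self) -> Optional[Job]:
        # Pop the oldest queued job, run it to completion, record the outcome.
        if not self._queue:
            return None
        job = self._queue.popleft()
        job.state = JobState.RUNNING
        result = subprocess.run(job.command, capture_output=True, text=True)
        job.returncode = result.returncode
        job.stdout = result.stdout
        job.state = JobState.FINISHED if result.returncode == 0 else JobState.FAILED
        return job

    def run_all(self) -> None:
        while self.run_next() is not None:
            pass

    def status(self, job_id: str) -> JobState:
        return self.jobs[job_id].state


if __name__ == "__main__":
    sched = FifoScheduler()
    sched.submit("ok", [sys.executable, "-c", "print('done')"])
    sched.submit("bad", [sys.executable, "-c", "import sys; sys.exit(3)"])
    sched.run_all()
    for job in sched.jobs.values():
        print(job.job_id, job.state.value, job.returncode)
```

Replacing the `subprocess.run` call with a cluster-specific submission step would leave the state tracking unchanged; per-job state of this kind is what a parent system could poll to follow progress.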
- Linux process management and distributed system design
- Docker usage and architecture in a real-world system
- HPC, particularly as it pertains to self-contained compute.