Project logistics
- Mentors: Leonid Andreev, Gustavo Durand, Pete Meyer, and Ata Turk; email: leonid-at-harvard-dot-edu, gdurand-at-iq-dot-harvard-dot-edu, meyer-at-hms-dot-harvard-dot-edu
- Min-max team size: 3-5
- Expected project hours per week (per team member): 6-8
- Will the project be open source? Yes
Preferred past experience
- Java (Rather important)
- Web development (Rather important)
- Python (Rather important)
- Globus (Good to have)
Project Overview:
Dataverse is open source research data repository software that gives dataset owners incentives to share their datasets and receive credit through data citation.
The Cloud Dataverse project aims to extend Dataverse so that (i) datasets are stored in cloud object storage systems such as OpenStack Swift and (ii) stored datasets are made available to big data clusters such as Hadoop or Spark that are spun up in the cloud (e.g. via OpenStack Sahara).
The overall goal is to build a system that functions similarly to the combination of Amazon public datasets and Amazon Web Services Elastic MapReduce.
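To make the two access paths concrete, here is a minimal sketch of how a single stored file might be addressed from each side. It assumes the standard Swift v1 path layout (/v1/account/container/object) and the swift://container.provider/path URI scheme used by the Hadoop Swift filesystem connector. The endpoint, account, container, provider, and object names are hypothetical placeholders, and the way Cloud Dataverse actually maps datasets to containers is not specified here.

```python
from urllib.parse import quote


def swift_object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Build the HTTP URL of an object using the Swift v1 path layout."""
    # quote() leaves "/" unescaped, so pseudo-folders in object names survive.
    parts = [endpoint.rstrip("/"), "v1", quote(account), quote(container), quote(obj)]
    return "/".join(parts)


def hadoop_swift_uri(container: str, provider: str, obj: str) -> str:
    """Build the swift:// URI a Hadoop or Spark job could use as an input path."""
    # "provider" is the service name configured on the cluster (an assumption here).
    return f"swift://{container}.{provider}/{quote(obj)}"


if __name__ == "__main__":
    # All values below are hypothetical and only illustrate the naming.
    print(swift_object_url("https://swift.example.org", "AUTH_demo",
                           "dataverse-datasets", "doi-10.5072/data.csv"))
    print(hadoop_swift_uri("dataverse-datasets", "sahara", "doi-10.5072/data.csv"))
```

The first form is what a repository would use to store or download the file over HTTP (with an authentication token in the request headers); the second is the kind of path a cluster job would be given as input.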
Some technologies you will learn/use:
- Big data technologies: OpenStack Sahara
- Dataverse: Data sharing, data publishing, and data archiving technologies
- Cloud-based object storage technologies: OpenStack Swift
- Java EE