Cloud Dataverse @ MOC

A public dataset repository with compute capabilities

Project logistics

Preferred past experience

Project Overview:

Dataverse is open-source research data repository software that gives dataset owners an incentive to share their datasets and to receive credit for them through data citation.

The Cloud Dataverse project aims to extend Dataverse so that (i) datasets are stored in cloud object storage systems such as OpenStack Swift, and (ii) stored datasets are made available to big data clusters (e.g., Hadoop or Spark) spun up in the cloud (e.g., via OpenStack Sahara).
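As a rough illustration of the two halves of this design, the sketch below shows how a single dataset file kept in Swift could be addressed both by the repository, through Swift's REST object URL (`<storage URL>/<container>/<object>`), and by a Hadoop or Spark job in a Sahara-provisioned cluster, through the `swift://<container>.<service>/<object>` scheme used by Hadoop's hadoop-openstack filesystem. The one-container-per-dataset layout, the function names, the `sahara` service name, and the example URLs are all assumptions made for illustration, not part of Cloud Dataverse itself.

```python
from urllib.parse import quote


def swift_object_url(storage_url: str, container: str, obj: str) -> str:
    """Build a Swift REST object URL: <storage URL>/<container>/<object>.

    storage_url is the account endpoint (e.g. .../v1/AUTH_<project>) that
    Keystone returns after authentication. The container name is fully
    percent-encoded; slashes in the object name are kept as path separators.
    """
    return f"{storage_url.rstrip('/')}/{quote(container, safe='')}/{quote(obj)}"


def hadoop_swift_path(container: str, obj: str, service: str = "sahara") -> str:
    """Build a path for Hadoop's Swift filesystem: swift://<container>.<service>/<object>.

    'service' names a provider configured under fs.swift.service.<service>.*;
    'sahara' is assumed here because Sahara-launched clusters commonly use it.
    """
    return f"swift://{container}.{service}/{obj}"


# Hypothetical layout: one Swift container per Dataverse dataset.
print(swift_object_url("https://swift.example.org/v1/AUTH_abc", "dataset-123", "data/table 1.csv"))
print(hadoop_swift_path("dataset-123", "data/table.csv"))
```

Assuming the cluster has hadoop-openstack configured with Swift credentials, a Spark job could then read the same object with something like `spark.read.csv("swift://dataset-123.sahara/data/table.csv")`, so the data never has to be copied out of the repository's storage before analysis.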

The overall goal is to build a system that functions like the combination of Amazon Public Datasets and Amazon Web Services Elastic MapReduce (EMR).

Some technologies you will learn and use: