Project logistics
- Mentor: Patrick Cable (email: cable-at-ll.mit.edu)
- Min-max team size: 2-4
- Expected project hours per week (per team member): 6-8
- Will the project be open source? Yes
Preferred past experience
- Linux (Very important)
- Specifically, working with the Linux kernel and modules (Rather important)
Project Overview
Data provenance is a record of the history of data traversing or being used by a system or network of systems. Such history information can be used to assure correctness and security of data, and also to help understand and protect the system's operations and information.
Until recently, the Linux Provenance Module (LPM) supported only kernel 2.6.32, which lacks the primitives needed to run containers such as Docker. We've worked on porting LPM to a more recent kernel (4.2), but we haven't yet investigated what provenance looks like when containers are involved.
Our open questions include:
- How do namespaces affect how provenance is collected?
- What should be added to the kernel module to provide more insight into what is happening on the system?
- What would an architecture for storing Docker image provenance data look like, so that a user could understand the provenance of the data stored in a particular image?
- Could it be tied in with a project such as imagelayers to provide a visual display of provenance?
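As a starting point for the namespace question, it may help to see how namespace membership is exposed to user space. On Linux, each entry under /proc/<pid>/ns is a symlink whose target reads "type:[inode]", and two processes share a namespace when those inode numbers match. The sketch below is only an illustration in Python of what a provenance record might be tagged with; it is not part of LPM, and the function names are hypothetical.

```python
import os
import re

# Targets of /proc/<pid>/ns/* symlinks look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace symlink target into (kind, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def read_namespaces(pid="self"):
    """Return {entry_name: inode} for a process's namespaces (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    for name in os.listdir(ns_dir):
        try:
            _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue  # skip entries we cannot read or do not recognise
        namespaces[name] = inode
    return namespaces


def shared_namespaces(a, b):
    """Names of the namespaces that two {entry_name: inode} maps have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    # Requires Linux with /proc mounted.
    for name, inode in sorted(read_namespaces().items()):
        print(f"{name}: {inode}")
```

Comparing the output for a process on the host with one inside a Docker container shows which namespaces differ, which is the kind of context a provenance record would need to capture in order to tell the two apart.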
Some technologies you will learn/use:
- Linux Provenance Module (paper)
- Docker
- Notary
- imagelayers API and service