Data Provenance and Containers

How do containers interact with current provenance collection mechanisms?

Project logistics

Preferred past experience

Project Overview

Data provenance is a record of the history of data as it traverses, or is used by, a system or network of systems. This history can be used to assure the correctness and security of data, and to help understand and protect the system's operations and information.

Until recently, Linux Provenance Modules (LPM) only supported kernel 2.6.32, which didn't have the primitives necessary to run containers such as Docker. We've worked on porting LPM to a recent kernel (4.2), but we haven't yet investigated what provenance looks like when containers are involved.
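
To make the container question concrete, here is a minimal user-space sketch of one piece of context a provenance record might need: which kernel namespaces a process lives in. It is not LPM's API and says nothing about how LPM collects provenance inside the kernel. It only assumes a Linux system that exposes namespace links under /proc/<pid>/ns, and the helper names (parse_ns_link, namespace_ids, same_namespaces) are hypothetical, chosen for illustration.

```python
import os
import re

# Namespace kinds that container runtimes such as Docker commonly isolate.
# (Illustrative subset; which kinds exist depends on the kernel version.)
NAMESPACE_KINDS = ("pid", "mnt", "net", "uts", "ipc")

# A namespace link target looks like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_ids(pid="self"):
    """Return {kind: inode} for a process, skipping links we cannot read."""
    ids = {}
    for kind in NAMESPACE_KINDS:
        try:
            target = os.readlink(f"/proc/{pid}/ns/{kind}")
        except OSError:
            # Not Linux, no such process, or insufficient permissions.
            continue
        name, inode = parse_ns_link(target)
        ids[name] = inode
    return ids


def same_namespaces(a, b):
    """True if two non-empty namespace maps agree on every shared kind."""
    shared = set(a) & set(b)
    return bool(shared) and all(a[k] == b[k] for k in shared)


if __name__ == "__main__":
    # Print the namespace identifiers of the current process.
    print(namespace_ids())
```

Two processes in the same container would normally report the same identifiers, while a process on the host would differ in at least some of them. Whether and how such identifiers should appear in provenance records is part of what this project sets out to explore.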

Our open questions include:

Some technologies you will learn/use: