Big Data Containers (Red Hat) - enable OpenShift/Docker access to Cloud Dataverse (scientific data repository)

Agentless introspection of bare-metal systems (IBM) – security scanning of network-booted servers

Dataverse scaling (Harvard/Red Hat) – apply cloud scaling techniques (containers, etc.) to Dataverse

Radiology in the multi-cloud (Red Hat/Children's Hosp) – cross-datacenter connector for existing X-ray analysis system

Serverless supercomputing (Red Hat) - use on-demand functions for high performance computing

Sharing Research... (NU/startup) - creating containers for data science

Predictive Analytics... (HP Enterprise) - use Spark, etc. to analyze HP customer data

SecDevOps in the Cloud (startup) - automated security for cloud deployment systems

OpenStack in Kubernetes (OpenStack/Huawei) - creating links between OpenStack and Kubernetes

Cloud networking and Continuous Integration (Cisco) - extending CI/CD tools to include network HW

Serverless Computing on the Edge (Cisco) - combining serverless (lambda) with edge computing

Content Distribution / Web Application Firewall (Akamai) - creating a prototype firewalled CDN

Access Orchestration (BU/OpenStack) - create a new OpenStack identity federation service

Drone Mission Analysis (startup) - cloud extraction of data from civilian drone images

Tracing - Investigating cross-layer problems (BU/Red Hat) - extend Jaeger and OSprofiler trace frameworks

End-to-end tracing in Ceph (Red Hat/BU/NU) - add always-on tracing to Ceph storage system

Trace "Granularity Mapping" (Red Hat/BU) - add variable-overhead tracing to Jaeger or OSprofiler

Tracing in Kubernetes (Red Hat/BU) - add instrumentation to Kubernetes container framework

Aria Tosca Parser, Orchestrator (Apache Found.) - develop a compiler for cloud orchestration

Security Scan for OpenStack (Trilio) - add security scanning to OpenStack

Hardware Auditing Service for HIL (BU) - add security scanning to a bare-metal cloud

Secure Cloud Automated Deployment (MIT Lincoln Labs) - automated deployment for secure cloud

Secure Cloud Network and Disk Encryption (MIT Lincoln Labs) - adding encryption to secure cloud

International Cloud Transactions (startup) - implementing secure, regulatory compliant transfer of medical data



Big Data Containers

Project Logistics:

Preferred Past Experience:

Project Overview:

Background: The MOC has enabled researchers to use its “data lake” (i.e., its Ceph object store) in conjunction with jobs launched on OpenStack. However, many analytics workloads have been moving to container platforms such as OpenShift (e.g., radanalytics.io) to take advantage of their performance and multi-cloud benefits.

In OpenShift 3.7, a service broker concept has been added to enable OpenShift to provision and connect to internal or external services. These service brokers are based on an open standard (Open Service Broker API) and are shared between Kubernetes and Cloud Foundry.

Project Specifics: The goal of this project is to build a service broker for the Open Dataverse API on the MOC to enable analytics jobs on OpenShift to consume data from Dataverse. This will enable researchers to use the power of OpenShift (from the MOC or any other cloud) as a more optimized compute option than what is currently available.
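As a rough illustration of what a minimal service broker looks like, the sketch below (using Flask, with all service names and IDs invented for illustration) implements only the catalog endpoint that OpenShift's service catalog queries first; a real Dataverse broker would also implement the provision and bind endpoints defined by the Open Service Broker API.

    # Minimal Open Service Broker sketch (illustrative only -- names/IDs are
    # placeholders, and a real broker must also implement provision/bind/unbind).
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/v2/catalog", methods=["GET"])
    def catalog():
        # The catalog tells OpenShift/Kubernetes what can be provisioned.
        return jsonify({
            "services": [{
                "id": "dataverse-demo-service-id",        # placeholder UUID
                "name": "dataverse",
                "description": "Access to datasets stored in Dataverse on the MOC",
                "bindable": True,
                "plans": [{
                    "id": "dataverse-demo-plan-id",        # placeholder UUID
                    "name": "default",
                    "description": "Read-only access to a Dataverse collection"
                }]
            }]
        })

    if __name__ == "__main__":
        app.run(port=8080)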

See also

Some Technologies you will learn/use:


Agentless, out-of-box inspection of bare metal systems using BMI (Bare Metal Imaging)

Project Logistics:

Preferred Past Experience:

Project Overview:

Background: Agentless, out-of-box inspection of cloud resources such as VMs and containers allows us to implement various security functions seamlessly. In this project we are exploring an opportunity to extend these design principles to bare-metal systems.

Project Specifics: In this project, we will leverage BMI to build an agentless introspection technology for bare-metal systems. This will enable us to run black-box software against bare-metal system state, without any overhead or side effects. With the BMI architecture, we have cheap, out-of-band access to system content. Building upon this content availability and near-real-time snapshotting/observation of systems, we can build cloud-native operational analytics and security applications for the cloud platform and its bare-metal instances. We expect to perform this work in the open and contribute to the Agentless System Crawler project, extending it for bare-metal introspection as the end result of this work. We will showcase this technology by applying black-box security and compliance applications to the introspected system data, with the same fidelity as if they were running in each bare-metal system.
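As a toy example of the kind of agentless check this enables, the sketch below (pure Python, assuming a bare-metal disk snapshot has already been mounted read-only at a placeholder path such as /mnt/snapshot) builds a package inventory from the Debian package database without running any agent on the target system.

    # Toy agentless check: list installed packages from an offline-mounted
    # snapshot of a bare-metal node (assumes a Debian/Ubuntu image mounted
    # read-only at MOUNT_POINT; no agent runs on the target system).
    import os

    MOUNT_POINT = "/mnt/snapshot"   # placeholder path to the mounted snapshot

    def installed_packages(mount_point):
        status_file = os.path.join(mount_point, "var/lib/dpkg/status")
        packages = {}
        name, version = None, None
        with open(status_file, errors="replace") as f:
            for line in f:
                if line.startswith("Package:"):
                    name = line.split(":", 1)[1].strip()
                elif line.startswith("Version:"):
                    version = line.split(":", 1)[1].strip()
                elif not line.strip() and name:
                    packages[name] = version
                    name, version = None, None
        return packages

    if __name__ == "__main__":
        pkgs = installed_packages(MOUNT_POINT)
        print(f"{len(pkgs)} packages found")
        # A compliance rule could now flag known-vulnerable versions, etc.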

Some Technologies you will learn/use:


Dataverse Scaling

Project Logistics:

Mentors: Dan McPherson email: dmcphers@redhat.com; Phil Durbin email: philip_durbin@harvard.edu;

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

Preferred Past Experience:

Project Overview:

Background:

Dataverse is a popular open source project that has built a very vibrant community of users.  Surprisingly, though, Dataverse hasn't really been built for scale or high availability.  Its backlog consists primarily of adding features to make users happy, and less of making the product more resilient and easier to host.  This is a great state for a product to be in, because it means that users are happy and engaged.  But even though the community isn't demanding it, there is a great need to make Dataverse easier to operationalize at scale.  In particular, the MOC has an offering called Open Dataverse that uses the project to front its research-oriented data lake. This project, and other projects like it, are only a few outages away from turning the current situation into an emergency.

 

Project Specifics:

Dataverse was recently containerized to run on top of OpenShift (https://github.com/IQSS/dataverse/pull/4168).  The goal of this project is to continue that work and make each of the components of Dataverse (Glassfish, PostgreSQL, and Solr) function at scale on OpenShift.
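For a sense of what "functioning at scale" means operationally, the sketch below uses the official Kubernetes Python client to scale a stateless component; the deployment name and namespace are placeholders, and stateful pieces such as PostgreSQL need more care (StatefulSets, storage) than a simple replica bump.

    # Sketch: scale a stateless Dataverse component (e.g., Solr or Glassfish)
    # on OpenShift/Kubernetes using the official Python client.
    # The deployment name and namespace below are placeholders.
    from kubernetes import client, config

    def scale_deployment(name, namespace, replicas):
        config.load_kube_config()                 # or load_incluster_config()
        apps = client.AppsV1Api()
        apps.patch_namespaced_deployment_scale(
            name=name,
            namespace=namespace,
            body={"spec": {"replicas": replicas}},
        )

    if __name__ == "__main__":
        scale_deployment("dataverse-solr", "dataverse", replicas=3)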

Some Technologies you will learn/use:


Radiology in the Multi-Cloud

Project Logistics:

Mentors: Dan McPherson email: dmcphers@redhat.com;  Rudolph Pienaar email: rudolph.pienaar@gmail.com ;

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

Preferred Past Experience:

Project Overview:

Background:

Today, medical image processing often happens behind closed doors without a lot of sharing or collaboration.  This has resulted in a variety of slow and complex systems that are often bespoke to each hospital or research facility.
The ChRIS (Children's Research Integration System) project's goal is to provide a standardized platform for medical image processing.  It's doing so in collaboration with the MOC and Red Hat using technologies such as OpenStack and OpenShift/Kubernetes, with the end goal of democratizing image processing and making the results clinically relevant.

 

Project Specifics:

The goal of this project is to enable ChRIS to communicate with multiple datacenters, where each datacenter has its own list of available image processors. The project will include the implementation, testing, and deployment of the work into a production or preproduction environment.

More details about the radiology collaboration

https://github.com/redhat-university-partnerships/radiology

Specifics about the work: https://github.com/redhat-university-partnerships/radiology/issues/23

More details about ChRIS on the MOC

https://docs.google.com/presentation/d/1mV56QGxpdtmGAmWtNApAlCZe1nHlnXuEZyje-SpXwA8/edit#slide=id.p

Some Technologies you will learn/use:

Software Engineering (python, git, agile, etc)

OpenShift/Kubernetes/Docker

Continuous Integration (likely with Jenkins)





Serverless Supercomputing

Project Logistics:

Mentors: Dan McPherson email: dmcphers@redhat.com;  

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

Institution: BU or NEU

Preferred Past Experience:

Project Overview:

Background: Serverless/FaaS computing really started taking off with the launch of AWS Lambda.  However, the downside of a vendor-specific solution like AWS Lambda is vendor lock-in: you can no longer easily move your application to another provider, and you have no control over your cost. Recently Red Hat and other companies have made a bet on Apache OpenWhisk, an open source solution for serverless computing that will run across all cloud and on-premise environments, and which, as an open source solution, can be implemented by multiple vendors or by users themselves.

Red Hat is currently making an investment to offer OpenWhisk on top of OpenShift.  The benefit is that OpenShift can run anywhere and will provide a platform that OpenWhisk can utilize and integrate with.  Because FaaS doesn't really work without existing services to interact with, in the MOC OpenWhisk will also be able to take advantage of OpenStack features such as the Swift object store.

 

Project Specifics: The goal of this project is to build an on-demand "supercomputer" out of OpenWhisk on OpenShift on OpenStack in the MOC.  Namely, given a highly parallelizable task (TBD which task), rather than spinning up virtual machines or containers to solve the problem, we can instead use OpenWhisk/FaaS as an on-demand supercomputer.  The goal would be to give a small portion of the work to each function and spin up thousands of workers to accomplish the job as quickly as possible.  Such a model has huge cost benefits: you aren't paying any overhead for running the job beyond the actual execution time (no spin-up or spin-down cost).  The problem has been solved with virtual machines many times, but virtual machines are billed by the hour, minute, or second, and the time to start and stop them can make it cost-prohibitive to spin up too many instances at once.  Using containers is better, but FaaS is really the ultimate solution.  What's needed is a framework and examples showing how this is possible in a generically applicable environment (OpenWhisk).
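As a rough sketch of the fan-out pattern involved, the code below invokes the same OpenWhisk action many times in parallel over the OpenWhisk REST API (the API host, auth key, and action name are placeholders); a real solution would also handle retries, per-invocation limits, and result aggregation.

    # Sketch: fan a parallel job out across many OpenWhisk function invocations.
    # APIHOST/AUTH/ACTION are placeholders for a real OpenWhisk deployment.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    APIHOST = "https://openwhisk.example.com"     # placeholder
    AUTH = ("user-uuid", "secret-key")            # placeholder wsk auth key pair
    ACTION = "wordcount-chunk"                    # placeholder action name

    def invoke(chunk_id):
        # Blocking invocation that returns only the action's result payload.
        url = f"{APIHOST}/api/v1/namespaces/_/actions/{ACTION}"
        resp = requests.post(
            url,
            params={"blocking": "true", "result": "true"},
            json={"chunk": chunk_id},
            auth=AUTH,
        )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=100) as pool:
            partials = list(pool.map(invoke, range(1000)))
        # Reduce the partial results into the final answer here.
        print(f"collected {len(partials)} partial results")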

Some Technologies you will learn/use:


Sharing Research through replicable Data Science Environments

Project Logistics:

Mentors: Sri Krishnamurthy email: s.krishnamurthy@neu.edu

Min-max team size: 4-6

Expected project hours per week (per team member): 6-8

Will the project be open source: no

Preferred Past Experience:

Project Overview:

Background: Researchers typically share their research through forums like https://arxiv.org/. In the past decade, there has been growing interest in replicable research projects to address the so-called replication and reproducibility crisis (https://en.wikipedia.org/wiki/Replication_crisis). A recent Nature survey (https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970) reported that more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments. QuSandbox (www.qusandbox.com) is our effort to address the reproducibility problem by enabling the creation of replicable data science environments. Our technology leverages Docker, many open source and proprietary technologies, and the cloud to embed code and replicable data science environments so that researchers, students, and practitioners can create and share projects that can be replicated.

 

Project Specifics: In this project, we aim to prototype an environment customized for researchers to share their research. For example, think of a researcher creating a new algorithm for text analysis. Let us say a draft paper is published on arXiv and the software is available on GitHub. Our goal is to create a platform where, in addition to GitHub, the researcher can share the code and the environment leveraging QuSandbox’s research cloud environment.

Goals of the project:

Some Technologies you will learn/use:



Predictive Analytics on Telemetry Data from HPE SimpliVity Customer Systems

Project Logistics:

Mentors: Jiri Schindler email: jiri.schindler@hpe.com;  Peter Corbett email: peter.corbett@hpe.com;

Min-max team size: 3-5

Expected project hours per week (per team member): 6-8

Will the project be open source: no

Preferred Past Experience:

Project Overview:

Background:

The ability to analyze and determine usage patterns across an entire install base, spot anomalies for a specific production system or predict future resource consumption for a customer deployment is an integral part of any enterprise system in a production environment. These analytics help administrators and IT personnel optimize operations and system providers (manufacturers) deliver more efficient and timely support.

The HPE SimpliVity HyperConverged Infrastructure is an on-premises solution that combines management of virtual machines (compute) and storage with application and hardware lifecycle operations (disaster recovery, upgrades, system expansion). A SimpliVity solution consists of a collection of high-end commodity servers with local storage that are organized into clusters. Several clusters comprise a federation spanning different geographies. An entire federation uses a single management/operations console integrated into existing virtualization infrastructure solutions (e.g., VMware vSphere). For more details on HPE SimpliVity see https://www.hpe.com/us/en/integrated-systems/simplivity.html.

Each system (node) deployed in a production environment periodically sends detailed configuration, telemetry, and performance data to HPE, where it is processed and analyzed. The data is available for internal consumption by various organizations (product planning, architecture, performance engineering, support, etc.). Since the acquisition of SimpliVity by HPE, we have embarked on a long-term project to revamp our platform for processing and analyzing the data we receive from the tens of thousands of systems sold to date to our customers. There are several scrum teams working on this effort, in particular on the integration with existing HPE systems and databases tracking assets, customers, and other information.

Project Specifics:

The focus of the project will be the development of new analyses and data visualizations of the configuration and telemetry data. Some of these may already exist as stand-alone, purpose-built (one-off) tools or scripts developed by support engineers. One of the specific objectives would be to re-design and re-implement some of them in a scalable fashion and make them an integral component of our analytics platform: automatically pulling data from a common “data lake” and making the results available to everyone authorized within HPE for inspection.

Another project objective would be to work with the mentors (principal investigators) on developing new analyses such as unique data growth, compression and deduplication, resource consumption predictors, impact of snapshots and backups on overall storage consumption etc. The mentors will provide guidance in selecting suitable machine learning and data analysis algorithms and provide the necessary context. Therefore, background in machine learning or statistical analysis is not required.

The BU/NEU cloud computing course students would form a separate scrum team. They would focus on specific goals and tasks with minimal direct dependencies on the other teams, which are composed of full-time HPE employees who are part of the broader effort and focus on integration with other HPE systems, including HPE InfoSight. The student team would follow our agile development processes, with sprints and their own backlog of stories. This separation will allow the students to make progress on their own and cleanly delineate their work from that of the other teams. The students would follow (to the extent that is practical) our practices for developing production code, including writing unit tests and adopting a continuous integration flow with automated deployment of new versions of the code onto a container infrastructure.

The overall effort dictates the selection of technologies, tools, and languages; the students would be expected to adopt them. We leverage open source technologies and tools readily available to everyone without a special license, including, among others, Apache Spark, NiFi, Mesos (Mesosphere DC/OS), and Zeppelin. We also use CI tools such as Git, Jenkins, Gradle, Ansible, etc. The expectation is that the artifacts (code and algorithms) developed by the student team would be directly leveraged within HPE for the production data platform and stay with the company at the conclusion of the project.
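As a small, hypothetical example of the kind of analysis involved, the PySpark sketch below aggregates per-cluster storage consumption from telemetry records; the input path and field names are invented, since the real schema is defined by the HPE data platform.

    # Hypothetical PySpark sketch: summarize storage consumption per cluster
    # from telemetry records. The path and field names are placeholders; the
    # real schema comes from the HPE data platform.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("telemetry-summary").getOrCreate()

    telemetry = spark.read.json("s3a://datalake/telemetry/*.json")   # placeholder

    summary = (
        telemetry
        .groupBy("cluster_id")
        .agg(
            F.avg("logical_bytes").alias("avg_logical_bytes"),
            F.avg("physical_bytes").alias("avg_physical_bytes"),
            F.max("snapshot_count").alias("max_snapshots"),
        )
        .withColumn(
            "data_reduction_ratio",
            F.col("avg_logical_bytes") / F.col("avg_physical_bytes"),
        )
    )

    summary.show(20, truncate=False)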


General information about the specific product and its features.

https://www.hpe.com/us/en/integrated-systems/simplivity.html

Overall effort within HPE that this project would be part of.

https://www.hpe.com/us/en/storage/infosight.html


Some Technologies you will learn/use:




SecDevOps in the Cloud

Automate the deployment and securing of simple and complex cloud-based services and applications for self-service consumption in the MOC using CONS3RT

Project logistics

Preferred experience

Overview

"If it is not automated, it is broken" - An axiom that could not be more true in the cloud and security world. Those involved in the design, delivery and use of cloud applications and services prove it to themselves everyday. The work we do consists of many pieces pulled together into an orchestrated dance. Security is often applied late in the cycle as an overhead task rather than being baked in throughout the lifecycle. You may do the same thing on Monday, Tuesday, Wednesday, but then a message comes in or the phone rings and you miss a step. And that is all that is needed for a vulnerability.

Automation is not just about speed; it delivers consistency, repeatability and comprehensive coverage, all of which are paramount to security. SecDevOps is a critical and growing specialty in industry.



CONS3RT gives users a better way to deploy, validate, and secure the infrastructure and systems they field in the cloud. By using a modular approach with a broad library of software and security components - applications, utilities, data sets, lockdowns, configurations, etc. - users can automate the entire lifecycle of their environments. Users leverage this library of resources to automate the execution and securing of anything from simple workstations to complex environments (e.g., 50 or more servers). Everything in the library can be shared and reused "as is" and/or used as the basis for something more.


Working with a set of experienced asset developers, the team will identify options for security elements for fielding systems that impact the user community. After evaluating alternatives, the team will select target components in the area of system and infrastructure security configuration, assessment, and monitoring. Then the team will develop, integrate, and test the assets to make them real. The resulting assets and designs will be shared and used across the CONS3RT community at multiple sites, including the MOC.

What is CONS3RT?

CONS3RT is a leading-edge cloud and security orchestration service providing users with DevOps automation and validation. There are several tools that target cloud management and more that provide DevOps, but CONS3RT focuses on cloud usefulness and the strategic needs of DevOps so users can get things done! Users can leverage its Provisioning, Build, and Test as a Service capabilities so their organization can achieve the highest levels of continuous integration, continuous delivery, and the strongest security.

Who is Jackpine?

Jackpine Technologies is a Boston-area start-up focused on making the cloud useful and secure. Our mission is to empower organizations to create exceptional software through innovative practices, technologies, and products. We are a small, nimble team with a great collaborative atmosphere. To date, all of our interns have stuck around after graduation, so we must be doing something right. Students participating in this project will have access to some of the industry leaders in cloud DevOps.

Some Technologies you will learn/use:




First class support for OpenStack in Kubernetes ecosystem

(you can ask questions at https://etherpad.openstack.org/p/dims-k8s-openstack-moc - it’s an “etherpad”, sort of like a google doc, and the mentor will monitor it for questions)

Project logistics

Preferred past experience

Project Overview:

Kubernetes is essentially a container orchestration engine: it is able to coordinate running applications in containers across many machines. Kubernetes is very popular in public clouds, but much less so in private clouds. Since OpenStack is the pre-eminent private cloud, and open source as well, we have a unique opportunity to make OpenStack-based private cloud support the very best, to compete with the giants. So we will start by exploring the level of support for OpenStack in projects like kubernetes-anywhere (https://github.com/kubernetes/kubernetes-anywhere), kops (https://github.com/kubernetes/kops), and kubeadm (https://github.com/kubernetes/kubeadm). Then we will kick the tires to see what works and what does not, find and fix problems/issues, and basically ensure that things work.



We will define a set of scenarios that should work, including things like exposing OpenStack Cinder volumes to Kubernetes applications and making sure Kubernetes can use OpenStack Neutron/Octavia based LBaaS. We will also set up CI/CD systems (possibly at http://openlabtesting.org/) and ensure that the Kubernetes end-to-end and conformance tests run, to make sure that all the functionality in Kubernetes works well. Note that this work cuts across a stack: OpenStack is at the bottom, Gophercloud talks to OpenStack, Terraform uses Gophercloud to deploy IaaS resources, kubeadm can set up Kubernetes clusters, and kops and kubernetes-anywhere use all of the above to give a really good user experience for deployers and operators. So you will be participating in a full ecosystem, all your work will be in public repositories under the Apache License, Version 2.0, and you will easily be able to point future employers to your work. Not to mention, your work will be immediately used by a whole lot of people and is not just for demo. If this rocks your boat, let me know.
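One small example of "kicking the tires" is scripting a scenario check against the OpenStack cloud itself; the sketch below uses openstacksdk (with a placeholder cloud name from clouds.yaml) to create a Cinder volume of the kind a Kubernetes pod would later consume and to verify that it becomes available.

    # Sketch of a scenario check: create a small Cinder volume (the kind a
    # Kubernetes pod would later mount via the OpenStack cloud provider) and
    # verify it becomes available. "mycloud" is a placeholder clouds.yaml entry.
    import openstack

    conn = openstack.connect(cloud="mycloud")

    volume = conn.create_volume(size=1, name="k8s-scenario-check", wait=True)
    try:
        print(f"volume {volume.id} status: {volume.status}")
        assert volume.status == "available", "Cinder volume did not become available"
    finally:
        conn.delete_volume(volume.id, wait=True)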



Some Technologies you will learn/use:



Cloud networking and Continuous Integration/Continuous Deployment (CI/CD) Infrastructure

Project Logistics:

Mentors: Leon Zachery email: lzachery@cisco.com;

Min-max team size: 4-5

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

 

Project Overview:

Connectivity to the cloud and between clouds is an evolving technology area. CI/CD (Continuous Integration/Continuous Deployment) is essential in ensuring robust operation of networking in production cloud environments. A judicious balance between evaluating functional correctness and run times makes the infrastructure more valuable. Oftentimes the tests do not provide enough coverage, and passing results are not truly representative of the state of the change sets. At the other extreme, we may have multiple tests that do not add much value in terms of coverage and just result in longer run times. Evaluating hardware-based networking devices adds additional constraints in setting up the deployment to accurately represent the performance benefits of physical devices as well.

Students can consider adapting formal verification techniques to construct representative, optimal test sets, or, for the more pragmatically minded, focus on building tools that use metrics such as code coverage to provide guidance on the usefulness of additional tests. Some of the constraints with hardware devices can be modeled using the ASR1k-based environment present at BU.
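As one concrete (and deliberately simplified) starting point, the sketch below greedily picks a minimal test subset based on which code blocks each test covers; real coverage data would come from a coverage tool run per test, and the block IDs here are placeholders.

    # Simplified sketch: greedy selection of a minimal test set by marginal
    # coverage (classic greedy set cover). Coverage data here is hand-made;
    # in practice it would come from a coverage tool run per test.
    def select_tests(coverage_by_test):
        """coverage_by_test: dict mapping test name -> set of covered blocks."""
        uncovered = set().union(*coverage_by_test.values())
        selected = []
        while uncovered:
            # Pick the test that covers the most still-uncovered blocks.
            best = max(coverage_by_test,
                       key=lambda t: len(coverage_by_test[t] & uncovered))
            gain = coverage_by_test[best] & uncovered
            if not gain:
                break                      # remaining tests add no coverage
            selected.append(best)
            uncovered -= gain
        return selected

    if __name__ == "__main__":
        coverage = {                        # placeholder coverage data
            "test_vlan_trunk":   {"b1", "b2", "b3"},
            "test_bgp_peering":  {"b3", "b4"},
            "test_acl_update":   {"b2"},
            "test_failover":     {"b4", "b5", "b6"},
        }
        print(select_tests(coverage))       # e.g. ['test_vlan_trunk', 'test_failover']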

What the students will get:

- Good Knowledge of CI/CD immediately applicable to industry

- Tools such as zuul, ansible




Serverless Computing on the Edge

Project Logistics:

Mentors: Leon Zachery email: lzachery@cisco.com;

Min-max team size: 4-5

Expected project hours per week (per team member): 6-8

Institution: BU or NEU

Project Overview:

Serverless computing paradigms are useful for low-latency, low-footprint task execution. Augmented reality and many mobile applications require low-latency compute with minimal network response times. Many edge network devices have significant compute capacity that can be leveraged for a quick turnaround. Low-footprint functions are ideal for performing quick computation without any compromise to network functionality. ASR1k platforms support running containers, and we would like to prototype and investigate use cases that can enable edge compute with some applications. Any application can be used as a sample implementation in this environment, with the intent to understand and drive toward generic infrastructure to support serverless environments on an ASR1k.

 

What the students will get out of this:

- Cloud Computing and Serverless workflows

- Containers and container orchestration.

- Depending on Applications - some understanding of those specific domains.




CDN with WAF (Content Distribution Network with Web Application Firewall)

Project Logistics:

Mentors: Karl Redgate email: karl.redgate@gmail.com, karl.redgate@akamai.com (Akamai Security business unit)

Min-max team size: 3-6

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

 

Preferred Past Experience:

Project Overview:

There are now several Content Distribution Networks on the Internet from service providers, and some companies are rolling out their own CDNs. We will build our own limited CDN with an integrated Web Application Firewall and an API for provisioning a customer.
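As a first cut at the provisioning API, the hypothetical Flask sketch below registers a customer's origin server and an associated WAF rule set; everything here (paths, fields, in-memory storage) is invented for illustration, and the real design is part of the project.

    # Hypothetical provisioning API sketch for a small CDN with a WAF.
    # Endpoints, fields, and in-memory storage are placeholders; a real
    # service would persist config and push it to edge/WAF nodes.
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    customers = {}    # customer_id -> config (in-memory only, for illustration)

    @app.route("/customers/<customer_id>", methods=["PUT"])
    def provision(customer_id):
        cfg = request.get_json(force=True)
        customers[customer_id] = {
            "origin": cfg["origin"],                     # e.g. "https://origin.example.com"
            "cache_ttl_seconds": cfg.get("cache_ttl_seconds", 300),
            "waf_rules": cfg.get("waf_rules", ["sqli", "xss"]),
        }
        return jsonify(customers[customer_id]), 201

    @app.route("/customers/<customer_id>", methods=["GET"])
    def show(customer_id):
        return jsonify(customers.get(customer_id, {}))

    if __name__ == "__main__":
        app.run(port=8080)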


Some Technologies you will learn/use:




Access Orchestration Service for OpenStack Federations

Project Logistics:

Mentor: Kristi Nikolla <knikolla@bu.edu>

Min-max team size: 3-6

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

Preferred Past Experience

Project Description

OpenStack is a cloud operating system that manages compute, storage and network resources in a datacenter. Federating multiple OpenStack clouds allows users of one cloud to access resources from other clouds participating in the federation.


The goal of the project is to design and implement a service for orchestrating the management of user authorization across multiple OpenStack clouds that are federated together. This will require learning about and interacting with various cloud services using RESTful APIs.


This would allow a user in one cloud to request a quota of resources in another cloud using a REST API interface.
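The sketch below illustrates the kind of REST interface the orchestration service might expose (a hypothetical endpoint through which a federated user requests a quota in another cloud); the URL layout and payload are assumptions, and the real service would validate the request and drive Keystone/Nova/Cinder quota APIs in the target cloud.

    # Hypothetical REST endpoint sketch for the access-orchestration service:
    # a user in one federated cloud requests a quota of resources in another.
    # The URL layout, payload, and the apply_quota() stub are assumptions.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def apply_quota(cloud, project_id, quota):
        # Stub: a real implementation would call the target cloud's Keystone,
        # Nova, and Cinder APIs (e.g., via openstacksdk) to grant the quota.
        return {"cloud": cloud, "project_id": project_id, "granted": quota}

    @app.route("/federation/<cloud>/projects/<project_id>/quota", methods=["POST"])
    def request_quota(cloud, project_id):
        quota = request.get_json(force=True)   # e.g. {"cores": 8, "ram_mb": 16384}
        result = apply_quota(cloud, project_id, quota)
        return jsonify(result), 202

    if __name__ == "__main__":
        app.run(port=5000)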

Technologies you will learn




Drone Mission Analysis

Project Logistics:

Preferred Past Experience:

Project Overview:

Background: Drones are opening up entirely new paradigms for commercial businesses. These flying robots are quickly stepping in to do jobs that are dull, dirty and dangerous. The age of robotics is here, and today is like Internet 1995.

Project Specifics: In many applications a drone can do a better job than a human, and one of those is infrastructure inspection. Because the technology is new, there are no standards in place, and the results of these inspections can vary depending on the skill and experience of the drone operator. A Rhode Island startup is working to change that. VertSpec is developing software that automates the inspection of cell phone towers and power lines. Their web-based system will program any drone to do a precision inspection that can be repeated over time.

Commercial drone operators can contract with VertSpec, download pre-programmed inspection missions, then visit the inspection site and fly the mission. After the mission is complete, the operator needs to send data back to VertSpec to confirm that the inspection is complete. VertSpec needs a way to extract data from the images taken during the inspection and send it back to their cloud service for verification. The data that needs to be extracted from the pictures is the standard EXIF data, XMP (DJI metadata), and a thumbnail of each image.

The solution to be provided as part of this cloud computing class project is a low-bandwidth, web-based tool to upload JSON objects that include camera, XMP, and EXIF data. The cloud service will process the data and return a result to the drone operator: either "Mission Complete" or an instruction to re-fly all or part of the mission.
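A rough sketch of the operator-side tool is shown below: it reads EXIF tags and a thumbnail from each image with Pillow and posts a compact JSON object to the verification service. The upload URL is a placeholder, and DJI XMP extraction is reduced to a naive byte search for illustration.

    # Sketch of the operator-side upload tool: extract EXIF, a naive XMP blob,
    # and a small thumbnail from each image, then POST a compact JSON object.
    # UPLOAD_URL is a placeholder; real XMP/DJI parsing would use a proper library.
    import base64, io, sys
    import requests
    from PIL import Image
    from PIL.ExifTags import TAGS

    UPLOAD_URL = "https://vertspec.example.com/api/missions/verify"   # placeholder

    def describe_image(path):
        img = Image.open(path)
        exif = {TAGS.get(k, str(k)): str(v) for k, v in img.getexif().items()}

        # Naive XMP extraction: grab the raw <x:xmpmeta ...> packet if present.
        raw = open(path, "rb").read()
        start, end = raw.find(b"<x:xmpmeta"), raw.find(b"</x:xmpmeta>")
        xmp = None
        if start != -1 and end != -1:
            xmp = raw[start:end + len(b"</x:xmpmeta>")].decode(errors="replace")

        # Small thumbnail, base64-encoded, to keep the upload low-bandwidth.
        img.thumbnail((128, 128))
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG")
        thumb = base64.b64encode(buf.getvalue()).decode()

        return {"file": path, "exif": exif, "xmp": xmp, "thumbnail_jpeg_b64": thumb}

    if __name__ == "__main__":
        payload = [describe_image(p) for p in sys.argv[1:]]
        resp = requests.post(UPLOAD_URL, json=payload)
        print(resp.status_code, resp.text)   # expect "Mission Complete" or re-fly info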

Some Technologies you will learn/use:


Diagnosing problems in complex distributed apps with end-to-end tracing (four separate projects)

At the beginning of time, when dinosaurs roamed the earth, most applications were simple and monolithic: they ran on single machines.  Think Minesweeper, Pacman, and the original Doom video game.  To debug problems in these simple applications, it was natural to think in machine-centric terms: “What is going on in this machine that is affecting performance of my application?”  Many standard techniques that we still use to debug applications today arose from this machine-centric mindset.  Some examples include logging, adding printfs to code (also called caveman debugging), and Linux performance counters.  



In the present day, applications have become immensely complex.  Many are no longer monolithic.  Rather, they are distributed: they run on a set of machines (physical or virtual) that coordinate with one another to perform important tasks.  For example, a web-search application within Google may store potential search-query results within 10s of 1000s of individual application nodes and query all of them in response to user search requests.  This web-search application may also depend on other distributed applications to perform its task (e.g., an advertisement-retrieval application or a distributed storage system, such as GFS or Ceph).  It is also worth noting that these distributed applications have become critical to almost every aspect of modern society.  We use them when searching (e.g., on Google),  shopping (e.g., on Amazon), when watching videos (e.g., using Netflix), and when playing massive multi-player online games (e.g., playing Pokemon Go).  


It is critical to provide tools and techniques to help developers understand the behaviour of these complex applications and debug problems that arise within them.  Looking at individual machines is no longer sufficient for two key reasons.  First, knowing the performance characteristics or behaviour of any single node yields little insight about the performance or behaviour of the system as a whole.  Second, some types of (potentially problematic) behaviour may be emergent---it may only be observable when the entire distributed system is analyzed as a whole.   As a step toward analyzing distributed systems as a whole, recent work has created workflow-centric or end-to-end tracing methods.  Instead of focusing on individual machines, these techniques capture the flow of the work done to process individual requests within and among the components of a distributed system.  Seeing the necessity of such tracing, many large technology companies are starting to adopt end-to-end tracing.  Examples include Google with Dapper, Facebook with Canopy, and Uber with Jaeger.  



There is a rich opportunity to build on end-to-end tracing to explore how to analyze distributed applications holistically.  There are three key challenges.  The first challenge is in instrumenting complex distributed applications with end-to-end tracing.  Doing so requires building up expertise in the application of interest and intimate knowledge of how end-to-end tracing works.  The second is in understanding what trace data must be preserved for the analysis (or analyses) of interest (preserving all trace data is not possible due to scale/complexity).  The third challenge lies in understanding how to analyze, visualize, or create models based on the trace data to provide the required insights.  The projects suggested below all revolve around addressing these important and timely challenges.  



Project 1: Investigating cross-layer problems

Mentors: Raja Sambasivan (BU) and Peter Portante (Red Hat)

Useful skills:

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source?: Yes

End-to-end tracing infrastructures capture activity only within applications.  But distributed applications running in data centers run atop complex stacks composed of multiple layers related to virtualization and networking.  A problem observed within the application, such as excessively slow response times, may be caused by issues within the application itself (e.g., a poorly-written data structure implementation) or by issues within any layer below it.  For example, two virtual machines (VMs or containers) co-located on the same physical machine may contend with each other for resources (e.g., CPU).  This might lead to reduced performance for either or both of the applications running within the VMs.  Also, recent fixes to lower stack layers (e.g., to the kernel to address Intel’s recent security vulnerabilities) may impact application performance.  Extending end-to-end tracing across layers would further enhance its ability to provide insight into the behaviour of the system as a whole.  It would give developers insight into these important cross-layer problems so that they can focus their diagnosis efforts where needed.


Project tasks: This project will involve extending end-to-end tracing to lower layers of the stack, specifically the Linux kernel.  As part of this effort, we will also trace TCP/IP packets sent between nodes of the distributed application.  Our approach will be to convert existing logging mechanisms within the kernel to tracing, thus creating an independent end-to-end tracing mechanism within the kernel.  We will then interface it with application-level tracing infrastructures (e.g., using OpenTracing-compatible APIs).   We will use the resulting cross-layer traces to analyze where time is spent in the kernel on behalf of application-level requests.  Time permitting, we will use our cross-layer end-to-end tracing mechanism to explore issues related to contention between containers co-located on the same machine.  
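On the application side, the interface we would tie kernel-level events into looks roughly like the OpenTracing sketch below, which uses the Jaeger Python client to emit a parent span and a child span; the service and operation names are placeholders.

    # Sketch: emitting application-level spans with the OpenTracing API via the
    # Jaeger Python client. Service/operation names are placeholders; kernel-level
    # events would eventually be attached to spans like these.
    import time
    from jaeger_client import Config

    config = Config(
        config={"sampler": {"type": "const", "param": 1}, "logging": True},
        service_name="demo-frontend",
        validate=True,
    )
    tracer = config.initialize_tracer()

    with tracer.start_span("handle_request") as parent:
        parent.set_tag("request.id", "42")
        with tracer.start_span("read_from_storage", child_of=parent) as child:
            time.sleep(0.01)            # stand-in for real work
            child.log_kv({"event": "read_complete", "bytes": 4096})

    time.sleep(2)       # give the reporter time to flush spans
    tracer.close()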

Skills students will learn



Project 2: Implementing “always on” end-to-end tracing in the Ceph distributed storage service

Mentors:  Mania Abdi, Raja Sambasivan, Peter Portante (Red Hat)

Useful skills:

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source?: Yes


Ceph is an extremely popular open-source distributed storage service used within many cloud environments.  For example, we use it extensively within the Massachusetts Open Cloud to store data for our users.  It is an immensely complex service consisting of over a million lines of code.  A single Ceph deployment may consist of numerous storage nodes and various libraries and gateways to support different types of storage (block, object, filesystem).  While Ceph has blkin tracing today, it is not continuously enabled and is fairly rudimentary (it does not capture concurrency or synchronization, or allow critical paths to be extracted).   As a step toward understanding important performance and correctness problems in this important service, there is a need to implement an “always-on,” sophisticated end-to-end tracing system within it.



Project tasks: In this project, we will expand on existing efforts to implement end-to-end tracing within Ceph.  The goal will be to create an “always-on” tracing system with low overhead.  We will drive Ceph with various workloads and use the resulting traces to understand Ceph’s performance characteristics under various scenarios.  We will also use the traces to understand performance issues a team at BU working on Ceph has encountered.
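A minimal way to drive Ceph with a workload from Python is the librados sketch below (assuming a reachable cluster, a valid ceph.conf/keyring, and an existing pool name, all of which are placeholders here); traces collected while such a loop runs are the kind of data we would analyze.

    # Sketch: drive Ceph with a simple object-store workload via librados
    # (python-rados). The conffile and pool name are placeholders; traces
    # gathered while this runs are what we would analyze.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # placeholder path
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("testpool")               # placeholder pool
        try:
            for i in range(1000):
                name = f"obj-{i}"
                ioctx.write_full(name, b"x" * 4096)           # 4 KiB writes
                assert ioctx.read(name) == b"x" * 4096
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()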

Skills students will learn



Project 3: Increasing the Efficacy of Tracing via Granularity Mapping

Mentors: Peter Portante (Red Hat), Raja Sambasivan

Useful skills:

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source?: Yes


Capturing trace data has a cost associated with it.  The traces require a certain amount of storage, computation to process them after capture, computation during capture, and network bandwidth to transfer trace-point data from its origin in a distributed application to the trace point store.  The higher the frequency, or granularity, at which traces are captured, the higher the associated costs related to capture and storage of those traces.  If one has too high a granularity, the distributed system might be negatively impacted by trace capture due to its resource cost, disrupting the behaviors being observed.  If one has too low a granularity of trace capture, the traces run the risk of being ineffective or not representative of the system’s behavior, because those traces miss interesting behaviors.  Because low granularity traces present little-to-no drain on resources for a distributed system, one can consider capturing such traces continuously without risk of impacting the system.  Production systems can then run tracing continually at scale without worry of a performance impact.



Project tasks: Your task is to come up with 2 or 3 different approaches or techniques for increasing the effectiveness of low-granularity traces by mapping many low granularity traces to a much smaller set of high-granularity traces.  With such a mapping, one can envision a system where we can determine when to engage a high-granularity trace without having adverse impact on the system while still capturing useful data.  We have existing trace data that can be used to start your work.  You will have to implement at least one of the approaches within an existing end-to-end tracing system (e.g., Jaeger).
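One simple instance of such a mapping is sketched below: each low-granularity trace is reduced to a structural signature (the set of span names it contains) and matched to the high-granularity exemplar whose signature overlaps it most. The trace format is a placeholder, and this is only one of the 2-3 approaches the team would need to propose.

    # Sketch of one possible mapping: reduce each low-granularity trace to a
    # signature (its set of span names) and map it to the high-granularity
    # exemplar with the greatest signature overlap. Trace format is a placeholder.
    def signature(trace):
        """trace: list of span dicts, e.g. [{"name": "GET /read"}, ...]."""
        return frozenset(span["name"] for span in trace)

    def map_to_exemplar(low_trace, exemplars):
        """exemplars: dict of exemplar_id -> high-granularity trace."""
        sig = signature(low_trace)
        def overlap(item):
            return len(sig & signature(item[1]))
        exemplar_id, _ = max(exemplars.items(), key=overlap)
        return exemplar_id

    if __name__ == "__main__":
        exemplars = {
            "read_path":  [{"name": "GET /read"}, {"name": "cache_lookup"},
                           {"name": "disk_read"}],
            "write_path": [{"name": "PUT /write"}, {"name": "journal_append"},
                           {"name": "disk_write"}],
        }
        low = [{"name": "GET /read"}, {"name": "disk_read"}]   # sparse trace
        print(map_to_exemplar(low, exemplars))                 # -> "read_path"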



Questions you’ll need to answer:

Skills students will learn



Project 4: Cross-Layer Tracing in Kubernetes

Mentors: Peter Portante (Red Hat), Raja Sambasivan

Useful skills:

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source?: Yes


Kubernetes is an open-source platform designed to automate deploying, scaling, and operating application containers - in other words, it is a container deployment orchestration system.  Being an orchestration system, much of the operation of Kubernetes is implemented in the “control-plane”, where Kube commands tell this control-plane how Kubernetes should orchestrate its deployment tasks.  There are parts of Kubernetes which provide “data-plane” features for the application, like ingress and services.  As these data-plane features become part of the flow and behavior of the application, instrumenting them with trace points gives us a cross-layer trace, enabling a deeper understanding of an application's behavior.



Project tasks: Your goal is to consider the data-plane sub-systems of Kubernetes and instrument one or more of them to implement cross-layer tracing for Kubernetes.  There might be a use case for instrumenting the control-plane of Kubernetes with trace points where the application communicates with Kube to change its operation.  We suggest focusing on the Jaeger tracing system for this project.  We’ll provide suggestions for Kubernetes applications one can use for tracing.  
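The key mechanism for cross-layer tracing through Kubernetes data-plane components is trace-context propagation over HTTP headers; the OpenTracing sketch below shows the inject/extract calls involved (it assumes a tracer has already been initialized, e.g. with the Jaeger client as in the earlier tracing sketch).

    # Sketch: propagating trace context across a Kubernetes data-plane hop
    # (e.g., through an ingress) using OpenTracing inject/extract.
    # Assumes `tracer` was already initialized (e.g., via the Jaeger client).
    import opentracing
    from opentracing.propagation import Format

    def outgoing_request(tracer, headers):
        # Client side: start a span and inject its context into HTTP headers.
        with tracer.start_span("proxy_request") as span:
            carrier = {}
            tracer.inject(span.context, Format.HTTP_HEADERS, carrier)
            headers.update(carrier)          # headers travel with the request
            return headers

    def incoming_request(tracer, headers):
        # Server side: extract the upstream context and continue the trace.
        upstream_ctx = tracer.extract(Format.HTTP_HEADERS, headers)
        with tracer.start_span("handle_request", child_of=upstream_ctx) as span:
            span.set_tag("component", "k8s-ingress-demo")
            # ... do the real work here ...

    if __name__ == "__main__":
        tracer = opentracing.global_tracer()   # no-op tracer unless one is registered
        headers = outgoing_request(tracer, {})
        incoming_request(tracer, headers)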



Advanced options and stretch goals:

Skills students will learn



Aria Tosca Parser and Cloud Orchestrator

 

Project Logistics:

Mentors: Thomas Nadeau email: thomasdnadeau@gmail.com;  

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

 

Preferred Past Experience:

Python coding  -  Required

Experience with Linux (e.g., Ubuntu, Debian, etc.) - Required

Project Overview:

Background:

The AriaTosca (https://www.ariatosca.org) project has two main components: the Tosca language front-end compiler and the back-end orchestrator.  

The front end gives students exposure to a world-class compiler as well as to the TOSCA language (a YAML-based language).

The back end is used to provision Kubernetes, OpenStack, Azure, AWS, etc. workloads.  The code is in Python.

 

Project Specifics:

The work will involve contributing to one or both of these components: the front-end TOSCA compiler and the back-end orchestrator described above.

Aria is the basis for the very popular Cloudify (cloudify.co) web provisioning system.  The base Aria system can be used to orchestrate/provision workloads via various cloud orchestration systems (e.g., OpenStack, Azure, AWS, etc.) as well as interfacing with other emerging ones such as https://www.onap.org
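To get a feel for what the compiler front end consumes, the sketch below parses a tiny TOSCA-style service template with PyYAML and walks its node templates; it is purely illustrative and does not use the ARIA parser API itself.

    # Illustrative only: parse a tiny TOSCA-style service template with PyYAML
    # and walk its node templates. This is NOT the ARIA parser API; it just
    # shows the shape of the input the front-end compiler deals with.
    import yaml

    TEMPLATE = """
    tosca_definitions_version: tosca_simple_yaml_1_0
    topology_template:
      node_templates:
        web_server:
          type: tosca.nodes.Compute
          capabilities:
            host:
              properties:
                num_cpus: 2
                mem_size: 4 GB
        web_app:
          type: tosca.nodes.WebApplication
          requirements:
            - host: web_server
    """

    doc = yaml.safe_load(TEMPLATE)
    nodes = doc["topology_template"]["node_templates"]
    for name, node in nodes.items():
        reqs = [list(r.keys())[0] for r in node.get("requirements", [])]
        print(f"{name}: type={node['type']} requirements={reqs}")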


We have a running podcast series describing the project and some of its components.

https://soundcloud.com/theopensourcepodcast/the-aria-tosca-podcast-episode-1

Check out additional resources on our twitter feed

https://twitter.com/AriaTosca

Aria Tosca is an open source project hosted by The Apache Foundation.

http://ariatosca.org/



Some Technologies you will learn/use:



Security Scan for OpenStack

 

Project Logistics:

Mentors: Billy Field email: billy.field@trilio.io;

Min-max team size: 2-4

Expected project hours per week (per team member): 6-8

Will the project be open source: yes

 

Preferred Past Experience:

Familiarity with API calls (more relevant than OpenStack expertise) - Valuable

Python (the Trilio platform is built on Python, though students will mainly leverage its APIs) - Valuable

Linux fundamentals - Required

Security fundamentals - Valuable

Scripting - Valuable


Project Overview:

Background:

Security scans are an integral process for any commercial account, especially in the financial services industry. Security teams can either scan end-point devices or target data repositories. Trilio is a native OpenStack cloud data-protection software technology that creates snapshots of the production environment, making it easy to restore an entire workload/environment with a single click. Trilio exposes these snapshots to 3rd-party applications so that organizations can use the solution for security, BC/DR, and other purposes.

 

Project Specifics:

In this project students will learn how to leverage the Trilio APIs in order to integrate with OpenVAS (an open source vulnerability scanning application). Students will also research other 3rd-party security tools such as Nessus and Bandit.
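Because Trilio is consumed through OpenStack-style REST APIs, the integration will look roughly like the sketch below: authenticate against Keystone for a token, then call a Trilio endpoint with it. The Trilio URL and path here are placeholders rather than the documented API, and OpenVAS would be driven separately once a snapshot is exposed to it.

    # Rough sketch of the integration pattern: get a Keystone token, then call
    # a Trilio endpoint with it. KEYSTONE_URL, TRILIO_URL, and the path are
    # placeholders -- consult the actual Trilio API documentation.
    import requests

    KEYSTONE_URL = "https://keystone.example.com:5000"    # placeholder
    TRILIO_URL = "https://trilio.example.com:8780"        # placeholder

    def keystone_token(user, password, project, domain="default"):
        body = {"auth": {
            "identity": {"methods": ["password"],
                         "password": {"user": {"name": user,
                                               "domain": {"id": domain},
                                               "password": password}}},
            "scope": {"project": {"name": project, "domain": {"id": domain}}}}}
        resp = requests.post(f"{KEYSTONE_URL}/v3/auth/tokens", json=body)
        resp.raise_for_status()
        return resp.headers["X-Subject-Token"]

    if __name__ == "__main__":
        token = keystone_token("scanner", "secret", "demo")      # placeholders
        # Hypothetical call: list workload snapshots so they can be handed to OpenVAS.
        snaps = requests.get(f"{TRILIO_URL}/v1/snapshots",        # placeholder path
                             headers={"X-Auth-Token": token})
        print(snaps.status_code, snaps.json() if snaps.ok else snaps.text)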



Independent assessment of Trilio technology:
https://www.trilio.io/portfolio/esg-lab-review-trilio-vault-2017/

https://www.youtube.com/watch?v=b0aBde7CIHc


Trilio Content section

https://www.trilio.io/whitepaper/


Some Technologies you will learn/use:





Hardware Auditing Service for HIL (bare metal cloud)

Project Logistics:

Preferred Past Experience:

Project Overview:

Background:

An emerging category of cloud service is Hardware as a Service (HaaS), where users of a cloud can elastically acquire physical rather than virtual computers.  Benefits include security (you don’t have to trust complicated virtualized stacks), performance, determinism (e.g., for performance experiments), and standing up complex higher-level services.  The MOC has developed the Hardware Isolation Layer (HIL) both as the basis for a HaaS offering and to allow physical machines to be moved between different services of the cloud.  


HIL is a low-level tool that allows users to reserve physical machines and connect them via isolated networks. It manages network switches to provide network isolation, and can control nodes using out-of-band management.   A system administrator provides HIL with information about the physical resources, such as the list of available machines, their network interfaces, and where (i.e., to which switch port) those interfaces are connected.


Project Specifics:

The goal of this project is to develop a service that can query the network switches on behalf of HIL users and ensure that the HIL configuration is accurate and consistent with the actual state of the switches.  The two driving use cases for this auditing service are (1) to detect manual modifications of the switch configuration, and (2) to allow external security auditing of the HIL service.

In addition, this service will help with tracing and SLA maintenance of networks, and will help us identify the network topology, eliminating the need to manually trace network cables. It gives network engineers visibility into the IP, MAC, VLAN, status, and availability of ports, and could become a valuable tool for system administrators and network engineers troubleshooting networking in a data center.
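As a starting point, the audit could be as simple as the sketch below: log in to a switch with netmiko, pull the VLAN membership of each port, and diff it against what HIL believes it configured. The switch credentials and the expected-state dictionary are placeholders, structured parsing assumes the ntc-templates package, and real switch handling would be per vendor/OS.

    # Sketch: compare a switch's actual VLAN-per-port state with what HIL
    # believes it configured. Credentials, interface names, and the expected
    # mapping are placeholders; parsing is per switch vendor/OS.
    from netmiko import ConnectHandler

    SWITCH = {                      # placeholder credentials
        "device_type": "cisco_ios",
        "host": "10.0.0.2",
        "username": "audit",
        "password": "secret",
    }

    EXPECTED = {                    # what HIL thinks it configured (placeholder)
        "Gi1/0/1": "101",
        "Gi1/0/2": "101",
        "Gi1/0/3": "202",
    }

    def actual_vlans(conn):
        # netmiko's TextFSM integration (requires ntc-templates) parses
        # 'show interfaces status' into dicts with 'port' and 'vlan' keys.
        rows = conn.send_command("show interfaces status", use_textfsm=True)
        return {row["port"]: row["vlan"] for row in rows}

    if __name__ == "__main__":
        conn = ConnectHandler(**SWITCH)
        actual = actual_vlans(conn)
        conn.disconnect()
        mismatches = {port: (want, actual.get(port))
                      for port, want in EXPECTED.items()
                      if actual.get(port) != want}
        print("OK" if not mismatches else f"Drift detected: {mismatches}")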


Some technologies you will learn/use:




Bolted: Security as a Service

NOTE: There are two separate projects listed in this document.  We are looking for different teams of students for each one.

Useful Skills

Some technologies you will learn/use

Overview

Background

Today, users of the cloud need to trust the provider not only not to be malicious, but also not to have bugs in the enormously complex software it deploys for virtualization and cloud management.  This imposes major barriers for security-sensitive customers (e.g., military, medical, financial) as well as many open source developers who don’t want to trust the large cloud providers.  Is there a way that we can get the elasticity of clouds while limiting our trust in the provider?

Bolted is a specification for a more secure bare-metal provisioning implementation.  The system is designed to be flexible, allowing customers to choose their own desired level of security for their cloud environment (as dictated by needed speed-security tradeoffs) as well as what degree of trust they are placing in the provider.

Currently, bare-metal cloud services do not offer clients the ability to be certain that their nodes are in a trusted state, and cannot guarantee that previous tenants of that hardware have not tampered with the underlying hardware or firmware.  Existing systems also do not ensure that nodes are properly sanitized after use, so information could be exfiltrated from the nodes by a future tenant. The Bolted design is meant to help address these issues for bare-metal offerings.

A general overview of the Bolted architecture and node life-cycle can be seen in Figure 1.  The blue arrows denote state changes in the system, orange arrows represent requests made to the Bolted services and green arrows are actions taken by each service as a result of those requests.

Figure 1 Bolted Architecture: Blue arrows are state changes, orange are requests to services and green are actions taken by each service

Here, the Isolation Service allocates nodes to tenants and configures network routers to isolate nodes from other tenants.  The Attestation Service ensures that the code running on the node is correct and trusted (ranging from the firmware code through the OS-level code and even applications running in the OS).  The Provisioning Service installs software on the nodes (ranging from bootloaders to the full OS of the node). The Orchestration Service is designed to tie all of these services together so tenants do not need to communicate with the other services directly.


Secure Cloud Automated Deployment (Project 1)

Mentors: Charles Munson, Apoorve Mohan, Naved Ansari

Useful Skills


The goal of this project is to make the Bolted system more user-friendly to deploy. Right now, the Orchestration Service (as seen in Figure 1) provides a set of scripts that a tenant can run in order to bring up, attest and provision their nodes.  Although this is more convenient than having to do all of this manually, there is still a lot of room for improvement by offering an integrated experience that more easily allows tenants to perform these actions.  

The team will explore different technologies for automating the deployment of the services (e.g., using containers), determine what solution to use, and deploy it for all the different micro-services that make up Bolted. In addition, given time, a user interface will be developed for this project that will allow tenants to bring up and provision their nodes without the need for executing these underlying shell scripts themselves.  

Some technologies you will learn/use




Secure Cloud Network and Disk Encryption (Project 2)

Mentors

Useful Skills

The goal of this project is to make the Bolted system more convenient to set up network and disk encryption for, enabling automatic disk and network encryption (via LUKS and IPsec, respectively).

The Attestation Service is provided primarily by the Keylime software system, which allows the tenant to be sure their node is in a good state before provisioning, and then periodically checks the tenant’s provisioned nodes to ensure they stay in a good state. Currently, if a node ever fails attestation, a notification is sent out to the tenant of the failure and a user-supplied script may be executed (though no other action is taken to prevent the compromised node from causing issues with the tenant’s set of nodes).

For the first part of this project, the team will implement IPsec, which will not only allow nodes to securely communicate with one another in the tenant’s enclave, but will also allow IPsec keys to be automatically revoked upon attestation failure (helping to prevent the affected node from causing damage to other nodes in the system).

Another aspect of this project is provisioning, which is handled by the Provisioning Service, and is handled primarily by the Bare Metal Imaging (BMI) system. Currently, BMI stores and sends unencrypted software images to nodes during provisioning, which means it may be possible for people with access to the internal hardware to see these software images (which could potentially contain sensitive information). Note that attestation ensures these images have not been tampered with.

The second part of this project will be focusing on encrypting these provisioning images (via LUKS) in BMI and then decrypting them when they are provisioned on the node.
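A minimal sketch of the image-encryption step is shown below: it wraps the cryptsetup command line from Python to create a LUKS container from a raw image file and copy the image contents into it. Paths and names are placeholders, and it assumes root privileges and a cryptsetup version that accepts regular files as LUKS containers.

    # Sketch: wrap cryptsetup to produce a LUKS-encrypted copy of a provisioning
    # image. Paths/names are placeholders; requires root and a cryptsetup that
    # accepts regular files as LUKS containers.
    import os, subprocess

    RAW_IMAGE = "/images/tenant-image.raw"          # placeholder unencrypted image
    LUKS_IMAGE = "/images/tenant-image.luks"        # placeholder encrypted container
    KEY_FILE = "/keys/tenant.key"                   # placeholder tenant key
    MAPPER_NAME = "tenant_image"

    def run(cmd):
        subprocess.run(cmd, check=True)

    # 1. Create a container file the size of the raw image plus LUKS header space.
    size = os.path.getsize(RAW_IMAGE) + 16 * 1024 * 1024
    with open(LUKS_IMAGE, "wb") as f:
        f.truncate(size)

    # 2. Format it as LUKS and open it under /dev/mapper/<MAPPER_NAME>.
    run(["cryptsetup", "luksFormat", "--batch-mode", "--key-file", KEY_FILE, LUKS_IMAGE])
    run(["cryptsetup", "open", "--key-file", KEY_FILE, LUKS_IMAGE, MAPPER_NAME])

    try:
        # 3. Copy the raw image contents into the encrypted container.
        run(["dd", f"if={RAW_IMAGE}", f"of=/dev/mapper/{MAPPER_NAME}",
             "bs=4M", "conv=fsync"])
    finally:
        run(["cryptsetup", "close", MAPPER_NAME])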

Some technologies you will learn/use


International Cloud Transactions

Project Logistics:

Preferred Past Experience:

Project Overview

Background: International cloud transactions involving medical data, with an understanding of the security and privacy implications, utilizing an exciting new Connect Medical Care framework.

Project Specifics: Research and develop concepts for addressing secure international cloud transactions of medical data from patients. The data stream has been “de-identified”; it is device-level monitoring data and is part of a larger customer-owned collective looking at patient-care trends.

Additional areas: encrypted big data and large-data-object analytics with decentralized data staying with the data owner, privacy controls and an understanding of recent European data rights, and technical use cases with infrastructure models.

Some Technologies you will learn/use: