Big Data Containers (Red Hat) – enable OpenShift/Docker access to Cloud Dataverse (scientific data repository)
Agentless introspection of bare-metal systems (IBM) – security scanning of network-booted servers
Dataverse scaling (Harvard/Red Hat) – apply cloud scaling techniques (containers, etc.) to Dataverse
Radiology in the multi-cloud (Red Hat/Children's Hosp) – cross-datacenter connector for existing X-ray analysis system
Serverless supercomputing (Red Hat) - use on-demand functions for high performance computing
Sharing Research... (NU/startup) - creating containers for data science
Predictive Analytics... (HP Enterprise) - use Spark, etc. to analyze HP customer data
SecDevOps in the Cloud (startup) - automated security for cloud deployment systems
OpenStack in Kubernetes (OpenStack/Huawei) - creating links between openstack and kubernetes
Cloud networking and Continuous Integration (Cisco) - extending CI/CD tools to include network HW
Serverless Computing on the Edge (Cisco) - combining serverless (lambda) with edge computing
Content Distribution / Web Application Firewall (Akamai) - creating a prototype firewalled CDN
Access Orchestration (BU/OpenStack) - create a new OpenStack identity federation service
Drone Mission Analysis (startup) - cloud extraction of data from civilian drone images
Tracing - Investigating cross-layer problems (BU/Red Hat) - extend Jaeger and OSprofiler trace frameworks
End-to-end tracing in Ceph (Red Hat/BU/NU) - add always-on tracing to Ceph storage system
Trace "Granularity Mapping" (Red Hat/BU) - add variable-overhead tracing to Jaeger or OSprofiler
Tracing in Kubernetes (Red Hat/BU) - add instrumentation to Kubernetes container framework
Aria Tosca Parser, Orchestrator (Apache Found.) - develop a compiler for cloud orchestration
Security Scan for OpenStack (Trilio) - add security scanning to OpenStack
Hardware Auditing Service for HIL (BU) - add security scanning to a bare-metal cloud
Secure Cloud Automated Deployment (MIT Lincoln Labs) - automated deployment for secure cloud
Secure Cloud Network and Disk Encryption (MIT Lincoln Labs) - adding encryption to secure cloud
International Cloud Transactions (startup) - implementing secure, regulatory compliant transfer of medical data
Background: The MOC has enabled researchers to use its “data lake” (i.e., Ceph object store) in conjunction with jobs launched on OpenStack. However, many analytics-type workloads have been moving to container platforms such as OpenShift (e.g., radanalytics.io) to take advantage of their performance and multi-cloud benefits.
In OpenShift 3.7, a service broker concept has been added to enable OpenShift to provision and connect to internal or external services. These service brokers are based on an open standard (Open Service Broker API) and are shared between Kubernetes and Cloud Foundry.
Project Specifics: The goal of this project is to build a service broker for the Open Dataverse API on the MOC to enable analytics jobs on OpenShift to consume data from Dataverse. This will enable researchers to use the power of OpenShift (from the MOC or any other cloud) to provide a more optimized compute option than what's currently available.
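As a rough illustration of the shape of the work, here is a minimal sketch of an Open Service Broker catalog, provision, and binding endpoint in Python/Flask. The service and plan IDs, the Dataverse URL, and the credential fields returned by the binding are placeholders rather than the actual MOC/Dataverse values.

# Minimal Open Service Broker sketch (catalog + provision + bind) -- illustrative only.
# The service/plan IDs and the Dataverse credentials below are placeholders.
from flask import Flask, jsonify

app = Flask(__name__)

CATALOG = {
    "services": [{
        "id": "dataverse-service-id",          # placeholder GUID
        "name": "dataverse",
        "description": "Access to the MOC Open Dataverse repository",
        "bindable": True,
        "plans": [{
            "id": "dataverse-default-plan-id",  # placeholder GUID
            "name": "default",
            "description": "Read access to public datasets"
        }]
    }]
}

@app.route("/v2/catalog", methods=["GET"])
def catalog():
    # OpenShift's service catalog calls this to discover offered services.
    return jsonify(CATALOG)

@app.route("/v2/service_instances/<instance_id>", methods=["PUT"])
def provision(instance_id):
    # Nothing to create server-side for a read-only data service.
    return jsonify({}), 201

@app.route("/v2/service_instances/<instance_id>/service_bindings/<binding_id>",
           methods=["PUT"])
def bind(instance_id, binding_id):
    # Hand the analytics pod the coordinates it needs to reach Dataverse.
    return jsonify({"credentials": {
        "api_url": "https://dataverse.example.org/api",  # placeholder URL
        "api_token": "REPLACE_ME"                        # placeholder token
    }}), 201

if __name__ == "__main__":
    app.run(port=8080)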
Background: Agentless and out-of-band inspection of cloud resources such as VMs and containers allows us to implement various security functions seamlessly. In this project we are exploring an opportunity to extend these design principles to bare-metal systems.
Project Specifics: In this project, we will leverage BMI to build an agentless introspection technology for bare-metal systems. This will enable us to run black-box software against bare-metal system state, without any overhead or side effects. With the BMI architecture, we have cheap, out-of-band access to system content. Building upon this content availability and near-real-time snapshotting/observation of systems, we can build cloud-native operational analytics and security applications for the cloud platform and its bare-metal instances. We expect to perform this work in the open and contribute to the Agentless System Crawler project, extending it for bare-metal introspection as the end result of this work. We will showcase this technology by applying black-box security and compliance applications against introspected system data, with the same fidelity as if they were running in each bare-metal system.
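To make the idea concrete, the following is a small sketch of the kind of black-box check that could run against an out-of-band snapshot of a bare-metal node's filesystem; the mount point and the checks are hypothetical examples, and a real implementation would plug into the Agentless System Crawler rather than stand alone.

# Sketch: scan an out-of-band mounted snapshot of a bare-metal node's root
# filesystem without touching the running system. Paths and rules are examples.
import hashlib
import os

SNAPSHOT_MOUNT = "/mnt/node42-snapshot"   # hypothetical BMI-provided mount

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def inventory_binaries(root):
    """Hash system binaries so they can be compared against known-good values."""
    bindir = os.path.join(root, "usr", "bin")
    report = {}
    for name in sorted(os.listdir(bindir)):
        path = os.path.join(bindir, name)
        if os.path.isfile(path) and not os.path.islink(path):
            report[name] = sha256_of(path)
    return report

def find_uid0_accounts(root):
    """Flag any account besides root that has UID 0 (a common compliance check)."""
    suspicious = []
    with open(os.path.join(root, "etc", "passwd")) as f:
        for line in f:
            fields = line.strip().split(":")
            if len(fields) > 3 and fields[2] == "0" and fields[0] != "root":
                suspicious.append(fields[0])
    return suspicious

if __name__ == "__main__":
    print(len(inventory_binaries(SNAPSHOT_MOUNT)), "binaries hashed")
    print("extra UID-0 accounts:", find_uid0_accounts(SNAPSHOT_MOUNT))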
Project Logistics:
Mentors: Dan McPherson email: dmcphers@redhat.com;
Phil Durbin email: philip_durbin@harvard.edu;
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Preferred Past Experience:
Docker - Nice to have
Kubernetes/OpenShift - Nice to have
Git - Valuable
Java - Nice to have
Project Overview:
Background:
Dataverse is a popular open source project that has built a very vibrant community of users. Surprisingly, though, Dataverse hasn't really been built for scale or high availability. Its backlog of work consists primarily of adding more features to make users happy, and less of making the product more resilient and easier to host. This is a great state for a product to be in because it means that users are happy and engaged. Even though the community isn't demanding it, there is a great need to make Dataverse easier to operationalize at scale. In particular, the MOC has an offering called Open Dataverse, where it uses the project to front its research-oriented data lake. This project and others like it are only a few outages away from turning the current situation into an emergency.
Project Specifics:
Dataverse was recently containerized to run on top of OpenShift (https://github.com/IQSS/dataverse/pull/4168). The goal of this project is to continue that work and make each of the components of Dataverse function at scale on OpenShift. This includes Glassfish, PostgreSQL, and Solr.
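As a flavor of what "functioning at scale" looks like operationally, here is a minimal sketch using the official Kubernetes Python client to scale out one of the Dataverse components. The deployment name and namespace are assumptions; in practice the scaling policy would live in OpenShift templates or an autoscaler rather than a one-off script.

# Sketch: scale a (hypothetical) "dataverse-solr" Deployment to 3 replicas.
# Assumes kubeconfig access to the OpenShift/Kubernetes cluster.
from kubernetes import client, config

config.load_kube_config()                      # or load_incluster_config() inside a pod
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="dataverse-solr",                     # assumed component name
    namespace="dataverse",                     # assumed project/namespace
    body={"spec": {"replicas": 3}},
)

scale = apps.read_namespaced_deployment_scale("dataverse-solr", "dataverse")
print("desired replicas:", scale.spec.replicas)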
Some Technologies you will learn/use:
Containers/Docker/Kubernetes/OpenShift
Software Engineering (Agile, Scrum, Git)
Cloud Computing (OpenShift, OpenStack)
Cloud Scale (Web Frameworks, Databases, Search Engines)
Project Logistics:
Mentors: Dan McPherson email: dmcphers@redhat.com; Rudolph Pienaar email: rudolph.pienaar@gmail.com ;
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Preferred Past Experience:
OpenShift/Kubernetes - Nice to have
Docker - Nice to have
Python - Valuable
Project Overview:
Background:
Today, medical image processing often happens behind closed doors without a lot of sharing or collaboration. This has resulted in a variety of slow and complex systems that are often bespoke to each hospital or research facility.
The ChRIS (Children's Research Integration System) project's goal is to provide a standardized platform for medical image processing. It is doing so in collaboration with the MOC and Red Hat, using technologies such as OpenStack and OpenShift/Kubernetes, with the end goal of democratizing image processing and making the results clinically relevant.
Project Specifics:
The goal of this project is to enable ChRIS to communicate with multiple datacenters, where each datacenter has its own list of available image processors. The project will include the implementation, testing, and deployment of the work into a production or preproduction environment.
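A very rough sketch of the connector's core idea follows: query each datacenter for the image processors it advertises and merge the results. The endpoint path and response fields are assumptions for illustration, not the actual ChRIS API.

# Sketch: aggregate the list of available image processors ("plugins")
# across multiple datacenters. Endpoints and field names are assumptions.
import requests

DATACENTERS = {
    "boston": "https://chris-bos.example.org",
    "moc":    "https://chris-moc.example.org",
}

def list_processors(base_url):
    # Hypothetical per-datacenter endpoint returning [{"name": ..., "version": ...}, ...]
    resp = requests.get(f"{base_url}/api/v1/plugins/", timeout=10)
    resp.raise_for_status()
    return resp.json()

def merged_catalog():
    catalog = {}
    for dc, url in DATACENTERS.items():
        for plugin in list_processors(url):
            # Remember which datacenters can run each processor.
            catalog.setdefault(plugin["name"], []).append(dc)
    return catalog

if __name__ == "__main__":
    for name, sites in merged_catalog().items():
        print(f"{name}: available in {', '.join(sites)}")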
More details about the radiology collaboration
https://github.com/redhat-university-partnerships/radiology
Specifics about the work: https://github.com/redhat-university-partnerships/radiology/issues/23
More details about ChRIS on the MOC
https://docs.google.com/presentation/d/1mV56QGxpdtmGAmWtNApAlCZe1nHlnXuEZyje-SpXwA8/edit#slide=id.p
Some Technologies you will learn/use:
Software Engineering (python, git, agile, etc)
OpenShift/Kubernetes/Docker
Continuous Integration (likely with Jenkins)
Project Logistics:
Mentors: Dan McPherson email: dmcphers@redhat.com;
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Institution: BU or NEU
Preferred Past Experience:
Serverless Computing /FaaS (Function as a Service) - Valuable
Containers/Docker/Kubernetes/OpenShift - Nice to have
Git - Valuable
JavaScript - Valuable
Project Overview:
Background:
Serverless/FaaS computing really started taking off with the launch of AWS Lambda. However, the downside of a vendor-specific solution like AWS Lambda is vendor lock-in: you can no longer easily move your application to another provider, and you have no control over your cost. Recently, Red Hat and other companies have made a bet on Apache OpenWhisk, an open source solution for serverless computing that runs across all cloud and on-premise environments and that, as an open source solution, can be implemented by multiple vendors or by users themselves.
Red Hat is currently investing in offering OpenWhisk on top of OpenShift. The benefit is that OpenShift can run anywhere and provides a platform that OpenWhisk can utilize and integrate with. Because FaaS doesn't really work without existing services to interact with, in the MOC OpenWhisk will also be able to take advantage of OpenStack features such as the Swift object store.
Project Specifics: The goal of this project is to build an on-demand "supercomputer" out of OpenWhisk on OpenShift on OpenStack in the MOC. Namely, given a task that is highly parallelizable (TBD which task), rather than spinning up virtual machines or containers to solve the problem, we can instead use OpenWhisk/FaaS as an on-demand supercomputer. The goal is to give a small portion of the work to each function and spin up thousands of workers to accomplish the job as quickly as possible. Such a model has huge benefits from a cost perspective: you aren't paying for any overhead beyond the actual execution time (no spin-up or spin-down cost). The problem has been solved with virtual machines many times, but with virtual machines you are billed by the hour, minute, or second, and the time to start and stop the machines can make it cost prohibitive to spin up too many instances at once. Using containers is better, but using FaaS is really the ultimate solution. What's needed is a framework and examples to show how this is possible, in an environment (OpenWhisk) which is generically applicable.
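To illustrate the fan-out pattern, here is a minimal sketch that invokes the same OpenWhisk action many times in parallel over its REST API and sums the partial results. The API host, credentials, action name, and the toy workload are all placeholders.

# Sketch: fan a parallelizable job out across many OpenWhisk invocations.
# API host, auth key, and action name are placeholders; the action is assumed
# to accept {"start": ..., "end": ...} and return {"partial": ...}.
from concurrent.futures import ThreadPoolExecutor
import requests

APIHOST = "https://openwhisk.example.org"      # placeholder
AUTH = ("user-uuid", "secret-key")             # placeholder wsk credentials
ACTION_URL = (f"{APIHOST}/api/v1/namespaces/_/actions/sum_chunk"
              "?blocking=true&result=true")

def invoke(chunk):
    start, end = chunk
    resp = requests.post(ACTION_URL, json={"start": start, "end": end},
                         auth=AUTH, timeout=120)
    resp.raise_for_status()
    return resp.json()["partial"]

def chunks(n, size):
    return [(i, min(i + size, n)) for i in range(0, n, size)]

if __name__ == "__main__":
    work = chunks(10_000_000, 10_000)          # 1000 small pieces of work
    with ThreadPoolExecutor(max_workers=200) as pool:
        total = sum(pool.map(invoke, work))
    print("result:", total)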
Some Technologies you will learn/use:
OpenWhisk (FaaS)
Containers/Docker/Kubernetes/OpenShift
Software Engineering (Agile, Scrum, Git, etc.)
OpenStack/MOC
Parallel Computing
Project Logistics:
Mentors: Sri Krishnamurthy email: s.krishnamurthy@neu.edu
Min-max team size: 4-6
Expected project hours per week (per team member): 6-8
Will the project be open source: no
Preferred Past Experience:
Job handling / concepts of distributed computing - Required
Linux basics - Required
Python, MEAN stack, MongoDB - Valuable
Docker and HPC scheduling - Required
Web technologies for analytics - Valuable
Project Overview:
Background: Researchers typically share their research through forums like https://arxiv.org/. In the past decade, there has been growing interest in replicable research projects to address the so-called replication and reproducibility crisis (https://en.wikipedia.org/wiki/Replication_crisis). In a recent Nature survey ( https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970 ), more than 70% of researchers reported having tried and failed to reproduce another scientist’s experiments, and more than half reported having failed to reproduce their own experiments. QuSandbox ( www.qusandbox.com ) is our effort to address the reproducibility problem by enabling the creation of replicable data science environments. Our technology leverages Docker, many open source and proprietary technologies, and the cloud to embed code and replicable data science environments, enabling researchers, students, and practitioners to create and share projects that can be replicated.
Project Specifics: In this project, we aim to prototype an environment customized for researchers to share their research. For example, think of a researcher creating a new algorithm for text analysis. Let us say a draft paper is published on arXiv and the software is available on GitHub. Our goal is to create a platform where, in addition to GitHub, the researcher can share the code and the environment leveraging QuSandbox's research cloud environment.
Goals of the project:
A web-based user interface through which a researcher can submit the GitHub repository, environment specs, and associated data dependencies.
Evaluate and prototype migration of the Amazon based environment to another cloud vendor (Openstack/Google cloud)
Evaluate and prototype using Kubernetes in addition to Docker.
Sample research pages with QuSandbox research cloud widgets embedded.
https://www.slideshare.net/QuantUniversity/machine-learning-applications-in-credit-risk
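A bare-bones sketch of the "clone a repo, build its environment, run it" flow is shown below using the Docker SDK for Python; the repository URL and the assumption that the repo ships a Dockerfile are illustrative only.

# Sketch: reproduce a research environment from a GitHub repo that ships a
# Dockerfile. The repo URL is a placeholder; error handling is minimal.
import subprocess
import tempfile

import docker

def run_from_repo(repo_url, tag="research-env:latest"):
    workdir = tempfile.mkdtemp(prefix="qusandbox-")
    subprocess.run(["git", "clone", "--depth", "1", repo_url, workdir], check=True)

    client = docker.from_env()
    image, _logs = client.images.build(path=workdir, tag=tag)   # build the declared env
    container = client.containers.run(image.id, detach=True)    # run it unchanged
    return container.id

if __name__ == "__main__":
    cid = run_from_repo("https://github.com/example/replicable-paper")  # placeholder
    print("started container", cid)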
Some Technologies you will learn/use:
Working on challenges encountered by researchers and data science practitioners
Designing applications for the cloud
Using cutting edge technologies for cloud automation
Docker, Python, and distributed computing
Project Logistics:
Mentors: Jiri Schindler email: jiri.schindler@hpe.com; Peter Corbett email: peter.corbett@hpe.com;
Min-max team size: 3-5
Expected project hours per week (per team member): 6-8
Will the project be open source: no
Preferred Past Experience:
using Spark - Nice to have
Java development - Required
Scala development - Desirable, but not required if prior Java experience
Familiarity with provisioning and automation tools (Ansible, Vagrant) - Valuable
Mesosphere DC/OS or other Containers Infrastructure - Valuable
Project Overview:
Background:
The ability to analyze and determine usage patterns across an entire install base, spot anomalies for a specific production system, or predict future resource consumption for a customer deployment is an integral part of any enterprise system in a production environment. These analytics help administrators and IT personnel optimize operations, and help system providers (manufacturers) deliver more efficient and timely support.
The HPE SimpliVity HyperConverged Infrastructure is an on-premises solution that combines management of virtual machines (compute) and storage with application and hardware lifecycle operations (disaster recovery, upgrades, system expansion). A SimpliVity solution consists of a collection of high-end commodity servers with local storage that are organized into clusters. Several clusters comprise a federation spanning different geographies. An entire federation uses a single management/operations console integrated into existing virtualization infrastructure solutions (e.g., VMware vSphere). For more details on HPE SimpliVity, see https://www.hpe.com/us/en/integrated-systems/simplivity.html.
Each system (node) deployed in a production environment periodically sends detailed configuration, telemetry, and performance data to HPE, where it is processed and analyzed. The data is available for internal consumption by various organizations (product planning, architecture, performance engineering, support, etc.). Since the acquisition of SimpliVity by HPE in May 2016, we have embarked on a long-term project to revamp our platform for processing and analyzing the data we receive from the tens of thousands of systems sold to date to our customers. There are several scrum teams working on this effort, and in particular on the integration with existing HPE systems and databases tracking assets, customers, and other information.
Project Specifics:
The focus of the project will be the development of new analyses and data visualizations of the configuration and telemetry data. Some of these may already exist as stand-alone, purpose-built (one-off) tools or scripts developed by support engineers. One of the specific objectives is to re-design and re-implement some of them in a scalable fashion and make them an integral component of our analytics platform: automatically pulling data from a common “data lake” and making the results available to everyone authorized within HPE for inspection.
Another project objective is to work with the mentors (principal investigators) on developing new analyses such as unique data growth, compression and deduplication, resource consumption predictors, and the impact of snapshots and backups on overall storage consumption. The mentors will provide guidance in selecting suitable machine learning and data analysis algorithms and provide the necessary context. Therefore, a background in machine learning or statistical analysis is not required.
The BU/NEU cloud computing course students would form a separate scrum team. They would focus on specific goals and tasks with minimal direct dependencies on the other teams, comprised of full-time HPE employees, that are part of the broader effort and focus on integration with other HPE systems including HPE InfoSight. The student team would follow our agile development processes, with sprints and its own backlog of stories. This separation will allow the students to make progress on their own and cleanly delineate their work from that of the other teams. The students would follow (to the extent that is practical) our practices for developing production code, including writing unit tests and adopting a continuous integration flow with automated deployment of new versions of the code onto a container infrastructure.
The overall effort dictates the selection of technologies, tools, and languages; the students would be expected to adopt them. We leverage open source technologies and tools readily available to everyone without special license, including, among others, Apache Spark, NiFi, Mesos (Mesosphere DC/OS), and Zeppelin. We also use CI tools such as Git, Jenkins, Gradle, and Ansible. The expectation is that the artifacts (code and algorithms) developed by the student team would be directly leveraged within HPE for the production data platform and stay with the company at the conclusion of the project.
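As a small taste of the kind of analysis involved, the sketch below uses PySpark to aggregate telemetry records pulled from a data lake; the input path and field names (system_id, logical_bytes, physical_bytes) are invented for illustration, not the real SimpliVity schema.

# Sketch: per-system deduplication/compression ratio from telemetry records.
# The path and column names are invented; real schemas come from the platform.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("simplivity-telemetry-sketch").getOrCreate()

telemetry = spark.read.json("s3a://data-lake/telemetry/2018/*.json")  # placeholder path

ratios = (telemetry
          .groupBy("system_id")
          .agg(F.sum("logical_bytes").alias("logical"),
               F.sum("physical_bytes").alias("physical"))
          .withColumn("efficiency", F.col("logical") / F.col("physical"))
          .orderBy(F.desc("efficiency")))

ratios.show(20)
spark.stop()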
General information about the specific product and its features.
https://www.hpe.com/us/en/integrated-systems/simplivity.html
Overall effort within HPE that this project would be part of.
https://www.hpe.com/us/en/storage/infosight.html
Some Technologies you will learn/use:
Operating as a scrum team that is part of a larger effort
Development of (private) cloud big data platform applications.
Data analysis and presentation of real-life customer systems data.
Continuous integration and automation practices necessary for production/large scale deployments
Interactions with senior technical executives and other experienced engineers.
Automate the deployment and securing of simple and complex cloud services and applications for self-service consumption in the MOC using CONS3RT
Mentor: Peter Walsh (peter.walsh-at-jackpinetech-dot-com )
Min-max team size: 3-4
Expected project hours per week (per team member): 4-6
Will the project be open source? Yes
Scripting (PowerShell, Bash, Groovy, or any higher level language) (important)
Virtualization concepts (useful)
Web services (nice to have)
"If it is not automated, it is broken" - An axiom that could not be more true in the cloud and security world. Those involved in the design, delivery and use of cloud applications and services prove it to themselves everyday. The work we do consists of many pieces pulled together into an orchestrated dance. Security is often applied late in the cycle as an overhead task rather than being baked in throughout the lifecycle. You may do the same thing on Monday, Tuesday, Wednesday, but then a message comes in or the phone rings and you miss a step. And that is all that is needed for a vulnerability.
Automation is not just about speed; it delivers consistency, repeatability and comprehensive coverage, all of which are paramount to security. SecDevOps is a critical and growing specialty in industry.
CONS3RT gives users a better way to deploy, validate, and secure the infrastructure and systems they field in the cloud. By using a modular approach with a broad library of software and security components - applications, utilities, data sets, lockdowns, configurations, etc. - users can automate the entire lifecycle of their environments. Users leverage this library of resources to automate the execution and securing of anything from simple workstations to complex environments (e.g., 50 or more servers). Everything in the library can be shared and reused "as is" and/or used as the basis for something more.
Working with a set of experienced asset developers, the team will identify options for security elements for fielding systems that impact the user community. After evaluating alternatives, the team will select target components in the area of system and infrastructure security configuration, assessment, and monitoring. Then the team will develop, integrate, and test the assets to make it real. The resulting assets and designs will be shared and used across the CONS3RT community in multiple sites, including the MOC.
CONS3RT is a leading-edge cloud and security orchestration service providing users with DevOps automation and validation. There are several tools that target cloud management and more that provide DevOps, but CONS3RT focuses on cloud usefulness and the strategic needs of DevOps so users can get things done! Users can leverage the Provisioning, Build, and Test as a Service capabilities so their organization can achieve the highest levels of continuous integration, continuous delivery, and the strongest security.
Jackpine Technologies is a Boston-area start-up focused on making the cloud useful and secure. Our mission is to empower organizations to create exceptional software through innovative practices, technologies, and products. We are a small, nimble team with a great collaborative atmosphere. To date, all of our interns have stuck around after graduation, so we must be doing something right. Students participating in this project will have access to some of the industry leaders in cloud DevOps.
SecureCloud orchestration
DevOps principles
Software development discipline
Application integration
Use of APIs
(you can ask questions at https://etherpad.openstack.org/p/dims-k8s-openstack-moc - it’s an “etherpad”, sort of like a google doc, and the mentor will monitor it for questions)
Mentor: Davanum Srinivas, Huawei (email: davanum AT gmail.com)
Min-max team size: 3-5
Expected project hours per week (per team member): 6-8
Will the project be open source? Yes
the Go language (Valuable)
Kubernetes is essentially a container orchestration engine: it is able to coordinate running applications in containers across many machines. Kubernetes is very popular in public clouds but not so much in private clouds. Since OpenStack is the pre-eminent private cloud, and open source as well, we have a unique opportunity to make the private OpenStack-based cloud support the very best, to compete with the giants. So we will start by exploring the level of support for OpenStack in projects like kubernetes-anywhere (https://github.com/kubernetes/kubernetes-anywhere), kops (https://github.com/kubernetes/kops), and kubeadm (https://github.com/kubernetes/kubeadm). Then we kick the tires to see what works and what does not, find and fix problems/issues, and basically ensure that things work.
We will define a set of scenarios that should work, including things like exposing OpenStack Cinder volumes to Kubernetes applications and making sure Kubernetes can use OpenStack Neutron/Octavia-based LBaaS. We will also set up CI/CD systems (possibly at http://openlabtesting.org/) and ensure that the Kubernetes end-to-end and conformance tests run as well, to make sure that all the functionality in Kubernetes works well. Note that this is working across a stack: OpenStack is at the bottom, Gophercloud talks to OpenStack, Terraform uses Gophercloud to deploy IaaS resources, kubeadm can set up Kubernetes clusters, and kops and kubernetes-anywhere use all of the above to give a really good user experience for deployers and operators. So you will be participating in a full ecosystem, all your work will be in public repositories under the Apache License, Version 2.0, and you will easily be able to point future employers to your work. Not to mention, your work will be immediately used by a whole lot of people and is not just for demo. If this floats your boat, let me know.
Some Technologies you will learn/use:
kubernetes - see talks from the recent 2017 summit in Austin - https://www.youtube.com/playlist?list=PLj6h78yzYM2P-3-xqvmWaZbbI1sW-ulZb
openstack - see talks from recent 2017 summit in Sydney - https://www.youtube.com/playlist?list=PLvotnqPM0MjK9IRqdFVpKALX5ImTUZUsk
gophercloud - https://github.com/gophercloud/gophercloud - Go Library that talks to OpenStack API
terraform
https://www.terraform.io/ - high-level configuration language for infrastructure
https://github.com/terraform-providers/terraform-provider-openstack - OpenStack specific provider for Terraform
Project Logistics:
Mentors: Leon Zachery email: lzachery@cisco.com;
Min-max team size: 4-5
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Project Overview:
Connectivity to the cloud and between clouds is an evolving technology area. CI/CD (Continuous Integration/Continuous Deployment) is essential to ensuring robust operation of networking in production cloud environments. A judicious balance between evaluating functional correctness and run times makes the infrastructure more valuable. Often the tests do not provide enough coverage, and passing results are not truly representative of the state of the change sets. At the other extreme, we may have multiple tests that do not add much value in terms of coverage and just result in longer run times. Evaluating hardware-based networking devices adds additional constraints in setting up the deployment to accurately represent the performance benefits of physical devices as well.
Students can consider mapping formal verification techniques onto the problem of constructing representative, optimal test sets, or, for the more pragmatically minded, focus on building tools that use metrics such as code coverage to provide guidance on the usefulness of additional tests. Some of the constraints with hardware devices can be modeled using the ASR1k-based environment present at BU.
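One concrete (and deliberately simplified) direction is shown below: given per-test coverage data, greedily pick a small test set that still covers everything the full suite covers. The coverage map here is a toy example; in practice it would come from a coverage tool run against the change set.

# Sketch: greedy set-cover selection of a minimal test set from coverage data.
# The coverage map is a toy example standing in for real per-test coverage.
def select_tests(coverage):
    """coverage: dict mapping test name -> set of covered code units."""
    remaining = set().union(*coverage.values())
    selected = []
    while remaining:
        # Pick the test that covers the most not-yet-covered units.
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break
        selected.append(best)
        remaining -= gained
    return selected

if __name__ == "__main__":
    coverage = {
        "test_bgp_basic":   {"bgp.c:10-80", "route.c:5-40"},
        "test_bgp_full":    {"bgp.c:10-200", "route.c:5-40"},
        "test_vlan_trunk":  {"vlan.c:1-90"},
        "test_vlan_access": {"vlan.c:1-60"},
    }
    print("suggested set:", select_tests(coverage))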
What the students will get:
- Good Knowledge of CI/CD immediately applicable to industry
Project Logistics:
Mentors: Leon Zachery email: lzachery@cisco.com;
Min-max team size: 4-5
Expected project hours per week (per team member): 6-8
Institution: BU or NEU
Project Overview:
Serverless computing paradigms are useful for low-latency, low-footprint task execution. Augmented reality and many mobile applications require low-latency compute with minimal network response times. Many edge network devices have significant compute capacity that can be leveraged for a quick turnaround. Low-footprint functions are ideal for performing quick computation without compromising network functionality. ASR1k platforms support running containers, and we would like to prototype and investigate use cases that enable edge compute with some applications. Any application can be used as a sample implementation in this environment, with the intent to understand and drive toward generic infrastructure to support serverless environments on an ASR1k.
What the students will get out of this:
- Cloud Computing and Serverless workflows
- Containers and container orchestration.
- Depending on Applications - some understanding of those specific domains.
Project Logistics:
Mentors: Karl Redgate email: karl.redgate@gmail.com, karl.redgate@akamai.com (Akamai Security business unit)
Min-max team size: 3-6
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Preferred Past Experience:
Some networking knowledge (Rather important)
Linux/OSX command line (Valuable)
Node.js/JavaScript (Valuable)
Bash (Valuable)
Project Overview:
There are now several Content Distribution Networks on the Internet from service providers, and some companies are rolling out their own CDNs. We will build our own limited CDN with an integrated Web Application Firewall and an API for provisioning a customer.
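A toy sketch of the two core pieces (a caching edge proxy plus a simple request filter) is given below in Python/Flask. The origin URL and the single blocking rule are placeholders; a real WAF would use a proper rule set, and a real CDN would handle cache expiry, headers, and many edge nodes.

# Sketch: a toy caching edge node with a single WAF-style rule in front of it.
# The origin and the rule are placeholders for a real rule set and CDN config.
import re
import requests
from flask import Flask, request, Response

ORIGIN = "https://origin.example.org"                 # placeholder customer origin
BLOCK = re.compile(r"(\.\./|<script|union\s+select)", re.IGNORECASE)
CACHE = {}                                            # path -> (status, body, content-type)

app = Flask(__name__)

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def edge(path):
    full = path + "?" + request.query_string.decode()
    if BLOCK.search(full):                            # naive WAF check
        return Response("blocked", status=403)
    if path not in CACHE:                             # naive cache fill
        upstream = requests.get(f"{ORIGIN}/{path}", timeout=10)
        CACHE[path] = (upstream.status_code, upstream.content,
                       upstream.headers.get("Content-Type", "text/html"))
    status, body, ctype = CACHE[path]
    return Response(body, status=status, content_type=ctype)

if __name__ == "__main__":
    app.run(port=8000)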
Some Technologies you will learn/use:
Understanding of how cloud services operate
Understanding of HTTP and caching
Experience with deploying services
Experience creating REST APIs
How to run a Scrum team
Mentor: Kristi Nikolla <knikolla@bu.edu>
Min-max team size: 3-6
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Python - important
Linux - important
OpenStack - nice to have
Experience with RESTful services - nice to have
OpenStack is a cloud operating system that manages compute, storage and network resources in a datacenter. Federating multiple OpenStack clouds allows users of one cloud to access resources from other clouds participating in the federation.
The goal of the project is to design and implement a service for orchestrating the management of user authorization in multiple OpenStack clouds that are federated together. This will require learning about and interacting with various cloud services using RESTful APIs.
This would allow a user in one cloud to request a quota of resources in another cloud using a REST API interface.
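As a starting-point sketch, the snippet below authenticates against one cloud's Keystone with keystoneauth1 and then calls a hypothetical endpoint of the new orchestration service to request a quota in a second cloud. Only the Keystone authentication is an existing API; the quota-request endpoint and its payload are assumptions about what the service to be built might expose.

# Sketch: authenticate to the "home" Keystone, then ask a hypothetical
# federation/orchestration service for a quota in a remote cloud.
from keystoneauth1.identity import v3
from keystoneauth1 import session
import requests

auth = v3.Password(
    auth_url="https://keystone.home-cloud.example:5000/v3",  # placeholder
    username="alice", password="secret",
    project_name="research", user_domain_id="default", project_domain_id="default",
)
sess = session.Session(auth=auth)
token = sess.get_token()

# Hypothetical REST API of the orchestration service to be built in this project.
resp = requests.post(
    "https://access-orchestrator.example.org/v1/quota_requests",
    headers={"X-Auth-Token": token},
    json={"target_cloud": "moc-kaizen", "cores": 16, "ram_gb": 64},
    timeout=10,
)
print(resp.status_code, resp.json())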
Key cloud services used to manage public clouds and how to interact with them automatically (specifically OpenStack - the most popular open source cloud management platform and its services & Keystone - the Identity Service)
Identity Federation and Single Sign-On
Background: Drones are opening up entirely new paradigms for commercial businesses. These flying robots are quickly stepping in to do jobs that are dull, dirty and dangerous. The age of robotics is here, and today is like Internet 1995.
Project Specifics: In many applications a drone can do a better job than a human, and one of those is infrastructure inspection. Being a new technology, there are no standards in place, and the results of these inspections can vary depending on the skill and experience of the drone operator. A Rhode Island startup is working to change that. VertSpec is presently developing software that automates the inspection of cell phone towers and power lines. Their web-based system will program any drone to do a precision inspection that can be repeated over time.
Commercial drone operators can contract with VertSpec, download pre-programmed inspection missions, then visit the inspection site and fly the mission. After the mission is complete, the operator needs to send data back to VertSpec to double-check that the inspection is complete. VertSpec needs a way to extract data from the images taken during the inspection and send it back to their cloud service for verification. The data that needs to be extracted from the pictures is standard EXIF data, XMP (DJI metadata), and a thumbnail of each image.
The solution to be provided as part of this cloud computing class project is a low-bandwidth web-based tool to upload JSON objects that include camera, XMP, and EXIF data. The cloud service will process the data and return a result to the drone operator. The result will be either ‘Mission Complete’ or an instruction to re-fly all or part of the mission.
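A minimal sketch of the client-side extraction step appears below: pull EXIF tags and the raw XMP packet out of each image, attach a small thumbnail, and post the resulting JSON for verification. The upload URL is a placeholder, and the DJI-specific fields would still need to be parsed out of the XMP packet.

# Sketch: build the low-bandwidth JSON payload (EXIF + raw XMP + thumbnail)
# for one drone image and send it for verification. Upload URL is a placeholder.
import base64
import io
import json

import requests
from PIL import Image
from PIL.ExifTags import TAGS

def extract(path):
    img = Image.open(path)

    exif = {TAGS.get(tag_id, str(tag_id)): str(value)
            for tag_id, value in img.getexif().items()}

    # XMP (where DJI stores flight metadata) is an XML packet embedded in the file.
    raw = open(path, "rb").read()
    start, end = raw.find(b"<x:xmpmeta"), raw.find(b"</x:xmpmeta>")
    xmp = raw[start:end + len(b"</x:xmpmeta>")].decode("utf-8", "ignore") if start != -1 else ""

    thumb = img.copy()
    thumb.thumbnail((160, 160))
    buf = io.BytesIO()
    thumb.save(buf, format="JPEG")

    return {"file": path, "exif": exif, "xmp": xmp,
            "thumbnail_b64": base64.b64encode(buf.getvalue()).decode()}

if __name__ == "__main__":
    payload = extract("DJI_0001.JPG")
    requests.post("https://vertspec.example.com/api/verify",   # placeholder endpoint
                  data=json.dumps(payload),
                  headers={"Content-Type": "application/json"}, timeout=30)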
At the beginning of time, when dinosaurs roamed the earth, most applications were simple and monolithic: they ran on single machines. Think Minesweeper, Pacman, and the original Doom video game. To debug problems in these simple applications, it was natural to think in machine-centric terms: “What is going on in this machine that is affecting performance of my application?” Many standard techniques that we still use to debug applications today arose from this machine-centric mindset. Some examples include logging, adding printfs to code (also called caveman debugging), and Linux performance counters.
In the present day, applications have become immensely complex. Many are no longer monolithic. Rather, they are distributed: they run on a set of machines (physical or virtual) that coordinate with one another to perform important tasks. For example, a web-search application within Google may store potential search-query results within 10s of 1000s of individual application nodes and query all of them in response to user search requests. This web-search application may also depend on other distributed applications to perform its task (e.g., an advertisement-retrieval application or a distributed storage system, such as GFS or Ceph). It is also worth noting that these distributed applications have become critical to almost every aspect of modern society. We use them when searching (e.g., on Google), shopping (e.g., on Amazon), when watching videos (e.g., using Netflix), and when playing massive multi-player online games (e.g., playing Pokemon Go).
It is critical to provide tools and techniques to help developers understand behaviour of these complex applications and debug problems that arise within them. Looking at individual machines is no longer sufficient for two key reasons. First, knowing the performance characteristics or behaviour of any single node yields little insight about the performance or behaviour of the system as a whole. Second, some types of (potentially problematic) behaviour may be emergent---it may only be observable when the entire distributed system is analyzed as whole. As a step toward analyzing distributed systems as a whole, recent work has created workflow-centric or end-to-end tracing methods. Instead of focusing on individual machines, these techniques capture the flow of the work done to process individual requests within and among the components of a distributed system. Seeing the necessity of such tracing, many large technology companies are starting to adopt end-to-end tracing. Examples include Google with Dapper, Facebook with Canopy, and Uber with Jaeger.
There is a rich opportunity to build on end-to-end tracing to explore how to analyze distributed applications holistically. There are three key challenges. The first challenge is instrumenting complex distributed applications with end-to-end tracing. Doing so requires building up expertise in the application of interest and intimate knowledge of how end-to-end tracing works. The second is understanding what trace data must be preserved for the analysis (or analyses) of interest (preserving all trace data is not possible due to scale/complexity). The third challenge lies in understanding how to analyze, visualize, or create models based on the trace data to provide the required insights. The projects suggested below all revolve around addressing these important and timely challenges.
Mentors: Raja Sambasivan (BU) and Peter Portante (Red Hat)
Useful skills:
Familiarity with the linux kernel
C programming language
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source?: Yes
End-to-end tracing infrastructures capture activity only within applications. But distributed applications running in data centers run atop complex stacks comprised of multiple layers related to virtualization and networking. A problem observed within the application, such as excessively slow response times, may be caused by issues within the application itself (e.g., a poorly-written data structure implementation) or by issues within any layer below it. For example, two virtual machines (VMs or containers) co-located on the same physical machine may contend with each other for resources (e.g., CPU). This might lead to reduced performance for either or both of the applications running within the VMs. Also, recent fixes to lower stack layers (e.g., to the kernel to address Intel’s recent security vulnerabilities) may impact application performance. Extending end-to-end tracing across layers would further enhance its ability to provide insight into the behaviour of the system as a whole. It would give developers insight into these important cross-layer problems so that they can focus their diagnosis efforts where needed.
Project tasks: This project will involve extending end-to-end tracing to lower layers of the stack, specifically the Linux kernel. As part of this effort, we will also trace TCP/IP packets sent between nodes of the distributed application. Our approach will be to convert existing logging mechanisms within the kernel to tracing, thus creating an independent end-to-end tracing mechanism within the kernel. We will then interface it with application-level tracing infrastructures (e.g., using OpenTracing-compatible APIs). We will use the resulting cross-layer traces to analyze where time is spent in the kernel on behalf of application-level requests. Time permitting, we will use our cross-layer end-to-end tracing mechanism to explore issues related to contention between containers co-located on the same machine.
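For orientation, here is a minimal example of emitting nested spans with the Jaeger Python client (OpenTracing API). The span names and the "kernel" tag merely illustrate how kernel-level work could be attributed to an application request; they are not an existing integration.

# Sketch: nested OpenTracing spans via the Jaeger client. Span names and tags
# only illustrate how kernel-side work might be attributed to a request.
import time
from jaeger_client import Config

config = Config(
    config={"sampler": {"type": "const", "param": 1}, "logging": True},
    service_name="cross-layer-demo",
    validate=True,
)
tracer = config.initialize_tracer()

with tracer.start_span("handle_request") as parent:
    parent.set_tag("layer", "application")
    with tracer.start_span("tcp_sendmsg", child_of=parent) as child:
        child.set_tag("layer", "kernel")          # would come from converted kernel logs
        time.sleep(0.01)                          # stand-in for the traced work

time.sleep(2)      # give the reporter time to flush spans to the Jaeger agent
tracer.close()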
How to implement and use end-to-end tracing, which is becoming the de facto method to analyze and debug distributed applications at large technology companies.
A thorough understanding of existing open-source end-to-end tracing infrastructures, Uber’s Jaeger and Mirantis’s OSProfiler.
How to apply machine-learning techniques to derive insights from rich trace data.
Depending on the project, familiarity with commonly-used distributed applications and microservices. Examples include Ceph and OpenStack.
Mentors: Mania Abdi, Raja Sambasivan, Peter Portante (Red Hat)
Useful skills:
C, C++, Python
Familiarity with distributed storage services (e.g., AWS S3).
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source?: Yes
Ceph is an extremely popular open-source distributed storage service used within many cloud environments. For example, we use it extensively within the Massachusetts Open Cloud to store data for our users. It is an immensely complex service consisting of over a million lines of code. A single Ceph deployment may consist of numerous storage nodes and various libraries and gateways to support different types of storage (block, object, filesystem). While Ceph has blkin tracing today, it is not continuously enabled and is fairly rudimentary (it does not capture concurrency or synchronization, or allow critical paths to be extracted). As a step toward understanding important performance and correctness problems in this important service, there is a need to implement an “always-on,” sophisticated end-to-end tracing system within it.
Project tasks: In this project, we will expand on existing efforts to implement end-to-end tracing within Ceph. The goal will be to create an “always-on” tracing system with low overhead. We will drive Ceph with various workloads and use the resulting traces to understand Ceph’s performance characteristics under various scenarios. We will also use the traces to understand performance issues a team at BU working on Ceph has encountered.
Ceph
How to implement and use end-to-end tracing, which is becoming the de facto method to analyze and debug distributed applications at large technology companies.
A thorough understanding of existing open-source end-to-end tracing infrastructures, Uber’s Jaeger and Mirantis’s OSProfiler.
How to apply machine-learning techniques to derive insights from rich trace data.
Depending on the project, familiarity with commonly-used distributed applications and microservices. Examples include Ceph and OpenStack.
Mentors: Peter Portante (Red Hat), Raja Sambasivan
Useful skills:
Some familiarity with basic machine learning techniques
C, C++, or Python
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source?: Yes
Capturing trace data has a cost associated with it. The traces require a certain amount of storage, computation to process them after capture, computation during capture, and network bandwidth to transfer trace-point data from its origin in a distributed application to the trace point store. The higher the frequency, or granularity, at which traces are captured, the higher the associated costs related to capture and storage of those traces. If one has too high a granularity, the distributed system might be negatively impacted by trace capture due to its resource cost, disrupting the behaviors being observed. If one has too low a granularity of trace capture, the traces run the risk of being ineffective or not representative of the system’s behavior, because those traces miss interesting behaviors. Because low granularity traces present little-to-no drain on resources for a distributed system, one can consider capturing such traces continuously without risk of impacting the system. Production systems can then run tracing continually at scale without worry of a performance impact.
Project tasks: Your task is to come up with 2 or 3 different approaches or techniques for increasing the effectiveness of low-granularity traces by mapping many low granularity traces to a much smaller set of high-granularity traces. With such a mapping, one can envision a system where we can determine when to engage a high-granularity trace without having adverse impact on the system while still capturing useful data. We have existing trace data that can be used to start your work. You will have to implement at least one of the approaches within an existing end-to-end tracing system (e.g., Jaeger).
Questions you’ll need to answer:
What trace points do you need to always leave on to do the mapping?
Does the technique break down across different systems or workload behaviors?
How well did the approaches or technique work?
What had to be changed in the approach or technique as it was applied to different applications?
How to implement and use end-to-end tracing, which is becoming the de facto method to analyze and debug distributed applications at large technology companies.
A thorough understanding of existing open-source end-to-end tracing infrastructures, Uber’s Jaeger and Mirantis’s OSProfiler.
How to apply machine-learning techniques to derive insights from rich trace data.
Depending on the project, familiarity with commonly-used distributed applications and microservices. Examples include Ceph and OpenStack.
Mentors: Peter Portante (Red Hat), Raja Sambasivan
Useful skills:
Some familiarity with general networking concepts, HTTP protocol
Helpful to have cursory knowledge of container subsystem concepts
Go, Python, Bash
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source?: Yes
“Kubernetes is an open-source platform designed to automate deploying, scaling, and operating application containers” - in other words, it is a container deployment orchestration system. Being an orchestration system, much of the operation of Kubernetes is implemented in the “control plane”, where Kube commands tell this control plane how Kube should orchestrate its deployment tasks. There are parts of Kubernetes that provide “data plane” features for the application, such as ingress and services. As these data-plane features become part of the flow and behavior of the application, instrumenting them with trace points gives us a cross-layer trace enabling a deeper understanding of an application's behavior.
Project tasks: Your goal is to consider the data-plane sub-systems of Kubernetes and instrument one or more of them to implement cross-layer tracing for Kubernetes. There might be a use case for instrumenting the control-plane of Kubernetes with trace points where the application communicates with Kube to change its operation. We suggest focusing on the Jaeger tracing system for this project. We’ll provide suggestions for Kubernetes applications one can use for tracing.
Advanced options and stretch goals:
Kubernetes
How to implement and use end-to-end tracing, which is becoming the de facto method to analyze and debug distributed applications at large technology companies.
A thorough understanding of existing open-source end-to-end tracing infrastructures, Uber’s Jaeger and Mirantis’s OSProfiler.
How to apply machine-learning techniques to derive insights from rich trace data.
Depending on the project, familiarity with commonly-used distributed applications and microservices. Examples include Ceph and OpenStack.
Aria Tosca Parser and Cloud Orchestrator
Project Logistics:
Mentors: Thomas Nadeau email: thomasdnadeau@gmail.com;
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Preferred Past Experience:
Python coding - Required
Experience with Linux (i.e.: ubuntu, debian, etc...) - Required
Project Overview:
Background:
The AriaTosca (https://www.ariatosca.org) project has two main components: the Tosca language front-end compiler and the back-end orchestrator.
The front end can give students exposure to a world-class compiler as well as the TOSCA language (a YAML-based language).
The back end is used to provision Kubernetes, OpenStack, Azure, AWS, etc. workloads. The code is in Python.
Project Specifics:
Aria is the basis for the very popular Cloudify (cloudify.co) web provisioning system. The base Aria system can be used to orchestrate/provision workloads via various cloud orchestration systems (e.g., OpenStack, Azure, AWS) as well as interface with other emerging ones such as https://www.onap.org.
We have a running podcast series describing the project and some of its components.
https://soundcloud.com/theopensourcepodcast/the-aria-tosca-podcast-episode-1
Check out additional resources on our twitter feed
https://twitter.com/AriaTosca
Aria Tosca is an open source project hosted by The Apache Foundation.
http://ariatosca.org/
Some Technologies you will learn/use:
Open source software process, governance and software development
TOSCA and YAML languages
Python
Cloud orchestration
Mentors: Billy Field email: billy.field@trilio.io;
Min-max team size: 2-4
Expected project hours per week (per team member): 6-8
Will the project be open source: yes
Preferred Past Experience:
API experience is more relevant than deep OpenStack expertise - Valuable
The Trilio platform is built on Python, though students will mainly need to leverage its APIs - Valuable
Linux fundamentals - Required
Security fundamentals - Valuable
Scripting - Valuable
Project Overview:
Background:
Security scans are an integral process for any commercial account, particularly in the financial services industry. Security teams can either scan end-point devices or target data repositories. Trilio is a native OpenStack cloud data protection software technology that creates a snapshot of the production environment, making it easy to restore an entire workload/environment with a single click. Trilio exposes these snapshots to 3rd-party applications so that organizations can use the solution for security, BC/DR, and other purposes.
Project Specifics:
In this project students will learn how to leverage the Trilio APIs in order to integrate with OpenVAS (an open source vulnerability scanning application). Students will also research other 3rd-party security tools such as Nessus and Bandit.
Independent assessment of Trilio technology:
https://www.trilio.io/portfolio/esg-lab-review-trilio-vault-2017/
https://www.youtube.com/watch?v=b0aBde7CIHc
Trilio Content section
https://www.trilio.io/whitepaper/
Some Technologies you will learn/use:
OpenStack fundamentals.
Security fundamentals deployed in real-world.
Third party application integration into next generation data repositories.
Federal/Industry security and compliance requirements.
Data protection in a Cloud environment.
Mentors: Naved Ansari (naved001@bu.edu); Sahil Tikale (tikale@bu.edu)
Min-Max team size: 3-5
Expected project hours per week (per team member): 8 to 10
Will the project be open source?: yes
Python - required
Switches - Nice to have
Knowledge of networking software - Nice to have
REST API - Nice to have
Background:
An increasingly important new category of cloud service is Hardware as a Service (HaaS), where users of a cloud can elastically acquire physical rather than virtual computers. Benefits include security (you don’t have to trust complicated virtualized stacks), performance, determinism (e.g., for performance experiments), and standing up complex higher-level services. The MOC has developed the Hardware Isolation Layer (HIL) both as the basis for a HaaS offering and to allow physical machines to be moved between different services of the cloud.
HIL is a low-level tool that allows users to reserve physical machines and connect them via isolated networks. It manages network switches to provide network isolation and can control nodes using out-of-band management. A system administrator provides HIL with information about the physical resources, such as the list of available machines, their network interfaces, and where (i.e., which switch port) those interfaces are connected.
Project Specifics:
The goal of this project is to develop a service that can query the network switches on behalf of HIL users and ensure that the HIL configuration is accurate and consistent with the actual state of the switches. The two driving use cases for this auditing service are (1) to detect manual modifications of the switch configuration, and (2) to allow external security auditing of the HIL service.
In addition, this service will also help with tracing and SLA maintenance of networks, and will help us identify the network topology, eliminating the need to manually trace network cables. It gives network engineers visibility into the IP, MAC, VLAN, status, and availability of ports, and could become a valuable tool for system administrators and network engineers troubleshooting networking in a data center.
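A small sketch of the audit loop's switch-facing half is shown below, using pysnmp to walk port status over SNMP and compare it with an expected state. The switch address, community string, and the expected-state table (which would really come from HIL's database or API) are placeholders.

# Sketch: walk interface oper-status on a switch via SNMP and diff it against
# the state HIL believes is configured. Switch address, community string, and
# the EXPECTED table are placeholders (the real data would come from HIL).
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, nextCmd)

SWITCH = ("switch01.example.net", 161)          # placeholder
EXPECTED = {"1": "up", "2": "down", "3": "up"}  # ifIndex -> status expected by HIL

def walk_oper_status():
    status = {}
    for err_ind, err_stat, _idx, var_binds in nextCmd(
            SnmpEngine(),
            CommunityData("public", mpModel=1),      # SNMPv2c, placeholder community
            UdpTransportTarget(SWITCH),
            ContextData(),
            ObjectType(ObjectIdentity("IF-MIB", "ifOperStatus")),
            lexicographicMode=False):
        if err_ind or err_stat:
            raise RuntimeError(str(err_ind or err_stat))
        var_bind = var_binds[0]
        oid, value = var_bind[0], var_bind[1]
        if_index = oid.prettyPrint().rsplit(".", 1)[-1]
        status[if_index] = "up" if int(value) == 1 else "down"
    return status

if __name__ == "__main__":
    actual = walk_oper_status()
    for port, want in EXPECTED.items():
        got = actual.get(port, "missing")
        if got != want:
            print(f"port {port}: HIL expects {want}, switch reports {got}")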
Some technologies you will learn/use:
Ethernet switch management
Simple Network Management Protocol (SNMP)
Address Resolution Protocol (ARP)
ping (ICMP)
Nmap (Network Mapper)
Any other switch specific tools
NOTE: There are two separate projects listed in this document. We are looking for different teams of students for each one.
Linux shell scripting
Python (nice to have)
RESTful interfaces
System deployment technologies (e.g., containers) – Project 1
WebApp/GUI development (nice to have) – Project 1
Encrypted storage systems (e.g., LUKS) – Project 2
Secure networking (e.g,. IPsec) – Project 2
HIL (Hardware Isolation Layer) for hardware isolation (network and node isolation)
BMI (Bare Metal Imaging) for the provisioning of nodes
Keylime for TPM-based attestation of the nodes
System provisioning in a bare-metal cloud environment
Today, users of the cloud need to trust the provider not only not to be malicious, but also not to have bugs in the enormously complex software they deploy for virtualization and cloud management. This imposes major barriers for security-sensitive customers (e.g., military, medical, financial) as well as for many open source developers who don’t want to trust the large cloud providers. Is there a way to get the elasticity of clouds while limiting our trust in the provider?
Bolted is a specification for a more secure bare-metal provisioning implementation. The system is designed to be flexible, allowing customers to choose their own desired level of security for their cloud environment (as dictated by needed speed-security tradeoffs) as well as what degree of trust they are placing in the provider.
Currently, bare-metal cloud services do not offer clients the ability to be certain that their nodes are in a trusted state, and cannot guarantee that previous tenants of that hardware have not tampered with the underlying hardware or firmware. Existing systems also do not ensure that nodes are properly sanitized after use, ensuring information cannot be exfiltrated from the nodes by a future tenant. The Bolted design is meant to help address these issues for bare-metal offerings.
A general overview of the Bolted architecture and node life-cycle can be seen in Figure 1. The blue arrows denote state changes in the system, orange arrows represent requests made to the Bolted services and green arrows are actions taken by each service as a result of those requests.
Figure 1. Bolted architecture: blue arrows are state changes, orange arrows are requests to services, and green arrows are actions taken by each service.
Here, the Isolation Service allocates nodes to tenants and configures network routers to isolate nodes from other tenants. The Attestation Service ensures that the code running on the node is correct and trusted (ranging from the firmware code through the OS-level code and even applications running in the OS). The Provisioning Service installs software on the nodes (ranging from bootloaders to the full OS of the node). The Orchestration Service is designed to tie all of these services together so tenants do not need to communicate with the other services directly.
Mentors: Charles Munson, Apoorve Mohan, Naved Ansari
Linux shell scripting (valuable, not hard to learn)
Python (nice to have)
RESTful interfaces (nice to have)
System deployment technologies (e.g., containers) – nice to have
WebApp/GUI development (nice to have)
The goal of this project is to make the Bolted system more user-friendly to deploy. Right now, the Orchestration Service (as seen in Figure 1) provides a set of scripts that a tenant can run in order to bring up, attest and provision their nodes. Although this is more convenient than having to do all of this manually, there is still a lot of room for improvement by offering an integrated experience that more easily allows tenants to perform these actions.
The team will explore different technologies for automating the deployment of the services (e.g., using containers), determine what solution to use, and deploy it for all the different micro-services that make up Bolted. In addition, given time, a user interface will be developed for this project that will allow tenants to bring up and provision their nodes without the need for executing these underlying shell scripts themselves.
HIL (Hardware Isolation Layer) for hardware isolation (network and node isolation)
BMI (Bare Metal Imaging) for the provisioning of nodes
Keylime for TPM-based attestation of the nodes
System provisioning in a bare-metal cloud environment
System deployment technologies (e.g. containers)
The goal of this project is to make it more convenient to set up network and disk encryption for the Bolted system, enabling automatic disk and network encryption (via LUKS and IPsec, respectively).
The Attestation Service is provided primarily by the Keylime software system, which allows the tenant to be sure their node is in a good state before provisioning, and then periodically checks the tenant’s provisioned nodes to ensure they stay in a good state. Currently, if a node ever fails attestation, a notification is sent out to the tenant of the failure and a user-supplied script may be executed (though no other action is taken to prevent the compromised node from causing issues with the tenant’s set of nodes).
For the first part of this project, the team will implement IPsec, which will not only allow nodes to securely communicate with one another in the tenant’s enclave, but will also allow IPsec keys to be automatically revoked upon attestation failure (helping to prevent the affected node from causing damage to other nodes in the system).
Another aspect of this project is provisioning, which is handled by the Provisioning Service, primarily via the Bare Metal Imaging (BMI) system. Currently, BMI stores and sends unencrypted software images to nodes during provisioning, which means it may be possible for people with access to the internal hardware to see these software images (which could potentially contain sensitive information). Note that attestation ensures these images have not been tampered with.
The second part of this project will be focusing on encrypting these provisioning images (via LUKS) in BMI and then decrypting them when they are provisioned on the node.
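A rough sketch of the image-encryption step follows, driving cryptsetup from Python to wrap a provisioning image in a LUKS container. The file names and sizes are placeholders, the commands require root, and integrating this into BMI's image store is the actual project work.

# Sketch: wrap a raw provisioning image in a LUKS container using cryptsetup.
# File names/sizes are placeholders; these commands must run as root, and recent
# cryptsetup versions handle loop-device setup for plain files themselves.
import subprocess

RAW_IMAGE = "ubuntu-base.img"        # placeholder unencrypted image
LUKS_IMAGE = "ubuntu-base.luks"      # encrypted container to store in BMI
KEYFILE = "tenant.key"               # per-tenant key, e.g. released after attestation

def run(*cmd):
    subprocess.run(cmd, check=True)

def encrypt_image():
    # Container sized a bit larger than the raw image to leave room for the LUKS header.
    run("truncate", "-s", "11G", LUKS_IMAGE)
    run("cryptsetup", "luksFormat", "--batch-mode", LUKS_IMAGE, KEYFILE)
    run("cryptsetup", "open", "--key-file", KEYFILE, LUKS_IMAGE, "bmi_image")
    try:
        # Copy the raw image into the opened (decrypted) mapping.
        run("dd", f"if={RAW_IMAGE}", "of=/dev/mapper/bmi_image", "bs=4M", "conv=fsync")
    finally:
        run("cryptsetup", "close", "bmi_image")

if __name__ == "__main__":
    encrypt_image()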
Background: This project concerns international cloud transactions of medical data, with an understanding of the security and privacy implications, utilizing an exciting new Connect Medical Care framework.
Project Specifics: Research and develop concepts for addressing secure international cloud transactions of patients' medical data. The data stream has been “de-identified” from individuals. It is device-level monitoring data, and part of a larger customer-owned collective looking at patient care trends.
Additional areas: encrypted big data and large data object analytics with decentralized data staying with the data owner, privacy controls and understanding of recent European data rights, and technical use cases with infrastructure models.