A Tool Set to Support Big Data Systems Acquisition

Acquiring and developing big data systems is difficult. We developed an approach that reduces risk and simplifies the selection and acquisition of big data technologies.

The Challenges of Big Data

The challenges of big data are daunting. Military operations, intelligence analysis, logistics, and health care are all big data applications whose data is growing at exponential rates. These applications need scalable software solutions that can sustain future operations. For example, the Military Health System operates a database with more than 100 application interfaces that is growing at petascale while providing care for more than 9.7 million people.

With requirements for systems of ever-increasing scale and complexity, the DoD urgently needs to advance its acquisition practices. Without powerful decision-support technologies for these practices, failures like the initial Integrated Electronic Health Record system and the 2,000% cost increases in Defense Health Agency systems noted in a 2014 GAO report on major automated information systems will become common.

Acquiring big data systems presents special difficulties. Complex, rapidly evolving, and non-standardized technologies are built on radically different data models. A development organization must choose a technology early in the architecture design process, and that technology then constrains the design.

Selecting a big data storage and processing technology that best supports your mission needs for timely development, cost-effective delivery, and future growth is not easy. Using these new technologies to design and construct a massively scalable big data system is an immense challenge for software architects and program managers alike.

A Risk-Reduction Approach to Develop Systems That Manage Big Data

We developed an approach to help the DoD and other enterprises develop and evolve systems to manage big data. The approach, known as Lightweight Evaluation and Architecture Prototyping for Big Data (LEAP4BD), helps organizations reduce risk and simplify the selection and acquisition of big data technologies.

Our approach is based on principles from proven architecture and technology analysis and evaluation techniques such as the T-Check and the Architecture Tradeoff Analysis Method. LEAP4BD customizes these techniques to focus on architectural and database technology issues most pertinent to big data systems.

Working with an organization’s key business and technical stakeholders, we follow four steps:

  1. Assess the organization's existing and future data landscape.
  2. Identify the architecturally significant requirements for the system, and develop decision criteria.
  3. Evaluate candidate technologies against quality attribute decision criteria.
  4. Validate architecture decisions and technology selections through focused prototyping and measurement.
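Step 3 can be illustrated with a simple weighted decision matrix. The sketch below is only an illustration, not part of LEAP4BD itself: the criteria names, weights, candidate names, and 1-5 ratings are all hypothetical, and a real evaluation would derive them from the architecturally significant requirements identified in step 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative importance (hypothetical values)


def weighted_score(criteria, ratings):
    """Return the weight-normalized score for one candidate technology."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive value")
    return sum(c.weight * ratings[c.name] for c in criteria) / total_weight


def rank_candidates(criteria, candidates):
    """Return (candidate, score) pairs sorted from best to worst."""
    scores = {
        name: weighted_score(criteria, ratings)
        for name, ratings in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical decision criteria and ratings on a 1-5 scale.
CRITERIA = [
    Criterion("availability", 3.0),
    Criterion("consistency", 2.0),
    Criterion("scalability", 1.0),
]
CANDIDATES = {
    "candidate_a": {"availability": 5, "consistency": 2, "scalability": 4},
    "candidate_b": {"availability": 3, "consistency": 5, "scalability": 3},
}

if __name__ == "__main__":
    for name, score in rank_candidates(CRITERIA, CANDIDATES):
        print(f"{name}: {score:.2f}")
```

A matrix like this only narrows the field; step 4 still calls for prototyping and measurement to validate the leading candidates.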

LEAP4BD is a rigorous method that organizations can use to design enduring big data systems that scale and evolve to meet their long-term requirements. Key benefits include

  • reducing the burden of justification for investments to build, deploy, and operate the application
  • ensuring that the application satisfies its quality attribute requirements, reducing development risks by increasing confidence in architecture design and database technology selection
  • identifying project risks to mitigate in design and implementation, detailed mitigation strategies, and measures for continual assessment

Decision Support for Big Data System Acquisition

Software engineers can choose from a dizzying array of off-the-shelf components for building big data systems. Highly distributed, scalable NoSQL databases have emerged in this space, but their use requires making tradeoffs among quality attributes (e.g., consistency vs. availability).

We developed a tool called Quality at Scale for Big Data, or QuABaseBD (pronounced kay-base-bee-dee). QuABaseBD is a knowledge base that links computer science and software architecture principles to the implementation details needed for six big data technologies. It helps practitioners improve their competency in acquiring and developing big data systems.

QuABaseBD provides decision support for architects who are selecting NoSQL products. It is an interactive assistant that provides reasoning support for aspects that include general quality attributes (expressed as scenarios), architecture approaches and tactics, and features implemented by concrete NoSQL products that realize those tactics. You will find QuABaseBD useful if you ask questions such as

  • What features are available in this database?
  • Which database is best suited for satisfying my data modeling requirements?
  • How do I best use databases that support only eventual consistency to build my application and achieve the data consistency I require?
  • Are the database and design approaches proposed in these responses to an RFP compatible with my application?
  • Which databases are designed for building applications with very high availability?
  • What software design approaches should I consider to make my applications scalable?
  • Which databases are best suited to support each design approach?
  • What are the tradeoffs of each design approach?

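The linkage described above (quality attributes, the tactics that address them, and the product features that realize those tactics) can be pictured as a small lookup structure. The following is a minimal sketch under stated assumptions: it is not QuABaseBD's actual knowledge model or interface, and every attribute, tactic, product, and feature name in it is a hypothetical placeholder.

```python
# Quality attribute -> architecture tactics that address it (hypothetical).
TACTICS_BY_ATTRIBUTE = {
    "availability": ["replication", "automatic failover"],
    "scalability": ["horizontal partitioning"],
}

# Product -> {tactic: feature that realizes the tactic} (hypothetical).
FEATURES_BY_PRODUCT = {
    "product_x": {
        "replication": "peer-to-peer replication",
        "horizontal partitioning": "hash-based sharding",
    },
    "product_y": {
        "replication": "primary-secondary replication",
        "automatic failover": "automatic primary election",
    },
}


def products_for_attribute(attribute):
    """Map each product to the features realizing tactics for the attribute."""
    tactics = TACTICS_BY_ATTRIBUTE.get(attribute, [])
    result = {}
    for product, features in FEATURES_BY_PRODUCT.items():
        realized = {t: features[t] for t in tactics if t in features}
        if realized:
            result[product] = realized
    return result


if __name__ == "__main__":
    for product, features in products_for_attribute("availability").items():
        print(product, features)
```

A query such as "Which databases are designed for building applications with very high availability?" then becomes a traversal from the attribute, through its tactics, to the products that implement them.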
Looking Ahead: Speed Up and Scale Up with Machine Learning

QuABaseBD is a significant step toward improved acquisition of big data systems, but the current manual approach to populating QuABaseBD cannot keep pace with rapidly evolving technology. Machine-learning methods can help address this limitation by mining the online documentation of big data technology platforms and automating the population of the knowledge base.

Using advanced machine-learning methods, such as concept graph learning (developed at the CMU Language Technologies Institute), to automatically update QuABaseBD content will speed the population of data in the knowledge base. Machine learning can enable QuABaseBD to be updated rapidly and allow it to reflect the characteristics of new and evolving implementation technologies.

This machine-learning approach can enable dynamic and up-to-date decision support for DoD acquisition of the next generation of scalable big data systems. For example, QuABaseBD could be queried to enhance proposal evaluation and provide confidence that proposed technology solutions are appropriate choices that support high-priority quality attributes. This approach could also apply to building acquisition decision-support knowledge bases in domains other than big data systems.
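To make the idea of mining documentation concrete, the sketch below flags pages that a curator might want to review. It is deliberately not concept graph learning: it is a simple keyword-density baseline used here as a stand-in, and the term list, threshold, and sample pages are hypothetical assumptions rather than anything taken from the project.

```python
import re

# Hypothetical vocabulary of knowledge-base topics.
TOPIC_TERMS = {
    "replication", "consistency", "sharding",
    "partition", "failover", "quorum",
}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text):
    """Return the fraction of tokens that are topic terms."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in TOPIC_TERMS)
    return hits / len(tokens)


def relevant_pages(pages, threshold=0.1):
    """Return titles of pages whose relevance meets the threshold."""
    return [
        title for title, text in pages.items()
        if relevance(text) >= threshold
    ]


# Hypothetical documentation pages.
PAGES = {
    "Replication guide": (
        "Configure replication and quorum settings to tune consistency."
    ),
    "Release party": (
        "Join us for snacks and music at the annual community meetup."
    ),
}

if __name__ == "__main__":
    print(relevant_pages(PAGES))
```

A learned model would replace the fixed term list with features inferred from labeled pages, but the curator-facing output (a shortlist of pages worth reading) would look much the same.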

Learn More

Machine Learning for Big Data System Acquisition Poster (SEI 2015 Research Review)

October 22, 2015 Poster
John Klein

Tackles the question, "Can we automatically identify relevant document pages that contain the knowledge required for a curator to populate the knowledge base?"

Machine Learning for Big Data Systems Acquisition

October 16, 2015 Presentation
John Klein

Tackles the question, "Can we automatically identify relevant document pages that contain the knowledge required for a curator to populate the knowledge base?"

Architecture Knowledge for Evaluating Scalable Databases

May 08, 2015 Conference Paper
Ian Gorton, John Klein, Albert Nurgaliev (Carnegie Mellon University)

This paper presents a feature taxonomy that enables comparison and evaluation of distributed database platforms and demonstrates it with nine database technologies.

Design Assistant for NoSQL Technology Selection

May 06, 2015 Conference Paper
John Klein, Ian Gorton

This paper presents a knowledge model, its implementation in a semantic platform, and a populated knowledge base for big data system architects choosing a NoSQL database.