"Big data" systems are significant long-term investments that must scale to handle ever-increasing data volumes. They are therefore high-risk applications in which the software and data architectures are fundamental to success.
This one-day course is designed for architects and technical stakeholders such as product managers, development managers, and systems engineers involved in the development of big data applications. It examines the relationships among application software, data models, and deployment architectures, and how specific technology selections affect each of them. While we touch briefly on data analytics, the course focuses on the distributed data storage and access infrastructure and on the architecture tradeoffs needed to achieve scalability, consistency, availability, and performance. We illustrate these architecture principles with examples from selected NoSQL product implementations.
Who should attend?
- Architects and other technical stakeholders involved in the development of big data applications
- Product managers, development managers, and systems engineers
Topics covered:
- The major elements of big data software architectures
- The different types and major features of NoSQL databases
- Patterns for designing data models that support high performance and scalability
- Distributed data processing frameworks
At the completion of the course, learners will understand:
- What "big data" is, how and why it has evolved, and the computer science and software engineering technologies that have emerged to address its complexities
- The basics of distributed systems, including durability, transactional consistency, and replica consistency
- The quality attributes important in distributed systems and how they are achieved in practice
- Specific technologies, such as NoSQL and NewSQL databases
- Data modeling and the common types of data that need modeling
- Performance considerations in data modeling
- Distributed data processing frameworks employed in big data systems, such as Hadoop and its associated Hadoop Distributed File System (HDFS), which support downstream activities
- Emerging distributed data processing frameworks
- Distributed computations with Spark
- Stream processing with Storm
- Architectural issues present when building big data systems
- Big data system design tactics
- Software engineering heuristics to achieve effective, reliable, and scalable software systems
This course has no prerequisites.
Students will receive the complete set of slides and recommendations for related papers and reference materials.
This one-day course meets at the following times:
8:30 a.m.-4:30 p.m.
Training courses provided by the SEI are not academic courses for academic credit toward a degree. Any certificates provided are evidence of the completion of the courses and are not official academic credentials.