NEWS AT SEI
This article was originally published in News at SEI on: May 1, 2008
Yochai Benkler, in his book The Wealth of Networks, puts forth a provocative argument: that we are in the midst of a radical transformation in how we create our information environment. This change is at the heart of the open-source software (OSS) movement but OSS is only one example of how society is restructuring around new models of production and consumption of services. The aspect that is most startling “is the rise of effective, large-scale cooperative efforts—peer production of information, knowledge, and culture ... . We are beginning to see the expansion of this model not only to our core software platforms, but beyond them into every domain of information and cultural production” [Benkler 06]. The networked information environment has dramatically transformed the marketplace, creating new modes and opportunities for how we make and exchange information. “Crowdsourcing” is now used for creation in the arts, in basic research, and in retail business [Howe 06]. These changes have been society-transforming.
So what is the place of architecture in a crowdsourced world? Crowdsourced systems are created via commons-based peer production [Benkler 06]. A “commons” is the opposite of property; the term refers to a set of shared, accessible community resources. Peer production is production that harnesses the creative energies of many self-selecting participants without any financial compensation and lacking a formal managerial structure. The importance of this form of production is undeniable: according to Alexa.com, five of the 10 most popular websites in the world are produced this way (MySpace, YouTube, Facebook, Wikipedia, and Blogger), and with the exception of Wikipedia, all are for-profit enterprises.
There are a number of characteristics of crowdsourced systems—observed in the SEI ultra-large-scale (ULS) systems report [Northrop 06] and in our own surveys of websites and OSS projects—that challenge existing models of system development. Software engineering has long embraced a centralized production model, where requirements are collected and negotiated, projects are managed, architectures are created, and correctness is determined in a controlled, planned process. It is hierarchical and rule-oriented, not commons-based or egalitarian. Even Agile methods are centralized, stressing the importance of face-to-face communication and the advantages of the bullpen—a single open office where workers freely interact.
Crowdsourced systems, however, are community driven and decentralized, with little overall control [Mockus 02]. Consequently, we can no longer design and implement such systems using older models. If systems are constantly in a state of “perpetual beta” [O’Reilly 05], if they derive value from being constantly updated and combined in novel ways, and if their value is in their comprehensiveness and ubiquity, then the new model must reflect this. Examples of fundamental shifts in the logic for system development are:
- Open teams: Assumptions of a closed team of developers who work from a consistent set of requirements must be abandoned. “Based on our usual assumptions about volunteer projects and decentralized production processes that have no managers, [Linux] was a model that could not succeed. But it did,” Benkler observes [Benkler 06]. Even in the for-profit world, the assumption of closed teams is outmoded: as Peter Drucker observed a decade ago, managers must lead and motivate knowledge workers as though they were unpaid volunteers, including them in strategic direction and governance [Drucker 98].
- Mashability: Enormous effort traditionally goes into making systems that are difficult to tear apart, for historical, intellectual-property, and security reasons. However, mashability is a core capability of crowdsourced systems. Web browsers make it simple to view any page’s source, and it is accepted practice to use parts of existing websites in new creations. For example, Google Maps, prior to making its APIs public, was already used by others in their mashups.
- Conflicting, unknowable requirements: While iterative life cycles accept that requirements will change, they still operate under the assumption that, in any given iteration, a team can collect and analyze those requirements. However, requirements in a peer-produced system emerge from its individuals, operating independently.
- Continuous evolution: As a consequence of having constantly changing requirements and non-centralized resources, a peer-produced system is never done, and hence it is never stable. The term “perpetual beta” was coined to describe this new phenomenon. One cannot conceive of its functionality in terms of releases any more than a city has a release. Parts are being created, modified, and torn down at all times. In other words, we must accept change as a constant. Wikipedia entries and Facebook or Orkut applications change from day to day; OSS projects employ a continuous build process [Mockus 02].
- Focus on operations: Historically, life-cycle models have focused on development and maintenance as the activities of interest. However, much of the value of peer-produced systems is that they are as reliable and accessible as public utilities. Clearly, Google, eBay, Amazon, and other popular websites have taken this lesson seriously.
- Sufficient correctness: Completeness, consistency, and correctness are goals that are, to varying degrees, anathema to peer-produced systems. For example, collaborative tagging, while enormously valuable for the semantic web, does not depend upon consistency among the taggers. Wikipedia never claims to be complete or even fully correct. Similarly, “perpetual beta” is an admission and acceptance of ongoing incompleteness in software [O’Reilly 05].
- Unstable resources: Applications that are peer-produced are subject to the whims of the peers. Resources (people, computation, information, and connectivity) come and go. Mockus et al., describing OSS development, noted that such systems “are built by potentially large numbers of volunteers … . Work is not assigned; people undertake the work they choose to undertake” [Mockus 02]. However, large numbers tend to ameliorate the whims of any individual or individual resource.
- Emergent behaviors: Large-scale systems, both computational and biological, exhibit emergent behaviors. This has been noted in traffic patterns, epidemics, computer viruses, and systems of systems [Fisher 06]. Certainly large-scale, web-based applications such as Second Life, eBay, and MySpace have seen complex behaviors emerge that were beyond the vision and intent of their creators, such as Second Life’s tax revolt and the recent eBay seller boycott.
Clearly this new environment requires a new logic for development. To better understand this logic, we distinguish three realms of a crowdsourced project and some example roles within each realm: kernel (architects, business owners, policy makers), periphery (developers, producers/consumers), and masses (customers, end users):
Figure 1: The realms of crowdsourced projects
As the figure indicates, there may be differences in the permeability between the realms. For example, in OSS it is possible to move from the role of an end user to a developer to a kernel architect. In social networking it is effectively impossible for a prosumer to become part of the kernel. Given this model, what is the role of architecture?
The architecture must be divided into a kernel infrastructure and a set of peripheral services, and these are created by different communities using different processes. Kernel services—like the kernels of Linux and Perl, the Apache Core, Wikipedia’s wiki, or Facebook’s application platform—are designed and implemented by a select set of highly experienced and motivated developers who are themselves intense users of the product. These kernel services provide a platform on which subsequent development is based (like Linux’s kernel), a set of zoning rules (like the Internet’s communication protocols), or both (like Facebook’s application platform). The kernel must be highly modular; this allows a project to scale as its community grows while allowing an original visionary developer or team to retain intellectual control [Northrop 06]. The kernel provides the means to achieve and monitor quality attributes such as performance, security, and availability. The design of the periphery is enabled by and constrained by the kernel, using its services and complying with its protocols; but the periphery is otherwise unspecified. This lack of specification permits the unbridled growth and parallel creation at the periphery.
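The kernel/periphery split described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `Kernel` class, the `Service` protocol (a callable from string to string, standing in for the "zoning rules"), and the two sample contributions. It is not drawn from any of the systems named in this article; it only shows a small core that enforces a protocol at its boundary, isolates faulty contributions, and records simple availability and performance data, while leaving the periphery otherwise unspecified.

```python
import time
from typing import Callable, Dict, List

# Hypothetical "zoning rule": a peripheral service is any callable
# that takes a string request and returns a string response.
Service = Callable[[str], str]


class Kernel:
    """Small, stable core: registers peripheral services and monitors them."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}
        self.failures: Dict[str, int] = {}
        self.latencies: Dict[str, List[float]] = {}

    def register(self, name: str, service: Service) -> None:
        # Enforce the protocol at the boundary; nothing else is specified.
        if not callable(service):
            raise TypeError(f"{name!r} does not comply with the service protocol")
        self._services[name] = service
        self.failures[name] = 0
        self.latencies[name] = []

    def call(self, name: str, request: str) -> str:
        service = self._services[name]
        start = time.perf_counter()
        try:
            return service(request)
        except Exception:
            # Isolate a faulty contribution so it cannot take the kernel down.
            self.failures[name] += 1
            return ""
        finally:
            # Record elapsed time for every call, successful or not.
            self.latencies[name].append(time.perf_counter() - start)


# Peripheral contributions, written independently of the kernel team.
def shout(request: str) -> str:
    return request.upper()


def broken(request: str) -> str:
    raise RuntimeError("contributed code with a bug")


kernel = Kernel()
kernel.register("shout", shout)
kernel.register("broken", broken)

print(kernel.call("shout", "hello"))          # HELLO
print(repr(kernel.call("broken", "hello")))   # ''
print(kernel.failures)                        # {'shout': 0, 'broken': 1}
```

In this sketch the kernel knows nothing about what a contribution does; it only checks compliance and observes quality attributes, which is what lets peripheral authors work in parallel without coordinating with the kernel team.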
Similarly, requirements must be bifurcated into:
- kernel-service requirements that, in and of themselves, deliver little or no end-user value (Linux’s kernel, Wikipedia’s wiki, Facebook’s open platform, BitTorrent’s P2P network)
- periphery requirements, contributed by the peer network (the prosumers), that deliver the majority of the end-user value: YouTube videos, Wikipedia entries, Firefox add-ons, Facebook applications
The nature of the requirements in these two categories is different: kernel-service requirements are about quality attributes and their tradeoffs, while periphery requirements are about functions that end users can perceive.
Finally, implementation is also bifurcated: the vast majority of implementation is crowdsourced but the crowdsourcing model applies only to the periphery. A distinct group needs to implement the kernel and this group will be close-knit and highly motivated. As Mockus has noted of OSS projects: “developers are working only on things for which they have a real passion” [Mockus 02]. The periphery will develop at its own pace, to its own standards, using its own tools, releasing code as it pleases.
What are the implications of this model for software development? For some projects, there are none. Not all projects will take advantage of crowdsourcing. Some projects will be deemed high security, or highly proprietary, or will simply have too much legacy to take advantage of this model. However, there is an increasingly important class of projects to which this model applies, and we need to understand, plan for, and analyze the architectures of such systems. In those projects the kernel architecture must be built by a small, experienced, motivated team that focuses on modularity, core services, and core quality attributes to enable the parallel activities of the periphery.
Benkler, Y. The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press, 2006.
Drucker, P. “Management’s New Paradigms.” Forbes, Oct. 5, 1998, 152-177.
Fisher, D. An Emergent Perspective on Interoperation in Systems of Systems (CMU/SEI-2006-TR-003). Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2006.
Howe, J. “The Rise of Crowdsourcing.” Wired 14 (June 6, 2006).
Mockus, A.; Fielding, R.; & Herbsleb, J. “Two Case Studies of Open Source Software Development: Apache and Mozilla.” ACM Transactions on Software Engineering and Methodology 11, 3 (July 2002): 309-346.
Northrop, L.; Feiler, P.; Gabriel, R.; Goodenough, J.; Linger, R.; Longstaff, T.; Kazman, R.; Klein, M.; Schmidt, D.; Sullivan, K.; & Wallnau, K. Ultra-Large-Scale Systems: The Software Challenge of the Future. Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2006.
O’Reilly, T. “What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software.”
About the Author
Rick Kazman is a senior member of the technical staff at the SEI, where he is a technical lead in the Architecture Tradeoff Analysis Initiative. He is also an adjunct professor at the Universities of Waterloo and Toronto. His primary research interests within software engineering are software architecture, design tools, and software visualization. He is the author of more than 50 papers and co-author of several books, including the book titled Software Architecture in Practice. Kazman received a BA and MMath from the University of Waterloo, an MA from York University, and a PhD from Carnegie Mellon University.
The views expressed in this article are the author's only and do not represent directly or imply any official position or view of the Software Engineering Institute or Carnegie Mellon University. This article is intended to stimulate further discussion about this topic.