Software Engineering Institute | Carnegie Mellon University
Software Engineering Institute | Carnegie Mellon University

Implementation Framework – Collection Management


The Software Engineering Institute (SEI) Emerging Technology Center at Carnegie Mellon University studied the state of cyber intelligence across government, industry, and academia to advance the analytical capabilities of organizations by using best practices to implement solutions for shared challenges. The study, known as the Cyber Intelligence Tradecraft Project (CITP), defined cyber intelligence as the acquisition and analysis of information to identify, track, and predict cyber capabilities, intentions, and activities to offer courses of action that enhance decision making.

One of the most prevalent challenges observed in the Cyber Intelligence Tradecraft Project (CITP) was that organizations were inundated with data. One company noted that they subscribed to multiple feeds from third party intelligence vendors, only to discover that the vendors largely reported the same information. Further compounding the problem was that the same information would be reported from multiple government sources a couple days later. Other organizations collected information without necessarily understanding why or what to do with it; an organization surveyed in CITP was capturing passive domain name system (DNS) information, but deleted it before any analysis was performed on the data because it took up too much storage space.

These examples highlight the necessity for a process to organize and manage data gathering efforts. Knowing where to get information and what type of information is needed ensures that key stakeholders, most importantly leadership, are provided the information needed to enhance their decision making. In government and military intelligence organizations, this process is called collection management. Throughout the course of the CITP research, it became apparent that commercial organizations were employing some facets of collection management into their cyber intelligence processes. Most borrowed the government terminology and identified it as collection management, although some did use variations of this term (prioritization management or requirements management). For the purposes of this framework, we will use the term collection management, and define it as the process of overseeing data gathering efforts to satisfy the intelligence requirements of key stakeholders.

Three core aspects lie at the heart of the collection management process: a requirement, the actual data gathering, and analysis of the data to answer the requirement. As organizations become more advanced in their implementation of this process, more factors need to be considered. This implementation framework divides these factors into three levels: basic, established, and advanced. Not every organization needs to employ an advanced collection management process – it will be up to individual organizations to assess what their needs are and pick the right mix of factors for themselves. What this framework provides is a method for adding additional rigor to the three facets of collection management through the combination of best practices in use by CITP participants.


Download this chart of the Collection Management Implementation Framework


Basic Collection Management:

Organizations looking to operate a basic collection management program should develop the three core pillars of collection management: requirements, collection operations, and analysis.

Collection Management in Practice
A CITP participant in the IT sector assigns one senior analyst to run collection management specifically for their open source intelligence. The analyst looks for duplication of data, tracks sources to ensure they are still relevant, and feeds data to the various business units (such as product development) based on each unit'secific needs.

At the basic level, a requirements process requires the organization to have a mechanism in place to collect and document the intelligence requirements of its stakeholders. Traditional stakeholders include leadership, business units or operational units, network security, and intelligence analysts. The mechanism to collect these requirements can be formal or informal (e-mail submission or captured in weekly meetings), but stakeholders need to know how to request information.

Collection operations is the process of taking requirements and marrying them up to available data. For a basic collection operations process, the key is for the organization to understand its access to data sources and the location of their knowledge gaps. Acknowledging the existence of these gaps and understanding where they are is as important as knowing what data can be obtained internally and ensuring the most relevant data is being made available for analysis.

Analysis and reporting is the process of taking the collected data and turning it into intelligence by fusing sources, adding context, and providing analysis. The resulting report aids the decision making process for leadership and other stakeholders.

Indicators of Success

  • Stakeholders in the organization know who to go to for intelligence needs and how to submit their requests
  • The organization understands what data is available and that some requirements cannot be met using existing sources
  • The organization has identified duplicative or irrelevant data sources.

Established Collection Management:

Organizations that operate a more sophisticated collection management process build off the capabilities established in the basic process. Adding additional rigor to the requirements process and assessing the quality of the source data facilitates answering the right questions with the best available data.

Collection Management in Practice
A CITP participant in the financial sector works closely with leadership to help determine priorities. The CIO often lets the Threat Intelligence branch know what keeps him up at night, and the Threat Intelligence branch has worked closely with senior officers in risk management, drawing out their questions, to formalize intelligence requirements.

As stakeholders learn where to go for their intelligence needs, the likely outgrowth is an abundance of requests that can overwhelm the intelligence analysis and reporting capacity of the organization. Established collection management programs realize that not all requests are created equal; some requests have a higher priority than others. At this intermediate stage of development, the first step is to separate and prioritize leadership'squirements from the rest of the requirements. Depending on the organization, the definition of "leadership" will fluctuate, from C-level officers to directors of security, business unit management, or directors of federal agencies. The key is to identify what level of leadership receives this priority status and designate their requests as Priority Intelligence Requirements (PIRs). PIRs should be evaluated for two factors: what does leadership need and when do they need the information. What does leadership need entails identifying the information that leadership requires in order to support critical decision making. Once captured, it is important to determine when this information is needed. Are these onetime requests from leadership? Are they standing requirements? Is the requirement so critical that once there is data to support an answer, it must be reported urgently? Armed with the knowledge of what information is required and when it is needed, an overall priority can be assigned to the PIR vis-à-vis the other PIRs in the system. This priority aids in determining what resources are needed in the collection operations process.

Evolving collection operations from the basic level requires assessing and managing the previously identified data sources. Collection managers at this level should ensure there are enough data sources to provide redundancy (if needed) and corroboration of data. This can be accomplished through a simple matrix that maps PIRs to the sources of available data. The matrix is an easy way to visualize if there are sources available to answer the PIRs and if there are multiple sources that, when correlated, provide a higher level of confidence in the data. The matrix also shows gaps in sources. These gaps can be outsourced to third party intelligence service providers or used as justification to field new hardware, personnel, or tools to collect the information. When complete, the matrix will show internal data sources, open sources, and third party sources.

The internal data sources mostly will be network-based, such as intrusion detection system (IDS) logs, proxy logs, host-based logs, netflow, etc. The key considerations for the network-based data are affordability, availability, and ensuring the data being collected is valid. Organizations should ensure that the cost of collecting the data is justified. Similarly, if additional network insight is required to answer a PIR, the same calculus can be applied to justify the purchase. If a sensor is in place, it is important to monitor its availability. Planning for system maintenance, downtimes, upgrades, and unscheduled outages will help collection managers prepare for periods when information will not be available and other sources may be needed. Lastly, the collection manager should validate the reliability of the system and ensure that it is collecting the right data.

Open source and third party data fill out the data source matrix. The key concepts to consider here are if outsourcing is affordable and makes sense. Is the information that is being outsourced to a third party also freely available in open source? If so, does the organization have the personnel to collect and analyze it?

Advancing analysis and reporting from the basic level simply requires the analyst to identify if the requirements are being met with the data that is being collected. In many cases, the requirements are partially answered, so additional or continued collection is necessary. In other cases, the data provided did not address the PIRs and so the collection manager has to re-evaluate which sources of data are best suited to answer the PIRs. When requirements are met, then the task can be closed out. Ideally, organizations will track which PIRs have been closed so that if the issue re-emerges in the future, the organization has data already available and knows where to get new information if required.

Indicators of Success

  • The organization tracks satisfied requirements so data/sources can be re-used if the same issue arises again
  • The organization maintains an updated matrix depicting sources of data from internal, open source, and third party sources
  • The organization can validate and justify the costs of third party intelligence providers/consortiums
  • The organization maintains a prioritized list of leadership's intelligence requirements

Advanced Collection Management:

Organizations looking to field advanced collection management processes will invest additional resources in managing data sources and prioritizing intelligence requirements from multiple stakeholders, including analysts themselves.

With PIRs from leadership defined and prioritized, the next step is to incorporate the needs of other stakeholders. One requirement commonly overlooked in the commercial sector was the analysts. Organizations operating a more sophisticated collection management process must take into account the PIRs of the intelligence analysts, and apply the same standards of organization and prioritization as PIRs from other stakeholders. Analysts must assess the priority of their requirements, as well as the urgency with which it is needed. With PIRs being generated and prioritized for all the organization's stakeholders, it is important to validate the requirements on a regular basis. If the requirement is still valid, the priority may have changed over time. Validating the requirements helps ensure that the data collection is focused on the most relevant needs of the stakeholders.

Collection Management in Practice
A CITP participant in the IT sector aligns its analysts' intelligence requirements with the company's five year strategic plan. This allows them to stay proactive and relevant to the business units as the company's goals evolve.

The other process that evolves is collection operations. The first big step from the previous capability is validating third party collection and evaluating their provided data. Information coming from third party providers usually can be tied to a discrete collection capability that can be validated to confirm it has the right level of access and assess its history of accurate reporting. This ties closely to evaluating the data that comes from third party providers. Just as with an organization's own data collection, the usefulness of the third party data needs to be evaluated, which helps justify the costs to obtain it. Lastly, organizations should determine if the third party source can be directly controlled. If so, it can be integrated into the organization's own collection matrix and adjusted as needed.

Organizations adopting an advanced collection management process must look at data sources beyond network data. These sources of information include data from business units, physical security, or even human sources. As with all other data gathering, the sources need to be validated and the information needs to be evaluated to ensure that accurate and actionable information is being provided to the analysts. Tracking the availability of these information sources can be difficult, but certain events may be recurring that are easier to monitor, such as conferences, meetings, or overseas business travel.

The last concept to consider is tipping and queuing. This is the idea that collecting information from the data sources can be improved by coordinating activities between multiple sensors, or if certain events cause the collection to begin or end. If configured correctly, tipping and queuing makes the collection apparatus more refined, and provides a smaller set of highly relevant data for analysts to work with in their daily operations.

Indicators of Success

  • More cost effective data gathering
  • Greater understanding by the analysts/data collectors as to what information is actually needed
  • Improved and more efficient coordination with third party intelligence providers