Velocity by Booz Allen

Velocity studies the complex issues that are emerging for mission and technology leaders on the front lines of government innovation.

Insights for Federal Innovators V2. 2024

The Age of Principled AI

PG. 36

Contents

DEPARTMENTS

Letter From the CTO Susan Penfield

The Final Word Horacio Rozanski

Page 36 “A year from now my hope is that Responsible AI will emerge as the cornerstone of all AI. Every person that touches AI in some capacity will have a role in accountability, transparency, and fairness embedded in AI systems and organizations.”

— Navrina Singh, founder and CEO at Credo AI

ABOUT THE COVER Cover art was designed by Brody Rose. It uses AI-generated imagery from Adobe Express and Adobe Photoshop’s AI Generative Fill function. Its imagery represents optimism through change, with the protea flower as a symbol of resilience and strength in transformation.

A NOTE TO READERS The ideas and opinions contained herein are those offered by the individual authors. They are intended as considerations in the associated technical areas discussed. They do not necessarily represent the firm’s views, but offer the breadth, depth, and currency of Booz Allen’s technical talent and that of our partners. References to any products or companies are not meant to be an endorsement by Booz Allen or any other contributors.

FEATURES

28 EQUITY AI for Everyone Four Pillars for Fostering a Diverse, Equitable, and Inclusive AI-Driven Future John Larson and Ramon Hill

30 TALENT A Workforce Disrupted? Empowering Employees to Embrace AI, from Adaptation to Adoption Betty Thompson, Joe Rohner, Julie McPherson, and Logan Gibson

36 HUMANITY The Age of Principled AI Exploring Risk, Responsibility, and Possibility John Larson and Geoff Schaefer

42 CIO CORNER Navigating IT Through AI Adoption Q&A With Brad Stone, Chief Information Officer and Chief Data Officer, Booz Allen

46 PERFORMANCE Unlocking Human Potential Personalized Intelligence Enhances Cognitive and Mission Performance Munjeet Singh, Cameron Mayer, and Dave Prakash

52 ENVIRONMENT The Physical Impact of Data Achieving Both Computational Power and Sustainability Brianna Hogan

58 CRITICAL SERVICES The Emerging Citizen Experience Advantages and Challenges of Human-Machine Service Delivery Will Healy, Ernest Sohn, and Santiago Milian

Page 28 “Cultivating diverse talent isn’t just about filling seats—it’s about ensuring that the AI systems of tomorrow are built with the collective wisdom of our entire society, reflecting the richness of our shared experiences and values.”

— Alex Kotran, co-founder and CEO at aiEDU

MISSION SPOTLIGHTS

06 SPACE Linking Large Language Models for Space Domain Awareness Advancing AI to Ensure Freedom and Safety in Space Ron Craig and Michelle Harper

15 INTELLIGENCE Open-Source Overload Mobilizing Publicly Available Data for National Security Eric Zitz and Gabi Rubin

68 CLIMATE From Science to Practical Climate Resilience The Vision, Barriers, and Novel AI Approaches to Operationalize Climate Science Prachi Sukhatankar

TECHNICAL COLUMNS

11 COMPUTING TECHNOLOGY Can Quantum Supercharge AI? Quantum Computing’s Role in the Evolution of AI Isabella Bello Martinez, Ryan Caulfield, and Brian Rost

19 EMERGENT BEHAVIOR Generative AI in the Wild Deploying Powerful Mission Applications with Purpose Ted Edwards and Alison Smith

24 PROCESSING Managing Edge AI Sprawl Simplifying Complexity Beyond the Enterprise Brad Beaulieu, Beau Oliver, Josh Strosnider, and Rebecca Allegar

64 CYBERSECURITY A Digital Tapestry Weaving AI into Unified Cyber Defense Patrick Myers and Aaron Sant-Miller

74 SOFTWARE DELIVERY Code Writing Code The Next Major Disruption in Software Development Jarid Cottrell, Josh Gordon, Michael Martoccia, and Sahil Sanghvi

FROM THE CHIEF TECHNOLOGY OFFICER

I am excited to welcome you to the second edition of Velocity, our annual publication for government and industry leaders to explore emerging issues at the intersection of mission and technology. In this edition, we’re placing the spotlight squarely on artificial intelligence (AI)—from adopting and deploying this uniquely powerful technology to using it responsibly (see our cover story on page 36) and channeling it effectively for mission impact.

Advances in generative AI dominated headlines this year, as we crest the latest AI hype wave. Machines rapidly creating language and images democratized a sense of AI’s possibilities and risks. While there’s uncertainty about how the future of this technology will unfold, we do know that AI is categorically different from other past and emerging capabilities. Most important is its extreme ubiquity—the way it is poised to reshape all areas of human life.

At this open-ended AI moment, I invite you to explore the insights we’ve gathered from experts across the industry landscape as we seek to expand and deepen our knowledge through fundamental questions:

• How will AI reimagine everything, from the global workforce to sustainability practices—and how should organizations proactively prepare, adapt, and respond?
• How will agencies across defense, intelligence, and civil missions harness emerging tools to accelerate capabilities and create decision advantage?
• What advances in cybersecurity, edge computing, and quantum computing should be top of mind for government innovation?

I hope you will find ideas and insights throughout Velocity that your organization can investigate further throughout your AI journey. As this collection demonstrates, the way we approach, build, and deploy AI to the mission today will set agencies on the path to meaningful, accelerated, and long-term impact. In this era of dramatic change, I believe we are well positioned to harness this singular enabler for good and to come together for responsible innovation. Thank you for joining the conversation.

Susan Penfield Chief Technology Officer Booz Allen

Booz Allen’s AI Adoption Studio in Washington, DC

MISSION SPOTLIGHT: SPACE

Linking Large Language Models for Space Domain Awareness
ADVANCING AI TO ENSURE FREEDOM AND SAFETY IN SPACE
Ron Craig and Michelle Harper

LAUNCHING AN ERA OF HIGHER RISK
As space capabilities continue to accelerate, risk will rise along with reward.
Expected satellite growth: The total number could top 60,000 by 2030.
Estimated space junk: More than 100 trillion pieces may already be orbiting.
Expanded risk: Even tiny paint flecks can damage a satellite.
SOURCES: Phys.org: “Scientists Call for Global Push to Eliminate Space Junk”; NASA.gov: “Space Debris and Human Spacecraft”

Crowded orbits, fast-moving adversaries, the number of active satellites expected to grow to more than 60,000 by 2030—space leaders have a challenge only advanced AI can address. Space domain awareness, the practice of tracking and understanding factors that can affect U.S. space operations, requires integration and calculations beyond current capabilities. We find that a new approach, networking large language models (LLMs), can accelerate capabilities across an enterprise of global partners.

The U.S. Needs to Move Faster in the New Space Race

Once envisioned as a sanctuary for exploration and other peaceful pursuits, space is now increasingly “competitive, congested and contested.” Rival nations are developing anti-satellite tactics and weapons, militarizing the domain even as a new commercial space industry is booming—centered on initiatives from selling space data to building rockets to further NASA’s mission to Mars. Moreover, proliferated satellite constellations are being rapidly developed and launched in clusters. The first megaconstellation was launched in 2019; four years later, these groups already make up more than half of active satellites. A single network can contain hundreds or even thousands of satellites—most notably Starlink, which plans to expand its fleet to as many as 42,000. This significantly crowds a domain that has become critical for daily living. From national security to climate science, communications to traffic directions, we depend on satellite services to be there when we need them. All these dependencies create an urgency to adopt innovations and strategies that will ensure the U.S. and its allies stay at the forefront of space—and knowing the location of space assets and their operators’ intent is foundational to that goal.

Celestial Chess: A Life-or-Death Game

Staggering as it is to contemplate a trillion objects traveling through space, the challenge isn’t just about the computational power required. It’s about transforming a process where operators manually track data on multiple screens into a system capable of integrating complex datasets, automating processes, and applying advanced algorithms to add a new level of precision—enabling predictive analytics and, ultimately, recommended courses of action.

The complexities can be compared to a game of chess—a game computers have become famously good at—played in multiple dimensions, with decisions made at split-second speed. Some of the factors:

Objects need to be tracked in multiple orbits. Most commercial satellites, human space missions, and the International Space Station are in low Earth orbit (LEO), a regime that extends to about 2,000 km. Higher orbits, such as medium Earth orbit (MEO) and geostationary orbit (GEO), host navigation, weather, communications, and national security satellites. NASA’s Artemis program and robotic adjuncts from multiple nations generate more traffic between Earth and the Moon. Operating safely in this region of cislunar space requires new technologies and tactics for object detection, forecasting, and collision avoidance.

Data pours in from multiple sensors from multiple sources. This results in a profusion of siloed datasets in multiple formats that need to be ingested and processed, with granular security applied. Although the increased number and diversity of data sources improve our ability to perform space domain awareness, it also introduces a data fusion challenge as multiple data sources with different formats and reference frames must be integrated in real time.

Some of the most critical datasets are classified, requiring laborious manual processes to share across domains. Although technical cross-domain solutions exist, they often can’t keep pace with new file types and data structures and lack the resiliency to keep operating under stress or adversarial attack.

Modeling threat scenarios requires a vast amount of data to train algorithms. As space defense is a relatively new area, data is scarce and, in some cases, doesn’t yet exist. Therefore, generating synthetic data is a necessary step, with attendant responsibilities, such as ensuring algorithms are free of bias.

Tracking the Future: The BRAVO Hackathon

LLMs—deep learning algorithms that generate content and perform other complex functions using very large datasets—exploded in popularity following the launch of OpenAI’s ChatGPT in the fall of 2022. While most users have been experimenting with creating poetry, writing essays, or paraphrasing information, Booz Allen has been exploring new applications for LLMs across space-domain applications. The capability to network LLMs was demonstrated in spring 2023 at the Air Force’s BRAVO hackathon, a multi-classification event drawing over 400 experts to compete in prototyping data solutions for pressing problems. The award for best data visualization and the Best User Interface award went to a team that linked two LLMs in a classified environment using zero trust protocols. The hackathon gave the team a chance to give ensemble modeling—a process for improving multiple diverse algorithms to arrive at an outcome—new power by networking LLMs rather than individual algorithms. This opened a new path to generate the fast, comprehensive answers required to move space operations with speed and accuracy. It also provided two-way communication between specialized LLMs to amplify space operators’ awareness.

After rapidly deploying a user interface, the team deployed two LLMs and wrote an app that allowed them to talk with each other (see Figure 1). The first LLM was trained in radar sensor data, while the other was trained in Earth observation (EO) imagery. The team executed a scenario where the team member, acting as operator, asked the first LLM to watch a certain area in Asia and send an alert if anything of interest was found. No special codes were needed; the operator simply typed the request as if texting a colleague. In practice the request could have been activated another way, according to operator preference; for example, via voice recognition. The first LLM, designated as moderator, located a radar image and asked the second model if it had any data. The second LLM, trained in EO, responded that it did and sent the image along. The first LLM then delivered both images to the operator along with a message saying, essentially, “I found a radar image at that location and retrieved an EO image at that same location.” The process was simple and streamlined for the human partner.

“Software building conferences like BRAVO allowed us to push, and sometimes stumble, on some interesting solutions,” said Collin Paran, the AI solutions architect who led the Booz Allen team. “Linked, multimodal, and networked AI with two-way communication will certainly unlock more insights for different organizations.”

Why Networking LLMs Dramatically Increases Space Awareness

The synergy of LLMs working together and delivering ever more detailed, insightful results makes this approach significant. Say you have a mission-focused LLM trained on the Space Command’s catalog of objects in orbit, the Unified Data Library. Imagine you network it with an LLM trained in avoiding collisions, called conjunctions; one trained on radar data; and another trained on a military intelligence database of adversarial threats. You’ve conducted skilled training and testing, and you’ve been using the system for increasingly critical tasks.

Now the system is deployed on a mission where the operator wants to understand the behavior of a particular set of satellites. The primary LLM identifies the satellites and, equipped with past data, knows that the last 40 times that constellation passed in that orbit in that configuration, the satellites moved slightly closer to a U.S. Space Command satellite. Because the LLM is also configured with the other models, it can query those LLMs on the behavior as well. As a result, the primary LLM can inform the operator that this constellation has recently been flagged for moving out of its orbit and its speed is increasing just enough to create that devastating conjunction. And thanks to information from the LLM trained on adversarial threats, the primary LLM could also queue up details for the operator identifying tactics that could be at play.

Going one step further, we can imagine one of the LLMs is trained in the physics of the problem. The primary LLM could use this capability to extrapolate possible scenarios and courses of action—perhaps recommending a maneuver that uses less fuel or is less disruptive to the orbits of other satellites. The human remains in charge and is empowered to make a more confident decision, faster.

Knowledge gained on this encounter will feed into the primary LLM’s learning process. And because it is linked—nested with the other LLMs to provide hierarchical, context-based learning—it gains knowledge from them at a massive rate. Its information is continuously updated as the nested LLMs are trained and validated on the latest feeds in their vast database. Every event helps the system become steadily more intelligent, creating an ever more sentient space domain awareness capability.

Adding Classified Data—and What It Takes to Do It

The crowding of space requires higher accuracy in tracking and predicting space movement, requiring data from all sources—especially classified data, traditionally difficult to share. A critical aspect of the linked approach is that networking LLMs has been demonstrated in a secure enclave, using both classified and unclassified data. The innovation can be taken to space organizations by teams with a development environment built on open architectures, infrastructure that leverages government-owned technologies, zero trust architecture, and experience providing flexible modernization for government missions.

For example, networking LLMs requires the same granular security policies as military initiatives, such as Joint All-Domain Command and Control (JADC2). Automated DataOps ensures the onboarding of diverse feeds, standardizing formats and enforcing data standards and policies plus providing granular security. Common tools and interoperable technologies simplify development. And cross-domain solutions ensure automated workflows with modular elements that can be adapted for mission demands.

Figure 1: Networking LLMs demonstrated at the BRAVO hackathon. Two specially trained models shared information and seamlessly delivered a report to the operator.

[Diagram: The operator asks LLM #1 (radar database, fed by a radar satellite), “Monitor this area on the ground and alert me if you find anything.” LLM #1 and LLM #2 (EO database, fed by an EO satellite) can talk to each other. LLM #1 asks, “LLM #2, do you have anything else at this location?” LLM #2 replies, “I have an image of an object at that location.” LLM #1 reports to the operator, “I found a radar image at this location... and I also retrieved an Earth Observation image of the same location.”]
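The exchange in Figure 1 can be sketched in a few lines of code. This is a minimal illustration only, not the team's implementation: the class names (`SpecialistModel`, `Moderator`), the location key, and the image file names are all hypothetical, and simple dictionary lookups stand in for the trained models and their databases.

```python
from dataclasses import dataclass, field


@dataclass
class SpecialistModel:
    """Stand-in for a specialized LLM backed by its own data store (hypothetical)."""
    name: str
    modality: str
    database: dict = field(default_factory=dict)  # location -> image identifier

    def query(self, location: str):
        # A real model would interpret free text; this stub does a direct lookup.
        return self.database.get(location)


@dataclass
class Moderator:
    """The model the operator talks to; it also asks its peers for matching data."""
    own: SpecialistModel
    peers: list

    def handle(self, location: str) -> str:
        found = self.own.query(location)
        if found is None:
            return f"Nothing of interest found at {location}."
        parts = [f"I found a {self.own.modality} image at {location} ({found})"]
        for peer in self.peers:
            extra = peer.query(location)  # two-way exchange: moderator asks, peer answers
            if extra is not None:
                parts.append(
                    f"and retrieved an {peer.modality} image at the same location ({extra})"
                )
        return " ".join(parts) + "."


# Illustrative data only; the identifiers below are made up.
radar = SpecialistModel("LLM #1", "radar", {"area-7": "radar_0042.png"})
eo = SpecialistModel("LLM #2", "EO", {"area-7": "eo_0917.png"})
moderator = Moderator(own=radar, peers=[eo])
report = moderator.handle("area-7")
print(report)
```

Because the moderator only depends on a `query` method, more specialist models could be appended to `peers` without changing the operator-facing call, which mirrors the scalable, model-agnostic framing in the article.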

Amplifying AI Advantages

Real-time communication and collaboration between LLMs that are continually trained on trusted data opens the way to multiple advances. For example, the practice:

• Frees up operators to focus on assessment rather than switching screens and manually evaluating and comparing data to anticipate threats.
• Enables automated fusion of classified, civil government, and commercial data to train more powerful, precise AI models.
• Provides each stakeholder with automatic access to data, improving decision making for stakeholders across domains.

Training, Testing—and Then Trusting

The concept of training LLMs to be networked starts with a focus on the mission and ensuring compliance with ethical guidelines like the AI Bill of Rights and the NIST AI Risk Management Framework. AI scientists need to confirm that data sources, including other LLMs, are trained on unique datasets from verifiable sources. Developers can quickly incorporate intelligent agents and tools that integrate easily with trusted sources once data ingestion is assured and a training pipeline and prompt templates are built. Meanwhile, strategies can streamline the process. For example, training on servers before migrating systems to the cloud saves on costly cloud computing.

Focused, nested training using trusted data and ensuring a strategic intersection between the LLMs is critical to ensure rapid, accurate returns. Data scientists need to go through the system and assess different weights, inputs, and other components, testing its information with truth data and then entrusting it with small tasks as a first step to more strategic ones. For example, it could be asked to develop a red-team attack scenario that the human experts can incorporate into a training exercise.

Linking LLMs Can Launch Adaptive Space Awareness

As General Chance Saltzman, the Space Force’s chief of space operations, emphasizes, resilience is essential and continuous awareness is critical for the Space Force’s strategy of competitive endurance. Networking LLMs to leverage their unique strengths ensures real-time advances as they collaborate on tasks. Linking these models is a practical way to deliver increasingly powerful results. It’s scalable, allowing the networking of multiple LLMs. It’s model-agnostic, so it can be used with any LLM. And it holds the promise of connecting the vast, siloed datasets that are key to avoiding celestial collisions and countering adversarial attacks.

Ron Craig is vice president of space strategy and solutions at Booz Allen. Michelle Harper leads software projects that accelerate integrated capabilities for Booz Allen clients, including the Space Force.

SPEED READ

There’s an urgent need for advanced AI as an expected surge in the number of active satellites by 2030 makes space increasingly “competitive, congested, and contested.” Networking large language models (LLMs) can enhance space domain awareness and address these challenges.

The BRAVO hackathon showcased the transformative capabilities of networked LLMs in space operations. By linking these models, they can communicate, share data, and amplify space operators’ awareness, leading to more efficient and precise decision making.

By allowing LLMs to collaborate and learn from each other, networking them can provide comprehensive insights into space behaviors and threats. This interconnected system promises enhanced space domain awareness and strategic advantage.

COMPUTING TECHNOLOGY

Can Quantum Supercharge AI?
QUANTUM COMPUTING’S ROLE IN THE EVOLUTION OF AI
Isabella Bello Martinez, Ryan Caulfield, and Brian Rost

Quantum mechanics and machine learning, two of the most transformative forces of the past two centuries, are converging to mark the start of a new era of AI. This convergence—known as quantum machine learning—has the potential to address limitations of classical (meaning “not quantum”) machine learning, particularly processing power and speed.

Classical machine learning has undoubtedly made significant strides in data processing and predictive analytics. Yet it is often limited by computer speed and memory, especially when dealing with large and complex datasets. Quantum machine learning (QML) leverages some of the unique features of quantum systems to overcome these limitations.

QML is currently on the cusp of transitioning from offering a purely theoretical advantage to finding real-world, high-impact applications. In the realm of drug discovery, where the search for new drugs often involves navigating a vast space of molecular combinations, QML has shown potential for identifying promising compounds more efficiently. Recent evidence also suggests QML provides advantages for computer vision, where identifying key features in unlabeled images is becoming increasingly important. Relatively small quantum models have the power to perform well on even the largest and most complex datasets that would otherwise require impractically large, classical models. These, and other advantages of QML, arise from the fact that quantum systems are inherently more complex and more capable of representing complicated patterns than comparable classical systems.

Although quantum machine learning is still in its early stages, the progress made so far indicates that QML will have a transformative effect on AI. The impact will be felt across diverse fields, many of which are directly aligned with government interests, such as designing better materials for assets in space, improving health diagnostics, and advancing computer vision for superior ISTAR (intelligence, surveillance, target acquisition, and reconnaissance).

DID YOU KNOW: Quantum mechanics is the physics which governs very small (particles, atoms), very cold (superconductors, superfluids), and exotic systems (lasers, semiconductors, stars) from which surprising behavior arises. Researchers are experimenting with different ways of making qubits, and no clear winner has emerged yet. Implementations range from basic quantum objects such as atoms, ions, or light (photons) to more exotic quantum systems such as nanodiamonds, superconductors, and more. At the heart of the advantage of quantum computation is a uniquely quantum phenomenon called entanglement, whereby multiple qubits become fundamentally linked and share information between themselves in ways not possible classically. The operations implemented by a quantum computer are called quantum gates. Both quantum and classical algorithms can be specified as circuits, a graphical representation of a series of gates to be applied to the qubits.

An Evolution, in Partnership with Classical Algorithms

Humanity has been “computing” since we first started using numbers. While the types of computations and technology used today are drastically more sophisticated than keeping track of bushels of wheat on an abacus, the fundamental computational model has remained the same. A computer carries out these computations using bits, which are objects that can be in one of two states—like “on” or “off”—called 0 and 1. For the first time in history, we’re starting to compute using a completely novel computing paradigm known as quantum computing. Based on quantum mechanics, quantum computing is expected to have a wide-reaching and transformative impact. At the core of quantum computation are qubits (quantum bits), the quantum version of bits. Like a bit, a qubit can exist in the 0 state or the 1 state. Unlike a bit, a qubit can also exist in a uniquely quantum state that is analogous to being partly 0 and partly 1, or existing along a continuum between 0 and 1. This makes qubits more complex than bits, enabling quantum computing to tackle problems well beyond what would be possible with classical computing.

Both quantum and classical computers compute by executing a series of instructions called an algorithm, which manipulates the states of their underlying (qu)bits. Unlike classical algorithms, which can only flip bits between 0 and 1, quantum algorithms use a richer variety of operations that take advantage of the qubit’s complexity.

Quantum Computing Meets Machine Learning

From computer vision to cybersecurity and everything in between, this past decade has shown us the versatility and power of machine learning technologies. Given this, we can expect that expanding the capabilities of machine learning will continue to drive progress across the board. QML, the integration of quantum computing into machine learning systems, offers one promising way to do just that. QML is anticipated to provide improvements in speed and performance to diverse AI application areas spanning medicine, finance, data analysis, and more.

Evidence has been mounting that QML is indeed capable of delivering on these promises. For example, a 2021 study in the Journal of Chemical Information and Modeling demonstrated how using QML could accelerate the process of drug discovery to combat diseases, such as COVID-19 and tuberculosis, compared to using analogous machine learning methods (see “Quantum Machine Learning Algorithms for Drug Discovery Applications”). This was demonstrated by using the quantum versions of common classical machine learning methods, such as deep neural networks and support vector machines, to classify which molecules were potential inhibitors for a target disease. Despite the quantum computer’s imperfect, error-prone nature, the QML models achieved similar accuracy to the classical models while demonstrating a speed advantage that grew with the size of the dataset. The timing data suggests that as molecule databases continue to expand, QML’s advantage will only become more pronounced, and QML will remain viable when classical machine learning methods begin to struggle. This is not limited merely to drug discovery; it stands as a strong affirmation that the theoretical speedups for QML will translate into wide-reaching, real-world impacts.

Figure 1: Illustrative steps for inference and training of a hybrid quantum-classical model

[Diagram: (1) Convert classical data into qubits. (2) Apply adjustable quantum circuit, shown as gates Û(φ1), Û(φ2), Û(φ3). (3) Measure qubits to get back classical data. Standard ML training algorithms are used to improve the quantum circuit.]
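The steps in this figure can be illustrated with a toy, one-qubit model simulated on a classical computer. This is a hedged sketch rather than the method of any study cited here: the choice of a single rotation gate for both encoding and the adjustable circuit, the tiny dataset, the learning rate, and the use of the parameter-shift rule for gradients are all assumptions made for illustration. The figure shows three adjustable gates; the sketch uses one parameter, `theta`, to stay short.

```python
import math


def expectation_z(x: float, theta: float) -> float:
    """Simulate one qubit: encode x, apply the adjustable gate, measure <Z>."""
    # Step 1: convert classical data into a qubit state with a rotation RY(x) on |0>.
    state = [math.cos(x / 2), math.sin(x / 2)]
    # Step 2: apply the adjustable circuit, here a single rotation RY(theta).
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    state = [c * state[0] - s * state[1], s * state[0] + c * state[1]]
    # Step 3: measure; <Z> = P(0) - P(1) is the classical output in [-1, 1].
    return state[0] ** 2 - state[1] ** 2


def loss(theta, xs, ys):
    """Mean squared error between measured outputs and labels (+1 or -1)."""
    return sum((expectation_z(x, theta) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(theta, xs, ys):
    """Gradient of the loss, using the parameter-shift rule for d<Z>/d(theta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        shifted = (
            expectation_z(x, theta + math.pi / 2) - expectation_z(x, theta - math.pi / 2)
        ) / 2
        total += 2 * (expectation_z(x, theta) - y) * shifted
    return total / len(xs)


# Made-up training data: small angles are labeled +1, angles near pi are labeled -1.
xs = [0.1, 0.3, 2.8, 3.0]
ys = [1, 1, -1, -1]

theta = 1.5  # deliberately poor starting value
initial_loss = loss(theta, xs, ys)
for _ in range(100):
    # Standard classical training loop (gradient descent) improves the circuit.
    theta -= 0.2 * gradient(theta, xs, ys)
final_loss = loss(theta, xs, ys)
predictions = [1 if expectation_z(x, theta) > 0 else -1 for x in xs]
print(initial_loss, final_loss, predictions)
```

On real hardware, the value returned by `expectation_z` would be estimated from many repeated measurements rather than computed exactly, which is one place where the errors discussed in this article enter.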

HIGH-FREQUENCY QUESTIONS ABOUT QUANTUM COMPUTERS

Are quantum computers better at everything?

No. Quantum computers excel at a limited number of tasks. Examples include, but are not limited to, simulating quantum physics for material science or drug development, optimization for logistics or finance, and math for machine learning or cryptography. They offer no benefit for many things we do with computers. That said, the advantage of using a quantum computer for the right kinds of problems can be enormous. No. Quantum and classical computers will work together. Because quantum computers offer no advantage for most computing tasks, we will still want classical computers to handle most of our computations. In the same way many current high-performance computing setups call a GPU (graphics processing unit) to accelerate certain tasks, future setups will likely be classical computers that can call a QPU (quantum processing unit) as needed.

Will quantum computers replace classical computers?

DID YOU KNOW: A quantum simulator is classical software which mimics a quantum computer. Researchers often use this because 1) it mimics a perfect quantum computer without errors and 2) it can be difficult and expensive to get time on a real quantum computer. Of course, because quantum computers are more powerful than classical ones, the quantum simulator is very inefficient and only works for simulating a small number of qubits.

In addition to speed advantages, QML is predicted to perform well using significantly smaller models than classical machine learning, making it possible to tackle previously infeasible problems. The number of features a classical model can represent is directly related to its size. Qubits, however, contain more information than bits, so significantly smaller quantum models can represent the same number of features. This may allow for reasonably sized quantum models to perform well in situations where an infeasibly large classical model would be needed. Evidence of this was provided by a 2022 computer vision study in Quantum Science and Technology that focused on learning with unlabeled data (see "Quantum Self-Supervised Learning"). It is quickly becoming infeasible to label many datasets of interest due to their sheer size, such as photos on social media or images taken by self- driving cars, so the ability to efficiently learn on unlabeled data is increasingly critical. Techniques for learning on

complex, unlabeled datasets often require impractically large models since they must discern and represent complicated patterns in the data. In this study, the researchers trained a model to classify simple images of planes, cars, birds, cats, and deer. They then trained the same model again but replaced part of the model with a quantum equivalent. Using a quantum simulator , the authors showed that the quantum model outperformed a classical model of the same size and that a smaller quantum model could achieve the same performance as the classical model. The quantum model was then run on a real, imperfect quantum computer and matched the classical performance, despite the high error rate of the quantum computer. The study shows that this QML technique is capable of overcoming the classical bottlenecks and is robust against the errors in our current quantum computers, suggesting that QML for computer vision may become a practical application soon.

So, quantum computers actually exist?

Yes. Many companies, governments, and academic institutions currently have quantum computers that vary greatly in size, power, architecture, and more.

If quantum computers already exist and are supposed to be so powerful, then why aren’t we doing more with them?

Quantum computers are still not sufficiently powerful to be commercially impactful. Despite the largest general-purpose quantum computers theoretically having plenty of qubits (hundreds) to completely outclass our best supercomputers for certain problems, these quantum computers are unlikely to outclass a smartphone. This gap between theory and practice exists because current quantum computers are prone to errors that severely limit their power. It is known, in theory, that these errors can be overcome and that “perfect” quantum computers can be built. The timeline for achieving this is still unclear, but most experts estimate in terms of decades, not years (see The Quantum Threat Timeline Report 2022). We expect that imperfect quantum computers will be useful for solving real-world problems soon, with many estimates for achieving a quantum advantage falling between this year and 2028. When, and for which problems, quantum computers will first become impactful remains an open question. One promising candidate application for near-term quantum computers is machine learning.

If “perfect” quantum computers won’t be available soon, why should we care about them now?

From computer vision to cybersecurity and everything in between, this past decade has shown us the versatility and power of machine learning technologies. Given this, we can expect that expanding the capabilities of machine learning will continue to drive progress across the board.

QML, the integration of quantum computing into machine learning systems, offers one promising way to do just that.

Other studies run on quantum simulators point to quantum advantages for a variety of machine learning tasks crucial to applications such as detecting financial irregularities, increasing battery efficiency, and diagnosing diseases such as breast cancer and COVID-19. The evidence contained in these studies and many others is strengthened by the fact that it serves to confirm known theoretical advantages of QML. The combination of empirical results and theoretical predictions underscores the reliability and potential of QML for real-world applications.

While these initial results are exciting, it’s crucial to clarify that the current best QML models cannot yet outperform the best classical machine learning models. The QML examples highlighted above were built using a small number of qubits, with the computer vision study being the largest at only eight qubits. To fit the data on this small number of qubits, the original data had to be compressed significantly. Though QML displayed benefits over machine learning models trained on such compressed data, it still lags behind the performance of machine learning models trained on the full, uncompressed data.

Advancement Within Reach

As the power of quantum computers continues to grow, we can expect QML to swiftly catch up to and surpass the current state of the art as it develops into a powerful tool. This rapidly developing technology is poised to bring unprecedented advancements to a wide range of AI application areas, from computational biology to climate modeling, by offering improvements in performance and efficiency. Although QML is still a nascent technology, it has already been validated through small-scale experiments and theoretical work. It’s essential to acknowledge that, like all emergent technologies, QML has its nuances and challenges. However, given the explosive growth trajectory of quantum computing, we can expect QML to rapidly transition from a fledgling technology with limited practicality to an invaluable tool that improves our ability to solve complex problems beyond the reach of classical computing techniques and advance the national security agenda.

Isabella Bello Martinez, Ryan Caulfield, and Brian Rost are scientists at Booz Allen, helping clients understand what quantum computing can do today and how to prepare for the next wave of capabilities.

SPEED READ

Quantum computers are increasingly being utilized as part of machine learning, creating the exciting new field of quantum machine learning (QML), which promises to overcome some of the processing power and speed limitations of other machine learning methods. QML is likely set to revolutionize areas such as drug discovery and computer vision by efficiently handling large and complex datasets. While quantum computers excel at specific tasks, they will need to work in tandem with classical computers to enhance their capabilities for certain problems. Relatively small quantum models have the power to perform well on even the largest and most complex datasets that would otherwise require impractically large classical models. Current QML models have shown promise, but they still face challenges, such as the need for data compression due to limited quantum bits (qubits). Despite these hurdles, the rapid growth and development of quantum computing indicate that QML could soon transition from a nascent technology to a transformative force.

Open-Source Overload MOBILIZING PUBLICLY AVAILABLE DATA FOR NATIONAL SECURITY Eric Zitz and Gabi Rubin MISSION SPOTLIGHT: INTELLIGENCE


In light of today’s national security threats, the need for the U.S. intelligence community (IC) to swiftly process and disseminate information has never been more urgent. Dynamic and multifaceted challenges demand unprecedented speed and agility, and this imperative has been further magnified by the exponential rise of publicly available information.

Open-source intelligence (OSINT), as defined by the SANS Institute, is “intelligence produced by collecting, evaluating, and analyzing publicly available information with the purpose of answering a specific intelligence question.” Beyond mere information, OSINT can contextualize, enhance, and validate analysis and provide opportunities for increased dissemination. While it will never replace classified intelligence collection and analysis, OSINT is the “INT” that best balances the traditional need for secrecy with the increased need for rapid information sharing to address developing and emergent threats. In a panel about the future IC workforce, Patrice Tibbs, deputy chief of the Open Source Enterprise at the Central Intelligence Agency, called OSINT the “INT of first resort” and said that OSINT brings a new perspective not only to the information being collected but also to the entire process of intelligence collection.

But OSINT sources are far from static and have a scope that continues to expand. What was once limited to historical research or news monitoring now encompasses a rich array of technology and collection methods. Even technical domains traditionally associated with geospatial intelligence or signals intelligence can now be integrated and analyzed alongside more traditional OSINT sources, giving rise to more robust and holistic intelligence. Director of National Intelligence Avril Haines affirms that various intelligence disciplines and IC agencies are all diligently developing their versions of open-source capabilities. “However, we are not in a position where we feel as if the entire intelligence community is leveraging … the best of what we can do in this space yet, and that is something that we have been focused on,” she said in a hearing before the Senate Committee on Armed Services.

With its advantages, OSINT also disrupts the intelligence playing field. According to former Defense Intelligence Agency Director Robert Ashley and former Principal Executive of the Office of the Director of National Intelligence (ODNI) Neil Wiley, “the ubiquity and accessibility of this public data” narrows the advantage of the IC’s niche and proprietary intelligence sources and methods, which emphasizes the need to continuously evolve and adapt how publicly available information is integrated with classified resources. National security organizations must evolve their understanding of OSINT to keep pace with emerging adversarial capabilities in big data aggregation, cloud computing, AI, and machine learning (ML) analytics.

Data, Data Everywhere

Big data stands as the formidable linchpin within the OSINT domain, shaping the very frameworks of collection and processing methodologies. According to Statista, 120 zettabytes of data will be created, transformed, captured, copied, and consumed this year alone, and that number will grow by 20–30 zettabytes annually thereafter. To put that into perspective, one zettabyte is equal to one trillion gigabytes, or over 570 million years of YouTube videos. Even if the IC is only expected to ingest and analyze a fraction of that, it’s a nearly inconceivable task.

Apart from building advanced analytic engines to automate the exploitation, this data challenge requires analysts to determine what data to prioritize and target, based on the mission problem, in order to limit the volume required for aggregation. Accomplishing this can provide analysts with previously unavailable insights, including patterns of life; messaging trends; social, financial, and supply chain networks; breaking news updates; and more.

Let’s take a closer look at the range of sources available and which need to be curated to paint a complete picture. While OSINT aggregation, processing, and analysis is not free, many of the sources that form the basis for OSINT insights are. Free and open, publicly available information can range extensively in terms of data type and content. Though by no means an exhaustive list, some examples include:

• news articles
• public social media posts
• marine or air traffic monitoring sites

• international trade databases
• company registration databases
• government public records
• nongovernmental organization (NGO) reports
• civil imagery
• human geography and infrastructure data

It’s a matter of knowing that information exists, then collecting, vetting, and processing it to address intelligence requirements. However, an additional level of vetting is often required for free sources of publicly available information to ensure the data can be validated and to avoid leveraging mis- or disinformation.

Extracting value from freely accessible information sources is often a race against the clock. For example, in 2023, Twitter stopped allowing unregistered viewers to see individual tweets and limited the number of tweets non-paying users could view per day. Many aggregators (though not all) were locked out and no longer able to leverage the Twitter application programming interface (API). Similarly, the United Nations’ Comtrade database recently updated its subscription plan and now allows only limited access and no downloads under its free public user license.

There is also a subset of publicly available information that is commercially sold, which includes geolocation; commercial satellite and airborne imagery; radio frequency data; subscription news and journal articles; and databases of pre-aggregated public records, among many others. An ODNI report from 2022 explains that the purchase of commercially available information (CAI) can be made one time or on a subscription basis and may involve a purchaser “directly ingesting the CAI or obtaining a license agreement that affords a continuing right of access.” The report further notes that “there is today a large and growing amount of CAI that is available to the public, including foreign governments (and their intelligence services) and private-sector entities, as well as the IC. CAI clearly provides intelligence value, whether considered in isolation and/or in combination with other information, and whether reviewed by humans and/or by machines.”

As the IC increases investment in commercially available information in alignment with the nation’s strategic imperatives, it will need to continue developing new OSINT policies and frameworks. These guidelines can help ensure the IC handles and collects information properly, maintains vigilance against potential counterintelligence threats, and allocates resources comprehensively to support the development of tools, training, and tradecraft.

A zettabyte is 1,000,000,000,000,000,000,000 bytes (or more than 570 million years of YouTube videos)
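The zettabyte figures quoted in this article can be sanity-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, decimal (SI) units are assumed, and the video calculation simply derives the average data rate implied by equating one zettabyte with 570 million years of video, assuming a 365.25-day year. It makes no claim about how YouTube actually encodes video.

```python
BYTES_PER_GIGABYTE = 10**9    # SI gigabyte (assumed; not a binary gibibyte)
BYTES_PER_ZETTABYTE = 10**21  # SI zettabyte, matching the byte count quoted above
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes a 365.25-day year


def zettabytes_to_gigabytes(zettabytes):
    """Convert zettabytes to gigabytes using decimal (SI) units."""
    return zettabytes * (BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE)


def projected_range(start_zb, years, low_growth_zb=20, high_growth_zb=30):
    """Low and high estimates of annual data volume after a number of years,
    assuming linear growth of 20-30 zettabytes per year as cited in the article."""
    return (start_zb + low_growth_zb * years, start_zb + high_growth_zb * years)


def implied_video_rate(years_of_video=570e6):
    """Average bytes per second implied if one zettabyte equals this much video."""
    return BYTES_PER_ZETTABYTE / (years_of_video * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"1 ZB = {zettabytes_to_gigabytes(1):,} GB")
    print(f"120 ZB = {zettabytes_to_gigabytes(120):,} GB")
    print(f"Volume three years on (ZB): {projected_range(120, 3)}")
    print(f"Implied video rate: {implied_video_rate():,.0f} bytes per second")
```

Under these assumptions, one zettabyte works out to exactly one trillion gigabytes, which agrees with the figure cited in the article.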

Deciphering and Accelerating the Value of OSINT

Modern technologies are turning the tide and empowering national security organizations to harness the immense potential of public and commercially available data and analytic sources. Advanced modeling and automation capabilities, decentralized data processing, and AI/ML are alleviating the tasks that were once arduously executed through hours of manual research, exploitation, integration, and analysis by the IC.

EMERGENT BEHAVIOR

AI’s ability to process zettabyte-scale datasets is already being seen in generative AI and large language models, and new methods and approaches are being applied to less structured data, such as imagery. Synthetaic Chief Executive Officer Corey Jaskolski says, “The future of AI lies in intuitive tools that put that power into the hands of subject-matter experts who can elucidate insights from the AI in real time and then make meaning from the insights it generates.”

As AI/ML capabilities are paired more deeply with human analysts, national security stakeholders will see costs greatly reduced and mission outcomes dramatically enhanced. “The ability for analysts to easily capture, collaborate, and automate their tradecraft frees them to perform higher level analysis instead of worrying about data representation and translation,” says Nask Incorporated Technical Director Ken Pratt. “This is a force multiplier allowing fewer analysts to perform more effective and valuable analysis against a larger corpus of data.”

Beyond automation, the decentralization of information processing heralds remarkable efficiencies. Here, raw data is exploited closer to its source and the integration occurs downstream, closer to the analyst or user. The embrace of decentralized workflows and the synergy of specialized tools significantly amplify the pace of exploitation and insight generation. This is true for nearly all data types and sources, including text-based foreign media and social media; published databases; and technical data from satellites, aircraft, and other sensors. An excellent example of this is the potential for decentralized Commercial Synthetic Aperture RADAR (COMSAR) collection and exploitation closer to the source, which would maximize the community’s ability to harness the rich, remote sensing metadata while minimizing the costs of data transfer and storage of large COMSAR imagery files. “By delivering and processing that data at the edge, the analytic insight delivery time to the user is measured in seconds and minutes, not hours,” says Ursa Space Vice President of Government Programs George Flick. “This allows for quicker situational awareness and decision making for users and operators. Speed is essential.” New and emerging technologies and techniques in AI/ML are designed for large and complex datasets, and advanced algorithms shine the closer they’re hosted to the raw data.

OSINT has continued to evolve alongside the emergence of new data sources and techniques, spanning the spectrum from free or open to commercial realms. The contemporary landscape boasts an unprecedented volume and diversity of OSINT, demanding more robust analytic capabilities in AI/ML and automation to sustain an intelligence edge and enable the IC to harness the massive amounts of information proliferating in the public sphere. The key to leveraging these sources and capabilities and driving enhancements for national security missions is to purposefully discover, identify, process, and integrate the right data; it is not to try to process every byte of unclassified data that’s out there. By understanding the value of the data, decentralizing its exploitation, and layering it through AI/ML technologies and techniques, the national security community can create an advantage through these sources and insights.

Eric Zitz is a mission technology leader for Booz Allen’s national security business. He develops new capabilities and solutions that integrate data science, modeling, and automation to enhance intelligence operations. Gabi Rubin is a leader in Booz Allen’s OSINT capability Global4Sight®, which offers language-enabled and data-driven intelligence solutions for civilian, intelligence, and military agencies.

Generative AI in the Wild DEPLOYING POWERFUL MISSION APPLICATIONS WITH PURPOSE Ted Edwards and Alison Smith

“In the world of startup valuations, there’s generative AI—and everything else.”

In an article whose title shares that quote, PitchBook data spotlights the striking separation between early-stage AI startups and other young companies in initial funding rounds. In 2023, generative AI companies’ pre-money valuations increased an incredible 16% from the prior year, compared to a significant drop of 24% for startups in all other sectors attempting series A and B funding (see Figure 1).

The enthusiasm from investors is well founded: Generative AI recently took AI from the realm of engineers and democratized the technology in a way that no other low-code/no-code platform has ever done. OpenAI’s ChatGPT launched into the public consciousness with a bang, setting the record for the fastest-growing consumer base for an application, and immediately demonstrated far-reaching capabilities as an AI assistant on everything from composing recipes to summarizing complex topics and generating computer source code. Google and Bing have incorporated generative AI into their search systems, enabling direct responses to queries and circumventing a user’s need to sift through numerous webpages to find answers. Service providers like Midjourney, Stability AI, and OpenAI’s DALL-E have harnessed generative AI to create accurate and striking images from text-based descriptions of the desired output.

Despite the groundbreaking nature of these services, enterprises and their users should take a deliberate approach. Generative AI raises questions about fair use and copyright based on how models use training data. Models can easily mislead users, novices and experts alike, making it difficult to distinguish fabrications from factual content (this particular risk manifests as “hallucinations,” which are outputs and information from models that sound highly plausible and convincing but are simply made up or incorrect). And of course, for enterprises with sensitive data, there’s the serious risk of data spillage by employees sharing confidential information with open-source generative AI models.

The immense power of generative AI is ripe to uncover transformative opportunities across government missions. But like any tool, generative AI is only effective if it is applied with purpose. Therefore, rather than asking “How can I use generative AI?” the more nuanced, strategic question is: “Where will generative AI tackle a challenge better than the other tools in our toolbox?” This article explores the singular role of generative AI in revolutionizing government missions while agencies navigate emerging challenges; weigh the costs of application; and ensure responsible, effective use.

Figure 1: Generative AI Startups vs. Other Startups. Comparison of median early-stage, pre-money valuations (in millions) for generative AI startups and all startups, 2020 to 2023. Source: PitchBook data. Geography: U.S. *As of May 18, 2023.

SPEED READ

The modern landscape of national security challenges compels the U.S. intelligence community to swiftly process and disseminate information with unprecedented speed. This urgency is heightened by the exponential rise of publicly available information, which intensifies the demand for agility and adaptability. The ubiquity of public data narrows the advantage of classified intelligence sources, necessitating continuous evolution and integration. Open-source intelligence (OSINT) strikes a balance between secrecy and rapid information sharing, offering new perspectives on intelligence collection processes. The potential of OSINT is not static; it evolves to encompass diverse sources and domains, from text-based to technical data, fostering a holistic intelligence picture. National security organizations leverage modern technologies to tap into the vast trove of publicly available and commercially sold data. Automation, decentralized data processing, and AI/ML capabilities expedite the exploitation of data previously requiring manual efforts. The key is to identify valuable data, decentralize its exploitation, and integrate it with AI/ML technologies.
