
无忧传媒: Lessons Learned About Generative AI

Reflections on Generative AI

VELOCITY V3. 2025 | Ernest Sohn and Alison Smith

Lessons learned from two years of prompting

Generative artificial intelligence's (GenAI's) most remarkable attribute is how rapidly it continues to advance. More than two years after the release of ChatGPT, the emergent abilities of early frontier models like GPT-3, such as computer programming skills and early signs of reasoning, are now the core competencies of newer, more advanced models. Reports indicate that OpenAI's o1 and o1-mini models have demonstrated the ability to pass the company's research engineer hiring interview for coding. In October 2024, Anthropic announced that it was teaching its AI assistant Claude how to use a computer like a human and tackle open-ended tasks like conducting research.

Though GenAI's technological capabilities are advancing at an accelerated rate, its impact at the enterprise level has been more nuanced. In the simplest terms, the technology has not yet consistently and competently performed complex operations without human assistance. This paradigm will likely change within the next three years as the underlying technology matures and enterprises pair large language models (LLMs) and other types of GenAI with other technologies and with one another to create intelligent systems with higher levels of autonomy.

With those advances on the near horizon, it has become easier to see GenAI clearly, both in its creative power and its existing shortcomings. Every answered prompt reveals a new truth about the technology's inner workings, and we can now take stock of what we've learned from more than two years of prompting. Here are key perspectives on GenAI's evolving story: from the invaluable role humans play in its deployment to the implications of its stochastic nature and the use case types it's best suited for.

The Best Use Cases for GenAI

Amid confirmation of both its promise and its sometimes-puzzling breakdowns, GenAI keeps edging closer to the heart of critical missions. In manufacturing, the technology is modernizing core processes, including incident response systems that harness interactive copilots to predict and proactively address issues with little human involvement. As part of a Defense Advanced Research Projects Agency initiative, GenAI is now being used to find and fix vulnerabilities in the open-source software underlying critical infrastructure.

The technology's ability to rapidly innovate gives it the potential to help agencies address issues of critical national importance. Before these sweeping transformations can occur, agencies can reap the benefits of one of its most prolific proven use cases: improving operational efficiency. The U.S. Patent and Trademark Office, for example, harnesses GenAI's robust search capabilities to help examiners find relevant documents faster when processing patent applications. The Department of Defense has developed the Acqbot writing tool to reduce the human time and effort needed to generate contracts. What ties these applications together is the focus on using GenAI to help workers complete everyday tasks more efficiently. Still, these kinds of projects represent only generic early adoption successes with limited effect. The mature use cases of the future will be far more powerful and will entail combining GenAI with other types of AI to create maximum impact.

Today, organizations can use GenAI to operationalize applications ranging from customer service chatbots and software coding agents to tools for scientific discovery and policy adjudication. All are underpinned by GenAI's capacity to rapidly and competently perform human functions, such as information retrieval, data aggregation, summarization, interpretation, analytical processing, synthesis, predictive modeling, iterative design, and, of course, content generation.

Nonetheless, GenAI isn't yet suited for every task. Use cases necessitating more definitive, explainable, and predictable outputs, such as precision manufacturing or network security monitoring, may be more appropriately addressed by traditional machine learning algorithms. Why? Unlike traditional algorithms, which can be developed to produce the same output given the same input, GenAI models can generate different outputs due to their use of stochastic processes. Sources of variability across the model lifecycle include randomly initialized parameters within neural networks and techniques such as Monte Carlo sampling, which introduce randomness into the inference process, among many others. Further, traditional machine learning algorithms are highly specialized in terms of their data and domain, whereas GenAI models tend to excel in their generalizability.
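The stochasticity described above can be illustrated with a minimal sketch of temperature-based sampling, the common technique by which LLMs choose each next token. The logits and temperature values here are illustrative assumptions, not drawn from any specific model.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Sample an index from logits via temperature-scaled softmax.

    With temperature > 0, repeated calls on identical logits can return
    different tokens -- the source of run-to-run variability. As the
    temperature approaches 0, sampling collapses to deterministic argmax.
    """
    rng = rng or random.Random()
    if temperature <= 0:  # treat as greedy (deterministic) decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    r = rng.random()
    cum = 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r <= cum:
            return i
    return len(logits) - 1

logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens
# Greedy decoding is reproducible; sampled decoding generally is not.
greedy = [sample_token(logits, temperature=0) for _ in range(5)]
sampled = {sample_token(logits, temperature=1.0) for _ in range(200)}
```

This is why the same prompt can yield different answers on different runs, and why traditional algorithms remain preferable when byte-identical reproducibility is a hard requirement.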

The stochastic nature of GenAI creates a need for meticulous use case analysis to accurately calibrate operational and mission risks. For well-defined problems with specific rules and constraints, traditional AI often presents a more reliable option, particularly when the user doesn't need the model to explain its decisions. Evaluating the tradeoffs between different AI paradigms enables agencies to identify the right approach. By embracing the probabilistic and creative spirit of GenAI applications, agencies can better harness their potential to drive transformation while supporting safe deployment.

The Relative Strengths of Different Types of AI

Traditional AI

Accuracy: Precision and reliability of results

Interpretability: Access to model decisions and outputs

Generative AI

Creativity: Invention of new ideas and logical conclusions

Generalizability: Application of learned knowledge to new, unseen data

How to Support Responsible AI

Along with the valuable assistance to core processes that they provide, LLMs' glitches and miscalculations cannot be ignored. According to research published in December 2023, researchers at Google tricked ChatGPT into leaking sensitive training data when it was asked to repeat the word "poem." A separate study published in December 2023 found that the LAION-5B dataset used to train Stable Diffusion contained hundreds of illegal images.

Indeed, GenAI's blistering growth amplifies concerns about ethics, security, and responsible AI development. As a result, many countries have voiced concerns over the potential risks associated with AI advancements and advocated for international cooperation to develop a shared regulatory framework. These concerns stem from the rapid pace of GenAI development and its profound impact as a general-purpose technology across various sectors. In response, nations have begun to form global alliances and issue declarations aimed at fostering collaboration and establishing guidelines for AI governance. Examples include the G7 Hiroshima Process on Generative AI; the Bletchley Declaration affirming commitment to safe, human-centric AI; the United Nations Global Digital Compact to create a common AI ethics framework; the industry-driven AI Alliance; and the Frontier Model Forum to promote AI safety research.

While these initiatives stress the importance of collaboration, much of the language within these declarations remains vague and lacks concrete follow-up actions, raising questions about the genuine commitment to international governance. It has also become evident that certain countries, particularly those with disproportionate access to resources and advanced capabilities in chip production and foundation models, may prioritize their national interests. This pattern suggests that, despite the appearance of unity and collaboration, many countries continue to pursue individual strategies.

Though efforts to codify approaches to responsible AI at the national and global levels are driving needed awareness of the matter, they should not entice leaders to pause ongoing GenAI initiatives out of fear of overstepping some yet-to-be-drawn boundaries. The best path forward splits the difference between exploration and caution by encouraging iteration on lower-risk use cases that will continue to promote development and understanding without pushing boundaries in an uncontrolled manner. There is no substitute for learning from doing, which is why continuing to put GenAI into production will position enterprises for success. It will also enable them to make valuable contributions to the field of responsible AI as they develop a more precise understanding of which guardrails are needed to balance innovation with responsible use.

Prompting with Purpose

At the core of this experimentation lies prompt engineering, which directly influences the quality, relevance, and safety of AI outputs. While anyone with an internet connection can use commercial LLMs, there is no question that skilled users who understand how the underlying technology works will extract significantly more value from GenAI. Expert prompt engineering steers the LLM to produce outputs that users can confidently integrate into codebases, databases, and various components of a system. Better prompting yields better content.
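To make the idea concrete, here is a minimal sketch of the kind of structured prompt a skilled user might assemble so that an LLM's output can be confidently integrated into downstream systems. The task, JSON schema, and example are hypothetical illustrations, not drawn from any production system named in this article.

```python
# Hypothetical example: a structured prompt that constrains an LLM's output
# to a machine-parseable format. Role, schema, and few-shot example are
# assumptions for illustration only.

def build_extraction_prompt(document: str) -> str:
    """Assemble a prompt with a role, output constraints, one worked
    example, and the input document -- common prompt engineering moves."""
    return "\n".join([
        "You are a contracts analyst. Extract fields from the document.",
        "Respond ONLY with JSON matching this schema:",
        '{"vendor": string, "total_usd": number, "signed": boolean}',
        "",
        "Example document: 'Acme Corp agrees to pay $12,000. Signed.'",
        'Example response: {"vendor": "Acme Corp", "total_usd": 12000, "signed": true}',
        "",
        f"Document: {document}",
        "Response:",
    ])

prompt = build_extraction_prompt("Globex will remit $4,500 upon delivery.")
```

The design choice is the point: by stating a role, pinning an output schema, and showing one worked example, the user trades a few lines of prompt text for outputs that downstream code can parse reliably.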

The importance of human intuition coupled with domain expertise in getting value out of LLMs is why investing in training programs to enhance employees' AI literacy, prompt engineering skills, and AI-human collaboration techniques is one of the best ways enterprises can set themselves up for success. As the field of human-machine teaming continues to evolve, those with more knowledge of how AI works and more practice using it will be best positioned to integrate the technology into enterprise operations in the most impactful and responsible ways possible.

"By embracing the probabilistic and creative spirit of GenAI applications, agencies can better harness their potential to drive transformation while supporting safe deployment."

The Critical Role of Data Engineering

The success of enterprise GenAI applications requires mastering both prompt engineering and the art and science of data preparation, yet many underestimate the critical role of data engineering in steering LLM behavior. Massive context windows won't eliminate the need for careful data preparation; simply putting more raw content into an LLM won't lead to precise, reliable outputs. In some cases, adding too much context introduces new risks, such as overwhelming the model's attention and diluting relevant information. In particular, retrieval augmented generation (RAG) applications must focus on effectively integrating high-quality data into the LLM application. Properly preparing the data helps models perform optimally, increasing the likelihood that they will produce actionable, accurate, and robust outputs that can be used in decision making. Conversely, improperly or insufficiently prepared data opens the door to hallucinations that offer little to no value for users.
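The RAG pattern described above can be sketched in a few lines: retrieve only the most relevant curated passages, then hand the model that narrow context instead of the whole corpus. The keyword-overlap scoring function and the sample corpus below are toy assumptions; real systems typically use vector embeddings.

```python
# Minimal sketch of retrieval augmented generation (RAG). The relevance
# score is a toy word-overlap measure standing in for embedding similarity.

def score(query: str, passage: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the top-k passages by relevance score."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

corpus = [
    "Patent examiners review prior art before granting applications.",
    "The cafeteria menu changes every Tuesday.",
    "Prior art searches compare new claims against existing patents.",
]
context = retrieve("how do examiners search prior art", corpus)
# Only the curated, relevant passages reach the model's context window.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

Note how the irrelevant passage never reaches the prompt: that filtering step is where data quality pays off, because the model can only ground its answer in what retrieval supplies.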

Traditional data preprocessing techniques remain equally relevant to GenAI applications. These include basic cleaning and formatting, such as removing personally identifiable information (PII) and irrelevant or incomplete data and converting the remaining data into a consistent format, as well as tokenization, where text is broken into smaller, more manageable units. In addition, strategies like chunking optimization, metadata enrichment, structured hierarchical retrieval, and reranking help ensure RAG systems can effectively bridge the gap between raw information and the actionable insights needed for enterprise-specific use cases and workflows. For instance, sliding window chunking approaches have been shown to be well suited for short to medium texts where continuity between segments is critical (e.g., conversations), while hierarchical models can be well suited for processing long, multitudinous, and complex documents.
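The sliding window chunking mentioned above can be sketched briefly. The window and overlap sizes are illustrative assumptions; in practice they are tuned to the embedding model and document type.

```python
def sliding_window_chunks(tokens, window=200, overlap=50):
    """Split a token sequence into overlapping chunks.

    The overlap carries context across chunk boundaries, which is why
    this approach suits conversational or short-to-medium texts where
    continuity between segments matters.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # final window already reached the end of the sequence
    return chunks

tokens = list(range(500))  # stand-in for a tokenized document
chunks = sliding_window_chunks(tokens, window=200, overlap=50)
```

With a 500-token input, this yields three 200-token chunks, each sharing its last 50 tokens with the start of the next, so a sentence split by one boundary still appears whole in an adjacent chunk.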

This upfront investment in data preparation, though resource-intensive, ultimately improves response accuracy, reduces operational costs, and helps maintain regulatory compliance while protecting sensitive information through appropriate filtering and access controls.

What鈥檚 Next for GenAI

Ongoing advances in GenAI's capabilities make it tempting to speculate about what LLMs will do next. But any prognostication about the technology's future that focuses exclusively on improving model performance would ignore several crucial and potentially mitigating factors.

For starters, the cost of training LLMs is skyrocketing and may ultimately become prohibitive. In August 2024, Epoch AI released a report detailing that while it's feasible from a technical standpoint to scale the training of AI models at the current pace, the cost of doing so will require developers to invest no less than hundreds of billions of dollars over the coming years. Even as unit costs come down, overall steep costs coupled with the uncertainty of downstream costs will be difficult to swallow for even the most cash-rich tech companies, especially considering that questions about the technology's ability to generate commensurate returns have persisted.

As the economics of models become challenging to navigate, the true cost of operationalizing GenAI will shift from model training to the point of inference: where AI is deployed in real-world environments. This will, in turn, change the focus of conversations about GenAI toward best practices for engineering end-to-end applications that use the technology for specific mission use cases, from cybersecurity to military applications to law enforcement. Enterprises will invest more time and effort into creating evaluation sets that track AI applications against traditional performance measures and keep tabs on costs. How enterprises engineer solutions will ultimately determine their ability to use GenAI to create value at acceptable price points.
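An evaluation set of the kind described above can be as simple as a list of prompt-and-expected-answer pairs scored for both accuracy and spend. The metric, per-call price, and exact-match scoring rule below are assumptions for illustration; production evaluation harnesses are far richer.

```python
# Minimal sketch of an evaluation set that tracks quality alongside cost.
# The cost figure and scoring rule are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str

def evaluate(model_fn, cases, cost_per_call=0.002):
    """Run cases through a model function; report accuracy and spend."""
    correct = sum(
        1 for case in cases if model_fn(case.prompt).strip() == case.expected
    )
    return {
        "accuracy": correct / len(cases),
        "estimated_cost_usd": cost_per_call * len(cases),
    }

cases = [
    EvalCase("Capital of France?", "Paris"),
    EvalCase("2 + 2 =", "4"),
]
# A stubbed function stands in for a real LLM call.
report = evaluate(lambda p: "Paris" if "France" in p else "4", cases)
```

Running the same evaluation set against each candidate model or prompt revision gives an enterprise a repeatable way to weigh quality gains against inference cost before anything reaches production.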

Part of the way companies will adjust to this new reality is by pursuing vertical integration. Hardware providers that are doing groundbreaking work within the GenAI stack will seek to create software layers, and software companies may seek to manufacture their own hardware. A good example of this vertical integration is NVIDIA. The company that is known around the world for building the graphics processing units that have powered the AI revolution is already building off its success in hardware to reimagine what software can do. NVIDIA has created Omniverse, a platform that enables developers to simulate physical worlds with remarkable precision, and NIM Agent Blueprints, which help developers build applications that use one or more AI agents.

What these developments show is that the true power of GenAI lies beyond the remarkable powers of the algorithms underpinning it. As with all technological revolutions, GenAI started out with a bang. Now comes the less glitzy, but no less impactful, stage: integrating it with end-to-end applications in cost-effective, innovation-driving ways.

Key Takeaways

  • Generative AI's superpower is the speed at which it advances, a trait that can be seen in how far the technology has progressed since the launch of ChatGPT in November 2022.
  • But as the cost of training large language models skyrockets, organizations will focus less on improving model performance and more on operationalizing the technology at the point of inference: where AI is deployed in real-world environments.
  • As this shift takes place, organizations that prioritize AI literacy and data engineering will be well-positioned to deploy generative AI effectively and responsibly.

Meet the Authors

Ernest Sohn, an AI leader for 无忧传媒's Chief Technology Office, delivers AI services and capabilities to help federal civilian clients meet mission-critical needs.

Alison Smith, 无忧传媒's director of generative AI, leads GenAI solutioning across the firm and helps teams create best practices for AI development and use.

References
  • Alison Nathan, Jenny Grimberg, and Ashley Rhodes, "Gen AI: Too Much Spend, Too Little Benefit?" Top of Mind 129, June 25, 2024.
  • Brandon Vigliarolo, "Study Uncovers Presence of CSAM in Popular AI Training Dataset," The Register, December 20, 2023.
  • David Vergun, "DARPA Aims to Develop AI, Autonomy Applications Warfighters Can Trust," DOD News, March 27, 2024.
  • Ethan Mollick, "Something New: On OpenAI's 'Strawberry' and Reasoning," One Useful Thing, September 12, 2024.
  • "Introducing Computer Use, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku," Anthropic, updated November 4, 2024.
  • Jaime Sevilla, Tamay Besiroglu, Ben Cottier, Josh You, Edu Roldán, Pablo Villalobos, and Ege Erdil, "Can AI Scaling Continue Through 2030?" Epoch AI, August 20, 2024.
  • Jory Heckman, "DoD Builds AI Tool to Speed Up 'Antiquated Process' for Contract Writing," Federal News Network, February 9, 2023.
  • Peter Grad, "Trick Prompts ChatGPT to Leak Private Data," Tech Xplore, December 1, 2023.
  • Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha, "A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications," arXiv, February 5, 2024.
