Federal Leaders Are Shifting From ‘Mitigating’ to ‘Managing’ AI Bias
From DHS’s rigorous testing protocols to INL’s “personality” assessments, experts say human oversight is the only way to safeguard AI systems.
While the public narrative often treats bias as a software error to be “fixed” or “patched,” federal IT leaders argued that bias is inherent to any system that interprets data and must be actively managed rather than eliminated.
“Bias is simply the application of values in a certain situation,” said Kevin Byrne, DevSecOps lead at Idaho National Laboratory (INL), during GovCIO Media & Research’s AI Summit in Tysons, Virginia, Friday. “I probably prefer the term ‘managing’ bias to ‘mitigating’ bias, because there’s inherently always bias. There’s always a set of values put over whatever data we get.”
Defining the ‘Value Structure’ of AI
Byrne said some AI responses are objectively “wrong,” while others are simply subjective interpretations based on the data the model was fed.
“If I ask, ‘What is the number one?’ And it says, ‘a giraffe,’ I’m probably going to think that’s wrong,” Byrne explained. “But if it says, ‘it’s the loneliest number,’ that’s not necessarily wrong, but that’s clearly indicative of a unique value structure.”
This perspective shifts the burden from the machine to the human handler, he noted, and the primary challenge for organizations like the INL becomes “getting to know the model much like one would get to know a new colleague.” Byrne described exercises where different models, such as Anthropic’s Opus 4.5 and Grok, are asked to compare themselves to fictional characters to reveal their underlying “personalities” and consistencies.
“A pretty fun exercise is to have the models ask about each other,” Byrne explained. “I asked Grok to compare itself and Anthropic Opus 4.5 to fictional characters. Grok said that it was Tony Stark, and it said that Opus 4.5 was like Spock. Opus 4.5 said that it was more like Hermione Granger, and that Grok is more like Deadpool. And I love that because it shows both consistencies as well as differences.”
DHS and the Work of Testing
Arun Vemury, a senior advisor at the Department of Homeland Security’s Science and Technology Directorate, agreed that bias is a “very overloaded term,” adding that his agency faces a unique set of challenges where bias can have immediate, real-world consequences — specifically in biometrics and identity technologies. DHS’s objective, Vemury said, is more clinical: drive technical errors to zero to remove the possibility of disparate impact.
“Our goal, at the end of the day, is to drive errors to zero. If you make no errors, there really is no bias,” Vemury said.
He added that his office uses a test- and data-driven approach that many organizations tend to overlook because of its cost and complexity.
“Testing, testing, testing. It’s not sexy. Nobody likes paying for it, but it’s probably the best way to figure out if there are any unintended issues or consequences,” Vemury said. “[We can find out] whether there was something that happened in the model where, all of a sudden, now we’re seeing some undesirable behavior.”
Data vs. Algorithms
Tommy Gardner, CTO of HP Federal, argued that bias does not originate in the algorithms themselves. The math underlying AI is fundamentally neutral, he said; the problem lies in the human choices made during data selection and collection.
“The algorithms are not biased. They’re mathematics. It’s kind of hard to bias math. This application and where the true bias comes in … is from your data selection, how you collect the data,” Gardner noted.
He pointed to the classic principle of GIGO: garbage in, garbage out. Because ethics are social constructs that vary from person to person based on their upbringing, peers and even childhood coaches, any dataset collected by humans will naturally carry those predispositions.
“There’s no zero or one right or wrong on ethics. It becomes a social event of what we can agree on,” Gardner said. “That’s why we need to get up and discuss these things so we understand where the damage is done, what is the damage done and what was the root cause of that damage?”
Gardner suggested that the only way to safeguard these systems is through rigorous debate and transparency, ensuring that “unintended consequences” are considered before a system is deployed in high-stakes environments like finance or national defense.
Human-Algorithm Teaming and the Echo Chamber
Vemury added that the human must remain an active participant in the system to ensure accountability, ethics and efficacy.
“On our team, we call it human-algorithm teaming, or human-machine teaming,” Vemury said. “How do we take the things that the human being does well and combine it with what the model or the technology does well, and try to optimize the overall process?”
This collaboration is vital because humans and machines often fail in different ways, he said. A human might be able to spot a “hallucination” that lacks common sense, while a machine can process data at a speed that would overwhelm a human brain, Vemury added.
Byrne issued a warning about the psychological trap of using AI that always seems “correct.” He argued that if an AI tool never challenges its user, it has likely become a mirror of the user’s own existing biases.
“If the model is telling you exactly what you expect constantly, that’s a red flag that you’re in an echo chamber, because realistically, you have your own bias. If it isn’t at least occasionally surprising you, if not disagreeing with you, you really owe it to yourself to ask, ‘Well, what’s going on?’” Byrne said.