A Microsoft manager claims OpenAI’s DALL-E 3 has safety vulnerabilities that could allow users to generate violent or explicit images (similar to those that recently targeted Taylor Swift). GeekWire reported Tuesday that the company’s legal team blocked Microsoft engineering leader Shane Jones’ attempts to alert the public to the exploit. The self-described whistleblower is now taking his message to Capitol Hill.
“I reached the conclusion that DALL·E 3 posed a public safety risk and should be removed from public use until OpenAI could address the risks associated with this model,” Jones wrote to US Senators Patty Murray (D-WA) and Maria Cantwell (D-WA), Rep. Adam Smith (D-WA 9th District), and Washington state Attorney General Bob Ferguson (D). GeekWire published Jones’ full letter.
Jones claims he discovered an exploit allowing him to bypass DALL-E 3’s safety guardrails in early December. He says he reported the issue to his superiors at Microsoft, who instructed him to “personally report the issue directly to OpenAI.” After doing so, he claims he learned that the flaw could allow the generation of “violent and disturbing harmful images.”
Jones then attempted to take his cause public in a LinkedIn post. “On the morning of December 14, 2023 I publicly published a letter on LinkedIn to OpenAI’s non-profit board of directors urging them to suspend the availability of DALL·E 3,” Jones wrote. “Because Microsoft is a board observer at OpenAI and I had previously shared my concerns with my leadership team, I promptly made Microsoft aware of the letter I had posted.”
Microsoft’s response was allegedly to demand he remove his post. “Shortly after disclosing the letter to my leadership team, my manager contacted me and told me that Microsoft’s legal department had demanded that I delete the post,” he wrote in his letter. “He told me that Microsoft’s legal department would follow up with their specific justification for the takedown request via email very soon, and that I needed to delete it immediately without waiting for the email from legal.”
Jones complied, but he says the more detailed response from Microsoft’s legal team never arrived. “I never received an explanation or justification from them,” he wrote. He says further attempts to learn more from the company’s legal division were ignored. “Microsoft’s legal department has still not responded or communicated directly with me,” he wrote.
An OpenAI spokesperson wrote to Engadget in an email, “We immediately investigated the Microsoft employee’s report when we received it on December 1 and confirmed that the technique he shared does not bypass our safety systems. Safety is our priority and we take a multi-pronged approach. In the underlying DALL-E 3 model, we’ve worked to filter the most explicit content from its training data including graphic sexual and violent content, and have developed robust image classifiers that steer the model away from generating harmful images.
“We’ve also implemented additional safeguards for our products, ChatGPT and the DALL-E API – including declining requests that ask for a public figure by name,” the OpenAI spokesperson continued. “We identify and refuse messages that violate our policies and filter all generated images before they are shown to the user. We use external expert red teaming to test for misuse and strengthen our safeguards.”
Meanwhile, a Microsoft spokesperson wrote to Engadget, “We are committed to addressing any and all concerns employees have in accordance with our company policies, and appreciate the employee’s effort in studying and testing our latest technology to further enhance its safety. When it comes to safety bypasses or concerns that could have a potential impact on our services or our partners, we have established robust internal reporting channels to properly investigate and remediate any issues, which we recommended that the employee utilize so we could appropriately validate and test his concerns before escalating it publicly.”
“Since his report concerned an OpenAI product, we encouraged him to report through OpenAI’s standard reporting channels and one of our senior product leaders shared the employee’s feedback with OpenAI, who investigated the matter right away,” wrote the Microsoft spokesperson. “At the same time, our teams investigated and confirmed that the techniques reported did not bypass our safety filters in any of our AI-powered image generation solutions. Employee feedback is a critical part of our culture, and we are connecting with this colleague to address any remaining concerns he may have.”
Microsoft added that its Office of Responsible AI has established an internal reporting tool for employees to report and escalate concerns about AI models.
The whistleblower says the pornographic deepfakes of Taylor Swift that circulated on X last week are one illustration of what similar vulnerabilities could produce if left unchecked. 404 Media reported Monday that Microsoft Designer, which uses DALL-E 3 as a backend, was part of the deepfakers’ toolset that made the images. The publication claims Microsoft, after being notified, patched that particular loophole.
“Microsoft was aware of these vulnerabilities and the potential for abuse,” Jones concluded. It isn’t clear whether the exploits used to make the Swift deepfakes were directly related to those Jones reported in December.
Jones urges his representatives in Washington, DC, to take action. He suggests the US government create a system for reporting and monitoring specific AI vulnerabilities — while protecting employees like him who speak out. “We need to hold companies accountable for the safety of their products and their responsibility to disclose known risks to the public,” he wrote. “Concerned employees, like myself, should not be intimidated into staying silent.”
Update, January 30, 2024, 8:41 PM ET: This story has been updated to add statements to Engadget from OpenAI and Microsoft.