OpenAI reportedly hired workers in Kenya – screening tens of thousands of text samples for sexist, racist, violent and pornographic content – to help make its ChatGPT model less toxic.
Released last November, ChatGPT has taken the internet by storm. Its ability to generate text automatically given an input prompt has led millions of users to instruct it to perform all sorts of tasks – telling jokes, writing code, answering questions and more.
Not all of those instructions have been entirely benign – we’re only human after all. However, ChatGPT is designed to be more conversational and safer than its predecessor GPT-3 – it can admit to errors and refuse to carry out inappropriate requests.
To learn the characteristics of offensive and abusive language, ChatGPT had to be trained on vast amounts of labeled data showing the difference between safe and harmful content.
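To give a rough sense of what that labeling enables – and this is a toy illustration, not OpenAI's actual pipeline – a supervised classifier can learn to separate safe from harmful text by counting word frequencies in human-labeled examples. All of the example strings, labels and function names below are invented for illustration.

```python
from collections import Counter

# Hypothetical human-labeled examples: each pair is (text, label).
# In practice a system like ChatGPT's safety filter would train on
# vastly more data with a far more sophisticated model.
labeled_data = [
    ("have a wonderful day", "safe"),
    ("thanks for your help", "safe"),
    ("i will hurt you", "harmful"),
    ("violent threats and abuse", "harmful"),
]

# Count how often each word appears under each label.
word_counts = {"safe": Counter(), "harmful": Counter()}
label_totals = Counter()
for text, label in labeled_data:
    word_counts[label].update(text.split())
    label_totals[label] += 1

def classify(text):
    """Naive Bayes-style scoring with add-one smoothing."""
    scores = {}
    for label in word_counts:
        vocab_size = sum(word_counts[label].values())
        score = 1.0
        for word in text.split():
            score *= (word_counts[label][word] + 1) / (vocab_size + 1)
        scores[label] = score * label_totals[label]
    return max(scores, key=scores.get)
```

The human annotators' job, in effect, is producing the `labeled_data` side of that equation – which is why scaling it up required so many workers reading so much toxic text.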
Labeling data is tedious and time consuming. The work is typically outsourced to contractors that recruit employees from countries where labor is cheaper. In 2021, OpenAI reportedly signed three contracts worth about $200,000 with Sama – a San Francisco-based startup providing data annotation services through operations in developing countries – to label text to train ChatGPT, according to a report in Time this week.
Sama then recruited three dozen workers in Kenya who were split into three groups, each tasked with combing through thousands of text samples containing sex abuse, hate speech, and violence.
Scraped from the internet, the text described all sorts of dangerous, illegal and lewd acts including murder, suicide, torture and incest. Some employees reported being traumatized from having to process so much horrific content. One man said he suffered from recurring visions after reading a passage describing a man having sex with a dog in the company of a young child.
“That was torture,” he said. “You will read a number of statements like that all through the week. By the time it gets to Friday, you are disturbed from thinking through that picture.”
The workers endured nine-hour shifts and made between 163 and 248 Kenyan shillings per hour – about $1.32 to $2. Those are trivial sums for OpenAI, which is predicted to turn over $200 million in 2023.
Another data labeling contract not related to ChatGPT involved Sama asking employees to find pornographic and violent images portraying things like death, rape and bestiality. The content – some of which would be illegal in the US – allegedly prompted Sama to end its contracts with OpenAI by February 2022, eight months earlier than planned. Employees recruited by Sama were reportedly told that their work with OpenAI was canceled after the startup faced harsh criticism for working with Meta's Facebook on another content moderation project.
“Sama ended the OpenAI contract because the team in East Africa was not comfortable with the requested work. The Meta contract end is separate, but related. After consulting with its global team, Sama made the decision to exit all content moderation work as it did not align with the company’s vision and mission,” a company spokesperson told us.
The upstart will end all data labeling projects for content moderation in March 2023 and has set up an “ethics guild” – a group of employees who will review work requests, we’re told.
In statements shared with Time, OpenAI confirmed it had worked with Sama to label data used to train ChatGPT and said there had been a miscommunication about the types of images it wanted collected. The lab reportedly did not want illegal images and did not view them.
The value of a dollar
“To clarify, Sama pays between 26,600 and 40,000 Kenyan Shillings ($209 to $322) per month, which is more than double the minimum wage in Kenya and also well above the living wage,” a spokesperson from Sama told The Register.
“To compare it to US wages is a false equivalence that mischaracterizes the situation. A comparative Western wage would be between $30 and $45 per hour. Sama pays almost double what other content moderators in the area pay, and offers a full benefits and pension package.”

OpenAI, meanwhile, said in a statement: “Our mission is to ensure artificial general intelligence benefits all of humanity, and we work hard to build safe and useful AI systems that limit bias and harmful content. Classifying and filtering harmful [text and images] is a necessary step in minimizing the amount of violent and sexual content included in training data and creating tools that can detect harmful content.”
Data labeling businesses like Sama say they are helping lift people out of poverty in poorer countries, but Time’s investigation is a stark reminder that the seemingly magical abilities of AI models are built on the back of low-cost labor.
“Sama’s mission is to break down barriers to formal employment by giving work to talented people who may not otherwise have equal career opportunities. Sama employs people who would not otherwise have the qualifications for entry-level tech jobs, then trains them – not just for a job, but a career path – by offering continuing education classes, CV writing classes, financial education classes and opportunities to advance inside or outside the organization,” a spokesperson told us in a statement.
“Sama has impacted over 60,000 people, sent 20 people to university through our scholarship programs, and provided $160,000 in funding for employee businesses and startups.”
Meanwhile, the hype around artificial general intelligence continues to build. Even the top AI labs have yet to crack the problem of building models that can learn patterns from data effectively with little to no human supervision.
The technology may have progressed, but it still relies on workers from developing countries sitting in front of computer screens performing repetitive and monotonous tasks all day to train giant AI models for tech companies making millions of dollars.
The Register has asked OpenAI for comment. ®