Facebook's language gaps make it harder to screen for hate and terrorism

As tensions rose across the Middle East last May, Instagram briefly banned the hashtag #AlAqsa, a reference to the Al-Aqsa Mosque in Jerusalem's Old City, a flashpoint in the Gaza war.

Facebook, which owns Instagram, later apologized, explaining that its algorithms had mistaken the third-holiest site in Islam for the militant group Al-Aqsa Martyrs Brigade, an armed offshoot of the secular Fatah party.

For many Arabic-speaking users, it was just the latest example of how Facebook shuts down political speech in the Middle East. Arabic is among the most widely used languages on Facebook's platforms, and the company has issued repeated public apologies after similar content removals.

Now, internal company documents from the former Facebook product manager-turned-whistleblower Frances Haugen show the problems are far more systemic than just a few innocent mistakes, and that Facebook has understood the depth of these failings for years while doing little about it.

These errors are not limited to Arabic. The files reveal that in the world's most volatile regions, terrorist content and hate speech proliferate because the company lacks moderators who speak local languages and understand cultural contexts. Its platforms have also failed to develop artificial-intelligence solutions that can catch harmful content in different languages.

In countries like Afghanistan and Myanmar, these loopholes have allowed inflammatory language to thrive on the platform, while in Syria and the Palestinian territories, Facebook suppresses ordinary speech, imposing blanket bans on common words.

"The problem is that the platform wasn't built with the intention of one day mediating the political speech of all people in the world," said Eliza Campbell, director of the Middle East Institute's Cyber Program. Despite Facebook's political importance and resources, she added, moderation remains a woefully under-resourced project.

This story, like others being published, is based on Haugen's disclosures to the Securities and Exchange Commission, which her legal team also provided to Congress in redacted form. The redacted versions were reviewed by a consortium of news organizations, including The Associated Press.

A Facebook spokesperson stated that the company had invested in local language and topic expertise over the past two years to increase its review capacity all around the globe.

The company acknowledged that much work remains to be done in Arabic content moderation, saying it conducts research to better understand the complexity of the issue and to find ways to improve.

In Myanmar, where Facebook-based misinformation has repeatedly been linked to ethnic and religious violence, the company acknowledged in internal reports that it had failed to stop the spread of hate speech targeting the minority Rohingya Muslim population.

In 2018, Facebook publicly said it would recruit 100 native Myanmar-language speakers to police its platforms after the persecution of the Rohingya, which the U.S. has called ethnic cleansing. The company never disclosed how many content moderators it ultimately hired, or which of the country's dialects they covered.

Despite Facebook's public promises and numerous internal reports on the problem, the rights group Global Witness said the company's recommendation algorithm continues to amplify army propaganda and other content that violates its own Myanmar policies.

In India, the documents show Facebook employees debating last March whether the company could crack down on the "fear mongering and anti-Muslim narratives" that Prime Minister Narendra Modi's Hindu nationalist group, Rashtriya Swayamsevak Sangh, broadcasts on its platform.

The company noted in one document that members of Modi's party had set up multiple accounts to amplify the spread of Islamophobic material. The research found that much of this content was never flagged or acted on because Facebook lacked moderators and automated filters with knowledge of Hindi and Bengali.

Arabic poses particular challenges for Facebook's automated systems and human moderators alike, both of which struggle to understand the spoken dialects unique to each country and region, whose vocabularies are shaped by different historical influences and cultural contexts.

Moroccan colloquial Arabic, for instance, includes French and Berber words and is spoken with short vowels. Egyptian Arabic, by contrast, includes some Turkish from the Ottoman conquest. Other dialects hew closer to the official version found in the Quran. In some cases, these dialects are not mutually comprehensible, and there is no standard way to transcribe colloquial Arabic.

Facebook first gained a massive following in the Middle East during the 2011 Arab Spring uprisings, and users credited the platform with providing a rare outlet for free expression in a region where autocratic governments tightly control both news media and speech. But that reputation has changed in recent years.

Scores of Palestinian activists and journalists have had their accounts deleted. Archives of the Syrian civil war have disappeared. A vast vocabulary of everyday words has become off-limits to speakers of Arabic, Facebook's third most common language, with millions of users worldwide.

For Hassan Slaieh, a prominent journalist in the blockaded Gaza Strip, the first message felt like a punch to the gut. "Your account has been permanently disabled for violating Facebook's Community Standards," the notification read. That came after years of his posts on violence between Israel and Hamas being flagged as content violations.

With it went six years of work: personal memories, stories, and photos of Gazans' lives. By the time his page was taken down again last year, the shock had worn off; he simply started over, for the 17th time.

He tried to be clever. Like many Palestinians, he had learned to avoid the Arabic words for "martyr" and "prisoner," along with references to Israel's military occupation. When referring to militant groups, he would insert symbols or spaces between the letters.

Others in the region have taken a more sophisticated approach to fooling Facebook's algorithms, writing in an old form of Arabic script that lacks the dots and marks readers use to distinguish between otherwise identical letters. According to internal documents, this writing style, common before Arabic learning boomed with the spread of Islam, has bypassed hate speech censors on Facebook's Instagram app.

But Slaieh's tricks didn't make the cut. He believes Facebook banned him simply for doing his job: posting photos from Gaza of protesters wounded at the Israeli border and of mothers weeping over their sons' coffins.

Criticism, humor, and even simple mentions of groups on the company's Dangerous Individuals and Organizations list -- a docket modeled on the U.S. government's equivalent -- can all be grounds for a takedown.

"We were wrongly enforcing antiterrorism content in Arabic," one document states, noting that the current system "limits users from participating in political speech, impeding their right to freedom of expression."

According to internal documents, Facebook's blacklist also includes Gaza's ruling Hamas party and the Hezbollah militant group -- a scope that, employees say, fuels widespread perceptions of censorship.

Among those critics is Mai el-Mahdy, a former Facebook employee who worked on Arabic content moderation until 2017.

Facebook responded to questions from the AP by saying that it consults independent experts in order to develop moderation policies. It also says that it goes to great lengths "to ensure they are agnostic towards religion, region and political outlook or ideology."

It added, "We know that our systems are imperfect."

Still, the company's language gaps and biases have led to a widespread perception that Facebook favors governments over minorities.

Former Facebook employees also say various governments have pressured the company, threatening regulation and fines. Israel, a lucrative source of advertising revenue for Facebook, is the only Mideast country where the company operates a national office. Its public policy director previously advised the country's former right-wing prime minister, Benjamin Netanyahu.

Israeli security agencies and watchdogs monitor Facebook closely, bombarding it with thousands of orders to take down Palestinian accounts and posts as they try to crack down on incitement.

"They flood our system, completely overpowering it," said Ashraf Zeitoon, Facebook's former head of policy for the Middle East and North Africa, who left the company in 2017. "That forces the system to make mistakes in Israel's favor." No other region, he added, has such an in-depth understanding of how Facebook works.

Facebook said in a statement that it treats takedown requests from governments no differently from those of rights organizations or community members, although it may restrict access to content based on local laws.

Any suggestion that it removes content solely because of pressure from the Israeli government, the company said, is completely false.

Syrian journalists and activists reporting on the country's opposition have also complained of censorship, with electronic armies supporting President Bashar Assad aggressively flagging dissident content for removal.

Raed, a former reporter at the Aleppo Media Center, an antigovernment activist group and citizen journalism outlet in Syria, said Facebook had erased most of his documentation of the government's shelling of neighborhoods and hospitals.

"Facebook always tells me we break the rules," he said, declining to give his full name for fear of reprisals.

In Afghanistan, many users literally cannot understand Facebook's rules. According to an internal report from January, Facebook had not translated its hate speech and misinformation pages into Dari and Pashto, the two most common languages in Afghanistan, where English is not widely understood.

When Afghan users try to flag posts as hate speech, the drop-down menus appear only in English. So does the Community Standards page. The site also lacks a bank of hate speech terms, slurs, and code words used in Afghanistan to moderate Dari and Pashto content, as is common elsewhere. Without this word bank, Facebook cannot build the automated filters that catch the worst violations in the country.

When Facebook engineers investigated the abuse of domestic workers in the Middle East, internal documents show they focused primarily on posts and messages written in English. The flagged-words list did not include Tagalog, the major language of the Philippines, where many of the region's housemaids and other domestic workers come from.

In much of the Arab world, the opposite is true -- the company over-relies on artificial-intelligence filters that make mistakes, leading to "a lot of false positives and a media backlash," one document reads. Unskilled human moderators, in over their heads, tend to passively field takedown requests rather than screen content proactively.

Sophie Zhang, a former Facebook employee-turned-whistleblower who worked at the company for nearly three years before being fired last year, said contractors in Facebook's Ireland office complained to her they had to depend on Google Translate because the company did not assign them content based on what languages they knew.

Facebook outsources most of its content moderation to large companies that enlist workers far afield, such as in Casablanca, Morocco, and in Germany. These firms do not sponsor work visas for Arabic speakers, limiting the pool to local hires in precarious circumstances -- mostly Moroccans who appear to have overstated their linguistic abilities. They mistakenly flag inoffensive Arabic posts as terrorist content 77% of the time, one document said, often because they get lost in translation among Arabic's 30-odd dialects.

"These reps should never be fielding content in non-Maghreb regions, however, right now it's commonplace," another document states, referring to the region of North Africa that includes Morocco. The file adds that the Casablanca office claimed in a survey that it could handle every dialect of Arabic. Yet in one case, reviewers incorrectly flagged Egyptian dialect content 90% of the time, a report said.

Iraq ranks highest in the region for reported hate speech on Facebook. One document noted that reviewers have little to no knowledge of the Iraqi dialect.

"Journalists try to expose human rights violations, but we just get blocked," said a Baghdad-based press freedom activist who spoke on condition of anonymity for fear of reprisals. "We know Facebook attempts to limit the influence of militias, but it's not effective."

Linguists described Facebook's system as flawed for a region with a vast diversity of colloquial dialects that Arabic speakers transcribe in different ways.

The stereotype that Arabic is one entity is a major problem, said Enam al-Wer, professor of Arabic linguistics at the University of Essex in England, citing the language's huge variations not only between countries but across class, gender, religion, and ethnicity.

Despite these problems, moderators remain on the front lines of what makes Facebook a powerful arbiter of political expression in a turbulent region.

Although the Haugen documents predate this year's 11-day Gaza war, the episode shows how little was done to address the problems raised in Facebook's internal reports.

Activists in Gaza, the West Bank, and elsewhere lost the ability to livestream. Whole archives of the conflict vanished from newsfeeds, a primary source of information for many users. Influencers accustomed to tens of thousands of likes on their posts saw their reach plummet when they posted about Palestinians.

"This has restrained me and prevented me from feeling free to publish what I want for fear of losing my account," said Soliman Hijjy, a journalist from Gaza. His aerial shots of the Mediterranean Sea have drawn thousands more views than his images of Israeli bombardment -- a common pattern when photos are flagged as violating community standards.

During the conflict, Palestinian activists filed hundreds of complaints to Facebook, often leading the company to concede error and restore posts and accounts.

Facebook stated in internal documents that it had made errors in almost half of the Arabic language takedown requests.

The repetition of false positives, it noted, creates a huge drain on resources.

In announcing the reversal of one Palestinian post removal last month, Facebook's semi-independent oversight board urged an impartial investigation into the company's Arabic and Hebrew content moderation. In its policy advisory statement, the board also called for improvements to the broad terrorism blacklist to clarify the exceptions for neutral discussion, condemnation, and news reporting.

Facebook's internal documents also stressed the need to enhance its algorithms, enlist more Arab moderators from less-represented countries, and restrict moderators to regions where they have the appropriate dialect expertise.

Given the potential severity of offline harm and the size of the Arabic-speaking user base, the report stated, it is surely of the highest importance to put more resources toward improving Arabic systems.

However, the company lamented that there is no clear mitigation strategy.

Meanwhile, many in the Middle East worry that Facebook's failures carry real costs for their lives: deepening inequality, suppressing civic activism, and stoking violence.

"We asked Facebook: Do you want people to share their experiences on social media platforms, or do you want to shut them down?" said Husam Zomlot, the Palestinian envoy to Britain, who recently discussed Arabic content suppression with Facebook officials in London. "If you remove people's voices, the alternatives will be even worse."