WeCounterHate

Objectives

With the advent of social media, technology gave hate a way to spread like never before. Twitter became a powerful weapon for those promoting hateful ideals.

We decided to turn the same technology that empowered hate against it.

That was the genesis of #WeCounterHate, a people-powered, machine-learning platform created to stop the spread of hate speech on Twitter, one retweet at a time.

First, AI helps identify tweets containing hate speech. Once identified, each tweet is tagged with a reply. This permanent marker lets those looking to spread hate know that retweeting will commit a donation to a nonprofit that fights for inclusion, diversity and equality.

Potential retweeters are presented with a decision: Don't retweet the hateful ideology (thus stopping its spread), or retweet it and financially benefit a nonprofit organization they're opposed to (Life After Hate). Either way, love wins.

#WeCounterHate is funded by everyday people donating their hard-earned money so it can be used to fight the spread of hate speech.

Strategy and Execution

The goal of #WeCounterHate is simple: cause people to think twice about spreading hate speech on Twitter by making retweets of hate benefit a nonprofit that fights against it.

Hate speech is difficult to parse because of its sheer volume online (one hate tweet every two seconds) and because its context is riddled with subtleties and shaped by current events. The hate speech conversations that we tracked came from nine different countries, covered six different targeted groups, and used three times as many words as normal language.

We needed to ingest and analyze this universe of data using a variety of partners to unpack the terms used as well as their intensity and toxicity. Our approach integrated three separate technologies to synthesize and report on hate speech in near real time.

In the training phase, we had former white supremacists convey their personal knowledge about hate on social media. We used this information to help train our AI. We also used a more volume-centered sourcing technique, achieved by scraping extremist threads on Reddit and 4chan, as well as extremist Twitter accounts. We opted for human moderation at the end of the process, to avoid countering speech that wasn't truly hateful.

The magic happens when we mark a tweet containing hate speech with a simple reply. People are unable to delete replies without deleting their entire post, so it becomes a permanent marker for all to see.

We use that marker to let those thinking about retweeting know that doing so will benefit Life After Hate—a nonprofit whose mission is to rehabilitate individuals who have lived a life of hate, and to point them down a better path.

The fully integrated machine was designed to reduce the throughput of content to the human moderator to a manageable volume. It has done so with 91% accuracy.
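
The hand-off from machine to human moderator described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Tweet` type, the `triage` and `select_for_reply` functions, the 0.5 threshold and the toy scoring function are hypothetical and are not the platform's actual model or code. The sketch only shows the general idea of letting a scoring step shrink the review queue before a person makes the final call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Tweet:
    tweet_id: str
    text: str


def triage(tweets: Iterable[Tweet],
           score: Callable[[str], float],
           threshold: float) -> List[Tweet]:
    """Keep only tweets whose score meets the threshold.

    The result is the (smaller) queue passed on to a human moderator.
    """
    return [t for t in tweets if score(t.text) >= threshold]


def select_for_reply(queue: Iterable[Tweet],
                     approve: Callable[[Tweet], bool]) -> List[Tweet]:
    """Apply the human decision: only approved tweets receive a reply."""
    return [t for t in queue if approve(t)]


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained classifier.

    Returns the fraction of words found in a placeholder term list.
    """
    flagged_terms = {"placeholder_slur"}  # illustrative only
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in flagged_terms) / len(words)


if __name__ == "__main__":
    incoming = [
        Tweet("1", "hello world"),
        Tweet("2", "placeholder_slur placeholder_slur"),
    ]
    review_queue = triage(incoming, toy_score, threshold=0.5)
    # A real moderator would read each tweet; here every queued tweet is approved.
    to_reply = select_for_reply(review_queue, approve=lambda t: True)
    print([t.tweet_id for t in to_reply])
```

In this arrangement the threshold controls how much content reaches the moderator, while the final approval step keeps a person responsible for deciding whether a tweet is truly hateful.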

Results

When #WeCounterHate responds to a hate tweet, it reduces the spread of that hate by an average of 54%, and 19% of the "hatefluencers" delete the tweet outright. That equates to more than 8 million fewer people being exposed to hate speech (at the time of this writing), essentially making for a hugely successful anti-media plan.

The platform has radically outperformed expectations for identifying hate speech, achieving 91% success relative to a human moderator, and we are continuing to improve the model.

Our hope is to continue to counter hate speech online, while collecting insightful data about how hate speech online propagates. This data will allow experts in the field to address the hate speech problem at a more systemic level.