Tweaks performance of model that recognises abusive chat.
Riot Games is turning to Apache Spark and Google’s TensorFlow to improve its ability to identify and punish players who use abusive or “toxic” language in in-game chat.
The developer and publisher of League of Legends, a multiplayer online battle arena game with over 100 million monthly active users, has devoted several years to weeding out “serious toxicity” – hate speech, racism, sexism and other forms of abuse – from its community.
Wes Kerr, a senior data scientist in the player behaviour team, spends his time developing machine learning algorithms to better understand and detect unsportsmanlike behaviour in the game.
Only one percent of the community is “consistently unsportsmanlike”, and that group is responsible for just five percent of all unsporting behaviour in the game, Kerr told the Spark Summit in San Francisco this week.
“This means that 95 percent is coming from players who otherwise are completely sportsmanlike,” Kerr said.
“They’re having a bad day or a bad game, and they’re going to say things they ultimately regret.
“This shaped how we wanted to take action against these players. We definitely don’t want to remove them from the game but we want to warn them that they need to change in order to be more sportsmanlike in League of Legends.”
The game developer started by trying to understand how gamers chat – the language, acronyms and emojis they use and their meanings.
“League of Legends is a real-time game so players don’t have a lot of time to stop and chat or write long sequences of text,” Kerr said.
“In fact, they’ve adopted a notation of a lot of acronyms like GLHF – good luck have fun.
“The other challenge we face with natural language in League of Legends is this notion that the semantics of the words often don’t correspond with their real-world semantics.”
Riot Games used a neural model called Word2Vec to “dig out the language used by our players” and understand the meaning based on the context in which it was used. This was a critical first step in building a blacklist of language it did not want to see in chat.
The company “grabbed a month of our chat logs” and fed it into Word2Vec. The results were “really interesting”, Kerr said.
The model was able to take an acronym used in-game – for example, GJ, which stands for ‘good job’ – and find all the various “misspellings and different ways to say good job”.
It was also used to tease out the different ways that players used terms like QQ, which originated in Warcraft as a way of telling players to “quit the game because they’re bad” or relatively unskilled. (It was taken from the Alt+Q+Q command to quit out.)
However, QQ is also used as a sad emoticon – resembling a pair of crying eyes – so the company wanted to catch the negative uses while leaving the more innocent ones alone.
Likewise ‘noob’, a common abbreviation of newbie.
“What’s interesting is Word2Vec was able to capture for us all the different misspellings of noob,” Kerr said.
“There are many different misspellings. This is incredibly useful if you want to build up a blacklist of words you don’t want players to use.
“So players are typing really fast, they’re doing it in different ways, but the context is giving us the meanings that we need.”
Once they understood how language was used in game chat, Kerr’s team built a model in R and Python to predict in-game toxicity, and put it into production.
“Since it launched, it has punished millions of players for bad behaviour in-game,” he said.
The model was tuned for “super high precision” in a bid to limit false positives, which could cause players to be wrongly punished. There was also a limit on how much data Riot Games was able to use to train the model.
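Tuning a classifier for very high precision usually means raising its decision threshold and accepting lower recall. A sketch of that trade-off with scikit-learn, on synthetic data standing in for chat features (the target of 0.99 and all model details are assumptions, not Riot’s actual settings):

```python
# Sketch of threshold tuning for high precision with scikit-learn.
# Synthetic data stands in for real chat features; not Riot's model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Imbalanced data: toxic messages (class 1) are the minority.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

# Sweep thresholds and take the lowest one that meets the precision
# target, trading recall away to limit wrongful punishments.
precision, recall, thresholds = precision_recall_curve(y, scores)
target = 0.99
ok = precision[:-1] >= target        # precision has one extra entry
chosen = thresholds[ok][0] if ok.any() else 1.0
print(f"threshold={float(chosen):.3f}")
```

Only messages scoring above `chosen` would then trigger a punishment; everything below it is left alone even if it looks suspicious.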
That led to some recent work to try to improve it.
“This last quarter we spent some time investigating alternatives, looking into Apache Spark to see if we could improve the performance of this model,” Kerr said.
“And with the excitement around deep learning, we looked at what we could pull off with GPUs and TensorFlow.”
Kerr said Spark – a large-scale data processing engine – allowed Riot Games to vary the amount of training data it fed its model.
“It’s a well-studied result that more training data leads to better performance on your algorithms,” he said.
“In 2001, Microsoft Research showed that as you scale up the millions of words in your training dataset, your performance goes up. We replicated that result with our own dataset.
“This gave us the evidence we needed to believe that running this out on Apache Spark would be useful. We could scale out our model complexity and our training data size.”
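The experiment Kerr describes is a learning curve: train the same model on growing slices of data and watch held-out performance. Riot ran this at scale on Spark; a local sketch of the same idea with scikit-learn on synthetic data (all sizes and model choices here are illustrative):

```python
# Local sketch of the learning-curve experiment described above:
# fit the same model on growing slices of the training data and
# track held-out accuracy. Synthetic data stands in for chat logs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=30, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=1)

sizes = [100, 500, 1000, len(X_tr)]
accuracies = []
for n in sizes:
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    accuracies.append(clf.score(X_te, y_te))

for n, acc in zip(sizes, accuracies):
    print(f"n={n:5d}  accuracy={acc:.3f}")
```

Spark’s contribution is making the same loop feasible when each slice is millions of chat messages rather than a few thousand rows.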
This has helped the company shield players from extreme toxicity in games.
“We punish millions of players. We’re going to continue punishing more in hopes that they will learn and not be toxic in game,” Kerr said.
“We can very quickly try out different combinations of algorithms, tokenisation and parameters. We can scale to far larger datasets than we could process before, which has allowed us to improve the performance and confidence of our algorithm.”
The company is also starting to explore options to use Google’s TensorFlow “to see where convolutional neural networks could get us”.
Convolutional neural networks are used in natural language processing and other applications, and were originally inspired by animal visual perception.
Riot Games hopes to use them to “model language and detect toxicity” in the increasing number of languages used by players in-game. For example, the company now has servers in Japan, which are used by players from a number of countries in the region, meaning many more possibilities for abusive language.
“As someone who has to model language and detect toxicity it gets really challenging because players can switch which language they’re using in-game,” Kerr said.
“They can start out talking with their friends in their native language but the other team may be speaking a different language so they can chat with them in that.
“Detecting language and dealing with that language is very tricky.”
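The kind of convolutional text classifier Riot describes exploring can be sketched in TensorFlow/Keras: an embedding layer feeds 1D convolutions that act like learned n-gram detectors over the chat tokens. The vocabulary size, layer sizes and random inputs below are all assumptions for illustration, not Riot’s architecture:

```python
# An illustrative convolutional text classifier in TensorFlow/Keras,
# along the lines Riot describes exploring; all sizes are assumptions.
import numpy as np
import tensorflow as tf

VOCAB = 5000   # assumed vocabulary size after tokenisation
MAXLEN = 40    # chat messages are short, so a small fixed length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, 32),               # token vectors
    tf.keras.layers.Conv1D(64, 5, activation="relu"),   # n-gram-like filters
    tf.keras.layers.GlobalMaxPooling1D(),               # strongest signal
    tf.keras.layers.Dense(1, activation="sigmoid"),     # P(toxic)
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy batch of token-id sequences standing in for tokenised chat.
batch = np.random.randint(0, VOCAB, size=(8, MAXLEN))
probs = model.predict(batch, verbose=0)
print(probs.shape)
```

Because the convolution slides over whatever tokens it is given, the same architecture can in principle be retrained per language, which is one reason it appeals for a multilingual player base.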