Gorwa et al. (2020) – Algorithmic content moderation: technical and political challenges in the automation of platform governance Flashcards
Abstract
As government pressure on major technology companies builds, both firms and legislators are
searching for technical solutions to difficult platform governance puzzles such as hate speech and misinformation. Automated hash-matching and predictive machine learning tools – what we
define here as algorithmic moderation systems – are increasingly being deployed to conduct
content moderation at scale by major platforms for user-generated content such as Facebook,
YouTube and Twitter. This article provides an accessible technical primer on how algorithmic
moderation works; examines some of the existing automated tools used by major platforms to
handle copyright infringement, terrorism and toxic speech; and identifies key political and ethical
issues for these systems as the reliance on them grows. Recent events suggest that algorithmic
moderation has become necessary to manage growing public expectations for increased platform
responsibility, safety and security on the global stage; however, as we demonstrate, these
systems remain opaque, unaccountable and poorly understood. Despite the potential promise of
algorithms or ‘AI’, we show that even ‘well optimized’ moderation systems could exacerbate,
rather than relieve, many existing problems with content policy as enacted by platforms for three
main reasons: automated moderation threatens to (a) further increase opacity, making a
famously non-transparent set of practices even more difficult to understand or audit, (b) further
complicate outstanding issues of fairness and justice in large-scale sociotechnical systems and
(c) re-obscure the fundamentally political nature of speech decisions being executed at scale.
Turning to AI for moderation at scale
Automated moderation systems have become necessary to manage growing public
expectations for increased platform responsibility, safety and security
- But these systems remain opaque, unaccountable and poorly understood
- The goal of this article is to provide an accessible primer on how automated moderation works
What is algorithmic moderation?
- Content moderation → governance mechanisms that structure participation in a community to facilitate cooperation and prevent abuse
- In this understanding, moderation includes not only the administrators or moderators with power to remove content or exclude users, but also the design decisions that organize how the members of a community engage with one another
- Algorithmic commercial content moderation (algorithmic moderation) → systems that classify user-generated content based on either matching or prediction, leading to a decision and governance outcome (e.g., removal, geo-blocking, account takedown)
- Hard moderation systems → systems that make decisions about content and accounts
- The focus of this paper lies on hard moderation systems
- Soft moderation systems → recommender systems, norms, design decisions, architectures, etc.
A primer on the main technologies involved in algorithmic moderation
- Algorithmic content moderation involves a range of techniques from statistics and computer science, which vary in complexity and effectiveness
- They all aim to identify, match, predict, or classify some piece of content (text, audio, image, video, etc.) on the basis of its exact properties or general features
- There are some major differences in the techniques used depending on the kind of matching or classification required, and the types of data considered:
- Systems that aim to match content → ‘is this file depicting the same image as that file?’
- Systems that aim to classify or predict content as belonging to one of several categories → ‘is this file spam? Is this text hate speech?’
Hashing:
The process of transforming a known example of a piece of content into a ‘hash’ – a string of data meant to uniquely identify the underlying content
- They are useful because they are easy to compute and smaller in size than the underlying content, so it is easy to compare any given hash against a large table of existing hashes to see if it matches any of them
- Secure cryptographic hash functions → aim to create hashes that appear to be random, giving away no clues about the content from which they are derived
- They are useful for checking the integrity of a piece of data or code to make sure that
no unauthorized modifications have been made
- For example, if a software vendor publishes a hash of the software’s installation file,
and the user downloads the software from somewhere where it may have been
modified, the user can check the integrity by computing the hash locally and
comparing it to the vendor’s
- Cryptographic hash functions are not useful for content moderation, because they are
sensitive to any changes in the underlying content, such that a minor modification
(changing the color of one pixel in an image) will result in a completely different hash
value
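A minimal Python sketch of this behaviour, using the standard-library hashlib module (the content bytes below are made up for illustration): flipping a single bit yields a completely different SHA-256 digest, which is exactly what makes cryptographic hashes good for integrity checks but brittle for moderation.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

# A known piece of content and a near-identical copy (one bit flipped,
# analogous to changing the colour of a single pixel in an image).
original = b"\x00\x01\x02\x03" * 1024
modified = bytearray(original)
modified[0] ^= 0x01  # flip one bit

print(sha256_hex(original))         # digest of the original
print(sha256_hex(bytes(modified)))  # completely different digest

# Integrity check: recompute locally and compare to a published hash.
vendor_hash = sha256_hex(original)
assert sha256_hex(original) == vendor_hash          # unmodified copy passes
assert sha256_hex(bytes(modified)) != vendor_hash   # any tampering is detected
```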
Perceptual hashing
Involves fingerprinting certain perceptually salient features of content, such as corners in images or hertz-frequency over time in audio
- This type of hashing can be more robust to changes that are irrelevant to how
humans perceive the content
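For contrast, here is a minimal sketch of one simple perceptual-hashing scheme (the ‘average hash’), assuming the Pillow library is available; production systems such as PhotoDNA use far richer features, and the file names and distance threshold below are purely illustrative.

```python
from PIL import Image  # pip install pillow

def average_hash(path: str, hash_size: int = 8) -> int:
    """Simple perceptual 'average hash': shrink the image, grey-scale it,
    and set one bit per pixel depending on whether it is above the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for pixel in pixels:
        bits = (bits << 1) | (1 if pixel > mean else 0)
    return bits

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Two re-encoded or slightly edited copies of the same image should produce
# hashes only a few bits apart, unlike cryptographic hashes.
# known_hash = average_hash("known_prohibited_image.jpg")    # hypothetical file
# upload_hash = average_hash("new_upload.jpg")               # hypothetical file
# is_match = hamming_distance(known_hash, upload_hash) <= 5  # illustrative threshold
```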
Classification
assesses newly uploaded content that has no corresponding previous version in a database
- The aim is to put new content into one of a number of categories
- Modern classification tools rely on machine learning (the automatic induction of statistical patterns from data)
- One of the main branches of machine learning is supervised learning: models are trained to predict outcomes based on labelled instances (offensive/not offensive)
- Content classification → based on manually coded features
- It is hard to identify the context of a text or word when using this type of classification
Bag of words
treats all of the words in a sentence as features, ignoring order and grammar
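As an illustration of this pipeline (labelled examples → bag-of-words features → supervised classifier), here is a toy sketch using scikit-learn; the texts, labels and model choice are invented for demonstration and are not taken from the paper or any platform.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative training set: labelled instances (1 = offensive, 0 = not).
texts = ["you are an idiot", "have a nice day", "idiot troll", "nice work today"]
labels = [1, 0, 1, 0]

# Bag of words: each text becomes a vector of word counts, ignoring order and grammar.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Supervised learning: fit a model that predicts the label from the features.
model = LogisticRegression()
model.fit(X, labels)

# Classify a new, unseen piece of content.
new_text = ["what an idiot"]
print(model.predict(vectorizer.transform(new_text)))  # expected: [1] (predicted offensive)
```

Because the representation discards word order and context, such a classifier cannot tell a quoted or reclaimed slur from a direct attack, which is the contextual weakness noted above.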
Word embeddings
Represent the position of a word in relation to all the other words that usually appear around it
- Semantically similar words therefore have similar positions
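A small numpy sketch of the idea: the vectors below are invented for illustration (real embeddings such as word2vec or GloVe are learned from large corpora and have hundreds of dimensions), but they show how cosine similarity captures the intuition that words appearing in similar contexts end up close together.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two word vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional embeddings, invented purely for illustration.
embeddings = {
    "idiot":  np.array([0.9, 0.1, 0.0]),
    "moron":  np.array([0.8, 0.2, 0.1]),   # appears in similar contexts -> similar vector
    "sunset": np.array([0.0, 0.1, 0.9]),
}

print(cosine_similarity(embeddings["idiot"], embeddings["moron"]))   # close to 1
print(cosine_similarity(embeddings["idiot"], embeddings["sunset"]))  # close to 0
```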
Matching and classification have some important differences:
- Matching requires a manual process of collating and curating individual examples of the content to be matched (particular terrorist images)
- Classification involves inducing generalizations about features of many examples from a given category into which unknown examples may be classified (terrorist images in general)
An algorithmic moderation typology
- The specific fashion in which these matching or predictive systems are deployed depends greatly on a variety of factors, including:
- The type of community
- The type of content it must deal with
- The expectations placed upon the platform by various governance stakeholders
- Automated tools are used by platforms to police content across a host of issue areas at
scale, including terrorism, graphic violence, toxic speech (hate speech, harassment and
bullying), sexual content, child abuse, and spam or fake account detection
Once content has been identified as a match, or is predicted to fall into a category of
content that violates a platform’s rules, there are several possible outcomes:
- Flagging: content is placed in either a regular queue, indistinguishable from user flagged content, or in a priority queue where it will be seen faster, or by specific ‘expert’ moderators
- Deletion: content is removed outright or prevented from being uploaded in the first place
- Fully automated decision-making systems that do not include a human-in-the-loop are
dangerous
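The following sketch illustrates how such outcomes might be wired together; the thresholds, action names and scores are hypothetical rather than drawn from any actual platform, and the point is that predictive scores route content to human reviewers instead of triggering fully automated removal.

```python
from dataclasses import dataclass

@dataclass
class ModerationDecision:
    action: str   # e.g. "delete", "priority_review", "regular_review", "allow"
    reason: str

def route_content(is_hash_match: bool, toxicity_score: float) -> ModerationDecision:
    """Illustrative routing logic (thresholds and actions are hypothetical):
    exact matches against a curated hash list are removed outright, while
    predictive scores send content to human reviewers."""
    if is_hash_match:
        return ModerationDecision("delete", "matched known prohibited content")
    if toxicity_score >= 0.9:
        return ModerationDecision("priority_review", "high-confidence prediction")
    if toxicity_score >= 0.5:
        return ModerationDecision("regular_review", "borderline prediction")
    return ModerationDecision("allow", "no rule predicted to be violated")

print(route_content(False, 0.93))  # routed to the priority human-review queue
```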
Copyright
Content ID is unique in that it allows copyright holders to upload material that will be (a) searched against existing content on YouTube and (b) added to a hash database and used to detect new uploads of that content
- In the copyright context, the goal of deploying automatic systems is not only to find identical files but also to identify different instances and performances of cultural works that may be protected by copyright
- A key concern in the deployment of automated moderation technologies in the context of copyright is systematic over-blocking
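A purely conceptual sketch of reference-database matching of the kind Content ID performs (the real system is proprietary and uses much richer audio/video fingerprints; the fingerprints, works and threshold below are invented): the looser the matching threshold, the more remixes and performances it catches, but the greater the risk of systematically over-blocking unrelated or fair-use material.

```python
from typing import Optional

# Invented reference fingerprints uploaded by rights holders.
REFERENCE_DB = {
    0b1010_1100_1111_0000: "Song A (Label X)",
    0b0001_0110_1010_1011: "Film B (Studio Y)",
}

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def find_claim(upload_fingerprint: int, threshold: int) -> Optional[str]:
    """Return the claimed work if the upload is 'close enough' to a reference.
    A generous threshold catches re-encodings and performances of protected
    works, but also raises the risk of over-blocking unrelated content."""
    for ref, work in REFERENCE_DB.items():
        if hamming(upload_fingerprint, ref) <= threshold:
            return work
    return None

print(find_claim(0b1010_1100_1111_0001, threshold=2))  # matches "Song A (Label X)"
print(find_claim(0b1111_0000_0000_1111, threshold=2))  # None: no claim
```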