To ensure that potential veiled threats are more accurately assessed, in light of Meta’s incorrect interpretation of this content on escalation, the Board recommends that Meta produce an annual accuracy assessment for this problem area. This should include a specific focus on false negative rates of detection and removal for threats against human rights defenders, and false positive rates for political speech (e.g., Iran Protest Slogan). As part of this process, Meta should investigate opportunities to improve the accurate detection of high-risk (low-prevalence, high-impact) threats at scale.
The Board will consider this implemented when Meta shares the results of this assessment, including how these results will inform improvements to enforcement operations and policy development.
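To illustrate the two error rates the Board asks the assessment to report, the sketch below computes them from hypothetical review counts (the numbers and function names are illustrative assumptions, not Meta data or Meta tooling):

```python
# Illustrative sketch only: hypothetical counts, not Meta data.
# False negative rate: share of true veiled threats that detection missed.
# False positive rate: share of benign posts (e.g., political speech)
# that were wrongly removed as threats.

def false_negative_rate(missed_threats: int, total_threats: int) -> float:
    """Fraction of confirmed threats that were not detected and removed."""
    return missed_threats / total_threats

def false_positive_rate(wrong_removals: int, total_benign: int) -> float:
    """Fraction of benign posts that were wrongly removed."""
    return wrong_removals / total_benign

# Hypothetical sample: 200 confirmed threats, 30 missed;
# 5,000 benign political posts, 75 wrongly removed.
fnr = false_negative_rate(30, 200)
fpr = false_positive_rate(75, 5000)
print(f"False negative rate: {fnr:.1%}")  # 15.0%
print(f"False positive rate: {fpr:.1%}")  # 1.5%
```

An assessment of this kind would report both rates separately, since the recommendation targets false negatives for threats against human rights defenders and false positives for political speech, which trade off against each other.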
Commitment Statement: Conducting an accuracy assessment is challenging because determinations about veiled threats depend on complex factors that may be specific to a regional, historical, or otherwise situational context, unlike direct or explicit threats of violence, which can be reviewed at scale. However, we will work with our enforcement teams to assess ways to refine how content is surfaced for veiled threats assessment.
Considerations: In part due to prior recommendations from the Oversight Board, such as A Veiled Threat of Violence Based on Lyrics from a Drill Rap Song recommendation #2, we have refined the language in our Violence and Incitement Community Standard related to coded statements, including veiled or implicit threats. This refined language outlines that, on escalation and with additional context, we may remove coded statements where the method of violence is not clearly articulated but the threat is veiled or implicit, as indicated by a number of signals. These signals are coded or indirect versions of the signals our reviewers use to determine whether to escalate a threat, and may include references to specific locations or historical incidents of violence, for example. We also share examples to help explain each of these signals. In addition to these signals, we also require a contextual signal, such as local context of imminent violence, the target or an authorized representative (such as a local NGO) reporting the content, or context indicating that the target is a child.
Assessment of these instances requires nuance and situational awareness, which is why we apply these policies only on escalation, by specialized teams and with contextual information, as opposed to direct threats, which may be reviewed at scale.
For example, if a user shares a coded statement in a retaliatory context implying that another person ‘will pay for what they have done, I know where to find you,’ our escalation teams will review the post in its entirety, taking into account any contextual signals around the post and any known context about the user or the person they are referring to from our internal and external stakeholders, to determine whether the statement is indeed a veiled threat.
While reviewers at scale may assess direct threats based on the presence of a target, a method, and a few other clear signals, escalation teams will require further investigation and collaboration with internal experts before enforcing on potential veiled threats.
As we assess the feasibility of this recommendation, our Policy team will partner with escalation teams to evaluate samples of content escalated for potential veiled threats to consider opportunities to refine the application of this framework.