Improve automated detection of images with text overlay so that posts raising awareness of breast cancer symptoms are not wrongly flagged for review. Meta should also improve its transparency reporting on its use of automated enforcement.
Our commitment: We agree we can do more to ensure our machine learning models don’t remove the kinds of nudity we allow (e.g., female nipples in the context of breast cancer awareness). We commit to refining these systems by continuing to invest in improving our computer vision signals, sampling more training data for our machine learning, and leveraging manual review when we’re not as confident about the accuracy of our automation.
Considerations: Facebook uses two kinds of automated systems: 1) detection systems, which flag potentially violating content and “enqueue” it for a content reviewer, and 2) enforcement systems, which review content and decide whether it violates our policies. We want to avoid wrongly flagging posts for either review or removal, but our priority will be to ensure our models don’t remove this kind of content (content wrongly flagged for review is still assessed against our policies before any action is taken).
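As a minimal sketch of how such a two-stage setup can work (the thresholds, names, and labels here are illustrative assumptions, not a description of Facebook’s actual system), a model’s violation score can route each post to automated removal, a human review queue, or no action:

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO_REMOVE = "auto_remove"       # automated enforcement: high-confidence violation
    ENQUEUE_REVIEW = "enqueue"        # automated detection: flag for a content reviewer
    NO_ACTION = "no_action"


@dataclass
class RoutingThresholds:
    # Hypothetical values; a real system tunes these per policy area.
    auto_remove: float = 0.98   # act automatically only when very confident
    enqueue: float = 0.60       # below this, don't flag at all


def route_post(violation_score: float, t: RoutingThresholds = RoutingThresholds()) -> Action:
    """Route a post based on a model's predicted probability of violating policy."""
    if violation_score >= t.auto_remove:
        return Action.AUTO_REMOVE
    if violation_score >= t.enqueue:
        # Humans assess flagged content against policy before any action is taken.
        return Action.ENQUEUE_REVIEW
    return Action.NO_ACTION


# A borderline score goes to human review rather than removal.
assert route_post(0.75) is Action.ENQUEUE_REVIEW
assert route_post(0.99) is Action.AUTO_REMOVE
```

The design point is that the enforcement threshold sits well above the detection threshold, so automation acts on its own only where it is most confident.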
In this case, our automated systems got it wrong by removing this post, but not because they didn’t recognize the words “breast cancer.” Our machine learning works by predicting whether a piece of content, including any text overlay, violates our policies. We have observed patterns of abuse where people mention “breast cancer” or “cervical cancer” to try to confuse or evade our systems, which means we cannot train our system to, say, ignore everything that says “breast cancer.”
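To illustrate why a hard keyword exemption is a poor fix (a contrived sketch with made-up scores, weights, and helper names), compare a rule that simply exempts any post mentioning “breast cancer” with a model that treats overlay text as one weighted signal among several:

```python
ACT_THRESHOLD = 0.5  # illustrative decision threshold


def naive_exempt(image_violation_score: float, overlay_text: str) -> bool:
    """Naive rule: never act when the overlay mentions 'breast cancer'.

    Trivially evadable: any violating image passes by adding the phrase.
    """
    if "breast cancer" in overlay_text.lower():
        return False
    return image_violation_score >= ACT_THRESHOLD


def learned_combination(image_violation_score: float, text_awareness_score: float) -> bool:
    """Overlay text as one signal among several, not a hard override.

    In practice the weights are learned and text_awareness_score would come
    from a learned text model; 0.6 and 0.4 are illustrative assumptions.
    """
    combined = 0.6 * image_violation_score - 0.4 * text_awareness_score
    return combined >= ACT_THRESHOLD


# Evasion attempt: a clearly violating image plus the magic phrase.
print(naive_exempt(0.99, "breast cancer!!!"))   # False -> evasion succeeds
print(learned_combination(0.99, 0.05))          # True  -> evasion fails
# Genuine awareness post: borderline image, strongly on-topic text.
print(learned_combination(0.55, 0.95))          # False -> post allowed
```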
So, our models make predictions about posts such as breast cancer awareness content after “learning” from a large set of examples that content reviewers have confirmed either do or do not violate our policies. This case was difficult for our systems because the number of breast cancer-related posts on Instagram is very small compared to the overall number of violating nudity-related posts. With fewer examples to learn from, the machine learning system may be less accurate on this kind of content.
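One standard mitigation for this kind of imbalance (a minimal sketch using scikit-learn; the data is synthetic and the two toy features stand in for real image and text signals) is to weight the rare class more heavily during training, so the few allowed examples are not drowned out by the many violating ones:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: 2,000 violating examples vs. 50 rare allowed
# awareness examples. Features: [image-nudity signal, text-awareness signal].
n_violating, n_allowed = 2000, 50
X_violating = rng.normal([0.8, 0.1], 0.15, size=(n_violating, 2))
X_allowed = rng.normal([0.7, 0.9], 0.15, size=(n_allowed, 2))
X = np.vstack([X_violating, X_allowed])
y = np.array([1] * n_violating + [0] * n_allowed)  # 1 = violating

# class_weight="balanced" reweights examples inversely to class frequency,
# so the 50 allowed examples carry as much total weight as the 2,000 others.
model = LogisticRegression(class_weight="balanced").fit(X, y)

# A post resembling the rare allowed class should now score low.
awareness_post = [[0.7, 0.9]]
print(model.predict_proba(awareness_post)[0][1])  # P(violating), expected low
```

Reweighting is only one option; sampling more training data for the rare class, as the commitment above describes, attacks the same problem at its source.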
Next steps: We will continue to invest in making our machine learning models better at detecting the kinds of nudity we do allow. We will keep improving our computer vision signals, sampling more training data for our machine learning, and increasing our use of manual review when we’re less sure about the accuracy of our automation.