Those same images have made it easier for AI systems to produce realistic and explicit imagery of fake children as well as transform social media photos of fully clothed real teens into nudes, much to the alarm of schools and law enforcement around the world.

Until recently, anti-abuse researchers thought the only way that some unchecked AI tools produced abusive imagery of children was by essentially combining what they’ve learned from two separate buckets of online images — adult pornography and benign photos of kids.

But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the giant AI database LAION, an index of online images and captions that’s been used to train leading AI image-makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement.

  • agent_flounder@lemmy.world · 7 months ago

    In this particular case, there are three organizations involved.

    First you have LAION, the main player in the article, which is a not-for-profit org intended to make image training sets broadly available to further generative AI development.

    “Taking an entire internet-wide scrape and making that dataset to train models is something that should have been confined to a research operation, if anything, and is not something that should have been open-sourced without a lot more rigorous attention,” Thiel said in an interview.

    While they had some mechanisms in place, more than 3,200 suspected CSAM images slipped by them and became part of the 5 billion images in their dataset. That's on them, for sure.

    From the above quote it kinda sounds like they weren’t doing nearly enough. And they weren’t following best practices.

    The Stanford report acknowledged LAION’s developers made some attempts to filter out “underage” explicit content but might have done a better job had they consulted earlier with child safety experts.
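    Just to make the "best practices" point concrete, here's a minimal sketch of the kind of pre-release scan a dataset publisher could run, assuming access to a vetted hash blocklist from a hotline partner plus some NSFW/abuse classifier. Both `hash_blocklist` and `nsfw_score` are hypothetical stand-ins, not LAION's actual pipeline:

    ```python
    import hashlib
    from pathlib import Path

    # Hypothetical: a set of known-bad image hashes supplied by a hotline partner.
    hash_blocklist: set[str] = set()

    def nsfw_score(image_bytes: bytes) -> float:
        """Stand-in for whatever NSFW/abuse classifier the publisher uses."""
        return 0.0  # placeholder score in [0, 1]

    def scan_dataset(image_dir: str, threshold: float = 0.9) -> list[Path]:
        """Flag images that match the blocklist or score above the NSFW threshold."""
        flagged = []
        for path in Path(image_dir).glob("**/*.jpg"):
            data = path.read_bytes()
            # Exact hashing shown for brevity; real pipelines use perceptual
            # hashes (e.g. PhotoDNA/PDQ) so near-duplicates are caught too.
            digest = hashlib.sha256(data).hexdigest()
            if digest in hash_blocklist or nsfw_score(data) >= threshold:
                flagged.append(path)  # quarantine for expert review, don't just drop
        return flagged
    ```

    The point isn't this exact code; it's that scanning against known-hash lists and consulting child-safety experts before release is exactly the kind of "rigorous attention" Thiel is talking about.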

    It also doesn’t help that the second organization, Common Crawl, the upstream source for much of their data, seems not to do any filtering itself and passes the buck to its customers.

    … Common Crawl’s executive director, Rich Skrenta, said it was “incumbent on” LAION to scan and filter what it took before making use of it.

    And third, of course, we have the customers of LAION, a major one being Stability AI. Apparently they were late to the game in implementing filters to prevent generation of CSAM with their earlier model, which, though unreleased, had already been integrated into various tools.

    “We can’t take that back. That model is in the hands of many people on their local machines,” said Lloyd Richardson, director of information technology at the Canadian Centre for Child Protection, which runs Canada’s hotline for reporting online sexual exploitation.

    So it seems to me the excuse from all of these players is “hurr durr, I guess I shoulda thunked of that before, durr.” As usual, humans leap into shit without sufficiently contemplating negative outcomes. This is especially true for anything technology related, because it has happened over and over again in the decades since the PC revolution.

    I for one am exhausted by it and sometimes, like now after reading the article, I just want to toss it out the window. Yup, it’s about time to head to my shack in the woods and compose some unhinged screeds on my typer.