In an effort to promote ethical AI training practices, Jordan Meyer and Mathew Dryhurst, the founders of Spawning AI, have introduced Source.Plus. This innovative project aims to curate “non-infringing” media for AI model training, allowing artists to maintain control over how their works are utilized online.

The Vision Behind Source.Plus
The initiative launched with a dataset of nearly 40 million images that are either in the public domain or released under Creative Commons’ CC0 license, which lets creators waive almost all legal interest in their works. The result is a high-quality, ethically sourced dataset for AI training. Despite its smaller size compared to other generative AI datasets, Meyer asserts that Source.Plus is robust enough to train state-of-the-art image-generating models.
“With Source.Plus, we’re building a universal ‘opt-in’ platform,” Meyer explained. “Our goal is to make it easy for rights holders to offer their media for use in generative AI training — on their own terms — and frictionless for developers to incorporate that media into their training workflows.”

Navigating the Ethical Landscape
The ethics of training generative AI models, especially art-generating ones like Stable Diffusion and OpenAI’s DALL-E 3, remain hotly debated. These models learn to create outputs by training on vast amounts of data, much of it scraped from public sources without regard to copyright status. This practice has sparked significant controversy, with arguments about fair use on one side and calls for proper compensation and credit on the other.

Meyer believes that the industry has yet to find the best approach to data sourcing. “AI training frequently defaults to using the easiest available data — which hasn’t always been the most fair or responsibly sourced,” he told TechCrunch. “Artists and rights holders have had little control over how their data is used for AI training, and developers have not had high-quality alternatives that make it easy to respect data rights.”
Source.Plus, currently in limited beta, builds on Spawning’s existing tools for art provenance and usage rights management. Previously, Spawning introduced HaveIBeenTrained, a platform allowing creators to opt out of training datasets. With venture capital backing, Spawning went on to develop ai.txt, which sets permissions for AI, and Kudurru, which defends against data-scraping bots.

Ensuring Fair Compensation and High Standards
Spawning aims to set a new standard for fair data sourcing, distinguishing itself from other organizations that claim to use ethically sourced data. Source.Plus validates the reported licenses of collected images, excluding those with questionable license information and those that are neither in the public domain nor under CC0.
Historically, datasets have been plagued with problematic content, including violent, pornographic, and personal images. Spawning addresses this issue with classifier models trained to detect and exclude such content. Recognizing the limitations of classifiers, Source.Plus allows users to adjust detection thresholds and employs moderators to verify data ownership.
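The adjustable-threshold idea can be sketched in a few lines of Python. This is a minimal illustration, not Spawning’s implementation: the `ImageRecord` type, the category names, the scores, and the threshold values are all hypothetical, and the scores simply stand in for whatever a real classifier would output.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Hypothetical record: an image ID plus per-category classifier scores in [0, 1].
    image_id: str
    scores: dict[str, float] = field(default_factory=dict)


def filter_images(records, thresholds):
    """Keep records whose score stays below the threshold in every category.

    A missing score is treated as 0.0 (an assumption made for this sketch).
    """
    kept = []
    for record in records:
        flagged = any(
            record.scores.get(category, 0.0) >= limit
            for category, limit in thresholds.items()
        )
        if not flagged:
            kept.append(record)
    return kept


# Illustrative data only; these are not real Source.Plus scores.
records = [
    ImageRecord("img-001", {"violence": 0.05, "adult": 0.10}),
    ImageRecord("img-002", {"violence": 0.70, "adult": 0.02}),
    ImageRecord("img-003", {"violence": 0.30, "adult": 0.45}),
]

# A stricter (lower) threshold excludes more images than a lenient one.
strict = filter_images(records, {"violence": 0.25, "adult": 0.25})
lenient = filter_images(records, {"violence": 0.50, "adult": 0.50})

print([r.image_id for r in strict])
print([r.image_id for r in lenient])
```

Lowering a threshold trades recall of acceptable images for a lower chance of letting problematic ones through, which is presumably why such a setting would be left to the dataset user.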
Meyer emphasized the importance of fair compensation for creators. Most programs that compensate creators for their contributions to generative AI training have been criticized for opaque metrics and low payouts. Source.Plus, by contrast, allows artists and rights holders to set their own prices per download, with Spawning charging only a flat-rate fee.
“We will provide guidance and recommendations based on current industry standards and internal metrics,” Meyer said. “But ultimately, contributors to the dataset determine what makes it worthwhile to them. We believe this revenue split is significantly more favorable for artists than the more common percentage revenue split and will lead to higher payouts and greater transparency.”
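To make the comparison between a flat fee and a percentage split concrete, here is a small worked example in Python. Every figure in it is invented for illustration: the article does not state Spawning’s actual fee, whether it applies per download, or what percentage competing programs take, so the per-download fee, the 30 percent share, and the prices are assumptions.

```python
def payout_flat_fee(price_cents, downloads, flat_fee_cents):
    """Contributor payout when the platform keeps a fixed fee per download.

    Assumes the fee is charged per download, which the article does not confirm.
    """
    per_download = max(price_cents - flat_fee_cents, 0)
    return per_download * downloads


def payout_percentage(price_cents, downloads, platform_share_percent):
    """Contributor payout when the platform keeps a percentage of each sale."""
    gross = price_cents * downloads
    return gross * (100 - platform_share_percent) // 100


# Hypothetical scenario: a contributor charges $2.00 per download, 1,000 downloads.
price_cents = 200
downloads = 1_000

flat = payout_flat_fee(price_cents, downloads, flat_fee_cents=10)
split = payout_percentage(price_cents, downloads, platform_share_percent=30)

print(f"Flat fee model:   ${flat / 100:,.2f}")
print(f"Percentage model: ${split / 100:,.2f}")
```

Under these made-up numbers the flat fee leaves more for the contributor, but the result depends entirely on the price the contributor sets relative to the fee, so it should be read as an illustration of the mechanism rather than a prediction of real payouts.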

The Future of Source.Plus
If successful, Source.Plus could revolutionize the way generative AI models are trained, expanding beyond images to include audio and video. Spawning is already in discussions with firms to make their data available on the platform and may eventually build its own generative AI models using Source.Plus datasets.
Meyer envisions Source.Plus as a platform that allows rights holders to participate in the generative AI economy fairly and transparently. “We hope that rights holders who want to participate in the generative AI economy will have the opportunity to do so and receive fair compensation,” he said.
As the creative community increasingly demands alternatives to companies perceived as exploitative, Source.Plus offers a promising solution. However, the challenge of scaling up and maintaining ethical standards remains significant. The success of Source.Plus will hinge on Spawning’s ability to consistently act in the best interests of artists and effectively moderate vast amounts of user-generated content.
Only time will tell if Source.Plus can achieve the impact Meyer envisions, but it represents a significant step towards ethical AI training practices.