Google is reintroducing the ability for its AI-powered chatbot Gemini to generate images of people, a feature that was suspended earlier this year due to concerns over historical inaccuracies and racial stereotypes. This feature, however, will initially be available only to a select group of users subscribed to Google’s premium Gemini plans, as part of an early access test in English. The reintroduction follows months of delays, despite assurances from top executives that a fix was imminent.
Background: The Suspension and Its Fallout
Earlier this year, Google faced backlash when users reported significant problems with Gemini’s ability to generate images of people. The AI was criticized for producing historically inaccurate and racially insensitive depictions, such as portraying Roman legions with anachronistically diverse soldiers and rendering “Zulu warriors” as stereotypical representations of Black people. The criticism led Google to pause the feature and issue an apology. CEO Sundar Pichai and DeepMind co-founder Demis Hassabis promised a quick fix, but the work took much longer than anticipated.
Despite the delay, Google has now announced that Gemini’s people-generating feature is returning, albeit in a limited capacity. Subscribers to Gemini’s Advanced, Business, and Enterprise plans will be the first to test the updated feature, while free-tier users will have to wait. There’s no word yet on when the feature will be available in other languages or to a broader audience.
Google’s Response and Technical Improvements
To address the earlier criticisms, Google has implemented significant changes in how Gemini generates images of people. The core of these improvements lies in the integration of Imagen 3, the latest version of the image-generating model used by Gemini. According to Google, Imagen 3 was trained using AI-generated captions designed to enhance the diversity and fairness of the concepts associated with images. This training approach is intended to mitigate the biases that led to the initial controversies.
Google has also filtered Imagen 3’s training data with a focus on safety and fairness, but it has disclosed little about the dataset itself, stating only that it comprises a large collection of images, text, and annotations. Furthermore, Google says it conducted extensive testing with both internal teams and independent experts to check that the new model produces more accurate and fair representations of people.
In addition to these fairness measures, Google is using SynthID, a tool developed by DeepMind, to embed invisible, cryptographic watermarks into AI-generated media. The watermarks are meant to curb the misuse of AI-generated images, such as the creation of deepfakes, by allowing outputs to be traced back to their source.
New Features and Limitations
While the reintroduction of people-generating capabilities is the headline feature, Google is also rolling out several other updates to its Gemini platform. All users, regardless of their subscription level, will have access to Imagen 3, though only premium subscribers will benefit from the people-generating feature initially.
Imagen 3 promises to improve the overall quality of AI-generated images, with better understanding of text prompts, fewer artifacts, and enhanced creativity and detail. Google claims that this version of Imagen is the most advanced yet, particularly in rendering text accurately, a common challenge in earlier AI models.
Another major addition is “Gems,” customizable AI experts that can assist users with specific tasks, such as brainstorming ideas, offering career guidance, or helping with code. Gems are available to premium subscribers and can be created by writing instructions and giving the AI a name. However, unlike OpenAI’s GPT Store, where users can share and access other users’ custom chatbots, Google currently has no plans to allow the sharing of Gems.
Implications and Future Directions
From my point of view, Google’s cautious reintroduction of the people-generating feature in Gemini reflects both the complexities of AI development and the company’s commitment to improving its products in response to public feedback. The issues that led to the suspension of this feature highlight the broader challenges of ensuring fairness and accuracy in AI, particularly when dealing with sensitive topics such as race and historical representation.
Google’s decision to limit the availability of this feature to premium subscribers could be seen as a way to control the rollout and gather feedback before a broader launch. This approach might also indicate Google’s recognition of the potential risks associated with AI-generated content, as it carefully monitors the performance of the updated model.
As I see it, the integration of SynthID and the emphasis on fairness in Imagen 3 are positive steps. However, the effectiveness of these measures will depend on ongoing scrutiny and refinement. The introduction of Gems also points to a broader trend of AI customization, allowing users to tailor their AI interactions to specific needs. Google’s decision not to support the sharing of Gems may limit their impact, though it may also reflect a focus on privacy and user-specific customization.
In conclusion, while Google’s efforts to fix and enhance Gemini’s people-generating feature are commendable, the real test will be how well these improvements hold up under public use. The AI community will be watching closely to see if Gemini can deliver on its promises of fairness and accuracy, and whether these updates can restore confidence in Google’s AI capabilities.