Google Arts app has a data problem

Noopur Raval | February 2, 2018

If, over the past few weeks, you’ve seen your friends post their pictures alongside famous Renaissance paintings, here’s what’s going on. The Google Arts and Culture app (available on iOS and Android), which has been around for two years, shot to the top of major app stores because of a new – and hard to find – feature it just added.

This feature lets you take a selfie and matches it against the database of art collections held by Google’s “Cultural Institute”. In a matter of seconds, using artificial intelligence and facial recognition techniques, the app serves you the portrait most similar to your face. In short, the Google Arts and Culture app shows you various museum portraits you most resemble, along with a match percentage for each. The results have been delightful, fun, educational, hilarious and problematic.
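The app’s internals are not public, but a typical face-matching pipeline is straightforward to sketch: a face-recognition model reduces each face to a “faceprint”, a fixed-length vector of numbers, and the app then returns the portrait in its collection whose faceprint sits closest to yours, expressed as a match percentage. The snippet below is a minimal, hypothetical Python illustration with made-up vectors, titles and scoring; it is not Google’s implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity between two faceprint vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_portrait_match(selfie_print, portrait_prints):
    # Return (portrait title, rough match percentage) for the closest faceprint.
    title, score = max(
        ((name, cosine_similarity(selfie_print, vec)) for name, vec in portrait_prints.items()),
        key=lambda pair: pair[1],
    )
    # Map similarity onto a 0-100% figure; the real app's scoring is not public.
    return title, round(max(score, 0.0) * 100, 1)

# Toy, made-up four-dimensional faceprints (real ones run to hundreds of numbers).
portraits = {
    "Venetian Girl (Duveneck, 1880)": np.array([0.9, 0.1, 0.3, 0.5]),
    "The Maharashtrian Lady (Ravi Varma)": np.array([0.2, 0.8, 0.6, 0.1]),
}
selfie = np.array([0.85, 0.15, 0.35, 0.45])
print(best_portrait_match(selfie, portraits))

Whatever the exact similarity measure, the crucial constraint is the second argument: a match can only be drawn from the portraits already in the collection, so a collection skewed towards European art will produce skewed matches.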

But as a diverse range of users reported their results, some began noticing that the Google Arts & Culture app has a race problem.

The Google Arts & Culture app interface

What they meant is that the app overwhelmingly matches light-skinned users to portraits of European white men, regardless of the user’s racial or ethnic background and often their gender. In other cases, it matched many users with any Asian heritage to the limited range of Asian portraits available in Google’s collection. In greyer cases, as a Mashable article showed, the app served mixed results, matching a Mexican-American user to a European white portrait and then, to a lesser degree, to the picture of a Japanese man. Predictably, given that nobody knows what exactly is “under the hood” of this algorithmic app, the Internet is full of people testing its limits, producing ‘folksonomies’, or collaboratively pieced-together understandings of how the app works and why they are seeing the results they do.

In my own experience of using the app, I was matched to the ‘Venetian Girl’, an 1880 oil painting by American painter Frank Duveneck. After moving my face, trying different angles and lighting, I kept getting the same portrait but with higher confidence (I stopped at 60% match).

After a few minutes of fiddling to eke out an Indian portrait match, I succeeded, as you can see below, only to get a painting by the famed colonial aristocratic painter Raja Ravi Varma. For those who might not be familiar, Ravi Varma’s style and iconography – often described as embodying “European technique with Indian sensibilities” – represent what is commonly identified as “Indian” in the Western imagination even today. In that sense, to be matched to a Ravi Varma painting is akin to being recast as a colonial subject: to be re-placed in a world that continues to reduce non-white bodies to a handful of exotic representations; a world that, while making space for non-human imagination, somehow retains only a selective engagement with what “human” looks like.

The author’s image was matched to ‘Venetian Girl’, a painting by American painter Frank Duveneck, 1880. Other poses threw up matches from other regions.

There are also other ways in which the app fails completely and dangerously. For instance, when I stuck my tongue out and attempted a portrait match again, the algorithms interpreted my tongue as a pronounced lower lip and started serving me (presumably) African portrait matches and one Native American face. Someone also pointed out in a tweet that non-white art matches came from street art collections, an indirect reminder of the problematic histories of art forms and how they have selectively privileged certain subjects.

A lot has been written in the past few weeks about how the Google app’s failings expose the inherent racial constructions of visual technologies, such as the Shirley cards — images of a pale-skinned woman used in early photography as reference material for technicians doing colour balancing. It wasn’t until complaints from wood furniture and chocolate companies that Kodak finally modified its film emulsion stocks to enable the photography of darker skin colours. In fact, Kodak described it as film that could “photograph the details of a dark horse in low light”, which also happened to be conducive to capturing brown skin colour.

In what can only be described as cruel irony, Google’s photo algorithms recently ended up labelling two black people’s faces as ‘gorillas’, hinting at how easily non-white bodies can slip out of their very human-ness because of the ways in which our visual technologies and, consequently, our fields of visual representation are constituted.

Raja Ravi Varma’s ‘The Maharashtrian Lady’

Historically, classifying non-white bodies as less human, “savage” and less evolved has been at the heart of colonial and imperial governance projects that employed “observable facts” such as skull size, skin colour and facial features as evidence. For instance, past colonial photographic surveys such as the ‘People of India’ (an eight-volume photographic record of various caste and tribe groups in the subcontinent, with short descriptions of their “types” and cultural peculiarities) performed a function very similar to that of new facial recognition technologies, and in much the same way: the essential task of describing people and, consequently, turning them into types.

For instance, Microsoft’s drawing bot can now draw realistic images from textual descriptions; and to return to the Google Arts app, its algorithm “distinctly isolates facial features…to create a faceprint…a long string of unique numbers to identify each person’s face”. What the apps and the colonial photographs do, by juxtaposing certain words or images against other images, is turn physical bodies and objects into specific kinds of social bodies and meaningful objects.

To be fair, the Google Arts app is not the real culprit. It is only the newest in a list of problematic ‘surfaces’ that hark back to fundamental tensions and consequences of a post-database world. As media theorist Mark Poster warned us, the database isn’t merely an object but a discourse in itself — in its ability to hold entities together in relational possibilities, the database suddenly affords a set of possibilities of “subjection”, of constituting subjects that didn’t exist before. Not just the database behind Google’s art app but also art collections and survey records: all are central to how we see, know and understand ourselves and the others around us.

An image from ‘The People of India’, a photographic series on Indian races and tribes by J. Forbes Watson and John William Kaye, published between 1868 and 1875

This brings me to what I think needs to be emphasised in conversations about artificial intelligence and training AI systems on “diversity”. Training AI systems with diverse data inputs is a great start, but building ethical and conscientious AI will need us to incorporate critical lessons from anthropology, history and data activism right from the beginning, rather than as correctives. It will need us to go a step further than training systems to remove racial bias, and to address the problems of proprietary knowledge production. In that sense, we not only need more diverse portrait matches but, more fundamentally, more diversity in curating, editing and data-gathering practices, the kind of work that wonderful initiatives like Wikimedia Commons do. If private art collections and access to art and literature have historically been off-limits to non-elite populations, there is no reason that GLAM institutions (Galleries, Libraries, Archives and Museums) should reproduce these barriers in the digital realm by partnering with private tech companies that resist auditing and accountability.

While we call for transparency in understanding algorithmic functions, there is an equally urgent need to renew the push for open data initiatives, not only because they allow wider public access to all kinds of knowledge but also because truly democratising the effects that AI systems will produce necessarily involves widening who gets to contribute to training these systems. Open data archives will not magically make technology’s cultural diversity problem go away, but they provide an opportunity to expand the kinds of training input. Simultaneously, if AI systems are truly like children learning to navigate the ethical complexity of our world, then we need to develop critical data-training practices with wider public participation, to open up and debate the myriad dilemmas associated with teaching machines.



Disclosure: FactorDaily is owned by SourceCode Media, which counts Accel Partners, Blume Ventures and Vijay Shekhar Sharma among its investors. Accel Partners is an early investor in Flipkart. Vijay Shekhar Sharma is the founder of Paytm. None of FactorDaily’s investors have any influence on its reporting about India’s technology and startup ecosystem.