In September 2021, Native American technology students in high school and college gathered at a conference in Phoenix and were asked to create photo tags — word associations, essentially — for a series of images.

One image showed ceremonial sage in a seashell; another, a black-and-white photograph circa 1884, showed hundreds of Native American children lined up in uniform outside the Carlisle Indian Industrial School, one of the most prominent boarding schools run by the U.S. government during the 19th and 20th centuries.

For the ceremonial sage, the students chose the words “sweetgrass,” “sage,” “sacred,” “medicine,” “protection” and “prayers.” They gave the photo of the boarding school tags with a different tone: “genocide,” “tragedy,” “cultural elimination,” “resiliency” and “Native children.”

The exercise was for the workshop Teaching Heritage to Artificial Intelligence Through Storytelling at the annual conference for the American Indian Science and Engineering Society. The students were creating metadata that could train a photo recognition algorithm to understand the cultural meaning of an image.
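Metadata of this kind is typically structured as tag lists attached to images, which a multi-label classifier can learn from. A minimal sketch of that idea, using the tags from the workshop (the filenames and encoding scheme are hypothetical illustrations, not IVOW's actual pipeline):

```python
# Human-chosen tags become training metadata for an image classifier.
# Filenames are hypothetical; the tags come from the workshop exercise.
training_metadata = [
    {
        "image": "ceremonial_sage.jpg",
        "tags": ["sweetgrass", "sage", "sacred",
                 "medicine", "protection", "prayers"],
    },
    {
        "image": "carlisle_school_1884.jpg",
        "tags": ["genocide", "tragedy", "cultural elimination",
                 "resiliency", "Native children"],
    },
]

# A multi-label model needs a fixed vocabulary mapping each tag to an index.
vocabulary = sorted({tag for record in training_metadata for tag in record["tags"]})
tag_index = {tag: i for i, tag in enumerate(vocabulary)}

def to_label_vector(tags):
    """Encode a tag list as a binary vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for tag in tags:
        vector[tag_index[tag]] = 1
    return vector

labels = [to_label_vector(record["tags"]) for record in training_metadata]
```

The point of the exercise is visible even in this toy form: whatever vocabulary the annotators supply is the only vocabulary the model can ever predict.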

The workshop presenters — Chamisa Edmo, a technologist and citizen of the Navajo Nation who is also Blackfeet and Shoshone-Bannock; Tracy Monteith, a senior Microsoft engineer and member of the Eastern Band of Cherokee Indians; and journalist Davar Ardalan — then compared these answers with those produced by a major image recognition app.

For the ceremonial sage, the app’s top tag was “plant,” but other tags included “ice cream” and “dessert.” The app tagged the school image with “human,” “crowd,” “audience” and “smile” — the last a particularly odd descriptor, given that few of the children are smiling.


The image recognition app botched its task, Monteith said, because it did not have proper training data. Edmo explained that tagging results are often “outlandish” and “offensive,” recalling how one app identified a Native American person wearing regalia as a bird. And yet similar image recognition apps have identified with ease a St. Patrick’s Day celebration, Ardalan noted as an example, because of the abundance of data on the topic.

As Monteith put it, AI is only as good as the data it is fed. And data on cultures that have long been marginalized, like Native ones, simply does not exist at the scale required.

“Clearly, there’s a bias represented,” he said.


The workshop was the initiative of Intelligent Voices of Wisdom, or IVOW, a tech startup that Ardalan, an executive producer of audio at National Geographic, founded to preserve culture through AI and to counter those biases.

“The internet is not representative of the entire population, and when people are represented, it may not be accurate because of stereotypes and hate speech,” said Percy Liang, an associate professor of computer science at Stanford University and director of the school’s Center for Research on Foundation Models.

To counter this tendency, Ardalan, who is an Iranian American of Bakhtiari and Kurdish descent, wants IVOW to develop tools to create “cultural engines” for underrepresented groups so they can generate, and take ownership of, their data.


“The cultural engine cannot be a data scientist in Philadelphia trying to create data sets for a tribe in Arizona,” she said.

More representative, accurate data is beneficial not only to the groups it represents, but also to AI systems at large, said W. Victor H. Yarlott, an AI researcher at Florida International University, a member of the Crow Tribe of Montana and an IVOW collaborator.

“Lacking this knowledge just makes your system worse,” he said. “You’re not really representing human intelligence or human knowledge unless your system can handle it from a broad range of cultures.”

The participation of Indigenous people in the project was critical. Monteith, who led the effort to enter the Cherokee writing system into Microsoft Windows and Office, said he has worked on building trust for technology, and more recently AI, in his Native communities for decades.

“I knew without me doing this that we would be in a worse spot in terms of literacy, and our culture,” he said.

The team at IVOW, along with a group of volunteer collaborators and advisers, has been developing proofs of concept for these cultural engines — smart data sets that can feed more inclusive AI tools, including chatbots and image recognition apps.


One such tool is IVOW’s Indigenous Knowledge Graph, or IKG, a cultural engine in early development that is focused on storytelling about Indigenous recipes and culinary practices. After meeting the IVOW team in 2018, Yarlott pitched the IKG, a sort of visualization of a data set, to capture Indigenous knowledge.

The first step was to gather the data. The team chose a culinary focus because it is a part of life that all people share. They collected recipes and related stories from both the public domain and team members.

The tools and techniques to create the IKG were designed to be basic enough that anyone, not just those with a background in computer science, could use them. And IKG uses only information that is widely available or that the team had permission to use from their own tribes, bands and nations.
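A knowledge graph like the IKG is commonly represented as subject-predicate-object triples linking entities such as recipes, ingredients and stories. A minimal sketch of that structure (the schema and entries below are hypothetical illustrations, not the actual IKG data set):

```python
# A tiny knowledge graph as subject-predicate-object triples.
# Entries are illustrative, not drawn from the real IKG.
triples = [
    ("frybread", "is_a", "recipe"),
    ("frybread", "uses_ingredient", "flour"),
    ("frybread", "has_story", "origins tied to government rations"),
    ("flour", "is_a", "ingredient"),
]

def related(entity):
    """Return the predicate-object pairs directly linked to an entity."""
    return [(predicate, obj) for subject, predicate, obj in triples
            if subject == entity]
```

Because the structure is just a list of labeled links, contributors can add recipes and stories without any programming background, which matches the team's goal of keeping the tools basic enough for anyone to use.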

There are challenges, though. The process is labor intensive and expensive. IVOW is a self-funded enterprise, and the work of the collaborators is voluntary.

“It’s a little bit of a chicken and an egg problem because you need the data to really build a big system that demonstrates value,” Yarlott said. “But to get all the data, you need money, which only really starts to come when people realize that there’s substantial value here.”