We are a group of volunteer researchers who want to learn more about how people in different parts of the world experience technology powered by Artificial Intelligence (AI).
We curated a dataset of local food from around the world with the generous help of contributors listed here.
For each dish, the curated data include the name of the dish (both in the local language and in English), the country of origin, the region of origin, the associated culture, the time of day at which the meal is eaten, the type of meal, the utensils used, the drinks that accompany the meal, any special occasions when the meal is eaten, the ingredients, the recipe, and an image of the dish where available.
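As an illustration only, one curated entry could be represented as a simple record like the one below. The class name `DishRecord`, the field names, and the use of lists for multi-valued labels (such as utensils or drinks) are assumptions made for this sketch. They are not the dataset's published schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DishRecord:
    """Hypothetical schema for one curated dish (field names are illustrative)."""
    local_name: str              # name of the dish in the local language
    english_name: str            # name of the dish in English
    country: str                 # country of origin
    region: str                  # region of origin
    culture: str                 # associated culture
    # Multi-valued labels are stored as lists in this sketch.
    time_of_day: List[str] = field(default_factory=list)
    meal_type: List[str] = field(default_factory=list)
    utensils: List[str] = field(default_factory=list)
    drinks: List[str] = field(default_factory=list)
    occasions: List[str] = field(default_factory=list)
    ingredients: List[str] = field(default_factory=list)
    recipe: str = ""
    image_url: Optional[str] = None  # None when no image is available

    def has_image(self) -> bool:
        """True if an image reference was provided for this dish."""
        return self.image_url is not None


# Placeholder values only; no real dataset entry is reproduced here.
record = DishRecord(
    local_name="<local name>",
    english_name="<English name>",
    country="<country>",
    region="<region>",
    culture="<culture>",
    utensils=["<utensil>"],
)
print(record.has_image())  # False
```

Keeping every label as an explicit field makes it straightforward to build prompts from the labels or to check model outputs against them field by field.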
We then used these labelled dishes to assess how well current AI systems understand the diversity of food cultures. We tested both Large Language Models (GPT-3.5, Llama 3 8B, and Llama 3 70B) and image generation models (DALL-E 2, DALL-E 3, and Stable Diffusion v2.1) to check for biases in the models' capabilities.
Our findings are detailed in the following report:
Numerous indicators and metrics suggested that the models do not accurately portray the diversity of food culture. Although the models were able to generate images or descriptions of food, these images and descriptions were not always accurate representations of the dishes.
When generating descriptions, the LLMs occasionally admitted that they were "making up" a description because they did not know the dish, or claimed that the dish itself was made up and fictional.
In particular, we observed that the models struggled more frequently with African dishes when tasked with generating descriptions or properties of the dishes.
For dish images generated by Text-to-Image (T2I) models, we also observed numerous failures, such as stereotyping, generating images of other dishes, or generating dishes that do not exist. The probability of a model generating an image of the correct dish was especially low for many of the African countries that we examined closely (between 2% and 30% in most cases). Furthermore, when we compared the CLIP embeddings of the generated images against CLIP embeddings of positive and negative sentiments, we found that the models were biased towards associating African dishes with negative sentiment.
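The sentiment comparison can be sketched roughly as follows, assuming the CLIP image embeddings and the positive and negative sentiment text embeddings have already been computed with a CLIP model. The functions `sentiment_score` and `mean_score`, the use of a cosine-similarity difference as the score, and the toy 3-dimensional vectors are all illustrative assumptions, not the exact procedure described in the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def sentiment_score(image_emb, positive_emb, negative_emb) -> float:
    """Hypothetical score: > 0 means the image embedding is closer to the
    positive-sentiment embedding than to the negative one."""
    return (cosine_similarity(image_emb, positive_emb)
            - cosine_similarity(image_emb, negative_emb))


def mean_score(image_embs, positive_emb, negative_emb) -> float:
    """Average score over all generated images for one group (e.g. one country)."""
    scores = [sentiment_score(e, positive_emb, negative_emb) for e in image_embs]
    return sum(scores) / len(scores)


# Toy vectors standing in for real CLIP embeddings (real ones have hundreds of dims).
positive = [1.0, 0.0, 0.0]
negative = [0.0, 1.0, 0.0]
group_a = [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1]]
group_b = [[0.2, 0.9, 0.1], [0.3, 0.7, 0.2]]

print(mean_score(group_a, positive, negative) > 0)  # True
print(mean_score(group_b, positive, negative) < 0)  # True
```

Comparing group averages of such a score, rather than individual images, is one way to surface a systematic lean towards negative sentiment for dishes from a particular region.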
Our analysis indicates clear capability and representational biases against the African continent. Studies such as ours are important to highlight these limitations and to empower communities to contribute to the improvement of these systems. World Wide Dishes was designed to be fully decentralised, allowing many more people to contribute their local knowledge and expertise without requiring a significant time commitment from those who cannot offer one. We are exceptionally grateful to all those who shared their local expertise with us and gave community-based reviews of T2I model outputs to support this work.
Please refer to the paper for more details on the methodology and results. We thank all the contributors who helped us curate the dataset, as well as the community ambassadors, collaborators, and funders who helped us make this project possible.