A new AI model developed through a collaboration between computer scientists and dolphin researchers may pave the way for two-way animal communication.
In a project that sounds like science fiction but builds on long-running ocean research, Google has worked with marine biologists and AI researchers to create a large language model designed not to chat with humans but to communicate with dolphins. The model, called DolphinGemma, was developed with the Wild Dolphin Project (WDP) and researchers at Georgia Tech, and it aims to decode dolphin vocalizations and eventually enable human-dolphin communication. The effort marks a milestone in a roughly 40-year quest to understand how these cetaceans communicate.
Since 1985, the Wild Dolphin Project has run the world’s longest underwater study of dolphins. The research follows a community of wild Atlantic spotted dolphins in the Bahamas, collecting underwater audio and video tied to individual animals, data that have helped researchers trace the relationships and life histories of dolphins in the pod. The resulting dataset spans roughly four decades of sound-behavior pairings, including courtship buzzes, aggressive squawks, and “signature whistles” that function as individual identifiers, much like names.
This dataset gave Google researchers the foundation they needed to train an AI model that interprets dolphin sounds in much the way ChatGPT processes human language. DolphinGemma, a model with roughly 400 million parameters, builds on the same research that underpins Google’s Gemini models. It takes audio as input and predicts the next dolphin vocalization in a sequence, learning the structure of dolphin communication along the way.
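The underlying objective is the same next-token prediction used for text LLMs, applied to a sequence of discrete audio tokens rather than words. The sketch below illustrates the idea in PyTorch; the tiny model, the vocabulary size, and the random stand-in tokens are all invented for illustration and bear no relation to DolphinGemma’s actual architecture. A real system would first convert recordings into discrete codes with an audio tokenizer.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 1024  # hypothetical number of discrete audio codes
CONTEXT = 256      # tokens of acoustic context the model conditions on

class TinyAudioLM(nn.Module):
    """Toy decoder-style model trained to predict the next audio token."""

    def __init__(self, vocab=VOCAB_SIZE, dim=256, heads=4, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(CONTEXT, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        positions = torch.arange(tokens.shape[1], device=tokens.device)
        x = self.embed(tokens) + self.pos(positions)
        # Causal mask: each position may only attend to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.shape[1])
        return self.head(self.blocks(x, mask=mask))

model = TinyAudioLM()
tokens = torch.randint(0, VOCAB_SIZE, (1, CONTEXT))  # stand-in for a tokenized whistle
logits = model(tokens)

# Next-token objective: the prediction at position i is scored against
# the token that actually occurred at position i + 1.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB_SIZE),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
```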
AI models are accelerating efforts to decipher animal communication. Researchers are using large language models to analyze and interpret a range of animal sounds, from dog barks to bird whistles, and through pattern recognition these models can offer insights into what the sounds may mean. Cetaceans such as dolphins and whales are particularly well suited to AI-driven analysis because of their complex social lives and vocalization patterns, which are relatively easy to record and analyze. Project CETI, for instance, applied software tools and machine learning to a large collection of sperm whale codas, identifying rhythm and tempo patterns that enabled the creation of a kind of phonetic alphabet for whale sounds.
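To make the pattern-recognition idea concrete, the toy example below clusters sperm-whale-style codas by their inter-click intervals, the timing gaps that encode rhythm and tempo. The data are synthetic and the approach is a deliberate simplification, not Project CETI’s actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for 5-click codas: each row holds the four
# inter-click intervals (in seconds) between successive clicks.
rng = np.random.default_rng(0)
slow_codas = rng.normal(0.20, 0.01, size=(30, 4))  # ~200 ms gaps
fast_codas = rng.normal(0.08, 0.01, size=(30, 4))  # ~80 ms gaps
codas = np.vstack([slow_codas, fast_codas])

# A plain k-means clustering of the interval vectors recovers the two
# tempo classes without any labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codas)
print(labels[:30].tolist())  # one rhythm cluster
print(labels[30:].tolist())  # the other
```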
DolphinGemma can generate new, dolphin-like sounds in the correct acoustic patterns, a capability that could eventually enable real-time, two-way communication. That communication runs through an underwater computer system called Cetacean Hearing Augmentation Telemetry, or CHAT, which produces dolphin-like whistles associated with objects the dolphins know, such as seagrass and the researchers’ scarves. Google hopes that naturally curious dolphins will learn to mimic these synthetic whistles to request items, and that the dolphins’ own natural sounds can gradually be incorporated into the system.
CHAT runs on modified smartphones, with the goal of establishing a basic shared vocabulary for human-dolphin interaction. If a dolphin mimics the whistle linked to a particular object, researchers can respond by handing that object over, rather like a game of charades with technology as the mediator. Future versions of CHAT are expected to bring more processing power and better algorithms, making exchanges between dolphins and researchers faster and clearer. Deploying the system in the wild, however, raises ethical questions about how to engage responsibly with dolphins if the communication becomes more sophisticated.
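Conceptually, CHAT’s interaction loop is a lookup from an incoming whistle to an object. The sketch below shows a heavily simplified, hypothetical version in which whistles are represented as frequency contours and a mimic counts as a match when it stays within a tolerance of a stored template; all object names, contours, and thresholds are invented.

```python
import numpy as np

# Hypothetical whistle "templates": each object is paired with a synthetic
# frequency contour (Hz sampled over one second). All values are invented.
t = np.linspace(0.0, 1.0, 100)
whistle_library = {
    "seagrass": 8000 + 3000 * t,                      # rising sweep
    "scarf": 11000 - 3000 * t,                        # falling sweep
    "rope": 9500 + 1500 * np.sin(2 * np.pi * 2 * t),  # warble
}

def match_whistle(contour, library, tolerance_hz=500.0):
    """Return the object whose template is closest to the heard contour,
    or None if nothing falls within the tolerance (mean deviation in Hz)."""
    best_obj, best_dist = None, np.inf
    for obj, template in library.items():
        dist = np.mean(np.abs(contour - template))
        if dist < best_dist:
            best_obj, best_dist = obj, dist
    return best_obj if best_dist < tolerance_hz else None

# A noisy mimic of the "seagrass" whistle still resolves to the right object.
mimic = whistle_library["seagrass"] + np.random.default_rng(1).normal(0, 200, t.size)
print(match_whistle(mimic, whistle_library))  # -> seagrass
```

A real detector would also have to cope with pitch shifts, variable timing, and ocean noise, which is where a learned model like DolphinGemma could improve on simple template matching.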
Google plans to release DolphinGemma as an open model this summer, allowing researchers who study other dolphin species to apply it more broadly. DolphinGemma represents a potential step toward understanding one of the ocean’s most familiar mammals. Direct conversation, a dolphin delivering a TED Talk, say, remains a distant prospect, but the possibility of genuine two-way interaction is a promising sign of what AI models may make possible.