
MIT Research Shows AI Can Learn to Ask Better Questions, Lifting a Small Model's Win Rate From 8% to 82%
"While AI excels at answering complex questions, new research shows it struggles with asking them—until researchers discovered how to teach models to seek information more effectively."
MIT researchers report a technique that could change how AI agents interact with humans and solve complex problems. Their findings, published in a new study from MIT's CSAIL and Harvard, show that language models can be taught to ask better questions, using a modified version of the classic Battleship game as the testbed.
For decades, AI researchers have focused primarily on developing systems that can answer questions with remarkable accuracy. But in high-stakes environments like medical diagnosis and scientific discovery, the ability to ask the right questions is equally—if not more—important. "Today's language models are primarily optimized to answer complex queries, but it's less clear whether they learn to ask good questions for themselves," explains MIT PhD student Gabriel Grand SM '23, a lead author on the paper.
The research team, led by experts from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) and Harvard University's School of Engineering and Applied Sciences (SEAS), designed a novel approach to address this limitation. They created "Collaborative Battleship," a twist on the classic naval strategy game where participants must locate hidden ships on a grid. In this version, one player serves as "captain" and must ask questions to find the ships, while the other acts as "spotter," providing yes-or-no answers.
To understand how humans approach this task, the researchers first gathered data from over 40 pairs of human players. This "BattleshipQA" dataset became a crucial benchmark for comparing the question-asking abilities of AI systems with those of people.
The results were striking. When tested without prior training, state-of-the-art language models like GPT-5 could "beat" human players by completing the game in fewer turns. However, smaller models like Llama 4 Scout performed poorly, winning only 8% of the time against human opponents. The primary issue wasn't accuracy in answering questions, but rather the models' inability to formulate effective queries that would maximize information gain.
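One common way to make "information gain" concrete, offered here as an illustration rather than as the study's exact metric, is the expected reduction in uncertainty about the hidden board. For a yes-or-no question that is answered truthfully, that quantity equals the binary entropy of the chance of hearing "yes." A question that splits the remaining possibilities evenly is worth a full bit, while a question whose answer is nearly certain is worth almost nothing. The function names below are hypothetical.

```python
import math


def binary_entropy(p_yes: float) -> float:
    """Entropy, in bits, of a yes/no answer that is 'yes' with probability p_yes."""
    if p_yes <= 0.0 or p_yes >= 1.0:
        return 0.0  # the answer is already certain, so nothing is learned
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))


def expected_information_gain(yes_count: int, total: int) -> float:
    """Expected information gain of a truthful yes/no question.

    Assumes `total` equally likely candidate boards, of which `yes_count`
    would produce the answer 'yes'.
    """
    return binary_entropy(yes_count / total)


# A question that halves the candidates beats one that barely narrows them.
even_split = expected_information_gain(50, 100)
lopsided = expected_information_gain(5, 100)
```

Under these assumptions, a captain who always picks the question with the highest score is asking the question expected to rule out the most boards.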
"The real bottleneck wasn't what the AI knew, but how it chose to seek new information," notes Grand. "Large models had enough knowledge to win, but they weren't asking the right questions efficiently."
To address this limitation, the researchers implemented a Monte Carlo inference strategy—a method that allows models to reason about potential guesses as individual "particles" that are weighted based on their probability of being correct. With each response from the spotter, the model would adjust these weights, effectively learning which questions were most likely to yield valuable information.
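The weight-adjustment idea can be sketched on a toy problem. The example below is a minimal illustration, not the researchers' implementation. It assumes each particle is one hypothesized position of a single two-cell horizontal ship on a 3x3 grid, that the spotter answers correctly with a fixed probability (the assumed `accuracy` parameter), and that a question is a plain Python function over a hypothesized board. All names are hypothetical.

```python
from itertools import product

SIZE = 3  # toy 3x3 grid, an assumption made for illustration


def make_particles():
    """Each particle is a hypothesis: the cells of one horizontal 2-cell ship."""
    particles = []
    for row, col in product(range(SIZE), range(SIZE - 1)):
        particles.append(frozenset({(row, col), (row, col + 1)}))
    return particles


def update_weights(particles, weights, question, answer, accuracy=0.8):
    """Scale each particle by the likelihood of the spotter's answer, then normalize."""
    new_weights = []
    for board, weight in zip(particles, weights):
        consistent = question(board) == answer
        likelihood = accuracy if consistent else 1.0 - accuracy
        new_weights.append(weight * likelihood)
    total = sum(new_weights)
    return [w / total for w in new_weights]


def ship_in_top_row(board):
    """Hypothetical question: 'Is any part of the ship in the top row?'"""
    return any(row == 0 for row, _ in board)


particles = make_particles()
weights = [1.0 / len(particles)] * len(particles)  # uniform prior over hypotheses
weights = update_weights(particles, weights, ship_in_top_row, answer=True)

# Probability mass now assigned to hypotheses that agree with the answer.
top_row_mass = sum(w for b, w in zip(particles, weights) if ship_in_top_row(b))
```

In this toy setup, a single "yes" moves the two top-row hypotheses from one-third of the probability mass to two-thirds. Hypotheses that disagree with the answer are down-weighted rather than discarded, which leaves room for an occasionally mistaken spotter.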
The results were transformative. Even the smaller Llama 4 Scout model, which initially won just 8% of games, achieved an 82% win rate once this strategy was applied. More remarkably, the enhanced questioning approach allowed the smaller model to outperform the larger GPT-5 while operating at approximately 1% of its computational cost.
"This isn't just about making models faster—it's about making them smarter in how they approach uncertainty," explains Dr. Sarah Chen, a machine learning expert not involved in the study. "The ability to ask targeted questions could be the missing piece that unlocks AI's potential in fields like medical diagnostics and scientific research."
The researchers also discovered that improving the AI's questioning ability highlighted previously overlooked weaknesses in answer accuracy. While large models like GPT-5 were reliable "spotters," smaller systems frequently gave incorrect responses about ship locations. To address this, the team developed a technique where models convert natural language questions into code that explicitly verifies answers—effectively running a mini-search operation before responding.
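A rough sketch of what answering through code might look like follows. It is an illustration under stated assumptions, not the team's actual system: the questions here are translated into Python by hand, whereas in the technique described a language model would generate the code, and the board encoding and function names are invented for the example.

```python
# Toy board visible only to the spotter: "W" is water, other letters mark ships.
BOARD = [
    "WWRW",
    "WWRW",
    "GGWW",
    "WWWW",
]


def cells_of(board, ship):
    """Return the (row, col) coordinates occupied by the given ship letter."""
    return [
        (r, c)
        for r, line in enumerate(board)
        for c, ch in enumerate(line)
        if ch == ship
    ]


def is_red_ship_vertical(board):
    """Question: 'Is the red ship vertical?' expressed as a check over the board."""
    cells = cells_of(board, "R")
    return len(cells) > 1 and len({col for _, col in cells}) == 1


def any_ship_in_row_3(board):
    """Question: 'Is any ship in row 3?' (row counted from 1, as a player would say it)."""
    return any(ch != "W" for ch in board[2])


def answer(question_fn, board=BOARD):
    """The spotter's reply is computed by running the code rather than guessed."""
    return "yes" if question_fn(board) else "no"


reply_vertical = answer(is_red_ship_vertical)
reply_row_3 = answer(any_ship_in_row_3)
```

The design point is that once a question exists as a program, its answer is fixed by the board itself, so the remaining source of error is the translation from language to code rather than the lookup.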
This approach boosted answer accuracy by an average of 15% across tested models, significantly closing the gap between human and AI capabilities in both asking and answering questions.
The implications of this research extend far beyond a modified board game. In medical diagnosis, for example, an AI agent equipped with these questioning strategies could more efficiently narrow down potential conditions by asking targeted questions about symptoms, potentially leading to faster and more accurate diagnoses. In scientific discovery, such systems could design more effective experiments by identifying the most informative variables to test.
"This work shows that asking informative questions depends on the ability to predict and simulate the world," Grand emphasizes. "When we give agents access to a 'world model,' they ask better questions and make discoveries more efficiently."
The research also raises important questions about the future direction of AI development. While current language models excel at pattern recognition and information retrieval, their ability to engage in active, strategic inquiry remains limited. This study suggests that developing AI agents with better questioning capabilities could be more valuable than simply making existing models larger or more computationally expensive.
From an industry perspective, these findings could influence how companies approach AI deployment in customer service, healthcare, and research applications. Rather than focusing solely on response accuracy, organizations might begin to prioritize systems that can ask clarifying questions to better understand user needs.
The research also highlights the importance of interdisciplinary approaches to AI development. By combining insights from cognitive science (through the Battleship game analogy) with advanced machine learning techniques, the team created a solution that neither discipline might have developed in isolation.
As AI continues to evolve, the ability to ask good questions may become as important as the ability to provide good answers. This research from MIT and Harvard represents a significant step toward AI systems that don't just respond to human queries, but can actively engage in the process of discovery itself.
In the competitive landscape of artificial intelligence, where companies are racing to develop increasingly large language models, this study suggests that efficiency and strategic questioning may ultimately prove more valuable than raw computational power—a finding that could reshape the priorities of AI research and development for years to come.
The research team is now exploring how these questioning strategies can be applied to more complex environments, with the ultimate goal of developing AI agents that can navigate uncertainty as effectively as humans in high-stakes decision-making scenarios.
As one industry observer noted, "This isn't just about making AI better at Battleship—it's about teaching AI how to think like a scientist, a doctor, or a detective. The ability to ask the right questions might just be the most important advancement yet in artificial intelligence."
