
News Summary

A new AI model developed by researchers at Stanford University demonstrates a significant leap in multimodal reasoning, capable of analyzing and answering questions about complex scenes involving both text and images. The system, named ‘CogNet’, integrates visual and linguistic data more seamlessly than previous models, allowing it to perform tasks like interpreting infographics, answering questions about diagrams, and describing the narrative of a comic strip panel. Early benchmarks show it outperforming existing models by a notable margin on standardized visual question-answering tests. The researchers emphasize that while promising, the technology is a research prototype and not yet ready for widespread commercial deployment. They have published their findings and made the model architecture publicly available for further academic study. Read the full article at https://technologyreview.com/2024/05/15/cognet-ai-multimodal-reasoning.


Technology Review
