A new AI model developed by researchers at Stanford University demonstrates a significant leap in multimodal reasoning, capable of analyzing and answering questions about complex scenes involving both text and images. The system, named ‘CogNet’, integrates visual and linguistic data more seamlessly than previous models, allowing it to perform tasks like interpreting infographics, answering questions about diagrams, and describing the narrative of a comic strip panel. Early benchmarks show it outperforming existing models by a notable margin on standardized visual question-answering tests. The researchers emphasize that while promising, the technology is a research prototype and not yet ready for widespread commercial deployment. They have published their findings and made the model architecture publicly available for further academic study. Read the full article at https://technologyreview.com/2024/05/15/cognet-ai-multimodal-reasoning.