A new study from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) presents a machine-learning technique that significantly improves the efficiency of large language models (LLMs) such as GPT-3. The method, called ‘LLM in a flash’, enables complex AI models to run on memory-constrained devices such as smartphones by storing model parameters in flash memory and selectively loading only the necessary data into RAM. The approach compensates for flash memory’s slow read speeds with a strategic data-loading scheme and a ‘windowing’ method that reuses recently activated data. The researchers demonstrated that their system can run models up to twice the size of the available RAM, with a 4–5x speedup over standard loading methods and up to 20x in some scenarios. This advance could make powerful AI assistants more accessible and private by running them directly on personal devices. Read the full article at: https://technologyreview.com/2024/01/01/sample-article-url/
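The core idea of selective loading with a reuse window can be illustrated with a small sketch. This is not the paper’s implementation; it is a toy simulation (all names and sizes are hypothetical) in which parameter chunks live in a dict standing in for flash, and an LRU-style window of recently used chunks stands in for RAM, so overlapping accesses avoid repeated slow reads:

```python
from collections import OrderedDict

# Hypothetical parameter store: chunk_id -> weights, standing in for flash memory.
FLASH = {i: [float(i)] * 4 for i in range(10)}

class WindowedLoader:
    """Keep only the most recently used chunks in 'RAM' (an LRU window),
    rereading from 'flash' only on a miss."""
    def __init__(self, window_size):
        self.window_size = window_size
        self.ram = OrderedDict()   # chunk_id -> weights, in recency order
        self.flash_reads = 0       # count of simulated slow flash reads

    def get(self, chunk_id):
        if chunk_id in self.ram:
            self.ram.move_to_end(chunk_id)        # reuse: no flash read needed
        else:
            self.flash_reads += 1                 # simulated slow flash read
            self.ram[chunk_id] = FLASH[chunk_id]
            if len(self.ram) > self.window_size:  # evict the oldest chunk
                self.ram.popitem(last=False)
        return self.ram[chunk_id]

loader = WindowedLoader(window_size=3)
for cid in [0, 1, 2, 1, 0, 3, 1]:  # overlapping accesses benefit from the window
    loader.get(cid)
print(loader.flash_reads)  # 4 reads instead of 7: three accesses were reuses
```

The design point the sketch captures is that consecutive inference steps tend to touch overlapping parameter subsets, so caching the recent window turns many would-be flash reads into cheap RAM hits.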