Apple has reportedly developed an internal service similar to ChatGPT, intended to help employees test new features, summarize text, and answer questions based on the knowledge it has accumulated.
In July, Mark Gurman reported that Apple was building its own AI model around a new internal framework called Ajax. Ajax could power several capabilities, with a ChatGPT-like application, unofficially dubbed "Apple GPT", being just one of them. A recent Apple research paper now suggests that large language models (LLMs) could run directly on Apple devices, including iPhones and iPads.
The paper, first spotted by VentureBeat, is titled "LLM in a Flash: Efficient Large Language Model Inference with Limited Memory." It addresses a critical obstacle to running LLMs on devices, especially those with limited DRAM capacity.
LLMs contain billions of parameters, so running them on devices with limited DRAM poses a significant challenge. The paper's proposed solution is to store the model parameters in flash memory and load them into DRAM only when they are needed during inference.
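The idea of paging weights from flash into DRAM on demand can be sketched in a few lines. The class and file layout below are illustrative assumptions, not code from Apple's paper: a file on disk stands in for flash memory, and a small fixed-size cache stands in for DRAM.

```python
import numpy as np
import tempfile, os

class FlashWeights:
    """Toy model of flash-to-DRAM paging: weight rows live in a file
    ("flash") and are copied into a small in-memory cache ("DRAM")
    only when a computation touches them."""

    def __init__(self, path, shape, dtype=np.float32, cache_rows=4):
        self.mmap = np.memmap(path, dtype=dtype, mode="r", shape=shape)
        self.cache = {}     # row index -> in-RAM copy
        self.order = []     # FIFO eviction order
        self.cache_rows = cache_rows

    def row(self, i):
        if i not in self.cache:
            if len(self.order) >= self.cache_rows:  # evict oldest row
                del self.cache[self.order.pop(0)]
            self.cache[i] = np.array(self.mmap[i])  # read from "flash"
            self.order.append(i)
        return self.cache[i]

# Build a toy 8x4 weight matrix on disk to stand in for flash memory.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
np.arange(32, dtype=np.float32).reshape(8, 4).tofile(path)

fw = FlashWeights(path, shape=(8, 4), cache_rows=2)
x = np.ones(4, dtype=np.float32)
# Only the rows actually used are loaded; row 0 is a cache hit the
# second time it is requested.
out = [float(fw.row(i) @ x) for i in (0, 3, 0)]
print(out)  # [6.0, 54.0, 6.0]
```

The point of the sketch is that peak RAM use is bounded by the cache size rather than the full model size, which is the property the paper exploits.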
Keivan Alizadeh, Machine Learning Engineer at Apple and lead author of the paper, explains: "Our approach involves developing an inference cost model that matches the characteristics of flash memory, allowing us to improve optimization in two crucial aspects: minimizing the amount of data transferred from flash and reading data in larger, more coherent segments."
The team employed two main techniques: 'windowing' and 'row-column bundling'. Windowing reuses neurons activated for recent tokens, so only the weights of newly activated neurons have to be fetched from flash; row-column bundling stores related rows and columns together, so each flash read retrieves a larger contiguous chunk of data. Together, these techniques yielded a four- to five-fold speed improvement on the Apple M1 Max system-on-chip (SoC).
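The two strategies above can be sketched in a few lines of Python. The function names and the toy activation data are assumptions for illustration, not code from the paper: windowing is modeled as tracking which neurons were active in a sliding window of recent tokens, and bundling as storing a row and its matching column side by side.

```python
def windowing_loads(active_sets, window=3):
    """For each token, return the neurons whose weights must be freshly
    loaded from flash: those active now but not active for any of the
    previous `window - 1` tokens (whose weights are still resident)."""
    loads = []
    for t, active in enumerate(active_sets):
        if t == 0:
            resident = set()
        else:
            resident = set().union(*active_sets[max(0, t - window + 1):t])
        loads.append(sorted(active - resident))
    return loads

def bundle(up_rows, down_cols):
    """Row-column bundling: store the i-th up-projection row together
    with the i-th down-projection column, so a single contiguous flash
    read fetches both."""
    return [row + col for row, col in zip(up_rows, down_cols)]

# Neurons activated by four consecutive tokens (toy data).
tokens = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
print(windowing_loads(tokens))  # [[1, 2], [3], [4], [1]]
```

In the toy run, only one neuron per token needs a fresh flash read after the first token, which is the data-transfer saving windowing is meant to provide.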
In theory, this context-aware adaptive loading could make it possible to run LLMs on memory-constrained devices such as iPhones and iPads.
Published: Dec 23, 2023 6:58 PM IST