Apple is taking a significant step in the development of its artificial intelligence (AI) models, adopting a new approach designed to improve performance while safeguarding user privacy. The move is part of Apple's broader effort to strengthen its AI capabilities without compromising the trust its users place in the company. According to a report by Bloomberg, Apple plans to introduce the technique in upcoming beta versions of iOS 18.5 and macOS 15.5, and has outlined the details in a blog post on Apple's Machine Learning Research website.
Currently, Apple trains its AI models, including those behind features like writing tools and email summaries, on synthetic data. Synthetic data is generated artificially rather than gathered from actual user interactions, a measure designed to protect user privacy. Apple acknowledges, however, that synthetic data has limitations: it struggles to accurately capture trends in real-world usage, especially how people actually write or summarize longer messages.
To overcome these challenges, Apple is introducing a new method that lets its AI models benefit from signals about real usage while still maintaining strict privacy protections. The process involves comparing synthetic data, such as fake emails, against real data, without ever accessing the actual content of user emails. Here's how it works:
First, Apple generates thousands of synthetic emails on various everyday topics. An example Apple provides is a message asking, "Would you like to play tennis tomorrow at 11:30AM?" These synthetic emails are then converted into embeddings. Embeddings are data representations that encapsulate the essential characteristics of the content, such as its topic and length.
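To make the embedding step concrete, here is a minimal sketch in Python. Apple has not disclosed which embedding model it uses, so the sentence-transformers model below is purely an illustrative stand-in:

```python
# Sketch: turning synthetic emails into embeddings.
# Apple's actual embedding model is not public; "all-MiniLM-L6-v2"
# from sentence-transformers is used here only as a stand-in.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

synthetic_emails = [
    "Would you like to play tennis tomorrow at 11:30AM?",
    "Reminder: the quarterly report is due on Friday.",
    "Can we reschedule dinner to next week?",
]

# Each email becomes a fixed-length vector that captures features
# such as topic and length, without storing the raw text itself.
synthetic_embeddings = model.encode(synthetic_emails, normalize_embeddings=True)
print(synthetic_embeddings.shape)  # (3, 384) for this particular model
```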
The synthetic embeddings are then sent to a select group of user devices that have opted into Apple’s Device Analytics program. These devices compare the synthetic embeddings with a small sample of the user’s recent emails. The goal is to determine which synthetic message most closely resembles the user's real emails. Importantly, the actual content of the emails and the matching process never leave the device, ensuring that privacy is maintained at all times.
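The on-device matching step can be sketched as a nearest-neighbor comparison. The function below is an illustration under the assumption that similarity is measured with cosine similarity; Apple has not specified its metric. Notice that only an index is returned, never the emails or their embeddings:

```python
import numpy as np

def pick_closest_synthetic(synthetic_embs: np.ndarray,
                           user_email_embs: np.ndarray) -> int:
    """Return the index of the synthetic email whose embedding is most
    similar to any of the user's recent emails.

    Runs entirely on-device; only the winning index (not the emails or
    their embeddings) would ever be reported upstream.
    """
    # Normalize rows so that dot products equal cosine similarity.
    syn = synthetic_embs / np.linalg.norm(synthetic_embs, axis=1, keepdims=True)
    usr = user_email_embs / np.linalg.norm(user_email_embs, axis=1, keepdims=True)

    sims = syn @ usr.T                  # (n_synthetic, n_user) similarity matrix
    best_per_synthetic = sims.max(axis=1)
    return int(best_per_synthetic.argmax())
```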
The process uses a technique called differential privacy, which further protects user privacy by ensuring that only anonymous signals are sent back to Apple. The company then analyzes which synthetic messages are selected most frequently, without being able to trace this data back to individual devices or users. The most popular synthetic messages are then used to enhance Apple’s AI models, helping to improve the accuracy and relevance of features like email summaries, all while preserving user anonymity.
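Apple has not published the exact mechanism it uses for this step, but a standard local differential privacy technique with the same shape is generalized randomized response: each device reports its selected index truthfully with some probability and reports a random other index otherwise, and the server debiases the noisy tallies into accurate aggregate frequencies. The sketch below is that textbook mechanism, not Apple's implementation:

```python
import math
import random
from collections import Counter

def randomized_response(true_index: int, num_options: int, epsilon: float) -> int:
    """Generalized randomized response (illustrative, not Apple's mechanism):
    report the true choice with probability p, otherwise a uniformly
    random other choice, giving each device plausible deniability."""
    p = math.exp(epsilon) / (math.exp(epsilon) + num_options - 1)
    if random.random() < p:
        return true_index
    other = random.randrange(num_options - 1)
    return other if other < true_index else other + 1

def estimate_counts(reports: list[int], num_options: int, epsilon: float) -> list[float]:
    """Server side: debias the noisy reports to estimate how often each
    synthetic message was actually selected across devices."""
    p = math.exp(epsilon) / (math.exp(epsilon) + num_options - 1)
    q = (1 - p) / (num_options - 1)
    n = len(reports)
    raw = Counter(reports)
    # E[raw_i] = n_i * p + (n - n_i) * q, so solve for n_i.
    return [(raw.get(i, 0) - n * q) / (p - q) for i in range(num_options)]
```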
Apple has already implemented this privacy-centric approach in some of its existing AI tools, such as Genmoji, its custom emoji tool. For example, Apple tracks which prompts, such as “an elephant in a chef’s hat,” are commonly requested by users. By anonymously aggregating this data, Apple can fine-tune its AI models to generate better responses to real-world requests, while ensuring that rare or unique prompts remain private and untracked.
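One plausible shape for this kind of aggregation, offered purely as an assumption rather than Apple's published method, is a noisy count with a minimum-frequency threshold, so that prompts reported by only a handful of devices never surface:

```python
import numpy as np
from collections import Counter

def popular_prompts(device_reports: list[str],
                    epsilon: float = 1.0,
                    min_count: int = 100) -> list[str]:
    """Keep only prompts whose noise-perturbed count clears a threshold.
    Common requests ("an elephant in a chef's hat") survive aggregation;
    rare or unique prompts stay below the threshold and remain hidden.
    All parameter names and values here are illustrative assumptions."""
    counts = Counter(device_reports)
    kept = []
    for prompt, count in counts.items():
        noisy = count + np.random.laplace(0.0, 1.0 / epsilon)  # DP-style noise
        if noisy >= min_count:
            kept.append(prompt)
    return kept
```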
In addition to improving its email-related AI features, Apple plans to expand this privacy-first method to other areas of its AI offerings. The company has confirmed that it will apply similar techniques to a variety of other tools, including Image Playground, Image Wand, Memories creation, and its Visual Intelligence features. By continuing to prioritize user privacy while advancing its AI technology, Apple aims to build more useful and personalized experiences for users, without compromising the trust they place in the company.