Apple's New Multimodal AI BEATS GPT-4 Vision (New APPLE AI)

By Similartool.AI     Updated Jan 9, 2024

Apple has introduced a multimodal AI system named Ferret, which reportedly surpasses the vision capabilities of OpenAI's GPT-4 in certain benchmarks. This article covers Apple's recent work in machine learning, what Ferret can do, and its implications for the future of AI.

1. Benchmarking and Model Comparisons

Apple's new AI system, Ferret, has been put through a series of benchmarks and comparisons that suggest it surpasses OpenAI's GPT-4 at complex visual tasks. The model uses a CLIP ViT-L/14 image encoder to turn the contents of a picture into a representation the model can process. It combines that representation with region coordinates to answer queries about specific parts of an image, reportedly with high accuracy.
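As a rough illustration of how coordinates can accompany a question about an image region, the sketch below scales a pixel box onto a fixed integer grid and appends it to the text of a query. The names `Box`, `normalize_box`, and `build_region_prompt`, the 1000-step grid, and the prompt wording are all assumptions made for this example; they are not taken from Apple's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangular region in pixel coordinates (hypothetical helper type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def normalize_box(box: Box, width: int, height: int, bins: int = 1000) -> tuple:
    """Scale pixel coordinates to integers in [0, bins - 1].

    The grid size is an assumption for illustration only.
    """
    def scale(value: float, size: int) -> int:
        return min(bins - 1, max(0, int(value / size * bins)))

    return (
        scale(box.x1, width),
        scale(box.y1, height),
        scale(box.x2, width),
        scale(box.y2, height),
    )


def build_region_prompt(question: str, box: Box, width: int, height: int) -> str:
    """Attach the normalized region coordinates to a plain-text question."""
    x1, y1, x2, y2 = normalize_box(box, width, height)
    return f"{question} Region: [{x1}, {y1}, {x2}, {y2}]"


if __name__ == "__main__":
    prompt = build_region_prompt("What is this?", Box(0, 0, 320, 240), 640, 480)
    print(prompt)
```

Normalizing to a fixed grid keeps the query independent of image resolution, which is one plausible reason to pass coordinates this way.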

The comparison covers several dimensions: supported input types (point, box, free-form), grounding accuracy, data construction, and robustness. Strong results across these dimensions would make Ferret a potent tool for applications that need detailed visual analysis. The comparisons suggest that Apple's AI can outperform GPT-4 Vision at precisely identifying and describing small, detailed regions within images.
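Grounding accuracy is often scored by comparing a predicted box with a reference box using intersection over union (IoU) and counting a prediction as correct above a threshold. The sketch below shows that common approach; whether the benchmarks discussed here used this exact metric or the 0.5 threshold is an assumption, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped at zero when boxes do not meet.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(predictions, references, threshold=0.5):
    """Fraction of predictions whose IoU with the reference meets the threshold."""
    if not predictions:
        return 0.0
    hits = sum(
        1 for pred, ref in zip(predictions, references) if iou(pred, ref) >= threshold
    )
    return hits / len(predictions)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    refs = [(0, 0, 10, 10), (5, 0, 15, 10)]
    print(grounding_accuracy(preds, refs))
```

In the usage example, the first prediction matches its reference exactly and the second overlaps only a third of the combined area, so one of the two counts as a hit.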

2. Implications for AI-driven Industries

If the capabilities of the Ferret model are as advanced as the benchmarks suggest, industries relying on detailed visual recognition could see a transformation. Industries such as autonomous driving, which require fine-grained visual understanding, could benefit from the precise identification features of Apple's multimodal AI, potentially providing more accurate and reliable decision-making than current AI systems.

Moreover, it could lead to enhancements in user experience across Apple devices, where AI-driven functionalities like text prediction and image-journaling are becoming more sophisticated. The Ferret model could play a significant role in creating more intuitive and personalized interactions with technology, drawing from a user's data to generate relevant suggestions and assist in day-to-day tasks.

3. Apple's Strategic AI Acquisitions

Apple has been strategically acquiring AI companies to bolster its AI and machine learning capabilities. These acquisitions have enabled Apple to tap into new technologies and expertise that can feed into a broader strategy aimed at enhancing the functionality and intelligence of its product ecosystem.

Such strategic moves signal Apple’s seriousness about its position in the AI space. By leveraging acquired technologies, Apple has the potential to innovate and create AI solutions that align with its reputation for producing elegant, user-centric products.

4. Addressing AI Hallucination and Ethical Concerns

AI hallucination, where a model delivers convincing but incorrect information, remains a significant concern within the technology community. Apple's AI development, especially with Ferret, must confront this issue, as it can have dangerous implications if left unaddressed. Viewer comments reflect the broader industry worry about AI's potential negative impact if it cannot reliably distinguish factual information from fabricated data.

Further ethical concerns involve the application of AI in military operations and strategic medical applications. The transformative potential of AI in these sectors is undeniable, but it raises questions regarding the control, application, and possible consequences of these powerful AI capabilities when used in contexts where stakes are exceptionally high.

5. Privacy and Data Management

With any advancement in AI, especially from a company like Apple that has access to vast amounts of user data, concerns over privacy and data usage inevitably arise. Apple's commitment to privacy has been one of its hallmarks, and it will be critical for the company to maintain this commitment as it expands its use of AI. Apple's AI, including Ferret, must be designed with robust privacy protections and transparent data management practices.

The company will need to balance the dual demands of leveraging user data to personalize and improve services while maintaining trust through uncompromised data security. User reactions suggest that this balance will be pivotal to Apple's continued success in a domain where data is king but privacy is paramount.

6. Examining Ferret's Testing Methodology

One viewer criticized the conditions under which the Ferret model was tested against GPT-4, noting that the specificity of the prompt could significantly affect the outcome. In the experiment, GPT-4 inaccurately described the object within a specific region on the bike, labeled 'region0', when the prompt used the word 'highlighted', a term the model could misconstrue. This suggests that prompt clarity is vital both for the AI's performance and for the fairness of the benchmark.

The comment also underscores the importance of consistency in the testing process. Using the same font color as the object's box, as was done in the Ferret test, could avoid confusion and allow a more direct comparison with GPT-4's capabilities. The nuances of how a prompt is communicated to an AI model matter, and a standardized approach would offer a more legitimate comparison of different AI models' abilities to understand and analyze images.
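One way to picture the standardized approach the commenter asks for is a small harness that builds a single, unambiguous prompt and sends the identical text to every model under test. The sketch below is hypothetical throughout: `build_prompt`, `run_comparison`, the prompt wording, and the `echo_model` stand-in are assumptions for illustration, and real model clients would replace the stub.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]
# A model is any callable taking (image_id, prompt) and returning an answer.
AskFn = Callable[[str, str], str]


def build_prompt(region_label: str, box: Box) -> str:
    """Name the region explicitly instead of relying on a word like 'highlighted'."""
    return f"Describe the object inside {region_label}, the box with corners {list(box)}."


def run_comparison(
    models: Dict[str, AskFn], image_id: str, region_label: str, box: Box
) -> Dict[str, str]:
    """Send exactly the same prompt to every model and collect the answers."""
    prompt = build_prompt(region_label, box)
    return {name: ask(image_id, prompt) for name, ask in models.items()}


def echo_model(image_id: str, prompt: str) -> str:
    """Stand-in model that returns the prompt it received (for demonstration only)."""
    return prompt


if __name__ == "__main__":
    results = run_comparison(
        {"model_a": echo_model, "model_b": echo_model},
        "bike.jpg",
        "region0",
        (10, 20, 30, 40),
    )
    for name, answer in results.items():
        print(name, "->", answer)
```

Because the prompt is built once and reused, any difference in the answers can be attributed to the models rather than to how the question was phrased.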

7. Enhancing Video Content Quality

Another piece of feedback focused on the video content's presentation of the AI models' comparison information. The video apparently displayed text errors and used a visual emphasis method that detracted from the intended message. This criticism indicates that the quality of content delivery affects how information is perceived and stresses that proofreading and careful design choices are essential for effective communication.

The viewer's experience can be negatively impacted by such distractions as poorly implemented emphasis methods or incorrect captions, which may lead to misunderstandings of the content. This feedback suggests adopting a simpler, cleaner approach to communication, avoiding unnecessary visual effects, and ensuring that emphasis is placed only where required without compromising textual integrity.

8. Siri's Potential Revamp With Apple GPT

Public reactions to Apple's rumored developments in AI highlight anticipation regarding enhancements to Siri's capabilities. With speculation about an AI model named Apple GPT, there's excitement surrounding potential improvements in natural language processing and conversation realism. Comments reveal a desire for substantial progress from the existing functions of Siri, aiming for a more intelligent and personalized virtual assistant experience.

Furthermore, the discussion reflects on the rate at which Apple is moving in the AI space. Given the swift advancements AI has made recently, there is a shared concern that Apple might lag unless it expedites its innovation in this field. As a company known for refining technology rather than pioneering it, the expectation is for Apple to catch up and potentially lead with features that not only equal but surpass existing AI solutions.


Apple's Ferret model marks a significant milestone for the company, challenging the vision capabilities of the front-runner, GPT-4. According to the comparisons discussed above, Ferret excels at identifying image content and understanding relationships between objects within images. If those results hold up, its proficiency in precise, multimodal tasks could have significant implications for the AI industry, particularly in areas requiring nuanced visual understanding.