Apple researchers are building an AI model called ‘Ferret UI’ that can navigate through iOS.

Apple researchers have published another paper on artificial intelligence (AI) models, and this time the focus is on understanding and navigating smartphone user interfaces (UI). The yet-to-be-peer-reviewed research paper highlights a large language model (LLM) called Ferret UI, which goes beyond traditional computer vision to understand complex smartphone screens. Notably, this is not the first AI paper published by the tech giant's research division: it has already published a paper on multimodal LLMs (MLLMs) and another on on-device AI models.

A preprint version of the research paper is published on arXiv, an open-access online repository of scholarly articles. The paper is titled "Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs" and focuses on extending the use case of MLLMs. It highlights that most language models with multimodal capabilities cannot understand anything beyond natural imagery and are "limited" in functionality, and it stresses the need for AI models that can understand complex and dynamic interfaces such as smartphone screens.

According to the paper, Ferret UI is "designed to perform specific referring and grounding tasks for UI screens, while properly interpreting and executing open language instructions". Simply put, the vision-language model can not only process a smartphone screen containing multiple elements that represent different information, but can also tell the user about them when prompted with a question.
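In concrete terms, "referring" means describing a UI element given its on-screen location, while "grounding" means locating an element given an open-ended description. The paper does not publish an API, so here is a minimal sketch of what that input/output contract could look like; the FerretUI class, its method names, and the coordinate convention are all illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Normalized screen coordinates (left, top, right, bottom), each in [0, 1].
Box = Tuple[float, float, float, float]

@dataclass
class UIElement:
    kind: str   # e.g. "icon", "button", "text"
    text: str   # recognized caption or description
    box: Box    # where the element sits on the screen

class FerretUI:
    """Hypothetical wrapper: the paper describes these tasks, not a public API."""

    def refer(self, screenshot: bytes, box: Box) -> str:
        """Referring task: describe the element inside a given region."""
        raise NotImplementedError("illustrative stub only")

    def ground(self, screenshot: bytes, query: str) -> List[UIElement]:
        """Grounding task: find elements matching an open-ended description."""
        raise NotImplementedError("illustrative stub only")

# Intended usage, matching the tasks described in the paper:
#   model.refer(png, (0.1, 0.8, 0.3, 0.9))       -> "Reminders app icon"
#   model.ground(png, "where is the lunch icon")  -> [UIElement("icon", "Lunch", ...)]
```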

How Ferret UI processes information on the screen.
Photo credit: Apple

Based on the image shared in the paper, the model can understand and classify widgets and recognize icons. It can also answer questions such as "where is the lunch icon" and "how do I open the Reminders app". This shows that the AI is capable not only of explaining what is on the screen, but also of navigating to different parts of the iPhone based on a prompt.

To train Ferret UI, Apple researchers generated training data of varying complexity themselves, which helped the model learn basic, single-step tasks. "For advanced tasks, we use GPT-4 [40] to generate data, including detailed description, discourse comprehension, conversational interaction, and function assessment. These advanced tasks enable the model to engage in more substantive conversations about visual components, formulate action plans with specific goals in mind, and interpret the general purpose of the screen," the paper explained.
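Apple's data pipeline is not public, but the quoted approach of prompting GPT-4 to produce descriptions and conversations about screens can be sketched with the standard OpenAI Python client. The prompt wording, the choice of a vision-capable GPT-4 model, and the helper function below are illustrative assumptions, not Apple's actual setup:

```python
import base64

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_training_sample(screenshot_path: str, task: str) -> str:
    """Ask GPT-4 to produce one synthetic annotation for a UI screenshot.

    `task` is one of the advanced-task styles the paper mentions,
    e.g. "detailed description" or "conversational interaction".
    """
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # a vision-capable GPT-4 model; the paper just says "GPT-4 [40]"
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Produce a {task} of this mobile UI screen, "
                         "covering every visible widget, icon, and text element."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# e.g. generate_training_sample("home_screen.png", "detailed description")
```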

The paper is promising, and if it passes the peer-review stage, Apple may be able to use this capability to build powerful tools for the iPhone that can perform complex UI navigation tasks from simple text or verbal prompts. This capability seems ideal for Siri.



For more details: www.gadgets360.com
