Saturday, April 20, 2024

Apple researchers are working on MM1, a family of multimodal AI models with 30 billion parameters


Apple researchers have shared their work on a multimodal artificial intelligence (AI) large language model (LLM) in a preprint paper. Published on an online portal on March 14, the paper highlights how the team achieved advanced multimodal capabilities and trained a foundation model on both text-only data and images. These new AI advances from the Cupertino-based tech giant come after CEO Tim Cook's remarks during the company's earnings call that AI features could arrive later this year.

The preprint version of the research paper is published on arXiv, an open-access online repository of scholarly articles. However, papers posted there have not been peer-reviewed. Although the paper does not mention Apple itself, most of the listed researchers are affiliated with the company's machine learning (ML) division, leading to the belief that the project is also affiliated with the iPhone maker.

According to the researchers, they are working on MM1, a family of multimodal models with up to 30 billion parameters. Calling it a "performant multimodal LLM (MLLM)", the paper's authors highlight how image encoders, the vision-language connector, and other architecture components and data choices enabled an AI model capable of understanding both text-based and image-based input.

Giving an example, the paper states, "We show that for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results."

To break it down, the AI model is currently in the pre-training phase, meaning it is not yet trained enough to produce the desired results. This is the phase in which the algorithms and AI architecture are used to design the model's workflow and, ultimately, how it processes data. The team of Apple researchers was able to add computer vision to the model using an image encoder and a vision-language connector. Then, when testing with a mixture of image-only, image-and-text, and text-only datasets, the team found the results competitive with existing models at the same stage.
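The pipeline described above, in which an image encoder feeds a vision-language connector that projects visual features into the language model's input space, can be sketched in a few lines. This is a minimal, hypothetical illustration in plain NumPy: every name, dimension, and weight here is invented for clarity, and the paper's actual components are far larger, transformer-based networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, purely illustrative; MM1's real sizes are far larger.
IMG_PATCHES, IMG_DIM = 16, 64   # image encoder output: 16 patch embeddings
LLM_DIM = 128                   # hidden size of the language model

# Random linear weights stand in for the trained networks.
W_ENC = rng.standard_normal((32 * 32 * 3, IMG_PATCHES * IMG_DIM)) * 0.01
W_CONN = rng.standard_normal((IMG_DIM, LLM_DIM)) * 0.01

def image_encoder(image: np.ndarray) -> np.ndarray:
    """Stand-in for a vision encoder: maps an image to patch embeddings."""
    return (image.flatten() @ W_ENC).reshape(IMG_PATCHES, IMG_DIM)

def vision_language_connector(patches: np.ndarray) -> np.ndarray:
    """Projects encoder features into the LLM's token-embedding space."""
    return patches @ W_CONN

def build_llm_input(image: np.ndarray, text_embeddings: np.ndarray) -> np.ndarray:
    """Concatenate projected image tokens with text tokens for the LLM."""
    image_tokens = vision_language_connector(image_encoder(image))
    return np.concatenate([image_tokens, text_embeddings], axis=0)

image = rng.standard_normal((32, 32, 3))    # a dummy 32x32 RGB image
text = rng.standard_normal((10, LLM_DIM))   # 10 dummy text token embeddings
seq = build_llm_input(image, text)
print(seq.shape)  # (26, 128): 16 image tokens followed by 10 text tokens
```

The key idea the sketch captures is that after the connector's projection, image patches and text tokens live in the same embedding space, so the language model can attend over a single mixed sequence.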

While this development is significant, the research paper alone is not enough to determine whether a multimodal AI chatbot will be included in Apple's operating system. At this stage, it is also difficult to say whether the model is multimodal only in taking input or also in giving output (that is, whether it can generate AI images). But if the results hold up after peer review, it can be said that the tech giant has taken another big step toward building a native generative AI foundation model.



For more details, visit www.gadgets360.com

