With its Gemini family of AI models, Google is all set to take on OpenAI. Here are some reasons why it may outdo the widely popular GPT series.
Google has stunned the world with the launch of Gemini, which it claims is its largest and most capable AI model. With Gemini, Google is entering the fray against OpenAI, Meta, and Anthropic, whose AI models have been making large strides in capability. Google has unveiled Gemini in three sizes: Ultra, Pro, and Nano.
While the Pro version will be integrated into Google Bard, Nano has been designed to run on the Google Pixel 8 Pro. With such a range at the disposal of consumers, Google could threaten the dominance of OpenAI’s ChatGPT. Moreover, Gemini is multimodal and can work across various types of information such as text, video, images, and code.
From the keynote, it is pretty evident that Google is out to take on some of its biggest competitors. The tech behemoth also shared a sheet comparing key metrics where Gemini overshadowed GPT-4, OpenAI’s most powerful model to date. While we are yet to use Google’s powerful AI model, here are some reasons why we think it may be a game changer in the world of AI.
Gemini comes in three sizes
Gemini is Google’s family of highly capable multimodal models. Generative Pre-trained Transformer 4 (GPT-4), on the other hand, is a multimodal large language model from OpenAI that was launched in March 2023. The first version of Gemini, Gemini 1.0, comes in three sizes: Ultra, which is meant for highly complex tasks; Pro, which Google calls its best model for scaling across a wide range of tasks; and Nano, its most efficient model for on-device tasks.
Reportedly, Gemini Ultra has achieved state-of-the-art (SOTA) results on benchmarks, the standardised tests used to evaluate the performance of a new model. The Ultra version has been designed for data centres and is still undergoing red-teaming and safety review. It is expected to be available in early 2024, reportedly through a new version of Bard.
Gemini Pro is broadly comparable to GPT-3.5, but it is optimised for low latency and cost. For anyone who wants a capable model but is constrained by cost, Pro is ideal, much as the free version of ChatGPT runs on GPT-3.5. Gemini Pro has already been integrated into Google Bard. Nano, on the other hand, is a model developed for on-device use. It comes in two variants: Nano 1 with 1.8 billion parameters and Nano 2 with 3.25 billion parameters. Nano will also be available on the Google Pixel 8 Pro, which makes the phone one of the most advanced AI-enabled smartphones.
Multimodal mastery and super fast
Gemini is not just about text. The model integrates a diverse range of data types, which lets it offer more natural and engaging interactions. Gemini can recognise images and speak in real time. It is reportedly five times stronger than GPT-4, which is attributed to Google’s TPUv5 chips. The faster processing means Gemini can tackle complex tasks with relative ease. According to Google, Gemini Ultra is also the first AI model to outperform human experts on the MMLU benchmark, with a score of 90 per cent. MMLU stands for Massive Multitask Language Understanding, a test that covers 57 subjects across the humanities, STEM, the social sciences, and more.
Gemini is reportedly trained on massive datasets of text and code. This is meant to give the AI model the latest information so that it can offer accurate and reliable answers to queries and prompts. The datasets that Gemini is trained on contain data from web documents, code, books, and a wide array of audio, video, and image sources. Gemini is reported to outperform human experts on some ‘expert-level’ tasks. Unlike GPT-4, Google’s Gemini is said to be constantly learning and improving. The AI model is reportedly capable of incorporating new information in real time, which would keep its knowledge up to date and relevant.
Upper hand in scientific research
Gemini is capable of analysing vast amounts of data and recognising patterns and trends. Reportedly, the model can also generate hypotheses for further research. AI experts feel this could revolutionise scientific discovery and may lead to breakthroughs in domains such as technology and medicine. Gemini can reportedly extract insights from thousands of research papers. Since it is multimodal, Gemini not only understands text but can also analyse complex graphs.
Gemini Pro outperforms GPT-3.5
Gemini Pro will be integrated into Google’s chatbot Bard and across Google apps, making it accessible to millions of users, and it will be free. ChatGPT’s free version, on the other hand, offers only GPT-3.5. The Pro version surpassed GPT-3.5 in six out of eight benchmarks, which arguably makes Bard the most powerful free AI chatbot in the world. With access to millions of users and its multimodal capabilities, Gemini Pro will, in all likelihood, overtake ChatGPT. Still, we will need to wait for some time to see its true capabilities.
Gemini Ultra, meanwhile, is the most powerful model in the Gemini family. It is the first time since GPT-4’s launch in March that an AI model is reported to have outdone it. Gemini Ultra achieved SOTA results on 30 out of 32 popular academic benchmarks. When it came to reasoning, Gemini Ultra performed marginally better than GPT-4 on the Big-Bench Hard and DROP benchmarks, although GPT-4 stayed ahead on HellaSwag. In maths, Ultra outperformed GPT-4 on the GSM8K and MATH benchmarks. The model also leapt ahead of GPT-4 when it came to Python code generation.
As mentioned above, we are yet to test any of the Gemini models.