Thursday, 1 May 2025

Gemini AI makes office robots more useful

Google’s Gemini AI improves robotic navigation, combining natural language and visual instructions for seamless indoor assistance.

Do you need help in an office building, big box store, or warehouse? Just ask the nearest robot for directions.

Google researchers have unveiled a breakthrough in robotic navigation that combines natural language processing with computer vision. This new study, published on Wednesday, highlights their work on an everyday robot that navigates indoor spaces using simple language prompts and visual inputs.

Transforming robotic navigation

Previously, robots required detailed environmental maps and specific physical coordinates to move around, a cumbersome process that limited their utility. With recent advancements in vision-language navigation, you can now instruct robots using natural language commands, such as “go to the workbench.” Google’s team has pushed this further by enabling robots to understand and act on spoken and visual instructions simultaneously.

Imagine you’re in a warehouse and need to find where an item belongs. You can show the robot the item and ask, “What shelf does this go on?” Powered by Gemini 1.5 Pro, the AI can interpret both your question and the visual data, then guide you to the correct spot. The robots were also tested with commands like, “Take me to the conference room with the double doors,” “Where can I borrow some hand sanitizer?” and “I want to store something out of sight from public eyes. Where should I go?”

In an Instagram Reel demonstration, a researcher activated the system by saying “OK robot” and asked to be led to a place where he could draw. The robot responded, “Give me a minute. Thinking with Gemini” before quickly navigating the 9,000-square-foot DeepMind office to a large wall-mounted whiteboard.

The magic behind the navigation

These robots weren’t entirely unfamiliar with the office layout. The team used “Multimodal Instruction Navigation with Demonstration Tours (MINT).” Initially, they manually guided the robot around the office, pointing out specific areas and features using natural language. This can also be achieved by recording a video tour of the space with a smartphone. The AI then creates a topological graph, matching what its cameras see with the “goal frame” from the video.

Next, the team implemented a hierarchical Vision-Language-Action (VLA) navigation policy. This policy combines environmental understanding with common-sense reasoning, enabling the AI to translate user requests into navigational actions.
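To make the two-level idea concrete, here is a minimal Python sketch of how a demonstration tour could become a graph and how a request could become a route. It is an illustration under stated assumptions, not the researchers' implementation: the function names, the 2-metre edge threshold, the sample poses and the narration captions are all hypothetical, and a simple word-overlap score stands in for the step where a model such as Gemini would choose the goal frame.

```python
import heapq
import math
import re


def build_tour_graph(poses, max_edge_m=2.0):
    """Link tour frames whose recorded positions are within max_edge_m (assumed threshold)."""
    graph = {i: [] for i in range(len(poses))}
    for i, (xi, yi) in enumerate(poses):
        for j in range(i + 1, len(poses)):
            xj, yj = poses[j]
            d = math.hypot(xi - xj, yi - yj)
            if d <= max_edge_m:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def pick_goal_frame(request, captions):
    """High-level step (stand-in): pick the frame whose narration shares the most words
    with the request. A real system would query a vision-language model here."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    scores = [len(words & set(re.findall(r"[a-z]+", c.lower()))) for c in captions]
    return max(range(len(captions)), key=scores.__getitem__)


def shortest_path(graph, start, goal):
    """Low-level step: Dijkstra over the tour graph, returning a list of frame indices."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in graph[node]:
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None  # goal frame is not reachable from the start frame
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical tour: one (x, y) position in metres and one narration per frame.
poses = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (3.0, 1.5), (3.0, 3.0)]
captions = [
    "entrance lobby",
    "hallway by the kitchen",
    "conference room with double doors",
    "supply shelf with hand sanitizer",
    "large whiteboard for drawing",
]

graph = build_tour_graph(poses)
goal = pick_goal_frame("Where can I draw on a whiteboard?", captions)
route = shortest_path(graph, start=0, goal=goal)
print(goal, route)
```

In this toy layout the frames form a single chain, so the route simply walks the tour in order. The point of the split is that the language-and-vision reasoning only has to name a destination frame, while ordinary graph search handles how to get there.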

The results were impressive, with robots achieving “86 percent and 90 percent end-to-end success rates on previously infeasible navigation tasks involving complex reasoning and multimodal user instructions in a large real-world environment,” according to the researchers.

Room for improvement

Despite these successes, there is still work to be done. The robots cannot yet autonomously perform their demonstration tours, and the AI’s response time ranges from 10 to 30 seconds, making interactions slower than desired. The researchers are aware of these limitations and are working on enhancing the system’s efficiency and autonomy.

This work marks a significant step forward in robotic navigation, bringing us closer to a future where robots can seamlessly assist in complex indoor environments using natural language and visual cues.
