Google has begun rolling out powerful new AI capabilities to Gemini Live, allowing the assistant to visually interpret both screen content and camera feeds in real-time. These advancements, developed under “Project Astra,” mark a significant leap forward in AI assistance technology as competitors struggle to catch up.

Using voice features on a smartphone (illustrative photo). Google Gemini Live now offers real-time video interpretation capabilities. Image credit: Amanz via Unsplash, free license
Visual Understanding: How Gemini Now “Sees” Your World
After nearly a year of development, Google has started implementing visual understanding features for select Google One AI Premium subscribers. The rollout introduces two groundbreaking capabilities that transform how users interact with their digital assistant.
The first feature enables Gemini to analyze on-screen content in real-time. When activated, users can ask questions about anything displayed on their device, from documents and images to complex applications. This screen interpretation happens instantly, allowing for natural, contextual assistance without the need to switch between apps or take screenshots.
The second capability brings real-time camera feed interpretation to Gemini. By accessing your smartphone camera, the AI can provide immediate feedback and assistance based on what it sees. In Google’s demonstration videos, users receive help with tasks like selecting pottery paint colors—showcasing practical applications for everyday creative projects.
Technical Implementation and User Experience
The interface for these new features is streamlined for accessibility. Android users report seeing a new “Share screen with Live” button above the existing “Ask about screen” suggestion. Camera access runs through the full Gemini Live interface, which pairs a phone call-style notification with a compact fullscreen view to keep the experience usable.
Google spokesperson Alex Joseph confirmed the rollout in an email to The Verge, noting that these features represent the practical implementation of technology first demonstrated in early Project Astra presentations.
For developers interested in tapping into similar capabilities, Google provides access through the Live API, which enables low-latency bidirectional voice and video interactions. This API supports natural, human-like conversations with the ability to process multiple input types:
import asyncio
from google import genai

# Connect to the Live API preview; replace GEMINI_API_KEY with your own key.
client = genai.Client(api_key="GEMINI_API_KEY", http_options={'api_version': 'v1alpha'})
model = "gemini-2.0-flash-exp"
config = {"response_modalities": ["TEXT"]}

async def main():
    # Open a bidirectional Live session and run a simple text chat loop.
    async with client.aio.live.connect(model=model, config=config) as session:
        while True:
            message = input("User> ")
            if message.lower() == "exit":
                break
            await session.send(input=message, end_of_turn=True)
            # Stream the model's reply as it arrives.
            async for response in session.receive():
                if response.text is not None:
                    print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())
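The same session can also accept visual input, which is what underpins the screen- and camera-sharing features described above. A rough sketch of sending a single JPEG frame alongside a question, reusing the client, model, and config from the example above and assuming the base64 dict payload used in Google’s Live API cookbook samples (the exact shape is an assumption and may change as the API evolves), could look like this:

import base64

async def describe_frame(jpeg_bytes: bytes):
    # Sketch only: the dict payload with mime_type and base64 data follows
    # Google's cookbook samples and should be checked against the current SDK.
    frame = {"mime_type": "image/jpeg", "data": base64.b64encode(jpeg_bytes).decode()}
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send(input=frame)
        await session.send(input="What is shown in this frame?", end_of_turn=True)
        async for response in session.receive():
            if response.text is not None:
                print(response.text, end="")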
Advanced Features Beyond Visual Processing
The Gemini Live experience extends beyond visual understanding, offering comprehensive AI assistance through several technical innovations:
Audio Processing and Voice Options
Developers can implement audio responses with multiple voice options including Aoede, Charon, Fenrir, Kore, and Puck. The system supports high-quality audio formats (16-bit PCM audio at 16kHz for input and 24kHz for output), enabling natural-sounding interactions:
from google.genai import types

# Request spoken replies and select the "Kore" prebuilt voice.
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        )
    ),
)
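A short usage sketch for that configuration, reusing the client, model, and asyncio setup from the first example and assuming (as in Google’s cookbook samples) that audio chunks arrive as raw 16-bit PCM bytes on response.data, could write the 24 kHz output to a WAV file:

import wave

async def speak(prompt: str):
    # Sketch only: treats response.data as raw PCM audio chunks, which is the
    # pattern in Google's Live API cookbook samples; verify against the current SDK.
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send(input=prompt, end_of_turn=True)
        with wave.open("reply.wav", "wb") as out:
            out.setnchannels(1)      # mono
            out.setsampwidth(2)      # 16-bit samples
            out.setframerate(24000)  # Live API audio output is 24 kHz PCM
            async for response in session.receive():
                if response.data is not None:
                    out.writeframes(response.data)

asyncio.run(speak("Describe the weather in one sentence."))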
Intelligent Interruption Handling
Unlike traditional AI assistants, Gemini Live supports mid-response interruptions. The system uses Voice Activity Detection (VAD) to recognize when a user begins speaking, canceling the ongoing generation while preserving already-delivered content in the session history. This creates more natural, human-like conversation flows where users don’t need to wait for the AI to finish speaking before responding.
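At the API level, a barge-in is signalled on the server message. A minimal sketch of reacting to it inside the receive loop of a session opened as in the earlier examples, assuming the server_content.interrupted flag exposed by the Python SDK (treat the exact field name as an assumption), might look like this:

async def stream_with_barge_in(session):
    # Assumption: the SDK marks a user barge-in via server_content.interrupted;
    # when it is set, discard any audio that has not been played yet.
    audio_buffer = []  # hypothetical local playback queue
    async for response in session.receive():
        if response.server_content is not None and response.server_content.interrupted:
            audio_buffer.clear()
            continue
        if response.data is not None:
            audio_buffer.append(response.data)
    return audio_buffer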
Function Calling and Tool Integration
For more advanced applications, developers can define specialized tools within the Live API.
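The tool itself can be an ordinary Python function whose signature and docstring tell the model what it does; the set_light_values helper used in the snippet below is an illustrative stub with hypothetical parameters, not part of the SDK:

def set_light_values(brightness: int, color_temp: str) -> dict:
    """Set smart-light brightness (0-100) and color temperature
    ("daylight", "cool", or "warm"). Illustrative stub only."""
    # A real integration would call a smart-home API here; the stub simply
    # echoes the requested values so the model can confirm the action.
    return {"brightness": brightness, "color_temp": color_temp}

Passing the function in the session configuration lets the model answer matching requests with a structured tool call: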
config = types.LiveConnectConfig(
    response_modalities=["TEXT"],
    tools=[set_light_values],  # plain Python functions are exposed as callable tools
)

async with client.aio.live.connect(model=model, config=config) as session:
    await session.send(input="Turn the lights down to a romantic level", end_of_turn=True)
    async for response in session.receive():
        if response.tool_call is not None:
            print(response.tool_call)  # structured call with the function name and arguments
Market Position and Availability
These developments highlight Google’s significant lead in AI assistant technology. While Amazon prepares the limited early-access debut of Alexa Plus and Apple has delayed its upgraded Siri, Google is already shipping capabilities in Gemini that competitors have so far only promised.
The initial rollout appears targeted but not technically device-restricted. Although Google previously announced that Pixel and Galaxy S25 series owners would be “among the first” to receive Project Astra capabilities, early reports show functionality appearing on other devices like Xiaomi phones as well.
Currently, these advanced features remain exclusive to Google One AI Premium subscribers, with plans starting at $19.99 monthly. While the rollout is focused on Android devices initially, Google has not yet announced timeline details for potential iPhone support.
If you are interested in this topic, we suggest you check our articles:
- Google Gemini: How has it been received by users so far?
- Google’s Gemini Robotics and Gemini Robotics-ER: Intelligent Machines That Adapt and Learn
- Google’s Gemini 2.0 Flash Thinking Update
Sources: The Verge, Google Live API documentation, The Times of India
Written by Alius Noreika