feat(genai): genai live audio #13520

Draft
wants to merge 4 commits into base: main

Conversation

@Guiners Guiners commented Jul 28, 2025

Description

Fixes #

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

@product-auto-label product-auto-label bot added the samples Issues that are directly related to samples. label Jul 28, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Guiners, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the genai samples by adding new examples that demonstrate real-time audio interactions with the Google GenAI live API. These additions cover both generating audio from text input and transcribing text from audio input, providing comprehensive examples for developers working with live audio streams.

Highlights

  • New GenAI Live Audio Samples: Introduces two new Python samples (live_audio_with_txt.py and live_txt_with_audio.py) demonstrating real-time audio capabilities with the Google GenAI live API.
  • Text-to-Audio Generation: The live_audio_with_txt.py sample showcases how to send text input to the GenAI model and receive synthesized audio responses, utilizing IPython.display.Audio for playback.
  • Audio-to-Text Generation: The live_txt_with_audio.py sample demonstrates sending an audio file to the GenAI model and receiving a text transcript, leveraging librosa and soundfile for audio processing.
  • Dependency and Test Updates: Updates requirements.txt to include librosa and IPython for the new samples, and adds corresponding asynchronous tests in test_live_examples.py to ensure their functionality.
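
The text-to-audio highlight implies collecting raw audio chunks from the session and packaging them for playback. A minimal sketch of that packaging step, using only the standard library, might look like the following. The helper name `pcm_chunks_to_wav` is hypothetical, and the default format (24 kHz, 16-bit, mono) is an assumption about the model's output rather than something this pull request states.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=24000, sample_width=2, channels=1):
    """Join raw PCM byte chunks and wrap them in a WAV container.

    The default format (24 kHz, 16-bit, mono) is an assumption; adjust it
    to match whatever the model actually returns.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        # Concatenate all chunks into one contiguous frame buffer.
        wav_file.writeframes(b"".join(chunks))
    return buffer.getvalue()
```

The returned bytes could then be handed to a player, for instance `IPython.display.Audio(data=wav_bytes)` in a notebook.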

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two new sample files, live_audio_with_txt.py and live_txt_with_audio.py, demonstrating live audio capabilities of the GenAI SDK. Key feedback points include a critical bug in live_audio_with_txt.py that could lead to a runtime error, a high-severity performance issue in live_txt_with_audio.py due to a blocking network call in an async function, and several medium-severity suggestions to improve code clarity, consistency, and adherence to best practices, such as resolving variable shadowing and sorting dependencies and imports.

Comment on lines 64 to 67
if (
    message.server_content.model_turn
    and message.server_content.model_turn.parts
):

critical

This condition check is missing a validation for message.server_content before attempting to access its attributes. If message.server_content is None, this code will raise an AttributeError, causing a crash.

You should add a check for message.server_content to ensure the code is robust against different server responses.

Suggested change
- if (
-     message.server_content.model_turn
-     and message.server_content.model_turn.parts
- ):
+ if (
+     message.server_content
+     and message.server_content.model_turn
+     and message.server_content.model_turn.parts
+ ):
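
The guard suggested above can be exercised in isolation. The sketch below uses `types.SimpleNamespace` objects as stand-ins for server messages; the real SDK message type is not reproduced here, and the helper name `extract_parts` is hypothetical.

```python
from types import SimpleNamespace


def extract_parts(message):
    """Return the model turn's parts, or an empty list if any level is missing."""
    server_content = getattr(message, "server_content", None)
    if (
        server_content
        and server_content.model_turn
        and server_content.model_turn.parts
    ):
        return list(server_content.model_turn.parts)
    return []


# Stand-in messages (assumed shapes, for illustration only).
empty = SimpleNamespace(server_content=None)
full = SimpleNamespace(
    server_content=SimpleNamespace(model_turn=SimpleNamespace(parts=["chunk"]))
)

print(extract_parts(empty))  # []
print(extract_parts(full))   # ['chunk']
```

Because `and` short-circuits, `model_turn` is never read when `server_content` is `None`, which is what avoids the `AttributeError`.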


async with client.aio.live.connect(model=model, config=config) as session:
    audio_url = "https://storage.googleapis.com/generativeai-downloads/data/16000.wav"
    response = requests.get(audio_url)

high

The call to requests.get() is synchronous and will block the asyncio event loop. This can severely impact the performance of an async application and negates the benefits of using asyncio. Please use an asynchronous HTTP client, such as httpx or aiohttp, to fetch the audio file.

For example, with httpx:

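import io

import httpx
# ...
# Use a distinct name so the GenAI `client` in the surrounding code is not rebound.
async with httpx.AsyncClient() as http_client:
    r = await http_client.get(audio_url)
    r.raise_for_status()
    buffer = io.BytesIO(r.content)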

This would also require adding httpx to requirements.txt.
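
If adding a dependency is undesirable, one alternative (not proposed in the review itself) is to keep the blocking call but run it in a worker thread with `asyncio.to_thread`, available in Python 3.9 and later. The sketch below assumes a hypothetical `slow_fetch` function standing in for a blocking call such as `requests.get(url).content`; the ticker only demonstrates that the event loop keeps running while the fetch is in progress.

```python
import asyncio
import time


async def run_blocking(func, *args):
    """Run a blocking callable in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(func, *args)


def slow_fetch(url):
    # Hypothetical stand-in for a blocking HTTP download.
    time.sleep(0.2)
    return b"audio-bytes-from-" + url.encode()


async def main():
    ticks = 0

    async def ticker():
        # Counts event-loop iterations while the fetch runs in its thread.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    ticker_task = asyncio.create_task(ticker())
    data = await run_blocking(slow_fetch, "example.wav")
    ticker_task.cancel()
    return data, ticks
```

Calling `asyncio.run(main())` returns the fetched bytes together with a nonzero tick count, showing that other coroutines made progress during the download.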

Comment on lines +54 to +55
response = []

medium

The variable response is first used on line 37 for the HTTP response and is now being rebound to hold the list of text parts from the model. Reusing one name for two unrelated values can make the code harder to understand and debug. Please use a different, more descriptive name for this list, for example response_parts.

response_parts = []

Comment on lines 0 to 5
  google-genai==1.20.0
  scipy==1.15.3
- websockets==15.0.1 (no newline at end of file)
+ websockets==15.0.1
+ librosa==0.11.0
+ IPython==8.26.0

medium

For better maintainability, and to make duplicate or conflicting entries easier to spot, it's good practice to keep the list of dependencies in requirements.txt sorted alphabetically.

google-genai==1.20.0
IPython==8.26.0
librosa==0.11.0
scipy==1.15.3
websockets==15.0.1
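
Note that the order suggested above is case-insensitive: a plain byte-wise sort would place `IPython` before `google-genai`. A small check along these lines could confirm the ordering; the helper name `is_sorted_case_insensitive` is hypothetical and the parsing assumes simple `name==version` pins only.

```python
def is_sorted_case_insensitive(requirements):
    """Check that requirement names are in case-insensitive alphabetical order."""
    names = [
        line.split("==")[0].strip().lower()
        for line in requirements
        if line.strip()
    ]
    return names == sorted(names)


suggested = [
    "google-genai==1.20.0",
    "IPython==8.26.0",
    "librosa==0.11.0",
    "scipy==1.15.3",
    "websockets==15.0.1",
]

print(is_sorted_case_insensitive(suggested))  # True
print(sorted(suggested)[0])  # IPython==8.26.0 (uppercase sorts first by default)
```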

Comment on lines +28 to +29
import live_txt_with_audio
import live_audio_with_txt

medium

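PEP 8 asks for imports to be grouped (standard library, third-party, local); sorting them alphabetically within each group is a common convention on top of that. Please sort these new import statements. For better consistency, consider sorting the entire import block in this file.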

Suggested change
- import live_txt_with_audio
- import live_audio_with_txt
+ import live_audio_with_txt
+ import live_txt_with_audio
