feat(genai): genai live audio #13520
Summary of Changes
Hello @Guiners, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request adds new examples to the `genai` samples that demonstrate real-time audio interactions with the Google GenAI live API. The additions cover both generating audio from text input and producing a text transcript from audio input.
Highlights
- New GenAI Live Audio Samples: Introduces two new Python samples (`live_audio_with_txt.py` and `live_txt_with_audio.py`) demonstrating real-time audio capabilities with the Google GenAI live API.
- Text-to-Audio Generation: The `live_audio_with_txt.py` sample shows how to send text input to the GenAI model and receive synthesized audio responses, using `IPython.display.Audio` for playback.
- Audio-to-Text Generation: The `live_txt_with_audio.py` sample demonstrates sending an audio file to the GenAI model and receiving a text transcript, using `librosa` and `soundfile` for audio processing.
- Dependency and Test Updates: Updates `requirements.txt` to include `librosa` and `IPython` for the new samples, and adds corresponding asynchronous tests in `test_live_examples.py` to ensure their functionality.
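As a side note on the audio-processing step, it can help to confirm what format an audio buffer actually has before sending it to a model. The following is a minimal sketch using only the standard-library `wave` module, not the PR's `librosa`/`soundfile` code. The helper names `make_silent_wav` and `describe_wav` are hypothetical, and the 16 kHz mono 16-bit format is an assumption chosen for illustration (the source does not state the format of the sample file).

```python
import io
import wave


def make_silent_wav(sample_rate: int, seconds: float) -> bytes:
    """Build an in-memory mono 16-bit WAV of silence (stand-in for a downloaded file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


def describe_wav(data: bytes) -> dict:
    """Read the WAV header from raw bytes and report its basic parameters."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }


# Assumed format for illustration: 16 kHz, half a second of audio.
info = describe_wav(make_silent_wav(16000, 0.5))
print(info)
```

A check like this could run right after the download and before any resampling, so a mismatched file fails early with a clear message.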
Code Review
This pull request introduces two new sample files, `live_audio_with_txt.py` and `live_txt_with_audio.py`, demonstrating live audio capabilities of the GenAI SDK. Key feedback points include a critical bug in `live_audio_with_txt.py` that could lead to a runtime error, a high-severity performance issue in `live_txt_with_audio.py` due to a blocking network call in an async function, and several medium-severity suggestions to improve code clarity, consistency, and adherence to best practices, such as resolving variable shadowing and sorting dependencies and imports.
genai/live/live_audio_with_txt.py

    if (
        message.server_content.model_turn
        and message.server_content.model_turn.parts
    ):

This condition is missing a check for `message.server_content` before it accesses that object's attributes. If `message.server_content` is `None`, this code will raise an `AttributeError`, causing a crash.
You should add a check for `message.server_content` to ensure the code is robust against different server responses.
Suggested change:

    - if (
    -     message.server_content.model_turn
    -     and message.server_content.model_turn.parts
    - ):
    + if (
    +     message.server_content
    +     and message.server_content.model_turn
    +     and message.server_content.model_turn.parts
    + ):

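The guard can be exercised without a live session. Below is a minimal sketch that uses `SimpleNamespace` objects as stand-ins for server messages; the helper name `extract_parts` and the stand-in values are hypothetical and only mirror the attribute names quoted in the review.

```python
from types import SimpleNamespace


def extract_parts(message):
    """Return the model-turn parts of a message, or [] if any level is missing."""
    server_content = getattr(message, "server_content", None)
    if (
        server_content
        and server_content.model_turn
        and server_content.model_turn.parts
    ):
        return list(server_content.model_turn.parts)
    return []


# Stand-in messages; real ones would come from the live session.
empty = SimpleNamespace(server_content=None)
no_turn = SimpleNamespace(server_content=SimpleNamespace(model_turn=None))
full = SimpleNamespace(
    server_content=SimpleNamespace(
        model_turn=SimpleNamespace(parts=["chunk-1", "chunk-2"])
    )
)

print(extract_parts(empty))
print(extract_parts(no_turn))
print(extract_parts(full))
```

Because `and` short-circuits, the later attribute lookups never run when an earlier level is `None`, which is what prevents the `AttributeError`.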

    async with client.aio.live.connect(model=model, config=config) as session:
        audio_url = "https://storage.googleapis.com/generativeai-downloads/data/16000.wav"
        response = requests.get(audio_url)

The call to `requests.get()` is synchronous and blocks the asyncio event loop while the download runs. Every other task on the loop stalls for that time, which negates the benefit of using `asyncio`. Please use an asynchronous HTTP client, such as `httpx` or `aiohttp`, to fetch the audio file.
For example, with `httpx`:

    import io

    import httpx
    # ...
    # Named http_client so it does not rebind the GenAI `client` used above.
    async with httpx.AsyncClient() as http_client:
        r = await http_client.get(audio_url)
        r.raise_for_status()
        buffer = io.BytesIO(r.content)

This would also require adding `httpx` to `requirements.txt`.
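If adding a dependency is not desirable, another option (not the one proposed in this review) is to run the blocking call in a worker thread with `asyncio.to_thread`, available from Python 3.9. The sketch below is an illustration under assumptions: `blocking_fetch` is a hypothetical stand-in that sleeps instead of calling the network, and `heartbeat` exists only to show that the event loop keeps running during the fetch.

```python
import asyncio
import time


def blocking_fetch(delay: float) -> bytes:
    """Stand-in for a synchronous call such as requests.get(); sleeps, then returns bytes."""
    time.sleep(delay)
    return b"fake-audio-bytes"


async def heartbeat(ticks: list, interval: float, count: int) -> None:
    """Append a tick every `interval` seconds to show the loop is still responsive."""
    for i in range(count):
        await asyncio.sleep(interval)
        ticks.append(i)


async def main() -> tuple:
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks, 0.01, 3))
    # The blocking function runs in a worker thread; the loop stays free meanwhile.
    data = await asyncio.to_thread(blocking_fetch, 0.1)
    ticks_during_fetch = len(ticks)
    await beat
    return data, ticks_during_fetch


data, ticks_during_fetch = asyncio.run(main())
print(len(data), ticks_during_fetch)
```

Calling `blocking_fetch(0.1)` directly inside `main` would instead leave `ticks` empty until the fetch returned, which is the stall the review describes.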

    response = []

The variable `response` is first used on line 37 for the HTTP response and is then rebound here to hold the list of text parts from the model. Reusing one name for two unrelated values makes the code harder to understand and debug. Please use a different, more descriptive name for this list, for example `model_response_parts`.
Suggested change:

    response_parts = []
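To illustrate the naming point, here is a minimal sketch in which the HTTP result and the collected text each keep their own name. The message objects are `SimpleNamespace` stand-ins with a `text` attribute, and `collect_text` and `http_response` are hypothetical names; the actual sample's loop may differ.

```python
from types import SimpleNamespace


def collect_text(messages) -> str:
    """Join the text of every message that carries text, skipping the rest."""
    response_parts = []
    for message in messages:
        if message.text is not None:
            response_parts.append(message.text)
    return "".join(response_parts)


# Hypothetical HTTP result, kept under its own name for the whole function.
http_response = SimpleNamespace(content=b"audio-bytes")

# Stand-in messages; real ones would be received from the live session.
messages = [
    SimpleNamespace(text="Hello"),
    SimpleNamespace(text=None),
    SimpleNamespace(text=", world"),
]
transcript = collect_text(messages)
print(transcript)
```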
genai/live/requirements.txt

      google-genai==1.20.0
      scipy==1.15.3
    - websockets==15.0.1
    \ No newline at end of file
    + websockets==15.0.1
    + librosa==0.11.0
    + IPython==8.26.0


    import live_txt_with_audio
    import live_audio_with_txt

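These imports are out of alphabetical order. PEP 8 itself only asks that imports be grouped (standard library, third party, local), but alphabetical order within each group is a common convention that tools such as isort enforce. Please sort these new import statements. For better consistency, consider sorting the entire import block in this file.
Suggested change:

    - import live_txt_with_audio
    - import live_audio_with_txt
    + import live_audio_with_txt
    + import live_txt_with_audio
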
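The ordering in the suggestion can be reproduced with a plain case-insensitive sort. This is a deliberately naive sketch, with the hypothetical helper `sort_import_lines`; it ignores import groups and `from ... import` forms, which a dedicated tool such as isort handles.

```python
def sort_import_lines(lines):
    """Sort simple `import x` lines case-insensitively (no grouping logic)."""
    return sorted(lines, key=str.lower)


before = ["import live_txt_with_audio", "import live_audio_with_txt"]
after = sort_import_lines(before)
print("\n".join(after))
```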
Description
Fixes #
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
- `nox -s py-3.9` (see Test Environment Setup)
- `nox -s lint` (see Test Environment Setup)