Add C API for WhisperPipeline #2414
Merged

Commits (63, all authored by BrandonWeng):
- 81ce0fc Whisper C API
- e2dcb98 Match LLM pipeline standards
- 86afc96 Merge branch 'openvinotoolkit:master' into master
- 1f08477 Revert version
- 675016d Revert backt o knownexception
- dccb28e Revert beam group name change
- 09bf23b Tests pass
- 4773c58 C API runs locally now
- 3a596ce revert ov_genai_generation_config_set_num_beam_groups change
- c667f49 Fix parameter typo
- 3cd3086 Remove benchmarking code from samples
- e476414 Merge branch 'master' into master
- 02297e2 Exit code
- 40a7136 update CMakelist to match text generation
- 9ebcfa7 Addresss PR comments
- 991fd77 new line
- bae2c8f Split into utils file
- e856514 Fix C_DIR
- 8dc2aca Add utils to CmAke file
- 2f0fe2e Simpler jobs
- f5a35fd disable sdl
- 6a1665e Fix
- f6eeca1 Fix build for mac
- 73e57e7 Merge branch 'openvinotoolkit:master' into master
- e73ca3f Merge branch 'master' into fix-build
- 0442c83 Remove python wheel
- 2409e87 comment out wheel
- 86d60dd new line
- 09f141f Add MPI back
- 9346842 Install ov from pip
- 3811e89 Update pipe for macos
- 8b702ec nightly
- 2fdce3e nightly
- 714249a Revert workflow jobs to master before merge
- edd788a remove
- 213e3b3 Merge pull request #3 from FluidInference/merge-to-master-build-fix
- 4b102d7 Revert perf metric changes
- 55c7a8a Fix build for windows
- a41d107 Move PI
- cc5c44f Install in samples/ like other jobs
- ebae360 remove unused comments
- e1ba8db Merge branch 'master' into master
- 8399306 newline
- 41d0ead Merge branch 'master' into master
- 3a9d6a3 new lines
- a2ca12e Merge branch 'master' into master
- d1446d4 Look for the cBinary in the right place for Python test
- 55c71b5 Fix test format and run with timestamps
- e5663f7 Merge branch 'master' into master
- 892eaf2 Merge branch 'master' into master
- 0a111be Remove synthetic audio + m dep
- 8a6bf61 Merge branch 'master' into master
- b6529d6 Merge branch 'master' into master
- 2d80e36 revert formatting for ov_genai_generation_config_set_num_beam_groups
- abfaec3 goto error instead of continue for mem err
- fdb7ce9 Simplify samples
- e20c1c4 Arg count
- e97fe28 Fix build error
- 55a36e9 Add Readme
- 6742023 for eaxample
- 59189bf Merge branch 'master' into master
- 2848ff8 Merge branch 'master' into master
- 1e274a6 Merge branch 'master' into master
Files changed

CMakeLists.txt for the C sample (new file, 26 lines):

```cmake
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

find_package(OpenVINOGenAI REQUIRED
    PATHS
        "${CMAKE_BINARY_DIR}"  # Reuse the package from the build.
        ${OpenVINO_DIR}  # GenAI may be installed alogside OpenVINO.
    NO_CMAKE_FIND_ROOT_PATH
)

# Whisper Speech Recognition Sample
add_executable(whisper_speech_recognition_c whisper_speech_recognition.c whisper_utils.c)

# Specifies that the source file should be compiled as a C source file
set_source_files_properties(whisper_speech_recognition.c whisper_utils.c PROPERTIES LANGUAGE C)
target_link_libraries(whisper_speech_recognition_c PRIVATE openvino::genai::c)

set_target_properties(whisper_speech_recognition_c PROPERTIES
    # Ensure out-of-box LC_RPATH on macOS with SIP
    INSTALL_RPATH_USE_LINK_PATH ON)

# Install
install(TARGETS whisper_speech_recognition_c
        RUNTIME DESTINATION samples_bin/
        COMPONENT samples_bin
        EXCLUDE_FROM_ALL)
```
README.md for the C sample (new file, 133 lines):
# Whisper Automatic Speech Recognition C Sample

## Table of Contents

1. [Download OpenVINO GenAI](#download-openvino-genai)
2. [Build Samples](#build-samples)
3. [Download and Convert the Model](#download-and-convert-the-model)
4. [Prepare Audio File](#prepare-audio-file)
5. [Sample Description](#sample-description)
6. [Troubleshooting](#troubleshooting)
7. [Support and Contribution](#support-and-contribution)

## Download OpenVINO GenAI

Download and extract the [OpenVINO GenAI Archive](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_GENAI&VERSION=NIGHTLY&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE) from the OpenVINO Download Page.

## Build Samples

Set up the environment and build the samples on Linux and macOS:

```sh
source <INSTALL_DIR>/setupvars.sh
./<INSTALL_DIR>/samples/c/build_samples.sh
```

Windows Command Prompt:

```sh
<INSTALL_DIR>\setupvars.bat
<INSTALL_DIR>\samples\c\build_samples_msvc.bat
```

Windows PowerShell:

```sh
.<INSTALL_DIR>\setupvars.ps1
.<INSTALL_DIR>\samples\c\build_samples.ps1
```

## Download and Convert the Model

Install [../../export-requirements.txt](../../export-requirements.txt) if model conversion is required. The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

```sh
pip install --upgrade-strategy eager -r ../../export-requirements.txt
optimum-cli export openvino --trust-remote-code --model openai/whisper-tiny whisper-tiny
```

If a converted model in OpenVINO IR format is available in the [OpenVINO optimized models](https://huggingface.co/OpenVINO) collection on Hugging Face, you can download it directly via `huggingface-cli`. For example:

```sh
pip install huggingface-hub
huggingface-cli download OpenVINO/whisper-tiny-int8-ov --local-dir whisper-tiny-int8-ov
```

## Prepare Audio File

Prepare an audio file in WAV format with a 16 kHz sampling rate.

You can download an example audio file: https://storage.openvinotoolkit.org/models_contrib/speech/2021.2/librispeech_s5/how_are_you_doing_today.wav
## Sample Description

This example showcases inference of speech recognition Whisper models using the OpenVINO GenAI C API. The sample features `ov_genai_whisper_pipeline` and uses audio files in WAV format as input.

### Run Command

```sh
./whisper_speech_recognition_c <MODEL_DIR> "<WAV_FILE_PATH>" [DEVICE]
```

### Parameters

- `MODEL_DIR`: Path to the converted Whisper model directory
- `WAV_FILE_PATH`: Path to the WAV audio file (use quotes if the path contains spaces)
- `DEVICE`: Optional device to run inference on (default: "CPU")

### Example Usage

```sh
./whisper_speech_recognition_c whisper-tiny how_are_you_doing_today.wav
```

### Expected Output

```text
How are you doing today?
timestamps: [0.00, 2.00] text: How are you doing today?
```

The sample will:

1. Load the WAV audio file and validate its format
2. Automatically resample to 16 kHz if needed (an illustrative resampling sketch follows this list)
3. Perform speech-to-text transcription
4. Output the full transcription
5. Display timestamps for each transcribed text chunk
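Step 2 is handled by the `resample_audio` helper from `whisper_utils.c`, which is part of this PR but not shown in this excerpt. Below is a minimal linear-interpolation sketch of such a helper, reusing the name and parameter order from the call site in the sample; the actual implementation may differ.

```c
#include <stdlib.h>

// Illustrative linear-interpolation resampler (not the actual whisper_utils.c code).
// Returns a newly allocated buffer that the caller must free(), or NULL on failure.
float* resample_audio(const float* input,
                      size_t input_length,
                      float input_rate,
                      float output_rate,
                      size_t* output_length) {
    if (!input || input_length == 0 || input_rate <= 0.0f || output_rate <= 0.0f || !output_length) {
        return NULL;
    }
    size_t out_len = (size_t)((double)input_length * output_rate / input_rate);
    float* output = (float*)malloc(out_len * sizeof(float));
    if (!output) {
        return NULL;
    }
    for (size_t i = 0; i < out_len; i++) {
        double src_pos = (double)i * input_rate / output_rate;  // position in the input signal
        size_t idx = (size_t)src_pos;
        double frac = src_pos - (double)idx;
        float a = input[idx];
        float b = (idx + 1 < input_length) ? input[idx + 1] : input[input_length - 1];
        output[i] = (float)((1.0 - frac) * a + frac * b);  // linear interpolation between neighbors
    }
    *output_length = out_len;
    return output;
}
```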
## Troubleshooting

### Empty or Incorrect Output

If you get empty or incorrect transcription results:

- Ensure your audio file is in WAV format
- Check that the audio contains clear speech

### Model Loading Errors

If the model fails to load:

- Verify the model path exists and contains valid Whisper model files
- Ensure the model was properly converted to OpenVINO IR format
- Check that the specified device (CPU, GPU, etc.) is available on your system

### Audio File Errors

The sample provides detailed error messages for common audio file issues (a sketch of the corresponding WAV-header checks follows this list):

- File not found
- Permission denied
- Invalid WAV format
- Unsupported audio encoding (only PCM is supported)
- Multi-channel audio (only mono is supported)
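The validation behind these messages lives in `whisper_utils.c`, which is not shown in this excerpt. The sketch below illustrates the kind of header checks involved, assuming a canonical 44-byte PCM WAV header; the function name, error strings, and layout handling are illustrative, not the actual helper.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Illustrative WAV header validation (assumes a canonical 44-byte PCM header;
// real WAV files may carry extra chunks, which the actual loader would handle).
// Returns 0 on success and fills *sample_rate; returns non-zero on any error.
static int check_wav_header(FILE* f, float* sample_rate) {
    uint8_t header[44];
    if (fread(header, 1, sizeof(header), f) != sizeof(header)) {
        fprintf(stderr, "Error: Invalid WAV format (file too short)\n");
        return 1;
    }
    if (memcmp(header, "RIFF", 4) != 0 || memcmp(header + 8, "WAVE", 4) != 0) {
        fprintf(stderr, "Error: Invalid WAV format (missing RIFF/WAVE markers)\n");
        return 1;
    }
    uint16_t audio_format = (uint16_t)(header[20] | (header[21] << 8));
    uint16_t num_channels = (uint16_t)(header[22] | (header[23] << 8));
    uint32_t rate = (uint32_t)header[24] | ((uint32_t)header[25] << 8) |
                    ((uint32_t)header[26] << 16) | ((uint32_t)header[27] << 24);
    if (audio_format != 1) {  // 1 == PCM
        fprintf(stderr, "Error: Unsupported audio encoding (only PCM is supported)\n");
        return 1;
    }
    if (num_channels != 1) {
        fprintf(stderr, "Error: Multi-channel audio (only mono is supported)\n");
        return 1;
    }
    *sample_rate = (float)rate;
    return 0;
}
```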
## Support and Contribution

- For troubleshooting, consult the [OpenVINO documentation](https://docs.openvino.ai).
- To report issues or contribute, visit the [GitHub repository](https://github.com/openvinotoolkit/openvino.genai).
samples/c/whisper_speech_recognition/whisper_speech_recognition.c (new file, 130 lines):
```c
// Copyright (C) 2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#include "openvino/genai/c/whisper_pipeline.h"
#include "whisper_utils.h"

int main(int argc, char* argv[]) {
    if (argc != 3 && argc != 4) {
        fprintf(stderr, "Usage: %s <MODEL_DIR> \"<WAV_FILE_PATH>\" [DEVICE]\n", argv[0]);
        return EXIT_FAILURE;
    }

    const char* model_path = argv[1];
    const char* wav_file_path = argv[2];
    const char* device = (argc == 4) ? argv[3] : "CPU";  // Default to CPU if no device is provided

    int exit_code = EXIT_SUCCESS;

    ov_genai_whisper_pipeline* pipeline = NULL;
    ov_genai_whisper_generation_config* config = NULL;
    ov_genai_whisper_decoded_results* results = NULL;
    float* audio_data = NULL;
    float* resampled_audio = NULL;
    size_t audio_length = 0;
    char* output = NULL;
    size_t output_size = 0;

    float file_sample_rate;
    if (load_wav_file(wav_file_path, &audio_data, &audio_length, &file_sample_rate) != 0) {
        exit_code = EXIT_FAILURE;
        goto err;
    }

    if (file_sample_rate != 16000.0f) {
        size_t resampled_length;
        resampled_audio = resample_audio(audio_data, audio_length, file_sample_rate, 16000.0f, &resampled_length);
        if (!resampled_audio) {
            fprintf(stderr, "Error: Failed to resample audio\n");
            exit_code = EXIT_FAILURE;
            goto err;
        }
        free(audio_data);
        audio_data = resampled_audio;
        audio_length = resampled_length;
        resampled_audio = NULL;
    }

    ov_status_e status = ov_genai_whisper_pipeline_create(model_path, device, 0, &pipeline);
    if (status != OK) {
        if (status == UNKNOW_EXCEPTION) {
            fprintf(stderr, "Error: Failed to create Whisper pipeline. Please check:\n");
            fprintf(stderr, " - Model path exists and contains valid Whisper model files\n");
            fprintf(stderr, " - Device '%s' is available and supported\n", device);
            fprintf(stderr, " - Model is compatible with OpenVINO GenAI\n");
        }
        CHECK_STATUS(status);
    }

    CHECK_STATUS(ov_genai_whisper_generation_config_create(&config));
    CHECK_STATUS(ov_genai_whisper_generation_config_set_task(config, "transcribe"));
    CHECK_STATUS(ov_genai_whisper_generation_config_set_return_timestamps(config, true));
    CHECK_STATUS(ov_genai_whisper_pipeline_generate(pipeline, audio_data, audio_length, config, &results));

    CHECK_STATUS(ov_genai_whisper_decoded_results_get_string(results, NULL, &output_size));
    output = (char*)malloc(output_size);
    if (!output) {
        fprintf(stderr, "Error: Failed to allocate memory for output\n");
        exit_code = EXIT_FAILURE;
        goto err;
    }

    CHECK_STATUS(ov_genai_whisper_decoded_results_get_string(results, output, &output_size));
    printf("%s\n", output);

    bool has_chunks = false;
    CHECK_STATUS(ov_genai_whisper_decoded_results_has_chunks(results, &has_chunks));

    if (has_chunks) {
        size_t chunks_count = 0;
        CHECK_STATUS(ov_genai_whisper_decoded_results_get_chunks_count(results, &chunks_count));

        for (size_t i = 0; i < chunks_count; i++) {
            ov_genai_whisper_decoded_result_chunk* chunk = NULL;
            CHECK_STATUS(ov_genai_whisper_decoded_results_get_chunk_at(results, i, &chunk));

            float start_ts = 0.0f, end_ts = 0.0f;
            CHECK_STATUS(ov_genai_whisper_decoded_result_chunk_get_start_ts(chunk, &start_ts));
            CHECK_STATUS(ov_genai_whisper_decoded_result_chunk_get_end_ts(chunk, &end_ts));

            size_t chunk_text_size = 0;
            CHECK_STATUS(ov_genai_whisper_decoded_result_chunk_get_text(chunk, NULL, &chunk_text_size));

            char* chunk_text = (char*)malloc(chunk_text_size);
            if (!chunk_text) {
                fprintf(stderr, "Warning: Failed to allocate memory for chunk text %zu\n", i);
                ov_genai_whisper_decoded_result_chunk_free(chunk);
                exit_code = EXIT_FAILURE;
                goto err;
            }

            CHECK_STATUS(ov_genai_whisper_decoded_result_chunk_get_text(chunk, chunk_text, &chunk_text_size));

            printf("timestamps: [%.2f, %.2f] text: %s\n", start_ts, end_ts, chunk_text);

            free(chunk_text);
            ov_genai_whisper_decoded_result_chunk_free(chunk);
        }
    }

err:
    if (pipeline)
        ov_genai_whisper_pipeline_free(pipeline);
    if (config)
        ov_genai_whisper_generation_config_free(config);
    if (results)
        ov_genai_whisper_decoded_results_free(results);
    if (output)
        free(output);
    if (audio_data)
        free(audio_data);
    if (resampled_audio)
        free(resampled_audio);

    return exit_code;
}
```

A reviewer noted on the `printf("timestamps: ...")` line that the sample output is missing a space compared to the Python sample.
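The sample relies on helpers from `whisper_utils.h`/`whisper_utils.c` (`load_wav_file`, `resample_audio`, and the `CHECK_STATUS` macro), which are part of the PR but not shown in this excerpt. A plausible header, inferred from the call sites above; the actual declarations and macro body may differ.

```c
// whisper_utils.h (sketch inferred from call sites; not the actual file from the PR)
#pragma once

#include <stddef.h>
#include <stdio.h>

#include "openvino/genai/c/whisper_pipeline.h"

// Loads a WAV file into a newly allocated float buffer for the pipeline.
// Returns 0 on success; the caller then owns *audio_data and must free() it.
int load_wav_file(const char* path, float** audio_data, size_t* audio_length, float* sample_rate);

// Resamples audio to the target rate; returns a newly allocated buffer (caller frees) or NULL on failure.
float* resample_audio(const float* input,
                      size_t input_length,
                      float input_rate,
                      float output_rate,
                      size_t* output_length);

// Checks an ov_status_e and jumps to the sample's cleanup label on failure.
// Assumes the calling function defines `int exit_code` and an `err:` label, as main() does above.
#define CHECK_STATUS(return_status)                                                        \
    do {                                                                                   \
        ov_status_e _status = (return_status);                                             \
        if (_status != OK) {                                                               \
            fprintf(stderr, "[ERROR] return status %d, line %d\n", _status, __LINE__);     \
            exit_code = EXIT_FAILURE;                                                      \
            goto err;                                                                      \
        }                                                                                  \
    } while (0)
```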
Review conversation

Reviewer:
Add README.md (I missed that earlier. I hope this is going to be the last change request)
Author:
Good catch. Added README (based off the C++ whisper and the C LLM Pipeline one): 55a36e9