
Add parse_wav function for WAV header metadata extraction #1242

Merged: meta-codesync[bot] merged 1 commit into main from parse_wav on Dec 13, 2025.

@mthrok (Collaborator) commented Dec 11, 2025

Audio processing workflows often need to inspect WAV file metadata (sample rate, channels, bit depth) without loading the entire audio data into memory. This is particularly useful for validation, preprocessing decisions, or building dataset catalogs where full audio decoding would be unnecessarily expensive.
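For the catalog use case, the header alone is enough to derive clip length: for uncompressed PCM, the frame count is the `data` chunk size divided by the bytes per frame, and the duration is the frame count divided by the sample rate. A minimal sketch, assuming PCM data; `clip_stats` is a hypothetical helper for illustration and is not part of this PR.

```python
def clip_stats(
    sample_rate: int, num_channels: int, bits_per_sample: int, data_size: int
) -> tuple[int, float]:
    """Return (num_frames, duration_seconds) computed from header fields alone.

    Hypothetical helper: assumes uncompressed PCM, where every frame occupies
    num_channels * bits_per_sample / 8 bytes of the 'data' chunk.
    """
    bytes_per_frame = num_channels * bits_per_sample // 8  # equals block_align for PCM
    if bytes_per_frame <= 0 or sample_rate <= 0:
        raise ValueError("invalid header values")
    num_frames = data_size // bytes_per_frame
    return num_frames, num_frames / sample_rate


# 1 second of 44.1 kHz, stereo, 16-bit PCM occupies 44100 * 2 * 2 = 176400 bytes.
print(clip_stats(44100, 2, 16, 176400))  # (44100, 1.0)
```

No sample is decoded: four integers from the header give both values, which is what makes a header-only pass cheap.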

This adds a new `parse_wav()` function that efficiently extracts WAV header information without decoding audio samples. The function returns a strongly typed `WAVHeader` TypedDict containing all standard WAV metadata fields (`audio_format`, `num_channels`, `sample_rate`, `byte_rate`, `block_align`, `bits_per_sample`, `data_size`).
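The idea of reading only the header can be illustrated with a standalone RIFF chunk walker. This is a sketch under stated assumptions, not the PR's implementation: `parse_wav_header` and `make_test_wav` are hypothetical names, the input is assumed to be a `bytes` buffer, errors are assumed to be `ValueError`, and only the field names in `WAVHeader` come from the description above. The real `parse_wav()` signature and error types may differ.

```python
import io
import struct
import wave
from typing import TypedDict


class WAVHeader(TypedDict):
    """Field names taken from the PR description; integer types are assumed."""

    audio_format: int
    num_channels: int
    sample_rate: int
    byte_rate: int
    block_align: int
    bits_per_sample: int
    data_size: int


def parse_wav_header(data: bytes) -> WAVHeader:
    """Hypothetical sketch: walk the RIFF chunks until the 'data' chunk header.

    Returns as soon as the 'data' chunk header is seen, so a prefix of the file
    that covers the header chunks is enough; sample bytes are never read.
    """
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    fmt = None
    offset = 12
    while offset + 8 <= len(data):
        chunk_id = data[offset : offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                raise ValueError("truncated 'fmt ' chunk")
            # audio_format, num_channels, sample_rate, byte_rate, block_align, bits_per_sample
            fmt = struct.unpack_from("<HHIIHH", data, body)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("'data' chunk found before 'fmt ' chunk")
            audio_format, num_channels, sample_rate, byte_rate, block_align, bits = fmt
            return WAVHeader(
                audio_format=audio_format,
                num_channels=num_channels,
                sample_rate=sample_rate,
                byte_rate=byte_rate,
                block_align=block_align,
                bits_per_sample=bits,
                data_size=chunk_size,
            )
        # Skip this chunk's body; RIFF pads odd-sized chunks with one byte.
        offset = body + chunk_size + (chunk_size & 1)
    raise ValueError("no 'data' chunk found")


def make_test_wav(num_channels: int, sample_width: int, sample_rate: int, num_frames: int) -> bytes:
    """Build a silent PCM WAV in memory with the stdlib wave module."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(num_channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (num_channels * sample_width * num_frames))
    return buf.getvalue()


wav = make_test_wav(num_channels=2, sample_width=2, sample_rate=16000, num_frames=100)
print(parse_wav_header(wav[:44]))  # only the 44-byte header is passed in
```

Walking chunks, rather than reading fixed offsets, keeps the sketch correct for files that carry extra chunks (for example `LIST` metadata) before `data`. Because it stops at the `data` chunk header, a caller can read just the first few kilobytes of a file and pass them in.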

The implementation includes 13 test cases covering all header fields across different audio configurations, error handling for invalid data, and consistency with the existing `load_wav` function.
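The three kinds of checks described here (header fields across configurations, errors on invalid data, and agreement with a full decode) can be sketched with the standard library alone. This is an illustration, not the PR's test suite: `read_canonical_header` is a hypothetical fixed-offset reader that only understands the 44-byte header Python's `wave` module writes, and `wave` stands in for `load_wav`, whose API is not shown here.

```python
import io
import itertools
import struct
import wave


def make_wav(num_channels: int, sample_width: int, sample_rate: int, num_frames: int) -> bytes:
    """Write silent PCM audio; the stdlib writer emits a canonical 44-byte header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(num_channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (num_channels * sample_width * num_frames))
    return buf.getvalue()


def read_canonical_header(data: bytes) -> dict:
    """Hypothetical fixed-offset reader for the 44-byte layout only (stand-in for parse_wav)."""
    if (
        len(data) < 44
        or data[0:4] != b"RIFF"
        or data[8:16] != b"WAVEfmt "
        or data[36:40] != b"data"
    ):
        raise ValueError("not a canonical 44-byte PCM WAV header")
    names = ("audio_format", "num_channels", "sample_rate", "byte_rate", "block_align", "bits_per_sample")
    header = dict(zip(names, struct.unpack_from("<HHIIHH", data, 20)))
    header["data_size"] = struct.unpack_from("<I", data, 40)[0]
    return header


def check_against_full_decode(data: bytes) -> None:
    header = read_canonical_header(data)
    # Invariants that hold for any uncompressed PCM header.
    assert header["block_align"] == header["num_channels"] * header["bits_per_sample"] // 8
    assert header["byte_rate"] == header["sample_rate"] * header["block_align"]
    # Cross-check with a full decode; stdlib wave stands in for load_wav here.
    with wave.open(io.BytesIO(data), "rb") as r:
        assert header["num_channels"] == r.getnchannels()
        assert header["sample_rate"] == r.getframerate()
        assert header["bits_per_sample"] == 8 * r.getsampwidth()
        frames = r.readframes(r.getnframes())
    assert header["data_size"] == len(frames)


def run_all() -> int:
    """Check every combination of channels, sample width and rate; return the count."""
    configs = list(itertools.product((1, 2), (1, 2, 4), (8000, 44100)))
    for channels, width, rate in configs:
        check_against_full_decode(make_wav(channels, width, rate, num_frames=100))
    return len(configs)


print(run_all(), "configurations checked")
```

Comparing `data_size` with the byte length of a full decode is the useful consistency check: it confirms that the header-only path and the decoding path agree on how much audio the file holds.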

@meta-cla bot added the CLA Signed label Dec 11, 2025
@meta-codesync bot (Contributor) commented Dec 11, 2025

@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D89001254. (Because this pull request was imported automatically, there will not be any future comments.)

@mthrok force-pushed the parse_wav branch 4 times, most recently from 05b21de to 86ccaf4, December 12, 2025 15:34
@mthrok marked this pull request as ready for review December 13, 2025 10:38
@meta-codesync bot merged commit 54c3b3a into main Dec 13, 2025
205 of 207 checks passed
@mthrok deleted the parse_wav branch December 13, 2025 11:19

Labels: CLA Signed

1 participant