Can You Hear That?
In our last look at steganography, we took a stab at hiding text in an image file. Let’s take things a step further and try to hide text inside an audio file. Audio potentially gives us far more room to hide data, because even a short recording contains tens of thousands of samples. A sound is a wave, and a pure tone is a single sine wave. To store it digitally, we record the wave’s amplitude at fixed intervals so that it can be reconstructed later. This results in a great many samples.
To get started, I generated a WAV file in Audacity. It is a plain sine wave at 440 Hz (the musical note A above middle C), 1 second long, saved in mono (one channel) at 8 bits per sample.
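To make the sampling idea concrete, here is a minimal sketch of how a similar tone could be produced in code instead of Audacity. The function name make_sine_8bit and its parameters are my own, for illustration only. It assumes unsigned 8-bit samples centered on 128, which is how 8-bit PCM data is conventionally stored in a WAV file.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sample a sine wave at fixed intervals.
// Assumes unsigned 8-bit samples where 128 represents silence.
std::vector<std::uint8_t> make_sine_8bit(double frequency_hz,
                                         unsigned sample_rate,
                                         double seconds) {
    const double two_pi = 6.283185307179586;
    const std::size_t count = static_cast<std::size_t>(sample_rate * seconds);
    std::vector<std::uint8_t> samples(count);
    for (std::size_t n = 0; n < count; ++n) {
        double t = static_cast<double>(n) / sample_rate;     // time of sample n
        double value = std::sin(two_pi * frequency_hz * t);  // in [-1, 1]
        // Scale to the range [1, 255] around the midpoint of 128
        samples[n] = static_cast<std::uint8_t>(std::lround(128.0 + 127.0 * value));
    }
    return samples;
}
```

Calling make_sine_8bit(440.0, 44100, 1.0) yields 44,100 samples, which is one second of 8-bit mono audio at a 44,100 Hz sample rate.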
Our goals today:
- Load our WAV binary file.
- Create a robust header for our secret message.
- Modify the sound sample data to embed our payload.
- Write our modified WAV to a new file.
Loading Our Sound
A WAV file begins with a series of headers that define its properties: mono or stereo, sample rate, bit depth, etc. While the WAV format isn’t perfectly standardized, a common structure exists.
Using this reference, I created structs to represent the headers.
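struct riff_header {
    char chunk_id[4];             // "RIFF"
    unsigned int chunk_size;      // File size minus 8 bytes
    char format[4];               // "WAVE"
};
struct fmt_header {
    char subchunk_id[4];          // "fmt "
    unsigned int subchunk_size;   // 16 for plain PCM
    unsigned short audio_format;  // 1 = uncompressed PCM
    unsigned short channels;      // 1 = mono, 2 = stereo
    unsigned int sample_rate;     // Samples per second
    unsigned int byte_rate;       // sample_rate * block_align
    unsigned short block_align;   // channels * sample_bits / 8
    unsigned short sample_bits;   // Bits per sample
};
struct data_header {
    char subchunk_id[4];          // "data"
    unsigned int subchunk_size;   // Number of bytes of sample data
};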
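Reading a file straight into structs only works if the compiler lays the fields out exactly as they appear on disk. The sketch below is one way to check that at compile time. It restates the three structs with fixed-width types from <cstdint>, which is my substitution rather than something the original code does, and asserts the sizes of a canonical 44-byte PCM header. It also assumes a little-endian machine (such as x86), since WAV stores its numbers in little-endian order.

```cpp
#include <cstdint>

// Same three headers, restated with fixed-width types (an assumption of
// this sketch) so the field sizes do not depend on the compiler.
struct riff_header {
    char chunk_id[4];             // "RIFF"
    std::uint32_t chunk_size;
    char format[4];               // "WAVE"
};

struct fmt_header {
    char subchunk_id[4];          // "fmt "
    std::uint32_t subchunk_size;
    std::uint16_t audio_format;
    std::uint16_t channels;
    std::uint32_t sample_rate;
    std::uint32_t byte_rate;
    std::uint16_t block_align;
    std::uint16_t sample_bits;
};

struct data_header {
    char subchunk_id[4];          // "data"
    std::uint32_t subchunk_size;
};

// Compilation fails if padding was inserted anywhere.
static_assert(sizeof(riff_header) == 12, "unexpected padding in riff_header");
static_assert(sizeof(fmt_header) == 24, "unexpected padding in fmt_header");
static_assert(sizeof(data_header) == 8, "unexpected padding in data_header");
```

If any of these assertions fails, reading the fields one at a time is a safer route than reading whole structs.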
We can now read these headers directly from our WAV file. It’s critical to open the file in binary mode (ios::binary); otherwise, on some platforms the stream may translate certain bytes (such as newline sequences) as if the file were text, corrupting the data.
riff_header rHdr;
fmt_header fHdr;
data_header dHdr;
// Open the file for reading in binary mode
ifstream input("440-8bit.wav", ios::in | ios::binary);
if (!input.is_open()) {
    cout << "Error: Not opened" << endl;
    return 1;
}
// Read the headers
input.read(reinterpret_cast<char*>(&rHdr), sizeof(rHdr));
input.read(reinterpret_cast<char*>(&fHdr), sizeof(fHdr));
input.read(reinterpret_cast<char*>(&dHdr), sizeof(dHdr));
// Read all the audio sample data into a vector at once
vector<unsigned char> samples(dHdr.subchunk_size);
input.read(reinterpret_cast<char*>(samples.data()), dHdr.subchunk_size);
input.close();
Printing some of this data confirms we’ve read it correctly:
ID: RIFF
Channels: 1
Sample Rate: 44100
Bits Per Sample: 8
Size of Data Section: 44100
Perfect! We now have the WAV headers and all 44,100 audio samples loaded into memory, ready for modification.
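Printing the fields is a good manual check, and the same check can be automated. The sketch below compares the four chunk tags and two format fields against the values expected for the file used here. The helper names tag_equals and looks_like_pcm8 are hypothetical, and the check is deliberately strict: it assumes uncompressed PCM (format code 1) at 8 bits per sample.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: compare a 4-byte chunk tag (not null-terminated)
// against an expected 4-character string.
bool tag_equals(const char (&tag)[4], const char* expected) {
    return std::strlen(expected) == 4 && std::memcmp(tag, expected, 4) == 0;
}

// Hypothetical check that the headers describe the kind of file used in
// this post: RIFF/WAVE, uncompressed PCM, 8 bits per sample.
bool looks_like_pcm8(const char (&riff_id)[4], const char (&format)[4],
                     const char (&fmt_id)[4], const char (&data_id)[4],
                     std::uint16_t audio_format, std::uint16_t sample_bits) {
    return tag_equals(riff_id, "RIFF") && tag_equals(format, "WAVE") &&
           tag_equals(fmt_id, "fmt ") && tag_equals(data_id, "data") &&
           audio_format == 1 && sample_bits == 8;
}
```

With the variables from the loading code, the call would be looks_like_pcm8(rHdr.chunk_id, rHdr.format, fHdr.subchunk_id, dHdr.subchunk_id, fHdr.audio_format, fHdr.sample_bits). A file that stores extra chunks before the sample data would fail the "data" comparison, which is a sign that reading at fixed offsets is not enough for that file.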
Hiding the Data (Discreetly)
As we saw with images, simply modifying every single sample in a row can create a noticeable pattern. A better approach is to spread our secret bits out, skipping a certain number of samples between each modification.
To do this, we’ll create a small header that contains two key pieces of information:
- Message Length: How long the secret message is, so the decoder knows when to stop.
- Skip Rate: How many audio samples to skip between embedding each bit.
struct steganography_header {
    uint32_t message_length; // Length of the secret message in bytes
    uint8_t skip_rate;       // Number of samples to skip between bits
};
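The skip rate trades discretion for capacity, so it helps to know how much text will fit before choosing one. The sketch below estimates that, assuming one bit is stored in a sample, then skip_rate samples are skipped, and the payload begins with the 40 header bits (32 for the length plus 8 for the skip rate). The function name max_message_bytes is my own.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many message bytes fit in `sample_count` samples
// when one bit is stored every (1 + skip_rate) samples and the payload
// starts with a 40-bit header.
std::size_t max_message_bytes(std::size_t sample_count, std::uint8_t skip_rate) {
    const std::size_t stride = 1 + static_cast<std::size_t>(skip_rate);
    // Bits land on samples 0, stride, 2 * stride, ... so round up.
    const std::size_t bit_slots = (sample_count + stride - 1) / stride;
    const std::size_t header_bits = 40;
    if (bit_slots < header_bits) return 0;  // not even room for the header
    return (bit_slots - header_bits) / 8;
}
```

For the one-second file in this post (44,100 samples), a skip rate of 25 leaves room for 207 bytes of message, while a skip rate of 0 leaves room for 5,507.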
Our process will be to first convert our header and message into a single stream of bits (the “payload”), and then embed that payload into the audio samples using our desired skip rate.
void encode_data(vector<unsigned char>& audio_samples, const string& message, uint8_t skip_rate) {
    steganography_header header;
    header.message_length = static_cast<uint32_t>(message.length());
    header.skip_rate = skip_rate;
    // Create a vector to hold all the bits of our payload (header + message)
    vector<uint8_t> payload_bits;
    // --- Add Header Bits ---
    // Add 32 bits for the message_length, most significant bit first
    for (int i = 0; i < 32; ++i) {
        payload_bits.push_back((header.message_length >> (31 - i)) & 1);
    }
    // Add 8 bits for the skip_rate
    for (int i = 0; i < 8; ++i) {
        payload_bits.push_back((header.skip_rate >> (7 - i)) & 1);
    }
    // --- Add Message Bits ---
    for (char letter : message) {
        for (int i = 0; i < 8; ++i) {
            payload_bits.push_back((letter >> (7 - i)) & 1);
        }
    }
    // --- Embed the payload into the audio samples ---
    size_t sample_idx = 0; // size_t matches the type returned by size()
    for (uint8_t bit : payload_bits) {
        if (sample_idx >= audio_samples.size()) break; // Stop if we run out of space
        // Clear the least significant bit
        audio_samples[sample_idx] &= 0b11111110;
        // Set the least significant bit with our payload bit
        audio_samples[sample_idx] |= bit;
        // Move to the next sample position based on the skip rate
        sample_idx += 1 + static_cast<size_t>(header.skip_rate);
    }
}
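This single function prepares our payload and modifies the audio data in memory. One consequence of this layout is worth noting: because the header is embedded with the same spacing as the message, a decoder has to know the skip rate before it can read the header, so the stored skip rate acts as a confirmation rather than a way to discover it.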
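For completeness, here is a minimal sketch of the reverse operation. It is not part of the original code: decode_data is a hypothetical counterpart to encode_data, and it assumes the reader already knows the skip rate, since the header bits are spread out with that same spacing. It reads the 32-bit length and the 8-bit skip rate, most significant bit first, then rebuilds the message one byte at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoder: the inverse of encode_data.
// ASSUMPTION: the caller already knows the skip rate used when encoding.
std::string decode_data(const std::vector<unsigned char>& audio_samples,
                        std::uint8_t skip_rate) {
    const std::size_t stride = 1 + static_cast<std::size_t>(skip_rate);
    std::size_t sample_idx = 0;

    // Read `count` bits, most significant first, one bit per stride.
    // Returns false if the samples run out first.
    auto read_bits = [&](int count, std::uint32_t& value) {
        value = 0;
        for (int i = 0; i < count; ++i) {
            if (sample_idx >= audio_samples.size()) return false;
            value = (value << 1) | (audio_samples[sample_idx] & 1u);
            sample_idx += stride;
        }
        return true;
    };

    std::uint32_t length = 0, stored_skip = 0;
    if (!read_bits(32, length) || !read_bits(8, stored_skip)) return "";
    if (stored_skip != skip_rate) return "";  // header does not match

    std::string message;
    for (std::uint32_t n = 0; n < length; ++n) {
        std::uint32_t byte = 0;
        if (!read_bits(8, byte)) break;       // truncated payload
        message.push_back(static_cast<char>(byte));
    }
    return message;
}
```

Returning an empty string when the stored skip rate does not match is a design choice of this sketch; it gives a cheap sanity check that the samples actually contain a payload written with that spacing.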
Standing Out (Or Not)

This works great, but how easily can this be seen? Modifying every single sample in a row isn’t the most discreet approach. However, by using a skip rate, we spread the changes out, making them much harder to detect.
While a hex viewer might show that some bytes have changed, the sparse nature of the changes makes it difficult to spot a pattern. This becomes even clearer when we visualize the waveform.

By spreading the data out, the few modifications become nearly unnoticeable compared to changing all samples in a row.
If this were a complex audio file, such as a song instead of a simple sine wave, the changes would be virtually impossible to spot visually or audibly. The small changes are even less noticeable to our ears when they are not grouped together.
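The size of the footprint can also be measured rather than eyeballed. The sketch below compares the original samples with the encoded ones and reports how many differ and by how much. The names change_summary and compare_samples are hypothetical, and it assumes both vectors hold 8-bit samples from the same file.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical summary of how much an encoding pass altered the audio.
struct change_summary {
    std::size_t changed_samples;  // how many samples differ
    int largest_change;           // biggest absolute difference seen
};

change_summary compare_samples(const std::vector<unsigned char>& original,
                               const std::vector<unsigned char>& encoded) {
    change_summary summary{0, 0};
    // Only compare the range both vectors have in common.
    const std::size_t count =
        original.size() < encoded.size() ? original.size() : encoded.size();
    for (std::size_t i = 0; i < count; ++i) {
        int diff = std::abs(static_cast<int>(original[i]) -
                            static_cast<int>(encoded[i]));
        if (diff != 0) {
            ++summary.changed_samples;
            if (diff > summary.largest_change) summary.largest_change = diff;
        }
    }
    return summary;
}
```

Because only the least significant bit is touched, largest_change should never exceed 1. For the 39-character message used in this post, the payload is 352 bits (40 for the header plus 312 for the text), so at most 352 of the 44,100 samples can change, and fewer in practice, since a sample whose lowest bit already matches is left as it was.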
Writing the Final File
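After modifying the samples vector in memory, the final step is to write the original headers and the newly modified sample data to a new file. The headers can be reused unchanged because the number of samples, and therefore every size field, stays the same.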
// --- Main code ---
// (Load headers and original samples from "440-8bit.wav" as shown before)
string secret_message = "This is a hidden message in a WAV file.";
uint8_t skip_rate = 25; // Skip 25 samples between each hidden bit
// Modify the audio data in memory
encode_data(samples, secret_message, skip_rate);
// Open a new file for writing in binary mode
ofstream output("encoded.wav", ios::out | ios::binary);
// Write the original headers
output.write(reinterpret_cast<char*>(&rHdr), sizeof(rHdr));
output.write(reinterpret_cast<char*>(&fHdr), sizeof(fHdr));
output.write(reinterpret_cast<char*>(&dHdr), sizeof(dHdr));
// Write the MODIFIED sample data
output.write(reinterpret_cast<char*>(samples.data()), samples.size());
output.close();
What Now?
I would really like to make a mobile app out of this, one that embeds text automatically in audio to be sent over chat apps and decodes it automatically on the other side. You could send voice messages to group chats, but only a certain individual would be able to read the secret message that they carry.
Maybe I’ll work on that…