A simple C# Wave editor, part 1: Background and analysis

5 Aug 2004, 11 min read
The first phase of a RIFF/Wave editing "swiss army knife", in which we'll learn how to extract all the data present in common Wave files and store it in an XML document.

Command line help info

Introduction

Since the GIMP is freely available on Windows, some would say that it's pointless to keep writing basic open-source or freeware image editors. I'm not one of those people, mind you, but some questions keep occurring to me: Why are there so many image manipulation programs available and in development, especially when MS Paint is a vast improvement on many? Why are there so few audio editors, when there's a serious deficit of them and when such basic functionality as Wave file division is missing from the easily-available freeware?

My guess is that people feel more comfortable working with images. After all, there are loads of pre-existing image controls, and comparatively few audio ones. There also seems to be a continuing refrain throughout the coding community: "audio manipulation is a black art". With this series of articles, I'll describe the design and creation of a basic (but hopefully, robust and powerful) command line "swiss army knife" for audio file manipulation. We'll be building the tool modularly, so it should be easy for you to make (and hopefully release!) your own contributions.

In this article, we'll discuss the RIFF file format, and more specifically the PCM RIFF-wave. We'll detail the most common data structures that compose it, and briefly discuss the variants you might see. Finally, we'll develop a "profiler" that parses, loads into memory, and outputs as XML, the relevant file data.

Background

The Resource Interchange File Format

RIFF is an all-purpose multimedia file format created by Microsoft and IBM, way back in 1991. Wave audio isn't the only multimedia stored in a RIFF file; AVI video, too, uses the RIFF. (For more information on the history of RIFF and its Amiga ancestor, IFF, see Wikipedia.)

Every RIFF file starts with a header with three four-byte fields. The data structure is this:

C#
public string sGroupID;       //Surprisingly enough, this is always "RIFF"
public uint   dwFileLength;   //File length in bytes, measured from offset 8
public string sRiffType;      //In wave files, this is always "WAVE"
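As a rough sketch of how those three fields might be pulled out of a stream, here is a small helper built on BinaryReader. The names RiffHeaderSketch, RiffHeader and Read are made up for this illustration and are not part of the WaveEdit source; the sketch also assumes the stream is positioned at the very start of the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class RiffHeaderSketch
{
    // Hypothetical container for the three header fields described above.
    public struct RiffHeader
    {
        public string GroupID;    // expected to be "RIFF"
        public uint   FileLength; // bytes that follow this field (file size - 8)
        public string RiffType;   // "WAVE" for wave files
    }

    public static RiffHeader Read(Stream stream)
    {
        // BinaryReader always reads integers as little-endian,
        // which matches the byte order used by RIFF.
        BinaryReader reader = new BinaryReader(stream, Encoding.ASCII);

        RiffHeader header = new RiffHeader();
        header.GroupID    = new string(reader.ReadChars(4));
        header.FileLength = reader.ReadUInt32();
        header.RiffType   = new string(reader.ReadChars(4));

        if (header.GroupID != "RIFF")
            throw new InvalidDataException("Not a RIFF file.");
        return header;
    }
}
```

Calling RiffHeaderSketch.Read on an open FileStream leaves the stream at offset 12, which is where the first chunk begins.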

RIFFs are composed of sections called, awkwardly enough, "chunks". Each chunk starts with an eight-byte header:

C#
public string  sChunkID;       //Four bytes: "fmt ", "data", "fact", etc.
public uint    dwChunkSize;    //Length of the chunk's data in bytes (excludes this header)

The Joys of Proprietary File Formats

Unfortunately, while some official documents established the basics of the file format, no official standard was ever published for the wave file. In the absence of official documents, people did what they do best: improvise. As a result, there are many different chunk types, many of which duplicate and triplicate functionality. For the time being, we'll ignore most of these chunk types, and focus on the two that are guaranteed to be in every wave file: the format chunk and the data chunk.

Chunkin'

The format chunk details all the necessary information about the audio data, including the format of the audio (we assume, for now, uncompressed Pulse Code Modulation audio), the number of channels (mono, stereo, quadraphonic, 5-channel), the audio's frequency, the number of bits per audio sample (usually, 8 or 16), and the number of bytes in a frame. The data structure for this chunk is this:

C#
public string  sChunkID;        //Four bytes: "fmt "
public uint    dwChunkSize;     //Length of the chunk's data in bytes
public ushort  wFormatTag;      //1 if uncompressed Microsoft PCM audio
public ushort  wChannels;       //Number of channels
public uint    dwSamplesPerSec; //Sampling frequency of the audio in Hz
public uint    dwAvgBytesPerSec;//For estimating RAM allocation
public ushort  wBlockAlign;     //Sample frame size in bytes
public uint    dwBitsPerSample; //Bits per sample (a 16-bit WORD in the standard on-disk layout)

Wait just a minute, you say. Frame? A frame is the same thing as a sample, but not the same thing as a sample. Understand? You've got to keep this straight; it's very important. OK, OK, I'll explain. A frame is one whole multichannel audio sample: one value for each channel. The dwSamplesPerSec field actually gives the number of frames per second. The dwBitsPerSample field, on the other hand, refers to the number of bits in a single channel of a sample.
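The relationship between frames, samples and the two size fields of the format chunk can be written down as a couple of one-liners. This is only a sketch for uncompressed PCM whose bit depth is a multiple of 8; the class and method names (FrameMath, BlockAlign, AvgBytesPerSec) are invented for the example.

```csharp
using System;

public static class FrameMath
{
    // Bytes in one frame: one sample for every channel.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Bytes of audio consumed by one second of playback.
    public static int AvgBytesPerSec(int framesPerSec, int channels, int bitsPerSample)
    {
        return framesPerSec * BlockAlign(channels, bitsPerSample);
    }
}
```

For 16-bit stereo audio at 44100 Hz, BlockAlign(2, 16) gives 4 and AvgBytesPerSec(44100, 2, 16) gives 176400, the same numbers stored in the format chunk of the sample file we take apart later in the article.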

The other chunk is even more essential: the data chunk. As you might expect, it contains all the PCM audio data. It has a very simple data structure:

C#
public string  sChunkID;       //Four bytes: "data"
public uint    dwChunkSize;    //Length of the audio data in bytes
//Different arrays for the different frame sizes
public byte  [] byteArray;     //8 bit unsigned data; or...
public short [] shortArray;    //16 bit signed data

What effect does the signed/unsigned convention have on our data? Well, that's a very good question, and one we'll address when we take apart a sample file below.

The Joys of Proprietary File Formats, Reprise

There's yet another complication, though: there's no guaranteed order for the chunks! Since no standard was ever published, it's technically legal to put the data chunk, which stores the actual audio data, in front of the fmt (format) chunk, which tells the reader how to process it. Though this is rarely done, a well-written audio program will account for it anyway. A common mistake made by new audio programmers is to assume that the fmt chunk always comes immediately after the RIFF header; while this is usually the case, there are several audio programs out there that generate Wave files which break that assumption.

One final thing to note with regard to RIFF chunks: they must occupy an even number of bytes. If a chunk's data has an odd length, it must be followed by a single zero pad byte, which is not counted in dwChunkSize. For our immediate purposes, there's only one case in which this is possible: the data stream of an 8-bit mono file.
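To make the "any order, even length" rules concrete, here is a small sketch that walks a RIFF stream and lists whatever chunks it finds, without assuming which comes first. ChunkWalker and ListChunks are hypothetical names used only for this illustration, and the sketch assumes a seekable stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ChunkWalker
{
    // Returns "id:size" for every chunk that follows the 12-byte RIFF header.
    public static List<string> ListChunks(Stream stream)
    {
        List<string> found = new List<string>();
        BinaryReader reader = new BinaryReader(stream, Encoding.ASCII);

        stream.Seek(4, SeekOrigin.Begin);        // skip "RIFF"
        long end = reader.ReadUInt32() + 8L;     // the length is measured from offset 8
        stream.Seek(12, SeekOrigin.Begin);       // skip the RIFF type as well

        // Keep going while a full 8-byte chunk header still fits.
        while (stream.Position + 8 <= end && stream.Position + 8 <= stream.Length)
        {
            string id = new string(reader.ReadChars(4));
            uint size = reader.ReadUInt32();
            found.Add(id + ":" + size);

            // Skip the chunk's data, plus one pad byte when the size is odd.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        return found;
    }
}
```

Because the walker only looks at chunk IDs and sizes, it works the same way whether the fmt chunk comes before or after the data chunk; a real reader would dispatch on the ID instead of just recording it.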

Inside a wave file

Lilliput wars

This is all well and good, but what do things look like inside the file? Will the old Lilliputian struggle ("Big end!" "No, little end!") rear its sinister head again? If you guessed yes, you're right, and you can skip ahead two paragraphs. If you have no idea what those terms mean, here's a little digression.

Long ago, in a country called Intellia, a microprocessor designer said, "Fa! These Motorolia engineers have made things too easy with their Big-Endian memory storage. When their processors write an integer to disk, its bytes go, one by one, in order, onto the disk. The stack fills downwards. Things work the way an assembly programmer would enjoy. We must end this!" He stopped, for a moment, contemplating evil and counterintuitive ways of writing assembly code, to ponder a way to make memory storage more difficult and confusing for the programmer. "I have it!" he cried, "we'll make the stack fill upwards, and store sequential bytes... BACKWARD! Bwahahaha!"

Well, the true story has more to do with inconvenient things called "patents", but the important thing to remember is this: big-endian systems (using Motorola chips and their descendants) store the most significant byte of a value first, in the same order you'd write it on paper. If you have a short value 0x4567 and write it to disk on a big-endian system, it will be stored on disk as the bytes 45 67. On a little-endian system (using Intel chips and their relatives), it will be stored as 67 45. The bits of each byte are in the same order, but the order of the bytes changes.

So, is the Wave file's data stored in little-endian or big-endian format? Given that the format was put together by Microsoft and IBM, it should be no surprise that it uses little-endian format for both field and audio data.
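If you'd like to see the byte shuffling spelled out, the following sketch assembles little-endian values by hand, so it gives the same answer no matter what kind of processor it runs on. The LittleEndian class and its method names are invented for this example; in the real code, BinaryReader does this work for us.

```csharp
using System;

public static class LittleEndian
{
    // Assemble a 16-bit value from two bytes stored low byte first.
    public static ushort ToUInt16(byte[] buffer, int offset)
    {
        return (ushort)(buffer[offset] | (buffer[offset + 1] << 8));
    }

    // Assemble a 32-bit value from four bytes stored low byte first.
    public static uint ToUInt32(byte[] buffer, int offset)
    {
        return (uint)buffer[offset]
             | ((uint)buffer[offset + 1] << 8)
             | ((uint)buffer[offset + 2] << 16)
             | ((uint)buffer[offset + 3] << 24);
    }
}
```

Given the two bytes 67 45, ToUInt16 returns 0x4567: the first byte on disk is the least significant one.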

Taking a snapshot

In the image below, you see all the headers for the file: the three 32-bit (double-WORD) fields of the RIFF header (highlighted in red), the fields of the format chunk (highlighted in green), the fields of the fact chunk (in blue; see the end of this article for a very brief discussion of this chunk), and the very beginning of the data chunk (in yellow).

Image 2

When you convert the hex values to decimal, you should obtain the following values:

XML
<?xml version="1.0" ?>
<WaveFile xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <maindata>
    <sGroupID>RIFF</sGroupID>
    <dwFileLength>407534</dwFileLength>
    <sRiffType>WAVE</sRiffType>
  </maindata>
  <format>
    <sChunkID>fmt</sChunkID>
    <dwChunkSize>18</dwChunkSize>
    <wFormatTag>1</wFormatTag>
    <wChannels>2</wChannels>
    <dwSamplesPerSec>44100</dwSamplesPerSec>
    <dwAvgBytesPerSec>176400</dwAvgBytesPerSec>
    <wBlockAlign>4</wBlockAlign>
    <dwBitsPerSample>16</dwBitsPerSample>
  </format>
  <fact>
    <sChunkID>fact</sChunkID>
    <dwChunkSize>4</dwChunkSize>
    <dwNumSamples>101871</dwNumSamples>
  </fact>
  <data>
    <sChunkID>data</sChunkID>
    <dwChunkSize>407484</dwChunkSize>
  </data>
</WaveFile>

[Why XML? Well, for one reason, because it's easy to navigate the output. Also, though, because an XML document may come in handy later on when we want to implement more advanced features -- the XML document may save peak values, a record of changes, etc. .NET's XML parsing methods make it a very convenient method of organized data storage and access. Finally, it's a good thing to have some experience with. CodeProject is all about learning, so if you've never worked with XML before, you have a reason to start now.]

Hopefully, you understand how we retrieved all the various information in the XML above. Just one more comment on the diagram above and we'll look at some of the actual code. As you can see, there is one final double-word after the chunkID and chunkSize double words in the data chunk. As you might guess, this is the first frame. As the file is 16-bit stereo, we know which bytes are what:

  • Again, F4 06 3E FF is the first frame.
  • 0x06F4 is the first sample in the left stereo channel.
  • 0xFF3E is the first sample in the right stereo channel.

It might be a useful exercise to figure out what these four bytes would mean in other configurations:

  • In 8-bit stereo, the first two frames would be [L: 0xF4 R: 0x06], [L: 0x3E R: 0xFF].
  • In 16-bit mono, the first two frames would be 0x06F4, 0xFF3E.
  • In 8-bit mono, the first four frames would be 0xF4, 0x06, 0x3E, 0xFF.
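The exercise above can also be checked in code. The sketch below splits a run of raw PCM bytes into frames for a given channel count and bit depth; FrameDecoder and Decode are hypothetical names, and only the 8-bit (unsigned) and 16-bit (signed, little-endian) cases discussed in this article are handled.

```csharp
using System;

public static class FrameDecoder
{
    // Splits raw PCM bytes into frames; result[frame][channel] holds one sample.
    // 8-bit samples are unsigned (0..255); 16-bit samples are signed.
    public static int[][] Decode(byte[] raw, int channels, int bitsPerSample)
    {
        int bytesPerSample = bitsPerSample / 8;
        int frameSize = channels * bytesPerSample;
        int[][] frames = new int[raw.Length / frameSize][];

        for (int f = 0; f < frames.Length; f++)
        {
            frames[f] = new int[channels];
            for (int c = 0; c < channels; c++)
            {
                int pos = f * frameSize + c * bytesPerSample;
                if (bytesPerSample == 1)
                {
                    frames[f][c] = raw[pos];
                }
                else
                {
                    // Low byte first, then reinterpret the 16 bits as signed.
                    frames[f][c] = unchecked((short)(raw[pos] | (raw[pos + 1] << 8)));
                }
            }
        }
        return frames;
    }
}
```

Feeding it the bytes F4 06 3E FF as 16-bit stereo yields a single frame with a left sample of 1780 (0x06F4) and a right sample of -194 (0xFF3E read as a signed value). The same bytes as 8-bit mono yield four frames: 244, 6, 62 and 255.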

If you're interested in even more information on RIFF and RIFF/Wave files, check out the SonicSpot guide and an amateur (in the good sense!) attempt at a comprehensive RIFF/Wave specification.

If, on the other hand, you'd like to just get to the code, read on.

WaveEdit: a misnomer

Although we're going to call the software WaveEdit from the beginning, it might be more appropriately called "WaveInfo", since this first version merely gets the Wave data and writes it to XML. Before we dive into the code base, though, let's define a few requirements.

There are a few standard operations every decent audio editor must do: volume adjustment, file truncation, pitch/tempo control, and maybe fade-ins/fade-outs. In addition to these functions, which we'll be adding in the rest of the series, I'm adding Wave file division. If you've ever recorded on your own, or transferred a vinyl record or an audio tape to CD, you'll understand why this is necessary. It's a relatively straightforward function, but none of the currently available freeware audio software seems to support it.

Most of the code is very straightforward (and, I believe, fairly readable). We'll spotlight a few sections of code: the interesting sections, the confusing sections, and the sections I feel like discussing.

The first thing you'll notice in EntryPoint.cs is the initialization of the XML serializer. Let's put all the XML code in one place:

C#
XmlSerializer xmlout = new XmlSerializer(typeof(WaveFile));
Stream writer = new FileStream(args[1], FileMode.Create);
...
xmlout.Serialize(writer, contents);
writer.Close();    //Flush the XML to disk and release the file

The XML serializer creates a "template" for the XML file depending on the class type that's passed to it. We pass it WaveFile, which has the following class definition:

C#
public class WaveFile {
    public riffChunk maindata;
    public fmtChunk format;
    public factChunk fact;
    public dataChunk data;
}
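If you've never used XmlSerializer, the following self-contained sketch shows the same idea on a much smaller scale. TinyHeader is a made-up stand-in for WaveFile, and XmlSketch.ToXml is a hypothetical helper that serializes to a string instead of a file so the result is easy to inspect.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the article's much larger WaveFile class.
public class TinyHeader
{
    public string sGroupID;
    public uint   dwFileLength;
}

public static class XmlSketch
{
    public static string ToXml(TinyHeader header)
    {
        // The serializer inspects the type's public fields to decide
        // which elements to write.
        XmlSerializer serializer = new XmlSerializer(typeof(TinyHeader));

        // "using" closes the writer even if Serialize throws.
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, header);
            return writer.ToString();
        }
    }
}
```

Each public field becomes an element named after the field, so a TinyHeader with sGroupID set to "RIFF" should produce a document containing <sGroupID>RIFF</sGroupID>. That is why the XML output earlier in the article mirrors the class layout so closely.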

These "chunkdata" data structures are defined in Structs.cs and contain (mostly) the data you've already seen. The riffChunk class includes a field to store the filename:

C#
public class riffChunk {
    public string FileName;
    //These three fields constitute the riff header
    public string sGroupID;         //RIFF
    public uint   dwFileLength;     //In bytes, measured from offset 8
    public string sRiffType;        //WAVE, usually
}

The dataChunk class includes four new fields:

C#
public class dataChunk {
    public string sChunkID;          //Four bytes: "data"
    public uint   dwChunkSize;       //Length of the audio data in bytes
    public long   lFilePosition;     //Position of the audio data in the file
    public uint   dwMinLength;       //Whole minutes of audio
    public double dSecLength;        //Remaining seconds of audio
    public uint   dwNumSamples;      //Number of audio frames
}

lFilePosition is used to store the position of the beginning of the audio data in the file; this will aid us in editing later. dwMinLength and dSecLength are primarily for human benefit in the XML file. Finally, dwNumSamples duplicates a field from the fact chunk, which:

  1. isn't guaranteed to be present and
  2. is less convenient.

We use a custom FileReader called WaveFileReader to retrieve the data from the Wave files. In addition to conforming to good coding conventions, this streamlines the code: in the EntryPoint class, we just look at the "big picture", while in the WaveFileReader class, we only care what's going on in one small place at a time. The resulting code is very easy to understand:

C#
WaveFileReader reader = new WaveFileReader(args[0]);
WaveFile contents = new WaveFile();
contents.maindata = reader.ReadMainFileHeader();
contents.maindata.FileName = args[0];

How do we solve the problem of reading in chunks in a possibly random order? A while loop and a series of if statements will serve our purposes nicely:

C#
//dwFileLength is measured from offset 8, so the file ends 8 bytes later.
long fileEnd = (long) contents.maindata.dwFileLength + 8;
while (reader.GetPosition() < fileEnd)
{
    temp = reader.GetChunkName();
    if (temp=="fmt ")
    {
        //The fmt and fact readers consume their whole chunk, so the
        //loop condition alone detects the end of the file.
        contents.format = reader.ReadFormatHeader();
    }
    else if (temp=="fact")
    {
        contents.fact = reader.ReadFactHeader();
    }
    else if (temp=="data")
    {
        contents.data = reader.ReadDataHeader();
        //ReadDataHeader stops at the first audio sample. The samples
        //must not be parsed as chunk headers, so we stop here; any
        //chunks stored after the audio data are ignored.
        break;
    }
    else
    {    //This provides the required skipping of unsupported chunks.
        reader.AdvanceToNext();
    }
}

Finally, we'll dig into the WaveFileReader code. WaveFileReader has the same fields as WaveFile, for reasons that will soon become clear. There's also the BinaryReader reader, which is what we'll use to access the Wave file. We initialize reader with a custom constructor.

C#
public class WaveFileReader : IDisposable
{
    BinaryReader reader;
    riffChunk mainfile;
    fmtChunk format;
    factChunk fact;
    dataChunk data;

    public WaveFileReader(string filename)
    {
        reader = new BinaryReader(new FileStream(filename, 
         FileMode.Open, FileAccess.Read, FileShare.Read));
    }
}

None of the fields in WaveFileReader are public (that's what WaveFile is for!), so we need to write interface methods where appropriate. We especially need methods to deal with reader, since it's the most important piece of the whole structure. At the very minimum, we need functions to:

  • get the current position:
    C#
    public long GetPosition() { return reader.BaseStream.Position; }
  • get the next four characters in the file as a string:
    C#
    public string GetChunkName() { return new string(reader.ReadChars(4)); }
  • skip to the next chunk:
    C#
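    public void AdvanceToNext() {
        //Get the size of the chunk we're skipping
        long NextOffset = (long) reader.ReadUInt32();
        //Odd-sized chunks are followed by a pad byte that isn't counted
        NextOffset += NextOffset & 1;
        //Seek past the chunk from the current position
        reader.BaseStream.Seek(NextOffset,SeekOrigin.Current);
    }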

These "general filestream" functions are in the General Utilities #region of WaveFileReader.cs.

Finally, we have the header extraction functions. These are largely the same, so we'll just look at the most complicated of them... which isn't very complicated after all.

C#
public dataChunk ReadDataHeader()
{
    data = new dataChunk();

    data.sChunkID = "data";
    data.dwChunkSize = reader.ReadUInt32();
    //ReadUInt32 is the most important function here.

    //Once we've read in the ChunkSize, 
    //we're at the start of the actual data.
    data.lFilePosition = reader.BaseStream.Position;

    //If the fact chunk exists, we don't have to calculate 
    //the number of samples ourselves.
    if (fact != null)    //fact.Equals(null) would throw when fact is null
        data.dwNumSamples = fact.dwNumSamples;
    else
        data.dwNumSamples = data.dwChunkSize / 
          (format.dwBitsPerSample/8 * format.wChannels);
    //The above could be written as data.dwChunkSize / format.wBlockAlign, 
    //but I want to emphasize
    //what the frames look like.

    data.dwMinLength = (data.dwChunkSize / format.dwAvgBytesPerSec) / 60;
    data.dSecLength = ((double)data.dwChunkSize / 
                      (double)format.dwAvgBytesPerSec) - 
                      (double)data.dwMinLength*60;
    return data;
}

Conclusion: Where do we go from here?

At this point, I have enough material and motivation for a three-part series. The next part will cover pitch and volume adjustment; the third will cover truncation and file division. If there's a lot of interest, though, the series can be extended to cover many things, from digital signal processing to fast Fourier transforms (for viewing the frequency spectra of the file). See you in part 2!

Appendix: The fact header

The fact header is alarmingly straightforward. The data structure is merely:

C#
public string sChunkID;            //Four bytes: "fact"
public uint   dwChunkSize;         //Length of the chunk's data in bytes (4)
public uint   dwNumSamples;        //Number of audio frames

The number of samples can be calculated by multiplying format.dwSamplesPerSec by the length of the audio in seconds.
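Going the other way is just as easy, and handy when no fact chunk is present. The sketch below derives the frame count and the playing time from the size of the data chunk; WaveDuration and its method names are invented for this example, and uncompressed PCM is assumed.

```csharp
using System;

public static class WaveDuration
{
    // Number of frames held in a data chunk of dataBytes bytes.
    public static uint FrameCount(uint dataBytes, ushort blockAlign)
    {
        return dataBytes / blockAlign;
    }

    // Playing time in seconds: frames divided by frames per second.
    public static double Seconds(uint dataBytes, ushort blockAlign, uint framesPerSec)
    {
        return (double)FrameCount(dataBytes, blockAlign) / framesPerSec;
    }
}
```

For the sample file in this article, a 407484-byte data chunk with 4-byte frames gives 101871 frames, which agrees with the dwNumSamples value in its fact chunk; at 44100 frames per second that works out to about 2.31 seconds of audio.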

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
Canada
Jonathan Kade is a native of Detroit, MI. He's interested in multimedia, hardware/software interfacing, working with low-level data, and low-level programming in general.

Comments and Discussions

 
Question: Concatenating wav files
mwenda, 23-Jan-06 7:00

Answer: Re: Concatenating wav files
Jonathan Kade, 24-Jan-06 10:39

Fortunately, concatenation is about the easiest thing we can do :)

All you need to do is read in the files, persisting the data that won't change (such as frame width, # of channels, etc.) and dropping the data that will (such as data chunk length). The main problem is concatenating the raw data chunks -- but assuming your bitrate/number of channels doesn't change, that shouldn't be a problem. Once you have the complete raw data segment, you can calculate the remaining header data. At that point, it's just a matter of writing out the chunks correctly.