A simple C# Wave editor, part 1: Background and analysis

5 Aug 2004
The first phase of a RIFF/Wave editing "swiss army knife", in which we'll learn how to extract all the data present in common Wave files and store it in an XML document.

Command line help info

Introduction

Since the GIMP is freely available on Windows, some would say that it's pointless to keep writing basic open-source or freeware image editors. I'm not one of those people, mind you, but some questions keep occurring to me: Why are there so many image manipulation programs available and in development, especially when MS Paint is a vast improvement on many? Why are there so few audio editors, when there's a serious deficit of them and when such basic functionality as Wave file division is missing from the easily-available freeware?

My guess is that people feel more comfortable working with images. After all, there are loads of pre-existing image controls, and comparatively few audio ones. There also seems to be a continuing refrain throughout the coding community: "audio manipulation is a black art". With this series of articles, I'll describe the design and creation of a basic (but hopefully, robust and powerful) command line "swiss army knife" for audio file manipulation. We'll be building the tool modularly, so it should be easy for you to make (and hopefully release!) your own contributions.

In this article, we'll discuss the RIFF file format, and more specifically the PCM RIFF-wave. We'll detail the most common data structures that compose it, and briefly discuss the variants you might see. Finally, we'll develop a "profiler" that parses, loads into memory, and outputs as XML, the relevant file data.

Background

The Resource Interchange File Format

RIFF is an all-purpose multimedia file format created by Microsoft and IBM, way back in 1991. Wave audio isn't the only multimedia stored in a RIFF file; AVI video, too, uses the RIFF. (For more information on the history of RIFF and its Amiga ancestor, IFF, see Wikipedia.)

Every RIFF file starts with a header with three four-byte fields. The data structure is this:

C#
public string sGroupID;       //Surprisingly enough, this is always "RIFF"
public uint   dwFileLength;   //File length in bytes, measured from offset 8
public string sRiffType;      //In wave files, this is always "WAVE"
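
As a rough illustration of how those three fields come off the disk, here is a minimal sketch that reads them from a stream with BinaryReader. The class and method names (RiffHeaderSketch, ReadRiffHeader) are made up for this example and are not part of WaveEdit, and the twelve bytes in Main are hand-built rather than taken from a real file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the WaveEdit source.
public static class RiffHeaderSketch
{
    // Reads the 12-byte RIFF header: two four-character codes around a 32-bit length.
    public static void ReadRiffHeader(BinaryReader reader,
        out string groupId, out uint fileLength, out string riffType)
    {
        groupId    = Encoding.ASCII.GetString(reader.ReadBytes(4)); // expected "RIFF"
        fileLength = reader.ReadUInt32();                           // BinaryReader reads little-endian
        riffType   = Encoding.ASCII.GetString(reader.ReadBytes(4)); // expected "WAVE"
    }

    // Hand-built header bytes; EE 37 06 00 is 407534 in little-endian order.
    public static byte[] SampleHeader()
    {
        return new byte[] {
            (byte)'R', (byte)'I', (byte)'F', (byte)'F',
            0xEE, 0x37, 0x06, 0x00,
            (byte)'W', (byte)'A', (byte)'V', (byte)'E'
        };
    }

    public static void Main()
    {
        string groupId, riffType;
        uint fileLength;
        using (BinaryReader reader = new BinaryReader(new MemoryStream(SampleHeader())))
        {
            ReadRiffHeader(reader, out groupId, out fileLength, out riffType);
        }
        Console.WriteLine(groupId);    // RIFF
        Console.WriteLine(fileLength); // 407534
        Console.WriteLine(riffType);   // WAVE
    }
}
```

Because the length is measured from offset 8, the whole file in this example would be 407534 + 8 bytes long.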

RIFFs are composed of sections called, awkwardly enough, "chunks". Each chunk starts with an eight-byte header:

C#
public string  sChunkID;       //Four bytes: "fmt ", "data", "fact", etc.
public uint    dwChunkSize;    //Length of the chunk data in bytes, not counting this header

The Joys of Proprietary File Formats

Unfortunately, while some official documents established the basics of the file format, no official standard was ever published for the wave file. In the absence of official documents, people did what they do best: improvise. As a result, there are many different chunk types, many of which duplicate and triplicate functionality. For the time being, we'll ignore most of these chunk types, and focus on the two that are guaranteed to be in every wave file: the format chunk and the data chunk.

Chunkin'

The format chunk details all the necessary information about the audio data, including the format of the audio (we assume, for now, uncompressed Pulse Code Modulation audio), the number of channels (mono, stereo, quadraphonic, 5-channel), the audio's frequency, the number of bits per audio sample (usually, 8 or 16), and the number of bytes in a frame. The data structure for this chunk is this:

C#
public string  sChunkID;        //Four bytes: "fmt "
public uint    dwChunkSize;     //Length of the chunk data in bytes (16 or 18 for PCM)
public ushort  wFormatTag;      //1 if uncompressed Microsoft PCM audio
public ushort  wChannels;       //Number of channels
public uint    dwSamplesPerSec; //Frequency of the audio in Hz
public uint    dwAvgBytesPerSec;//For estimating RAM allocation
public ushort  wBlockAlign;     //Sample frame size in bytes
public uint    dwBitsPerSample; //Bits per sample (a 16-bit WORD on disk)

Wait just a minute, you say. Frame? A frame is the same thing as a sample, but not the same thing as a sample. Understand? You've got to keep this straight; it's very important. OK, OK, I'll explain. A frame is one whole multichannel audio sample: one value for each channel, all belonging to the same instant. The dwSamplesPerSec field actually gives the number of frames per second. The dwBitsPerSample field, on the other hand, refers to the number of bits in a single channel of a sample.
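
To make the frame arithmetic concrete, here is a small sketch of how the frame size and the byte rate follow from the other format fields. The helper names are invented for this example, the input values (44.1 kHz, 16-bit, stereo) are just a convenient test case, and the code assumes the bit depth is a multiple of 8.

```csharp
using System;

// Hypothetical helper, not part of the WaveEdit source.
public static class FrameMathSketch
{
    // Bytes in one frame: one sample for every channel.
    // Assumes bitsPerSample is a multiple of 8.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Bytes of audio data per second of playback.
    public static int AvgBytesPerSec(int framesPerSec, int channels, int bitsPerSample)
    {
        return framesPerSec * BlockAlign(channels, bitsPerSample);
    }

    public static void Main()
    {
        Console.WriteLine(BlockAlign(2, 16));            // 4
        Console.WriteLine(AvgBytesPerSec(44100, 2, 16)); // 176400
    }
}
```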

The other chunk is even more essential: the data chunk. As you might expect, it contains all the PCM audio data. It has a very simple data structure:

C#
public string  sChunkID;       //Four bytes: "data"
public uint    dwChunkSize;    //Length of the audio data in bytes
//Different arrays for the different sample sizes
public byte  [] byteArray;     //8 bit unsigned data; or...
public short [] shortArray;    //16 bit signed data

What effect does the signed/unsigned convention have on our data? Well, that's a very good question, and one we'll address when we take apart a sample file below.

The Joys of Proprietary File Formats, Reprise

There's yet another complication, though: there's no guaranteed order for the chunks! Since no standard was ever published, it's technically legal to put the data chunk, which stores the actual audio data, in front of the fmt (format) chunk, which tells the reader how to process it. Though this is rarely done, a well-written audio program will account for it anyway. A more common mistake made by new audio programmers is to assume that the fmt chunk always comes first in a file; while this is mostly the case, there are several audio programs out there whose Wave files don't follow the convention.

One final thing to note with regard to RIFF chunks: they must occupy an even number of bytes. If a chunk's data has an odd length, it must be padded out with a zero byte. For our immediate purposes, there's only one case in which this can happen: the data stream of an 8-bit mono file.
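
The padding rule boils down to one line of arithmetic. The sketch below uses an invented helper name (PaddedSize) and assumes, as is conventional for RIFF files, that the pad byte is not counted in the chunk's own size field.

```csharp
using System;

// Hypothetical helper, not part of the WaveEdit source.
public static class ChunkPaddingSketch
{
    // Number of bytes the chunk body occupies on disk:
    // the declared size, plus one pad byte when that size is odd.
    public static long PaddedSize(uint chunkSize)
    {
        return (long)chunkSize + (chunkSize & 1);
    }

    public static void Main()
    {
        Console.WriteLine(PaddedSize(3)); // 4
        Console.WriteLine(PaddedSize(4)); // 4
    }
}
```

A reader that wants to step over a chunk would seek forward by PaddedSize(dwChunkSize) bytes after reading the eight-byte chunk header.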

Inside a wave file

Lilliput wars

This is all well and good, but what do things look like inside the file? Will the old Lilliputian struggle ("Big end!" "No, little end!") rear its sinister head again? If you guessed yes, you're right, and you can skip ahead two paragraphs. If you have no idea what those terms mean, here's a little digression.

Long ago, in a country called Intellia, a microprocessor designer said, "Fa! These Motorolia engineers have made things too easy with their Big-Endian memory storage. When their processors write an integer to disk, its bytes go, one by one, in order, onto the disk. The stack fills downwards. Things work the way an assembly programmer would enjoy. We must end this!" He stopped, for a moment, contemplating evil and counterintuitive ways of writing assembly code, to ponder a way to make memory storage more difficult and confusing for the programmer. "I have it!" he cried, "we'll make the stack fill upwards, and store sequential bytes... BACKWARD! Bwahahaha!"

Well, the true story has more to do with inconvenient things called "patents", but the important thing to remember is this: big-endian systems (using Motorola chips and their descendants) store the most significant byte of a value first, so the bytes appear on disk in the order you'd write them by hand. If you have a short value 0x4567 and write it to disk on a big-endian system, it will be stored as the bytes 45 67. On a little-endian system (using Intel chips and their relatives), it will be stored as 67 45. The bits of each byte are in the same order, but the order of the bytes changes.

So, is the Wave file's data stored in little-endian or big-endian format? Given that the format was put together by Microsoft and IBM, it should be no surprise that it uses little-endian format for both field and audio data.
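
If you'd like to see the byte order for yourself, this sketch splits a 16-bit value into little-endian bytes and puts it back together with plain shifts, so the result doesn't depend on the machine it runs on. The class and method names are invented for the example.

```csharp
using System;

// Hypothetical helper, not part of the WaveEdit source.
public static class EndianSketch
{
    // Splits a 16-bit value into little-endian order: low byte first.
    public static byte[] ToLittleEndian(ushort value)
    {
        return new byte[] { (byte)(value & 0xFF), (byte)(value >> 8) };
    }

    // Reassembles a 16-bit value from two little-endian bytes.
    public static ushort FromLittleEndian(byte low, byte high)
    {
        return (ushort)(low | (high << 8));
    }

    public static void Main()
    {
        byte[] bytes = ToLittleEndian(0x4567);
        Console.WriteLine("{0:X2} {1:X2}", bytes[0], bytes[1]);            // 67 45
        Console.WriteLine("{0:X4}", FromLittleEndian(bytes[0], bytes[1])); // 4567
    }
}
```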

Taking a snapshot

In the image below, you see all the headers for the file: the three 32-bit (double-WORD) fields of the RIFF header (highlighted in red), the fields of the format chunk (highlighted in green), the fields of the fact chunk (in blue; see the end of this article for a very brief discussion of this chunk), and the very beginning of the data chunk (in yellow).

[Image 2: hex view of the sample file's headers]

When you convert the hex values to decimal, you should obtain the following values:

XML
<?xml version="1.0" ?>
<WaveFile xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <maindata>
    <sGroupID>RIFF</sGroupID>
    <dwFileLength>407534</dwFileLength>
    <sRiffType>WAVE</sRiffType>
  </maindata>
  <format>
    <sChunkID>fmt</sChunkID>
    <dwChunkSize>18</dwChunkSize>
    <wFormatTag>1</wFormatTag>
    <wChannels>2</wChannels>
    <dwSamplesPerSec>44100</dwSamplesPerSec>
    <dwAvgBytesPerSec>176400</dwAvgBytesPerSec>
    <wBlockAlign>4</wBlockAlign>
    <dwBitsPerSample>16</dwBitsPerSample>
  </format>
  <fact>
    <sChunkID>fact</sChunkID>
    <dwChunkSize>4</dwChunkSize>
    <dwNumSamples>101871</dwNumSamples>
  </fact>
  <data>
    <sChunkID>data</sChunkID>
    <dwChunkSize>407484</dwChunkSize>
  </data>
</WaveFile>

[Why XML? Well, for one reason, because it's easy to navigate the output. Also, though, because an XML document may come in handy later on when we want to implement more advanced features -- the XML document may save peak values, a record of changes, etc. .NET's XML parsing methods make it a very convenient method of organized data storage and access. Finally, it's a good thing to have some experience with. CodeProject is all about learning, so if you've never worked with XML before, you have a reason to start now.]

Hopefully, you understand how we retrieved all the various information in the XML above. Just one more comment on the diagram above and we'll look at some of the actual code. As you can see, there is one final double-word after the chunkID and chunkSize double words in the data chunk. As you might guess, this is the first frame. As the file is 16-bit stereo, we know which bytes are what:

  • Again, F4 06 3E FF is the first frame.
  • 0x06F4 is the first sample in the left stereo channel.
  • 0xFF3E is the first sample in the right stereo channel.

It might be a useful exercise to figure out what these four bytes would mean in other configurations:

  • In 8-bit stereo, the first two frames would be [L: 0xF4 R: 0x06], [L: 0x3E R: 0xFF].
  • In 16-bit mono, the first two frames would be 0x06F4, 0xFF3E.
  • In 8-bit mono, the first four frames would be 0xF4, 0x06, 0x3E, 0xFF.
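
This is also where the signed/unsigned convention shows up. The sketch below decodes the same four bytes as 16-bit samples (signed, little-endian) and shows how an 8-bit sample (unsigned, with silence at 128) can be recentred around zero. The helper names are invented for this example.

```csharp
using System;

// Hypothetical helper, not part of the WaveEdit source.
public static class FrameDecodeSketch
{
    // 16-bit PCM samples are signed and stored low byte first.
    public static short Sample16(byte low, byte high)
    {
        return (short)(low | (high << 8));
    }

    // 8-bit PCM samples are unsigned with silence at 128;
    // subtracting 128 recentres them around zero.
    public static int Sample8(byte value)
    {
        return value - 128;
    }

    public static void Main()
    {
        byte[] frame = { 0xF4, 0x06, 0x3E, 0xFF };
        Console.WriteLine(Sample16(frame[0], frame[1])); // 1780 (left, 0x06F4)
        Console.WriteLine(Sample16(frame[2], frame[3])); // -194 (right, 0xFF3E)
        Console.WriteLine(Sample8(frame[0]));            // 116  (0xF4 as 8-bit audio)
    }
}
```

So in the 16-bit stereo reading, the right channel's 0xFF3E is a small negative value, not a large positive one.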

If you're interested in even more information on RIFF and RIFF/Wave files, check out the SonicSpot guide and an amateur (in the good sense!) attempt at a comprehensive RIFF/Wave specification.

If, on the other hand, you'd like to just get to the code, read on.

WaveEdit: a misnomer

Although we're going to call the software WaveEdit from the beginning, it might be more appropriately called "WaveInfo", since this first version merely gets the Wave data and writes it to XML. Before we dive into the code base, though, let's define a few requirements.

There are a few standard operations every decent audio editor must do: volume adjustment, file truncation, pitch/tempo control, and maybe fade-ins/fade-outs. In addition to these functions, which we'll be adding in the rest of the series, I'm adding Wave file division. If you've ever recorded on your own, or transferred a vinyl record or an audio tape to CD, you'll understand why this is necessary. It's a relatively straightforward function, but none of the currently available freeware audio software seems to support it.

Most of the code is very straightforward (and, I believe, fairly readable). We'll spotlight a few sections of code: the interesting sections, the confusing sections, and the sections I feel like discussing.

The first thing you'll notice in EntryPoint.cs is the initialization of the XML serializer. Let's put all the XML code in one place:

C#
XmlSerializer xmlout = new XmlSerializer(typeof(WaveFile));
Stream writer = new FileStream(args[1], FileMode.Create);
...
xmlout.Serialize(writer, contents);

The XML serializer creates a "template" for the XML file depending on the class type that's passed to it. We pass it WaveFile, which has the following class definition:

C#
public class WaveFile {
    public riffChunk maindata;
    public fmtChunk format;
    public factChunk fact;
    public dataChunk data;
}
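
If you've never used XmlSerializer before, the following stand-alone sketch shows the idea on a cut-down class. DemoChunk and XmlSketch are invented for this example; the real program serializes WaveFile to a FileStream instead of a string.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for one of the chunk classes.
public class DemoChunk
{
    public string sChunkID;
    public uint   dwChunkSize;
}

// Hypothetical helper, not part of the WaveEdit source.
public static class XmlSketch
{
    // Serializes the public fields of a DemoChunk into an XML string.
    public static string ToXml(DemoChunk chunk)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(DemoChunk));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, chunk);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        DemoChunk chunk = new DemoChunk();
        chunk.sChunkID = "data";
        chunk.dwChunkSize = 407484;
        Console.WriteLine(ToXml(chunk));
    }
}
```

Each public field becomes an element named after the field, which is why the XML shown earlier mirrors the class definitions so closely.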

These "chunkdata" data structures are defined in Structs.cs and contain (mostly) the data you've already seen. The riffChunk class includes a field to store the filename:

C#
public class riffChunk {
    public string FileName;
    //These three fields constitute the riff header
    public string sGroupID;         //RIFF
    public uint   dwFileLength;     //In bytes, measured from offset 8
    public string sRiffType;        //WAVE, usually
}

The dataChunk class includes four new fields:

C#
public class dataChunk {
    public string sChunkID;          //Four bytes: "data"
    public uint   dwChunkSize;       //Length of the audio data in bytes
    public long   lFilePosition;     //Position of the audio data in the file
    public uint   dwMinLength;       //Whole minutes of audio
    public double dSecLength;        //Remaining seconds of audio
    public uint   dwNumSamples;      //Number of audio frames
}

lFilePosition is used to store the position of the beginning of the audio data in the file; this will aid us in editing later. dwMinLength and dSecLength are primarily for human benefit in the XML file. Finally, dwNumSamples duplicates a field from the fact header, which:

  1. isn't guaranteed to be present and
  2. is less convenient.

We use a custom FileReader called WaveFileReader to retrieve the data from the Wave files. In addition to conforming to good coding conventions, this streamlines the code: in the EntryPoint class, we just look at the "big picture", while in the WaveFileReader class, we only care what's going on in one small place at a time. The resulting code is very easy to understand:

C#
WaveFileReader reader = new WaveFileReader(args[0]);
WaveFile contents = new WaveFile();
contents.maindata = reader.ReadMainFileHeader();
contents.maindata.FileName = args[0];

How do we solve the problem of reading in chunks in a possibly random order? A while loop and a series of if statements will serve our purposes nicely:

C#
//dwFileLength is measured from offset 8, so the file ends 8 bytes later.
long fileEnd = (long) contents.maindata.dwFileLength + 8;
while (reader.GetPosition() < fileEnd)
{
    string temp = reader.GetChunkName();
    if (temp=="fmt ")
    {
        contents.format = reader.ReadFormatHeader();
        if (reader.GetPosition() + 
          contents.format.dwChunkSize == fileEnd)
            break;
    }
    else if (temp=="fact")
    {
        contents.fact = reader.ReadFactHeader();
        if (reader.GetPosition() + 
          contents.fact.dwChunkSize == fileEnd)
            break;
    }
    else if (temp=="data")
    {
        contents.data = reader.ReadDataHeader();
        //ReadDataHeader stops at the start of the audio, so this
        //is true when the data chunk is the last one in the file.
        if (reader.GetPosition() + 
          contents.data.dwChunkSize == fileEnd)
            break;
    }
    else
    {    //This provides the required skipping of unsupported chunks.
        reader.AdvanceToNext();
    }
}
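
For comparison, here is a stand-alone sketch of the same idea in its most generic form: read each chunk's eight-byte header, note it, and skip the body. It runs against a tiny RIFF image built in memory, so it doesn't depend on WaveFileReader at all. All of the names (ChunkWalkSketch, ListChunks, BuildSample) are invented for this example, and the "LIST" chunk is only there to stand in for a chunk type we don't parse.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the WaveEdit source.
public static class ChunkWalkSketch
{
    // Returns "id:size" for every chunk that follows the 12-byte RIFF header.
    public static List<string> ListChunks(Stream stream)
    {
        List<string> found = new List<string>();
        BinaryReader reader = new BinaryReader(stream);

        reader.ReadBytes(4);                       // "RIFF"
        long fileEnd = reader.ReadUInt32() + 8L;   // length is counted from offset 8
        reader.ReadBytes(4);                       // "WAVE"

        // Keep going while there is room for another chunk header.
        while (stream.Position + 8 <= fileEnd)
        {
            string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
            uint size = reader.ReadUInt32();
            found.Add(id + ":" + size);
            // Skip the chunk body, plus the pad byte when the size is odd.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        return found;
    }

    // Builds a 36-byte RIFF image: an odd-sized "LIST" chunk, then a "data" chunk.
    public static MemoryStream BuildSample()
    {
        MemoryStream ms = new MemoryStream();
        BinaryWriter w = new BinaryWriter(ms);
        w.Write(Encoding.ASCII.GetBytes("RIFF")); w.Write((uint)28);
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("LIST")); w.Write((uint)3);
        w.Write(new byte[4]);                      // 3 body bytes + 1 pad byte
        w.Write(Encoding.ASCII.GetBytes("data")); w.Write((uint)4);
        w.Write(new byte[4]);                      // 4 bytes of silence
        w.Flush();
        ms.Position = 0;
        return ms;
    }

    public static void Main()
    {
        foreach (string chunk in ListChunks(BuildSample()))
            Console.WriteLine(chunk);              // LIST:3 then data:4
    }
}
```

Because every chunk is skipped by its declared size, the order in which the chunks appear stops mattering.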

Finally, we'll dig into the WaveFileReader code. WaveFileReader has the same fields as WaveFile, for reasons that will soon become clear. There's also the BinaryReader reader, which is what we'll use to access the Wave file. We initialize reader with a custom constructor.

C#
public class WaveFileReader : IDisposable
{
    BinaryReader reader;
    riffChunk mainfile;
    fmtChunk format;
    factChunk fact;
    dataChunk data;

    public WaveFileReader(string filename)
    {
        reader = new BinaryReader(new FileStream(filename, 
         FileMode.Open, FileAccess.Read, FileShare.Read));
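    }

    //Release the underlying file handle when we're done.
    public void Dispose()
    {
        if (reader != null) reader.Close();
    }
}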

None of the fields in WaveFileReader are public (that's what WaveFile is for!), so we need to write interface methods where appropriate. We especially need methods to deal with reader, since it's the most important piece of the whole structure. At the very minimum, we need functions to:

  • get the current position:
    C#
    public long GetPosition() { return reader.BaseStream.Position; }
  • get the next four characters in the file as a string:
    C#
    public string GetChunkName() { return new string(reader.ReadChars(4)); }
  • skip to the next chunk:
    C#
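    public void AdvanceToNext() {
        //Read the size of the chunk we're skipping
        long NextOffset = (long) reader.ReadUInt32();
        //Chunks are padded to an even length; skip the pad byte too
        if ((NextOffset & 1) == 1) NextOffset++;
        //Seek past the chunk body from the current position
        reader.BaseStream.Seek(NextOffset,SeekOrigin.Current);
    }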

These "general filestream" functions are in the General Utilities #region of WaveFileReader.cs.

Finally, we have the header extraction functions. These are largely the same, so we'll just look at the most complicated of them... which isn't very complicated after all.

C#
public dataChunk ReadDataHeader()
{
    data = new dataChunk();

    data.sChunkID = "data";
    data.dwChunkSize = reader.ReadUInt32();
    //ReadUInt32 is the most important function here.

    //Once we've read in the ChunkSize, 
    //we're at the start of the actual data.
    data.lFilePosition = reader.BaseStream.Position;

    //If the fact chunk exists, we don't have to calculate 
    //the number of samples ourselves.
    if (fact != null)
        data.dwNumSamples = fact.dwNumSamples;
    else
        data.dwNumSamples = data.dwChunkSize / 
          (format.dwBitsPerSample/8 * format.wChannels);
    //The above could be written as data.dwChunkSize / format.wBlockAlign, 
    //but I want to emphasize
    //what the frames look like.

    data.dwMinLength = (data.dwChunkSize / format.dwAvgBytesPerSec) / 60;
    data.dSecLength = ((double)data.dwChunkSize / 
                      (double)format.dwAvgBytesPerSec) - 
                      (double)data.dwMinLength*60;
    return data;
}
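
To check the arithmetic, the sketch below repeats the same calculations as stand-alone functions and feeds them the values from the sample file's XML. The helper names are invented for this example; the formulas mirror the ones in ReadDataHeader.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the WaveEdit source.
public static class DurationSketch
{
    // Number of frames in the data chunk.
    public static uint FrameCount(uint dataBytes, ushort blockAlign)
    {
        return dataBytes / blockAlign;
    }

    // Whole minutes of audio, mirroring dwMinLength.
    public static uint WholeMinutes(uint dataBytes, uint avgBytesPerSec)
    {
        return (dataBytes / avgBytesPerSec) / 60;
    }

    // Seconds left over after the whole minutes, mirroring dSecLength.
    public static double RemainingSeconds(uint dataBytes, uint avgBytesPerSec)
    {
        double total = (double)dataBytes / avgBytesPerSec;
        return total - WholeMinutes(dataBytes, avgBytesPerSec) * 60;
    }

    public static void Main()
    {
        Console.WriteLine(FrameCount(407484, 4));        // 101871
        Console.WriteLine(WholeMinutes(407484, 176400)); // 0
        Console.WriteLine(RemainingSeconds(407484, 176400)
            .ToString("F2", CultureInfo.InvariantCulture)); // 2.31
    }
}
```

The frame count agrees with the dwNumSamples value stored in the sample file's fact chunk, and 44100 frames per second times 2.31 seconds gives the same 101871.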

Conclusion: Where do we go from here?

At this point, I have enough material and motivation for a three-part series. The next part will cover pitch and volume adjustment; the third will cover truncation and file division. If there's a lot of interest, though, the series can be extended to cover many things, from digital signal processing to fast Fourier transforms (for viewing the frequency spectra of the file). See you in part 2!

Appendix: The fact header

The fact header is alarmingly straightforward. The data structure is merely:

C#
public string sChunkID;            //Four bytes: "fact"
public uint   dwChunkSize;         //Length of the chunk data in bytes (4)
public uint   dwNumSamples;        //Number of audio frames

The number of samples (frames, really) can also be calculated by multiplying format.dwSamplesPerSec by the length of the audio in seconds.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
Canada
Jonathan Kade is a native of Detroit, MI. He's interested in multimedia, hardware/software interfacing, working with low-level data, and low-level programming in general.
