
Subtitle Synchronization with C#

26 Jan 2009, CPOL, 2 min read
Demonstrates the use of regular expressions for subtitle synchronization

Introduction

This article shows an example of text file manipulation and the use of regular expressions for subtitle synchronization.

Background

Yesterday I watched "P.S. I Love You" and loved it (no pun intended). I downloaded it and found a subtitle file with a correct translation, but it was out of sync. So I started writing a small program to fix that.

In my research I found some nice programs that already do this, but it was early on a Sunday and I had nothing better to do. You know how it ends.

Using the Code

This is a console application that receives a filename as its only argument. I expect it to be a regular .SRT (SubRip) file, that is, a text file with the following structure:

203
00:16:38,731 --> 00:16:41,325
<i>Happy Christmas, your arse
I pray God it's our last</i>

So each block has a sequential number, the start and end display times, and the text; blocks are separated by an empty line. In my case, I just wanted to add a time offset.
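The block structure above maps naturally onto a small type. The sketch below is hypothetical (the class SubtitleUnit and its members are not part of the article's source): it keeps the times as TimeSpan values, so adding an offset is plain arithmetic, and formats them back with a literal comma before the milliseconds. It assumes non-negative times under 24 hours.

```csharp
using System;
using System.Globalization;

// Hypothetical in-memory model of one SRT unit (not part of the article's code).
public sealed class SubtitleUnit
{
    public int Sequence { get; set; }      // the sequential number, e.g. 203
    public TimeSpan Start { get; set; }    // when the text appears
    public TimeSpan End { get; set; }      // when the text disappears
    public string Text { get; set; } = ""; // one or more lines, without the trailing empty line

    // Move both timestamps by the same amount, which is all a simple resync needs.
    public void Shift(TimeSpan offset)
    {
        Start += offset;
        End += offset;
    }

    // Serialize back to SRT: number, "start --> end", text, then the empty separator line.
    public string ToSrt()
    {
        return Sequence.ToString(CultureInfo.InvariantCulture) + "\r\n"
            + FormatTime(Start) + " --> " + FormatTime(End) + "\r\n"
            + Text + "\r\n\r\n";
    }

    // HH:mm:ss,fff with a literal comma; assumes a non-negative time under 24 hours.
    private static string FormatTime(TimeSpan t)
    {
        return t.ToString(@"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);
    }
}
```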

I started by creating a regular expression for that pattern:

C#
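// One SRT unit: sequence number, "start --> end" line, then the text up to a blank line.
// Assumes CRLF line breaks; the last unit only matches if a blank line follows it.
private static Regex unit = new Regex(
   @"(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> " + 
   @"(?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)", 
   RegexOptions.Compiled | RegexOptions.ECMAScript);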

I used named groups, (?<name>...), to identify the relevant parts of each text unit, so the pattern reads:

  • the unit's sequence number, as one or more digits, followed by a line break
  • the start and end times, formatted as HH:mm:ss,fff, followed by a line break
  • the subtitle text up to the end of the block (two consecutive line breaks)
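To see the named groups at work, here is a minimal sketch that runs the same pattern over a string and reads each part by name through Match.Groups. The class and method names (SrtMatchDemo, Describe) are illustrative only. Feeding it strings with and without the closing blank line is also a quick way to check which units the pattern accepts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SrtMatchDemo
{
    // Same pattern and options as the article's 'unit' field.
    private static readonly Regex Unit = new Regex(
        @"(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> " +
        @"(?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)",
        RegexOptions.Compiled | RegexOptions.ECMAScript);

    // Returns one "sequence: start -> end" summary per matched unit,
    // reading each value through its named group.
    public static List<string> Describe(string srt)
    {
        var result = new List<string>();
        foreach (Match m in Unit.Matches(srt))
        {
            result.Add(m.Groups["sequence"].Value + ": "
                + m.Groups["start"].Value + " -> " + m.Groups["end"].Value);
        }
        return result;
    }
}
```

For the sample unit shown earlier (followed by an empty line), Describe returns a single entry, "203: 00:16:38,731 -> 00:16:41,325".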

Believe me, that's the "hardest" part. Now let's ask for the time offset:

C#
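double offset = 0;
Console.Write("offset, in seconds (+1.1, -2.75): ");
// InvariantCulture (System.Globalization): '.' is the decimal separator
// whatever the regional settings, matching the examples in the prompt
while (!Double.TryParse(Console.ReadLine(), NumberStyles.Float,
                        CultureInfo.InvariantCulture, out offset))
{
    Console.WriteLine("Invalid value, try again");
}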

Note the use of Double.TryParse. As you can probably imagine, it tries to parse a string into a double value, but instead of throwing an exception on invalid input it simply returns false. That makes it very useful when asking for input values, and it avoids the cost of throwing and catching a FormatException.
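The retry loop can be isolated from the console so it is easy to test. This hypothetical ReadOffset helper (not part of the article) takes any TextReader/TextWriter pair, applies the same Double.TryParse idea with the invariant culture, and gives up at end of input: Console.ReadLine returns null once input is closed, TryParse then fails on every pass, and a loop without that check would never exit.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class OffsetPrompt
{
    // Hypothetical variant of the article's prompt loop that reads from any TextReader,
    // so it can be driven by a StringReader instead of the console.
    public static double ReadOffset(TextReader input, TextWriter output)
    {
        output.Write("offset, in seconds (+1.1, -2.75): ");
        while (true)
        {
            string line = input.ReadLine();
            if (line == null)
            {
                // End of input: stop instead of asking forever.
                throw new EndOfStreamException("No valid offset was entered.");
            }

            double offset;
            // TryParse returns false on bad input instead of throwing FormatException.
            if (double.TryParse(line, NumberStyles.Float, CultureInfo.InvariantCulture, out offset))
            {
                return offset;
            }
            output.WriteLine("Invalid value, try again");
        }
    }
}
```

In the program itself it would be called as `double offset = OffsetPrompt.ReadOffset(Console.In, Console.Out);`.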

Now we just need to read one file and write another one.

C#
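// Requires: System, System.IO, System.Text, System.Text.RegularExpressions, System.Globalization
int sequence = 1; // renumber the units starting at 1 (assumed starting value)
using (StreamReader input = new StreamReader(args[0], Encoding.Default))
using (StreamWriter output = 
       new StreamWriter(args[0] + ".srt", false, Encoding.Default))
{
    output.Write(
        unit.Replace(input.ReadToEnd(), delegate(Match m)
        {
            // SRT uses a comma before the milliseconds; DateTime.Parse expects a dot.
            DateTime start = DateTime.Parse(
                m.Groups["start"].Value.Replace(",", "."),
                CultureInfo.InvariantCulture).AddSeconds(offset);
            DateTime end = DateTime.Parse(
                m.Groups["end"].Value.Replace(",", "."),
                CultureInfo.InvariantCulture).AddSeconds(offset);

            // Rebuild the sequence and time lines (the escaped \, writes the comma back)
            // and keep the original subtitle text untouched.
            return String.Format(
                "{0}\r\n{1:HH\\:mm\\:ss\\,fff} --> {2:HH\\:mm\\:ss\\,fff}\r\n{3}",
                sequence++, start, end, m.Groups["text"].Value);
        }));
}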

So we read the entire input file into memory and, one unit at a time, replace the original times with new ones shifted by the offset. Write everything to the output file (the input name with ".srt" appended) and you are good to go.
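Keeping the transformation separate from the file handling makes it easy to check. The sketch below mirrors the article's approach (same pattern, DateTime parsing after swapping the comma for a dot, renumbering through a counter) but works on a string instead of files. SrtShifter and Shift are hypothetical names, and starting the counter at 1 is an assumption.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SrtShifter
{
    // Same pattern and options as the article's 'unit' field.
    private static readonly Regex Unit = new Regex(
        @"(?<sequence>\d+)\r\n(?<start>\d{2}\:\d{2}\:\d{2},\d{3}) --\> " +
        @"(?<end>\d{2}\:\d{2}\:\d{2},\d{3})\r\n(?<text>[\s\S]*?\r\n\r\n)",
        RegexOptions.Compiled | RegexOptions.ECMAScript);

    // String in, string out: shift every unit by offsetSeconds and renumber from 1.
    public static string Shift(string srt, double offsetSeconds)
    {
        int sequence = 1;
        return Unit.Replace(srt, m =>
        {
            DateTime start = ParseTime(m.Groups["start"].Value).AddSeconds(offsetSeconds);
            DateTime end = ParseTime(m.Groups["end"].Value).AddSeconds(offsetSeconds);
            // Escaped ':' and ',' are literals, so the output keeps the SRT layout.
            return string.Format(CultureInfo.InvariantCulture,
                "{0}\r\n{1:HH\\:mm\\:ss\\,fff} --> {2:HH\\:mm\\:ss\\,fff}\r\n{3}",
                sequence++, start, end, m.Groups["text"].Value);
        });
    }

    // "00:16:38,731" -> today's date at that time, after swapping the comma for a dot.
    private static DateTime ParseTime(string srtTime)
    {
        return DateTime.Parse(srtTime.Replace(",", "."), CultureInfo.InvariantCulture);
    }
}
```

With a helper like this, the file part reduces to reading the input, calling Shift, and writing the result.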

Points of Interest

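The most interesting part was the time formatting. The .SRT format uses a comma as the milliseconds separator, which DateTime.Parse does not accept, so I spent an extra Replace inside the MatchEvaluator delegate to turn it into a dot before parsing; the escaped comma (\\,) in the output format string writes it back.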
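On .NET Framework 4.0 and later, TimeSpan.ParseExact and TimeSpan.ToString accept custom format strings, so the comma can be read and written directly as an escaped literal, with no Replace call. This is a swapped-in alternative, not what the article does, and the SrtTime helper is hypothetical. It also sidesteps a DateTime quirk: a negative offset that pushes a time before 00:00:00 rolls a DateTime back to the previous day (23:59:...), so here such times are clamped to zero instead.

```csharp
using System;
using System.Globalization;

public static class SrtTime
{
    // Custom TimeSpan format: escaped ':' and ',' are literals, so the comma is read as-is.
    private const string TimeFormat = @"hh\:mm\:ss\,fff";

    // "00:16:38,731" -> 16 min 38.731 s, with no comma-to-dot Replace needed.
    public static TimeSpan Parse(string value)
    {
        return TimeSpan.ParseExact(value, TimeFormat, CultureInfo.InvariantCulture);
    }

    // Shift and format back. Times pushed below zero are clamped to 00:00:00,000,
    // because custom TimeSpan formats do not print a minus sign.
    public static string Shift(string value, double offsetSeconds)
    {
        TimeSpan shifted = Parse(value) + TimeSpan.FromSeconds(offsetSeconds);
        if (shifted < TimeSpan.Zero)
        {
            shifted = TimeSpan.Zero;
        }
        return shifted.ToString(TimeFormat, CultureInfo.InvariantCulture);
    }
}
```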

Another point was the file encoding. The trick was to open the input file with Encoding.Default and write the output file with that same encoding, so accented characters are preserved correctly.
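The encoding point can be made explicit by passing the same Encoding object to both the read and the write. This sketch (SrtEncoding.Rewrite is a hypothetical helper) uses File.ReadAllText and File.WriteAllText; ISO-8859-1 in the test is only an example of a single-byte encoding with accented characters, not necessarily what your subtitle file uses. Also worth knowing: Encoding.Default is the system ANSI code page on .NET Framework but always UTF-8 on .NET Core and .NET 5+, so relying on it may read a legacy ANSI subtitle file differently there.

```csharp
using System;
using System.IO;
using System.Text;

public static class SrtEncoding
{
    // Read and write with the SAME explicit encoding so accented characters survive.
    // 'transform' is where the time shifting would happen.
    public static void Rewrite(string inputPath, string outputPath, Encoding encoding,
                               Func<string, string> transform)
    {
        string content = File.ReadAllText(inputPath, encoding);
        File.WriteAllText(outputPath, transform(content), encoding);
    }
}
```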

Besides that, the program worked correctly the first time I ran it.

History

  • 1.0: Initial version
  • 1.1: Fixed some typos and added some external links

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Software Developer (Senior)
Brazil
MCAD, MCSD, MCDBA, MCPD, MCTS
Brazilian developer and freelancer. Currently interested in programming frameworks, code generation and development productivity.

Comments and Discussions

 
Question: Read Specific Line of Srt file With İndex
Umut Comlekcioglu, 5-May-15 20:56
Suggestion: Encoding
Ivandro Ismael, 15-Jan-14 21:12
Question: Great Article
Ha-Ahmadi, 9-Aug-13 6:58
Answer: Re: Great Article
Ivandro Ismael, 15-Jan-14 21:14
General: Misc note, regarding decimal separator
_groo_, 26-Jan-09 21:15
Answer: Re: Misc note, regarding decimal separator
RubensFarias, 27-Jan-09 11:39
Hi Vekipeki, thanks for your comment!

I live in Brazil (pt-br), but my regional settings are en-us. Still, your reply made me think, so I ran two tests:

C#
DateTime.Parse("20:24:53.3552", new CultureInfo("en-us"));
DateTime.Parse("20:24:53.3552", new CultureInfo("pt-br"));
and

C#
DateTime.Parse("20:24:53,3552", new CultureInfo("en-us"));
DateTime.Parse("20:24:53,3552", new CultureInfo("pt-br"));
In the first block, both lines ran without problems, but both lines in the second block threw a FormatException. So I conclude that the correct milliseconds separator is ".", whatever the current regional/culture settings.

What do you think? Are there other tests I should run?

Best regards,

Rubens
General: Re: Misc note, regarding decimal separator
_groo_, 7-Apr-09 3:15
General: Re: Misc note, regarding decimal separator
RubensFarias, 7-Apr-09 13:22
