In this guide I am going to show you how to rip (extract) subtitles and their timings from video files (VOB) or DVDs and save them as a plain text file, the .srt file. This way you get selectable subtitle files that you can use alongside a ripped DivX/XviD movie (which you legally own) and have your media player format them to your needs. Moreover, you can merge them into containers like mkv so that you have audio, video and subtitles in a single file.

Until now, doing this in Linux meant using several different CLI (Command Line Interface, e.g. the Linux console) programs and typing many commands. Here, however, I will use OGMRip, which lets us extract the subtitles easily. Its main purpose is to rip and encode DVDs into avi, ogm, mp4 or matroska files, but with a little "trick" we will rip just the subtitles without having to rip the movie as well.

OGMRip can be used with three different OCR (Optical Character Recognition) engines: gocr, ocrad or tesseract. Which one you get depends on how you configure and compile OGMRip, or on how it has been compiled and packaged for the Linux distribution you use. I have tested OGMRip in Arch Linux and Ubuntu. In Arch Linux it is packaged with tesseract. Tesseract currently supports English, French, Italian, German, Spanish and Dutch; if you are really patient, you can train tesseract to support your language too. In Ubuntu, OGMRip is packaged with ocrad support; to install it, run Add/Remove Applications and search for it in "All Available Applications".

Another advantage of OGMRip is that you don't need to have the DVD ripped to VOB files on your hard disk drive: subtitles can be extracted directly from the DVD disc. So, enough talking. Let's move on to the tutorial part.

Once you have installed OGMRip, fire it up. This is the program's main window.

1.png

A few words about the configuration first. Click Edit -> Preferences. In the General tab you can choose your Preferred Language, which will be selected automatically when you load a DVD so that you don't have to do this manually each time.


2.png

Next, the Advanced tab. Here select the Temporary Path, i.e. the directory in which the ripped files will be temporarily stored. I have /tmp selected. We are going to need this later.


3.png

Now press the Close button and click Edit -> Profiles. Select the first available profile, DivX for Standalone Player and click the Edit button.

4.png

Go to the Subtitles tab and make sure that SRT text is the selected codec. Forced subtitles are the subtitles shown when only part of a movie is spoken in another language. For example, if you have an English movie and at some point someone speaks in Japanese, the English subtitles that appear on the screen are the forced ones. If you want to extract only those subtitles, check this option; usually you won't need it. In Text Options select the UTF-8 character set, or the one for your language. In the End of line option select the Unix entry (line feed only) if you plan to use the subtitle file in Linux, or Carriage return + Line feed (DOS) if you plan to use it in Windows. This way, when you open the .srt file with a text editor, the line breaks will be displayed properly. Now press the Close button.


5.png
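If the .srt file later ends up on the other operating system, its line endings can be converted without ripping again. Below is a minimal sketch, not part of OGMRip: the function names srt_to_unix and srt_to_dos are made up for illustration, and it only assumes that the standard tr and awk tools are installed.

```shell
#!/bin/sh
# Hypothetical helpers (not provided by OGMRip) for switching an .srt
# file between Unix (LF) and DOS/Windows (CR+LF) line endings.

# Usage: srt_to_unix INPUT OUTPUT
# Deletes every carriage return, so CR+LF becomes a plain LF.
srt_to_unix() {
    tr -d '\r' < "$1" > "$2"
}

# Usage: srt_to_dos INPUT OUTPUT
# Drops a trailing CR if one is already there, then ends each line
# with CR+LF, so converting a DOS file again does not double the CRs.
srt_to_dos() {
    awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1" > "$2"
}
```

For example, srt_to_dos ~/subtitle.srt ~/subtitle-windows.srt writes a Windows-friendly copy and leaves the original untouched. Input and output must be different files, because the output is opened for writing before the input has been read.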

Now let's load our DVD. Click File and select either Load to load a DVD directly from your DVD drive, or Open to open a local directory containing the ripped VIDEO_TS and AUDIO_TS directories. Select the Chapter you want to extract the subtitles from and make sure that the language you want is selected in Subtitles. The main movie is almost always the chapter with the longest duration. Once you are ready, press the Extract button.


6.png

In the Options window just select the Profile we set up before, "DivX for Standalone Player", and press the Extract button again.

7.png

Now a progress bar will appear with the title "Extracting subtitle stream 1". Pay a little attention here: when this operation completes, OGMRip will continue by extracting the audio stream. At that point press the Suspend button to pause the whole operation.

8.png

9.png

Now open a terminal and type:

cp /tmp/subp.* ~/subtitle.srt

With this command we have copied the .srt file from the temporary location /tmp to our home directory and named it subtitle.srt. Of course, you could open Nautilus, Dolphin, Konqueror or whatever file manager you use, go to /tmp, find the subp.* file, copy it to your home directory and rename it. However, the command is much faster, don't you think? Once you have copied the file, press the Cancel button and all temporary files will be deleted.
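One caveat: the cp command above only works when exactly one file in /tmp matches subp.*. If several are lying around, for instance from an earlier attempt, cp will refuse to copy them all into a single file. As a hedged sketch, the hypothetical helper below (my own naming, not something OGMRip provides) copies only the most recently modified match. It relies on the -nt file test, which is not strictly POSIX but is available in bash, dash and most other shells.

```shell
#!/bin/sh
# Hypothetical helper: copy the newest subp.* file from a temporary
# directory to a destination of your choice.
# Usage: copy_latest_subp TMPDIR DESTINATION
copy_latest_subp() {
    tmpdir=$1
    dest=$2
    newest=

    for f in "$tmpdir"/subp.*; do
        # Skips the unexpanded pattern when nothing matched, and
        # anything that is not a regular file.
        [ -f "$f" ] || continue
        # Remember this file if it is the first or newer than the best so far.
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done

    if [ -z "$newest" ]; then
        echo "no subp.* file found in $tmpdir" >&2
        return 1
    fi

    cp -- "$newest" "$dest"
}
```

With the settings used in this guide it would be called as copy_latest_subp /tmp ~/subtitle.srt while OGMRip is still suspended.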

And voila, your .srt file is ready in only a few seconds! Pretty simple! Now you can open it with a text editor such as gedit, kwrite, kate or OpenOffice Writer and correct the mistakes of the optical recognition. I hope that as the OCR programs develop, the recognition will become better. For example, I noticed that there were some problems with the letters a and o, or h and n, but this can be easily edited and fixed in the .srt text file!
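Before proofreading, it can help to see only the recognized text, without the cue numbers and timings. The following is a minimal sketch with a hypothetical helper name (srt_text_lines). It assumes the usual .srt layout: a numeric index line, a timing line containing -->, the subtitle text, and a blank line between entries.

```shell
#!/bin/sh
# Hypothetical helper: print only the subtitle text of an .srt file,
# each line prefixed with its line number in the file so that it is
# easy to find again in a text editor.
# Caveat: a subtitle line made up of digits only (e.g. "1984") looks
# like an index line to this simple filter and is skipped as well.
# Usage: srt_text_lines FILE
srt_text_lines() {
    awk '
        { sub(/\r$/, "") }        # tolerate DOS line endings
        /^[0-9]+$/ { next }       # index line
        /-->/      { next }       # timing line
        /^$/       { next }       # blank separator
        { printf "%d: %s\n", NR, $0 }
    ' "$1"
}
```

For instance, srt_text_lines ~/subtitle.srt | less lets you skim the text for typical OCR slips such as a/o or h/n and then jump to the reported line numbers in your editor.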

 

Comments (16)

Nick H.
Not to be rude, but did you actually try this using English? I recently experimented with ogmrip (ver. 10 & 11 on an older computer and a recent svn on a laptop), and in many cases the subtitles could best be described as gibberish. This doesn't seem to be an isolated incident, as I found other complaints concerning ogmrip subtitles on the web and in the sourceforge forums of the project itself. As a free project (especially in an alpha stage), ogmrip seems to be fine as a multimedia converter, but the subtitling just seems extremely poor, especially compared to other projects like Handbrake or even AcidRip. Finally, it just appears reckless to recommend this project for its subtitling ability.
Did you try this in English? , February 14, 2009
OCR
The OCR issue comes from tesseract, which OGMRip uses. Tesseract is not as good as the commercial OCR software, but it seems to be among the better FOSS ones.
The issue with tesseract is that one needs to "teach" it first, which is a tedious process (see google for wiki on training tesseract). A person with scripting abilities can partially automate this.
I was able to train it for Czech characters (not even officially supported by tesseract) for DVD vobsub ripping, and the major problem was that occasionally a few words were not separated by spaces. Otherwise, good work.
MK , April 21, 2009
...
axel
I haven't tried training tesseract, MK, but it's good to know that you have almost successfully achieved it. Since tesseract is still being developed, I believe it's a matter of time till it becomes a good OCR utility. :)
axel , April 21, 2009
if you're having problems with english subtitles
@All Ubuntu Users
If you're trying to rip subtitles in Ubuntu with ogmrip, it's important that you install the English tesseract package. For some reason the default language package for tesseract is German, so any time you install tesseract-ocr it will also install tesseract-ocr-deu. In order to rip English subtitles with ogmrip, install the following packages.

sudo apt-get install ogmrip tesseract-ocr-eng
buntu , June 24, 2009
...
axel
Thanks for mentioning this, buntu. :)
axel , June 25, 2009
...
It just got easier! The developer of ogmrip has kindly added support for encoding without video streams. Instead of having to pause the encoding at a certain point and copying the srt file from your temp directory you can simply select the subtitle stream and disable the video and audio streams.

Steps to rip to srt file:
1. Download the latest ogmrip trunk.
http://ogmrip.svn.sourceforge....z?view=tar
2. Compile and install.
3. Create a new profile.
4. Select mkv for the container.
5. Select "No video" for the video codec.
6. Select Vorbis for the audio codec (doesn't really matter which one).
7. Load the dvd in ogmrip and select the title to extract.
8. Select "no audio" for the audio stream.
9. Click extract to create a mkv file with a single subtitle stream.
10. Install mkvtoolnix
sudo apt-get install mkvtoolnix
11. Go to the folder containing the mkv file and type the following:
mkvextract tracks "NameOfFile.mkv" 1:"NameOfSubtitles.srt"

That's it! Now if I could only get the dirac plugin to work it would be perfect. For some reason every plugin that I've tried to compile and install is always disabled by ogmrip on startup. :(
buntu , July 27, 2009
...
axel
Thank you for the info buntu! That's great!

You can also try Avidemux for ripping the subtitles from a DVD. Here is a guide I wrote:

How to extract DVD subtitles to .srt using Avidemux
axel , July 29, 2009
...
Thanks for posting the link. I'll give avidemux a try.

Do you know if it's possible to use ocropus with either ogmrip or avidemux to perform the ocr processing?

http://code.google.com/p/ocropus/
buntu , July 29, 2009
...
axel
Sorry buntu, I have no idea. If you try Avidemux you will see that at first you train the program, and after a while it does all the OCR automatically. It's good. ;)
axel , July 30, 2009
...
kjetilbmoe
@buntu: In my experience the available plugins vary depending on which OCR software is available. If you have only tesseract, you might try to install gocr or ocrad. At least this made a difference to my SVN compile.

However, now it doesn't seem that the "no video" option is any longer available :/
kjetilbmoe , January 05, 2010
...
axel
kjetilbmoe, I would suggest you read this: How to extract DVD subtitles to .srt using Avidemux. It's a much better guide. :)
axel , January 10, 2010
dvb subtitles
Hi
Does the program work with DVB subtitles taken from television recordings?
Thanks
david , May 07, 2010
...
axel
I'm sorry, david, but I don't know. :(
axel , May 08, 2010
.srt to DVD
All guides are about how to extract subtitles from a DVD. But how do you add a custom language to a DVD?
djevlen , October 28, 2010
...
axel
djevlen, do you want to create a DVD with your own video and add subtitles to it, or do you want to add subtitles to a DVD you already have?
axel , October 30, 2010
Gibberish Subtitles
I tried the mkvtoolnix method to extract Simplified Chinese subtitles, and the timing seems to be correct, but the characters came out as gibberish in VLC Player. Please help!
Rykel , September 22, 2011
