In this guide I will show you how to rip (extract) subtitles and their timings from video files (VOB) or DVDs and save them as a plain-text .srt file. This gives you selectable subtitles that you can use alongside a ripped DivX/XviD movie (which you legally own) and that your media player can format to your needs. You can also merge them into a container such as mkv, so that audio, video and subtitles live in a single file.
Until now, doing this on Linux meant using several CLI (command-line interface, e.g. the Linux console) programs and typing many commands. Here I will use OGMRip instead, which lets us extract the subtitles easily. Its main purpose is to rip and encode DVDs into avi, ogm, mp4 or matroska files, but with a little "trick" we can rip just the subtitles without having to rip the movie as well.
OGMRip can be used with 3 different OCR (Optical Character Recognition) engines: gocr, ocrad or tesseract. Which one you get depends on how you configure and compile it, or on how it has been compiled and packaged for your Linux distribution. I have tested OGMRip in Arch Linux and Ubuntu. In Arch Linux it is packaged with tesseract. Tesseract currently supports English, French, Italian, German, Spanish and Dutch; if you are really patient, you can train it to support your language too. In Ubuntu OGMRip is packaged with ocrad support; to install it, run Add/Remove Applications and search for it in "All Available Applications". Another advantage of OGMRip is that you don't need to have the DVD ripped to VOB files on your hard disk drive: subtitles can be extracted directly from the DVD disc. So enough with the talking. Let's move to the tutorial part.
Once you have installed OGMRip fire it up. This is the program's main window.



Now press the Close button and click Edit -> Profiles. Select the first available profile, "DivX for Standalone Player", and click the Edit button.



In the Options window, select the profile we set up before, "DivX for Standalone Player", and press the Extract button again.



Now open a terminal and type:
cp /tmp/subp.* ~/subtitle.srt
With this command we have copied the .srt file from the temporary location /tmp into our user's home directory and named it subtitle.srt. Of course you can open Nautilus, Dolphin, Konqueror or any other file manager, go to /tmp, find the subp.* file, copy it to your home directory and rename it. However, the command is much faster, don't you think? Once you have copied the file, press the Cancel button and all temporary files will be deleted.
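One caveat: if more than one file in /tmp matches subp.*, the cp command above stops with an error, because cp then expects its last argument to be a directory. A minimal sketch that copies only the newest match, assuming the temporary files follow the subp.* naming used in this guide (the function name copy_newest_subp is hypothetical):

```shell
#!/bin/sh
# copy_newest_subp DIR DEST
# Copies the most recently modified file matching DIR/subp.* to DEST.
# Hypothetical helper; the subp.* pattern is the one used in this guide.
copy_newest_subp() {
    dir=$1
    dest=$2
    # ls -t sorts by modification time, newest first; -d lists matching
    # directories themselves instead of their contents.
    newest=$(ls -td "$dir"/subp.* 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no subp.* file found in $dir" >&2
        return 1
    fi
    cp -- "$newest" "$dest"
}
```

It could be called as copy_newest_subp /tmp ~/subtitle.srt while OGMRip is still open.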
And voila, your .srt file is ready in only a few seconds! Pretty simple! Now you can open it with a text editor such as gedit, kwrite or kate (or with OpenOffice Writer) and correct the mistakes of the optical recognition. I hope that recognition will improve as the OCR programs develop. For example, I noticed some confusion between the letters a and o, and between h and n, but this can easily be fixed by editing the .srt text file!
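When hunting for OCR mistakes, it can help to look at the dialogue text alone. A minimal sketch, assuming the usual .srt layout (a cue number, a timing line containing " --> ", then the text lines); the function name srt_text is hypothetical:

```shell
#!/bin/sh
# srt_text FILE
# Prints only the dialogue lines of an .srt file.
# Hypothetical helper. Limitation: a dialogue line made up of digits
# only (for example a year) is skipped as if it were a cue number.
srt_text() {
    # tr strips carriage returns in case the file has Windows line endings.
    tr -d '\r' < "$1" | awk '
        /^[0-9]+$/ { next }   # cue number
        / --> /    { next }   # timing line
        NF         { print }  # any remaining non-empty line is dialogue
    '
}
```

The output can then be read through quickly or, assuming aspell is installed, piped into a spell checker: srt_text ~/subtitle.srt | aspell list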
Comments (16)
Nick H.
OCR
The issue with tesseract is that one needs to "teach" it first, which is a tedious process (search Google for the wiki on training tesseract). A person with scripting abilities can partially automate this.
I was able to train it for Czech characters (not even officially supported by tesseract) for DVD vobsub ripping, and the major problem was that occasionally a few words were not separated by spaces. Otherwise, good work.
...
If you're having problems with English subtitles
If you're trying to rip subtitles in Ubuntu with ogmrip, it's important that you install the English tesseract package. For some reason the default language package for tesseract is German, so any time you install tesseract-ocr it will also install tesseract-ocr-deu. To rip English subtitles with ogmrip, install the following packages:
sudo apt-get install ogmrip tesseract-ocr-eng
...
Steps to rip to srt file:
1. Download the latest ogmrip trunk.
http://ogmrip.svn.sourceforge....z?view=tar
2. Compile and install.
3. Create a new profile.
4. Select mkv for the container.
5. Select "No video" for the video codec.
6. Select Vorbis for the audio codec (doesn't really matter which one).
7. Load the dvd in ogmrip and select the title to extract.
8. Select "no audio" for the audio stream.
9. Click extract to create a mkv file with a single subtitle stream.
10. Install mkvtoolnix
sudo apt-get install mkvtoolnix
11. Go to the folder containing the mkv file and type the following:
mkvextract tracks "NameOfFile.mkv" 1:"NameOfSubtitles.srt"
That's it! Now if I could only get the dirac plugin to work it would be perfect. For some reason every plugin that I've tried to compile and install is always disabled by ogmrip on startup.
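A note on the mkvextract command in step 11: the subtitle track ID is not necessarily 1 in every file or mkvtoolnix version, so it may be worth checking first with mkvmerge -i. A minimal sketch, assuming the identification output contains lines of the form "Track ID N: subtitles (...)"; the function name subtitle_track_ids is hypothetical:

```shell
#!/bin/sh
# subtitle_track_ids
# Reads the output of `mkvmerge -i FILE` on standard input and prints
# the ID of every subtitle track, one per line.
# Hypothetical helper; assumes lines like "Track ID 1: subtitles (...)".
subtitle_track_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles.*/\1/p'
}
```

It could be used as mkvmerge -i "NameOfFile.mkv" | subtitle_track_ids, and the printed number then takes the place of the 1 in the mkvextract command.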
...
You can also try Avidemux for ripping the subtitles from a DVD. Here is a guide I wrote:
How to extract DVD subtitles to .srt using Avidemux
...
Do you know if it's possible to use ocropus with either ogmrip or avidemux to perform the ocr processing?
http://code.google.com/p/ocropus/
...
However, it now seems that the "no video" option is no longer available :/
...
.srt to DVD