Before you read this post, take a look at Paul Filkin’s excellent post on this topic:
Trados’ default “Catch all” regex worked fine for us until now. But when an Excel cell contains more complex HTML code, the Trados editor unfortunately starts to look like this:
This is no fun to work with. So I had to come up with something a bit more detailed and specify all HTML tags manually. Paul used the tag names <p> and </p> in his example, but this falls short when the tags contain class names or styles like so:
<p class="ms-rtePosition-1" style="margin:0px 10px 0px 0px;" />
In this case, you need a regular expression that’s a bit more complex:
Start tag: <p(>|\s+[^>]*>)
End tag: </p>
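This will catch tags like the one in the example above. The part in parentheses is an OR statement: the opening <p must be followed either directly by >, which gives a plain <p>, or by at least one whitespace character and then anything up to the closing >, which gives a <p> with attributes. Because either > or whitespace has to come right after the p, tags like <pre> will not fall into this regex.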
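If you want to sanity-check the pattern outside of Trados first, here is a minimal sketch in Python. A few assumptions: as far as I know Trados Studio runs on the .NET regex engine rather than Python's, but this pattern only uses syntax that both engines share; the helper name matches_start_tag and the sample tags are made up for illustration and are not part of the settings file.

```python
import re

# Start and end tag patterns exactly as entered in the Trados settings.
START_TAG = re.compile(r"<p(>|\s+[^>]*>)")
END_TAG = re.compile(r"</p>")


def matches_start_tag(text):
    """Return True if the whole string is matched by the start tag pattern."""
    return START_TAG.fullmatch(text) is not None


# Hypothetical sample tags: the first two should match, the last two should not.
samples = [
    "<p>",
    '<p class="ms-rtePosition-1" style="margin:0px 10px 0px 0px;" />',
    "<pre>",
    "<param name='x'>",
]

for tag in samples:
    verdict = "match" if matches_start_tag(tag) else "no match"
    print(f"{verdict:>8}  {tag}")
```

The check uses fullmatch so that each sample is tested as a single tag. Inside a longer cell value you would use search or finditer instead, which is closer to how the tags are picked out of running text.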
I’ve added all the HTML tags that I need for the project and uploaded the settings file here.
That should save you from going through the same process and setting up everything manually – simply download and import the settings by going to Project Settings -> File Settings in SDL Trados Studio.
If you’d like to build the settings yourself, or you’re simply interested in the HTML tags I’ve already built, here is a screenshot: