The EPO Patent Translate service, powered by Google translation engines, has been one of the greatest boons of all time for prior art research. With the adoption of neural machine translation last year, the nearly instantaneous machine translations have become far more readable and have all but done away with the “gobbledygook” phenomenon that spoiled earlier machine translations of Asian language patents.
The system provides output suitable for rapid preliminary assessment of the potential relevance of foreign prior art, which allows attorneys to review far more documents than in the past, and reduces translation costs. When making decisions based on this output, however, it is important to keep in mind the risks inherent in the technology.
If you are interested in translation evaluation and risk management in general, you might want to read the post that I wrote on that topic. In this post, I’ll limit myself to describing the types of problems that I most often see in Patenttranslate output. I should point out that it has been nearly 30 years since I worked in machine translation development and I have no specific knowledge of the configuration of the Patenttranslate engine. These are the observations of a humble translator, and there is no question that an expert in modern Machine Translation (MT) systems would have more insight than I do.
I clicked on the Patenttranslate icon for JP 2000-178590 A and found the first four paragraphs to be, by and large, faithful to the original. The fifth paragraph read as follows:
DISCLOSURE OF THE INVENTION The present invention provides a washing method characterized by blending an N-methyl taurine alkali metal salt of N-acylmethyltaurine represented by the following general formula (1) as an essential component An agent composition is provided.
(1) (wherein R represents a saturated or unsaturated hydrocarbon having 7 to 23 carbon atoms, R is a hydrogen atom or a methyl group, Group, and X represents an alkali metal.)
A more exact translation would be:
[Means for Solving the Problems] That is to say, the present invention provides a detergent composition characterized by blending an alkali metal N-methyltaurate salt of N-acylmethyltaurine represented by the following general formula (1) as an essential component.
(In the formula, R denotes a saturated or unsaturated C7-23 hydrocarbon group and X denotes an alkali metal.)
For reference, the Japanese original is:
The first problem that we see is what I will call replacement. The Japanese heading, “Means for Solving the Problems,” has been replaced by “DISCLOSURE OF THE INVENTION,” a US heading, which is typically found in a similar location in US patents. As this is just a heading, no particular damage is done, but in systems such as Patenttranslate, which rely heavily on the text in similar patent applications that have been filed in multiple languages, one must always be aware that the output is not necessarily faithful to the wording used in the source, but rather often reflects the wording most commonly found in similar English documents. This pattern is in no way restricted to headings.
The next problem might be called assumption. The MT output says that the invention “provides a washing method,” when the Japanese source text actually says “provides a detergent composition.” In this case, the output makes it look like the system “assumes” that a method is described and adds what it sees as a missing word, perhaps because the body of the claim is written using a verb other than the usual ones (comprising, having, etc.), which is certainly a linguistic pattern commonly associated with method claims. The result is a phrase that is both probable and readable. It is possible that a “washing method characterized by blending” is, in the judgement of the system, more likely than a “detergent composition characterized by blending.” It’s not hard to understand the neural system’s point of view, but in this case the assumption is wrong, and the impact in terms of what the publication discloses is potentially significant.
Assumption is part of general smoothing, in which the program seeks to produce output that matches the conventions of the target language. This smoothing is very useful in making the output more readable, and so is seen as a feature rather than a bug, but it can present problems when the issue at hand is the exact and complete disclosure in the original document.
Next up is omission. In this case, it is obvious, even without looking at the source text, that the formula has been left out. Items in lists, relative clauses, numerical limitations, and even negating phrases can be found to be omitted from Patenttranslate output. The probability of omission tends to rise with sentence length. This problem is particularly disadvantageous for patent practitioners, as it makes it hard to know what is and is not actually disclosed.
We can also easily spot the problem of scrambling, where the MT output reads, “… An agent composition is provided.” Scrambling was a common problem in earlier, phrase-based MT engines and has been greatly ameliorated by neural MT. The phrase found here appears to be a scrap that was left over when the system incorrectly parsed the sentence on the assumption that what was being described was a method, rather than a composition. Notice that it starts with a capital letter. In Google Translate, capital letters in the middle of a sentence are often a sign of scrambling.
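The capitalization clue lends itself to a rough automated check. The following is a minimal Python sketch, not part of Patenttranslate or any EPO tool. The function name `midsentence_capitals`, the tokenization rules, and the choice to treat ":" and "(" as sentence openers are all assumptions made for illustration. It simply lists plain title-case words that appear anywhere other than the start of a sentence, using the MT output quoted above as input.

```python
import re

# Punctuation after which a capitalized word is unremarkable.
# Treating ":" and "(" as sentence openers is an assumption of this sketch.
SENTENCE_OPENERS = {".", "!", "?", ":", "("}


def midsentence_capitals(text, allow=frozenset()):
    """Return title-case words that appear somewhere other than a sentence start."""
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*|[.!?:(]", text)
    flagged = []
    at_start = True
    for tok in tokens:
        if tok in SENTENCE_OPENERS:
            at_start = True
            continue
        # Only plain title-case words count: "An" or "Group", but not single
        # letters ("R"), all-caps words, or chemical names such as "N-methyl".
        is_title = (
            len(tok) > 1
            and tok[0].isupper()
            and tok[1:].isalpha()
            and tok[1:].islower()
        )
        if is_title and not at_start and tok not in allow:
            flagged.append(tok)
        at_start = False
    return flagged


mt_sentence = (
    "The present invention provides a washing method characterized by "
    "blending an N-methyl taurine alkali metal salt of N-acylmethyltaurine "
    "represented by the following general formula (1) as an essential "
    "component An agent composition is provided."
)
mt_formula_note = (
    "(1) (wherein R represents a saturated or unsaturated hydrocarbon having "
    "7 to 23 carbon atoms, R is a hydrogen atom or a methyl group, Group, "
    "and X represents an alkali metal.)"
)

print(midsentence_capitals(mt_sentence))      # ['An']
print(midsentence_capitals(mt_formula_note))  # ['Group']
```

Proper nouns and all-caps headings run into the body text will produce false positives, so the optional `allow` set is there for names you expect to see. A flagged word is only a prompt to look at the source, not evidence of an error.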
The last major problem we see in this sentence is false inclusion. This issue was sometimes seen in earlier statistically-based MT engines, and seems to be more common in neural engines. Bits of text that may have come from another, similar document, but are not found in the document at hand, can be included in the output, drastically changing the disclosure. In this case, the output reads, “R represents a saturated or unsaturated hydrocarbon having 7 to 23 carbon atoms, R is a hydrogen atom or a methyl group.” It would not be unreasonable for a reader to conclude that the document offers two options for R. There is, however, no mention of such a second option in the original Japanese text. This same error occurs in three locations in the MT output for this patent. Fortunately, in each instance, the telltale capitalization of the following word gives us a clue to the system’s confusion, but it should be noted that false inclusion often occurs without anything to give it away.
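In this particular example, the false inclusion shows up as a second, conflicting definition of the same symbol, and that much can be screened for mechanically. The sketch below is a hypothetical Python helper, not a feature of any existing tool. The names `symbol_definitions` and `repeated_symbols`, the list of defining verbs, and the assumption that symbols are a single capital letter with an optional digit are all illustrative choices. It collects phrases of the form "R represents ..." or "R is ..." and reports symbols that are defined more than once.

```python
import re
from collections import defaultdict

# Verbs assumed to introduce a definition; extend as needed.
VERBS = r"(?:represents|denotes|stands for|is)"

# A symbol (one capital letter, optional digit), a defining verb, then the
# definition up to the next delimiter or the next "and <symbol> <verb>".
PATTERN = re.compile(
    rf"\b([A-Z]\d?)\s+{VERBS}\s+(.+?)"
    rf"(?=[,.;)]|\s+and\s+[A-Z]\d?\s+{VERBS}\b|$)"
)


def symbol_definitions(text):
    """Map each symbol to the list of definitions found for it."""
    defs = defaultdict(list)
    for match in PATTERN.finditer(text):
        defs[match.group(1)].append(match.group(2).strip())
    return dict(defs)


def repeated_symbols(text):
    """Return symbols that are defined more than once, sorted alphabetically."""
    return sorted(s for s, d in symbol_definitions(text).items() if len(d) > 1)


mt_note = (
    "(wherein R represents a saturated or unsaturated hydrocarbon having "
    "7 to 23 carbon atoms, R is a hydrogen atom or a methyl group, Group, "
    "and X represents an alkali metal.)"
)
human_note = (
    "(In the formula, R denotes a saturated or unsaturated C7-23 hydrocarbon "
    "group and X denotes an alkali metal.)"
)

print(repeated_symbols(mt_note))     # ['R']
print(repeated_symbols(human_note))  # []
```

A repeated symbol is not proof of false inclusion, since a source text can legitimately restate or narrow a definition, and an empty result proves nothing, because inserted text often leaves no such trace. The check only helps decide which passages to compare against the original first.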
None of this invalidates the exceptional utility of Patenttranslate as a tool for gaining fast insight into the content of a document written in another language. As the EPO website says, “The machine translation should give you the gist of any patent or patent-related document, and help you to determine whether it is relevant. You might decide on this basis whether you need to invest in a human translation of the document.” In making this evaluation, keep in mind the possibility of replacement, assumption, smoothing, omission, scrambling, and false inclusion.