Foreign Language Pitfalls in Electronic Discovery

  • Blog Post
  • Posted on 22 July 2014

Foreign language documents can add a layer of complexity and expense to the already complex and expensive process of electronic discovery. However, with good planning and effective use of technology, most of the potential headaches regarding foreign language documents can be avoided.

Written by Angus Withey, Solutions Consultant at Law In Order

So why not just deal with these documents when and if we find them? Maybe we’ll get lucky and there won’t be many foreign language documents?

In my experience, failing to plan around foreign language documents is one of the most costly and disruptive mistakes that can be made in an electronic discovery project. Here are a few good reasons why lawyers and others involved in legal data projects should devote some time to thinking about foreign languages early in the discovery process.

Firstly, not all discovery technology and processes handle foreign languages well. A simple example of this is text encoding in electronic files. The character encoding system maps a particular character to a number that is stored in the file, and then maps back from that number to the character when the content of that file is displayed. Where things get interesting is that there are many character encoding systems in use. Different systems can contain different characters, or contain the same character but mapped to different numbers.

For example, the character É (capital E with an acute accent) maps to number 144 in one common Extended ASCII system (IBM code page 437), but maps to 201 in the Unicode system, and isn’t present at all in the Standard ASCII system. If you were to open a Unicode-encoded text file containing this character using an ASCII-based encoding, you would likely get an incorrect or placeholder symbol in its place. Processing documents using the wrong encoding system can lead to garbled and unintelligible text, and may cause keyword searches to miss words that are present in the documents.
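The mismatch described above can be reproduced in a few lines of Python. This is only a sketch: it takes code page 437 as one common "Extended ASCII" variant (an assumption, since the post does not name one), and the sample text is made up for illustration.

```python
# Illustrative sketch: "cp437" stands in for "Extended ASCII" (an assumption).
ch = "É"

code_point = ord(ch)               # the character's number in Unicode
cp437_bytes = ch.encode("cp437")   # the byte used for it in code page 437

print(code_point)       # 201
print(cp437_bytes[0])   # 144

# Standard ASCII has no slot for this character at all.
try:
    ch.encode("ascii")
    in_ascii = True
except UnicodeEncodeError:
    in_ascii = False
print(in_ascii)         # False

# Store the text with one encoding, then read it back with the wrong one.
original = "RÉSUMÉ attached"       # hypothetical document text
garbled = original.encode("utf-8").decode("cp437")

# The keyword is in the original but can no longer be found in the garbled text.
print("RÉSUMÉ" in original)   # True
print("RÉSUMÉ" in garbled)    # False
```

The last two lines show the practical risk: the word is in the document, but a search run over wrongly decoded text will not find it.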

Incompatible and inconsistent encoding systems can cause problems when dealing with any electronic data. However, these problems are far more prevalent when you are dealing with data in a mix of languages, as the use of particular encoding systems tends to be reasonably standard within a particular linguistic or geographic group. It is important that documents are collected, processed and presented in a way that correctly preserves the character data and maps characters to an intelligible format wherever possible.

Even if the workflow used to collect and process the documents avoids any pitfalls associated with text encoding, it is still easy to make mistakes if you are not thinking and planning around foreign languages throughout a project.

An example of this is keyword searching. If you search for the term “non disclosure agreement” in a case regarding your Australian client’s joint project with a German company, you may well miss documents in German discussing the “Geheimhaltungsvereinbarung”. If you are aware that there are a significant number of German documents in your data set, it may be worth translating your search terms and running them alongside the English terms. Failure to do so could result in responsive documents being excluded from review and disclosure before anyone has even looked at them.
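As a rough illustration of running translated terms alongside the English ones, the sketch below uses simple case-insensitive substring matching. The document IDs, sample texts and the `find_hits` helper are all hypothetical, and a real review platform's search will handle stemming, phrase variants and compound words very differently.

```python
# Hypothetical search terms, grouped by language.
SEARCH_TERMS = {
    "en": ["non disclosure agreement"],
    "de": ["Geheimhaltungsvereinbarung"],
}

# Hypothetical documents, keyed by document ID.
documents = {
    "DOC-001": "Please sign the non disclosure agreement before the meeting.",
    "DOC-002": "Die Geheimhaltungsvereinbarung wurde gestern unterzeichnet.",
    "DOC-003": "Lunch on Friday?",
}


def find_hits(docs, terms_by_language):
    """Return the IDs of documents containing any term, ignoring case."""
    all_terms = [
        term.casefold()
        for terms in terms_by_language.values()
        for term in terms
    ]
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(term in text.casefold() for term in all_terms)
    ]


english_only = find_hits(documents, {"en": SEARCH_TERMS["en"]})
both_languages = find_hits(documents, SEARCH_TERMS)

print(english_only)     # ['DOC-001']
print(both_languages)   # ['DOC-001', 'DOC-002']
```

With English terms only, the German document never reaches a reviewer. Adding the translated term brings it into the review set.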

Another risk with regard to disclosure lies in whether or not you will be required to translate documents. A good rule of thumb when estimating disclosure costs is that reviewing a document is about 10 times as expensive as the technical work performed on it, and translating a document is at least 10 times as expensive as reviewing it. If the protocol governing the disclosure of documents requires the producing party to translate any foreign language documents, this can dwarf all other costs associated with the disclosure.
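To see how quickly translation can dominate a budget, the rule of thumb can be turned into a back-of-the-envelope calculation. The document counts and the unit cost below are invented for illustration, and `estimate_costs` is a hypothetical helper, not a pricing model.

```python
def estimate_costs(total_docs, foreign_docs, technical_cost_per_doc=1,
                   review_multiplier=10, translation_multiplier=10):
    """Apply the 10x / 10x rule of thumb to a document set.

    Costs are in arbitrary units; the defaults are the rule-of-thumb
    multipliers, with translation taken at its minimum of 10x review.
    """
    review_cost_per_doc = technical_cost_per_doc * review_multiplier
    translation_cost_per_doc = review_cost_per_doc * translation_multiplier
    return {
        "technical": technical_cost_per_doc * total_docs,
        "review": review_cost_per_doc * total_docs,
        "translation": translation_cost_per_doc * foreign_docs,
    }


# Hypothetical matter: 10,000 documents, 1,500 of them in a foreign language.
costs = estimate_costs(total_docs=10_000, foreign_docs=1_500)

print(costs["technical"])     # 10000
print(costs["review"])        # 100000
print(costs["translation"])   # 150000
```

In this made-up example, translating 15% of the documents costs more than processing and reviewing the entire set.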

One thing is common to all of the risks I’ve outlined above: if you are aware of your exposure to these risks early enough, you have options available to cost-effectively mitigate them. If you only discover that you have a significant number of foreign language documents once you are halfway through your review, you have far fewer alternatives available and may blow previously agreed deadlines or budgets dealing with them.

So how do you find out if you have foreign language documents in your data set? And how many? Containing which languages?

The first step is to ask! If possible and appropriate, interview the actual custodians of the data and find out what languages they use and have encountered in the data you are collecting. This will give you a broad idea of which foreign languages you may have amongst your data. Interview your client’s IT staff with the assistance of an electronic discovery professional - this can help you issue-spot any potential technical complications involving foreign language data.

For more granular information, the documents themselves can be analysed. Luckily, many contemporary electronic discovery processing tools contain language identification features. However, these features are often not enabled in standard processing workflows as they slow down the process - so if you suspect that a data set contains foreign languages, it is important to communicate this to whoever is processing the data. The results of language identification analysis can be hard to interpret. It is not a precise technology - so expect both false positives and false negatives amongst the results. But the results can give you a useful insight into the primary languages present in a document set and the proportion of documents they account for.
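To give a feel for the kind of output language identification produces, here is a deliberately naive sketch that guesses a language by counting common function words. It is not how any particular discovery tool works. The word lists, sample documents and helper names are all assumptions made for illustration, and real tools use far more robust methods.

```python
import re
from collections import Counter

# Tiny, hand-picked word lists (assumptions for illustration only).
STOPWORDS = {
    "English": {"the", "and", "of", "to", "is", "in", "that", "for"},
    "German": {"der", "die", "das", "und", "ist", "nicht", "mit", "wurde"},
    "French": {"le", "la", "les", "et", "est", "des", "une", "pour"},
}


def guess_language(text):
    """Guess a language by counting known function words in the text."""
    words = re.findall(r"[^\W\d_]+", text.casefold())
    scores = {
        language: sum(1 for word in words if word in stopwords)
        for language, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


def language_breakdown(docs):
    """Return the proportion of documents attributed to each language."""
    counts = Counter(guess_language(text) for text in docs.values())
    return {language: n / len(docs) for language, n in counts.items()}


# Hypothetical documents.
sample_docs = {
    "DOC-001": "Please sign the agreement and return it to the office.",
    "DOC-002": "Die Vereinbarung wurde gestern mit der Firma unterzeichnet.",
    "DOC-003": "Le contrat est prêt pour la signature.",
    "DOC-004": "12345",
}

breakdown = language_breakdown(sample_docs)
print(breakdown)
```

Even in this toy version, short or word-free documents fall into "Unknown", and a document that mixes languages would be attributed to only one of them. The same limits on precision apply, on a larger scale, to the results from real tools.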

Whether you choose to use language identification technology to analyse your data or just ask the right questions - get a handle on whether you have foreign language documents to review early in the discovery process. This gives you the opportunity to negotiate protocols advantageously, make smart decisions about workflows and more accurately budget for the cost associated with the review. The best practices around handling foreign language documents, the tools available to analyse them and the courts’ expectations about reasonable efforts are constantly changing. So seek help from trusted advisors when you are faced with an international case to make sure you are making the most of the options available.
