This ediscovery processing workflow outlines four steps that will help you reduce the volume of discovery data and streamline document review.
Between collecting and preserving electronically stored information (ESI) and reviewing and producing it, many people overlook the critical steps involved in processing the data. If you’re tempted to stay ignorant of the eDiscovery data processing workflow, you’ll miss opportunities to gain a better understanding of the collected files, to minimize the costs of review and production, and to streamline project logistics. Below are the steps to keep in mind for a successful eDiscovery processing workflow.
As a trainee lawyer, you understandably want to start reviewing case files and documents as soon as possible so you can start developing your legal strategy. You want to see who was talking to whom, what they were saying, and generally get a better high-level understanding of the people, places, and events involved in the matter. This Early Case Assessment (ECA) also helps your clients understand the legal risks and costs of the matter.
In today’s litigation world, a huge cost component revolves around the amount of data that needs to be collected, processed, reviewed, and produced. We refer to this as Early Data Assessment (EDA), where the data processing stage lets you understand the amount of data involved in the matter so you can better inform your client about the time, effort, and cost required.
Legal teams may prefer to use specialized ECA software to get the full benefits of the processing stage, especially when dealing with large volumes of data. For example, Nextpoint recently launched Data Mining, an Early Case Assessment software that offers all the tools you need to follow this ediscovery processing workflow and drill down into the data.
Step 1: Normalizing Data (there is no such thing as normal data!)
The first step in an ediscovery processing workflow is to “normalize” all collected data so that review is consistent and straightforward. For example, when you’re looking at a bunch of emails, Word documents, PDF files, images, audio recordings, spreadsheets, and more, you need to make sure you can read, see, and hear all those files in an accessible way. Sounds like it shouldn’t be complicated, but it’s important to accurately identify each file type so that all can be formatted correctly for your viewing pleasure.
For emails, we also need to make sure that all attachments are linked to their correct messages (what we call a parent/child relationship). More importantly, we need to “normalize” the time zones associated with all messages. If we processed all emails according to the time zone where you practice law, there is a chance you would find emails that appear to have been received before they were sent, which is obviously confusing (and even more confusing once we account for daylight saving time or international time zones). Because of this, we typically process everything in Coordinated Universal Time (UTC), and you’ll need to familiarize yourself with that format.
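To make the time zone point concrete, here is a minimal Python sketch, not a description of how any particular platform does it. It assumes each message carries a standard Date header with a UTC offset; the function name and both sample headers are invented for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def header_date_to_utc(date_header: str) -> datetime:
    """Parse an email Date header (with its offset) and convert it to UTC."""
    return parsedate_to_datetime(date_header).astimezone(timezone.utc)


# Hypothetical headers: a message sent from London and a reply sent from Chicago.
original = header_date_to_utc("Fri, 10 Mar 2023 14:00:00 +0000")
reply = header_date_to_utc("Fri, 10 Mar 2023 08:30:00 -0600")

# On local wall clocks the reply (08:30) seems to precede the original (14:00).
# In UTC the order is unambiguous: 14:00 first, then 14:30.
print(original.isoformat(), reply.isoformat())
```

Once every timestamp is expressed in UTC, sorting a conversation chronologically no longer depends on where each participant happened to be sitting.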
We also need to make sure that all zipped/compressed files are unzipped and properly listed. And we need to extract any embedded objects that may have been inserted into Microsoft Word documents or Excel spreadsheets. Another important step is to assign each file a “DocumentID” or control number so that the platform can provide analytics and audit trails. Note that this is NOT a Bates number, as those are typically assigned when you generate a production set.
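As a rough illustration of unzipping containers and assigning control numbers, the sketch below walks a zip archive held in memory and gives every file an ID that points back to its parent. The "DOC-" prefix, the record layout, and the function name are assumptions made for this example, not a real platform's numbering scheme.

```python
import io
import zipfile
from itertools import count


def expand_and_number(name, data, counter, parent_id=None, prefix="DOC"):
    """Assign a control number to a file and, if it is a zip, to each member."""
    doc_id = f"{prefix}-{next(counter):06d}"
    records = [{"doc_id": doc_id, "parent_id": parent_id, "name": name}]
    buffer = io.BytesIO(data)
    # Caveat: .docx and .xlsx files are also zip containers, so a real
    # pipeline would identify file types before deciding what to expand.
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for member in archive.infolist():
                if member.is_dir():
                    continue
                records.extend(
                    expand_and_number(
                        member.filename, archive.read(member), counter, doc_id, prefix
                    )
                )
    return records


# Build a small made-up archive in memory so the example is self-contained.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as zf:
    zf.writestr("memo.txt", "quarterly numbers")
    zf.writestr("notes.txt", "call at noon")

records = expand_and_number("collection.zip", archive_bytes.getvalue(), count(1))
for record in records:
    print(record)
```

The parent_id field is what preserves the parent/child relationship: each extracted file can always be traced back to the container it came from.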
Step 2: Extracting Metadata and Selecting Files (Demystifying Content)
While lawyers are understandably focused on reading the content of emails and documents, it’s imperative that all metadata from those files is extracted correctly so it can be entered into a database. The processing phase pulls all the information from the From, To, CC, and BCC fields, along with the sent and received dates/times, the subject line, and many other properties, such as whether the message was opened or replied to and which conversation thread it belongs to. Having all the metadata pulled into a spreadsheet-like database view means you can easily sort and filter the data to focus only on the communications you need to investigate.
But even before looking at the metadata, there are several critical filters that the data must go through so that you don’t waste time looking at files that contain no meaningful content at all. There are hundreds of thousands of computer “system” files that are sometimes swept up in the collection process, and there’s typically no reason you need them for litigation purposes.
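Here is a minimal sketch of what pulling header metadata into a database-style row can look like, using Python's standard email module. The sample message, the field selection, and the function name are invented for illustration; a real processing engine extracts many more properties.

```python
from email import message_from_string
from email.utils import getaddresses

# A made-up message used only for this example.
RAW_MESSAGE = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Cc: dave@example.com
Subject: Q3 forecast
Date: Fri, 10 Mar 2023 14:00:00 +0000
Message-ID: <1@example.com>

Body text here.
"""


def extract_metadata(raw: str) -> dict:
    """Return one database-style row of header metadata for a raw message."""
    msg = message_from_string(raw)

    def addresses(field):
        # getaddresses splits "Name <addr>, addr2" into (name, address) pairs.
        return [addr for _, addr in getaddresses(msg.get_all(field, []))]

    return {
        "from": addresses("From"),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
        "message_id": msg.get("Message-ID", ""),
        "in_reply_to": msg.get("In-Reply-To", ""),
    }


row = extract_metadata(RAW_MESSAGE)
print(row)
```

With every message reduced to a row like this, sorting by date or filtering by sender becomes an ordinary database operation.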
The “De-NIST” data processing stage will remove those files. The “NIST” here refers to the National Institute of Standards and Technology, which maintains the National Software Reference Library (NSRL), a catalog of the digital signatures of files in known software applications. Any software executable or other system file that would appear garbled in a review database is “De-NISTed” (filtered out) according to that official list.
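Conceptually, De-NISTing is a hash lookup: compute each file's digital signature and drop the file if that signature appears on the known-software list. The sketch below assumes SHA-1 hashes and uses a tiny made-up set in place of the real NSRL reference data; the file names and contents are invented too.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the file content as a hex string."""
    return hashlib.sha1(data).hexdigest()


def de_nist(files: dict, known_hashes: set):
    """Split files into those kept for review and those matching the known list."""
    kept, removed = [], []
    for name, data in files.items():
        (removed if sha1_hex(data) in known_hashes else kept).append(name)
    return kept, removed


# Hypothetical stand-in for the NSRL list: the hash of one fake system file.
system_file = b"MZ fake executable bytes"
known_hashes = {sha1_hex(system_file)}

collected = {
    "kernel32.dll": system_file,
    "contract_draft.txt": b"Draft terms for discussion",
}

kept, removed = de_nist(collected, known_hashes)
print("kept:", kept)
print("removed:", removed)
```

Because the match is on content rather than file name, a renamed system file is still filtered out, while a user document that happens to share a system file's name is kept.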
Your eDiscovery processing workflow should also include file deduplication, and this is where you need to provide some input to your vendor. Let’s say you’ve collected emails from 10 different individuals (custodians) and you realize that any of those 10 people may have received the same email. Do you want to read the same email message 10 times? Or would you rather have the duplicates removed, with an indicator for each individual who received that message? These are important decisions you need to discuss with your vendor, who can help you understand your options so you get what works best for your review needs.
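The second option, removing duplicates while remembering who held each copy, can be sketched as follows. This example assumes that two messages are duplicates when their raw bytes hash to the same value; real tools usually hash a chosen set of fields instead, and the custodian names and messages here are invented.

```python
import hashlib


def deduplicate(collected):
    """Keep one copy of each message and record every custodian who held it.

    `collected` is an iterable of (custodian, message_bytes) pairs.
    """
    unique = {}
    for custodian, content in collected:
        digest = hashlib.sha256(content).hexdigest()
        entry = unique.setdefault(digest, {"content": content, "custodians": []})
        if custodian not in entry["custodians"]:
            entry["custodians"].append(custodian)
    return list(unique.values())


announcement = b"Subject: All-hands meeting\n\nPlease join at 3pm."
private_note = b"Subject: Draft\n\nFor your eyes only."

collected = [
    ("alice", announcement),
    ("bob", announcement),
    ("carol", private_note),
    ("alice", announcement),  # same message collected twice from one custodian
]

deduped = deduplicate(collected)
for entry in deduped:
    print(entry["custodians"], len(entry["content"]), "bytes")
```

You read the announcement once, and the custodians list still tells you that both alice and bob had it.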
Finally, this is the step where all unsearchable files are OCRed so that they are readable and searchable in the platform. There may be some scanned paper documents or images that contain text that humans can read, while the computer has to try to recognize that text for search. A computer can try to OCR your handwriting, but just know that it won’t be perfect, which means your searches may be incomplete.
Step 3: Indexing and Searching (You Can’t Search What You Don’t Index)
When lawyers think of “searching” for documents, they imagine typing a word and having the computer check for that word in every single file. You can’t be blamed for picturing it that way, but the reality is that a computer would spend so much time searching through each document that it would be a time-wasting mess.
Instead, when you type a word and hit the search button, the computer actually scans an “index” or dictionary of words that has been generated from all the words found in the files during the data processing phase. That way, it only searches for words found in the files you’ve collected and only has to consult that index rather than painstakingly trawling through each document each time. This is much more efficient and gives you the results you’re looking for in fractions of a second. The index knows each file a word occurs in and can therefore highlight search terms in files during review.
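The idea of an index can be shown in a few lines. This is a toy inverted index, assuming simple lowercase word splitting and invented document IDs and text; production search engines add positions, stemming, and much more.

```python
import re
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict, term: str) -> list:
    """Look the term up in the index instead of rereading every document."""
    return sorted(index.get(term.lower(), set()))


documents = {
    "DOC-000001": "Please review the merger agreement.",
    "DOC-000002": "Lunch on Friday?",
    "DOC-000003": "Merger timeline attached.",
}

index = build_index(documents)
print(search(index, "Merger"))
print(search(index, "bonus"))
```

The expensive work of reading every file happens once, when the index is built. Each later search is a single dictionary lookup.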
But there is a flip side: to be more efficient and avoid trying human patience, many search indexes will ignore the most common words like and, to, is, etc. These “noise” words or “stop” words occur at an astronomically higher rate than all other words. Since we aren’t usually looking for those conjunctions, determiners, and prepositions, the indexes will ignore them completely.
This is standard procedure, but you should be aware of these limitations in case you ever need to search for those specific words. Craig Ball has an excellent example: in most ediscovery document review platforms, you won’t be able to find the phrase “to be or not to be” even if you put it in quotes, because those are noise words that would not be indexed in the data processing stage.
At this point, you should consider proactively giving your provider (like Nextpoint) a list of keywords or search terms that interest you so that you can receive a “results report” after processing. This report can be useful for showing you how many occurrences of certain words were found in the data, and allows you to refine your chosen keywords before diving straight into a manual review.
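A short sketch shows why that phrase disappears. The stop list below is a small made-up sample, not the list used by any specific platform, and the function name is hypothetical.

```python
import re

# Illustrative stop list only; every search engine ships its own.
STOP_WORDS = {"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"}


def indexable_terms(text: str) -> list:
    """Return the words that would survive stop-word removal, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


print(indexable_terms("To be, or not to be"))
print(indexable_terms("The merger is not final"))
```

Every word of the famous phrase is on the stop list, so nothing is left to index, and nothing is left to match when you search for it.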
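A results report of this kind boils down to counting, for each term, how many documents contain it and how many times it occurs. The sketch below assumes whole-word, case-insensitive matching; the documents, the terms, and the report layout are invented for illustration.

```python
import re


def hit_report(documents: dict, terms: list) -> dict:
    """Count matching documents and total occurrences for each search term."""
    report = {}
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        counts = [len(pattern.findall(text)) for text in documents.values()]
        report[term] = {
            "documents": sum(1 for c in counts if c > 0),
            "occurrences": sum(counts),
        }
    return report


documents = {
    "DOC-000001": "The merger closes Friday. Merger terms attached.",
    "DOC-000002": "Lunch on Friday?",
    "DOC-000003": "Draft invoice for the merger team.",
}

report = hit_report(documents, ["merger", "invoice", "bonus"])
for term, stats in report.items():
    print(term, stats)
```

A term with zero hits (like "bonus" here) or one with an unmanageable number of hits is a signal to rethink that keyword before review begins.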
Step 4: Data Mining and Analytics (see what the data tells you)
Finally, an eDiscovery processing workflow should enable you to leverage deep analysis of your data. Computational tools, such as Nextpoint’s Data Mining, can be used to highlight interesting or significant patterns in your data to provide you with better angles to approach the review. There are several advanced tools that use artificial intelligence (AI), machine learning (ML), natural language processing (NLP), and a host of other mind-blowing technologies. Ask your vendor what basic analytical tools they have that can help you.
For example, right after processing the data, Nextpoint gives you a series of statistics on how many files and documents you have in front of you, how many emails, how many attachments, and how many email threads or conversations in total. You can also view a visual and interactive timeline of files and emails so you can focus on a specific date range. There are data widgets that break down the different file types found in your data, as well as email authors and domains.
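The kind of breakdown described here can be approximated with simple counting. The sketch below is not Nextpoint's implementation; the record fields and sample values are assumptions made for the example.

```python
from collections import Counter
from pathlib import PurePath

# Made-up processed records; a real platform would read these from its database.
records = [
    {"name": "budget.xlsx", "author": "alice@example.com"},
    {"name": "re_budget.eml", "author": "bob@example.org"},
    {"name": "fwd_budget.eml", "author": "alice@example.com"},
    {"name": "scan001.pdf", "author": ""},
]

# Count files by extension, the simplest proxy for file type.
by_type = Counter(PurePath(r["name"]).suffix.lower() for r in records)

# Count authors by email domain, skipping records with no author address.
by_domain = Counter(
    r["author"].split("@")[1] for r in records if "@" in r["author"]
)

print(by_type.most_common())
print(by_domain.most_common())
```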
In addition to these capabilities, Nextpoint offers Data Mining, the revolutionary new technology for early case assessment and comprehensive data analysis. The app generates snapshots of key themes in your data and offers advanced research capabilities that can be used to create custom visual reports. As volumes of electronic data explode in the legal field, advanced tools like this are becoming critical parts of managing potential evidence in litigation.
All of these tools give you a much better place to start your review than blindly diving into a collection of files and clicking first, then next, then next. These analyses are available in the processing stage, and it would be incredibly beneficial for you to ask your platform provider what tools they can provide for analyzing your data.
Simplify data loading before document review
As you can see, the “processing” phase of ediscovery has a lot going on under the hood, and while some of these tasks are standard and routine, it’s also important that you feel comfortable with all the options and processes involved so you can make the best decisions for your clients and their data. With these strategies, there’s no need to be overwhelmed by the data: you can simplify and understand it before diving into document review.