Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 7

There are several ways that systems can get their training examples. These training documents are a sample of all the documents in the collection. They can be selected randomly and then categorized, provided by expert reviewers, chosen by the computer, or determined by some combination of these methods.

Predictive coding is a kind of Computer-Assisted Review (CAR) or Technology-Assisted Review (TAR), but it is not the only kind of CAR/TAR. Other types include keyword searching, concept searching, clustering, email threading, more-like-this searching, and near-duplicate detection. These other kinds of CAR can be very useful and can reduce the time needed to categorize documents, but they are not predictive coding: they do not use examples to predict which documents are likely to be responsive and which are likely to be nonresponsive.

In predictive coding, the computer uses the decisions made by the expert reviewer(s) to predict how other documents should be categorized. In clustering and the various kinds of searching, the computer first organizes the documents into groups, and the reviewers then decide whether each group should be considered responsive or nonresponsive. In the jargon of machine learning, predictive coding involves what is called “supervised learning,” while the other approaches, when they involve machine learning at all, are called “unsupervised learning.” In predictive coding, the authoritative expert reviewer provides the feedback, or supervision, to the system.
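To make the supervised/unsupervised distinction concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular predictive coding product works. The text names no algorithm, so multinomial Naive Bayes stands in for the predictive coding classifier, and threshold-based single-link clustering on word overlap (Jaccard similarity) stands in for clustering. The function names, the 0.3 threshold, and the toy documents are all hypothetical. The classifier learns only from reviewer-coded examples. The clustering function receives no labels and merely groups documents, leaving the responsiveness decision to reviewers.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# --- Supervised learning (predictive coding) ---------------------------------
# Hypothetical stand-in: multinomial Naive Bayes, not any specific product's
# method. The expert reviewer's coding decisions are the supervision signal.
def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()              # number of training documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for an uncoded document."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # prior P(label)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in the training examples
            # Add-one (Laplace) smoothing avoids log(0) for unseen word/label pairs
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# --- Unsupervised learning (clustering) --------------------------------------
# No labels are used: documents are grouped by similarity alone, and reviewers
# decide afterwards whether each group is responsive.
def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.3):
    """Single-link clustering: link documents whose similarity >= threshold,
    then return the connected components as sorted lists of indices."""
    sets = [set(tokenize(t)) for t in texts]
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = defaultdict(list)
    for i in range(len(texts)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical toy data: reviewer-coded training examples.
training = [
    ("merger pricing discussion with the acquisition team", "responsive"),
    ("acquisition term sheet and merger valuation", "responsive"),
    ("lunch order for friday", "nonresponsive"),
    ("office holiday party schedule", "nonresponsive"),
]
model = train_naive_bayes(training)
print(predict(model, "draft merger valuation memo"))  # responsive
print(cluster(["merger valuation memo", "merger valuation draft", "holiday party"]))
# [[0, 1], [2]]
```

The split mirrors the text: `train_naive_bayes` cannot run without the reviewer's labels, whereas `cluster` never sees them and returns unlabeled groups for reviewers to judge.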