Evaluating E-Discovery

Random vs active selection of training examples in e-discovery

July 16, 2014, 11:02 pm

The problem with agreeing to teach is that you have less time for blogging, and the problem with a hiatus in blogging is that the topic you were in the middle of discussing gets overtaken by questions...

Research topics in e-discovery

August 7, 2014, 6:08 pm

Dr. Dave Lewis is visiting us in Melbourne on a short sabbatical, and yesterday he gave an interesting talk at RMIT University on research topics in e-discovery. We also had Dr. Paul Hunter, Principal...

Finite population protocols and selection training methods

September 14, 2014, 6:17 pm

In a previous post, I compared three methods of selecting training examples for predictive coding—random, uncertainty and relevance. The methods were compared on their efficiency in improving the...

Total review cost of training selection methods

September 26, 2014, 2:21 pm

My previous post described in some detail the conditions of finite population annotation that apply to e-discovery. To summarize, what we care about (or at least should care about) is not maximizing...

Total assessment cost with different cost models

October 15, 2014, 5:24 pm

In my previous post, I found that relevance and uncertainty selection needed similar numbers of document relevance assessments to achieve a given level of recall. I summarized this by saying the two...

Why training and review (partly) break control sets

October 19, 2014, 9:22 pm

A technology-assisted review (TAR) process frequently begins with the creation of a control set: a set of documents randomly sampled from the collection and coded by a human expert for relevance. The...

Confidence intervals on recall and eRecall

January 4, 2015, 2:07 am

There is an ongoing discussion about methods of estimating the recall of a production, as well as estimating a confidence interval on that recall. One approach is to use the control set sample, drawn...

Off to FTI: see you on the other side

January 18, 2015, 1:49 am

Tomorrow I'm starting a new, full-time position as data scientist at FTI's lab here in Melbourne. I'm excited to have the opportunity to contribute to the e-discovery community from another angle, as a...

Back from the other side

March 14, 2018, 3:51 am

Well, after a couple of years at FTI, and some, ahem, self-funded gardening leave, I'm back to consulting, and to blogging! More from me soon.

Sampling with zero intent

March 14, 2018, 4:05 pm

A zero intent sample is a sample which will only satisfy our validation goal if no positive examples are found in it. If we have a population (in e-discovery, typically a document set) where one in R...
