Big data insights on tap, just like water, gas, or electric.

Every day of the year, buyers and sellers enter into millions of contracts covering the purchase and sale of billions of dollars' worth of goods and services.  Each of these transactions requires each party to analyze the governing agreement to assess whether it fits within that party's risk profile.

This analysis commonly is performed in one of two ways: (1) the party accepts whatever risk may be in the contract with no real analysis, or (2) a legal expert conducts an unstructured review and issues his or her opinion regarding the agreement.

Unfortunately, both options come up short. The former involves flying blind while the latter takes significant amounts of time, is costly, and often produces highly inconsistent results.  As a learned friend recently quipped, “if you give the same contract to eight different lawyers you will receive ten different opinions.”

With trillions of dollars in commerce being impacted and a data set of millions of contracts, many have asked why contract analysis cannot be moved from a bespoke art form to a scalable data science exercise.  Since at least the turn of the century, every segment of the legal ecosystem—law firms, law departments, ALSPs, and legal tech companies—has made efforts to drive this transformation.  These efforts have resulted in significant progress but still fall short, largely due to each actor's lack of scale.

This, however, appears poised to change with the recent emergence of the contracts rating company that approaches contract analysis from the perspective of a utility, using specialized processes and technologies to analyze large volumes of agreements in a standardized fashion, making the results broadly available to any interested party at a low cost. [Editor’s note: In Post 200, Rafael D.B. Figueiredo makes powerful comparisons between the legal and energy sector.  Here, Bill Mooz argues that the original utility model is poised to solve a complex technical problem within law.]

Part I of this post looks at the prerequisites for running a successful data analysis exercise.  Part II examines the contributions made by various members of the legal ecosystem.  Part III concludes by reviewing the initial utility-based offerings and exploring how the utility model may evolve going forward.

I. Elements of a successful data analytics exercise

Data analytics is a complex science with its own body of literature.  This post does not attempt to synthesize this entire body of knowledge.  Rather, it simply calls out several key foundational points for conducting a successful data analytics exercise in the space of contract analysis.

The starting point for any data analysis is the data set itself.  Simply put, if the data set is flawed, the resulting analysis will be flawed as well.  Data set quality turns on several vectors.

  • Integrity:  The data points must be collected consistently and with integrity.
  • Objectivity:  To the greatest extent possible, data points need to reflect facts rather than unreliable and unreplicable conclusions or opinions.
  • Granularity:  The more granular the data set, the better it lends itself to analysis.  This is true for multiple reasons.  First, granular data points tend to be more objective than higher-level data points.  For example, asking the witness to an accident which car was at fault provides far less objective data than asking the same witness which car had the green light, did either car slow down while entering the intersection, was either car weaving, etc.  Second, granular data permits the examiner (human or machine) to spot patterns in areas that would not be apparent with higher-level data.  Finally, granular data makes it easier to train AI tools in a reliable fashion.
  • Size:  Large data sets generally are of higher quality than smaller data sets, primarily because omitted variables can easily result in biased findings.  Indeed, when a data set is too small it simply won’t permit many types of analysis.
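The granularity point above can be illustrated with a small sketch.  The field names and dollar figures below are invented for illustration; they are not drawn from any actual contract data set.

```python
# Hypothetical illustration: the same limitation-of-liability clauses captured
# as granular, objective data points rather than a single subjective label
# such as "liability clause acceptable: yes/no".
contracts = [
    {"liability_cap_exists": True, "cap_amount_usd": 1_000_000},
    {"liability_cap_exists": True, "cap_amount_usd": 10_000_000},
    {"liability_cap_exists": False, "cap_amount_usd": None},
]

# Granular data supports pattern-spotting queries that a coarse label cannot,
# e.g., how many capped contracts set the cap below $5M?
count = sum(1 for c in contracts
            if c["liability_cap_exists"] and c["cap_amount_usd"] < 5_000_000)
print(count)  # → 1
```

The coarse label would answer only one fixed question; the granular fields can be recombined to answer many, and two reviewers extracting them are far more likely to agree.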

In order to create a quality data set and analyze it reliably, one must have—and follow—a defined and documented process that covers the entire lifecycle of the exercise, from data collection through analysis.  Ad hoc processes almost invariably result in poor quality data sets and unreliable (or at least inconsistent) analyses.

Finally, successful contract analysis exercises require a solid foundation of technology, such as contract lifecycle management (CLM) systems and data analysis tools to collect and analyze the data at the scale required to deliver meaningful results.

As shown below, most of the efforts to date have foundered because the party conducting the exercise lacked the scale necessary to address one or more of these foundational elements.

II. Movements toward a scientific approach

The progress toward a scientific approach has varied by market segment.

A. Law Firms

Lawyers commonly approach contract analysis as an art, treating every contract as unique and applying their own judgment in an ad hoc, bespoke fashion.  At the same time, law firms (especially larger ones) have made more than token efforts to inject science into their approach to contract analysis over the years.

Large firms have the benefit of handling at least a regular and relatively large volume of contracts in a wide variety of areas, yielding a larger data set to work with than that enjoyed by individual legal professionals.  Law firm training programs, form banks, research archives, etc. all create at least a rudimentary foundation for standardization and consistency.

These attributes made law firms the state of the art for contract analysis in the last century.  But law firms have largely struggled to make further advances toward a true science of contract analysis, even though many have made significant investments in technology, process experts, and the like.

There are several reasons for this.  The most obvious is change management, which affects law firms more than most organizations, as each partner is an individual profit center that comes with prerogatives of control.  Perhaps even more significant is the fact that law firms get asked to handle a wide variety of contracts for clients in lots of industries, rather than handling a large volume of contracts generated using a single set of templates.  This “wide and shallow” data set, along with limitations imposed by the client relationships, creates a practical constraint on the type of contract data analysis that law firms can conduct.

Compounding these limitations is the fact that law firms have poor incentive structures for central investments in their practices: why build a machine to compete with your very pricey human assets who are paid by the hour?  Likewise, individual clients often have budgets that will support only the most high-level analysis. Without scale to reduce the overall price point, the benefits remain largely hidden.

B. Law departments and ALSPs

Around the turn of the century, corporate law departments realized that they needed a more scalable and economical approach to contracts, one that would allow the department to use lower-cost legal professionals (as opposed to attorneys) and generate consistent risk profiles across large volumes of agreements.  I know because I was there and experienced the business pressure, as did my colleagues at other large legal departments.

With the support of ALSPs (many of which were offshore), we began to develop documented processes for contract analysis.  These processes were supported by playbooks that broke standard contracts down clause-by-clause and gave the legal professional analyzing the contract a standardized approach to handling at least the most common issues.

This move toward a standardized process was made possible, in large part, by the data sets available to the law departments.  Unlike law firms that are “wide and shallow” on contracts, handling a small number of contracts for each client in lots of industries, large company law departments often handle tens of thousands of contracts annually that are based on a small set of standard templates.  This allows them to be “narrow and deep” (at least in certain areas), making it far easier to standardize.

The ALSPs were quick to recognize the value of this approach and see the opportunities that it provided.  Initially, the ALSPs provided arbitraged labor in the form of legal professionals – many of whom were offshore – who could conduct the standardized analysis at a compelling cost.  The ALSPs, however, soon realized that they could improve their value proposition and profits significantly if they focused on developing and continuously improving the end-to-end processes used to conduct the analysis.

Several of the stronger ALSPs now provide significant value in all phases of these processes, from data collection at the contract formation stage to data analysis of the install base of agreements.

Technology also played a significant role in this evolution.  Contract lifecycle management (CLM) systems that automated many aspects of the contracting lifecycle from generation, through analysis and negotiation, and ending in contract storage, became relatively mainstream.  Without these tools, the type of data collection and analysis mentioned above would not be possible.

These efforts represented a significant advance in the journey from art to science, but they were far from perfect for at least several reasons.

  • Size of department: The advances described above generally occurred only in the largest law departments.  The legal function at smaller companies simply does not have the scale or resources necessary to develop, implement, and maintain processes, playbooks, technology, and the other elements of a scientific contract-analysis infrastructure.
  • Limited opportunities for scale:  Even within the largest law departments, these efforts tend to be limited to particular areas that have scale.  For example, a company may do thousands of sales contracts employing the same template annually, but may engage in only one or a handful of transactions involving the purchase of a particular good or service.  The infrastructure developed for handling the volume sales agreements may be somewhat useful for handling the one-off purchase agreements, but the company will not have the scale necessary on the purchase side of the house to use data science in a meaningful way.
  • Even when feasible, ROI may still be lacking:  Even in the areas where large law departments have significant scale (e.g., sales contracts for their offerings), that scale still commonly falls short of what is required to (a) collect and analyze data at a truly granular level, as required to build playbooks that rest on a foundation of statistically significant data rather than a lawyer's individual judgment, (b) update data sets and data analysis on a regular basis, and (c) obtain data sets of sufficient breadth to be truly relevant.

Regarding ROI, the construction and maintenance of data sets relate to cost.  Analyzing a contract scientifically requires a proactive approach that breaks the contract down into hundreds of “microdata points” (see below) and assesses each of them in a highly objective fashion.  Most companies lack the resources to do this and instead build their playbooks reactively at the clause level.  When counterparties raise sufficient objections to a provision, a lawyer for the company decides how to respond and updates the company’s templates and/or playbook accordingly.  This updating tends to occur on an ad hoc basis, rather than as part of a structured process of regular analysis and regular updates.  Again, the reason typically is the cost associated with conducting regular, proactive reviews.

The third ROI factor–insufficient breadth of data–is simply a function of the data that an individual company has access to.  For the most part, companies can only see contracts that they are party to.  This data set does not include competitors’ contracts, making it difficult for the company to use the available data set to provide reliable insights into what is market for a particular term, in a particular industry.  Companies can, at least in theory, scour the web to monitor their competitors’ publicly available terms (see below), but these terms do not cover negotiated deals and the monitoring process can be complex and costly as companies frequently change their terms without notice and do not always post them in a transparent fashion.

In short, the big push by the large law departments (many of which are members of CLOC) starting at the turn of the century has advanced the science of contracting significantly, but still falls short of a fully scientific approach.

C. Contract tool companies

Over the past two decades, the contract tool companies have delivered in a big way.  Companies today have the choice of at least ten different CLM systems that enable the automation of every stage of the contracting lifecycle.  See, e.g., “Top 10 Contract Lifecycle Management Software Report” (2021) (comparison of leading CLM vendors). Companies then have the further choice of multiple AI-driven analysis tools that can sit on top of the CLM systems, allowing companies to analyze contract data generated at any stage of the lifecycle.

While these tools can be expensive to acquire, implement, and operate, they generally provide much, if not all, of the functionality required for a scientific approach to contract analysis.  The CLM systems generally have the functionality required to capture and store the data required for analysis.  The AI systems (which sometimes are a feature of a CLM system) have the functionality necessary to analyze large bodies of data in a consistent fashion and produce meaningful results.

The advent of these tools has enabled significant progress toward a scientific approach to contract analysis, but they still fall short in three critical areas.

  • High cost:  The cost of implementing a robust CLM system can run millions of dollars, require multiple dedicated personnel, and take years.  This simply is beyond the capability of most contracting parties.
  • Data not in right form:  Perhaps even more significant than cost, the data collection, storage, and analysis tools are only as effective as the data sets and processes that they are used with.  Indeed, current wisdom in the data analytics field is that AI algorithms aren’t the real challenge; collecting and organizing the relevant data is.  As described above, even the largest companies face challenges in this area.
  • Limitations in quality of lawyer trainers:  Finally, the analysis generated by an AI tool reflects the way in which the tool has been trained.  In most cases, attorneys are training the tools, using their own bespoke judgment.  If eight attorneys can review the same contract and produce ten different analyses, eight attorneys can train a tool to handle a particular issue in at least ten different ways.  I had the opportunity to conduct one of the first large-scale contract analyses using a leading AI-based clause extraction tool when it first came to market and the tool performed amazingly well.  I used the same tool on a similar project a couple of years later and it did not perform nearly as well.  I suspect that this decline in performance stemmed at least in part from the added training that the tool had received at the clause level from multiple lawyers, with each injecting his or her biases.  I further suspect that these distortions would have been far less had the attorneys been training the tool at more granular levels than an entire clause or provision.

Like the other participants in the legal ecosystem, the contract tool companies have made significant contributions toward driving a science-based approach to contract analysis, but they are not in a position to address many of the key gating factors that limit the ability of their tools to produce truly actionable results.

III.  The Emergence of a Utility Model

The problems of scale discussed above are not unique to contract analysis.  Multiple industries have faced similar issues over the years, solving the problem by moving to a utility model that offers users a standard, centralized service, with compelling economics and quality.

One recent example of this type of evolution is cloud computing, where companies, by and large, have moved away from bespoke, self-run IT systems in favor of standard IaaS, PaaS, and SaaS solutions run on a utility model.  Ratings agencies, such as Moody’s and Dun & Bradstreet, offer another example that is perhaps even more analogous to contract analysis.  Because the utility serves a large number of customers from a common infrastructure, it has the scale required to make the required investments in process, technology, data acquisition, maintenance, etc.

The move to a utility model for contract analysis is in its infancy but has started.

The experience of the newly launched contracts rating company TermScout [disclosure: I work as Chief Products Officer for TermScout] shows what this might look like and how a utility approach could transform contract analysis into a true science. See Post 211 (discussing examples of early TermScout analyses); Post 157 (noting that TermScout was founded by alumni of the TLA and IFLP).

TermScout uses a defined process to break commercial contracts down into 600+ data points of a highly objective nature.  Using a mixture of tools and humans, TermScout extracts these same data points from every contract it reviews—mostly obtained by scraping the internet for standard contracts that companies post online to facilitate faster sales—and then scores them using a standard algorithm.  This produces a uniform analysis that allows contracts to be assessed and compared on an apples-to-apples basis.
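The mechanics of scoring granular data points with a standard algorithm can be sketched in a few lines.  To be clear, the data point names, weights, and scoring function below are hypothetical illustrations, not TermScout's actual algorithm or schema.

```python
# Hypothetical sketch: score a contract as the weighted share of granular,
# objective data points that come out favorable. All names and weights are
# illustrative only.

def score_contract(favorable: dict[str, bool], weights: dict[str, float]) -> float:
    """Return the weighted percentage of data points answered favorably."""
    total = sum(weights.values())
    earned = sum(w for key, w in weights.items() if favorable[key])
    return round(100 * earned / total, 1)

# Each key is one objective yes/no question extracted from the contract text;
# True means the answer favors the party running the analysis.
points = {
    "liability_cap_present": True,
    "no_unilateral_amendment": False,
    "termination_notice_30_days": True,
}
weights = {key: 1.0 for key in points}  # equal weighting for simplicity

print(score_contract(points, weights))  # → 66.7
```

Because every contract is decomposed into the same standard data points and run through the same function, two scored contracts can be compared directly, which is the apples-to-apples property the text describes.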

TermScout operates with an impressive data set and currently has a library of hundreds of publicly available contracts, with new contracts being added daily.  As a de facto utility, TermScout has the scale required to conduct daily reviews to ensure that its library remains up to date, tracking vendors’ changes to their terms regularly and updating its analysis as required to keep it current.  The size and composition of this data set provide users with transparency as to what “market” is for a particular term in a particular industry or setting, which is something that a decentralized model simply cannot provide.

Perhaps most interesting are the economics associated with this utility model.  TermScout offers users access to its library for as little as $200 a month—roughly the cost of half an hour of a law firm associate’s time and not too far off from our home utility bills. In essence, a significant tranche of contract analysis can be commoditized.  And nothing is more fundamental to business than contracts, as a contract between a buyer and seller is the basis for every dollar that flows into, and out of, a company.

To be sure, this model, at least in its current state of evolution, has its limitations, but these limitations seem likely to erode over time.  Some of the more significant limitations include the following:

  1. The data set includes only publicly available contract templates and does not reflect negotiated agreements. This is absolutely the case and means that, in their current format, contract analysis utilities are likely to be most relevant to the large volume of lower-dollar transactions (e.g., <$2,000,000) that get done either with a click-accept agreement or with relatively limited negotiation of standard terms.  Over time, however, it seems likely that the data sets will expand to include negotiated agreements.  This can occur in multiple ways, including (i) companies with very large data sets of their own negotiated agreements implementing a private instance of the utility to analyze that data and generate market analysis, (ii) companies willingly providing their negotiated agreements to a utility for analysis and reporting on an aggregated and anonymized basis, and (iii) the utilities or others assembling sets of negotiated agreements obtained from outside sources such as EDGAR.
  2. It is suboptimal to use a one-size-fits-all scoring algorithm to cover all types of commercial contracting engagements. Again, there is no doubt that this is the case, but to some extent, this is likely to be the tail rather than the dog.  As shown above, the big challenge with contract analysis lies in assembling a quality data set and abstracting out the foundational micro data points in a consistent and objective fashion.  Once that foundation exists, it is a relatively simple matter to allow users to define their own data sets (e.g., the participants in my RFP) and to apply their own tailored scoring algorithms.  Again, relatively speaking, the technology is simple; it is the data collection that is hard.
  3. Data analytics aren’t the same thing as legal advice. This too is entirely correct.  Data analytics simply provide a foundation of relevant information.  What to do with that information is an entirely different question that requires legal and business judgment rendered within the context of a particular situation.  The data provided by the utility simply serves as a tool that allows the final action to be taken more efficiently and effectively.
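Point 2 above can be made concrete with a short sketch: once the granular data points exist, applying a user-defined scoring algorithm is the easy part.  The weighting profiles and field names here are hypothetical illustrations, not any vendor's actual configuration.

```python
# Hypothetical sketch: one granular data set, two user-defined weighting
# profiles, two tailored scores. All names and numbers are illustrative.

contract = {
    "liability_cap_present": True,
    "data_deletion_on_exit": False,
    "termination_for_convenience": True,
}

# Two users weight the same micro data points according to their own risk
# priorities.
profiles = {
    "risk_averse_buyer": {"liability_cap_present": 3.0,
                          "data_deletion_on_exit": 2.0,
                          "termination_for_convenience": 1.0},
    "speed_focused_buyer": {"liability_cap_present": 1.0,
                            "data_deletion_on_exit": 1.0,
                            "termination_for_convenience": 3.0},
}

def apply_profile(data: dict[str, bool], weights: dict[str, float]) -> float:
    """Weighted percentage of favorable data points under a custom profile."""
    total = sum(weights.values())
    earned = sum(w for key, w in weights.items() if data[key])
    return round(100 * earned / total, 1)

for name, weights in profiles.items():
    print(name, apply_profile(contract, weights))
```

The expensive, hard part is collecting and maintaining the `contract` data consistently across thousands of agreements; swapping in a different `weights` dictionary is trivial by comparison, which is why the one-size-fits-all algorithm is the tail rather than the dog.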

As indicated in this last point, contract analysis utilities are only one part of an overall solution and will not obviate the need for law firms, law departments, ALSPs, legal tool vendors, etc.  In many ways, the utilities are analogous to the IaaS and PaaS services that provide a foundation for individual companies to develop and deliver their own value-add services in a more efficient and effective manner, see Post 108 (Ken Jones discussing gradual but steady uptake in legal industry); Post 221 (same), or to the financial rating companies such as Dun & Bradstreet and Moody’s that provide finance professionals with indispensable tools allowing them to be more effective in their jobs.


Contract analysis is moving from an art to a science.  The gains to date have been substantial and promise to take another leap forward with the emergence of utility models.

The emerging utility models stand to yield unprecedented levels of transparency and dramatically lower costs compared to what contracting parties experience today.  At the same time, the emerging utility models create opportunities for other members of the legal ecosystem to develop a new array of value-added services that sit on top of this platform.

Given these advantages, it seems unlikely that the inexorable move toward a more science-based approach to contracting will slow down any time soon.